Re: performance is drastically degraded after 0.7.8 -> 1.0.11 upgrade

2012-09-06 Thread aaron morton
Sorry, but you will need to provide details of a specific query or workload
that goes slower in 1.0.11.

As I said, tests have shown improvements in performance in every new release.
If you are seeing a significant decrease in performance, it may be a workload
that has not been considered, or a known edge case. Whatever the cause, we
would need more details to help you.

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 5/09/2012, at 4:19 PM, Илья Шипицин chipits...@gmail.com wrote:

 all tests use similar data access patterns, so every test on 1.0.11 is slower
 than on 0.7.8; the recent latency micros confirm that.
 

Re: performance is drastically degraded after 0.7.8 -> 1.0.11 upgrade

2012-09-04 Thread Илья Шипицин
it was a good idea to have a look at StorageProxy :-)


1.0.10 Performance Tests
StorageProxy

RangeOperations: 546
ReadOperations: 694563
TotalHints: 0
TotalRangeLatencyMicros: 4469484
TotalReadLatencyMicros: 245669679
TotalWriteLatencyMicros: 57819722
WriteOperations: 208741


0.7.10 Performance Tests
StorageProxy

RangeOperations: 520
ReadOperations: 671476
TotalRangeLatencyMicros: 2208902
TotalReadLatencyMicros: 162186009
TotalWriteLatencyMicros: 33911222
WriteOperations: 204806
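Taken per operation, those totals already quantify the regression. A quick back-of-the-envelope check (a sketch; the figures are just the counters quoted above):

```python
# Average latency per operation, derived from the StorageProxy totals above:
# (TotalLatencyMicros, OperationCount) per operation type and version.
metrics = {
    "1.0.10": {"read": (245669679, 694563), "write": (57819722, 208741), "range": (4469484, 546)},
    "0.7.10": {"read": (162186009, 671476), "write": (33911222, 204806), "range": (2208902, 520)},
}

averages = {
    version: {op: total / ops for op, (total, ops) in ops_map.items()}
    for version, ops_map in metrics.items()
}

for op in ("read", "write", "range"):
    new, old = averages["1.0.10"][op], averages["0.7.10"][op]
    print(f"{op}: {new:.0f}us vs {old:.0f}us per op ({new / old:.2f}x slower)")
```

Reads come out around 354 µs vs 242 µs and writes around 277 µs vs 166 µs per operation, i.e. roughly 1.5x to 1.7x rather than the 3x reported for the write-heavy load, which is one more reason to collect per-test timings.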



Re: performance is drastically degraded after 0.7.8 -> 1.0.11 upgrade

2012-09-04 Thread aaron morton
That's slower.

the Recent* metrics are the best to look at. They reset each time you look at
them. So read them, then run the test, then read them again.

You'll need to narrow it down still, e.g. is there a single test taking a very
long time, or are all tests running slower? The Histogram stats can help with
that as they provide a spread of latencies.
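The histogram attributes come back over JMX as an array of bucket counts over growing latency boundaries, so a percentile can be estimated from them roughly like this (a sketch; the bucket offsets below are made up for illustration, not Cassandra's actual boundaries):

```python
def percentile_from_histogram(bucket_offsets, counts, pct):
    """Estimate the pct-th percentile latency from histogram bucket counts.

    bucket_offsets[i] is the upper bound (in micros) of bucket i;
    counts[i] is how many operations fell into that bucket.
    """
    total = sum(counts)
    if total == 0:
        return None
    threshold = total * pct / 100.0
    seen = 0
    for offset, count in zip(bucket_offsets, counts):
        seen += count
        if seen >= threshold:
            return offset
    return bucket_offsets[-1]

# Hypothetical buckets; in an estimated histogram the boundaries grow
# by a constant factor rather than linearly.
offsets = [10, 12, 15, 18, 22, 27, 33, 40, 48, 58]
counts  = [ 0,  3, 10, 40, 30, 10,  5,  1,  1,  0]
print(percentile_from_histogram(offsets, counts, 95))
```

Comparing, say, the p95 before and after the upgrade shows whether the tail or the whole distribution moved.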

Cheers
 
-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com


Re: performance is drastically degraded after 0.7.8 -> 1.0.11 upgrade

2012-09-04 Thread Илья Шипицин
all tests use similar data access patterns, so every test on 1.0.11 is
slower than on 0.7.8; the recent latency micros confirm that.


Re: performance is drastically degraded after 0.7.8 -> 1.0.11 upgrade

2012-09-02 Thread aaron morton
The whole test run is taking longer? So it could be slower queries or slower
test setup / tear down.

If you are creating and truncating the KS for each of the 500 tests, is that
taking longer? (Schema code has changed a lot between 0.7 and 1.0.)
Can you log the execution time for tests and find the ones that are taking
longer?

There are full request metrics available on the StorageProxy JMX object.
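Logging per-test execution time does not need framework support; a minimal harness along these lines would do (a sketch; `run_all` and the test callables stand in for whatever your suite actually runs):

```python
import time

def run_all(tests):
    """Run each (name, callable) pair and record its wall-clock duration."""
    durations = {}
    for name, test in tests:
        start = time.perf_counter()
        test()
        durations[name] = time.perf_counter() - start
    return durations

def slowest(durations, n=10):
    """Return the n slowest tests, worst first."""
    return sorted(durations.items(), key=lambda kv: kv[1], reverse=True)[:n]

# Compare the same report on 0.7.8 and 1.0.11 to see whether one test
# dominates or everything is uniformly slower.
durations = run_all([("noop", lambda: None), ("busy", lambda: sum(range(10000)))])
for name, secs in slowest(durations, n=5):
    print(f"{name}: {secs * 1000:.1f} ms")
```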

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com



Re: performance is drastically degraded after 0.7.8 -> 1.0.11 upgrade

2012-08-30 Thread Илья Шипицин
we are running a somewhat queue-like workload with aggressive write-read
patterns.
I was looking for a way to script queries from a live Cassandra installation,
but I didn't find any.

is there something like a thrift proxy or another query logging/scripting
engine?

2012/8/30 aaron morton aa...@thelastpickle.com

 in terms of our high-rate write load cassandra 1.0.11 is about 3 (three!!)
 times slower than cassandra-0.7.8

 We've not had any reports of a performance drop off. All tests so far have
 shown improvements in both read and write performance.

 I agree, such digests save some network IO, but they seem to be very bad
 in terms of CPU and disk IO.

 The sha1 is created so we can diagnose corruptions in the -Data component
 of the SSTables. They are not used to save network IO.
 It is calculated while streaming the Memtable to disk so it has no impact on
 disk IO. While not the fastest algorithm, I would assume its CPU overhead in
 this case is minimal.

 there's already a relatively small Bloom filter file, which could be used for
 saving network traffic instead of the sha1 digest.

 Bloom filters are used to test if a row key may exist in an SSTable.

 any explanation ?

 If you can provide some more information on your use case we may be able
 to help.

 Cheers


 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 30/08/2012, at 5:18 AM, Илья Шипицин chipits...@gmail.com wrote:

 in terms of our high-rate write load cassandra 1.0.11 is about 3 (three!!)
 times slower than cassandra-0.7.8
 after some investigation I noticed files with a sha1 extension
 (which are missing for Cassandra-0.7.8)

 in the maybeWriteDigest() function I see no option for switching sha1 digests
 off.

 I agree, such digests save some network IO, but they seem to be very bad
 in terms of CPU and disk IO.
 why use one more digest (which has to be calculated), when there's already a
 relatively small Bloom filter file, which could be used for saving network
 traffic instead of the sha1 digest?

 any explanation ?

 Ilya Shipitsin





Re: performance is drastically degraded after 0.7.8 -> 1.0.11 upgrade

2012-08-30 Thread Edward Capriolo
If you move from 0.7.X to 0.8.X or 1.0.X you have to rebuild sstables as
soon as possible. If you have large bloom filters you can hit a bug
where the bloom filters will not work properly.
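In practice the rebuild means running `nodetool scrub` (or, on later 1.0.x releases, `nodetool upgradesstables`) for each keyspace; a small wrapper makes it easy to script (a sketch; the host, port, and keyspace names are placeholders for your own cluster):

```python
import subprocess

def rebuild_command(keyspace, host="localhost", port=7199, op="scrub"):
    """Build the nodetool invocation that rewrites sstables for one keyspace."""
    return ["nodetool", "-h", host, "-p", str(port), op, keyspace]

def rebuild_all(keyspaces, **kwargs):
    """Rewrite sstables keyspace by keyspace; raises if nodetool fails."""
    for ks in keyspaces:
        cmd = rebuild_command(ks, **kwargs)
        print("running:", " ".join(cmd))
        subprocess.check_call(cmd)
```

For example, `rebuild_all(["my_ks"])` would run `nodetool -h localhost -p 7199 scrub my_ks`.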




Re: performance is drastically degraded after 0.7.8 -> 1.0.11 upgrade

2012-08-30 Thread aaron morton
 we are running somewhat queue-like with aggressive write-read patterns.
We'll need some more details…

How much data?
How many machines?
What is the machine spec?
How many clients?
Is there an example of a slow request?
How are you measuring that it's slow?
Is there anything unusual in the log?

Cheers

-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com



Re: performance is drastically degraded after 0.7.8 -> 1.0.11 upgrade

2012-08-30 Thread Илья Шипицин
we are using functional tests (~500 tests per run).
it is hard to tell which query is slower; it is slower in general.

same hardware: 1 node, 32 GB RAM, 8 GB heap, default cassandra settings.
since we are talking about functional tests, we recreate the KS just before
the tests are run.

I do not know how to record the queries (there are a lot of them); if you are
interested, I can set up a dedicated test rig for you.
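Absent a ready-made thrift proxy, one way to record the queries is to wrap the client object the tests already use and log every call; a generic delegation sketch (the wrapped `client` and its method names are whatever your test suite happens to use):

```python
import functools
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("query-recorder")

class RecordingClient:
    """Delegates every method call to the wrapped client, logging name and args."""

    def __init__(self, client):
        self._client = client
        self.calls = []

    def __getattr__(self, name):
        # Only invoked for attributes not found on this wrapper itself.
        attr = getattr(self._client, name)
        if not callable(attr):
            return attr

        @functools.wraps(attr)
        def recorded(*args, **kwargs):
            self.calls.append((name, args, kwargs))
            log.info("%s args=%r kwargs=%r", name, args, kwargs)
            return attr(*args, **kwargs)

        return recorded

# Usage: client = RecordingClient(real_client); run the suite, then replay
# or inspect client.calls to reproduce the workload on another version.
```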

2012/8/31 aaron morton aa...@thelastpickle.com

 we are running somewhat queue-like with aggressive write-read patterns.

 We'll need some more details...

 How much data ?
 How many machines ?
 What is the machine spec ?
 How many clients ?
 Is there an example of a slow request ?
 How are you measuring that it's slow ?
 Is there anything unusual in the log ?

 Cheers

 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 31/08/2012, at 3:30 AM, Edward Capriolo edlinuxg...@gmail.com wrote:

 If you move from 0.7.X to 0.8.X or 1.0.X you have to rebuild the SSTables
 as soon as possible. If you have large Bloom filters you can hit a bug
 where the Bloom filters will not work properly.


 On Thu, Aug 30, 2012 at 9:44 AM, Илья Шипицин chipits...@gmail.com
 wrote:

 We are running a somewhat queue-like workload with aggressive write-read
 patterns.
 I was looking for a way to record queries from a live Cassandra
 installation, but I didn't find any.
 
 Is there something like a Thrift proxy or another query logging/scripting
 engine?

 2012/8/30 aaron morton aa...@thelastpickle.com


 In terms of our high-rate write load, Cassandra 1.0.11 is about 3 (three!!)
 times slower than Cassandra 0.7.8.

 We've not had any reports of a performance drop-off. All tests so far have
 shown improvements in both read and write performance.

 I agree, such digests save some network IO, but they seem to be very bad
 in terms of CPU and disk IO.

 The sha1 is created so we can diagnose corruption in the -Data component
 of the SSTables. They are not used to save network IO.
 It is calculated while streaming the Memtable to disk, so it has no impact
 on disk IO. While not the fastest algorithm, I would assume its CPU
 overhead in this case is minimal.

 there's already a relatively small Bloom filter file, which could be used
 to save network traffic instead of a sha1 digest.

 Bloom filters are used to test if a row key may exist in an SSTable.

 Any explanation?

 If you can provide some more information on your use case we may be able
 to help.

 Cheers


 -
 Aaron Morton
 Freelance Developer
 @aaronmorton
 http://www.thelastpickle.com

 On 30/08/2012, at 5:18 AM, Илья Шипицин chipits...@gmail.com wrote:

 In terms of our high-rate write load, Cassandra 1.0.11 is about 3 (three!!)
 times slower than Cassandra 0.7.8.
 After some investigation I noticed files with a sha1 extension
 (which are missing for Cassandra 0.7.8).
 
 In the maybeWriteDigest() function I see no option for switching sha1
 digests off.
 
 I agree, such digests save some network IO, but they seem to be very bad
 in terms of CPU and disk IO.
 Why use one more digest (which has to be calculated)? There's already a
 relatively small Bloom filter file, which could be used to save network
 traffic instead of a sha1 digest.
 
 Any explanation?

 Ilya Shipitsin







performance is drastically degraded after 0.7.8 -- 1.0.11 upgrade

2012-08-29 Thread Илья Шипицин
In terms of our high-rate write load, Cassandra 1.0.11 is about 3 (three!!)
times slower than Cassandra 0.7.8.
After some investigation I noticed files with a sha1 extension
(which are missing for Cassandra 0.7.8).

In the maybeWriteDigest() function I see no option for switching sha1
digests off.

I agree, such digests save some network IO, but they seem to be very bad in
terms of CPU and disk IO.
Why use one more digest (which has to be calculated)? There's already a
relatively small Bloom filter file, which could be used to save network
traffic instead of a sha1 digest.

Any explanation?

Ilya Shipitsin


Re: performance is drastically degraded after 0.7.8 -- 1.0.11 upgrade

2012-08-29 Thread aaron morton
 In terms of our high-rate write load, Cassandra 1.0.11 is about 3 (three!!)
 times slower than Cassandra 0.7.8.
We've not had any reports of a performance drop-off. All tests so far have
shown improvements in both read and write performance.

 I agree, such digests save some network IO, but they seem to be very bad in 
 terms of CPU and disk IO.
The sha1 is created so we can diagnose corruption in the -Data component of
the SSTables. They are not used to save network IO.
It is calculated while streaming the Memtable to disk, so it has no impact
on disk IO. While not the fastest algorithm, I would assume its CPU overhead
in this case is minimal.
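
To illustrate why a flush-time digest adds no extra disk IO, here is a rough
sketch of the pattern: the checksum is updated in memory in the same pass
that writes the data file, then stored in a companion .sha1 file. The
function and file names below are made up for illustration; this is not
Cassandra's actual code.

```python
import hashlib

def write_with_digest(data_path, chunks):
    """Write chunks to data_path while updating a SHA-1 digest in the same
    pass, then store the hex digest in a companion .sha1 file.
    Illustrative sketch only -- not Cassandra's implementation."""
    sha1 = hashlib.sha1()
    with open(data_path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)       # the one and only write pass to disk
            sha1.update(chunk)   # digest updated in memory, no extra IO
    digest = sha1.hexdigest()
    with open(data_path + ".sha1", "w") as f:
        f.write(digest)
    return digest
```

Later, the data file can be re-hashed and compared against the stored
digest to detect corruption, without the digest ever being consulted on the
read or network path.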

 there's already a relatively small Bloom filter file, which could be used
 to save network traffic instead of a sha1 digest.
Bloom filters are used to test if a row key may exist in an SSTable.
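
To make the "may exist" semantics concrete, here is a toy Bloom filter
sketch (illustrative only, not Cassandra's implementation). A lookup
answers either "definitely not in this SSTable" or "may be in this
SSTable", which lets reads skip SSTables without touching disk; it cannot
replace a per-file integrity digest.

```python
import hashlib

class BloomFilter:
    """Toy Bloom filter: no false negatives, occasional false positives."""
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, key):
        # Derive num_hashes independent bit positions from the key.
        for i in range(self.num_hashes):
            h = hashlib.sha1(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = True

    def may_contain(self, key):
        # False => key is definitely absent; True => key *may* be present.
        return all(self.bits[pos] for pos in self._positions(key))
```

Because a Bloom filter only answers membership probabilistically, it says
nothing about whether the bytes of the -Data file are intact, which is what
the sha1 digest is for.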

 Any explanation?

If you can provide some more information on your use case we may be able to 
help. 

Cheers


-
Aaron Morton
Freelance Developer
@aaronmorton
http://www.thelastpickle.com

On 30/08/2012, at 5:18 AM, Илья Шипицин chipits...@gmail.com wrote:

 In terms of our high-rate write load, Cassandra 1.0.11 is about 3 (three!!)
 times slower than Cassandra 0.7.8.
 After some investigation I noticed files with a sha1 extension
 (which are missing for Cassandra 0.7.8).
 
 In the maybeWriteDigest() function I see no option for switching sha1
 digests off.
 
 I agree, such digests save some network IO, but they seem to be very bad in
 terms of CPU and disk IO.
 Why use one more digest (which has to be calculated)? There's already a
 relatively small Bloom filter file, which could be used to save network
 traffic instead of a sha1 digest.
 
 Any explanation?
 
 Ilya Shipitsin