Re: SSTable size versus read performance

2013-05-16 Thread Edward Capriolo
lz4 is supposed to achieve similar compression while using fewer resources
than snappy. It is easy to test: just change the compressor, then run 'nodetool
rebuild'. Not sure when lz4 was introduced, but since it is new to Cassandra
there may not be many large deployments running it yet.
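For example, a minimal sketch of the swap (the keyspace name 'ks' is a
placeholder; 'nodetool upgradesstables' is shown here as one way to rewrite the
existing sstables with the new compressor - the rebuild approach above works too):

    -- cqlsh, CQL 3: switch the table's compressor
    ALTER TABLE ks.global_user
      WITH compression = {'sstable_compression': 'SnappyCompressor'};

    # shell, per node: force sstables to be rewritten with the new settings
    # (upgradesstables/scrub; exact behaviour depends on the Cassandra version)
    nodetool upgradesstables ks global_user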


On Thu, May 16, 2013 at 4:40 PM, Keith Wright  wrote:

> Thank you for that.  I did not have trickle_fsync enabled and will give it
> a try.  I just noticed that when running a describe on my table, I do not
> see the sstable size parameter (compaction_strategy_options =
> {'sstable_size_in_mb':5}) included.  Is that expected?  Does it mean it's
> using the defaults?
>
> Assuming none of the tuning here makes a noticeable difference, my next
> step is to try switching from LZ4 to Snappy.  Any opinions on that?
>
> Thanks!
>
> CREATE TABLE global_user (
>   user_id bigint,
>   app_id int,
>   type text,
>   name text,
>   extra_param map,
>   last timestamp,
>   paid boolean,
>   sku_time map,
>   values map,
>   PRIMARY KEY (user_id, app_id, type, name)
> ) WITH
>   bloom_filter_fp_chance=0.10 AND
>   caching='KEYS_ONLY' AND
>   comment='' AND
>   dclocal_read_repair_chance=0.00 AND
>   gc_grace_seconds=86400 AND
>   read_repair_chance=0.10 AND
>   replicate_on_write='true' AND
>   populate_io_cache_on_flush='false' AND
>   compaction={'class': 'LeveledCompactionStrategy'} AND
>   compression={'chunk_length_kb': '8', 'crc_check_chance': '0.1',
> 'sstable_compression': 'LZ4Compressor'};
>
> From: Igor 
> Reply-To: "user@cassandra.apache.org" 
> Date: Thursday, May 16, 2013 4:27 PM
> To: "user@cassandra.apache.org" 
> Subject: Re: SSTable size versus read performance
>
> just in case it will be useful to somebody - here is my checklist for
> better read performance from SSD
>
> 1. limit read-ahead to 16 or 32
> 2. enable 'trickle_fsync' (available starting from cassandra 1.1.x)
> 3. use 'deadline' io-scheduler (much more important for rotational drives
> than for SSD)
> 4. format data partition starting on 2048 sector boundary
> 5. use ext4 with noatime,nodiratime,discard mount options
>
> On 05/16/2013 10:48 PM, Edward Capriolo wrote:
>
> I was going to say something similar. I feel like the SSD drives read much
> "more" than the standard drives. Read-ahead/large sectors could, and probably
> do, explain it.
>
>
> On Thu, May 16, 2013 at 3:43 PM, Bryan Talbot wrote:
>
>> 512 sectors for read-ahead.  Are your new fancy SSD drives using large
>> sectors?  If your read-ahead is really reading 512 x 4KB per random IO,
>> then that 2 MB per read seems like a lot of extra overhead.
>>
>> -Bryan
>>
>>
>>
>>
>> On Thu, May 16, 2013 at 12:35 PM, Keith Wright wrote:
>>
>>> We actually have it set to 512.  I have tried decreasing my SSTable size
>>> to 5 MB and changing the chunk size to 8 kb
>>>
>>> From: Igor 
>>> Reply-To: "user@cassandra.apache.org" 
>>> Date: Thursday, May 16, 2013 1:55 PM
>>>
>>> To: "user@cassandra.apache.org" 
>>> Subject: Re: SSTable size versus read performance
>>>
> My 5 cents: I'd check blockdev --getra for the data drives - readahead values
> that are too high (the default is 256 on Debian) can hurt read performance.
>>>
>>>
>
>


Re: SSTable size versus read performance

2013-05-16 Thread Keith Wright
Thank you for that.  I did not have trickle_fsync enabled and will give it a 
try.  I just noticed that when running a describe on my table, I do not see the 
sstable size parameter (compaction_strategy_options = {'sstable_size_in_mb':5}) 
included.  Is that expected?  Does it mean it's using the defaults?
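As a rough way to verify what is actually stored (independent of what DESCRIBE
prints), the options can be read back from the system schema; a sketch for
Cassandra 1.2, assuming a keyspace named 'ks' (column names may differ slightly
between versions):

    SELECT compaction_strategy_class, compaction_strategy_options
      FROM system.schema_columnfamilies
     WHERE keyspace_name = 'ks' AND columnfamily_name = 'global_user';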

Assuming none of the tuning here makes a noticeable difference, my next step is 
to try switching from LZ4 to Snappy.  Any opinions on that?

Thanks!

CREATE TABLE global_user (
  user_id bigint,
  app_id int,
  type text,
  name text,
  extra_param map,
  last timestamp,
  paid boolean,
  sku_time map,
  values map,
  PRIMARY KEY (user_id, app_id, type, name)
) WITH
  bloom_filter_fp_chance=0.10 AND
  caching='KEYS_ONLY' AND
  comment='' AND
  dclocal_read_repair_chance=0.00 AND
  gc_grace_seconds=86400 AND
  read_repair_chance=0.10 AND
  replicate_on_write='true' AND
  populate_io_cache_on_flush='false' AND
  compaction={'class': 'LeveledCompactionStrategy'} AND
  compression={'chunk_length_kb': '8', 'crc_check_chance': '0.1', 
'sstable_compression': 'LZ4Compressor'};

From: Igor <i...@4friends.od.ua>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Thursday, May 16, 2013 4:27 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: SSTable size versus read performance

just in case it will be useful to somebody - here is my checklist for better 
read performance from SSD

1. limit read-ahead to 16 or 32
2. enable 'trickle_fsync' (available starting from cassandra 1.1.x)
3. use 'deadline' io-scheduler (much more important for rotational drives than
for SSD)
4. format data partition starting on 2048 sector boundary
5. use ext4 with noatime,nodiratime,discard mount options

On 05/16/2013 10:48 PM, Edward Capriolo wrote:
I was going to say something similar. I feel like the SSD drives read much
"more" than the standard drives. Read-ahead/large sectors could, and probably
do, explain it.


On Thu, May 16, 2013 at 3:43 PM, Bryan Talbot <btal...@aeriagames.com> wrote:
512 sectors for read-ahead.  Are your new fancy SSD drives using large sectors? 
 If your read-ahead is really reading 512 x 4KB per random IO, then that 2 MB 
per read seems like a lot of extra overhead.

-Bryan




On Thu, May 16, 2013 at 12:35 PM, Keith Wright <kwri...@nanigans.com> wrote:
We actually have it set to 512.  I have tried decreasing my SSTable size to 5 
MB and changing the chunk size to 8 kb

From: Igor <i...@4friends.od.ua>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Thursday, May 16, 2013 1:55 PM

To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: SSTable size versus read performance

My 5 cents: I'd check blockdev --getra for the data drives - readahead values
that are too high (the default is 256 on Debian) can hurt read performance.





Re: SSTable size versus read performance

2013-05-16 Thread Igor
just in case it will be useful to somebody - here is my checklist for 
better read performance from SSD


1. limit read-ahead to 16 or 32
2. enable 'trickle_fsync' (available starting from cassandra 1.1.x)
3. use 'deadline' io-scheduler (much more important for rotational 
drives than for SSD)

4. format data partition starting on 2048 sector boundary
5. use ext4 with noatime,nodiratime,discard mount options
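
As a concrete (hedged) example of applying the items above on one node - the
device name, mount point, and yaml values below are placeholders, not
recommendations:

    # 1. limit read-ahead (value is in 512-byte sectors)
    blockdev --setra 16 /dev/sdb

    # 2. in cassandra.yaml (Cassandra 1.1+):
    #      trickle_fsync: true
    #      trickle_fsync_interval_in_kb: 10240

    # 3. switch the io-scheduler for the data device to deadline
    echo deadline > /sys/block/sdb/queue/scheduler

    # 4. check partition alignment (first partition should start at sector 2048)
    fdisk -l /dev/sdb

    # 5. ext4 mount options for the data partition (also set them in /etc/fstab)
    mount -o remount,noatime,nodiratime,discard /var/lib/cassandra/data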

On 05/16/2013 10:48 PM, Edward Capriolo wrote:
I was going to say something similar. I feel like the SSD drives read
much "more" than the standard drives. Read-ahead/large sectors could, and
probably do, explain it.



On Thu, May 16, 2013 at 3:43 PM, Bryan Talbot <btal...@aeriagames.com> wrote:


512 sectors for read-ahead.  Are your new fancy SSD drives using
large sectors?  If your read-ahead is really reading 512 x 4KB per
random IO, then that 2 MB per read seems like a lot of extra overhead.

-Bryan




On Thu, May 16, 2013 at 12:35 PM, Keith Wright <kwri...@nanigans.com> wrote:

We actually have it set to 512.  I have tried decreasing my
SSTable size to 5 MB and changing the chunk size to 8 kb

From: Igor <i...@4friends.od.ua>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Thursday, May 16, 2013 1:55 PM

To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: SSTable size versus read performance

My 5 cents: I'd check blockdev --getra for the data drives - readahead
values that are too high (the default is 256 on Debian) can hurt
read performance.






Re: SSTable size versus read performance

2013-05-16 Thread Edward Capriolo
I was going to say something similar. I feel like the SSD drives read much
"more" than the standard drives. Read-ahead/large sectors could, and probably
do, explain it.


On Thu, May 16, 2013 at 3:43 PM, Bryan Talbot wrote:

> 512 sectors for read-ahead.  Are your new fancy SSD drives using large
> sectors?  If your read-ahead is really reading 512 x 4KB per random IO,
> then that 2 MB per read seems like a lot of extra overhead.
>
> -Bryan
>
>
>
>
> On Thu, May 16, 2013 at 12:35 PM, Keith Wright wrote:
>
>> We actually have it set to 512.  I have tried decreasing my SSTable size
>> to 5 MB and changing the chunk size to 8 kb
>>
>> From: Igor 
>> Reply-To: "user@cassandra.apache.org" 
>> Date: Thursday, May 16, 2013 1:55 PM
>>
>> To: "user@cassandra.apache.org" 
>> Subject: Re: SSTable size versus read performance
>>
>> My 5 cents: I'd check blockdev --getra for the data drives - readahead values
>> that are too high (the default is 256 on Debian) can hurt read performance.
>>
>>


Re: SSTable size versus read performance

2013-05-16 Thread Bryan Talbot
512 sectors for read-ahead.  Are your new fancy SSD drives using large
sectors?  If your read-ahead is really reading 512 x 4KB per random IO,
then that 2 MB per read seems like a lot of extra overhead.

-Bryan




On Thu, May 16, 2013 at 12:35 PM, Keith Wright  wrote:

> We actually have it set to 512.  I have tried decreasing my SSTable size
> to 5 MB and changing the chunk size to 8 kb
>
> From: Igor 
> Reply-To: "user@cassandra.apache.org" 
> Date: Thursday, May 16, 2013 1:55 PM
>
> To: "user@cassandra.apache.org" 
> Subject: Re: SSTable size versus read performance
>
> My 5 cents: I'd check blockdev --getra for the data drives - readahead values
> that are too high (the default is 256 on Debian) can hurt read performance.
>
>


Re: SSTable size versus read performance

2013-05-16 Thread Keith Wright
We actually have it set to 512.  I have tried decreasing my SSTable size to 5 
MB and changing the chunk size to 8 kb (and ran an sstableupgrade to ensure
they took effect) but am still seeing similar performance.  Is anyone running 
lz4 compression in production?  I'm thinking of reverting back to snappy to see 
if that makes a difference.

I appreciate all of the help!

From: Igor <i...@4friends.od.ua>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Thursday, May 16, 2013 1:55 PM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: SSTable size versus read performance

My 5 cents: I'd check blockdev --getra for the data drives - readahead values
that are too high (the default is 256 on Debian) can hurt read performance.

On 05/16/2013 05:14 PM, Keith Wright wrote:
Hi all,

I currently have 2 clusters, one running on 1.1.10 using CQL2 and one 
running on 1.2.4 using CQL3 and Vnodes.   The machines in the 1.2.4 cluster are 
expected to have better IO performance as we are going from 1 SSD data disk per 
node in the 1.1 cluster to 3 SSD data disks per node in the 1.2 cluster with 
higher end drives (commit logs are on their own disk shared with the OS).  I am 
doing some stress testing on the 1.2 cluster and have found that although the 
reads / sec as seen from iostat are approximately the same (3K / sec) in both 
clusters, the MB/s read in the new cluster is MUCH higher (7 MB/s in 1.1 as 
compared to 30-50 MB/s in 1.2).  As a result, I am seeing excessive iowait in 
the 1.2 cluster causing high average read times of 30 ms under the same load 
(1.1 cluster sees around 5 ms).  They are both using Leveled compaction but one 
thing I did change in the new cluster was to increase the sstable size from the 
OOTB setting to 32 MB.  Note that my reads are by definition highly random as 
we are running memcached in front for various reasons.  Does cassandra need to 
read the entire SSTable when fetching a row or only the relevant chunk (I have 
the OOTB chunk size and BF settings)?  I just decreased the sstable size to 5 
MB and am waiting for compactions to complete to see if that makes a difference.

Thanks!

Relevant table definition if helpful (note that I also changed to the LZ4 
compressor expecting better read performance and I decreased the crc check chance
again to minimize read latency):

CREATE TABLE global_user (
user_id BIGINT,
app_id INT,
type TEXT,
name TEXT,
last TIMESTAMP,
paid BOOLEAN,
values map,
sku_time map,
extra_param map,
PRIMARY KEY (user_id, app_id, type, name)
) with 
compression={'crc_check_chance':0.1,'sstable_compression':'LZ4Compressor'} and
compaction={'class':'LeveledCompactionStrategy'} and
compaction_strategy_options = {'sstable_size_in_mb':5} and
gc_grace_seconds = 86400;



Re: SSTable size versus read performance

2013-05-16 Thread Igor
My 5 cents: I'd check blockdev --getra for the data drives - readahead values
that are too high (the default is 256 on Debian) can hurt read performance.
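
For example (the device name below is a placeholder; --setra is not persistent
across reboots):

    blockdev --report            # RA column shows read-ahead per device (512-byte sectors)
    blockdev --getra /dev/sdb    # current read-ahead for one data drive
    blockdev --setra 16 /dev/sdb # lower it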


On 05/16/2013 05:14 PM, Keith Wright wrote:

Hi all,

I currently have 2 clusters, one running on 1.1.10 using CQL2 and 
one running on 1.2.4 using CQL3 and Vnodes.   The machines in the 
1.2.4 cluster are expected to have better IO performance as we are 
going from 1 SSD data disk per node in the 1.1 cluster to 3 SSD data 
disks per node in the 1.2 cluster with higher end drives (commit logs 
are on their own disk shared with the OS).  I am doing some stress 
testing on the 1.2 cluster and have found that although the reads / 
sec as seen from iostat are approximately the same (3K / sec) in both 
clusters, the MB/s read in the new cluster is MUCH higher (7 MB/s in 
1.1 as compared to 30-50 MB/s in 1.2).  As a result, I am seeing 
excessive iowait in the 1.2 cluster causing high average read times of 
30 ms under the same load (1.1 cluster sees around 5 ms).  They are 
both using Leveled compaction but one thing I did change in the new 
cluster was to increase the sstable size from the OOTB setting to 32 
MB.  Note that my reads are by definition highly random as we are 
running memcached in front for various reasons.  Does cassandra need 
to read the entire SSTable when fetching a row or only the relevant 
chunk (I have the OOTB chunk size and BF settings)?  I just decreased 
the sstable size to 5 MB and am waiting for compactions to complete to 
see if that makes a difference.


Thanks!

Relevant table definition if helpful (note that I also changed to the 
LZ4 compressor expecting better read performance and I decreased the 
crc check chance again to minimize read latency):


CREATE TABLE global_user (
user_id BIGINT,
app_id INT,
type TEXT,
name TEXT,
last TIMESTAMP,
paid BOOLEAN,
values map,
sku_time map,
extra_param map,
PRIMARY KEY (user_id, app_id, type, name)
) with compression={'crc_check_chance':0.1,'sstable_compression':'LZ4Compressor'} 
and

compaction={'class':'LeveledCompactionStrategy'} and
compaction_strategy_options = {'sstable_size_in_mb':5} and
gc_grace_seconds = 86400;




Re: SSTable size versus read performance

2013-05-16 Thread Keith Wright
Does Cassandra need to load the entire SSTable into memory to uncompress it or 
does it only load the relevant block?  I ask because if it's the latter, that
would not explain why I'm seeing so much higher read MB/s in the 1.2 cluster as 
the block sizes are the same in both.
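
One rough way to narrow this down is to compare the per-table read statistics
on the two clusters; a sketch using standard nodetool commands (the keyspace
name is a placeholder):

    # compression ratio, sstable counts, and read latency per column family
    nodetool cfstats

    # sstables touched per read and read-latency distribution for this table
    nodetool cfhistograms ks global_user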

From: Edward Capriolo <edlinuxg...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Thursday, May 16, 2013 10:47 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: SSTable size versus read performance

When you use compression you should play with your block size. I believe the
default may be 32K, but I had more success with 8K: nearly the same compression
ratio, with less young-gen memory pressure.


On Thu, May 16, 2013 at 10:42 AM, Keith Wright <kwri...@nanigans.com> wrote:
The biggest reason I'm using compression here is that my data lends itself well 
to it due to the composite columns.  My current compression ratio is 30.5%.  
Not sure it matters but my BF false positive ratio is 0.048.

From: Edward Capriolo <edlinuxg...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Thursday, May 16, 2013 10:23 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: SSTable size versus read performance

I am not sure if the new default is to use compression, but I do not believe
compression is a good default. I find compression is better for larger column
families that are sparsely read. For high-throughput CFs I feel that
decompressing larger blocks hurts performance more than compression adds.


On Thu, May 16, 2013 at 10:14 AM, Keith Wright <kwri...@nanigans.com> wrote:
Hi all,

I currently have 2 clusters, one running on 1.1.10 using CQL2 and one 
running on 1.2.4 using CQL3 and Vnodes.   The machines in the 1.2.4 cluster are 
expected to have better IO performance as we are going from 1 SSD data disk per 
node in the 1.1 cluster to 3 SSD data disks per node in the 1.2 cluster with 
higher end drives (commit logs are on their own disk shared with the OS).  I am 
doing some stress testing on the 1.2 cluster and have found that although the 
reads / sec as seen from iostat are approximately the same (3K / sec) in both 
clusters, the MB/s read in the new cluster is MUCH higher (7 MB/s in 1.1 as 
compared to 30-50 MB/s in 1.2).  As a result, I am seeing excessive iowait in 
the 1.2 cluster causing high average read times of 30 ms under the same load 
(1.1 cluster sees around 5 ms).  They are both using Leveled compaction but one 
thing I did change in the new cluster was to increase the sstable size from the 
OOTB setting to 32 MB.  Note that my reads are by definition highly random as 
we are running memcached in front for various reasons.  Does cassandra need to 
read the entire SSTable when fetching a row or only the relevant chunk (I have 
the OOTB chunk size and BF settings)?  I just decreased the sstable size to 5 
MB and am waiting for compactions to complete to see if that makes a difference.

Thanks!

Relevant table definition if helpful (note that I also changed to the LZ4 
compressor expecting better read performance and I decreased the crc check chance
again to minimize read latency):

CREATE TABLE global_user (
user_id BIGINT,
app_id INT,
type TEXT,
name TEXT,
last TIMESTAMP,
paid BOOLEAN,
values map,
sku_time map,
extra_param map,
PRIMARY KEY (user_id, app_id, type, name)
) with 
compression={'crc_check_chance':0.1,'sstable_compression':'LZ4Compressor'} and
compaction={'class':'LeveledCompactionStrategy'} and
compaction_strategy_options = {'sstable_size_in_mb':5} and
gc_grace_seconds = 86400;




Re: SSTable size versus read performance

2013-05-16 Thread Edward Capriolo
When you use compression you should play with your block size. I believe
the default may be 32K, but I had more success with 8K: nearly the same
compression ratio, with less young-gen memory pressure.
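
A minimal sketch of changing the chunk size on an existing table (the keyspace
name is a placeholder; the compression map is replaced as a whole, and
already-written sstables keep their old chunk size until they are rewritten,
e.g. by compaction):

    ALTER TABLE ks.global_user
      WITH compression = {'sstable_compression': 'LZ4Compressor',
                          'chunk_length_kb': '8'};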


On Thu, May 16, 2013 at 10:42 AM, Keith Wright  wrote:

> The biggest reason I'm using compression here is that my data lends itself
> well to it due to the composite columns.  My current compression ratio is
> 30.5%.  Not sure it matters but my BF false positive ratio is 0.048.
>
> From: Edward Capriolo 
> Reply-To: "user@cassandra.apache.org" 
> Date: Thursday, May 16, 2013 10:23 AM
> To: "user@cassandra.apache.org" 
> Subject: Re: SSTable size versus read performance
>
> I am not sure if the new default is to use compression, but I do not
> believe compression is a good default. I find compression is better for
> larger column families that are sparsely read. For high-throughput CFs I
> feel that decompressing larger blocks hurts performance more than
> compression adds.
>
>
> On Thu, May 16, 2013 at 10:14 AM, Keith Wright wrote:
>
>> Hi all,
>>
>> I currently have 2 clusters, one running on 1.1.10 using CQL2 and one
>> running on 1.2.4 using CQL3 and Vnodes.   The machines in the 1.2.4 cluster
>> are expected to have better IO performance as we are going from 1 SSD data
>> disk per node in the 1.1 cluster to 3 SSD data disks per node in the 1.2
>> cluster with higher end drives (commit logs are on their own disk shared
>> with the OS).  I am doing some stress testing on the 1.2 cluster and have
>> found that although the reads / sec as seen from iostat are approximately
>> the same (3K / sec) in both clusters, the MB/s read in the new cluster is
>> MUCH higher (7 MB/s in 1.1 as compared to 30-50 MB/s in 1.2).  As a result,
>> I am seeing excessive iowait in the 1.2 cluster causing high average read
>> times of 30 ms under the same load (1.1 cluster sees around 5 ms).  They
>> are both using Leveled compaction but one thing I did change in the new
>> cluster was to increase the sstable size from the OOTB setting to 32 MB.
>>  Note that my reads are by definition highly random as we are running
>> memcached in front for various reasons.  Does cassandra need to read the
>> entire SSTable when fetching a row or only the relevant chunk (I have the
>> OOTB chunk size and BF settings)?  I just decreased the sstable size to 5
>> MB and am waiting for compactions to complete to see if that makes a
>> difference.
>>
>> Thanks!
>>
>> Relevant table definition if helpful (note that I also changed to the LZ4
>> compressor expecting better read performance and I decreased the crc check chance
>> again to minimize read latency):
>>
>> CREATE TABLE global_user (
>> user_id BIGINT,
>> app_id INT,
>> type TEXT,
>> name TEXT,
>> last TIMESTAMP,
>> paid BOOLEAN,
>> values map,
>> sku_time map,
>> extra_param map,
>> PRIMARY KEY (user_id, app_id, type, name)
>> ) with 
>> compression={'crc_check_chance':0.1,'sstable_compression':'LZ4Compressor'}
>> and
>> compaction={'class':'LeveledCompactionStrategy'} and
>> compaction_strategy_options = {'sstable_size_in_mb':5} and
>> gc_grace_seconds = 86400;
>>
>
>


Re: SSTable size versus read performance

2013-05-16 Thread Keith Wright
The biggest reason I'm using compression here is that my data lends itself well 
to it due to the composite columns.  My current compression ratio is 30.5%.  
Not sure it matters but my BF false positive ratio is 0.048.

From: Edward Capriolo <edlinuxg...@gmail.com>
Reply-To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Date: Thursday, May 16, 2013 10:23 AM
To: "user@cassandra.apache.org" <user@cassandra.apache.org>
Subject: Re: SSTable size versus read performance

I am not sure if the new default is to use compression, but I do not believe
compression is a good default. I find compression is better for larger column
families that are sparsely read. For high-throughput CFs I feel that
decompressing larger blocks hurts performance more than compression adds.


On Thu, May 16, 2013 at 10:14 AM, Keith Wright <kwri...@nanigans.com> wrote:
Hi all,

I currently have 2 clusters, one running on 1.1.10 using CQL2 and one 
running on 1.2.4 using CQL3 and Vnodes.   The machines in the 1.2.4 cluster are 
expected to have better IO performance as we are going from 1 SSD data disk per 
node in the 1.1 cluster to 3 SSD data disks per node in the 1.2 cluster with 
higher end drives (commit logs are on their own disk shared with the OS).  I am 
doing some stress testing on the 1.2 cluster and have found that although the 
reads / sec as seen from iostat are approximately the same (3K / sec) in both 
clusters, the MB/s read in the new cluster is MUCH higher (7 MB/s in 1.1 as 
compared to 30-50 MB/s in 1.2).  As a result, I am seeing excessive iowait in 
the 1.2 cluster causing high average read times of 30 ms under the same load 
(1.1 cluster sees around 5 ms).  They are both using Leveled compaction but one 
thing I did change in the new cluster was to increase the sstable size from the 
OOTB setting to 32 MB.  Note that my reads are by definition highly random as 
we are running memcached in front for various reasons.  Does cassandra need to 
read the entire SSTable when fetching a row or only the relevant chunk (I have 
the OOTB chunk size and BF settings)?  I just decreased the sstable size to 5 
MB and am waiting for compactions to complete to see if that makes a difference.

Thanks!

Relevant table definition if helpful (note that I also changed to the LZ4 
compressor expecting better read performance and I decreased the crc check chance
again to minimize read latency):

CREATE TABLE global_user (
user_id BIGINT,
app_id INT,
type TEXT,
name TEXT,
last TIMESTAMP,
paid BOOLEAN,
values map,
sku_time map,
extra_param map,
PRIMARY KEY (user_id, app_id, type, name)
) with 
compression={'crc_check_chance':0.1,'sstable_compression':'LZ4Compressor'} and
compaction={'class':'LeveledCompactionStrategy'} and
compaction_strategy_options = {'sstable_size_in_mb':5} and
gc_grace_seconds = 86400;



Re: SSTable size versus read performance

2013-05-16 Thread Edward Capriolo
I am not sure if the new default is to use compression, but I do not believe
compression is a good default. I find compression is better for larger column
families that are sparsely read. For high-throughput CFs I feel that
decompressing larger blocks hurts performance more than compression adds.


On Thu, May 16, 2013 at 10:14 AM, Keith Wright  wrote:

> Hi all,
>
> I currently have 2 clusters, one running on 1.1.10 using CQL2 and one
> running on 1.2.4 using CQL3 and Vnodes.   The machines in the 1.2.4 cluster
> are expected to have better IO performance as we are going from 1 SSD data
> disk per node in the 1.1 cluster to 3 SSD data disks per node in the 1.2
> cluster with higher end drives (commit logs are on their own disk shared
> with the OS).  I am doing some stress testing on the 1.2 cluster and have
> found that although the reads / sec as seen from iostat are approximately
> the same (3K / sec) in both clusters, the MB/s read in the new cluster is
> MUCH higher (7 MB/s in 1.1 as compared to 30-50 MB/s in 1.2).  As a result,
> I am seeing excessive iowait in the 1.2 cluster causing high average read
> times of 30 ms under the same load (1.1 cluster sees around 5 ms).  They
> are both using Leveled compaction but one thing I did change in the new
> cluster was to increase the sstable size from the OOTB setting to 32 MB.
>  Note that my reads are by definition highly random as we are running
> memcached in front for various reasons.  Does cassandra need to read the
> entire SSTable when fetching a row or only the relevant chunk (I have the
> OOTB chunk size and BF settings)?  I just decreased the sstable size to 5
> MB and am waiting for compactions to complete to see if that makes a
> difference.
>
> Thanks!
>
> Relevant table definition if helpful (note that I also changed to the LZ4
> compressor expecting better read performance and I decreased the crc check chance
> again to minimize read latency):
>
> CREATE TABLE global_user (
> user_id BIGINT,
> app_id INT,
> type TEXT,
> name TEXT,
> last TIMESTAMP,
> paid BOOLEAN,
> values map,
> sku_time map,
> extra_param map,
> PRIMARY KEY (user_id, app_id, type, name)
> ) with 
> compression={'crc_check_chance':0.1,'sstable_compression':'LZ4Compressor'}
> and
> compaction={'class':'LeveledCompactionStrategy'} and
> compaction_strategy_options = {'sstable_size_in_mb':5} and
> gc_grace_seconds = 86400;
>