Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Weijun Li
I didn't know you use the actual key instead of its md5 (for the random
partitioner) in KCF.  Good point that I'll watch the hit ratio of KCF to
determine whether it needs to be increased.

Thanks,
-Weijun

On Tue, Feb 16, 2010 at 5:34 PM, Jonathan Ellis  wrote:

> On Tue, Feb 16, 2010 at 7:27 PM, Weijun Li  wrote:
> > Yes my KeysCachedFraction is already 0.3 but it doesn't relieve the disk
> > i/o. I compacted the data into a single 60GB file (took quite a while to
> > finish and increased latency as expected) but that doesn't help much either.
> >
> > If I set KCF to 1 (meaning cache the entire sstable index), how much
> > memory will it take for 50mil keys?
>
> 10/3 of what 0.3 takes :)
>
> > Is the index a straight key-offset map? I guess the key
> > is 16 bytes and the offset is 8 bytes.
>
> key length depends on your data, of course.
>
> > Will KCF=1 help to reduce disk i/o?
>
> depends.  w/ trunk you can look at your cache hit rate w/ jconsole to
> see if increasing it more would help.
>
> -Jonathan
>


Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Jonathan Ellis
On Tue, Feb 16, 2010 at 7:27 PM, Weijun Li  wrote:
> Yes my KeysCachedFraction is already 0.3 but it doesn't relieve the disk i/o.
> I compacted the data into a single 60GB file (took quite a while to finish
> and increased latency as expected) but that doesn't help much either.
>
> If I set KCF to 1 (meaning cache the entire sstable index), how much memory will
> it take for 50mil keys?

10/3 of what 0.3 takes :)
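Spelled out as a back-of-the-envelope sketch (using the 16-byte key / 8-byte offset guess from this thread, and ignoring JVM per-entry overhead, which adds considerably more in practice):

```python
# Rough size of the in-memory key index cache for 50 million keys.
# Assumes ~16-byte keys and 8-byte file offsets as guessed in the
# thread; real per-entry JVM overhead would be much larger.
KEYS = 50_000_000
BYTES_PER_ENTRY = 16 + 8  # key + offset

def cache_bytes(fraction):
    """Raw payload bytes cached for a given KeysCachedFraction."""
    return int(KEYS * fraction * BYTES_PER_ENTRY)

MB = 1024 * 1024
print(f"KCF=0.3: {cache_bytes(0.3) / MB:.0f} MB")  # ~343 MB
print(f"KCF=1.0: {cache_bytes(1.0) / MB:.0f} MB")  # ~1144 MB, i.e. 10/3 of the above
```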

> Is the index a straight key-offset map? I guess the key
> is 16 bytes and the offset is 8 bytes.

key length depends on your data, of course.

> Will KCF=1 help to reduce disk i/o?

depends.  w/ trunk you can look at your cache hit rate w/ jconsole to
see if increasing it more would help.

-Jonathan


Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Weijun Li
Yes my KeysCachedFraction is already 0.3 but it doesn't relieve the disk i/o.
I compacted the data into a single 60GB file (took quite a while to finish
and increased latency as expected) but that doesn't help much either.

If I set KCF to 1 (meaning cache the entire sstable index), how much memory will
it take for 50mil keys? Is the index a straight key-offset map? I guess the key
is 16 bytes and the offset is 8 bytes. Will KCF=1 help reduce disk i/o?

-Weijun

On Tue, Feb 16, 2010 at 5:18 PM, Jonathan Ellis  wrote:

> Have you tried increasing KeysCachedFraction?
>
> On Tue, Feb 16, 2010 at 6:15 PM, Weijun Li  wrote:
> > Still have high read latency with 50mil records in the 2-node cluster
> > (replica 2). I restarted both nodes but read latency is still above 60ms
> and
> > disk i/o saturation is high. Tried compact and repair but doesn't help
> much.
> > When I reduced the client threads from 15 to 5 it looks a lot better but
> > throughput is kind of low. I changed to 16 flushing threads instead of
> > the default 8; could that cause the disk saturation issue?
> >
> > For a benchmark with decent throughput and latency, how many client
> > threads do they use? Can anyone share your storage-conf.xml from a
> > well-tuned high-volume cluster?
> >
> > -Weijun
> >
> > On Tue, Feb 16, 2010 at 10:31 AM, Stu Hood 
> wrote:
> >>
> >> > After I ran "nodeprobe compact" on node B its read latency went up to
> >> > 150ms.
> >> The compaction process can take a while to finish... in 0.5 you need to
> >> watch the logs to figure out when it has actually finished, and then you
> >> should start seeing the improvement in read latency.
> >>
> >> > Is there any way to utilize all of the heap space to decrease the read
> >> > latency?
> >> In 0.5 you can adjust the number of keys that are cached by changing the
> >> 'KeysCachedFraction' parameter in your config file. In 0.6 you can
> >> additionally cache rows. You don't want to use up all of the memory on
> your
> >> box for those caches though: you'll want to leave at least 50% for your
> OS's
> >> disk cache, which will store the full row content.
> >>
> >>
> >> -Original Message-
> >> From: "Weijun Li" 
> >> Sent: Tuesday, February 16, 2010 12:16pm
> >> To: cassandra-user@incubator.apache.org
> >> Subject: Re: Cassandra benchmark shows OK throughput but high read
> latency
> >> (> 100ms)?
> >>
> >> Thanks for the DataFileDirectory trick, I'll give it a try.
> >>
> >> Just noticed the impact of number of data files: node A has 13 data
> files
> >> with read latency of 20ms and node B has 27 files with read latency of
> >> 60ms.
> >> After I ran "nodeprobe compact" on node B its read latency went up to
> >> 150ms.
> >> The read latency of node A became as low as 10ms. Is this normal
> behavior?
> >> I'm using random partitioner and the hardware/JVM settings are exactly
> the
> >> same for these two nodes.
> >>
> >> Another problem is that Java heap usage is always 900mb out of 6GB? Is
> >> there
> >> any way to utilize all of the heap space to decrease the read latency?
> >>
> >> -Weijun
> >>
> >> On Tue, Feb 16, 2010 at 10:01 AM, Brandon Williams 
> >> wrote:
> >>
> >> > On Tue, Feb 16, 2010 at 11:56 AM, Weijun Li 
> wrote:
> >> >
> >> >> One more thought about Martin's suggestion: is it possible to put
> >> >> the data files into multiple directories that are located on
> >> >> different physical disks? This should help with the i/o bottleneck.
> >> >>
> >> >>
> >> > Yes, you can already do this, just add more <DataFileDirectory>
> >> > directives pointed at multiple drives.
> >> >
> >> >
> >> >> Has anybody tested the row-caching feature in trunk (shoot for 0.6?)?
> >> >
> >> >
> >> > Row cache and key cache both help tremendously if your read pattern
> has
> >> > a
> >> > decent repeat rate.  Completely random io can only be so fast,
> however.
> >> >
> >> > -Brandon
> >> >
> >>
> >>
> >
> >
>


Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Jonathan Ellis
Have you tried increasing KeysCachedFraction?

On Tue, Feb 16, 2010 at 6:15 PM, Weijun Li  wrote:
> Still have high read latency with 50mil records in the 2-node cluster
> (replica 2). I restarted both nodes but read latency is still above 60ms and
> disk i/o saturation is high. Tried compact and repair but doesn't help much.
> When I reduced the client threads from 15 to 5 it looks a lot better but
> throughput is kind of low. I changed to 16 flushing threads instead of the
> default 8; could that cause the disk saturation issue?
>
> For a benchmark with decent throughput and latency, how many client threads
> do they use? Can anyone share your storage-conf.xml from a well-tuned
> high-volume cluster?
>
> -Weijun
>
> On Tue, Feb 16, 2010 at 10:31 AM, Stu Hood  wrote:
>>
>> > After I ran "nodeprobe compact" on node B its read latency went up to
>> > 150ms.
>> The compaction process can take a while to finish... in 0.5 you need to
>> watch the logs to figure out when it has actually finished, and then you
>> should start seeing the improvement in read latency.
>>
>> > Is there any way to utilize all of the heap space to decrease the read
>> > latency?
>> In 0.5 you can adjust the number of keys that are cached by changing the
>> 'KeysCachedFraction' parameter in your config file. In 0.6 you can
>> additionally cache rows. You don't want to use up all of the memory on your
>> box for those caches though: you'll want to leave at least 50% for your OS's
>> disk cache, which will store the full row content.
>>
>>
>> -Original Message-----
>> From: "Weijun Li" 
>> Sent: Tuesday, February 16, 2010 12:16pm
>> To: cassandra-user@incubator.apache.org
>> Subject: Re: Cassandra benchmark shows OK throughput but high read latency
>> (> 100ms)?
>>
>> Thanks for the DataFileDirectory trick, I'll give it a try.
>>
>> Just noticed the impact of number of data files: node A has 13 data files
>> with read latency of 20ms and node B has 27 files with read latency of
>> 60ms.
>> After I ran "nodeprobe compact" on node B its read latency went up to
>> 150ms.
>> The read latency of node A became as low as 10ms. Is this normal behavior?
>> I'm using random partitioner and the hardware/JVM settings are exactly the
>> same for these two nodes.
>>
>> Another problem is that Java heap usage is always 900mb out of 6GB? Is
>> there
>> any way to utilize all of the heap space to decrease the read latency?
>>
>> -Weijun
>>
>> On Tue, Feb 16, 2010 at 10:01 AM, Brandon Williams 
>> wrote:
>>
>> > On Tue, Feb 16, 2010 at 11:56 AM, Weijun Li  wrote:
>> >
>> >> One more thought about Martin's suggestion: is it possible to put the
>> >> data files into multiple directories that are located on different
>> >> physical disks? This should help with the i/o bottleneck.
>> >>
>> >>
>> > Yes, you can already do this, just add more <DataFileDirectory>
>> > directives pointed at multiple drives.
>> >
>> >
>> >> Has anybody tested the row-caching feature in trunk (shoot for 0.6?)?
>> >
>> >
>> > Row cache and key cache both help tremendously if your read pattern has
>> > a
>> > decent repeat rate.  Completely random io can only be so fast, however.
>> >
>> > -Brandon
>> >
>>
>>
>
>


Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Weijun Li
Still have high read latency with 50mil records in the 2-node cluster
(replica 2). I restarted both nodes but read latency is still above 60ms and
disk i/o saturation is high. Tried compact and repair but doesn't help much.
When I reduced the client threads from 15 to 5 it looks a lot better but
throughput is kind of low. I changed to 16 flushing threads instead of the
default 8; could that cause the disk saturation issue?

For a benchmark with decent throughput and latency, how many client threads
do they use? Can anyone share your storage-conf.xml from a well-tuned
high-volume cluster?

-Weijun

On Tue, Feb 16, 2010 at 10:31 AM, Stu Hood  wrote:

> > After I ran "nodeprobe compact" on node B its read latency went up to
> 150ms.
> The compaction process can take a while to finish... in 0.5 you need to
> watch the logs to figure out when it has actually finished, and then you
> should start seeing the improvement in read latency.
>
> > Is there any way to utilize all of the heap space to decrease the read
> latency?
> In 0.5 you can adjust the number of keys that are cached by changing the
> 'KeysCachedFraction' parameter in your config file. In 0.6 you can
> additionally cache rows. You don't want to use up all of the memory on your
> box for those caches though: you'll want to leave at least 50% for your OS's
> disk cache, which will store the full row content.
>
>
> -Original Message-
> From: "Weijun Li" 
> Sent: Tuesday, February 16, 2010 12:16pm
> To: cassandra-user@incubator.apache.org
> Subject: Re: Cassandra benchmark shows OK throughput but high read latency
> (> 100ms)?
>
> Thanks for the DataFileDirectory trick, I'll give it a try.
>
> Just noticed the impact of number of data files: node A has 13 data files
> with read latency of 20ms and node B has 27 files with read latency of
> 60ms.
> After I ran "nodeprobe compact" on node B its read latency went up to
> 150ms.
> The read latency of node A became as low as 10ms. Is this normal behavior?
> I'm using random partitioner and the hardware/JVM settings are exactly the
> same for these two nodes.
>
> Another problem is that Java heap usage is always 900mb out of 6GB? Is
> there
> any way to utilize all of the heap space to decrease the read latency?
>
> -Weijun
>
> On Tue, Feb 16, 2010 at 10:01 AM, Brandon Williams 
> wrote:
>
> > On Tue, Feb 16, 2010 at 11:56 AM, Weijun Li  wrote:
> >
> >> One more thought about Martin's suggestion: is it possible to put the
> >> data files into multiple directories that are located on different
> >> physical disks? This should help with the i/o bottleneck.
> >>
> >>
> > Yes, you can already do this, just add more <DataFileDirectory>
> > directives pointed at multiple drives.
> >
> >
> >> Has anybody tested the row-caching feature in trunk (shoot for 0.6?)?
> >
> >
> > Row cache and key cache both help tremendously if your read pattern has a
> > decent repeat rate.  Completely random io can only be so fast, however.
> >
> > -Brandon
> >
>
>
>


Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Stu Hood
> After I ran "nodeprobe compact" on node B its read latency went up to 150ms.
The compaction process can take a while to finish... in 0.5 you need to watch 
the logs to figure out when it has actually finished, and then you should start 
seeing the improvement in read latency.

> Is there any way to utilize all of the heap space to decrease the read 
> latency?
In 0.5 you can adjust the number of keys that are cached by changing the 
'KeysCachedFraction' parameter in your config file. In 0.6 you can additionally 
cache rows. You don't want to use up all of the memory on your box for those 
caches though: you'll want to leave at least 50% for your OS's disk cache, 
which will store the full row content.
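For reference, the knob Stu mentions is a storage-conf.xml element; a minimal excerpt (its exact placement within the file is version-dependent, so treat the surrounding context as an assumption):

```xml
<!-- storage-conf.xml: fraction of each sstable's keys whose index
     positions are kept in memory; 1.0 caches every key's position. -->
<KeysCachedFraction>0.3</KeysCachedFraction>
```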


-Original Message-
From: "Weijun Li" 
Sent: Tuesday, February 16, 2010 12:16pm
To: cassandra-user@incubator.apache.org
Subject: Re: Cassandra benchmark shows OK throughput but high read latency (> 
100ms)?

Thanks for the DataFileDirectory trick, I'll give it a try.

Just noticed the impact of the number of data files: node A has 13 data files
with read latency of 20ms and node B has 27 files with read latency of 60ms.
After I ran "nodeprobe compact" on node B its read latency went up to 150ms.
The read latency of node A became as low as 10ms. Is this normal behavior?
I'm using random partitioner and the hardware/JVM settings are exactly the
same for these two nodes.

Another problem is that Java heap usage is always 900mb out of 6GB? Is there
any way to utilize all of the heap space to decrease the read latency?

-Weijun

On Tue, Feb 16, 2010 at 10:01 AM, Brandon Williams  wrote:

> On Tue, Feb 16, 2010 at 11:56 AM, Weijun Li  wrote:
>
>> One more thought about Martin's suggestion: is it possible to put the
>> data files into multiple directories that are located on different physical
>> disks? This should help with the i/o bottleneck.
>>
>>
> Yes, you can already do this, just add more <DataFileDirectory> directives
> pointed at multiple drives.
>
>
>> Has anybody tested the row-caching feature in trunk (shoot for 0.6?)?
>
>
> Row cache and key cache both help tremendously if your read pattern has a
> decent repeat rate.  Completely random io can only be so fast, however.
>
> -Brandon
>




Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Brandon Williams
On Tue, Feb 16, 2010 at 12:16 PM, Weijun Li  wrote:

> Thanks for the DataFileDirectory trick, I'll give it a try.
>
> Just noticed the impact of number of data files: node A has 13 data files
> with read latency of 20ms and node B has 27 files with read latency of 60ms.
> After I ran "nodeprobe compact" on node B its read latency went up to 150ms.
> The read latency of node A became as low as 10ms. Is this normal behavior?
> I'm using random partitioner and the hardware/JVM settings are exactly the
> same for these two nodes.
>

It sounds like the latency jumped to 150ms because the newly written file
was not in the OS cache.

Another problem is that Java heap usage is always 900mb out of 6GB? Is there
> any way to utilize all of the heap space to decrease the read latency?


By default, Cassandra will use a 1GB heap, as set in bin/cassandra.in.sh.
You can adjust the jvm heap there via the -Xmx option, but generally you
want to balance the jvm against the OS cache.  With 6GB, I would probably
give 2GB to the jvm.  If you aren't having issues now, increasing the jvm's
memory probably won't provide any performance gains, though it's worth
noting that with the row cache in 0.6 this may change.
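Concretely, that's a one-line change (the variable name below matches the stock 0.5-era bin/cassandra.in.sh; versions differ, so treat this as a sketch):

```sh
# bin/cassandra.in.sh -- raise the heap from the 1GB default to 2GB,
# leaving the remaining ~4GB of a 6GB box to the OS page cache.
JVM_OPTS="$JVM_OPTS -Xms2G -Xmx2G"
```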

-Brandon


Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Weijun Li
Thanks for the DataFileDirectory trick, I'll give it a try.

Just noticed the impact of the number of data files: node A has 13 data files
with read latency of 20ms and node B has 27 files with read latency of 60ms.
After I ran "nodeprobe compact" on node B its read latency went up to 150ms.
The read latency of node A became as low as 10ms. Is this normal behavior?
I'm using random partitioner and the hardware/JVM settings are exactly the
same for these two nodes.

Another problem is that Java heap usage is always 900mb out of 6GB? Is there
any way to utilize all of the heap space to decrease the read latency?

-Weijun

On Tue, Feb 16, 2010 at 10:01 AM, Brandon Williams  wrote:

> On Tue, Feb 16, 2010 at 11:56 AM, Weijun Li  wrote:
>
>> One more thought about Martin's suggestion: is it possible to put the
>> data files into multiple directories that are located on different physical
>> disks? This should help with the i/o bottleneck.
>>
>>
> Yes, you can already do this, just add more <DataFileDirectory> directives
> pointed at multiple drives.
>
>
>> Has anybody tested the row-caching feature in trunk (shoot for 0.6?)?
>
>
> Row cache and key cache both help tremendously if your read pattern has a
> decent repeat rate.  Completely random io can only be so fast, however.
>
> -Brandon
>


Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Brandon Williams
On Tue, Feb 16, 2010 at 11:56 AM, Weijun Li  wrote:

> One more thought about Martin's suggestion: is it possible to put the data
> files into multiple directories that are located on different physical
> disks? This should help with the i/o bottleneck.
>
>
Yes, you can already do this, just add more <DataFileDirectory> directives
pointed at multiple drives.
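A sketch of what that looks like in storage-conf.xml (the directory paths are hypothetical):

```xml
<!-- One DataFileDirectory per physical disk lets Cassandra spread
     sstables, memtable flushes, and compaction i/o across spindles. -->
<DataFileDirectories>
    <DataFileDirectory>/disk1/cassandra/data</DataFileDirectory>
    <DataFileDirectory>/disk2/cassandra/data</DataFileDirectory>
</DataFileDirectories>
```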


> Has anybody tested the row-caching feature in trunk (shoot for 0.6?)?


Row cache and key cache both help tremendously if your read pattern has a
decent repeat rate.  Completely random io can only be so fast, however.

-Brandon


Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Brandon Williams
On Tue, Feb 16, 2010 at 11:50 AM, Weijun Li  wrote:

> Dumped 50mil records into my 2-node cluster overnight, made sure that
> there aren't many data files (only around 30) per Martin's suggestion. The
> size of the data directory is 63GB. Now when I read records from the cluster
> the read latency is still ~44ms, --there's no write happening during the
> read. And iostats shows that the disk (RAID10, 4 250GB 15k SAS) is
> saturated:
>
> Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
> avgqu-sz   await  svctm  %util
> sda  47.6767.67 190.33 17.00 23933.33   677.33   118.70
> 5.24   25.25   4.64  96.17
> sda1  0.00 0.00  0.00  0.00 0.00 0.00 0.00
> 0.000.00   0.00   0.00
> sda2 47.6767.67 190.33 17.00 23933.33   677.33   118.70
> 5.24   25.25   4.64  96.17
> sda3  0.00 0.00  0.00  0.00 0.00 0.00 0.00
> 0.000.00   0.00   0.00
>
> CPU usage is low.
>
> Does this mean disk i/o is the bottleneck for my case? Will it help if I
> increase KCF to cache the entire sstable index?
>
>
That's exactly what this means.  Disk is slow :(


> Also, this is almost a read-only test, and in reality our write/read
> ratio is close to 1:1, so I'm guessing read latency will go even higher in
> that case because it will be difficult for Cassandra to find a good moment
> to compact the data files that are busy being written.
>

Reads that cause disk seeks are always going to slow things down, since disk
seeks are inherently the slowest operation in a machine.  Writes in
Cassandra should always be fast, as they do not cause any disk seeks.

-Brandon


Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Weijun Li
One more thought about Martin's suggestion: is it possible to put the data
files into multiple directories that are located on different physical
disks? This should help with the i/o bottleneck.

Has anybody tested the row-caching feature in trunk (shoot for 0.6?)?

-Weijun

On Tue, Feb 16, 2010 at 9:50 AM, Weijun Li  wrote:

> Dumped 50mil records into my 2-node cluster overnight, made sure that
> there aren't many data files (only around 30) per Martin's suggestion. The
> size of the data directory is 63GB. Now when I read records from the cluster
> the read latency is still ~44ms, --there's no write happening during the
> read. And iostats shows that the disk (RAID10, 4 250GB 15k SAS) is
> saturated:
>
> Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
> avgqu-sz   await  svctm  %util
> sda  47.6767.67 190.33 17.00 23933.33   677.33   118.70
> 5.24   25.25   4.64  96.17
> sda1  0.00 0.00  0.00  0.00 0.00 0.00 0.00
> 0.000.00   0.00   0.00
> sda2 47.6767.67 190.33 17.00 23933.33   677.33   118.70
> 5.24   25.25   4.64  96.17
> sda3  0.00 0.00  0.00  0.00 0.00 0.00 0.00
> 0.000.00   0.00   0.00
>
> CPU usage is low.
>
> Does this mean disk i/o is the bottleneck for my case? Will it help if I
> increase KCF to cache the entire sstable index?
>
> Also, this is almost a read-only test, and in reality our write/read
> ratio is close to 1:1, so I'm guessing read latency will go even higher in
> that case because it will be difficult for Cassandra to find a good moment
> to compact the data files that are busy being written.
>
> Thanks,
> -Weijun
>
>
>
> On Tue, Feb 16, 2010 at 6:06 AM, Brandon Williams wrote:
>
>> On Tue, Feb 16, 2010 at 2:32 AM, Dr. Martin Grabmüller <
>> martin.grabmuel...@eleven.de> wrote:
>>
>>> In my tests I have observed that good read latency depends on keeping
>>> the number of data files low.  In my current test setup, I have stored
>>> 1.9 TB of data on a single node, which is in 21 data files, and read
>>> latency is between 10 and 60ms (for small reads, larger read of course
>>> take more time).  In earlier stages of my test, I had up to 5000
>>> data files, and read performance was quite bad: my configured 10-second
>>> RPC timeout was regularly encountered.
>>>
>>
>> I believe it is known that crossing sstables is O(NlogN) but I'm unable to
>> find the ticket on this at the moment.  Perhaps Stu Hood will jump in and
>> enlighten me, but in any case I believe
>> https://issues.apache.org/jira/browse/CASSANDRA-674 will eventually solve
>> it.
>>
>> Keeping write volume low enough that compaction can keep up is one
>> solution, and throwing hardware at the problem is another, if necessary.
>>  Also, the row caching in trunk (soon to be 0.6 we hope) helps greatly for
>> repeat hits.
>>
>> -Brandon
>>
>
>


Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Weijun Li
Dumped 50mil records into my 2-node cluster overnight, made sure that
there aren't many data files (only around 30) per Martin's suggestion. The
size of the data directory is 63GB. Now when I read records from the cluster
the read latency is still ~44ms (there's no write happening during the
read), and iostat shows that the disk (RAID10, 4x 250GB 15k SAS) is
saturated:

Device: rrqm/s   wrqm/s   r/s   w/s   rsec/s   wsec/s avgrq-sz
avgqu-sz   await  svctm  %util
sda  47.6767.67 190.33 17.00 23933.33   677.33   118.70
5.24   25.25   4.64  96.17
sda1  0.00 0.00  0.00  0.00 0.00 0.00 0.00
0.000.00   0.00   0.00
sda2 47.6767.67 190.33 17.00 23933.33   677.33   118.70
5.24   25.25   4.64  96.17
sda3  0.00 0.00  0.00  0.00 0.00 0.00 0.00
0.000.00   0.00   0.00

CPU usage is low.
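For anyone reading along, a rough interpretation of the iostat line above (iostat sectors are 512 bytes):

```python
# Interpret the iostat output for sda: ~96% utilization at only ~190
# reads/s and ~12 MB/s means the array is seek-bound, not bandwidth-bound.
SECTOR = 512  # iostat reports sector counts; sectors are 512 bytes

r_per_s, rsec_per_s, util = 190.33, 23933.33, 96.17

read_mb_s = rsec_per_s * SECTOR / 1e6                # ~12 MB/s, far below sequential speed
avg_read_kb = rsec_per_s / r_per_s * SECTOR / 1024   # ~63 KB per read request

print(f"read throughput: {read_mb_s:.1f} MB/s")
print(f"avg read size:   {avg_read_kb:.1f} KB")
print(f"utilization:     {util:.0f}%")
```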

Does this mean disk i/o is the bottleneck for my case? Will it help if I
increase KCF to cache the entire sstable index?

Also, this is almost a read-only test, and in reality our write/read ratio
is close to 1:1, so I'm guessing read latency will go even higher in that
case because it will be difficult for Cassandra to find a good moment to
compact the data files that are busy being written.

Thanks,
-Weijun


On Tue, Feb 16, 2010 at 6:06 AM, Brandon Williams  wrote:

> On Tue, Feb 16, 2010 at 2:32 AM, Dr. Martin Grabmüller <
> martin.grabmuel...@eleven.de> wrote:
>
>> In my tests I have observed that good read latency depends on keeping
>> the number of data files low.  In my current test setup, I have stored
>> 1.9 TB of data on a single node, which is in 21 data files, and read
>> latency is between 10 and 60ms (for small reads, larger read of course
>> take more time).  In earlier stages of my test, I had up to 5000
>> data files, and read performance was quite bad: my configured 10-second
>> RPC timeout was regularly encountered.
>>
>
> I believe it is known that crossing sstables is O(NlogN) but I'm unable to
> find the ticket on this at the moment.  Perhaps Stu Hood will jump in and
> enlighten me, but in any case I believe
> https://issues.apache.org/jira/browse/CASSANDRA-674 will eventually solve
> it.
>
> Keeping write volume low enough that compaction can keep up is one
> solution, and throwing hardware at the problem is another, if necessary.
>  Also, the row caching in trunk (soon to be 0.6 we hope) helps greatly for
> repeat hits.
>
> -Brandon
>


Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Brandon Williams
On Tue, Feb 16, 2010 at 2:32 AM, Dr. Martin Grabmüller <
martin.grabmuel...@eleven.de> wrote:

> In my tests I have observed that good read latency depends on keeping
> the number of data files low.  In my current test setup, I have stored
> 1.9 TB of data on a single node, which is in 21 data files, and read
> latency is between 10 and 60ms (for small reads, larger read of course
> take more time).  In earlier stages of my test, I had up to 5000
> data files, and read performance was quite bad: my configured 10-second
> RPC timeout was regularly encountered.
>

I believe it is known that crossing sstables is O(NlogN) but I'm unable to
find the ticket on this at the moment.  Perhaps Stu Hood will jump in and
enlighten me, but in any case I believe
https://issues.apache.org/jira/browse/CASSANDRA-674 will eventually solve
it.

Keeping write volume low enough that compaction can keep up is one solution,
and throwing hardware at the problem is another, if necessary.  Also, the
row caching in trunk (soon to be 0.6 we hope) helps greatly for repeat hits.

-Brandon


RE: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-16 Thread Dr . Martin Grabmüller
> The other problem is: if I keep mixed write and read (e.g., 8 write threads
> plus 7 read threads) against the 2-node cluster continuously, the read
> latency will go up gradually (along with the size of the Cassandra data
> file), and at the end it will become ~40ms (up from ~20ms) even with only
> 15 threads. During this process the data file grew from 1.6GB to over 3GB
> even if I kept writing the same key/values to Cassandra. It seems that
> Cassandra keeps appending to sstable data files and will only clean them up
> during node cleanup or compact (please correct me if this is incorrect).

In my tests I have observed that good read latency depends on keeping
the number of data files low.  In my current test setup, I have stored
1.9 TB of data on a single node, which is in 21 data files, and read
latency is between 10 and 60ms (for small reads; larger reads of course
take more time).  In earlier stages of my test, I had up to 5000
data files, and read performance was quite bad: my configured 10-second
RPC timeout was regularly encountered.

The number of data files is reduced whenever Cassandra compacts them,
which is either automatically, when enough datafiles are generated by
continuous writing, or when triggered by nodeprobe compact, cleanup etc.

So my advice is to keep the write throughput low enough so that Cassandra
can keep up compacting the data files.  For high write throughput, you need
fast drives, if possible on different RAIDs, which are configured as
different DataDirectories for Cassandra.  On my setup (6 drives in a single
RAID-5 configuration), compaction is quite slow: sequential reads/writes
are done at 150 MB/s, whereas during compaction, read/write-performance
drops to a few MB/s.  You definitely want more than one logical drive,
so that Cassandra can alternate between them when flushing memtables and
when compacting.

I would really be interested whether my observations are shared by other
people on this list.

Thanks!

Martin


RE: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-15 Thread Weijun Li
It seems that read latency is sensitive to the number of threads (or thrift
clients): after reducing the number of threads to 15, read latency decreased
to ~20ms.

The other problem is: if I keep mixed write and read (e.g., 8 write threads
plus 7 read threads) against the 2-node cluster continuously, the read
latency will go up gradually (along with the size of the Cassandra data file),
and at the end it will become ~40ms (up from ~20ms) even with only 15
threads. During this process the data file grew from 1.6GB to over 3GB even
if I kept writing the same key/values to Cassandra. It seems that Cassandra
keeps appending to sstable data files and will only clean them up during
node cleanup or compact (please correct me if this is incorrect).
 
Here's my test settings:

JVM xmx: 6GB
KCF: 0.3
Memtable: 512MB.
Number of records: 1 million (payload is 1000 bytes)

I used JMX and iostat to watch the cluster but can't find any clue to the
increasing read latency issue: JVM memory, GC, CPU usage, tpstats and io
saturation all seem to be clean. One exception is that the wait time in
iostat goes up sharply once in a while but is a small number for most of the
time. Another thing I noticed is that the JVM doesn't use more than 1GB of
memory (out of the 6GB I specified for the JVM) even though I set KCF to 0.3
and increased the memtable size to 512MB.

Did I miss anything here? How can I diagnose this kind of increasing read
latency issue? Is there any performance tuning guide available?

Thanks,
-Weijun


-Original Message-
From: Jonathan Ellis [mailto:jbel...@gmail.com] 
Sent: Sunday, February 14, 2010 6:22 PM
To: cassandra-user@incubator.apache.org
Subject: Re: Cassandra benchmark shows OK throughput but high read latency
(> 100ms)?

are you i/o bound?  what is your on-disk data set size?  what does
iostat tell you?
http://spyced.blogspot.com/2010/01/linux-performance-basics.html

do you have a lot of pending compactions?  (tpstats will tell you)

have you increased KeysCachedFraction?

On Sun, Feb 14, 2010 at 8:18 PM, Weijun Li  wrote:
> Hello,
>
>
>
> I saw some Cassandra benchmark reports mentioning read latency that is
less
> than 50ms or even 30ms. But my benchmark with 0.5 doesn't seem to support
> that. Here's my settings:
>
>
>
> Nodes: 2 machines. 2x2.5GHZ Xeon Quad Core (thus 8 cores), 8GB RAM
>
> ReplicationFactor=2 Partitioner=Random
>
> JVM Xmx: 4GB
>
> Memory table size: 512MB (haven't figured out how to enable binary
memtable
> so I set both memtable number to 512mb)
>
> Flushing threads: 2-4
>
> Payload: ~1000 bytes, 3 columns in one CF.
>
> Read/write time measure: get startTime right before each Java thrift call,
> transport objects are pre-created upon creation of each thread.
>
>
>
> The result shows that total write throughput is around 2000/sec (for 2
nodes
> in the cluster) which is not bad, and read throughput is just around
> 750/sec. However for each thread the average read latency is more than
> 100ms. I'm running 100 threads for the testing and each thread randomly
pick
> a node for thrift call. So the read/sec of each thread is just around 7.5,
> meaning duration of each thrift call is 1000/7.5=133ms. Without
replication
> the cluster write throughput is around 3300/s, and read throughput is
around
> 1400/s, so the read latency is still around 70ms without replication.
>
>
>
> Is there anything wrong in my benchmark test? How can I achieve a
reasonable
> read latency (< 30ms)?
>
>
>
> Thanks,
>
> -Weijun
>
>
>
>



Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

2010-02-14 Thread Jonathan Ellis
are you i/o bound?  what is your on-disk data set size?  what does
iostat tell you?
http://spyced.blogspot.com/2010/01/linux-performance-basics.html

do you have a lot of pending compactions?  (tpstats will tell you)

have you increased KeysCachedFraction?

On Sun, Feb 14, 2010 at 8:18 PM, Weijun Li  wrote:
> Hello,
>
>
>
> I saw some Cassandra benchmark reports mentioning read latency that is less
> than 50ms or even 30ms. But my benchmark with 0.5 doesn’t seem to support
> that. Here’s my settings:
>
>
>
> Nodes: 2 machines. 2x2.5GHZ Xeon Quad Core (thus 8 cores), 8GB RAM
>
> ReplicationFactor=2 Partitioner=Random
>
> JVM Xmx: 4GB
>
> Memory table size: 512MB (haven’t figured out how to enable binary memtable
> so I set both memtable number to 512mb)
>
> Flushing threads: 2-4
>
> Payload: ~1000 bytes, 3 columns in one CF.
>
> Read/write time measure: get startTime right before each Java thrift call,
> transport objects are pre-created upon creation of each thread.
>
>
>
> The result shows that total write throughput is around 2000/sec (for 2 nodes
> in the cluster) which is not bad, and read throughput is just around
> 750/sec. However for each thread the average read latency is more than
> 100ms. I’m running 100 threads for the testing and each thread randomly pick
> a node for thrift call. So the read/sec of each thread is just around 7.5,
> meaning duration of each thrift call is 1000/7.5=133ms. Without replication
> the cluster write throughput is around 3300/s, and read throughput is around
> 1400/s, so the read latency is still around 70ms without replication.
>
>
>
> Is there anything wrong in my benchmark test? How can I achieve a reasonable
> read latency (< 30ms)?
>
>
>
> Thanks,
>
> -Weijun
>
>
>
>
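A closing sanity check on the arithmetic in this thread: with a fixed number of client threads, Little's law (concurrency = throughput × latency) pins the average per-call latency regardless of how the threads are spread across nodes. A minimal sketch:

```python
# Little's law applied to the benchmark numbers from this thread:
# with N requests in flight, average latency = N / throughput.
def implied_latency_ms(threads, reads_per_sec):
    """Average per-request latency implied by thread count and throughput."""
    return threads / reads_per_sec * 1000.0

print(implied_latency_ms(100, 750))   # ~133ms with replication, as observed
print(implied_latency_ms(100, 1400))  # ~71ms without replication
print(implied_latency_ms(15, 750))    # 15 threads at the same throughput
                                      # would imply ~20ms per call
```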