Re: Cassandra benchmark shows OK throughput but high read latency (> 100ms)?

Weijun Li Tue, 16 Feb 2010 17:28:10 -0800

Yes, my KeysCachedFraction is already 0.3, but it doesn't relieve the disk i/o.
I compacted the data into one 60GB data file (it took quite a while to finish
and increased latency as expected), but that doesn't help much either.


If I set KCF to 1 (meaning cache the entire sstable index), how much memory
will it take for 50mil keys? Is the index a straight key-offset map? I guess
the key is 16 bytes and the offset is 8 bytes. Will KCF=1 help to reduce disk
i/o?
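
For a rough sense of scale, here is a back-of-envelope sketch of the raw index
size. It assumes the 16-byte key and 8-byte offset guessed above, treats the
index as a flat key-offset map, and ignores JVM object overhead, which could
easily multiply the real footprint. The function name and defaults are
illustrative only and are not taken from Cassandra's code.

```python
def raw_index_bytes(num_keys, fraction=1.0, key_bytes=16, offset_bytes=8):
    """Raw payload of a flat key -> offset map, with no per-entry overhead.

    key_bytes and offset_bytes are guesses, not values read from Cassandra.
    """
    cached_keys = round(num_keys * fraction)
    return cached_keys * (key_bytes + offset_bytes)


NUM_KEYS = 50_000_000  # record count from this benchmark

full = raw_index_bytes(NUM_KEYS, fraction=1.0)     # KCF = 1
current = raw_index_bytes(NUM_KEYS, fraction=0.3)  # KCF = 0.3

print(f"KCF=1.0: {full / 1e9:.2f} GB raw")     # 1.20 GB
print(f"KCF=0.3: {current / 1e9:.2f} GB raw")  # 0.36 GB
```

Under these assumptions the raw payload at KCF=1 is about 1.2 GB, which would
fit in the 6GB heap before overhead is counted. Whether it actually fits
depends on how the cache stores each entry.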

-Weijun

On Tue, Feb 16, 2010 at 5:18 PM, Jonathan Ellis <jbel...@gmail.com> wrote:

> Have you tried increasing KeysCachedFraction?
>
> On Tue, Feb 16, 2010 at 6:15 PM, Weijun Li <weiju...@gmail.com> wrote:
> > I still have high read latency with 50mil records in the 2-node cluster
> > (replica 2). I restarted both nodes, but read latency is still above 60ms
> > and disk i/o saturation is high. I tried compact and repair, but they
> > don't help much. When I reduced the client threads from 15 to 5 it looked
> > a lot better, but throughput is kind of low. I changed the flushing
> > threads to 16 instead of the default 8; could that cause the disk
> > saturation issue?
> >
> > For benchmarks with decent throughput and latency, how many client
> > threads do they use? Can anyone share the storage-conf.xml from a
> > well-tuned, high-volume cluster?
> >
> > -Weijun
> >
> > On Tue, Feb 16, 2010 at 10:31 AM, Stu Hood <stu.h...@rackspace.com> wrote:
> >>
> >> > After I ran "nodeprobe compact" on node B its read latency went up to
> >> > 150ms.
> >> The compaction process can take a while to finish... in 0.5 you need to
> >> watch the logs to figure out when it has actually finished, and then you
> >> should start seeing the improvement in read latency.
> >>
> >> > Is there any way to utilize all of the heap space to decrease the read
> >> > latency?
> >> In 0.5 you can adjust the number of keys that are cached by changing the
> >> 'KeysCachedFraction' parameter in your config file. In 0.6 you can
> >> additionally cache rows. You don't want to use up all of the memory on
> >> your box for those caches though: you'll want to leave at least 50% for
> >> your OS's disk cache, which will store the full row content.
> >>
> >>
> >> -----Original Message-----
> >> From: "Weijun Li" <weiju...@gmail.com>
> >> Sent: Tuesday, February 16, 2010 12:16pm
> >> To: cassandra-user@incubator.apache.org
> >> Subject: Re: Cassandra benchmark shows OK throughput but high read
> >> latency (> 100ms)?
> >>
> >> Thanks for the DataFileDirectory trick; I'll give it a try.
> >>
> >> Just noticed the impact of number of data files: node A has 13 data
> >> files with read latency of 20ms and node B has 27 files with read latency
> >> of 60ms. After I ran "nodeprobe compact" on node B its read latency went
> >> up to 150ms. The read latency of node A became as low as 10ms. Is this
> >> normal behavior? I'm using random partitioner and the hardware/JVM
> >> settings are exactly the same for these two nodes.
> >>
> >> Another problem is that Java heap usage is always 900mb out of 6GB? Is
> >> there any way to utilize all of the heap space to decrease the read
> >> latency?
> >>
> >> -Weijun
> >>
> >> On Tue, Feb 16, 2010 at 10:01 AM, Brandon Williams <dri...@gmail.com> wrote:
> >>
> >> > On Tue, Feb 16, 2010 at 11:56 AM, Weijun Li <weiju...@gmail.com> wrote:
> >> >
> >> >> One more thought about Martin's suggestion: is it possible to put the
> >> >> data files into multiple directories that are located on different
> >> >> physical disks? This should help relieve the i/o bottleneck.
> >> >>
> >> >>
> >> > Yes, you can already do this, just add more <DataFileDirectory>
> >> > directives pointed at multiple drives.
> >> >
> >> >
> >> >> Has anybody tested the row-caching feature in trunk (shoot for 0.6?)?
> >> >
> >> >
> >> > Row cache and key cache both help tremendously if your read pattern has
> >> > a decent repeat rate.  Completely random io can only be so fast, however.
> >> >
> >> > -Brandon
> >> >
> >>
> >>
> >
> >
>
