Yes. mmap isn't magic; it's just better than managing that yourself.
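A minimal standalone sketch (not Cassandra's actual code) of why mapping a file much larger than RAM works: FileChannel.map() only reserves virtual address space, pages are read in lazily on first access, and the kernel evicts cold pages under memory pressure just as it does for the ordinary block cache. A single MappedByteBuffer is limited to 2 GB, so a big file has to be mapped as a list of segments:

    import java.io.RandomAccessFile;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import java.util.ArrayList;
    import java.util.List;

    public class MmapSketch {
        // A single MappedByteBuffer is capped at 2 GB, so map the file in fixed-size segments.
        private static final long SEGMENT_SIZE = 1L << 30; // 1 GB per segment

        public static void main(String[] args) throws Exception {
            String path = args[0]; // any large data file
            try (RandomAccessFile raf = new RandomAccessFile(path, "r");
                 FileChannel channel = raf.getChannel()) {
                long length = channel.size();
                List<MappedByteBuffer> segments = new ArrayList<MappedByteBuffer>();
                for (long offset = 0; offset < length; offset += SEGMENT_SIZE) {
                    long size = Math.min(SEGMENT_SIZE, length - offset);
                    // map() only reserves address space; nothing is read from disk until a
                    // page is actually touched, so a 100 GB file "fits" on an 8 GB box.
                    segments.add(channel.map(FileChannel.MapMode.READ_ONLY, offset, size));
                }
                // Touching a position faults in just that page (plus readahead); the OS
                // evicts cold pages under memory pressure like any other page-cache entry.
                long position = length / 2;
                int segment = (int) (position / SEGMENT_SIZE);
                int offsetInSegment = (int) (position % SEGMENT_SIZE);
                byte b = segments.get(segment).get(offsetInSegment);
                System.out.println("byte at offset " + position + " = " + b);
            }
        }
    }

This is essentially the trade being weighed below: the OS keeps the serialized on-disk form hot for free, while the row cache keeps fully deserialized rows on the heap to skip the deserialization cost.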
On Fri, Feb 19, 2010 at 4:25 PM, Weijun Li <weiju...@gmail.com> wrote:
> Is it in trunk too? I'm running a trunk build (from the end of last week) in the cluster and saw the disk I/O bottleneck.
>
> On Fri, Feb 19, 2010 at 1:03 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>>
>> mmap is designed to handle that case, yes. It is already in the 0.6 branch.
>>
>> On Fri, Feb 19, 2010 at 2:44 PM, Weijun Li <weiju...@gmail.com> wrote:
>> > I see. How much is the overhead of Java serialization? Does it slow down the system a lot? It seems to be a tradeoff between CPU usage and memory.
>> >
>> > As for mmap in 0.6, do you mmap the sstable data file even if it is a lot larger than the available memory (e.g., the data file is over 100GB while you have only 8GB of RAM)? How efficient is mmap in that case? Is mmap already checked into the 0.6 branch?
>> >
>> > -Weijun
>> >
>> > On Fri, Feb 19, 2010 at 4:56 AM, Jonathan Ellis <jbel...@gmail.com> wrote:
>> >>
>> >> The whole point of the row cache is to avoid the serialization overhead, though. If we just wanted the serialized form cached, we would let the OS block cache handle that without adding an extra layer. (0.6 uses mmap'd I/O by default on 64-bit JVMs, so this is very efficient.)
>> >>
>> >> On Fri, Feb 19, 2010 at 3:29 AM, Weijun Li <weiju...@gmail.com> wrote:
>> >> > The memory overhead issue is not directly related to GC, because by the time the JVM ran out of memory the GC had already been very busy for quite a while. In my case the JVM consumed all of the 6GB when the row cache size hit 1.4 million.
>> >> >
>> >> > I haven't started testing the row cache feature yet. But I think data compression is useful for reducing memory consumption, because my impression is that disk I/O is always the bottleneck for Cassandra while its CPU usage is usually low. In addition, compression should also help to reduce the number of Java objects dramatically (correct me if I'm wrong), especially if we need to cache most of the data to achieve decent read latency.
>> >> >
>> >> > If ColumnFamily is serializable, it shouldn't be that hard to implement the compression feature, controlled by an option (again :-) in the storage conf XML.
>> >> >
>> >> > When I get to that point you can instruct me on implementing this feature along with the row-cache write-through. Our goal is straightforward: to support short read latency in a high-volume web application with a write/read ratio of 1:1.
>> >> >
>> >> > -Weijun
>> >> >
>> >> > -----Original Message-----
>> >> > From: Jonathan Ellis [mailto:jbel...@gmail.com]
>> >> > Sent: Thursday, February 18, 2010 12:04 PM
>> >> > To: cassandra-user@incubator.apache.org
>> >> > Subject: Re: Testing row cache feature in trunk: write should put record in cache
>> >> >
>> >> > Did you force a GC from jconsole to make sure you weren't just measuring uncollected garbage?
>> >> >
>> >> > On Wed, Feb 17, 2010 at 2:51 PM, Weijun Li <weiju...@gmail.com> wrote:
>> >> >> OK, I'll work on the change later, because there's another problem to solve: the cache overhead is so big that 1.4 million records (1KB each) consumed all of the JVM's 6GB of memory (I guess 4GB were consumed by the row cache). I'm thinking that ConcurrentHashMap is not a good choice for LRU, and the row cache needs to store compressed key data to reduce memory usage. I'll do more investigation on this and let you know.
>> >> >>
>> >> >> -Weijun
>> >> >>
>> >> >> On Tue, Feb 16, 2010 at 9:22 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>> >> >>>
>> >> >>> ... tell you what, if you write the option-processing part in DatabaseDescriptor I will do the actual cache part. :)
>> >> >>>
>> >> >>> On Tue, Feb 16, 2010 at 11:07 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>> >> >>> > https://issues.apache.org/jira/secure/CreateIssue!default.jspa, but this is pretty low priority for me.
>> >> >>> >
>> >> >>> > On Tue, Feb 16, 2010 at 8:37 PM, Weijun Li <weiju...@gmail.com> wrote:
>> >> >>> >> Just tried to make a quick change to enable it, but it didn't work out :-(
>> >> >>> >>
>> >> >>> >>     ColumnFamily cachedRow = cfs.getRawCachedRow(mutation.key());
>> >> >>> >>
>> >> >>> >>     // What I modified
>> >> >>> >>     if (cachedRow == null) {
>> >> >>> >>         cfs.cacheRow(mutation.key());
>> >> >>> >>         cachedRow = cfs.getRawCachedRow(mutation.key());
>> >> >>> >>     }
>> >> >>> >>
>> >> >>> >>     if (cachedRow != null)
>> >> >>> >>         cachedRow.addAll(columnFamily);
>> >> >>> >>
>> >> >>> >> How can I open a ticket for you to make the change (enable row cache write-through with an option)?
>> >> >>> >>
>> >> >>> >> Thanks,
>> >> >>> >> -Weijun
>> >> >>> >>
>> >> >>> >> On Tue, Feb 16, 2010 at 5:20 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>> >> >>> >>>
>> >> >>> >>> On Tue, Feb 16, 2010 at 7:17 PM, Jonathan Ellis <jbel...@gmail.com> wrote:
>> >> >>> >>> > On Tue, Feb 16, 2010 at 7:11 PM, Weijun Li <weiju...@gmail.com> wrote:
>> >> >>> >>> >> Just started to play with the row cache feature in trunk: it seems to be working fine so far, except that for the RowsCached parameter you need to specify the number of rows rather than a percentage (e.g., "20%" doesn't work).
>> >> >>> >>> >
>> >> >>> >>> > 20% works, but it's 20% of the rows at server startup. So on a fresh start that is zero.
>> >> >>> >>> >
>> >> >>> >>> > Maybe we should just get rid of the % feature...
>> >> >>> >>>
>> >> >>> >>> (Actually, it shouldn't be hard to update this on flush, if you want to open a ticket.)
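On the ConcurrentHashMap-vs-LRU point raised in the quoted thread above, a minimal sketch of strict LRU eviction with a fixed entry-count cap, built on a plain access-ordered LinkedHashMap. This is only an illustration of the semantics being discussed, not the row cache Cassandra ships:

    import java.util.LinkedHashMap;
    import java.util.Map;

    // Capacity-bounded LRU map: the least recently *accessed* entry is evicted
    // once size() exceeds the cap. Unlike ConcurrentHashMap it gives real LRU
    // ordering, at the cost of a lock around get/put (only those two are
    // synchronized here; a production cache would need more care).
    public class LruRowCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        public LruRowCache(int capacity) {
            super(16, 0.75f, true); // accessOrder = true -> iteration order is LRU order
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity; // called by put(); evicts the LRU entry
        }

        @Override
        public synchronized V get(Object key) {
            return super.get(key); // get() reorders entries, so it must be locked too
        }

        @Override
        public synchronized V put(K key, V value) {
            return super.put(key, value);
        }
    }

An absolute entry-count cap like this also sidesteps the RowsCached percentage surprise mentioned further up, where 20% is computed against the row count at server startup and is therefore zero on a fresh node.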