Re: keyvalue cache

Andrew Purtell Wed, 04 Apr 2012 15:10:46 -0700

I thought about trying this out once with a coprocessor, hooking the Gets, with
an embedded Hazelcast. That would just be a proof of concept. The idea is to
scale the KV cache independently of regionserver limits (maybe we're only giving
1 GB per RS to the value cache and a 10 GB region is hot), and the next step
could be modifying the client to spread read load over replicas (HBASE-2357).
This doesn't consider scans either.
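A minimal Java sketch of the Get-hooking idea, assuming a cache that lives outside the regionserver's own memory budget. A `ConcurrentHashMap` stands in for the embedded Hazelcast map. `GetHookSketch`, `RegionReader` and `onGet` are hypothetical names loosely modeled on a coprocessor's pre-Get hook. They are not the actual HBase or Hazelcast API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for the region's normal read path; not an HBase interface.
interface RegionReader {
    byte[] read(String cellKey);
}

class GetHookSketch {
    // Stand-in for an embedded Hazelcast map, sized independently of the regionserver heap.
    private final Map<String, byte[]> sharedCache = new ConcurrentHashMap<>();
    private final RegionReader region;
    // Counts how many Gets actually had to go to the region.
    final AtomicLong regionReads = new AtomicLong();

    GetHookSketch(RegionReader region) {
        this.region = region;
    }

    // Invoked for each point Get: a hit short-circuits, a miss reads the region and fills the cache.
    byte[] onGet(String cellKey) {
        byte[] cached = sharedCache.get(cellKey);
        if (cached != null) {
            return cached;
        }
        regionReads.incrementAndGet();
        byte[] value = region.read(cellKey);
        if (value != null) { // ConcurrentHashMap rejects null values
            sharedCache.put(cellKey, value);
        }
        return value;
    }

    public static void main(String[] args) {
        GetHookSketch hook = new GetHookSketch(k -> k.getBytes(StandardCharsets.UTF_8));
        hook.onGet("row1/cf:q1");
        hook.onGet("row1/cf:q1");
        System.out.println(hook.regionReads.get()); // prints 1
    }
}
```

Like the proof of concept described above, this ignores both writes and scans. A matching Put hook would have to invalidate or update entries.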



Best regards,

    - Andy


Problems worthy of attack prove their worth by hitting back. - Piet Hein (via 
Tom White)



>________________________________
> From: Matt Corgan <mcor...@hotpads.com>
>To: dev@hbase.apache.org 
>Sent: Wednesday, April 4, 2012 2:46 PM
>Subject: Re: keyvalue cache
> 
>It could act like a HashSet of KeyValues keyed on the
>rowKey+family+qualifier but not including the timestamp.  As writes come in
>it would evict or overwrite previous versions (read-through vs
>write-through).  It would only service point queries where the
>row+fam+qualifier are specified, returning the latest version.  Wouldn't be
>able to do a typical rowKey-only Get (scan behind the scenes) because it
>wouldn't know if it contained all the cells in the row, but if you could
>specify all your row's qualifiers up-front it could work.
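The structure Matt describes can be sketched in Java as a hash map from a timestamp-free (row, family, qualifier) key to the newest cached version. `LatestVersionCache`, `CellKey`, `WritePolicy` and the method names are hypothetical. `EVICT` models the read-through variant, where a write drops the entry and the next miss refills it. `OVERWRITE` models the write-through variant, where a write replaces the entry if it is at least as new.

```java
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class LatestVersionCache {
    // Cache key: row + family + qualifier, deliberately excluding the timestamp.
    static final class CellKey {
        final byte[] row, family, qualifier;

        CellKey(byte[] row, byte[] family, byte[] qualifier) {
            this.row = row; this.family = family; this.qualifier = qualifier;
        }

        @Override public boolean equals(Object o) {
            if (!(o instanceof CellKey)) return false;
            CellKey k = (CellKey) o;
            return Arrays.equals(row, k.row) && Arrays.equals(family, k.family)
                && Arrays.equals(qualifier, k.qualifier);
        }

        @Override public int hashCode() {
            return 31 * (31 * Arrays.hashCode(row) + Arrays.hashCode(family))
                + Arrays.hashCode(qualifier);
        }
    }

    static final class Versioned {
        final long timestamp;
        final byte[] value;
        Versioned(long timestamp, byte[] value) { this.timestamp = timestamp; this.value = value; }
    }

    enum WritePolicy { EVICT, OVERWRITE }

    private final Map<CellKey, Versioned> entries = new ConcurrentHashMap<>();
    private final WritePolicy policy;

    LatestVersionCache(WritePolicy policy) { this.policy = policy; }

    // Write path: drop the entry (read-through) or keep whichever version is newer (write-through).
    void onWrite(CellKey key, long ts, byte[] value) {
        if (policy == WritePolicy.EVICT) {
            entries.remove(key);
        } else {
            entries.merge(key, new Versioned(ts, value),
                (old, neu) -> neu.timestamp >= old.timestamp ? neu : old);
        }
    }

    // Read-through fill after a miss was served from the store.
    void onMissLoaded(CellKey key, long ts, byte[] value) {
        entries.putIfAbsent(key, new Versioned(ts, value));
    }

    // Fully specified point lookup: latest cached version, or null on a miss.
    Versioned get(CellKey key) { return entries.get(key); }

    public static void main(String[] args) {
        CellKey k = new CellKey("row1".getBytes(), "cf".getBytes(), "q1".getBytes());
        LatestVersionCache cache = new LatestVersionCache(WritePolicy.OVERWRITE);
        cache.onWrite(k, 2L, "new".getBytes());
        cache.onWrite(k, 1L, "late-arriving old version".getBytes());
        System.out.println(cache.get(k).timestamp); // prints 2
    }
}
```

Because the key has no timestamp, only the latest version can be served, and only for Gets that name every qualifier. A row-only Get still has to take the normal path. The sketch also ignores the race in which a slow miss fill lands after a concurrent write has evicted the entry.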
>
>
>On Wed, Apr 4, 2012 at 2:30 PM, Vladimir Rodionov
><vrodio...@carrieriq.com> wrote:
>
>> 1. 2KB can be too large for some applications. For example, some of our
>> key-value pairs are under 100 bytes combined.
>> 2. These tables (from 1.) do not benefit from the block cache at all (we have
>> not tried a 100 B block size yet :)
>> 3. And Matt is absolutely right: a small block size is expensive.
>>
>> How about doing point queries on the K-V cache and bypassing the K-V cache on
>> all Scans (when someone really needs them)?
>> Implement the K-V cache as a coprocessor application?
>>
>> Invalidation of K-V entries is not necessary if all upsert operations go
>> through the K-V cache first, i.e. if it sits in front of the MemStore.
>> There will be no "stale or invalid" data situation in this case. Correct?
>> There is no need for the data to be sorted and no need for it to be merged
>> into a scan (we do not use the K-V cache for Scans).
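Vladimir's write-through argument can be sketched as follows. The sketch assumes a single-threaded store in which a `HashMap` plays the K-V cache and a `TreeMap` stands in for the MemStore. Both are assumptions, and so are the class and method names. Every mutation updates the cache before the MemStore, so a cached cell is never older than the stored one. Scans read only the sorted store, so the cache itself never has to be sorted or merged.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.SortedMap;
import java.util.TreeMap;

class WriteThroughFront {
    // Unsorted K-V cache, used only for point Gets.
    final Map<String, String> kvCache = new HashMap<>();
    // Sorted stand-in for the MemStore that scans read.
    final NavigableMap<String, String> memstore = new TreeMap<>();

    // Every upsert passes through the cache first, so no separate invalidation is needed.
    void upsert(String cell, String value) {
        kvCache.put(cell, value);
        memstore.put(cell, value);
    }

    // Deletes must take the same path, or the cache would keep serving the removed value.
    void delete(String cell) {
        kvCache.remove(cell);
        memstore.remove(cell);
    }

    // Point Get: cache first; a miss falls back to the store (no refill, to keep the sketch short).
    String get(String cell) {
        String v = kvCache.get(cell);
        return v != null ? v : memstore.get(cell);
    }

    // Scans bypass the K-V cache entirely.
    SortedMap<String, String> scan(String fromInclusive, String toExclusive) {
        return memstore.subMap(fromInclusive, toExclusive);
    }

    public static void main(String[] args) {
        WriteThroughFront store = new WriteThroughFront();
        store.upsert("row1/cf:a", "v1");
        store.upsert("row1/cf:a", "v2");
        System.out.println(store.get("row1/cf:a")); // prints v2
    }
}
```

The no-stale-data property holds only while every mutation, deletes included, takes this path. A real implementation would also need to bound the cache and make the two updates atomic with respect to concurrent readers.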
>>
>>
>> Best regards,
>> Vladimir Rodionov
>> Principal Platform Engineer
>> Carrier IQ, www.carrieriq.com
>> e-mail: vrodio...@carrieriq.com
>>
>> ________________________________________
>> From: Matt Corgan [mcor...@hotpads.com]
>> Sent: Wednesday, April 04, 2012 11:40 AM
>> To: dev@hbase.apache.org
>> Subject: Re: keyvalue cache
>>
>> I guess the benefit of the KV cache is that you are not holding entire 64K
>> blocks in memory when you only care about 200 bytes of them.  Would an
>> alternative be to set a small block size (2KB or less)?
>>
>> The problems with small block sizes would be expensive block cache
>> management overhead and inefficient scanning IO due to lack of read-ahead.
>>  Maybe improving the cache management and read-ahead would be more general
>> improvements that don't add as much complexity?
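Matt's trade-off can be put into rough numbers. The Java snippet below assumes a 200-byte cell (his figure) and a hypothetical 10 GiB region. It ignores block headers, compression and per-entry bookkeeping costs, and its class and method names are illustrative only.

```java
class BlockSizeArithmetic {
    // Bytes pulled into the block cache per byte the client actually wanted.
    static double bytesCachedPerUsefulByte(long blockSize, long cellSize) {
        return (double) blockSize / cellSize;
    }

    // Blocks (and thus block-cache / block-index entries) needed to cover a region, rounded up.
    static long blocksPerRegion(long regionBytes, long blockSize) {
        return (regionBytes + blockSize - 1) / blockSize;
    }

    public static void main(String[] args) {
        long region = 10L << 30; // hypothetical 10 GiB region
        long cell = 200;         // bytes the client cares about
        for (long block : new long[] {64 << 10, 2 << 10}) {
            System.out.printf("block=%d B: %.1f bytes cached per useful byte, %d blocks per region%n",
                block, bytesCachedPerUsefulByte(block, cell), blocksPerRegion(region, block));
        }
    }
}
```

With 64 KB blocks, this gives roughly 328 cached bytes per useful byte and 163,840 blocks per region. With 2 KB blocks, it gives roughly 10 bytes per useful byte but 5,242,880 blocks, 32 times as many entries for the block cache and block index to track.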
>>
>> I'm having a hard time envisioning how you would do invalidations on the KV
>> cache and how you would merge its entries into a scan, etc.  Would it
>> basically be a memstore in front of the memstore where KVs get individually
>> invalidated instead of bulk-flushed?  Would it be sorted or hashed?
>>
>> Matt
>>
>> On Wed, Apr 4, 2012 at 10:35 AM, Enis Söztutar <e...@apache.org> wrote:
>>
>> > As you said, caching the entire row does not make much sense, given that
>> > the families are by contract the access boundaries. But caching column
>> > families might be a good trade-off for dealing with the per-item overhead.
>> >
>> > Also agreed on the cache being configurable at the table or, better, the cf
>> > level. I think we can do something like enable_block_cache = true,
>> > enable_kv_cache = false, per column family.
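The per-column-family switches Enis suggests could drive a read-path choice along these lines. `FamilyCachePolicy`, `ReadPath` and `choose` are hypothetical. The two flags mirror the proposed enable_block_cache / enable_kv_cache settings and are not an existing HBase option. Following the rest of the thread, only fully specified point Gets are eligible for the KV cache, and scans always take the block path.

```java
class FamilyCachePolicy {
    // Read path a request would take; on a KV_CACHE miss the read would fall through to the block path.
    enum ReadPath { KV_CACHE, BLOCK_CACHE, NO_CACHE }

    // Hypothetical per-column-family flags (enable_block_cache, enable_kv_cache).
    final boolean enableBlockCache;
    final boolean enableKvCache;

    FamilyCachePolicy(boolean enableBlockCache, boolean enableKvCache) {
        this.enableBlockCache = enableBlockCache;
        this.enableKvCache = enableKvCache;
    }

    // Only Gets that name row, family and qualifier may use the KV cache; scans never do.
    ReadPath choose(boolean fullySpecifiedPointGet) {
        if (fullySpecifiedPointGet && enableKvCache) {
            return ReadPath.KV_CACHE;
        }
        return enableBlockCache ? ReadPath.BLOCK_CACHE : ReadPath.NO_CACHE;
    }

    public static void main(String[] args) {
        FamilyCachePolicy pointHeavy = new FamilyCachePolicy(false, true); // KV cache only
        FamilyCachePolicy scanHeavy = new FamilyCachePolicy(true, false);  // block cache only
        System.out.println(pointHeavy.choose(true));  // prints KV_CACHE
        System.out.println(scanHeavy.choose(false));  // prints BLOCK_CACHE
    }
}
```

If both flags were set, point Gets could use the KV cache while scans on the same family still benefit from the block cache.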
>> >
>> > Enis
>> >
>> > On Tue, Apr 3, 2012 at 11:03 PM, Vladimir Rodionov
>> > <vrodio...@carrieriq.com> wrote:
>> >
>> > > This usually makes sense for tables with mostly random access (point
>> > > queries). For scans, short or long, the block cache is preferable.
>> > > Cassandra has it (the row cache), but since it caches the whole row (which
>> > > can be very large), in many cases it has subpar performance. It makes
>> > > sense to make caching configurable: a table can use the key-value cache
>> > > and not the block cache, and vice versa.
>> > >
>> > >
>> > > ________________________________________
>> > > From: Enis Söztutar [e...@apache.org]
>> > > Sent: Tuesday, April 03, 2012 3:34 PM
>> > > To: dev@hbase.apache.org
>> > > Subject: keyvalue cache
>> > >
>> > > Hi,
>> > >
>> > > Before opening the issue, I thought I should ask around first. What do you
>> > > think about a keyvalue cache sitting on top of the block cache? It is
>> > > mentioned in the Bigtable paper, and it seems that zipfian kv access
>> > > patterns might benefit a lot from something like this. I could not find
>> > > anybody who proposed it before.
>> > >
>> > > What do you guys think? Should we pursue a kv query cache? My gut feeling
>> > > says that for some workloads especially we might gain significant
>> > > performance improvements, but we cannot verify it until we implement and
>> > > profile it, right?
>> > >
>> > > Thanks,
>> > > Enis
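Whether zipfian access really favors a cell-level cache is exactly what needs implementing and profiling. As a rough, purely illustrative sketch, the Java simulation below feeds Zipf-distributed point reads into an LRU cache of individual cells and compares the hit rate with uniform reads. The key count, cache size, exponent and seed are arbitrary assumptions. The result says nothing about actual HBase performance.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Random;

class ZipfHitRate {
    // Simple LRU map: access-ordered LinkedHashMap that drops the eldest entry when full.
    static <K, V> Map<K, V> lru(int capacity) {
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    // Cumulative distribution of a Zipf law with exponent s over keys 0..n-1 (s = 0 is uniform).
    static double[] zipfCdf(int n, double s) {
        double[] cdf = new double[n];
        double sum = 0;
        for (int i = 0; i < n; i++) {
            sum += 1.0 / Math.pow(i + 1, s);
            cdf[i] = sum;
        }
        for (int i = 0; i < n; i++) cdf[i] /= sum;
        cdf[n - 1] = 1.0; // guard against rounding so every draw maps to a key
        return cdf;
    }

    // Inverse-CDF sampling via binary search.
    static int sample(double[] cdf, Random rnd) {
        int idx = Arrays.binarySearch(cdf, rnd.nextDouble());
        return idx >= 0 ? idx : -idx - 1;
    }

    // Fraction of point reads served from an LRU cache of the given size.
    static double hitRate(int keys, int cacheSize, int requests, double s, long seed) {
        double[] cdf = zipfCdf(keys, s);
        Random rnd = new Random(seed);
        Map<Integer, Boolean> cache = lru(cacheSize);
        int hits = 0;
        for (int r = 0; r < requests; r++) {
            int key = sample(cdf, rnd);
            if (cache.get(key) != null) hits++;
            else cache.put(key, Boolean.TRUE);
        }
        return (double) hits / requests;
    }

    public static void main(String[] args) {
        System.out.printf("zipf(s=1.0): %.3f%n", hitRate(100_000, 1_000, 500_000, 1.0, 42));
        System.out.printf("uniform:     %.3f%n", hitRate(100_000, 1_000, 500_000, 0.0, 42));
    }
}
```

With an exponent of 1.0, a cache holding 1% of the keys serves a sizeable fraction of the reads. Under uniform access, the hit rate stays near the cached fraction, about 1%.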
>> > >
>> > > Confidentiality Notice:  The information contained in this message,
>> > > including any attachments hereto, may be confidential and is intended to
>> > > be read only by the individual or entity to whom this message is
>> > > addressed. If the reader of this message is not the intended recipient or
>> > > an agent or designee of the intended recipient, please note that any
>> > > review, use, disclosure or distribution of this message or its
>> > > attachments, in any form, is strictly prohibited.  If you have received
>> > > this message in error, please immediately notify the sender and/or
>> > > notificati...@carrieriq.com and delete or destroy any copy of this
>> > > message and its attachments.
>> > >
>> >
>>
>
>
>
