[
https://issues.apache.org/jira/browse/SAMZA-428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14156731#comment-14156731
]
Jay Kreps commented on SAMZA-428:
---------------------------------
Let me give the rationale here.
I agree that tuning caching in the setup we have is quite complex because there
are effectively three levels:
1. Our in-heap row cache
2. LevelDB/RocksDB uncompressed block cache
3. LevelDB/RocksDB compressed blocks cached in the filesystem
How to allocate memory among these three optimally is pretty
workload-specific.
The row cache (a) avoids serialization overhead, (b) avoids writes to Kafka and
disk I/O entirely, but (c) is extremely wasteful of memory. The memory waste is
worth considering: because of the number of Java objects that end up cached, it
is very unlikely you can get more than 30% useful data versus object, heap, and
data-structure overhead. So for big chunks of memory I suspect the filesystem
or RocksDB cache is better.
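As a back-of-envelope illustration of that 30% figure (the per-object sizes
below are assumptions for a typical 64-bit JVM with compressed oops, not
measurements of Samza's store):
{code:java}
// Illustrative only: estimates the on-heap cost of caching one small entry,
// e.g. "user-12345" -> 42L, as plain Java objects in a HashMap.
public class RowCacheOverheadSketch {
    public static void main(String[] args) {
        long usefulBytes = 10 + 8;    // ~10 key bytes + one 8-byte long value
        long stringKey   = 24 + 40;   // String object + backing char[] for 10 chars (assumed)
        long boxedValue  = 24;        // java.lang.Long wrapper (assumed)
        long mapNode     = 32;        // HashMap node: header, hash, key/value/next refs (assumed)
        long tableSlot   = 4;         // compressed reference in the hash table array (assumed)
        long heapBytes   = stringKey + boxedValue + mapNode + tableSlot;
        System.out.printf("useful/heap = %d/%d = ~%.0f%%%n",
            usefulBytes, heapBytes, 100.0 * usefulBytes / heapBytes);
        // ~15% useful data here, comfortably under the 30% ceiling mentioned above.
    }
}
{code}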
So why have an in-process cache at all? The rationale was that there are
actually lots of simple cases that can be vastly improved with even a very
small in-process cache. These are cases where you are incrementing a small
number of counters over and over again. Logging out each change is very
expensive and the serialization overhead is really high since each increment
requires a deserialization and a reserialization. By defaulting to just a small
in-process cache, I think we can make the small-data-set case pretty efficient
out of the box at the cost of just a little bit of memory.
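To make that mechanism concrete, here is a minimal sketch of the idea: a tiny
write-back cache of deserialized rows in front of a store, so repeated
increments of a hot key only touch a heap object, and serialization plus
changelog writes happen once per flush. The SmallRowCache class, the
BackingStore interface, and the batching policy are hypothetical illustrations,
not Samza's actual CachedStore code.
{code:java}
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical write-back row cache in front of a generic store.
public class SmallRowCache<K, V> {

    // Stands in for the serde + RocksDB + changelog path underneath.
    public interface BackingStore<K, V> {
        V get(K key);                 // deserializes on every call
        void put(K key, V value);     // serializes and logs on every call
    }

    private final BackingStore<K, V> store;
    private final int writeBatchSize;
    private final Set<K> dirty = new LinkedHashSet<>();
    private final LinkedHashMap<K, V> cache;   // access-order LRU of deserialized rows

    public SmallRowCache(final BackingStore<K, V> store, final int maxEntries,
                         int writeBatchSize) {
        this.store = store;
        this.writeBatchSize = writeBatchSize;
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                if (size() <= maxEntries) {
                    return false;
                }
                if (dirty.remove(eldest.getKey())) {
                    store.put(eldest.getKey(), eldest.getValue()); // don't lose a dirty row
                }
                return true;
            }
        };
    }

    public V get(K key) {
        V value = cache.get(key);
        if (value == null) {                  // miss: pay the deserialization cost once
            value = store.get(key);
            if (value != null) {
                cache.put(key, value);
            }
        }
        return value;
    }

    public void put(K key, V value) {
        cache.put(key, value);                // repeated updates only touch the heap object
        dirty.add(key);
        if (dirty.size() >= writeBatchSize) {
            flush();
        }
    }

    public void flush() {                     // one serialize + changelog write per dirty key
        for (K key : dirty) {
            store.put(key, cache.get(key));
        }
        dirty.clear();
    }
}
{code}
With maxEntries and the write batch size both small (say, a few hundred
entries), a hot counter is read and incremented entirely in the heap and only
periodically serialized and logged out.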
> Investigate: how to tune down caching in the KeyValueStore implementations
> --------------------------------------------------------------------------
>
> Key: SAMZA-428
> URL: https://issues.apache.org/jira/browse/SAMZA-428
> Project: Samza
> Issue Type: Improvement
> Components: kv
> Affects Versions: 0.8.0
> Reporter: Chinmay Soman
> Fix For: 0.8.0
>
>
> Currently, we have a 'CachedStore' layer on top of the KeyValueStore
> implementation that we use. This might lead to double caching:
> i) Once at the CachedStore layer
> ii) Possibly cached again in the specific K-V store that we use (e.g.,
> RocksDB / BDB)
> We need the CachedStore layer so that writes to the LoggedStore (if
> configured) are done efficiently.
> We can then potentially do some config tuning for the K-V store to reduce its
> memory footprint and simply write to disk.