[
https://issues.apache.org/jira/browse/SAMZA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759827#comment-13759827
]
Jay Kreps commented on SAMZA-47:
--------------------------------
Okay, posted an rb (review board request):
https://reviews.apache.org/r/14009
This gives the following set of properties:
stores.my-store.write.batch.size=500
stores.my-store.object.cache.size=1000
stores.my-store.leveldb.block.cache.size=16777216
stores.my-store.leveldb.compress=true
stores.my-store.leveldb.block.size=4096
stores.my-store.leveldb.write.buffer.size=8388608
I also changed the defaults: write.buffer.size is now 8MB and the block cache
is 16MB. The reason is that these settings are currently per-task (which I
hadn't considered). So if you have 50 tasks, the old default block cache of
64MB would use 3.2GB in total, which is too much for a default.
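To make the per-task arithmetic concrete, here is a rough sketch. It assumes one store per task and counts only the block cache and write buffer; the object cache and any other LevelDB overhead are ignored. The function name is made up for illustration and is not part of Samza.

```python
MB = 1024 * 1024


def leveldb_memory_per_container(tasks, block_cache_bytes, write_buffer_bytes,
                                 stores_per_task=1):
    """Approximate LevelDB memory for one container.

    Hypothetical helper: assumes every task opens `stores_per_task` stores and
    that each store allocates its full block cache and write buffer.
    """
    per_store = block_cache_bytes + write_buffer_bytes
    return tasks * stores_per_task * per_store


# Old default: a 64MB block cache per task, 50 tasks, cache only.
old_cache_total = leveldb_memory_per_container(50, 64 * MB, 0)
print(old_cache_total // MB, "MB")  # 3200 MB, i.e. roughly 3.2GB

# New defaults: 16MB block cache plus 8MB write buffer per task.
new_total = leveldb_memory_per_container(50, 16 * MB, 8 * MB)
print(new_total // MB, "MB")  # 1200 MB
```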
However, I wonder if this approach is right at all. After all, the user budgets
memory at the CONTAINER level. So to get the math right here you need to
multiply by the number of tasks in your container and keep updating these
values as that number changes. Another approach would be to have the user
specify these values for the whole container and divide by the number of tasks
in the container. This might be a little tricky, since right now the task kind
of stands alone, but it could be more intuitive when doing container memory
arithmetic. In this model it is a little harder to say how much memory goes to
each leveldb instance, but you know the total used.
Thoughts?
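A minimal sketch of the second approach, assuming the configured values are read as container-wide totals and split evenly across the tasks in the container. The function and the idea of reusing the existing property names for totals are hypothetical, not something the patch implements.

```python
MB = 1024 * 1024


def per_task_sizes(container_totals, tasks_in_container):
    """Split container-wide byte budgets evenly across tasks.

    Hypothetical helper: `container_totals` maps a setting name to the total
    number of bytes the user budgets for the whole container.
    """
    if tasks_in_container <= 0:
        raise ValueError("tasks_in_container must be positive")
    # Integer division so each task gets a whole number of bytes.
    return {name: total // tasks_in_container
            for name, total in container_totals.items()}


totals = {
    "leveldb.block.cache.size": 800 * MB,
    "leveldb.write.buffer.size": 400 * MB,
}
sizes = per_task_sizes(totals, 50)
print(sizes["leveldb.block.cache.size"])   # 16777216 (16MB per task)
print(sizes["leveldb.write.buffer.size"])  # 8388608 (8MB per task)
```

With this model the total stays fixed at 1200MB however many tasks land in the container, while the per-instance share shrinks as tasks are added.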
> LevelDB and L1 cache use the same configuration value in KeyValueStorageEngine
> ------------------------------------------------------------------------------
>
> Key: SAMZA-47
> URL: https://issues.apache.org/jira/browse/SAMZA-47
> Project: Samza
> Issue Type: Bug
> Affects Versions: 0.7.0
> Reporter: Jay Kreps
> Assignee: Jay Kreps
>
> Both seem to key off of
> cache.size
> This is not right. The L1 cache is caching a number of objects and leveldb is
> allocating a number of bytes. In general the leveldb cache should be big
> (tens of MBs) and the L1 cache small (a few thousand).