[ 
https://issues.apache.org/jira/browse/SAMZA-47?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13759827#comment-13759827
 ] 

Jay Kreps commented on SAMZA-47:
--------------------------------

Okay, I posted an rb:
https://reviews.apache.org/r/14009

This gives the following set of properties:

    stores.my-store.write.batch.size=500
    stores.my-store.object.cache.size=1000
    stores.my-store.leveldb.block.cache.size=16777216
    stores.my-store.leveldb.compress=true
    stores.my-store.leveldb.block.size=4096
    stores.my-store.leveldb.write.buffer.size=8388608

I also changed the default write.buffer.size to 8MB and the default block 
cache to 16MB. The reason is that these are currently per-task (which I hadn't 
considered). With 50 tasks, the old default block cache of 64MB would use 
3.2GB, which is too much for a default.
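
To make the per-task arithmetic concrete, here is a minimal sketch in Python. 
The function name is made up for illustration and is not part of Samza; the 
50-task count and the 64MB/16MB figures are the ones from the paragraph above.

```python
MB = 1024 * 1024

def total_block_cache_bytes(num_tasks: int, per_task_cache_bytes: int) -> int:
    """Total block cache memory when every task gets its own cache."""
    return num_tasks * per_task_cache_bytes

# Hypothetical container with 50 tasks, as in the example above.
old_total = total_block_cache_bytes(50, 64 * MB)  # old 64MB default
new_total = total_block_cache_bytes(50, 16 * MB)  # new 16MB default

print(old_total // MB, "MB")  # 3200 MB, i.e. roughly 3.2GB
print(new_total // MB, "MB")  # 800 MB
```

The same multiplication applies to write.buffer.size, so the per-task write 
buffers add to this total.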

However I wonder if this approach is right at all. After all, the user budgets 
memory at the CONTAINER level. So to get the math right here you need to 
multiply by the number of tasks in your container and keep updating these 
values as that number changes. Another approach would be to have the user 
specify these values at the container level and divide by the number of tasks 
in the container. This might be a little tricky since right now the task kind 
of stands alone, but it could be more intuitive when doing container memory 
arithmetic. In this model it is a little harder to say how much memory goes to 
each leveldb instance, but you know the total used.
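
A rough sketch of the alternative, in which the configured value is treated as 
a container-wide budget and split evenly across the tasks in the container. 
This is only an illustration of the arithmetic under that assumption; the 
function name and the even split are hypothetical, not existing Samza 
behavior.

```python
MB = 1024 * 1024

def per_task_bytes(container_budget_bytes: int, tasks_in_container: int) -> int:
    """Split a container-level memory budget evenly across its tasks."""
    if tasks_in_container <= 0:
        raise ValueError("tasks_in_container must be positive")
    # Integer division: any remainder bytes are simply left unused.
    return container_budget_bytes // tasks_in_container

# Hypothetical: the user budgets 800MB of block cache for the whole container.
share = per_task_bytes(800 * MB, 50)
print(share)  # 16777216 bytes, i.e. 16MB per leveldb instance
```

With this model the total stays fixed at the configured budget when the task 
count changes, and only the per-instance share moves.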

Thoughts?
                
> LevelDB and L1 cache use the same configuration value in KeyValueStorageEngine
> ------------------------------------------------------------------------------
>
>                 Key: SAMZA-47
>                 URL: https://issues.apache.org/jira/browse/SAMZA-47
>             Project: Samza
>          Issue Type: Bug
>    Affects Versions: 0.7.0
>            Reporter: Jay Kreps
>            Assignee: Jay Kreps
>
> Both seem to key off of
>   cache.size
> This is not right. The L1 cache is caching a number of objects and leveldb is 
> allocating a number of bytes. In general the leveldb cache should be big 
> (tens of MBs) and the L1 cache small (a few thousand).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
