[ 
https://issues.apache.org/jira/browse/KAFKA-12559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17317460#comment-17317460
 ] 

A. Sophie Blee-Goldman commented on KAFKA-12559:
------------------------------------------------

Sure thing -- I recommend checking out the example implementation in the Memory 
Management section (linked to in the ticket description) and reading up on the 
KIP process if you're not familiar with it yet: 
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals

The idea here is to add one (or two) new StreamsConfigs which let the user 
control the rocksdb memory without having to implement a RocksDBConfigSetter. 
So if a user has set these configs but not a RocksDBConfigSetter, then we would 
have a default RocksDBConfigSetter similar to the one in the Memory Management 
section, where the rocksdb.max.bytes.off.heap config determines the value of 
TOTAL_OFF_HEAP_MEMORY in the example.
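
To make the fallback described above concrete, here is a rough sketch of how the 
choice between setters might be made. It is only an illustration under stated 
assumptions: the class and enum names are hypothetical, 
rocksdb.max.bytes.off.heap is the config proposed in the ticket (it does not 
exist yet), and a negative value is treated as "unbounded" following the 
ticket's proposed default of -1. rocksdb.config.setter is the existing config 
for a user-supplied RocksDBConfigSetter.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Kafka Streams.
class SetterSelection {
    // Existing config key for a user-supplied RocksDBConfigSetter.
    static final String ROCKSDB_CONFIG_SETTER = "rocksdb.config.setter";
    // Proposed config key from the ticket description (assumption).
    static final String MAX_BYTES_OFF_HEAP = "rocksdb.max.bytes.off.heap";

    enum Choice { USER_SETTER, DEFAULT_BOUNDED_SETTER, NO_SETTER }

    static Choice choose(Map<String, Object> configs) {
        // A user-supplied setter always wins, so advanced users keep full control.
        if (configs.get(ROCKSDB_CONFIG_SETTER) != null) {
            return Choice.USER_SETTER;
        }
        // Otherwise fall back to a default bounded setter, but only if a bound was set.
        Object raw = configs.get(MAX_BYTES_OFF_HEAP);
        long maxBytes = raw == null ? -1L : Long.parseLong(raw.toString().trim());
        return maxBytes >= 0 ? Choice.DEFAULT_BOUNDED_SETTER : Choice.NO_SETTER;
    }

    public static void main(String[] args) {
        Map<String, Object> configs = new HashMap<>();
        configs.put(MAX_BYTES_OFF_HEAP, 512L * 1024 * 1024);
        System.out.println(choose(configs));
    }
}
```

Letting the user's own setter take precedence is one possible design; whether 
the two should instead be combined is the kind of question the KIP would need 
to settle.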

We probably also want to give users a way to control the other parameter in 
that example, TOTAL_MEMTABLE_MEMORY. The basic formula for memory usage in 
rocksdb is TOTAL_OFF_HEAP_MEMORY = TOTAL_MEMTABLE_MEMORY + TOTAL_CACHE_MEMORY 
-- so just think about the best way to let users specify how much of the total 
memory should go to the block cache vs the memtables. Maybe you just want one 
config for TOTAL_MEMTABLE_MEMORY, or you could consider a config like 
rocksdb.memtable.to.block.cache.off.heap.memory.ratio which represents the 
ratio of memtable memory, i.e. TOTAL_MEMTABLE_MEMORY / TOTAL_OFF_HEAP_MEMORY.
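
As a worked example of that formula, the following sketch derives the two 
budgets from a total and a ratio. The class name and fields are hypothetical, 
and the rounding choice (truncate the memtable share, give the remainder to 
the cache) is an assumption rather than anything decided in the ticket.

```java
// Hypothetical value class: splits a total off-heap budget by the memtable ratio.
final class OffHeapSplit {
    final long memtableBytes;   // TOTAL_MEMTABLE_MEMORY
    final long blockCacheBytes; // TOTAL_CACHE_MEMORY

    OffHeapSplit(long totalOffHeapBytes, double memtableRatio) {
        if (totalOffHeapBytes < 0) {
            throw new IllegalArgumentException("total off-heap bytes must be >= 0");
        }
        // Written this way so that NaN is rejected as well.
        if (!(memtableRatio >= 0.0 && memtableRatio <= 1.0)) {
            throw new IllegalArgumentException("ratio must be in [0, 1]");
        }
        // ratio = TOTAL_MEMTABLE_MEMORY / TOTAL_OFF_HEAP_MEMORY
        this.memtableBytes = (long) (totalOffHeapBytes * memtableRatio);
        // The remainder goes to the cache, so the two always sum to the total.
        this.blockCacheBytes = totalOffHeapBytes - this.memtableBytes;
    }

    public static void main(String[] args) {
        OffHeapSplit split = new OffHeapSplit(512L * 1024 * 1024, 0.25);
        System.out.println(split.memtableBytes + " + " + split.blockCacheBytes);
    }
}
```

With a 512 MiB total and a ratio of 0.25, this gives 128 MiB to the memtables 
and 384 MiB to the cache.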

Does that make sense? Let me know if you have any specific questions.

> Add a top-level Streams config for bounding off-heap memory
> -----------------------------------------------------------
>
>                 Key: KAFKA-12559
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12559
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: A. Sophie Blee-Goldman
>            Priority: Major
>              Labels: needs-kip, newbie, newbie++
>
> At the moment we provide an example of how to bound the memory usage of 
> rocksdb in the [Memory 
> Management|https://kafka.apache.org/27/documentation/streams/developer-guide/memory-mgmt.html#rocksdb]
>  section of the docs. This requires implementing a custom RocksDBConfigSetter 
> class and setting a number of rocksdb options for relatively advanced 
> concepts and configurations. It seems a fair number of users either fail to 
> find this or consider it to be for more advanced use cases/users. But RocksDB 
> can eat up a lot of off-heap memory and it's not uncommon for users to come 
> across a {{RocksDBException: Cannot allocate memory}}.
> It would probably be a much better user experience if we implemented this 
> memory bound out-of-the-box and just gave users a top-level StreamsConfig to 
> tune the off-heap memory given to rocksdb, like we have for on-heap cache 
> memory with cache.max.bytes.buffering. More advanced users can continue to 
> fine-tune their memory bounding and apply other configs with a custom config 
> setter, while new or more casual users can cap the off-heap memory without 
> getting their hands dirty with rocksdb.
> I would propose to add the following top-level config:
> rocksdb.max.bytes.off.heap: medium priority, default to -1 (unbounded), valid 
> values are -1 or [0, inf]
> I'd also want to consider adding a second, lower priority top-level config to 
> give users a knob for adjusting how much of that total off-heap memory goes 
> to the block cache + index/filter blocks, and how much of it is afforded to 
> the write buffers. I'm struggling to come up with a good name for this 
> config, but it would be something like
> rocksdb.memtable.to.block.cache.off.heap.memory.ratio: low priority, default 
> to 0.5, valid values are [0, 1]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
