Boyang Chen created KAFKA-10237:
-----------------------------------

             Summary: Properly handle in-memory stores OOM
                 Key: KAFKA-10237
                 URL: https://issues.apache.org/jira/browse/KAFKA-10237
             Project: Kafka
          Issue Type: Improvement
          Components: streams
            Reporter: Boyang Chen


We have seen in-memory stores buffer too much data and eventually hit an OOM error. 
Generally speaking, an OOM gives no real indication of the underlying problem and 
makes debugging harder for users, since the failing thread may not be the actual 
culprit behind the memory growth. If we could add better protection against 
hitting the memory limit, or at least give out clear guidance, end-user debugging 
would be much simpler. 

To make this work, we need to enforce a certain memory limit below the heap size and 
take action when we hit it. The first question is whether we set a numeric limit, 
such as 100MB or 500MB, or a percentage limit, such as 60% or 80% of total memory.
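
To make the trade-off concrete, here is a rough sketch of how the two options could 
be resolved into a single effective byte cap. The config names 
"statestore.max.bytes" and "statestore.max.heap.ratio" are made up for illustration 
and do not exist in Kafka Streams today:

    import java.util.Map;

    public class StoreMemoryCap {

        // Hypothetical config keys, for illustration only.
        static final String MAX_BYTES_CONFIG = "statestore.max.bytes";
        static final String MAX_HEAP_RATIO_CONFIG = "statestore.max.heap.ratio";

        // Resolve the effective cap: an absolute byte limit wins if set,
        // otherwise fall back to a fraction of the configured max heap (-Xmx).
        static long resolveCapBytes(final Map<String, Object> configs) {
            final Object absolute = configs.get(MAX_BYTES_CONFIG);
            if (absolute != null) {
                return Long.parseLong(absolute.toString());   // e.g. 100 * 1024 * 1024
            }
            final double ratio = configs.containsKey(MAX_HEAP_RATIO_CONFIG)
                ? Double.parseDouble(configs.get(MAX_HEAP_RATIO_CONFIG).toString())
                : 0.8;                                        // e.g. 80% of max heap
            return (long) (Runtime.getRuntime().maxMemory() * ratio);
        }
    }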

The second question is about the action itself. One approach would be crashing 
the store immediately and informing the user to increase their application 
capacity. The second approach would be opening an on-disk store on the fly 
and offloading the data to it.
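
For approach #1, a minimal sketch of the check that would run on every write, 
failing fast with an actionable message instead of an opaque OOM later. The class 
and method names are hypothetical, not existing Streams APIs:

    public class BoundedInMemoryPutCheck {

        private final long capBytes;
        private long approximateBytes = 0L;

        public BoundedInMemoryPutCheck(final long capBytes) {
            this.capBytes = capBytes;
        }

        // Called on every put; fails fast with a clear message instead of
        // letting the JVM die later with an OutOfMemoryError on another thread.
        void onPut(final byte[] key, final byte[] value) {
            approximateBytes += key.length + (value == null ? 0 : value.length);
            if (approximateBytes > capBytes) {
                throw new IllegalStateException(
                    "In-memory state store exceeded the configured cap of " + capBytes +
                    " bytes. Consider increasing application capacity, switching to a " +
                    "persistent (RocksDB) store, or raising the cap.");
            }
        }
    }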

Personally I'm in favor of approach #2 because it has minimal impact on the 
ongoing application. However, it is more complex and potentially requires 
significant work to define the proper behavior, such as the default store 
configuration, how to manage its lifecycle, etc.
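
A very rough sketch of what the spill-over behavior in approach #2 could look like. 
The DiskStore interface and all the wiring here are hypothetical; a real version 
would have to plug into the Streams StateStore lifecycle (init/flush/close, 
restoration, changelogging, etc.), which is exactly the work noted above:

    import java.util.HashMap;
    import java.util.Map;

    public class SpillingStoreSketch {

        interface DiskStore {                 // stand-in for e.g. a RocksDB-backed store
            void put(byte[] key, byte[] value);
            byte[] get(byte[] key);
        }

        private final long capBytes;
        private final Map<ByteKey, byte[]> inMemory = new HashMap<>();
        private final DiskStore spillOver;
        private long approximateBytes = 0L;

        public SpillingStoreSketch(final long capBytes, final DiskStore spillOver) {
            this.capBytes = capBytes;
            this.spillOver = spillOver;
        }

        public void put(final byte[] key, final byte[] value) {
            final ByteKey k = new ByteKey(key);
            if (inMemory.containsKey(k)) {
                inMemory.put(k, value);       // update in place; size delta ignored in this sketch
                return;
            }
            final long recordBytes = key.length + value.length;
            if (approximateBytes + recordBytes <= capBytes) {
                inMemory.put(k, value);
                approximateBytes += recordBytes;
            } else {
                spillOver.put(key, value);    // offload new records once the cap is reached
            }
        }

        public byte[] get(final byte[] key) {
            final byte[] hit = inMemory.get(new ByteKey(key));
            return hit != null ? hit : spillOver.get(key);
        }

        // byte[] has identity-based equals/hashCode, so wrap it for map lookups
        private static final class ByteKey {
            private final byte[] bytes;
            ByteKey(final byte[] bytes) { this.bytes = bytes; }
            @Override public boolean equals(final Object o) {
                return o instanceof ByteKey && java.util.Arrays.equals(bytes, ((ByteKey) o).bytes);
            }
            @Override public int hashCode() { return java.util.Arrays.hashCode(bytes); }
        }
    }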

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
