[ https://issues.apache.org/jira/browse/KAFKA-6777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16607574#comment-16607574 ]

John Roesler commented on KAFKA-6777:
-------------------------------------

Hi [~habdank],

When Java processes are under extreme memory pressure, but not actually out of 
memory, it is expected that GC takes an increasing percentage of CPU time. The 
GC pauses grow both more frequent and longer, although G1GC attempts to bound 
the length of each pause.

Note that the user-space code, such as Kafka, has effectively *no visibility* 
into when these collections occur or how long they take. From the application 
code's perspective, it's exactly like running on a slow CPU when you get into 
this state. This is why you can't expect Kafka, or any other JVM application, 
to detect this state for you.

When running Kafka, or any other JVM application, you will want to monitor GC 
activity, as you suggested. When it passes a threshold that you're comfortable 
with (you suggested 40% of CPU time), you can have an alert fire.
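A minimal sketch of such a monitor, built on the standard `java.lang.management` GC beans (the 40% threshold, the polling interval, and the class name are illustrative choices, not a recommendation):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcLoadMonitor {

    // Cumulative GC time in milliseconds, summed across all collectors.
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if this collector doesn't report it
            if (t > 0) total += t;
        }
        return total;
    }

    public static void main(String[] args) throws InterruptedException {
        final double threshold = 0.40; // alert above 40% of wall-clock time in GC
        long lastGc = totalGcTimeMillis();
        long lastWall = System.currentTimeMillis();
        for (int i = 0; i < 3; i++) {
            Thread.sleep(1000);
            long gc = totalGcTimeMillis();
            long wall = System.currentTimeMillis();
            // Fraction of the last interval the JVM spent collecting.
            double fraction = (double) (gc - lastGc) / (wall - lastWall);
            if (fraction > threshold) {
                System.err.println("ALERT: GC used "
                        + (int) (fraction * 100) + "% of the last interval");
            }
            lastGc = gc;
            lastWall = wall;
        }
    }
}
```

The same counters are what JMX exports, so in practice you would scrape them with your existing monitoring stack rather than poll in-process.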

I don't think it would be a good idea to just bounce the process if GC is 
becoming an issue. The heavy GC is just an indication that you're trying to run 
the application with a heap that is too small for its workload. Better 
reactions would be to increase the heap size or decrease the workload per node.
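For the broker, a heap increase is typically applied through the start script's environment; a sketch (the 4g figure is an arbitrary example, not a sizing recommendation):

```shell
# Raise the broker heap before starting Kafka (default is 1G).
export KAFKA_HEAP_OPTS="-Xms4g -Xmx4g"
bin/kafka-server-start.sh config/server.properties
```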

Note that with JVM apps, you have to account not only for the memory 
requirements of the application itself, but also for the garbage that it 
generates. If the heap is too small for the app's own live data, then you 
*will* get an OOME. If the heap is big enough for the app, but leaves too 
little headroom for the garbage generated between collections, then you'll just 
get heavy GC and *not* an OOME. Does this make sense?
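For the true-OOME case, HotSpot (since JDK 8u92) can be told to terminate rather than linger; note these flags do nothing in the heavy-GC-but-no-OOME state described above. A sketch:

```shell
# Exit (optionally dumping the heap first) on a real OutOfMemoryError.
# No effect on GC thrash - only on an actual OOME.
export KAFKA_HEAP_OPTS="${KAFKA_HEAP_OPTS} -XX:+HeapDumpOnOutOfMemoryError -XX:+ExitOnOutOfMemoryError"
```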

Thanks,

-John

> Wrong reaction on Out Of Memory situation
> -----------------------------------------
>
>                 Key: KAFKA-6777
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6777
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.0.0
>            Reporter: Seweryn Habdank-Wojewodzki
>            Priority: Critical
>         Attachments: screenshot-1.png
>
>
> Dears,
> We have already encountered problems related to Out Of Memory situations in 
> the Kafka Broker and streaming clients many times.
> The scenario is the following.
> When the Kafka Broker (or Streaming Client) is under load and has too little 
> memory, there are no errors in the server logs. One can see some cryptic 
> entries in the GC logs, but they are definitely not self-explanatory.
> The Kafka Broker (and Streaming Clients) keeps working. Later we see in JMX 
> monitoring that the JVM spends more and more time in GC. In our case it grows 
> from e.g. 1% to 80%-90% of CPU time used by GC.
> Next, the software collapses into a zombie mode: the process does not end. In 
> such a case I would expect the process to crash (e.g. with SIGSEGV). Even 
> worse, Kafka treats such a zombie process as normal and still sends messages, 
> which are in fact getting lost, and the cluster does not exclude broken 
> nodes. The question is how to configure Kafka to really terminate the JVM and 
> not remain in zombie mode, so that other nodes have a chance to learn that 
> something is dead.
> I would expect that in an Out Of Memory situation the JVM is terminated, if 
> not gracefully then at least by crashing the process.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
