[ 
https://issues.apache.org/jira/browse/KAFKA-6777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Seweryn Habdank-Wojewodzki updated KAFKA-6777:
----------------------------------------------
    Description: 
Dear all,

We have repeatedly run into problems caused by out-of-memory situations in 
the Kafka broker and in streaming clients.

The scenario is as follows.

When the Kafka broker (or a streaming client) is under load and has too little 
memory, no errors appear in the server logs. There are some cryptic entries in 
the GC logs, but they are far from self-explanatory.

The Kafka broker (or streaming client) keeps running. Later, JMX monitoring 
shows the JVM spending more and more time in GC. In our case the share of CPU 
time used by GC grows from about 1% to 80%-90%.

Next, the software collapses into a zombie mode: the process does not 
terminate. In such a case I would expect the process to crash (e.g. with 
SIGSEGV). Even worse, Kafka treats such a zombie process as healthy and 
somehow keeps sending messages, which are in fact lost, and the cluster does 
not exclude the broken nodes. The question is how to configure Kafka to really 
terminate the JVM instead of remaining in zombie mode, so that other nodes 
have a chance to notice that something is dead.

I would expect that in an out-of-memory situation the JVM terminates, if not 
gracefully, then at least by crashing the process.
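
To make the expected behaviour concrete, here is a minimal sketch of an 
in-process watchdog that halts the JVM once GC dominates the run time. It is 
not part of Kafka: the class name GcWatchdog, its methods, and the threshold 
values are hypothetical. It uses only the standard java.lang.management API, 
and it approximates the "share of time in GC" as accumulated collection time 
divided by wall-clock time, which is not exactly the CPU share seen in JMX.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical watchdog (not a Kafka class): halts the JVM under sustained GC pressure. */
public final class GcWatchdog {
    private GcWatchdog() {}

    /** Accumulated collection time in milliseconds, summed over all collectors. */
    static long totalGcMillis() {
        long sum = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 when the collector does not report it
            if (t > 0) {
                sum += t;
            }
        }
        return sum;
    }

    /** Fraction of a wall-clock interval spent in GC, clamped to [0, 1]. */
    static double gcShare(long gcMillisDelta, long wallMillisDelta) {
        if (wallMillisDelta <= 0) {
            return 0.0;
        }
        double share = (double) gcMillisDelta / wallMillisDelta;
        return Math.max(0.0, Math.min(1.0, share));
    }

    /**
     * Starts a daemon thread that samples GC time every intervalMillis and halts
     * the JVM after requiredStrikes consecutive samples at or above threshold.
     */
    public static Thread start(double threshold, long intervalMillis, int requiredStrikes) {
        Thread t = new Thread(() -> {
            long lastGc = totalGcMillis();
            long lastWall = System.nanoTime();
            int strikes = 0;
            while (true) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    return; // stop watching when interrupted
                }
                long gc = totalGcMillis();
                long wall = System.nanoTime();
                double share = gcShare(gc - lastGc, (wall - lastWall) / 1_000_000L);
                strikes = share >= threshold ? strikes + 1 : 0;
                if (strikes >= requiredStrikes) {
                    // halt() skips shutdown hooks, so a wedged JVM cannot block the exit
                    Runtime.getRuntime().halt(1);
                }
                lastGc = gc;
                lastWall = wall;
            }
        }, "gc-watchdog");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

A call such as GcWatchdog.start(0.8, 10_000, 6) would halt the process after 
roughly one minute at 80% or more GC time; these numbers are illustrative 
only. Separately, HotSpot JVMs from 8u92 on accept -XX:+ExitOnOutOfMemoryError 
and -XX:+CrashOnOutOfMemoryError, but those flags only act when an 
OutOfMemoryError is actually thrown, which may not happen in the thrashing 
scenario described above.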


> Wrong reaction on Out Of Memory situation
> -----------------------------------------
>
>                 Key: KAFKA-6777
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6777
>             Project: Kafka
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.0.0
>            Reporter: Seweryn Habdank-Wojewodzki
>            Priority: Critical



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
