[
https://issues.apache.org/jira/browse/KAFKA-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Andrew Jorgensen updated KAFKA-3980:
------------------------------------
Description:
I have some nodes in a Kafka cluster that occasionally run out of memory
whenever I restart the producers. I was able to take heap dumps from two nodes:
a recently restarted Kafka node, which weighed in at about 20 MB, and a node
that has been running for 2 months, which is using over 700 MB of memory.
Looking at the heap dumps, it appears that the JmxReporter is holding on to
metrics and causing them to build up over time.
!http://imgur.com/N6Cd0Ku.png!
!http://imgur.com/kQBqA2j.png!
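One way to cross-check the heap dump observation might be to count the
registered MBeans per client id. The following is a minimal sketch, not part of
Kafka: the class MBeanKeyCounter, its methods, and the "demo.domain" object
names are hypothetical, and the assumption that the affected MBeans carry a
"client-id" key property should be verified against the actual object names on
the broker.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

// Hypothetical helper (not a Kafka class): groups JMX object names by the
// value of one key property and counts how many names share each value.
class MBeanKeyCounter {

    // Counts names per value of the given key; names without the key are skipped.
    static Map<String, Long> countByKey(Collection<ObjectName> names, String key) {
        Map<String, Long> counts = new TreeMap<>();
        for (ObjectName name : names) {
            String value = name.getKeyProperty(key);
            if (value != null) {
                counts.merge(value, 1L, Long::sum);
            }
        }
        return counts;
    }

    // Wraps the checked exception so callers can build names inline.
    static ObjectName parse(String name) {
        try {
            return new ObjectName(name);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Bad object name: " + name, e);
        }
    }

    public static void main(String[] args) {
        // Made-up object names for illustration only.
        List<ObjectName> demo = Arrays.asList(
                parse("demo.domain:type=Example,client-id=producer-1"),
                parse("demo.domain:type=Other,client-id=producer-1"),
                parse("demo.domain:type=Example,client-id=producer-2"));
        System.out.println(countByKey(demo, "client-id"));
    }
}
```

Against a live JVM, the input could come from
MBeanServerConnection.queryNames(null, null), which returns every registered
object name. A count that keeps growing across producer restarts would be
consistent with the build-up seen in the heap dump.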
The ultimate problem this causes is that when I restart the producers, there is
a chance the node will hit a Java heap space exception and OOM. The nodes then
fail to start up correctly and write -1 as the leader number for the partitions
they were responsible for, effectively resetting the offset and rendering those
partitions unavailable. The Kafka process then needs to be restarted in order
to re-assign the node to the partitions that it owns.
I have a few questions:
1. I am not quite sure why there are so many client id entries in that
JmxReporter map.
2. Is there a way to have the JmxReporter release metrics after a set amount of
time, or a way to turn off certain high-cardinality metrics like these?
I can provide any logs or heap dumps if more information is needed.
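To make the second question concrete, here is a minimal sketch of what
time-based release could look like. This does not describe an existing Kafka
feature or configuration option: ExpiringMetricRegistry, touch, expire, and
maxIdleMs are hypothetical names chosen for illustration.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical registry (not part of Kafka): remembers when each metric name
// was last updated and drops names that have been idle for too long.
class ExpiringMetricRegistry {
    private final Map<String, Long> lastUpdatedMs = new ConcurrentHashMap<>();
    private final long maxIdleMs;

    ExpiringMetricRegistry(long maxIdleMs) {
        this.maxIdleMs = maxIdleMs;
    }

    // Records activity for a metric, registering it on first use.
    void touch(String metricName, long nowMs) {
        lastUpdatedMs.put(metricName, nowMs);
    }

    // Removes every metric idle for longer than maxIdleMs and returns how
    // many entries were removed.
    int expire(long nowMs) {
        int removed = 0;
        Iterator<Map.Entry<String, Long>> it = lastUpdatedMs.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMs - it.next().getValue() > maxIdleMs) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    int size() {
        return lastUpdatedMs.size();
    }
}
```

In a real reporter, the removal step would presumably also need to unregister
the corresponding MBean (for example via MBeanServer.unregisterMBean) so that
the entry for a client id that never reconnects does not stay reachable.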
> JmxReporter uses excessive memory causing OutOfMemoryException
> --------------------------------------------------------------
>
> Key: KAFKA-3980
> URL: https://issues.apache.org/jira/browse/KAFKA-3980
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.9.0.1
> Reporter: Andrew Jorgensen