Robin Tweedie created KAFKA-6199:
------------------------------------

             Summary: Single broker with fast growing heap usage
                 Key: KAFKA-6199
                 URL: https://issues.apache.org/jira/browse/KAFKA-6199
             Project: Kafka
          Issue Type: Bug
    Affects Versions: 0.10.2.1
         Environment: Amazon Linux
            Reporter: Robin Tweedie
         Attachments: Screen Shot 2017-11-10 at 1.55.33 PM.png, Screen Shot 2017-11-10 at 11.59.06 AM.png

We have a single broker in our cluster of 25 whose heap usage grows quickly 
enough that we have to restart it every 12 hours. If we don't restart it, the 
broker becomes very slow from long GC pauses and eventually hits {{OutOfMemory}} 
errors.

Here's a graph of the affected broker's heap usage percentage. A "normal" broker 
in the same cluster stays below 50% (averaged) over the same time period.

!Screen Shot 2017-11-10 at 11.59.06 AM.png|thumbnail!

We have taken heap dumps when the broker's heap usage is getting dangerously 
high, and they show a large number of retained {{NetworkSend}} objects 
referencing byte buffers.
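One way to check whether {{NetworkSend}} instances keep climbing between restarts, without sharing full heap dumps, is to diff two class histograms taken some hours apart (e.g. with {{jmap -histo:live <pid>}}; note that the {{:live}} option forces a full GC, which is itself a pause on an already struggling broker). The sketch below is a hypothetical helper, not part of Kafka; it assumes the JDK 8 histogram layout (rank, #instances, #bytes, class name) and that the class appears as {{org.apache.kafka.common.network.NetworkSend}}. The #bytes column is shallow size only, so the instance count is the more telling number; the referenced buffers show up under {{[B}} or {{java.nio.HeapByteBuffer}}.

```python
import re
from typing import Dict, Tuple

# One data row of a `jmap -histo` listing, e.g. "   1:   12345   1234567  [B".
# Assumes the JDK 8 layout: rank, #instances, #bytes, class name.
ROW = re.compile(r"^\s*\d+:\s+(\d+)\s+(\d+)\s+(\S+)")


def parse_histo(text: str) -> Dict[str, Tuple[int, int]]:
    """Map class name -> (instance count, shallow bytes) for one histogram."""
    result: Dict[str, Tuple[int, int]] = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:  # header and separator lines do not match and are skipped
            result[m.group(3)] = (int(m.group(1)), int(m.group(2)))
    return result


def growth(before: str, after: str, class_name: str) -> Tuple[int, int]:
    """Change in (instances, shallow bytes) of class_name between two histograms."""
    b = parse_histo(before).get(class_name, (0, 0))
    a = parse_histo(after).get(class_name, (0, 0))
    return a[0] - b[0], a[1] - b[1]


if __name__ == "__main__":
    # Illustrative numbers only; not taken from the affected broker.
    before = " num  #instances  #bytes  class name\n   1:  100  4000  org.apache.kafka.common.network.NetworkSend\n"
    after = " num  #instances  #bytes  class name\n   1:  900  36000  org.apache.kafka.common.network.NetworkSend\n"
    print(growth(before, after, "org.apache.kafka.common.network.NetworkSend"))
```

A steadily rising count on the affected broker only, with other brokers staying flat, would be consistent with sends being queued for connections that never drain.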

We also noticed that the single affected broker logs a lot more of this kind of 
warning than any other broker:
{noformat}
WARN Attempting to send response via channel for which there is no open connection, connection id 13 (kafka.network.Processor)
{noformat}

Here are counts of that WARN message across all the brokers (it does happen 
occasionally on other brokers, but far less often than on the affected broker):
!Screen Shot 2017-11-10 at 1.55.33 PM.png|thumbnail!
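To narrow this down to a particular client, the same WARN lines can be grouped by connection id instead of just counted. The sketch below is a hypothetical helper, not part of Kafka; it assumes each WARN occupies a single line in the broker log (it is wrapped above only by the email) and that the connection id is one whitespace-free token. If the ids on your broker embed the remote host and port, the most frequent ids point directly at candidate clients.

```python
import re
from collections import Counter
from typing import Iterable

# Matches the WARN text quoted in this issue; the group captures whatever
# whitespace-free token follows "connection id".
WARN_PATTERN = re.compile(
    r"Attempting to send response via channel for which there is no open "
    r"connection, connection id (\S+)"
)


def count_orphaned_sends(lines: Iterable[str]) -> Counter:
    """Count the 'no open connection' WARN lines per connection id."""
    counts: Counter = Counter()
    for line in lines:
        match = WARN_PATTERN.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    warn = ("WARN Attempting to send response via channel for which there is "
            "no open connection, connection id {} (kafka.network.Processor)")
    sample = [warn.format(13), "INFO unrelated line", warn.format(13), warn.format(7)]
    for conn_id, n in count_orphaned_sends(sample).most_common():
        print(conn_id, n)
```

Against a real log this would be run as, for example, {{count_orphaned_sends(open("server.log"))}}, where the file name is a placeholder for your broker's log path.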

I can't make the heap dumps public, but would appreciate advice on how to pin 
down the problem better. We're currently trying to narrow it down to a 
particular client, but without much success so far.

Let me know what else I could investigate or share to track down the source of 
this leak.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
