[ 
https://issues.apache.org/jira/browse/KAFKA-5512?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismael Juma updated KAFKA-5512:
-------------------------------
    Labels: performance  (was: )

> KafkaConsumer: High memory allocation rate when idle
> ----------------------------------------------------
>
>                 Key: KAFKA-5512
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5512
>             Project: Kafka
>          Issue Type: Bug
>          Components: consumer
>    Affects Versions: 0.10.1.1
>            Reporter: Stephane Roset
>              Labels: performance
>             Fix For: 0.11.0.1
>
>
> Hi,
> We noticed in our application that the memory allocation rate increased 
> significantly when we have no Kafka messages to consume. We isolated the 
> issue by using a JVM that simply runs 128 Kafka consumers. These consumers 
> consume 128 partitions (so each consumer consumes one partition). The 
> partitions are empty and no message has been sent during the test. The 
> consumers were configured with default values (session.timeout.ms=30000, 
> fetch.max.wait.ms=500, receive.buffer.bytes=65536, 
> heartbeat.interval.ms=3000, max.poll.interval.ms=300000, 
> max.poll.records=500). The Kafka cluster was made of 3 brokers. Within this 
> context, the allocation rate was about 55 MiB/s. This high allocation rate 
> generated a lot of GC activity (young-generation collections) and was an 
> issue for our project.
> We profiled the JVM with JProfiler. We noticed that there was a huge 
> number of ArrayList$Itr instances in memory. These iterators were mainly 
> instantiated by the methods handleCompletedReceives, handleCompletedSends, 
> handleConnections and handleDisconnections of the class NetworkClient. We 
> also noticed that we had a lot of calls to the method pollOnce of the class 
> KafkaConsumer.
> So we decided to run only one consumer and to profile the calls to the method 
> pollOnce. We noticed that this method is regularly called a huge number of 
> times, up to 268000 calls within 100 ms. The pollOnce method calls the 
> NetworkClient.handle* methods. These methods iterate over collections even 
> when they are empty, which explains the huge number of iterators in memory.
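
As an illustration of the iterator point above (this is not the actual NetworkClient code): a for-each loop over an ArrayList calls iterator(), which creates an ArrayList$Itr object even when the list is empty, unless the JIT happens to optimize the allocation away. The class and method names below are hypothetical stand-ins for the handle* methods; the isEmpty() guard is the kind of check the report describes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class EmptyGuardSketch {
    // Hypothetical stand-in for one of the NetworkClient.handle* methods.
    // Returns the number of items processed.
    static int handleCompleted(List<String> completed) {
        // Guard: skip the loop entirely when there is nothing to do, so that
        // no iterator object is created for an empty list.
        if (completed.isEmpty()) {
            return 0;
        }
        int handled = 0;
        // The for-each loop calls completed.iterator() once per invocation.
        for (String item : completed) {
            handled++;
        }
        return handled;
    }

    public static void main(String[] args) {
        List<String> none = new ArrayList<>();
        List<String> two = Arrays.asList("a", "b");
        System.out.println("empty list handled: " + handleCompleted(none));
        System.out.println("two items handled: " + handleCompleted(two));
    }
}
```

With hundreds of thousands of calls per 100 ms, one small iterator per call per handle* method adds up, which is consistent with the profile described in the report.
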
> The large number of calls is related to the heartbeat mechanism. The pollOnce 
> method calculates the poll timeout; if a heartbeat is due, the timeout is 
> set to 0. The problem is that the heartbeat thread only checks every 100 ms 
> (default value of retry.backoff.ms) whether a heartbeat should be sent, so 
> the KafkaConsumer calls the poll method in a loop with a timeout of 0 until 
> the heartbeat thread wakes up. For example: the heartbeat thread has just 
> started to wait and will wake up in 99 ms. During those 99 ms, the 
> KafkaConsumer calls the pollOnce method in a loop with a timeout of 0. That 
> explains how we can have 268000 calls within 100 ms.
> The heartbeat thread calls the method AbstractCoordinator.wait() to sleep, so 
> I think the Kafka consumer should wake the heartbeat thread with a notify 
> when needed.
> We made two quick fixes to solve this issue:
>   - In NetworkClient.handle*(), we don't iterate over collections if they are 
> empty (to avoid unnecessary iterator instantiations).
>   - In KafkaConsumer.pollOnce(), if the poll timeout is equal to 0 we notify 
> the heartbeat thread to wake it up (a dirty fix, because we don't handle the 
> autocommit case).
> With these 2 quick fixes and 128 consumers, the allocation rate dropped 
> from 55 MiB/s to 4 MiB/s.
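
The wake-up idea from the report (notify the heartbeat thread instead of spinning until its backoff expires) can be sketched with plain Java monitors. This is a minimal, self-contained illustration and not Kafka's implementation: the class, fields, and methods below are all hypothetical, and like the reporter's quick fix it ignores the autocommit case.

```java
public class HeartbeatWakeupSketch {
    private final Object lock = new Object();
    private boolean heartbeatDue = false; // guarded by lock
    private int heartbeatsSent = 0;       // guarded by lock

    // Body of the (hypothetical) heartbeat thread: it sleeps for up to
    // backoffMs between checks, but can be woken earlier via notifyAll().
    void heartbeatLoop(long backoffMs, int maxHeartbeats) throws InterruptedException {
        synchronized (lock) {
            while (heartbeatsSent < maxHeartbeats) {
                if (!heartbeatDue) {
                    lock.wait(backoffMs); // releases the lock while waiting
                }
                if (heartbeatDue) {
                    heartbeatDue = false;
                    heartbeatsSent++;     // stands in for sending a heartbeat
                }
            }
        }
    }

    // Called by the polling thread when its computed poll timeout is 0,
    // so the heartbeat thread reacts immediately instead of after backoffMs.
    void wakeHeartbeatThread() {
        synchronized (lock) {
            heartbeatDue = true;
            lock.notifyAll();
        }
    }

    int heartbeatsSent() {
        synchronized (lock) {
            return heartbeatsSent;
        }
    }

    // Starts a heartbeat thread with a deliberately long 10 s backoff, wakes
    // it once, and reports how many heartbeats were sent within 5 s.
    static int demo() {
        HeartbeatWakeupSketch sketch = new HeartbeatWakeupSketch();
        Thread heartbeat = new Thread(() -> {
            try {
                sketch.heartbeatLoop(10_000L, 1);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        heartbeat.setDaemon(true); // never keeps the JVM alive
        heartbeat.start();
        sketch.wakeHeartbeatThread();
        try {
            heartbeat.join(5_000L);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
        return sketch.heartbeatsSent();
    }

    public static void main(String[] args) {
        System.out.println("heartbeats sent: " + demo());
    }
}
```

The flag is checked before waiting, so a notify that arrives before the heartbeat thread reaches wait() is not lost, and a spurious wake-up simply leads to another wait.
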



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
