Allen Wang created KAFKA-2096:
---------------------------------
Summary: Enable keepalive socket option for broker
Key: KAFKA-2096
URL: https://issues.apache.org/jira/browse/KAFKA-2096
Project: Kafka
Issue Type: Improvement
Components: network
Affects Versions: 0.8.2.1
Reporter: Allen Wang
Assignee: Jun Rao
Priority: Critical
We run a Kafka 0.8.2.1 cluster in AWS with a large number of producers (> 10000).
The number of producer instances also scales up and down significantly on a
daily basis.
The issue we found is that after 10 days, the open file descriptor count
approaches the limit of 32K. An investigation of these open file descriptors
shows that a significant portion of them belong to connections from client
instances that were terminated during scale-down. These connections still show
as "ESTABLISHED" in netstat. We suspect that the AWS firewall between the client
and broker causes this issue.
We attempted to use the "keepalive" socket option to reduce this socket leak on
the broker, and it appears to be working. Specifically, we added this line to
kafka.network.Acceptor.accept():
socketChannel.socket().setKeepAlive(true)
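For readers who want to see the effect of that call in isolation, here is a
minimal standalone sketch in Java. It is not the Kafka Acceptor code: the class
name KeepAliveAcceptSketch and the helper acceptWithKeepAlive are hypothetical,
and the loopback server exists only to have a connection to accept. It uses
only standard java.nio and java.net calls.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

// Hypothetical illustration only; not part of Kafka.
public class KeepAliveAcceptSketch {

    // Accept one connection and enable SO_KEEPALIVE on the accepted socket,
    // mirroring the one-line change described above.
    static SocketChannel acceptWithKeepAlive(ServerSocketChannel server) throws IOException {
        SocketChannel channel = server.accept();
        channel.socket().setKeepAlive(true);
        return channel;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            // Port 0 asks the OS for any free port on the loopback interface.
            server.bind(new InetSocketAddress("127.0.0.1", 0));
            int port = server.socket().getLocalPort();
            try (SocketChannel client =
                     SocketChannel.open(new InetSocketAddress("127.0.0.1", port));
                 SocketChannel accepted = acceptWithKeepAlive(server)) {
                System.out.println("keepalive=" + accepted.socket().getKeepAlive());
            }
        }
    }
}
```

The option only switches keepalive probing on for that socket. When probes
start, how often they repeat, and how many are sent before the connection is
dropped are all decided by the operating system's TCP settings.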
During our experiment with this change, we confirmed that netstat entries for
terminated client instances were probed as configured in the operating system.
After the configured number of probes, the OS determined that the peer was no
longer alive and the entry was removed, possibly after Kafka hit an error
reading from the channel and closed it. Our experiment also shows that after a
few days, the instance with the change kept a stable low point of open file
descriptor count, whereas on other instances the low point keeps increasing
from day to day.
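As a rough guide to how long a dead peer can linger before the OS removes it,
the sketch below adds the idle time before the first probe to the probe
interval multiplied by the probe count. The values used are the stock Linux
defaults (tcp_keepalive_time=7200, tcp_keepalive_intvl=75,
tcp_keepalive_probes=9). That the brokers run Linux with these defaults is an
assumption; the report does not say. The class and method names are
hypothetical.

```java
// Hypothetical illustration only; the numbers are stock Linux defaults,
// not values taken from the report.
public class KeepAliveDetectionTime {

    // Approximate seconds until the OS gives up on a silent peer:
    // idle time before the first probe, plus one interval per unanswered probe.
    static long secondsToDetectDeadPeer(long idleSeconds, long intervalSeconds, int probes) {
        return idleSeconds + intervalSeconds * probes;
    }

    public static void main(String[] args) {
        long seconds = secondsToDetectDeadPeer(7200, 75, 9);
        System.out.println(seconds + " seconds"); // prints "7875 seconds"
    }
}
```

With those defaults a stale connection is dropped a little over two hours
after the peer goes silent. Lowering the OS settings shortens that window.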
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)