[
https://issues.apache.org/jira/browse/KAFKA-2096?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15874848#comment-15874848
]
Faisal commented on KAFKA-2096:
-------------------------------
Does this solution also resolve the following error when connecting to Kafka
from Spark Streaming in direct mode?
*Too many open files, java.net.SocketException*
After running for 5-10 days with a 10-second interval, my Spark Streaming job
gets this error on the driver node; I only see it in the driver log file.
Kafka version: 0.8.2.0
Spark streaming: 1.5.0-cdh5.5.6
> Enable keepalive socket option for broker to prevent socket leak
> ----------------------------------------------------------------
>
> Key: KAFKA-2096
> URL: https://issues.apache.org/jira/browse/KAFKA-2096
> Project: Kafka
> Issue Type: Improvement
> Components: network
> Affects Versions: 0.8.2.1
> Reporter: Allen Wang
> Assignee: Allen Wang
> Priority: Critical
> Fix For: 0.9.0.0
>
> Attachments: patch.diff
>
>
> We run a Kafka 0.8.2.1 cluster in AWS with a large number of producers (>
> 10000). The number of producer instances also scales up and down significantly
> on a daily basis.
> The issue we found is that after 10 days, the open file descriptor count
> approaches the limit of 32K. An investigation of these open file descriptors
> shows that a significant portion of them belong to client instances that were
> terminated during scale-down. Somehow they still show as "ESTABLISHED" in
> netstat. We suspect that the AWS firewall between the clients and the broker
> causes this issue.
> We attempted to use the "keepalive" socket option to reduce this socket leak
> on the broker, and it appears to be working. Specifically, we added this line
> to kafka.network.Acceptor.accept():
> socketChannel.socket().setKeepAlive(true)
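>
> For context, a rough sketch of where this lands in accept() is shown below.
> The surrounding lines are approximated from the 0.8.x kafka.network.SocketServer
> and may not match the attached patch.diff exactly; only the setKeepAlive call
> is the actual change:
>
> def accept(key: SelectionKey, processor: Processor) {
>   val serverSocketChannel = key.channel().asInstanceOf[ServerSocketChannel]
>   val socketChannel = serverSocketChannel.accept()
>   socketChannel.configureBlocking(false)
>   socketChannel.socket().setTcpNoDelay(true)
>   // the added line: enable TCP keepalive so the OS probes idle peers and
>   // eventually closes connections to client instances that no longer exist
>   socketChannel.socket().setKeepAlive(true)
>   socketChannel.socket().setSendBufferSize(sendBufferSize)
>   // hand the accepted channel off to a processor thread for reads/writes
>   processor.accept(socketChannel)
> }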
> During our experiment with this change, we confirmed that netstat entries
> whose client instance had been terminated were probed as configured in the
> operating system. After the configured number of probes, the OS determined
> that the peer was no longer alive and removed the entry, presumably after
> Kafka hit an error reading from the channel and closed it. Our experiment also
> shows that after a few days the instance held a stable low point of open file
> descriptor count, whereas on other instances the low point keeps increasing
> from day to day.