[https://issues.apache.org/jira/browse/KAFKA-989?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13729666#comment-13729666]
Jun Rao commented on KAFKA-989:
-------------------------------
Hmm, in the shutdown logic of the consumer connector, we set zkclient to null
last. So all fetchers and the leader finder thread should already have been
stopped by the time zkclient is null.
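The ordering argument can be illustrated with a minimal Java sketch. All class names here (FakeZkClient, LeaderFinder, ConsumerConnectorSketch) are hypothetical stand-ins, not Kafka's actual consumer classes. Shutdown first signals the background thread and joins it, and only then closes and clears the client reference, so the loop cannot observe a closed client while it is still running.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for a ZooKeeper client handle (not the real ZkClient API).
final class FakeZkClient {
    private volatile boolean closed = false;

    int getBrokerCount() {
        if (closed) throw new IllegalStateException("client closed");
        return 3; // pretend three brokers are registered
    }

    void close() { closed = true; }
}

// Hypothetical background worker loosely modeled on a leader-finder loop.
final class LeaderFinder extends Thread {
    private final AtomicBoolean running = new AtomicBoolean(true);
    private final AtomicReference<FakeZkClient> zk;

    LeaderFinder(AtomicReference<FakeZkClient> zk) {
        this.zk = zk;
        setDaemon(true);
    }

    @Override
    public void run() {
        while (running.get()) {
            FakeZkClient client = zk.get();
            if (client != null) client.getBrokerCount(); // stands in for "look up leaders"
            try {
                Thread.sleep(5); // back off between lookups
            } catch (InterruptedException e) {
                return; // woken by shutdown(): exit the loop
            }
        }
    }

    // Signal the loop to stop, wake it if sleeping, and wait for it to exit.
    void shutdown() {
        running.set(false);
        interrupt();
        try {
            join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

// Hypothetical connector: the client is cleared only after the thread has exited.
final class ConsumerConnectorSketch {
    final AtomicReference<FakeZkClient> zk = new AtomicReference<>(new FakeZkClient());
    final LeaderFinder finder = new LeaderFinder(zk);

    void start() { finder.start(); }

    // Stop and join the background thread first; close and clear the client last.
    void shutdown() {
        finder.shutdown();
        FakeZkClient client = zk.getAndSet(null);
        if (client != null) client.close();
    }
}
```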
> Race condition shutting down high-level consumer results in spinning
> background thread
> --------------------------------------------------------------------------------------
>
> Key: KAFKA-989
> URL: https://issues.apache.org/jira/browse/KAFKA-989
> Project: Kafka
> Issue Type: Bug
> Affects Versions: 0.8
> Environment: Ubuntu Linux x64
> Reporter: Phil Hargett
> Attachments: KAFKA-989-failed-to-find-leader.patch,
> KAFKA-989-failed-to-find-leader-patch2.patch
>
>
> An application that uses the Kafka client under load can often hit this
> issue within a few hours.
> High-level consumers come and go over this application's lifecycle, but there
> are a variety of defenses that ensure each high-level consumer lasts several
> seconds before being shut down. Nevertheless, some race is causing this
> background thread to continue long after the ZkClient it is using has been
> disconnected. Since the thread was spawned by a consumer that has already
> been shut down, the application has no way to find this thread and stop it.
> Reported on the users-kafka mailing list 6/25 as "0.8 throwing exception
> 'Failed to find leader' and high-level consumer fails to make progress".
> The only remedy is to shut down the application and restart it. Externally
> detecting that this state has occurred is not pleasant: one has to grep the
> log for repeated occurrences of the same exception.
> Stack trace:
> Failed to find leader for Set([topic6,0]): java.lang.NullPointerException
> at org.I0Itec.zkclient.ZkClient$2.call(ZkClient.java:416)
> at org.I0Itec.zkclient.ZkClient$2.call(ZkClient.java:413)
> at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
> at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:413)
> at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:409)
> at kafka.utils.ZkUtils$.getChildrenParentMayNotExist(ZkUtils.scala:438)
> at kafka.utils.ZkUtils$.getAllBrokersInCluster(ZkUtils.scala:75)
> at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:63)
> at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)
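The trace hints at why the thread spins rather than dies. The NullPointerException is raised inside ZkClient, surfaces in LeaderFinderThread.doWork, and is logged as "Failed to find leader"; judging by the repeated log line, the loop in ShutdownableThread.run then simply calls doWork again. A minimal Java model of that pattern follows. It assumes, as the repeated log line suggests, that doWork catches the failure and retries while its running flag is still set. LoopingWorker and LeaderFinderModel are hypothetical names, not Kafka classes, and runBounded caps the iterations only so the demo terminates.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical simplified loop-until-stopped worker (not Kafka's ShutdownableThread).
abstract class LoopingWorker {
    final AtomicBoolean isRunning = new AtomicBoolean(true);

    abstract void doWork();

    // The modeled loop is unbounded; maxIterations exists only so this demo terminates.
    void runBounded(int maxIterations) {
        for (int i = 0; i < maxIterations && isRunning.get(); i++) {
            doWork();
        }
    }
}

// Hypothetical leader-finder model: swallows lookup failures while "running".
final class LeaderFinderModel extends LoopingWorker {
    final AtomicInteger warnings = new AtomicInteger();
    private final Supplier<List<String>> brokerLookup;

    LeaderFinderModel(Supplier<List<String>> brokerLookup) {
        this.brokerLookup = brokerLookup;
    }

    @Override
    void doWork() {
        try {
            brokerLookup.get(); // stands in for reading the broker list from ZooKeeper
        } catch (RuntimeException e) {
            if (!isRunning.get()) {
                throw e; // stopped: let the failure end the loop
            }
            warnings.incrementAndGet(); // still running: log and retry, i.e. spin
        }
    }
}
```

In this model nothing ever clears isRunning for an orphaned worker, so it keeps logging on every pass, which matches the report that the only remedy was restarting the application.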
--
This message is automatically generated by JIRA.