[ 
https://issues.apache.org/jira/browse/KAFKA-989?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phil Hargett updated KAFKA-989:
-------------------------------

    Description: 
An application that uses the Kafka client under load often hits this issue 
within a few hours.

High-level consumers come and go over this application's lifecycle, but there 
are a variety of defenses that ensure each high-level consumer lasts several 
seconds before being shut down.  Nevertheless, some race is causing the 
consumer's background thread (LeaderFinderThread in the stack trace below) to 
keep running long after the ZkClient it is using has been disconnected.  Since 
the thread was spawned by a consumer that has already been shut down, the 
application has no way to find this thread and stop it.
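
For illustration only, here is a minimal sketch of the shutdown ordering one would expect: signal the worker to stop, interrupt it, and wait for it to exit before the resource it uses is closed. This is not Kafka's ShutdownableThread or the fix for this issue; the class name JoinedWorker, the thread name, and the 10 ms pause are all hypothetical.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical worker: NOT Kafka code, just the stop-then-join pattern.
class JoinedWorker {
    private final AtomicBoolean running = new AtomicBoolean(true);
    private final Thread thread;

    JoinedWorker(Runnable work) {
        thread = new Thread(() -> {
            while (running.get()) {
                try {
                    work.run();
                    Thread.sleep(10); // assumed pause between passes
                } catch (InterruptedException e) {
                    // shutdown() interrupts us; the loop condition decides whether to exit.
                } catch (RuntimeException e) {
                    // A failed pass must not skip the stop-flag check; loop and re-test it.
                }
            }
        }, "hypothetical-leader-finder");
        thread.start();
    }

    /** Stops the worker and waits up to timeoutMs; returns true if it has exited. */
    boolean shutdown(long timeoutMs) {
        running.set(false);
        thread.interrupt();
        try {
            thread.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
        }
        return !thread.isAlive();
    }
}
```

Under this pattern the owner would close the shared client only after shutdown() returns true, so the worker can never observe a closed client. Whether the race reported here is of this kind is not established.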

Reported on the users-kafka mailing list 6/25 as "0.8 throwing exception 
'Failed to find leader' and high-level consumer fails to make progress". 

The only remedy is to shut down the application and restart it.  Externally 
detecting that this state has occurred is not pleasant: it requires grepping 
the log for repeated occurrences of the same exception.
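
As a rough sketch of that external check, the snippet below counts log lines containing the "Failed to find leader for" message from the stack trace and flags the process once the count reaches a threshold. The class name LeaderFailureScan, the threshold rule, and the log path passed on the command line are assumptions made for illustration, not part of Kafka.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: scans application log lines for the repeated exception.
class LeaderFailureScan {
    // Message prefix taken from the stack trace in this report.
    static final String MARKER = "Failed to find leader for";

    /** Counts the lines that contain the marker. */
    static long countFailures(List<String> lines) {
        return lines.stream().filter(line -> line.contains(MARKER)).count();
    }

    /** Assumed rule: treat the process as stuck once the count reaches the threshold. */
    static boolean looksStuck(List<String> lines, int threshold) {
        return countFailures(lines) >= threshold;
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 2) {
            System.err.println("usage: LeaderFailureScan <logfile> <threshold>");
            return;
        }
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        int threshold = Integer.parseInt(args[1]);
        System.out.println(looksStuck(lines, threshold) ? "STUCK" : "OK");
    }
}
```

A suitable threshold depends on how often the application logs this message during normal leader changes, which this report does not quantify.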

Stack trace:

Failed to find leader for Set([topic6,0]): java.lang.NullPointerException
        at org.I0Itec.zkclient.ZkClient$2.call(ZkClient.java:416)
        at org.I0Itec.zkclient.ZkClient$2.call(ZkClient.java:413)
        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
        at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:413)
        at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:409)
        at kafka.utils.ZkUtils$.getChildrenParentMayNotExist(ZkUtils.scala:438)
        at kafka.utils.ZkUtils$.getAllBrokersInCluster(ZkUtils.scala:75)
        at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:63)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)

  was:
An application that uses the Kafka client under load often hits this issue 
within a few hours.

High-level consumers come and go over this application's lifecycle, but there 
are a variety of defenses that ensure each high-level consumer lasts several 
seconds before being shut down.  Nevertheless, some race is causing the 
consumer's background thread (LeaderFinderThread in the stack trace below) to 
keep running long after the ZkClient it is using has been disconnected.  Since 
the thread was spawned by a consumer that has already been shut down, the 
application has no way to find this thread and stop it.

Reported on the users-kafka mailing list 6/25 as "0.8 throwing exception 
"Failed to find leader" and high-level consumer fails to make progress".  
Thread here: 

The only remedy is to shut down the application and restart it.  Externally 
detecting that this state has occurred is not pleasant: it requires grepping 
the log for repeated occurrences of the same exception.

Stack trace:

Failed to find leader for Set([topic6,0]): java.lang.NullPointerException
        at org.I0Itec.zkclient.ZkClient$2.call(ZkClient.java:416)
        at org.I0Itec.zkclient.ZkClient$2.call(ZkClient.java:413)
        at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:675)
        at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:413)
        at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:409)
        at kafka.utils.ZkUtils$.getChildrenParentMayNotExist(ZkUtils.scala:438)
        at kafka.utils.ZkUtils$.getAllBrokersInCluster(ZkUtils.scala:75)
        at kafka.consumer.ConsumerFetcherManager$LeaderFinderThread.doWork(ConsumerFetcherManager.scala:63)
        at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)

    
> Race condition shutting down high-level consumer results in spinning 
> background thread
> --------------------------------------------------------------------------------------
>
>                 Key: KAFKA-989
>                 URL: https://issues.apache.org/jira/browse/KAFKA-989
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8
>         Environment: Ubuntu Linux x64
>            Reporter: Phil Hargett

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
