[ 
https://issues.apache.org/jira/browse/KAFKA-1317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13942299#comment-13942299
 ] 

Jay Kreps commented on KAFKA-1317:
----------------------------------

Actually, non-daemon is not wrong. There are two cases. A few threads do pure 
"nice to have" background work and can be killed at any time. But most of our 
threads may be doing I/O or other work, so they go through an explicit shutdown 
rather than just terminating in the middle of whatever they are doing. We don't 
want these to be daemon threads; we just need to call shutdown on them. If we 
don't call shutdown, the hang is a good indication that there is a problem, and 
making them daemon threads would just mask it.
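
To illustrate the second case, here is a minimal Java sketch of a non-daemon 
worker that parks on a condition and only exits when shutdown is called 
explicitly. It is not Kafka code: the class names ShutdownSketch and Worker, 
the field names, and the thread name are all made up for illustration, and the 
real ShutdownableThread may well be structured differently.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class ShutdownSketch {

    /** Hypothetical worker (not a Kafka class) that parks until signalled. */
    public static class Worker extends Thread {
        private final ReentrantLock lock = new ReentrantLock();
        private final Condition wake = lock.newCondition();
        private volatile boolean running = true;

        public Worker(String name) {
            super(name);
            // Non-daemon on purpose: the JVM stays up while this thread is
            // alive, so a forgotten shutdown() shows up as a visible hang.
            setDaemon(false);
        }

        @Override
        public void run() {
            lock.lock();
            try {
                // Loop guards against spurious wakeups; exit only on shutdown.
                while (running) {
                    wake.awaitUninterruptibly();
                }
            } finally {
                lock.unlock();
            }
        }

        /** Clears the flag, wakes the parked thread, then waits for it to exit. */
        public void shutdown() {
            lock.lock();
            try {
                running = false;
                wake.signalAll();
            } finally {
                lock.unlock();
            }
            try {
                join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }

    public static void main(String[] args) {
        Worker w = new Worker("sketch-worker");
        w.start();
        w.shutdown();
        System.out.println("alive after shutdown: " + w.isAlive()); // prints false
    }
}
```

If the call to shutdown() in main is removed, the process never exits, which is 
the kind of hang described above. Marking the thread as daemon would let the 
JVM exit, but only by hiding the missing shutdown call.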

> KafkaServer 0.8.1 not responding to .shutdown() cleanly, possibly related to 
> TopicDeletionManager or MetricsMeter state
> -----------------------------------------------------------------------------------------------------------------------
>
>                 Key: KAFKA-1317
>                 URL: https://issues.apache.org/jira/browse/KAFKA-1317
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.8.1
>            Reporter: Brent Bradbury
>            Assignee: Timothy Chen
>            Priority: Blocker
>              Labels: newbie
>             Fix For: 0.8.1.1
>
>         Attachments: threaddump.txt
>
>
> When I run an in-process instance of KafkaServer, send a message through it, 
> then call shutdown(), some threads never exit and the process hangs until the 
> process is killed manually. The same scenario does not result in a hang on 
> 0.8.0. The hang happens when calling both shutdown() by itself as well as 
> shutdown() and awaitShutdown() together. I have seen similar behavior 
> shutting down a deployed kafka server as well, but haven't had time to 
> diagnose whether or not it is the same symptom.
> I suspect either the metrics-meter-tick-thread-1 & 2 or delete-topics-thread
>  (waiting in 
> kafka.controller.TopicDeletionManager.kafka$controller$TopicDeletionManager$$awaitTopicDeletionNotification(TopicDeletionManager.scala:178))
>  is to blame. Since the TopicDeletionManager is new, it seems more suspicious 
> to me. A complete thread dump is attached; the suspect threads are below.
> "delete-topics-thread" prio=5 tid=0x00007fb3e31d2800 nid=0x6b03 waiting on condition [0x000000013c3b3000]
>    java.lang.Thread.State: WAITING (parking)
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for  <0x000000012e6e6920> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>       at kafka.controller.TopicDeletionManager.kafka$controller$TopicDeletionManager$$awaitTopicDeletionNotification(TopicDeletionManager.scala:178)
>       at kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1.apply$mcV$sp(TopicDeletionManager.scala:334)
>       at kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1.apply(TopicDeletionManager.scala:333)
>       at kafka.controller.TopicDeletionManager$DeleteTopicsThread$$anonfun$doWork$1.apply(TopicDeletionManager.scala:333)
>       at kafka.utils.Utils$.inLock(Utils.scala:538)
>       at kafka.controller.TopicDeletionManager$DeleteTopicsThread.doWork(TopicDeletionManager.scala:333)
>       at kafka.utils.ShutdownableThread.run(ShutdownableThread.scala:51)
>    Locked ownable synchronizers:
>       - None
> "metrics-meter-tick-thread-2" daemon prio=5 tid=0x00007fb3e31c1000 nid=0x5f03 runnable [0x000000013ab8f000]
>    java.lang.Thread.State: TIMED_WAITING (parking)
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for  <0x000000012e7d05d8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>       at java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:226)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:2082)
>       at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1090)
>       at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
>       at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at java.lang.Thread.run(Thread.java:724)
>    Locked ownable synchronizers:
>       - None
> "metrics-meter-tick-thread-1" daemon prio=5 tid=0x00007fb3e31ef800 nid=0x5e03 waiting on condition [0x000000013a98c000]
>    java.lang.Thread.State: WAITING (parking)
>       at sun.misc.Unsafe.park(Native Method)
>       - parking to wait for  <0x000000012e7d05d8> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
>       at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
>       at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
>       at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:1085)
>       at java.util.concurrent.ScheduledThreadPoolExecutor$DelayedWorkQueue.take(ScheduledThreadPoolExecutor.java:807)
>       at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
>       at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
>       at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>       at java.lang.Thread.run(Thread.java:724)
>    Locked ownable synchronizers:
>       - None
