[ https://issues.apache.org/jira/browse/KAFKA-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16019285#comment-16019285 ]
Rajini Sivaram commented on KAFKA-5298: --------------------------------------- This has been fixed under KAFKA-5086. Expired topics are now added back to the metadata list while in the metadata wait loop. > MirrorMaker deadlocks with missing topics > ----------------------------------------- > > Key: KAFKA-5298 > URL: https://issues.apache.org/jira/browse/KAFKA-5298 > Project: Kafka > Issue Type: Bug > Components: clients, tools > Affects Versions: 0.10.2.1 > Reporter: Raymond Conn > > When mirrorMaker mirrors a topic to destination brokers that have topic auto > create disabled and a topic doesn't exist on the destination brokers, the > producer in mirror maker logs the following > {code} > Error while fetching metadata with correlation id 467 : > \{mirror-test2=UNKNOWN_TOPIC_OR_PARTITION\} > Error while fetching metadata with correlation id 468 : > {mirror-test2=UNKNOWN_TOPIC_OR_PARTITION} > {code} > This log message is fine and expected. The problem is the log message stops > ~5 min later. At which point the logs look fine, but mirror maker is not > mirroring any of its topics. > What's worse is mirrorMaker is basically in an unrecoverable state once this > happens (the log statement stops). If you create the topic at the destination > mirrorMaker still won't mirror data until a restart. Attempts to restart > mirrorMaker (cleanly) fail because the process is more or less deadlocked in > its shutdown hook. > Here is the reasoning: > * MirrorMaker becomes unrecoverable after 5 minutes because of this loop in > the > [producer|https://github.com/apache/kafka/blob/e06cd3e55f25a0bb414e0770493906ea8019420a/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L543-L561] > * The producer will keep waiting for metadata for the missing topic or until > the max timeout is reached. (max long in this case) > * after 5 minutes the producer stops making a metadata request for the topic > because that topic expires > [here|https://github.com/apache/kafka/blob/e06cd3e55f25a0bb414e0770493906ea8019420a/clients/src/main/java/org/apache/kafka/clients/Metadata.java#L218] > > * topic is never re-added for metadata requests since the only add is before > entering the loop > [here|(https://github.com/apache/kafka/blob/e06cd3e55f25a0bb414e0770493906ea8019420a/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L528] > So basically after 5 minutes all metadata requests moving forward are for no > topics since the topic expired. The mirrorMaker thread essentially gets stuck > waiting forever since there will never be a metadata request for the topic > the thread is waiting on > All of this basically leads to a deadlock state in the shutdown hook. > * shutdown hook sends a shutdown to the mirrorMaker threads > * waits for threads to exit their loop by waitind on a > [latch|https://github.com/apache/kafka/blob/0.10.2/core/src/main/scala/kafka/tools/MirrorMaker.scala#L396] > * latch is never counted down in > [produce|https://github.com/apache/kafka/blob/0.10.2/core/src/main/scala/kafka/tools/MirrorMaker.scala#L434] > > * thread will never exit the loop to countdown the latch on line 462. > This can be seen with a thread dump of the shutdown hook thread > {code} > Name: MirrorMakerShutdownHook > State: WAITING on java.util.concurrent.CountDownLatch$Sync@3ffebeac > Total blocked: 0 Total waited: 1 > Stack trace: > sun.misc.Unsafe.park(Native Method) > java.util.concurrent.locks.LockSupport.park(LockSupport.java:175) > java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836) > java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997) > java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304) > java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231) > kafka.tools.MirrorMaker$MirrorMakerThread.awaitShutdown(MirrorMaker.scala:498) > kafka.tools.MirrorMaker$$anonfun$cleanShutdown$4.apply(MirrorMaker.scala:396) > kafka.tools.MirrorMaker$$anonfun$cleanShutdown$4.apply(MirrorMaker.scala:396) > scala.collection.Iterator$class.foreach(Iterator.scala:893) > {code} > The root of the issue more or less related to the issue documented here where > the producer can block waiting for metadata. > https://issues.apache.org/jira/browse/KAFKA-3450 -- This message was sent by Atlassian JIRA (v6.3.15#6346)