[ 
https://issues.apache.org/jira/browse/KAFKA-5298?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16019285#comment-16019285
 ] 

Rajini Sivaram commented on KAFKA-5298:
---------------------------------------

This has been fixed under KAFKA-5086. Expired topics are now added back to the 
metadata list while in the metadata wait loop.

> MirrorMaker deadlocks with missing topics
> -----------------------------------------
>
>                 Key: KAFKA-5298
>                 URL: https://issues.apache.org/jira/browse/KAFKA-5298
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients, tools
>    Affects Versions: 0.10.2.1
>            Reporter: Raymond Conn
>
> When mirrorMaker mirrors a topic to destination brokers that have topic auto 
> create disabled and a topic doesn't exist on the destination brokers, the 
> producer in mirror maker logs the following 
> {code}
> Error while fetching metadata with correlation id 467 : 
> \{mirror-test2=UNKNOWN_TOPIC_OR_PARTITION\}
> Error while fetching metadata with correlation id 468 : 
> {mirror-test2=UNKNOWN_TOPIC_OR_PARTITION}
> {code}
> This log message is fine and expected. The problem is the log message stops 
> ~5 min later. At which point the logs look fine, but mirror maker is not 
> mirroring any of its topics. 
> What's worse is mirrorMaker is basically in an unrecoverable state once this 
> happens (the log statement stops). If you create the topic at the destination 
> mirrorMaker still won't mirror data until a restart. Attempts to restart 
> mirrorMaker (cleanly) fail because the process is more or less deadlocked in 
> its shutdown hook.
> Here is the reasoning:
> * MirrorMaker becomes unrecoverable after 5 minutes because of this loop in 
> the 
> [producer|https://github.com/apache/kafka/blob/e06cd3e55f25a0bb414e0770493906ea8019420a/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L543-L561]
> * The producer will keep waiting for metadata for the missing topic or until 
> the max timeout is reached. (max long in this case) 
> * after 5 minutes the producer stops making a metadata request for the topic 
> because that topic expires 
> [here|https://github.com/apache/kafka/blob/e06cd3e55f25a0bb414e0770493906ea8019420a/clients/src/main/java/org/apache/kafka/clients/Metadata.java#L218]
>  
> * topic is never re-added for metadata requests since the only add is before 
> entering the loop 
> [here|(https://github.com/apache/kafka/blob/e06cd3e55f25a0bb414e0770493906ea8019420a/clients/src/main/java/org/apache/kafka/clients/producer/KafkaProducer.java#L528]
> So basically after 5 minutes all metadata requests moving forward are for no 
> topics since the topic expired. The mirrorMaker thread essentially gets stuck 
> waiting forever since there will never be a metadata request for the topic 
> the thread is waiting on
> All of this basically leads to a deadlock state in the shutdown hook. 
> * shutdown hook sends a shutdown to the mirrorMaker threads 
> * waits for threads to exit their loop by waitind on a 
> [latch|https://github.com/apache/kafka/blob/0.10.2/core/src/main/scala/kafka/tools/MirrorMaker.scala#L396]
> * latch is never counted down in 
> [produce|https://github.com/apache/kafka/blob/0.10.2/core/src/main/scala/kafka/tools/MirrorMaker.scala#L434]
>  
> * thread will never exit the loop to countdown the latch on line 462.
> This can be seen with a thread dump of the shutdown hook thread
> {code}
> Name: MirrorMakerShutdownHook
> State: WAITING on java.util.concurrent.CountDownLatch$Sync@3ffebeac
> Total blocked: 0  Total waited: 1
> Stack trace: 
> sun.misc.Unsafe.park(Native Method)
> java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.doAcquireSharedInterruptibly(AbstractQueuedSynchronizer.java:997)
> java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireSharedInterruptibly(AbstractQueuedSynchronizer.java:1304)
> java.util.concurrent.CountDownLatch.await(CountDownLatch.java:231)
> kafka.tools.MirrorMaker$MirrorMakerThread.awaitShutdown(MirrorMaker.scala:498)
> kafka.tools.MirrorMaker$$anonfun$cleanShutdown$4.apply(MirrorMaker.scala:396)
> kafka.tools.MirrorMaker$$anonfun$cleanShutdown$4.apply(MirrorMaker.scala:396)
> scala.collection.Iterator$class.foreach(Iterator.scala:893)
> {code}
> The root of the issue more or less related to the issue documented here where 
> the producer can block waiting for metadata. 
> https://issues.apache.org/jira/browse/KAFKA-3450



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)

Reply via email to