David Arthur created KAFKA-17766:
------------------------------------
Summary: TopicBasedRemoteLogMetadataManager stuck in close
Key: KAFKA-17766
URL: https://issues.apache.org/jira/browse/KAFKA-17766
Project: Kafka
Issue Type: Bug
Reporter: David Arthur
Attachments: GradleWorkerMain-7952.txt
During a CI run, a build timed out because this class was stuck in its
close method.
{code:java}
"Test worker" #1 prio=5 os_prio=0 cpu=9155.23ms elapsed=9615.57s tid=0x00007fcc80029800 nid=0x1f12 in Object.wait() [0x00007fcc853f9000]
   java.lang.Thread.State: WAITING (on object monitor)
    at java.lang.Object.wait(java.base/Native Method)
    - waiting on <no object reference available>
    at java.lang.Thread.join(java.base/Thread.java:1300)
    - waiting to re-lock in wait() <0x000000008189e9f8> (a org.apache.kafka.common.utils.KafkaThread)
    at java.lang.Thread.join(java.base/Thread.java:1375)
    at org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager.close(TopicBasedRemoteLogMetadataManager.java:575)
{code}
{code:java}
"RLMMInitializationThread" #9511 prio=5 os_prio=0 cpu=1.40ms elapsed=9222.98s tid=0x00007fcc8196f800 nid=0x12ef2 waiting on condition [0x00007fcbe05fe000]
   java.lang.Thread.State: WAITING (parking)
    at jdk.internal.misc.Unsafe.park(java.base/Native Method)
    - parking to wait for <0x0000000081e364c0> (a java.util.concurrent.locks.ReentrantReadWriteLock$NonfairSync)
    at java.util.concurrent.locks.LockSupport.park(java.base/LockSupport.java:194)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(java.base/AbstractQueuedSynchronizer.java:885)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(java.base/AbstractQueuedSynchronizer.java:917)
    at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(java.base/AbstractQueuedSynchronizer.java:1240)
    at java.util.concurrent.locks.ReentrantReadWriteLock$WriteLock.lock(java.base/ReentrantReadWriteLock.java:959)
    at org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager.initializeResources(TopicBasedRemoteLogMetadataManager.java:432)
    at org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager$$Lambda$2007/0x0000000100934c40.run(Unknown Source)
    at java.lang.Thread.run(java.base/Thread.java:829)
{code}
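In the dumps above, the test worker is waiting in Thread.join inside close,
while the initialization thread is parked waiting for the write lock in
initializeResources. It seems close joins the initialization thread on the
assumption that it has completed or will complete. This appears to be a lock
race between the close method and the initialization thread that results in a
deadlock.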
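
To illustrate the suspected pattern, here is a minimal, self-contained sketch. It assumes that close holds the same lock while joining; the traces suggest this but do not prove it. All names here (CloseJoinSketch, closeHoldingLock, closeReleasingLock, joinQuietly) are hypothetical and are not taken from the Kafka code base. The joins are bounded only so that the demo terminates.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: class, field, and method names are illustrative and are
// not taken from TopicBasedRemoteLogMetadataManager.
public class CloseJoinSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private boolean closing = false; // guarded by lock
    private final Thread initThread =
        new Thread(this::initializeResources, "InitThreadSketch");

    // Stand-in for initialization work that needs the write lock.
    private void initializeResources() {
        lock.writeLock().lock();
        try {
            if (closing) {
                return; // close got there first, so skip initialization
            }
            // ... real initialization would go here ...
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Hazardous shape: join while holding the lock the joined thread needs.
    // The join is bounded only so this demo terminates; with a plain join()
    // the caller would wait forever. joinTimeoutMs must be > 0, because
    // join(0) also waits forever.
    // Returns true if the init thread finished while the lock was still held.
    public boolean closeHoldingLock(long joinTimeoutMs) {
        boolean finished;
        lock.writeLock().lock();
        try {
            closing = true;
            // Started under the lock to force the problematic interleaving.
            initThread.start();
            joinQuietly(joinTimeoutMs);
            finished = !initThread.isAlive();
        } finally {
            lock.writeLock().unlock();
        }
        joinQuietly(5_000); // lock released, so the thread can now exit
        return finished;
    }

    // Safer shape: publish the closing flag under the lock, release the lock,
    // and only then join, so the init thread can acquire the lock and exit.
    // Returns true if the init thread finished within the timeout.
    public boolean closeReleasingLock(long joinTimeoutMs) {
        lock.writeLock().lock();
        try {
            closing = true;
            initThread.start(); // same interleaving as the hazardous variant
        } finally {
            lock.writeLock().unlock();
        }
        joinQuietly(joinTimeoutMs);
        return !initThread.isAlive();
    }

    // Bounded join that restores the interrupt flag if interrupted.
    private void joinQuietly(long timeoutMs) {
        try {
            initThread.join(timeoutMs);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while joining", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("holding lock, init finished: "
            + new CloseJoinSketch().closeHoldingLock(200));
        System.out.println("released lock, init finished: "
            + new CloseJoinSketch().closeReleasingLock(5_000));
    }
}
```

In this sketch the first call should report that the init thread did not finish while the lock was held, and the second should report that it did. Whether releasing the lock before the join, or bounding the join, is the right fix for the real class depends on what else close protects with that lock.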