anshul35 opened a new pull request, #17491: URL: https://github.com/apache/kafka/pull/17491
Issue Details:

Inside TopicBasedRemoteLogMetadataManager::close, one thread (t1) calls join on initializationThread while holding the writeLock of the "lock" object, so t1 waits for initializationThread to complete. Internally, initializationThread also takes the writeLock on the same "lock" object. This can cause a deadlock in the following situation:

1. initializationThread is started.
2. close is invoked from a separate thread, but that thread has not yet been scheduled by the OS.
3. At line 430, initializationThread is preempted and the OS starts running the close thread. close takes the writeLock and invokes join on initializationThread.
4. The OS schedules initializationThread again, and at line 433 this thread also tries to take the writeLock. Since the writeLock is already held by the close thread, both threads wait on each other: initializationThread waits for close to release the writeLock, while the close thread waits for initializationThread to complete.

Inside TopicBasedRemoteLogMetadataManager, the initializationThread instance variable is protected by the ReentrantReadWriteLock object named lock. But this same thread also takes the writeLock during its own execution, which is what makes the deadlock possible.

Fix Details:

Given the access patterns of the initializationThread instance variable, we can convert it into an AtomicReference so that reads and writes to it are atomic. With this, close no longer needs to take the writeLock before calling join on initializationThread. The close thread wakes up once initializationThread is done, and takes the writeLock only after that.

*More detailed description of your change, if necessary. The PR title and PR message become the squashed commit message, so use a separate comment to ping reviewers.*

*Summary of testing strategy (including rationale) for the feature or bug fix.
Unit and/or integration tests are expected for any behaviour change and system tests should be considered for larger changes.*

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org
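
For readers following the Fix Details in this PR, here is a minimal, self-contained sketch of the pattern it describes: the thread reference is held in an AtomicReference, close joins the initialization thread without holding the writeLock, and takes the writeLock only afterwards. The class `InitCloseSketch`, its methods, the two boolean flags, and the 50 ms sleep are hypothetical stand-ins for illustration. This is not the actual TopicBasedRemoteLogMetadataManager code, and the real change may differ in detail.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the manager; names are assumptions, not Kafka's API.
class InitCloseSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // Held in an AtomicReference so close() can read it without taking the lock.
    private final AtomicReference<Thread> initializationThread = new AtomicReference<>();
    private final AtomicBoolean initialized = new AtomicBoolean(false);
    private final AtomicBoolean closed = new AtomicBoolean(false);

    void startInitialization() {
        Thread t = new Thread(() -> {
            try {
                // Simulated work before the lock is taken; close() may run in this window.
                Thread.sleep(50);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            }
            lock.writeLock().lock();
            try {
                initialized.set(true);
            } finally {
                lock.writeLock().unlock();
            }
        }, "initialization-thread");
        initializationThread.set(t);
        t.start();
    }

    void close() {
        // Join first, with no lock held, so the initialization thread can still
        // acquire the writeLock and run to completion.
        Thread t = initializationThread.get();
        if (t != null) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        // Only now take the writeLock for the remaining shutdown work.
        lock.writeLock().lock();
        try {
            closed.set(true);
        } finally {
            lock.writeLock().unlock();
        }
    }

    boolean isInitialized() { return initialized.get(); }
    boolean isClosed() { return closed.get(); }

    public static void main(String[] args) {
        InitCloseSketch m = new InitCloseSketch();
        m.startInitialization();
        m.close();
        System.out.println("initialized=" + m.isInitialized() + ", closed=" + m.isClosed());
    }
}
```

If `close` instead took the writeLock before calling `join`, the initialization thread would block on `lock.writeLock().lock()` while `close` waited in `join`, which is the deadlock described in the Issue Details.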