anshul35 opened a new pull request, #17491:
URL: https://github.com/apache/kafka/pull/17491

   Issue Details: Inside TopicBasedRemoteLogMetadataManager::close, one 
thread (t1) calls join on initializationThread after taking the writeLock 
on the "lock" object, so t1 waits for initializationThread to complete 
while still holding that lock. initializationThread itself also takes the 
writeLock on the same "lock" object. This can cause a deadlock in the 
following situation:
   
   1. initializationThread is started.
   2. close is invoked from a separate thread, but that thread has not yet 
been scheduled by the OS.
   3. At line 430, initializationThread is preempted and the OS starts running 
the close thread. close takes the writeLock and invokes join on 
initializationThread.
   4. The OS schedules initializationThread again, and at line 433 it also 
tries to take the writeLock. Since the writeLock is already held by the close 
thread, each thread is now waiting on the other: initializationThread waits 
for close to release the writeLock, while the close thread waits for 
initializationThread to complete.
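   
   The interleaving in steps 1 to 4 can be sketched in isolation. This is a 
minimal illustration, not the code from TopicBasedRemoteLogMetadataManager: 
the class name, method name, and latch are hypothetical, the latch only forces 
the ordering from steps 3 and 4, and a timed join plus lockInterruptibly are 
used so the demo terminates instead of hanging the way an untimed join would.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class DeadlockSketch {

    // Returns true if the initializer thread is still blocked after the
    // "close" side has waited joinTimeoutMs (must be > 0) while holding the lock.
    static boolean initializerStuckWhileCloseHoldsLock(long joinTimeoutMs) {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        // Demo-only latch: makes the initializer reach for the lock only
        // after "close" already holds it (steps 3 and 4).
        CountDownLatch closeHoldsLock = new CountDownLatch(1);

        Thread initializationThread = new Thread(() -> {
            try {
                closeHoldsLock.await();
                // Blocks here, because the close side holds the write lock.
                // Interruptible only so that the demo can clean up.
                lock.writeLock().lockInterruptibly();
            } catch (InterruptedException e) {
                return;
            }
            try {
                // Initialization work would mark the manager as ready here.
            } finally {
                lock.writeLock().unlock();
            }
        });
        initializationThread.start();

        // The "close" side: take the write lock first, then join.
        lock.writeLock().lock();
        try {
            closeHoldsLock.countDown();
            // An untimed join() here would never return.
            initializationThread.join(joinTimeoutMs);
            boolean stuck = initializationThread.isAlive();
            initializationThread.interrupt(); // let the demo thread exit
            return stuck;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            lock.writeLock().unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println("initializer stuck: "
                + initializerStuckWhileCloseHoldsLock(200));
    }
}
```

   Because the joining thread holds the write lock for the whole wait, the 
initializer cannot make progress no matter how long the timeout is.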
   
   Inside TopicBasedRemoteLogMetadataManager, the initializationThread instance 
variable is protected by the ReentrantReadWriteLock object "lock". But the 
thread it refers to also takes the writeLock on that same object while it 
runs, which is what makes the deadlock above possible.
   
   Fix Details: Looking at the access patterns of the initializationThread 
instance variable, we can convert it into an AtomicReference so that reads and 
writes to it are atomic without the lock. With this, close no longer needs to 
take the writeLock before calling join on initializationThread. The close 
thread resumes once initializationThread is done, and takes the writeLock only 
after that.
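   
   A rough sketch of that ordering follows. It is not the actual patch: the 
class, the boolean fields, and the closeEntered latch are hypothetical and 
exist only to force the same interleaving as in the issue description. The 
point it illustrates is that join happens before the writeLock is taken.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.locks.ReentrantReadWriteLock;

final class CloseWithoutLockSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    // The thread handle is no longer guarded by "lock".
    private final AtomicReference<Thread> initializationThread = new AtomicReference<>();
    // Demo-only latch: the initializer waits until close() has been entered.
    private final CountDownLatch closeEntered = new CountDownLatch(1);
    private boolean initialized; // guarded by lock
    private boolean closed;      // guarded by lock

    void startInitialization() {
        Thread t = new Thread(() -> {
            try {
                closeEntered.await();
            } catch (InterruptedException e) {
                return;
            }
            lock.writeLock().lock();
            try {
                initialized = true;
            } finally {
                lock.writeLock().unlock();
            }
        });
        initializationThread.set(t);
        t.start();
    }

    void close() {
        closeEntered.countDown();
        // Read and clear the handle atomically, without taking the lock.
        Thread t = initializationThread.getAndSet(null);
        if (t != null) {
            try {
                t.join(); // no lock held, so the initializer can finish
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        // Only now take the write lock for the remaining close work.
        lock.writeLock().lock();
        try {
            closed = true;
        } finally {
            lock.writeLock().unlock();
        }
    }

    boolean isInitialized() {
        lock.readLock().lock();
        try {
            return initialized;
        } finally {
            lock.readLock().unlock();
        }
    }

    boolean isClosed() {
        lock.readLock().lock();
        try {
            return closed;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        CloseWithoutLockSketch manager = new CloseWithoutLockSketch();
        manager.startInitialization();
        manager.close();
        System.out.println("initialized=" + manager.isInitialized()
                + " closed=" + manager.isClosed());
    }
}
```

   Even with close entered before the initializer reaches the lock, both 
threads complete, because the joining thread is not holding anything the 
initializer needs.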
   
   ### Committer Checklist (excluded from commit message)
   - [ ] Verify design and implementation 
   - [ ] Verify test coverage and CI build status
   - [ ] Verify documentation (including upgrade notes)
   

