anshul35 opened a new pull request, #17492: URL: https://github.com/apache/kafka/pull/17492
Issue Details:

Inside TopicBasedRemoteLogMetadataManager::close, one thread (t1) calls join on initializationThread after taking the writeLock on the "lock" object, so t1 waits for initializationThread to complete. Internally, initializationThread also takes the writeLock on the same "lock" object. This can cause a deadlock in the following situation:

1. initializationThread is started.
2. close is invoked from a separate thread, but that thread has not yet been scheduled by the OS.
3. At line 430, initializationThread is preempted and the OS starts running the close thread. close takes the writeLock and invokes join on initializationThread.
4. The OS schedules initializationThread again, and at line 433 this thread also tries to take the writeLock. Since the writeLock is already held by the close thread, each thread waits on the other: initializationThread waits for close to release the writeLock, while the close thread waits for initializationThread to complete.

Fix Details:

Ideally, close should start its processing only if initialization has either not yet started or has already completed. Similarly, the initialization thread should not start any processing if another thread has invoked close. This can be achieved by taking the writeLock before starting either close or initialization, so that one happens before the other.

Case 1: close() is invoked after the initialization thread has completed. In this case, we can close all the resources and be done with the close() invocation.

Case 2: close() is invoked while the initialization thread is running. In this case, the thread invoking close() waits to get the writeLock, i.e. until the initialization thread is complete.

Case 3: close() is invoked before initializationThread starts. In this case, we set closing to true and are done with the close() invocation. When initialization starts, it acquires the writeLock and then reads the closing instance variable.
Based on that, it won't enter the while loop and will simply exit.

### Committer Checklist (excluded from commit message)
- [ ] Verify design and implementation
- [ ] Verify test coverage and CI build status
- [ ] Verify documentation (including upgrade notes)
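
The ordering described in the three cases can be illustrated with a minimal sketch. This is not the actual patch: the class `InitCloseSketch` and its methods are hypothetical stand-ins, the real initialization loop is replaced by a single flag assignment, and the sketch assumes that nothing joins the initialization thread while holding the write lock.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-in for the manager; names are illustrative, not Kafka's code.
class InitCloseSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final Thread initializationThread = new Thread(this::initialize, "init-sketch");
    private boolean closing = false;      // guarded by lock
    private boolean initialized = false;  // guarded by lock

    void start() {
        initializationThread.start();
    }

    private void initialize() {
        // Take the write lock before any initialization work, so that
        // initialization and close() are strictly ordered.
        lock.writeLock().lock();
        try {
            if (closing) {
                return; // Case 3: close() ran first, so skip initialization.
            }
            initialized = true; // placeholder for the real initialization work
        } finally {
            lock.writeLock().unlock();
        }
    }

    void close() {
        // Case 2: blocks here until a running initialization releases the lock.
        lock.writeLock().lock();
        try {
            closing = true; // resource cleanup for Case 1 would also go here
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Joins the initialization thread without holding the lock (assumption of this sketch).
    void awaitInitializationThread() {
        try {
            initializationThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    boolean isInitialized() {
        lock.readLock().lock();
        try {
            return initialized;
        } finally {
            lock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        // Case 3: close() before the initialization thread starts.
        InitCloseSketch closedFirst = new InitCloseSketch();
        closedFirst.close();
        closedFirst.start();
        closedFirst.awaitInitializationThread();
        System.out.println("closed first, initialized = " + closedFirst.isInitialized());

        // Case 1: close() after initialization has completed.
        InitCloseSketch initFirst = new InitCloseSketch();
        initFirst.start();
        initFirst.awaitInitializationThread();
        initFirst.close();
        System.out.println("init first, initialized = " + initFirst.isInitialized());
    }
}
```

Because both paths take the write lock before doing anything else, whichever thread gets the lock first runs to completion before the other one starts, and neither thread waits on the other while holding the lock.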
