junrao commented on pull request #8657:
URL: https://github.com/apache/kafka/pull/8657#issuecomment-681361030


   The following is my understand. The current PR introduces a new deadlock 
through the following path.
   
   path 1
   hold group lock -> joinPurgatory.tryCompleteElseWatch(delayedJoin) -> 
watchForOperation (now delayedJoin visible through other threads) -> 
operation.maybeTryComplete() -> hold delayedJoin.lock
   
   path 2
   delayedJoin.maybeTryComplete -> hold hold delayedJoin.lock -> tryComplete() 
-> hold group lock
   
   The existing code doesn't have this deadlock since (1) delayedJoin.lock is 
the same as the group lock held in the caller and (2) a delayed join operation 
is registered under the group key (so each time we check completeness for a 
group key, only one delayed join operation will be affected). By switching back 
to this code, we avoid the new deadlock.
   
   The existing code has a different deadlock issue that 
groupManager.storeGroup() in GroupCoordinator.onCompleteJoin may need to 
complete to other delayed operations and potentially hold a different group 
lock while already holding a group lock. This issue will be resolved if we 
complete those delayed operations due to groupManager.storeGroup() elsewhere 
without holding any locks.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to