junrao commented on a change in pull request #8657: URL: https://github.com/apache/kafka/pull/8657#discussion_r483966128
########## File path: core/src/test/scala/unit/kafka/coordinator/AbstractCoordinatorConcurrencyTest.scala ########## @@ -201,8 +201,8 @@ object AbstractCoordinatorConcurrencyTest { } } val producerRequestKeys = entriesPerPartition.keys.map(TopicPartitionOperationKey(_)).toSeq - watchKeys ++= producerRequestKeys producePurgatory.tryCompleteElseWatch(delayedProduce, producerRequestKeys) + watchKeys ++= producerRequestKeys Review comment: @chia7712 : Thanks for the explanation. I agree that it's a potential problem. Does using `tryLock` in `tryCompleteElseWatch()` lead us back to the previous issue that we could miss the opportunity to to complete an operation (fixed with KAFKA-6653)? Another possibly is that we hold the lock in delayed operation while adding the operation to watch list and do the final `safeTryComplete()` check. This way, when the delayed operation is exposed to another thread, every thread, including the caller, always first acquires the lock in delayed operation. This should avoid all potential deadlocks between `tryCompleteElseWatch()` and `checkAndComplete()`. What do you think? ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org