chia7712 commented on pull request #6915: URL: https://github.com/apache/kafka/pull/6915#issuecomment-626119452
@junrao thanks for the great explanation. It is indeed a long story of lock improvement in Kafka :)

> After it appends to the local log, it may call ReplicaManager.tryCompleteDelayedProduce(),

Just to double check: `ReplicaManager.tryCompleteDelayedProduce` does not exist in the trunk branch, and its replacement is `Partition#tryCompleteDelayedRequests`, right? (https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/cluster/Partition.scala#L850)

```scala
/**
 * Try to complete any pending requests. This should be called without holding the leaderIsrUpdateLock.
 */
private def tryCompleteDelayedRequests(): Unit = delayedOperations.checkAndCompleteAll()
```

> which may need to hold a different group lock (since the key of the operation is a topic partition on which many groups can reside), which can cause a deadlock.

It seems we should introduce a check that requires all group locks to be released before completing delayed produce operations at the topic-partition level.
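A minimal sketch of what the proposed check could look like, assuming a per-thread counter of held group locks. None of `GroupLockTracker`, `GroupLock`, or `PartitionSketch` exist in Kafka; they are hypothetical names for illustration only, and `completeAll` merely stands in for `delayedOperations.checkAndCompleteAll()`. The idea is to fail fast when a thread that still holds a group lock tries to complete partition-level delayed operations, instead of risking a lock-ordering deadlock against another group's lock.

```scala
import java.util.concurrent.locks.ReentrantLock

// Hypothetical helper, NOT an existing Kafka API: counts how many group locks
// the current thread holds (reentrant acquisitions are counted individually).
object GroupLockTracker {
  private val heldCount = new ThreadLocal[Int] {
    override def initialValue(): Int = 0
  }

  def acquired(): Unit = heldCount.set(heldCount.get + 1)
  def released(): Unit = heldCount.set(heldCount.get - 1)
  def heldByCurrentThread: Int = heldCount.get

  // The proposed check: reject the call while any group lock is still held.
  def ensureNoGroupLockHeld(): Unit =
    if (heldCount.get != 0)
      throw new IllegalStateException(
        s"Current thread still holds ${heldCount.get} group lock(s); " +
          "release them before completing partition-level delayed operations")
}

// Hypothetical stand-in for a consumer group's lock.
final class GroupLock(val groupId: String) {
  private val lock = new ReentrantLock()

  def inLock[T](body: => T): T = {
    lock.lock()
    GroupLockTracker.acquired()
    try body
    finally {
      // Update the counter before unlocking so it never over-reports.
      GroupLockTracker.released()
      lock.unlock()
    }
  }
}

// Hypothetical partition wrapper; `completeAll` returns the number of
// delayed operations it completed.
final class PartitionSketch(completeAll: () => Int) {
  def tryCompleteDelayedRequests(): Int = {
    GroupLockTracker.ensureNoGroupLockHeld()
    completeAll()
  }
}
```

Calling `tryCompleteDelayedRequests()` outside any `GroupLock.inLock` block runs the completion as usual, while calling it from inside one throws, which surfaces the unsafe call path in tests rather than as a rare production deadlock. A thread-local counter keeps the check cheap, at the cost of requiring every group-lock acquisition to go through the tracked wrapper.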