guozhangwang commented on a change in pull request #9441:
URL: https://github.com/apache/kafka/pull/9441#discussion_r624219744



##########
File path: core/src/main/scala/kafka/server/KafkaApis.scala
##########
@@ -279,30 +279,33 @@ class KafkaApis(val requestChannel: RequestChannel,
        new StopReplicaResponseData().setErrorCode(Errors.STALE_BROKER_EPOCH.code)))
    } else {
      val partitionStates = stopReplicaRequest.partitionStates().asScala
-      val (result, error) = replicaManager.stopReplicas(
-        request.context.correlationId,
-        stopReplicaRequest.controllerId,
-        stopReplicaRequest.controllerEpoch,
-        stopReplicaRequest.brokerEpoch,
-        partitionStates)
-      // Clear the coordinator caches in case we were the leader. In the case of a reassignment, we
-      // cannot rely on the LeaderAndIsr API for this since it is only sent to active replicas.
-      result.forKeyValue { (topicPartition, error) =>
-        if (error == Errors.NONE) {
-          if (topicPartition.topic == GROUP_METADATA_TOPIC_NAME
-              && partitionStates(topicPartition).deletePartition) {
-            groupCoordinator.onResignation(topicPartition.partition)
-          } else if (topicPartition.topic == TRANSACTION_STATE_TOPIC_NAME
-                     && partitionStates(topicPartition).deletePartition) {
+      def onStopReplicas(error: Errors, partitions: Map[TopicPartition, Errors]): Unit = {
+        // Clear the coordinator caches in case we were the leader. In the case of a reassignment, we
+        // cannot rely on the LeaderAndIsr API for this since it is only sent to active replicas.
+        partitions.forKeyValue { (topicPartition, partitionError) =>
+          if (partitionError == Errors.NONE) {
            val partitionState = partitionStates(topicPartition)
            val leaderEpoch = if (partitionState.leaderEpoch >= 0)
-                Some(partitionState.leaderEpoch)
+              Some(partitionState.leaderEpoch)
            else
              None
-            txnCoordinator.onResignation(topicPartition.partition, coordinatorEpoch = leaderEpoch)
+            if (topicPartition.topic == GROUP_METADATA_TOPIC_NAME
+              && partitionState.deletePartition) {
+              groupCoordinator.onResignation(topicPartition.partition, leaderEpoch)
+            } else if (topicPartition.topic == TRANSACTION_STATE_TOPIC_NAME
+              && partitionState.deletePartition) {
+              txnCoordinator.onResignation(topicPartition.partition, coordinatorEpoch = leaderEpoch)
+            }
           }
         }
       }
+      val (result, error) = replicaManager.stopReplicas(
+        request.context.correlationId,
+        stopReplicaRequest.controllerId,
+        stopReplicaRequest.controllerEpoch,
+        stopReplicaRequest.brokerEpoch,
+        partitionStates,
+        onStopReplicas)

Review comment:
      @hachikuji I think this is related to an earlier comment: https://github.com/apache/kafka/pull/9441#issuecomment-744788568. The concern is that `onResignation` is called from two places, `handleStopReplicaRequest` and `handleLeaderAndIsrRequest`. The latter is protected by `replicaStateChangeLock` but the former is not, so race conditions may still happen.
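   To make the concern concrete, here is a minimal sketch of the lock-based shape, i.e. what it could look like if the cleanup callback ran inside the same lock that guards the LeaderAndIsr path. The class name and the `stopReplicas` signature below are simplified illustrations, not the PR's actual `ReplicaManager` code:

```scala
import java.util.concurrent.locks.ReentrantLock
import org.apache.kafka.common.TopicPartition
import org.apache.kafka.common.protocol.Errors

// Hypothetical, heavily simplified stand-in for ReplicaManager, not the real class.
class LockedReplicaStopper {
  private val replicaStateChangeLock = new ReentrantLock()

  // The cleanup callback runs while the lock is held, so it cannot interleave
  // with another state change (e.g. a LeaderAndIsr handler) that takes the same lock.
  def stopReplicas(
    partitions: Set[TopicPartition],
    onStopReplicas: Map[TopicPartition, Errors] => Unit
  ): Map[TopicPartition, Errors] = {
    replicaStateChangeLock.lock()
    try {
      // Pretend every replica was stopped successfully.
      val result = partitions.map(tp => tp -> Errors.NONE).toMap
      onStopReplicas(result) // invoked inside the critical section
      result
    } finally {
      replicaStateChangeLock.unlock()
    }
  }
}
```

   Running the callback inside the critical section is what would keep the two resignation paths from interleaving.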
   
   The current approach seems to take a slightly different route: instead of always synchronizing under `replicaStateChangeLock`, we just compare-and-swap the `epochForPartitionId`, is that right?
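   For comparison, a minimal sketch of that compare-and-swap style fencing; the map name `epochForPartitionId` comes from the discussion, but the helper class and method below are illustrative assumptions rather than the coordinator's real code:

```scala
import java.util.concurrent.ConcurrentHashMap

// Illustrative fencing helper (names are assumptions), not the actual coordinator implementation.
class EpochFence {
  // partition id -> latest epoch observed for that partition
  private val epochForPartitionId = new ConcurrentHashMap[Integer, Integer]()

  /** Atomically records `epoch` if it is newer than what is stored;
    * returns false (the caller is fenced) if an equal or newer epoch is already there. */
  @annotation.tailrec
  final def tryAdvance(partitionId: Int, epoch: Int): Boolean = {
    val current = epochForPartitionId.get(partitionId)
    if (current == null) {
      if (epochForPartitionId.putIfAbsent(partitionId, epoch) == null) true
      else tryAdvance(partitionId, epoch)   // lost the race to another writer, retry
    } else if (epoch > current) {
      if (epochForPartitionId.replace(partitionId, current, epoch)) true
      else tryAdvance(partitionId, epoch)   // value changed underneath us, retry
    } else {
      false                                 // stale epoch: fenced out, do nothing
    }
  }
}
```

   With this shape, both the StopReplica and LeaderAndIsr paths would check the stored epoch before touching coordinator state, and whichever carries the older epoch is simply fenced out without needing `replicaStateChangeLock`.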
   



