jsancio commented on a change in pull request #9631:
URL: https://github.com/apache/kafka/pull/9631#discussion_r529105990



##########
File path: core/src/main/scala/kafka/cluster/Partition.scala
##########
@@ -947,9 +947,10 @@ class Partition(val topicPartition: TopicPartition,
                                   leaderEndOffset: Long,
                                   currentTimeMs: Long,
                                   maxLagMs: Long): Boolean = {
-    val followerReplica = getReplicaOrException(replicaId)
-    followerReplica.logEndOffset != leaderEndOffset &&
-      (currentTimeMs - followerReplica.lastCaughtUpTimeMs) > maxLagMs
+    getReplica(replicaId).fold(true) { followerReplica =>

Review comment:
       Thanks for the review!
   
   > This might be ok, but is unnecessary work since the controller will be doing that soon.
   
   According to some users and the report in KAFKA-9672, it looks like under 
some conditions the controller writes state to ZK in which the replica has 
been removed from the assignment but not from the ISR. I am unable to 
reproduce this or to see from the code how this can happen.
   
   I was thinking of defensively letting the leader also remove the replica 
from the ISR so that Kafka can recover from this case. If the leader is not 
allowed to do this, then `acks=all` produce requests will continue to fail.
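   
   To illustrate the idea, here is a minimal, self-contained sketch of the 
check in the diff above. The `Replica` case class, the `remoteReplicas` map and 
the `IsrSketch` object are simplified stand-ins assumed for illustration, not 
the actual `Partition` code. A replica that is no longer known to the leader 
is treated as out of sync, so it becomes a candidate for ISR shrinking:

```scala
object IsrSketch {
  // Hypothetical, simplified stand-in for the follower state tracked by the leader.
  final case class Replica(brokerId: Int, logEndOffset: Long, lastCaughtUpTimeMs: Long)

  // `remoteReplicas` is an assumed name for the replicas in the current assignment.
  def isFollowerOutOfSync(remoteReplicas: Map[Int, Replica],
                          replicaId: Int,
                          leaderEndOffset: Long,
                          currentTimeMs: Long,
                          maxLagMs: Long): Boolean =
    // fold(true): an unknown replica (removed from the assignment) counts as out of sync.
    remoteReplicas.get(replicaId).fold(true) { followerReplica =>
      followerReplica.logEndOffset != leaderEndOffset &&
        (currentTimeMs - followerReplica.lastCaughtUpTimeMs) > maxLagMs
    }
}
```

   With this shape, an ISR member that is missing from the assignment is 
reported as out of sync regardless of lag, while known replicas are still 
judged by log end offset and `maxLagMs` as before.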
   
   What do you think @junrao?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

