hachikuji commented on a change in pull request #11225:
URL: https://github.com/apache/kafka/pull/11225#discussion_r694260532



##########
File path: core/src/main/scala/kafka/server/ReplicaManager.scala
##########
@@ -2207,15 +2198,28 @@ class ReplicaManager(val config: KafkaConfig,
                     InitialFetchState(leaderEndPoint, partition.getLeaderEpoch, fetchOffset))
                 } else {
                   stateChangeLogger.info(
-                    s"Skipped the become-follower state change after marking its partition as " +
+                    "Skipped the become-follower state change after marking its partition as " +
                     s"follower for partition $tp with id ${info.topicId} and partition state $state."
                   )
                 }
             }
           }
           changedPartitions.add(partition)
         } catch {
-          case e: Throwable => stateChangeLogger.error(s"Unable to start fetching ${tp} " +
+          case e: KafkaStorageException =>
+            // If there is an offline log directory, a Partition object may have been created by
+            // `getOrCreatePartition()` before `createLogIfNotExists()` failed to create local replica due
+            // to KafkaStorageException. In this case `ReplicaManager.allPartitions` will map this topic-partition
+            // to an empty Partition object. We need to map this topic-partition to OfflinePartition instead.
+            markPartitionOffline(tp)

Review comment:
       Our test coverage seems a bit lacking. How much effort would it be to 
try and cover all these cases where the partition gets marked offline? As far 
as I can tell, there are no tests today which verify that the partition gets 
marked offline in any of these cases.

##########
File path: core/src/main/scala/kafka/server/ReplicaManager.scala
##########
@@ -2207,15 +2198,28 @@ class ReplicaManager(val config: KafkaConfig,
                     InitialFetchState(leaderEndPoint, partition.getLeaderEpoch, fetchOffset))
                 } else {
                   stateChangeLogger.info(
-                    s"Skipped the become-follower state change after marking its partition as " +
+                    "Skipped the become-follower state change after marking its partition as " +
                     s"follower for partition $tp with id ${info.topicId} and partition state $state."
                   )
                 }
             }
           }
           changedPartitions.add(partition)
         } catch {
-          case e: Throwable => stateChangeLogger.error(s"Unable to start fetching ${tp} " +
+          case e: KafkaStorageException =>
+            // If there is an offline log directory, a Partition object may have been created by
+            // `getOrCreatePartition()` before `createLogIfNotExists()` failed to create local replica due
+            // to KafkaStorageException. In this case `ReplicaManager.allPartitions` will map this topic-partition
+            // to an empty Partition object. We need to map this topic-partition to OfflinePartition instead.
+            markPartitionOffline(tp)
+            stateChangeLogger.error(s"Unable to start fetching $tp " +
+              s"with topic ID ${info.topicId} due to a storage error ${e.getMessage}", e)
+            replicaFetcherManager.addFailedPartition(tp)
+            error(s"Error while making broker the follower for partition $tp in dir " +

Review comment:
       Do you think this log message adds anything beyond what is in the state 
change message above?





