hachikuji commented on a change in pull request #11225:
URL: https://github.com/apache/kafka/pull/11225#discussion_r694260532

##########
File path: core/src/main/scala/kafka/server/ReplicaManager.scala
##########
@@ -2207,15 +2198,28 @@ class ReplicaManager(val config: KafkaConfig,
                 InitialFetchState(leaderEndPoint, partition.getLeaderEpoch, fetchOffset))
             } else {
               stateChangeLogger.info(
-                s"Skipped the become-follower state change after marking its partition as " +
+                "Skipped the become-follower state change after marking its partition as " +
                 s"follower for partition $tp with id ${info.topicId} and partition state $state."
               )
             }
           }
         }
         changedPartitions.add(partition)
       } catch {
-        case e: Throwable => stateChangeLogger.error(s"Unable to start fetching ${tp} " +
+        case e: KafkaStorageException =>
+          // If there is an offline log directory, a Partition object may have been created by
+          // `getOrCreatePartition()` before `createLogIfNotExists()` failed to create local replica due
+          // to KafkaStorageException. In this case `ReplicaManager.allPartitions` will map this topic-partition
+          // to an empty Partition object. We need to map this topic-partition to OfflinePartition instead.
+          markPartitionOffline(tp)

Review comment:
Our test coverage seems a bit lacking. How much effort would it be to try to cover all these cases where the partition gets marked offline? As far as I can tell, there are no tests today which verify that the partition gets marked offline in any of these cases.

##########
File path: core/src/main/scala/kafka/server/ReplicaManager.scala
##########
@@ -2207,15 +2198,28 @@ class ReplicaManager(val config: KafkaConfig,
                 InitialFetchState(leaderEndPoint, partition.getLeaderEpoch, fetchOffset))
             } else {
               stateChangeLogger.info(
-                s"Skipped the become-follower state change after marking its partition as " +
+                "Skipped the become-follower state change after marking its partition as " +
                 s"follower for partition $tp with id ${info.topicId} and partition state $state."
               )
             }
           }
         }
         changedPartitions.add(partition)
       } catch {
-        case e: Throwable => stateChangeLogger.error(s"Unable to start fetching ${tp} " +
+        case e: KafkaStorageException =>
+          // If there is an offline log directory, a Partition object may have been created by
+          // `getOrCreatePartition()` before `createLogIfNotExists()` failed to create local replica due
+          // to KafkaStorageException. In this case `ReplicaManager.allPartitions` will map this topic-partition
+          // to an empty Partition object. We need to map this topic-partition to OfflinePartition instead.
+          markPartitionOffline(tp)
+          stateChangeLogger.error(s"Unable to start fetching $tp " +
+            s"with topic ID ${info.topicId} due to a storage error ${e.getMessage}", e)
+          replicaFetcherManager.addFailedPartition(tp)
+          error(s"Error while making broker the follower for partition $tp in dir " +

Review comment:
Do you think this log message adds anything beyond what is in the state change message above?
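
To make the coverage question in the first comment concrete, here is a rough sketch of the kind of assertion such a test might make. It is a self-contained toy model, not Kafka code: `ToyReplicaManager`, `StorageException`, `PartitionState`, and `OfflineMarkingCheck` are hypothetical names invented for illustration, and a real test would presumably go through `ReplicaManager` with a log directory that fails on creation. The sketch only assumes the behavior described in the diff's code comment: the partition entry exists before log creation fails, so the storage-error handler has to remap it to an offline state and record it as failed.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-ins for illustration only; not Kafka's real classes.
sealed trait PartitionState
case object Online extends PartitionState
case object Offline extends PartitionState

final class StorageException(msg: String) extends RuntimeException(msg)

// `createLog` is an injected hook so a test can simulate an offline log directory.
final class ToyReplicaManager(createLog: String => Unit) {
  private val partitions = mutable.Map.empty[String, PartitionState]
  private val failed = mutable.Set.empty[String]

  def state(tp: String): Option[PartitionState] = partitions.get(tp)
  def failedPartitions: Set[String] = failed.toSet

  def makeFollower(tp: String): Unit = {
    // The entry is registered before the log is created, mirroring the
    // scenario described in the diff's code comment.
    partitions.getOrElseUpdate(tp, Online)
    try createLog(tp)
    catch {
      case _: StorageException =>
        // Remap the stale entry to Offline and remember the failure.
        partitions.update(tp, Offline)
        failed += tp
    }
  }
}

object OfflineMarkingCheck {
  def main(args: Array[String]): Unit = {
    // Log creation fails only for "topic-0", simulating one bad directory.
    val manager = new ToyReplicaManager(tp =>
      if (tp == "topic-0") throw new StorageException("log dir offline"))

    manager.makeFollower("topic-0")
    manager.makeFollower("topic-1")

    assert(manager.state("topic-0").contains(Offline))
    assert(manager.failedPartitions == Set("topic-0"))
    assert(manager.state("topic-1").contains(Online))
  }
}
```

The same three checks (state is offline, partition is recorded as failed, unaffected partitions stay online) could be repeated for each code path that marks a partition offline.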