Kamal Chandraprakash created KAFKA-19523: --------------------------------------------
Summary: Gracefully handle error while building remoteLogAuxState
Key: KAFKA-19523
URL: https://issues.apache.org/jira/browse/KAFKA-19523
Project: Kafka
Issue Type: Task
Reporter: Kamal Chandraprakash
Assignee: Kamal Chandraprakash

Improve the error handling while building the remote-log-auxiliary state when a follower node with an empty disk begins to synchronise with the leader.

If the topic has remote storage enabled, the ReplicaFetcherThread attempts to build the remote-log-auxiliary state. Note that building the remote-log-auxiliary state is triggered only when the leader-log-start-offset is non-zero and differs from the leader-local-log-start-offset.

When the LeaderAndISR request is received, ReplicaManager#becomeLeaderOrFollower first invokes 'makeFollowers' and only then calls RemoteLogManager#onLeadershipChange. As a result, when the ReplicaFetcherThread invokes RemoteLogManager#fetchRemoteLogSegmentMetadata, the partition may not have been initialized yet.

Introducing a new RetriableRemoteStorageException would require a KIP because it is a public API change, so instead wrap the IllegalStateException in a RemoteStorageException to handle the error gracefully.

stacktrace:
{code}
[2025-07-19 19:15:47,915] ERROR [ReplicaFetcher replicaId=3, leaderId=2, fetcherId=0] Error building remote log auxiliary state for orange-0 (kafka.server.ReplicaFetcherThread)
java.lang.IllegalStateException: This instance is in invalid state, initialized: false close: false
	at org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager.ensureInitializedAndNotClosed(TopicBasedRemoteLogMetadataManager.java:569) ~[kafka-storage-4.2.0-SNAPSHOT.jar:?]
	at org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager.remoteLogSegmentMetadata(TopicBasedRemoteLogMetadataManager.java:221) ~[kafka-storage-4.2.0-SNAPSHOT.jar:?]
	at org.apache.kafka.server.log.remote.storage.RemoteLogManager.fetchRemoteLogSegmentMetadata(RemoteLogManager.java:606) ~[kafka-storage-4.2.0-SNAPSHOT.jar:?]
	at kafka.server.TierStateMachine.buildRemoteLogAuxState(TierStateMachine.java:233) ~[kafka_2.13-4.2.0-SNAPSHOT.jar:?]
	at kafka.server.TierStateMachine.start(TierStateMachine.java:114) ~[kafka_2.13-4.2.0-SNAPSHOT.jar:?]
	at kafka.server.AbstractFetcherThread.handleOffsetsMovedToTieredStorage(AbstractFetcherThread.scala:785) ~[kafka_2.13-4.2.0-SNAPSHOT.jar:?]
	at kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$7(AbstractFetcherThread.scala:434) ~[kafka_2.13-4.2.0-SNAPSHOT.jar:?]
	at scala.Option.foreach(Option.scala:437) ~[scala-library-2.13.16.jar:?]
	at kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$6(AbstractFetcherThread.scala:342) ~[kafka_2.13-4.2.0-SNAPSHOT.jar:?]
	at kafka.server.AbstractFetcherThread.$anonfun$processFetchRequest$6$adapted(AbstractFetcherThread.scala:341) ~[kafka_2.13-4.2.0-SNAPSHOT.jar:?]
	at scala.collection.convert.JavaCollectionWrappers$JMapWrapperLike.foreachEntry(JavaCollectionWrappers.scala:430) ~[scala-library-2.13.16.jar:?]
	at scala.collection.convert.JavaCollectionWrappers$JMapWrapperLike.foreachEntry$(JavaCollectionWrappers.scala:426) ~[scala-library-2.13.16.jar:?]
	at scala.collection.convert.JavaCollectionWrappers$AbstractJMapWrapper.foreachEntry(JavaCollectionWrappers.scala:344) ~[scala-library-2.13.16.jar:?]
	at kafka.server.AbstractFetcherThread.processFetchRequest(AbstractFetcherThread.scala:341) ~[kafka_2.13-4.2.0-SNAPSHOT.jar:?]
	at kafka.server.AbstractFetcherThread.$anonfun$maybeFetch$3(AbstractFetcherThread.scala:137) ~[kafka_2.13-4.2.0-SNAPSHOT.jar:?]
	at java.base/java.util.Optional.ifPresent(Optional.java:178) [?:?]
	at kafka.server.AbstractFetcherThread.maybeFetch(AbstractFetcherThread.scala:136) [kafka_2.13-4.2.0-SNAPSHOT.jar:?]
	at kafka.server.AbstractFetcherThread.doWork(AbstractFetcherThread.scala:117) [kafka_2.13-4.2.0-SNAPSHOT.jar:?]
	at kafka.server.ReplicaFetcherThread.doWork(ReplicaFetcherThread.scala:96) [kafka_2.13-4.2.0-SNAPSHOT.jar:?]
	at org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:136) [kafka-server-common-4.2.0-SNAPSHOT.jar:?]
{code}

The exception is thrown repeatedly until RemoteLogMetadataManager#isReady(topicIdPartition) returns true. Since this is a retriable error, it should be handled gracefully.
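A minimal Java sketch of the proposed handling, under stated assumptions: every name below (AuxStateRetrySketch, shouldBuildAuxState, fetchMetadata, tryBuildAuxState) is hypothetical, and RemoteStorageException is a local stand-in for Kafka's exception of the same name, not the real class. It illustrates three points from the description: the precondition for building the remote-log-auxiliary state, wrapping the IllegalStateException raised by a not-yet-ready metadata manager in a RemoteStorageException, and a caller that treats that exception as retriable instead of logging an ERROR with a full stack trace on every fetch.

```java
import java.util.Optional;
import java.util.function.Supplier;

/**
 * Sketch only: hypothetical names, not Kafka's actual classes or methods.
 */
public class AuxStateRetrySketch {

    // Local stand-in for Kafka's RemoteStorageException, assumed here to be a checked exception.
    static class RemoteStorageException extends Exception {
        RemoteStorageException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Precondition from the issue: build the aux state only when the leader log-start-offset
    // is non-zero and differs from the leader local-log-start-offset.
    public static boolean shouldBuildAuxState(long leaderLogStartOffset, long leaderLocalLogStartOffset) {
        return leaderLogStartOffset != 0 && leaderLogStartOffset != leaderLocalLogStartOffset;
    }

    // Hypothetical metadata lookup wrapper: the supplier may throw IllegalStateException while the
    // metadata manager is not ready; that exception is wrapped in a RemoteStorageException.
    public static <T> T fetchMetadata(Supplier<T> lookup) throws RemoteStorageException {
        try {
            return lookup.get();
        } catch (IllegalStateException e) {
            throw new RemoteStorageException("Remote log metadata is not ready yet, retry later", e);
        }
    }

    // Hypothetical fetcher-side handling: a RemoteStorageException means "not ready yet",
    // so return empty and let the caller retry on a later fetch iteration.
    public static Optional<Long> tryBuildAuxState(Supplier<Long> lookup) {
        try {
            return Optional.of(fetchMetadata(lookup));
        } catch (RemoteStorageException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        int[] attempts = {0};
        // Simulate a metadata manager that becomes ready on the third lookup.
        Supplier<Long> lookup = () -> {
            attempts[0]++;
            if (attempts[0] < 3) {
                throw new IllegalStateException("This instance is in invalid state, initialized: false close: false");
            }
            return 42L;
        };
        Optional<Long> result = Optional.empty();
        // Immediate retry for demonstration only; no backoff is applied here.
        while (!result.isPresent()) {
            result = tryBuildAuxState(lookup);
        }
        System.out.println("built after " + attempts[0] + " attempts, offset " + result.get());
    }
}
```

Running main prints `built after 3 attempts, offset 42`. The tight retry loop is only for demonstration; a real fetcher would more plausibly back off the partition before the next attempt.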