[ https://issues.apache.org/jira/browse/KAFKA-13855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17527797#comment-17527797 ]
Haruki Okada commented on KAFKA-13855:
--------------------------------------

I guess that's the same cause as https://issues.apache.org/jira/browse/KAFKA-13403

> FileNotFoundException: Error while rolling log segment for topic partition in dir
> ---------------------------------------------------------------------------------
>
>                 Key: KAFKA-13855
>                 URL: https://issues.apache.org/jira/browse/KAFKA-13855
>             Project: Kafka
>          Issue Type: Bug
>          Components: log
>    Affects Versions: 2.6.1
>            Reporter: Sergey Ivanov
>            Priority: Major
>
> Hello,
> We faced an issue where one of the Kafka brokers in the cluster failed with
> an exception and restarted:
>
> {code:java}
> [2022-04-13T09:51:44,563][ERROR][category=kafka.server.LogDirFailureChannel] Error while rolling log segment for prod_data_topic-7 in dir /var/opt/kafka/data/1
> java.io.FileNotFoundException: /var/opt/kafka/data/1/prod_data_topic-7/00000000000026872377.index (No such file or directory)
>     at java.base/java.io.RandomAccessFile.open0(Native Method)
>     at java.base/java.io.RandomAccessFile.open(Unknown Source)
>     at java.base/java.io.RandomAccessFile.<init>(Unknown Source)
>     at java.base/java.io.RandomAccessFile.<init>(Unknown Source)
>     at kafka.log.AbstractIndex.$anonfun$resize$1(AbstractIndex.scala:183)
>     at kafka.log.AbstractIndex.resize(AbstractIndex.scala:176)
>     at kafka.log.AbstractIndex.$anonfun$trimToValidSize$1(AbstractIndex.scala:242)
>     at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:242)
>     at kafka.log.LogSegment.onBecomeInactiveSegment(LogSegment.scala:508)
>     at kafka.log.Log.$anonfun$roll$8(Log.scala:1916)
>     at kafka.log.Log.$anonfun$roll$2(Log.scala:1916)
>     at kafka.log.Log.roll(Log.scala:2349)
>     at kafka.log.Log.maybeRoll(Log.scala:1865)
>     at kafka.log.Log.$anonfun$append$2(Log.scala:1169)
>     at kafka.log.Log.append(Log.scala:2349)
>     at kafka.log.Log.appendAsLeader(Log.scala:1019)
>     at kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:984)
>     at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:972)
>     at kafka.server.ReplicaManager.$anonfun$appendToLocalLog$4(ReplicaManager.scala:883)
>     at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:273)
>     at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
>     at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
>     at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
>     at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
>     at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
>     at scala.collection.TraversableLike.map(TraversableLike.scala:273)
>     at scala.collection.TraversableLike.map$(TraversableLike.scala:266)
>     at scala.collection.AbstractTraversable.map(Traversable.scala:108)
>     at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:871)
>     at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:571)
>     at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:605)
>     at kafka.server.KafkaApis.handle(KafkaApis.scala:132)
>     at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:70)
>     at java.base/java.lang.Thread.run(Unknown Source)
> [2022-04-13T09:51:44,812][ERROR][category=kafka.log.LogManager] Shutdown broker because all log dirs in /var/opt/kafka/data/1 have failed {code}
> There is no other useful information in the logs, just one warning before
> this error:
> {code:java}
> [2022-04-13T09:51:44,720][WARN][category=kafka.server.ReplicaManager] [ReplicaManager broker=1] Broker 1 stopped fetcher for partitions __consumer_offsets-22,prod_data_topic-5,__consumer_offsets-30,
> ....
> prod_data_topic-0 and stopped moving logs for partitions because they are in the failed log directory /var/opt/kafka/data/1.
> [2022-04-13T09:51:44,720][WARN][category=kafka.log.LogManager] Stopping serving logs in dir /var/opt/kafka/data/1{code}
> The topic configuration is:
> {code:java}
> /opt/kafka $ ./bin/kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic prod_data_topic
> Topic: prod_data_topic  PartitionCount: 12  ReplicationFactor: 3  Configs: min.insync.replicas=2,segment.bytes=1073741824,max.message.bytes=15728640,retention.bytes=4294967296
>     Topic: prod_data_topic  Partition: 0   Leader: 3  Replicas: 3,1,2  Isr: 3,2,1
>     Topic: prod_data_topic  Partition: 1   Leader: 1  Replicas: 1,2,3  Isr: 3,2,1
>     Topic: prod_data_topic  Partition: 2   Leader: 2  Replicas: 2,3,1  Isr: 3,2,1
>     Topic: prod_data_topic  Partition: 3   Leader: 3  Replicas: 3,2,1  Isr: 3,2,1
>     Topic: prod_data_topic  Partition: 4   Leader: 1  Replicas: 1,3,2  Isr: 3,2,1
>     Topic: prod_data_topic  Partition: 5   Leader: 2  Replicas: 2,1,3  Isr: 3,2,1
>     Topic: prod_data_topic  Partition: 6   Leader: 3  Replicas: 3,2,1  Isr: 3,2,1
>     Topic: prod_data_topic  Partition: 7   Leader: 1  Replicas: 1,3,2  Isr: 3,2,1
>     Topic: prod_data_topic  Partition: 8   Leader: 2  Replicas: 2,1,3  Isr: 3,2,1
>     Topic: prod_data_topic  Partition: 9   Leader: 3  Replicas: 3,1,2  Isr: 3,2,1
>     Topic: prod_data_topic  Partition: 10  Leader: 1  Replicas: 1,2,3  Isr: 3,2,1
>     Topic: prod_data_topic  Partition: 11  Leader: 2  Replicas: 2,3,1  Isr: 3,2,1 {code}
> A day before this happened, we set the "retention.bytes" broker config to
> 5368709120 (the previous value was 6442450944), but we are not sure whether
> that had any effect.
> Current custom broker config:
>
> {code:java}
> log.retention.check.interval.ms=300000
> log.segment.bytes=1073741824
> log.retention.bytes=4294967296
> log.retention.hours=40
> message.max.bytes=15728640
> replica.lag.time.max.ms=30000
> min.insync.replicas=2
> delete.topic.enable=true
> replica.fetch.max.bytes=15728640
> default.replication.factor=3
> num.replica.fetchers=2
> {code}
>
> Could you please help investigate what could have caused this failure?
> We don't have any ideas: there was no cleanup of topics or files, and no
> other maintenance procedure was run on the disk.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)