Re: Kafka broker crash - broker id then changed
Possibly /tmp got cleaned up? It looks like one of the log files was deleted while a producer was writing messages to it:

On Thursday, 26 May 2016, cs user wrote:

> Hi All,
>
> We are running Kafka version 0.9.0.1. At the time the brokers crashed
> yesterday we were running a 2-node cluster; this has now been increased
> to 3.
>
> We are not specifying a broker id and are relying on Kafka generating one.
>
> After the brokers crashed (at exactly the same time) we left Kafka stopped
> for a while. After Kafka was started back up, the broker ids on both
> servers had been incremented: they were 1001/1002 and they flipped to
> 1003/1004. This seemed to cause some problems, as partitions were assigned
> to broker ids which the cluster believed had disappeared and so were not
> recoverable.
>
> We noticed that the broker ids are actually stored in:
>
> /tmp/kafka-logs/meta.properties
>
> So we set these back to what they were and restarted. Is there a reason why
> these would change?
>
> Below are the error logs from each server:
>
> [...]
>
> [2016-05-25 09:14:12,749] FATAL [Replica Manager on Broker 1002]: Halting
> due to unrecoverable I/O error while handling produce request:
> (kafka.server.ReplicaManager)
> kafka.common.KafkaStorageException: I/O exception in append to log
> '__consumer_offsets-0'
> [...]
> Caused by: java.io.FileNotFoundException:
> /tmp/kafka-logs/__consumer_offsets-0/.index (No such file or directory)
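If the data directory really lives under /tmp, periodic cleanup (systemd-tmpfiles or tmpwatch on many distros) can delete segment and index files out from under a running broker, and the auto-generated broker id stored in meta.properties disappears along with them. A minimal sketch of the server.properties settings that avoid this; the path and the explicit id below are illustrative, not taken from the reporter's setup:

    # server.properties (sketch)
    # Keep Kafka data out of /tmp so tmp cleanup cannot remove segment files,
    # index files, or meta.properties.
    log.dirs=/var/lib/kafka/data

    # Optionally pin the broker id instead of relying on auto-generation.
    # Auto-generated ids start above reserved.broker.max.id (1000 by default),
    # which is why the brokers came up as 1001/1002 and later 1003/1004.
    broker.id=1

With an explicit broker.id, a lost or regenerated meta.properties cannot silently change a broker's identity; if auto-generation is kept, moving log.dirs off /tmp is the important part.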
Kafka broker crash - broker id then changed
Hi All,

We are running Kafka version 0.9.0.1. At the time the brokers crashed yesterday we were running a 2-node cluster; this has now been increased to 3.

We are not specifying a broker id and are relying on Kafka generating one.

After the brokers crashed (at exactly the same time) we left Kafka stopped for a while. After Kafka was started back up, the broker ids on both servers had been incremented: they were 1001/1002 and they flipped to 1003/1004. This seemed to cause some problems, as partitions were assigned to broker ids which the cluster believed had disappeared and so were not recoverable.

We noticed that the broker ids are actually stored in:

/tmp/kafka-logs/meta.properties

So we set these back to what they were and restarted. Is there a reason why these would change?

Below are the error logs from each server:

Server 1

[2016-05-25 09:05:52,827] INFO [ReplicaFetcherManager on broker 1002] Removed fetcher for partitions [Topic1Heartbeat,1] (kafka.server.ReplicaFetcherManager)
[2016-05-25 09:05:52,831] INFO Completed load of log Topic1Heartbeat-1 with log end offset 0 (kafka.log.Log)
[2016-05-25 09:05:52,831] INFO Created log for partition [Topic1Heartbeat,1] in /tmp/kafka-logs with properties {compression.type -> producer, file.delete.delay.ms -> 6, max.message.bytes -> 112, min.insync.replicas -> 1, segment.jitter.ms -> 0, preallocate -> false, min.cleanable.dirty.ratio -> 0.5, index.interval.bytes -> 4096, unclean.leader.election.enable -> true, retention.bytes -> -1, delete.retention.ms -> 8640, cleanup.policy -> delete, flush.ms -> 9223372036854775807, segment.ms -> 60480, segment.bytes -> 1073741824, retention.ms -> 60480, segment.index.bytes -> 10485760, flush.messages -> 9223372036854775807}. (kafka.log.LogManager)
[2016-05-25 09:05:52,831] INFO Partition [Topic1Heartbeat,1] on broker 1002: No checkpointed highwatermark is found for partition [Topic1Heartbeat,1] (kafka.cluster.Partition)
[2016-05-25 09:14:12,189] INFO [GroupCoordinator 1002]: Preparing to restabilize group Topic1 with old generation 0 (kafka.coordinator.GroupCoordinator)
[2016-05-25 09:14:12,190] INFO [GroupCoordinator 1002]: Stabilized group Topic1 generation 1 (kafka.coordinator.GroupCoordinator)
[2016-05-25 09:14:12,195] INFO [GroupCoordinator 1002]: Assignment received from leader for group Topic1 for generation 1 (kafka.coordinator.GroupCoordinator)
[2016-05-25 09:14:12,749] FATAL [Replica Manager on Broker 1002]: Halting due to unrecoverable I/O error while handling produce request: (kafka.server.ReplicaManager)
kafka.common.KafkaStorageException: I/O exception in append to log '__consumer_offsets-0'
        at kafka.log.Log.append(Log.scala:318)
        at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:442)
        at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:428)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
        at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:268)
        at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:428)
        at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:401)
        at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:386)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
        at scala.collection.immutable.Map$Map1.foreach(Map.scala:116)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
        at scala.collection.AbstractTraversable.map(Traversable.scala:104)
        at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:386)
        at kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:322)
        at kafka.coordinator.GroupMetadataManager.store(GroupMetadataManager.scala:228)
        at kafka.coordinator.GroupCoordinator$$anonfun$handleCommitOffsets$9.apply(GroupCoordinator.scala:429)
        at kafka.coordinator.GroupCoordinator$$anonfun$handleCommitOffsets$9.apply(GroupCoordinator.scala:429)
        at scala.Option.foreach(Option.scala:257)
        at kafka.coordinator.GroupCoordinator.handleCommitOffsets(GroupCoordinator.scala:429)
        at kafka.server.KafkaApis.handleOffsetCommitRequest(KafkaApis.scala:280)
        at kafka.server.KafkaApis.handle(KafkaApis.scala:76)
        at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: /tmp/kafka-logs/__consumer_offsets-0/.index (No such file or directory)
        at java.io.RandomAccessFile.open0(Native Method)
        at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
        at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:277)
        at kafka.log.OffsetIndex$$anonfun$resize
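For reference, the meta.properties file mentioned above is a plain Java properties file; on 0.9.x it looks roughly like this (a sketch, assuming the default /tmp/kafka-logs data directory and the ids from this report):

    # /tmp/kafka-logs/meta.properties
    # Written by the broker on first start; on later starts the broker reuses
    # the id recorded here instead of generating a new one.
    version=0
    broker.id=1002

If that file (or the whole directory) is deleted, the broker looks brand new on the next start and requests a fresh auto-generated id, which is allocated from a ZooKeeper-backed sequence rather than by reusing old ids; that would explain the jump from 1001/1002 to 1003/1004. Editing broker.id back, as described above, restores the original identity.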