Re: Kafka broker crash - broker id then changed
Coming back to this issue: it looks like it was a result of the CentOS 7 systemd cleanup task on tmp, /usr/lib/tmpfiles.d/tmp.conf:

# This file is part of systemd.
#
# systemd is free software; you can redistribute it and/or modify it
# under the terms of the GNU Lesser General Public License as published by
# the Free Software Foundation; either version 2.1 of the License, or
# (at your option) any later version.

# See tmpfiles.d(5) for details

# Clear tmp directories separately, to make them easier to override
v /tmp 1777 root root 10d
v /var/tmp 1777 root root 30d

# Exclude namespace mountpoints created with PrivateTmp=yes
x /tmp/systemd-private-%b-*
X /tmp/systemd-private-%b-*/tmp
x /var/tmp/systemd-private-%b-*
X /var/tmp/systemd-private-%b-*/tmp

Cheers!

On Thu, May 26, 2016 at 9:27 AM, cs user wrote:
> Hi Ben,
>
> Thanks for responding. I can't imagine what would have cleaned temp up at
> that time. I don't think we have anything in place to do that, and it also
> appears to have happened to both machines at the same time.
>
> It also appears that the other topics were not affected; there were still
> other files present in temp.
>
> Thanks!
>
> On Thu, May 26, 2016 at 9:19 AM, Ben Davison wrote:
>
>> Possibly tmp got cleaned up?
>>
>> Seems like one of the log files was deleted while a producer was writing
>> messages to it:
>>
>> On Thursday, 26 May 2016, cs user wrote:
>>
>>> Hi All,
>>>
>>> We are running Kafka version 0.9.0.1. At the time the brokers crashed
>>> yesterday we were running a 2 node cluster; this has now been increased
>>> to 3.
>>>
>>> We are not specifying a broker id and are relying on Kafka generating one.
>>>
>>> After the brokers crashed (at exactly the same time) we left Kafka
>>> stopped for a while. After Kafka was started back up, the broker ids on
>>> both servers were incremented: they were 1001/1002 and they flipped to
>>> 1003/1004. This seemed to cause some problems, as partitions were
>>> assigned to broker ids which it believed had disappeared and so were not
>>> recoverable.
>>>
>>> We noticed that the broker ids are actually stored in:
>>>
>>> /tmp/kafka-logs/meta.properties
>>>
>>> So we set these back to what they were and restarted. Is there a reason
>>> why these would change?
>>>
>>> Below are the error logs from each server:
>>>
>>> Server 1
>>>
>>> [2016-05-25 09:05:52,827] INFO [ReplicaFetcherManager on broker 1002]
>>> Removed fetcher for partitions [Topic1Heartbeat,1]
>>> (kafka.server.ReplicaFetcherManager)
>>> [2016-05-25 09:05:52,831] INFO Completed load of log Topic1Heartbeat-1
>>> with log end offset 0 (kafka.log.Log)
>>> [2016-05-25 09:05:52,831] INFO Created log for partition
>>> [Topic1Heartbeat,1] in /tmp/kafka-logs with properties
>>> {compression.type -> producer, file.delete.delay.ms -> 6,
>>> max.message.bytes -> 112, min.insync.replicas -> 1,
>>> segment.jitter.ms -> 0, preallocate -> false,
>>> min.cleanable.dirty.ratio -> 0.5, index.interval.bytes -> 4096,
>>> unclean.leader.election.enable -> true, retention.bytes -> -1,
>>> delete.retention.ms -> 8640, cleanup.policy -> delete,
>>> flush.ms -> 9223372036854775807, segment.ms -> 60480,
>>> segment.bytes -> 1073741824, retention.ms -> 60480,
>>> segment.index.bytes -> 10485760,
>>> flush.messages -> 9223372036854775807}. (kafka.log.LogManager)
>>> [2016-05-25 09:05:52,831] INFO Partition [Topic1Heartbeat,1] on broker
>>> 1002: No checkpointed highwatermark is found for partition
>>> [Topic1Heartbeat,1] (kafka.cluster.Partition)
>>> [2016-05-25 09:14:12,189] INFO [GroupCoordinator 1002]: Preparing to
>>> restabilize group Topic1 with old generation 0
>>> (kafka.coordinator.GroupCoordinator)
>>> [2016-05-25 09:14:12,190] INFO [GroupCoordinator 1002]: Stabilized group
>>> Topic1 generation 1 (kafka.coordinator.GroupCoordinator)
>>> [2016-05-25 09:14:12,195] INFO [GroupCoordinator 1002]: Assignment
>>> received from leader for group Topic1 for generation 1
>>> (kafka.coordinator.GroupCoordinator)
>>> [2016-05-25 09:14:12,749] FATAL [Replica Manager on Broker 1002]:
>>> Halting due to unrecoverable I/O error while handling produce request:
>>> (kafka.server.ReplicaManager)
>>> kafka.common.KafkaStorageException: I/O exception in append to log
>>> '__consumer_offsets-0'
>>>         at kafka.log.Log.append(Log.scala:318)
>>>         at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:442)
>>>         at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:428)
>>>         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>>>         at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:268)
>>>         at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:428)
>>>         at
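If the Kafka data has to stay under /tmp for a while longer, systemd-tmpfiles can be told to skip it with a drop-in rather than by editing the shipped tmp.conf. A minimal sketch, assuming the data still lives at /tmp/kafka-logs (the kafka.conf filename is just a suggested name):

```
# /etc/tmpfiles.d/kafka.conf
# 'x' = ignore this path (and everything below it) during cleanup,
# per tmpfiles.d(5); this overrides the aging rules in tmp.conf.
x /tmp/kafka-logs
```

That said, moving log.dirs off /tmp altogether is the more robust fix.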
Re: Kafka broker crash
/tmp is not a good location for storing files. It will get cleaned up periodically, depending on your Linux distribution.

Radu

On 22 June 2016 at 19:33, Misra, Rahul <rahul.mi...@altisource.com> wrote:
> Hi Madhukar,
>
> Thanks for your quick response. The path is "/tmp/kafka-logs/". But the
> servers have not been restarted at any time lately; the uptime for all 3
> servers is almost 67 days.
>
> Regards,
> Rahul Misra
>
> -----Original Message-----
> From: Madhukar Bharti [mailto:bhartimadhu...@gmail.com]
> Sent: Wednesday, June 22, 2016 8:37 PM
> To: users@kafka.apache.org
> Subject: Re: Kafka broker crash
>
> Hi Rahul,
>
> Is the path "/tmp/kafka-logs/" or "/temp/kafka-logs"?
>
> If the path is set under "/tmp/", the files may be deleted when the
> machine restarts, which is why it is throwing FileNotFoundException.
> You can change the file location to some other path and restart all
> brokers. This might fix the issue.
>
> Regards,
> Madhukar
>
> On Wed, Jun 22, 2016 at 1:40 PM, Misra, Rahul <rahul.mi...@altisource.com>
> wrote:
>
>> Hi,
>>
>> I'm facing a strange issue in my Kafka cluster. Could anybody please
>> help me with it? The issue is as follows:
>>
>> We have a 3 node Kafka cluster. We installed the ZooKeeper separately
>> and have pointed the brokers to it. The ZooKeeper is also 3 node, but
>> for our POC setup the ZooKeeper nodes are on the same machines as the
>> Kafka brokers.
>>
>> While receiving messages from an existing topic using a new groupId, 2
>> of the brokers crashed with the same FATAL errors:
>>
>> <<<<<<<<<<<<< [server 2 logs] >>>>>>>>>>>>>>>
>>
>> [2016-06-21 23:09:14,697] INFO [GroupCoordinator 1]: Stabilized group
>> pocTestNew11 generation 1 (kafka.coordinator.GroupCoordinator)
>> [2016-06-21 23:09:15,006] INFO [GroupCoordinator 1]: Assignment
>> received from leader for group pocTestNew11 for generation 1
>> (kafka.coordinator.GroupCoordinator)
>> [2016-06-21 23:09:20,335] FATAL [Replica Manager on Broker 1]: Halting
>> due to unrecoverable I/O error while handling produce request:
>> (kafka.server.ReplicaManager)
>> kafka.common.KafkaStorageException: I/O exception in append to log
>> '__consumer_offsets-4'
>>         at kafka.log.Log.append(Log.scala:318)
>>         at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:442)
>>         at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:428)
>>         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>>         at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:268)
>>         at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:428)
>>         at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:401)
>>         at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:386)
>>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>>         at scala.collection.immutable.Map$Map1.foreach(Map.scala:116)
>>         at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>>         at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>>         at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:386)
>>         at kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:322)
>>         at kafka.coordinator.GroupMetadataManager.store(GroupMetadataManager.scala:228)
>>         at kafka.coordinator.GroupCoordinator$$anonfun$handleCommitOffsets$9.apply(GroupCoordinator.scala:429)
>>         at kafka.coordinator.GroupCoordinator$$anonfun$handleCommitOffsets$9.apply(GroupCoordinator.scala:429)
>>         at scala.Option.foreach(Option.scala:257)
>>         at kafka.coordinator.GroupCoordinator.handleCommitOffsets(GroupCoordinator.scala:429)
>>         at kafka.server.KafkaApis.handleOffsetCommitRequest(KafkaApis.scala:280)
>>         at kafka.server.KafkaApis.handle(KafkaApis.scala:76)
>>         at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
>>         at java.lang.Thread.run(Thr
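To make the "change the file location" advice concrete: a sketch of relocating the data and repointing the broker, assuming /var/lib/kafka-logs as the new home (the path and service name are illustrative, not from this thread):

```
# 1. Stop the broker (however it is managed on your hosts):
#      systemctl stop kafka
#
# 2. Move the data out of /tmp:
#      mkdir -p /var/lib/kafka-logs
#      mv /tmp/kafka-logs/* /var/lib/kafka-logs/
#
# 3. In config/server.properties, point the broker at the new location:
log.dirs=/var/lib/kafka-logs
#
# 4. Start the broker again and repeat on each node in turn.
```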
RE: Kafka broker crash
Hi Madhukar,

Thanks for your quick response. The path is "/tmp/kafka-logs/". But the servers have not been restarted at any time lately; the uptime for all 3 servers is almost 67 days.

Regards,
Rahul Misra

-----Original Message-----
From: Madhukar Bharti [mailto:bhartimadhu...@gmail.com]
Sent: Wednesday, June 22, 2016 8:37 PM
To: users@kafka.apache.org
Subject: Re: Kafka broker crash

Hi Rahul,

Is the path "/tmp/kafka-logs/" or "/temp/kafka-logs"?

If the path is set under "/tmp/", the files may be deleted when the machine restarts, which is why it is throwing FileNotFoundException. You can change the file location to some other path and restart all brokers. This might fix the issue.

Regards,
Madhukar

On Wed, Jun 22, 2016 at 1:40 PM, Misra, Rahul <rahul.mi...@altisource.com> wrote:
> Hi,
>
> I'm facing a strange issue in my Kafka cluster. Could anybody please
> help me with it? The issue is as follows:
>
> We have a 3 node Kafka cluster. We installed the ZooKeeper separately
> and have pointed the brokers to it. The ZooKeeper is also 3 node, but
> for our POC setup the ZooKeeper nodes are on the same machines as the
> Kafka brokers.
>
> While receiving messages from an existing topic using a new groupId, 2
> of the brokers crashed with the same FATAL errors:
>
> <<<<<<<<<<<<< [server 2 logs] >>>>>>>>>>>>>>>
>
> [2016-06-21 23:09:14,697] INFO [GroupCoordinator 1]: Stabilized group
> pocTestNew11 generation 1 (kafka.coordinator.GroupCoordinator)
> [2016-06-21 23:09:15,006] INFO [GroupCoordinator 1]: Assignment
> received from leader for group pocTestNew11 for generation 1
> (kafka.coordinator.GroupCoordinator)
> [2016-06-21 23:09:20,335] FATAL [Replica Manager on Broker 1]: Halting
> due to unrecoverable I/O error while handling produce request:
> (kafka.server.ReplicaManager)
> kafka.common.KafkaStorageException: I/O exception in append to log
> '__consumer_offsets-4'
>         at kafka.log.Log.append(Log.scala:318)
>         at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:442)
>         at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:428)
>         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>         at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:268)
>         at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:428)
>         at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:401)
>         at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:386)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>         at scala.collection.immutable.Map$Map1.foreach(Map.scala:116)
>         at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>         at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>         at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:386)
>         at kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:322)
>         at kafka.coordinator.GroupMetadataManager.store(GroupMetadataManager.scala:228)
>         at kafka.coordinator.GroupCoordinator$$anonfun$handleCommitOffsets$9.apply(GroupCoordinator.scala:429)
>         at kafka.coordinator.GroupCoordinator$$anonfun$handleCommitOffsets$9.apply(GroupCoordinator.scala:429)
>         at scala.Option.foreach(Option.scala:257)
>         at kafka.coordinator.GroupCoordinator.handleCommitOffsets(GroupCoordinator.scala:429)
>         at kafka.server.KafkaApis.handleOffsetCommitRequest(KafkaApis.scala:280)
>         at kafka.server.KafkaApis.handle(KafkaApis.scala:76)
>         at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.FileNotFoundException:
> /tmp/kafka-logs/__consumer_offsets-4/.index (No such file or directory)
>         at java.io.RandomAccessFile.open0(Native Method)
>         at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
>         at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
>         at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:277)
>         at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:276)
>         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>         at kafka.log.OffsetIndex.resize(OffsetIndex.scala:276)
>         at kafka.log.OffsetIn
Re: Kafka broker crash
Hi Rahul,

Is the path "/tmp/kafka-logs/" or "/temp/kafka-logs"?

If the path is set under "/tmp/", the files may be deleted when the machine restarts, which is why it is throwing FileNotFoundException. You can change the file location to some other path and restart all brokers. This might fix the issue.

Regards,
Madhukar

On Wed, Jun 22, 2016 at 1:40 PM, Misra, Rahul wrote:
> Hi,
>
> I'm facing a strange issue in my Kafka cluster. Could anybody please
> help me with it? The issue is as follows:
>
> We have a 3 node Kafka cluster. We installed the ZooKeeper separately
> and have pointed the brokers to it. The ZooKeeper is also 3 node, but
> for our POC setup the ZooKeeper nodes are on the same machines as the
> Kafka brokers.
>
> While receiving messages from an existing topic using a new groupId, 2
> of the brokers crashed with the same FATAL errors:
>
> <<<<<<<<<<<<< [server 2 logs] >>>>>>>>>>>>>>>
>
> [2016-06-21 23:09:14,697] INFO [GroupCoordinator 1]: Stabilized group
> pocTestNew11 generation 1 (kafka.coordinator.GroupCoordinator)
> [2016-06-21 23:09:15,006] INFO [GroupCoordinator 1]: Assignment received
> from leader for group pocTestNew11 for generation 1
> (kafka.coordinator.GroupCoordinator)
> [2016-06-21 23:09:20,335] FATAL [Replica Manager on Broker 1]: Halting
> due to unrecoverable I/O error while handling produce request:
> (kafka.server.ReplicaManager)
> kafka.common.KafkaStorageException: I/O exception in append to log
> '__consumer_offsets-4'
>         at kafka.log.Log.append(Log.scala:318)
>         at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:442)
>         at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:428)
>         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>         at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:268)
>         at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:428)
>         at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:401)
>         at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:386)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>         at scala.collection.immutable.Map$Map1.foreach(Map.scala:116)
>         at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>         at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>         at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:386)
>         at kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:322)
>         at kafka.coordinator.GroupMetadataManager.store(GroupMetadataManager.scala:228)
>         at kafka.coordinator.GroupCoordinator$$anonfun$handleCommitOffsets$9.apply(GroupCoordinator.scala:429)
>         at kafka.coordinator.GroupCoordinator$$anonfun$handleCommitOffsets$9.apply(GroupCoordinator.scala:429)
>         at scala.Option.foreach(Option.scala:257)
>         at kafka.coordinator.GroupCoordinator.handleCommitOffsets(GroupCoordinator.scala:429)
>         at kafka.server.KafkaApis.handleOffsetCommitRequest(KafkaApis.scala:280)
>         at kafka.server.KafkaApis.handle(KafkaApis.scala:76)
>         at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
>         at java.lang.Thread.run(Thread.java:745)
> Caused by: java.io.FileNotFoundException:
> /tmp/kafka-logs/__consumer_offsets-4/.index (No such file or directory)
>         at java.io.RandomAccessFile.open0(Native Method)
>         at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
>         at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
>         at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:277)
>         at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:276)
>         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>         at kafka.log.OffsetIndex.resize(OffsetIndex.scala:276)
>         at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(OffsetIndex.scala:265)
>         at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
>         at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
>         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>         at kafka.log.OffsetIndex.trimToValidSize(OffsetIndex.scala:264)
>         at kafka.log.Log.roll(Log.scala:627)
>         at kafka.log.Log.maybeRoll(Log.scala:602)
>         at kafka.log.Log.append(Log.scala:357)
>
> --
>
> <<<<<<<<<<<<< [server 3 logs] >>>>>>>>>>>>>>>
>
> [2016-06-21 23:08:49,796] FATAL [ReplicaFetcherThread-0-0], Disk error
> while replicating data. (kafka.server.ReplicaFe
Kafka broker crash
Hi,

I'm facing a strange issue in my Kafka cluster. Could anybody please help me with it? The issue is as follows:

We have a 3 node Kafka cluster. We installed the ZooKeeper separately and have pointed the brokers to it. The ZooKeeper is also 3 node, but for our POC setup the ZooKeeper nodes are on the same machines as the Kafka brokers.

While receiving messages from an existing topic using a new groupId, 2 of the brokers crashed with the same FATAL errors:

<<<<<<<<<<<<< [server 2 logs] >>>>>>>>>>>>>>>

[2016-06-21 23:09:14,697] INFO [GroupCoordinator 1]: Stabilized group pocTestNew11 generation 1 (kafka.coordinator.GroupCoordinator)
[2016-06-21 23:09:15,006] INFO [GroupCoordinator 1]: Assignment received from leader for group pocTestNew11 for generation 1 (kafka.coordinator.GroupCoordinator)
[2016-06-21 23:09:20,335] FATAL [Replica Manager on Broker 1]: Halting due to unrecoverable I/O error while handling produce request: (kafka.server.ReplicaManager)
kafka.common.KafkaStorageException: I/O exception in append to log '__consumer_offsets-4'
        at kafka.log.Log.append(Log.scala:318)
        at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:442)
        at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:428)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
        at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:268)
        at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:428)
        at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:401)
        at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:386)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
        at scala.collection.immutable.Map$Map1.foreach(Map.scala:116)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
        at scala.collection.AbstractTraversable.map(Traversable.scala:104)
        at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:386)
        at kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:322)
        at kafka.coordinator.GroupMetadataManager.store(GroupMetadataManager.scala:228)
        at kafka.coordinator.GroupCoordinator$$anonfun$handleCommitOffsets$9.apply(GroupCoordinator.scala:429)
        at kafka.coordinator.GroupCoordinator$$anonfun$handleCommitOffsets$9.apply(GroupCoordinator.scala:429)
        at scala.Option.foreach(Option.scala:257)
        at kafka.coordinator.GroupCoordinator.handleCommitOffsets(GroupCoordinator.scala:429)
        at kafka.server.KafkaApis.handleOffsetCommitRequest(KafkaApis.scala:280)
        at kafka.server.KafkaApis.handle(KafkaApis.scala:76)
        at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: /tmp/kafka-logs/__consumer_offsets-4/.index (No such file or directory)
        at java.io.RandomAccessFile.open0(Native Method)
        at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
        at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:277)
        at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:276)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
        at kafka.log.OffsetIndex.resize(OffsetIndex.scala:276)
        at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(OffsetIndex.scala:265)
        at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
        at kafka.log.OffsetIndex$$anonfun$trimToValidSize$1.apply(OffsetIndex.scala:265)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
        at kafka.log.OffsetIndex.trimToValidSize(OffsetIndex.scala:264)
        at kafka.log.Log.roll(Log.scala:627)
        at kafka.log.Log.maybeRoll(Log.scala:602)
        at kafka.log.Log.append(Log.scala:357)

--

<<<<<<<<<<<<< [server 3 logs] >>>>>>>>>>>>>>>

[2016-06-21 23:08:49,796] FATAL [ReplicaFetcherThread-0-0], Disk error while replicating data. (kafka.server.ReplicaFetcherThread)
kafka.common.KafkaStorageException: I/O exception in append to log '__consumer_offsets-4'
        at kafka.log.Log.append(Log.scala:318)
        at kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:113)
        at kafka.server.ReplicaFetcherThread.processPartitionData(ReplicaFetcherThread.scala:42)
        at kafka.server.AbstractFetcherThread$$anonfun$processFetchRequest$2$$anonfun$apply$mcV$sp$1$$anonfun$apply$2.apply(AbstractFetcherThread.scala:138)
        at
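The "Caused by: java.io.FileNotFoundException ... .index" underneath the KafkaStorageException is consistent with the partition directory being removed on disk while the broker still held the files open: appends through the already-open descriptors keep working, but the moment the broker has to reopen the index by path (as OffsetIndex.resize does on a log roll), the open fails. A small Python sketch of that failure mode, with hypothetical file names (this is not Kafka code):

```python
import os
import shutil
import tempfile

# Stand-in for a partition directory like /tmp/kafka-logs/__consumer_offsets-4
logs_dir = tempfile.mkdtemp()
partition_dir = os.path.join(logs_dir, "__consumer_offsets-4")
os.makedirs(partition_dir)
index_path = os.path.join(partition_dir, "00000000000000000000.index")

# The "broker" opens the index file and keeps the descriptor open.
f = open(index_path, "wb")
f.write(b"entries")

# A tmp-cleanup job removes the whole partition directory underneath it.
shutil.rmtree(partition_dir)

# Writes through the already-open descriptor still succeed: POSIX keeps
# the unlinked inode alive while a process holds it open.
f.write(b"more entries")
f.flush()

# But reopening the index by path now fails, because the parent directory
# is gone -- the Java side surfaces this as FileNotFoundException.
try:
    open(index_path, "ab").close()
    outcome = "reopened"
except FileNotFoundError:
    outcome = "missing"

print(outcome)  # -> missing
f.close()
shutil.rmtree(logs_dir, ignore_errors=True)
```

The demo only holds on POSIX systems, where unlinking an open file is allowed; it mirrors why both brokers ran fine for weeks and then halted at the next log roll.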
Re: Kafka broker crash - broker id then changed
Hi Ben,

Thanks for responding. I can't imagine what would have cleaned temp up at that time. I don't think we have anything in place to do that, and it also appears to have happened to both machines at the same time.

It also appears that the other topics were not affected; there were still other files present in temp.

Thanks!

On Thu, May 26, 2016 at 9:19 AM, Ben Davison wrote:
> Possibly tmp got cleaned up?
>
> Seems like one of the log files was deleted while a producer was writing
> messages to it:
>
> On Thursday, 26 May 2016, cs user wrote:
>
>> Hi All,
>>
>> We are running Kafka version 0.9.0.1. At the time the brokers crashed
>> yesterday we were running a 2 node cluster; this has now been increased
>> to 3.
>>
>> We are not specifying a broker id and are relying on Kafka generating one.
>>
>> After the brokers crashed (at exactly the same time) we left Kafka
>> stopped for a while. After Kafka was started back up, the broker ids on
>> both servers were incremented: they were 1001/1002 and they flipped to
>> 1003/1004. This seemed to cause some problems, as partitions were
>> assigned to broker ids which it believed had disappeared and so were not
>> recoverable.
>>
>> We noticed that the broker ids are actually stored in:
>>
>> /tmp/kafka-logs/meta.properties
>>
>> So we set these back to what they were and restarted. Is there a reason
>> why these would change?
>>
>> Below are the error logs from each server:
>>
>> Server 1
>>
>> [2016-05-25 09:05:52,827] INFO [ReplicaFetcherManager on broker 1002]
>> Removed fetcher for partitions [Topic1Heartbeat,1]
>> (kafka.server.ReplicaFetcherManager)
>> [2016-05-25 09:05:52,831] INFO Completed load of log Topic1Heartbeat-1
>> with log end offset 0 (kafka.log.Log)
>> [2016-05-25 09:05:52,831] INFO Created log for partition
>> [Topic1Heartbeat,1] in /tmp/kafka-logs with properties
>> {compression.type -> producer, file.delete.delay.ms -> 6,
>> max.message.bytes -> 112, min.insync.replicas -> 1,
>> segment.jitter.ms -> 0, preallocate -> false,
>> min.cleanable.dirty.ratio -> 0.5, index.interval.bytes -> 4096,
>> unclean.leader.election.enable -> true, retention.bytes -> -1,
>> delete.retention.ms -> 8640, cleanup.policy -> delete,
>> flush.ms -> 9223372036854775807, segment.ms -> 60480,
>> segment.bytes -> 1073741824, retention.ms -> 60480,
>> segment.index.bytes -> 10485760,
>> flush.messages -> 9223372036854775807}. (kafka.log.LogManager)
>> [2016-05-25 09:05:52,831] INFO Partition [Topic1Heartbeat,1] on broker
>> 1002: No checkpointed highwatermark is found for partition
>> [Topic1Heartbeat,1] (kafka.cluster.Partition)
>> [2016-05-25 09:14:12,189] INFO [GroupCoordinator 1002]: Preparing to
>> restabilize group Topic1 with old generation 0
>> (kafka.coordinator.GroupCoordinator)
>> [2016-05-25 09:14:12,190] INFO [GroupCoordinator 1002]: Stabilized group
>> Topic1 generation 1 (kafka.coordinator.GroupCoordinator)
>> [2016-05-25 09:14:12,195] INFO [GroupCoordinator 1002]: Assignment
>> received from leader for group Topic1 for generation 1
>> (kafka.coordinator.GroupCoordinator)
>> [2016-05-25 09:14:12,749] FATAL [Replica Manager on Broker 1002]:
>> Halting due to unrecoverable I/O error while handling produce request:
>> (kafka.server.ReplicaManager)
>> kafka.common.KafkaStorageException: I/O exception in append to log
>> '__consumer_offsets-0'
>>         at kafka.log.Log.append(Log.scala:318)
>>         at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:442)
>>         at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:428)
>>         at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
>>         at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:268)
>>         at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:428)
>>         at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:401)
>>         at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:386)
>>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>>         at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
>>         at scala.collection.immutable.Map$Map1.foreach(Map.scala:116)
>>         at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
>>         at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>>         at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:386)
>>         at kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:322)
>>         at kafka.coordinator.GroupMetadataManager.store(GroupMetadataManager.scala:228)
>>         at
Kafka broker crash - broker id then changed
Hi All,

We are running Kafka version 0.9.0.1. At the time the brokers crashed yesterday we were running a 2 node cluster; this has now been increased to 3.

We are not specifying a broker id and are relying on Kafka generating one.

After the brokers crashed (at exactly the same time) we left Kafka stopped for a while. After Kafka was started back up, the broker ids on both servers were incremented: they were 1001/1002 and they flipped to 1003/1004. This seemed to cause some problems, as partitions were assigned to broker ids which it believed had disappeared and so were not recoverable.

We noticed that the broker ids are actually stored in:

/tmp/kafka-logs/meta.properties

So we set these back to what they were and restarted. Is there a reason why these would change?

Below are the error logs from each server:

Server 1

[2016-05-25 09:05:52,827] INFO [ReplicaFetcherManager on broker 1002] Removed fetcher for partitions [Topic1Heartbeat,1] (kafka.server.ReplicaFetcherManager)
[2016-05-25 09:05:52,831] INFO Completed load of log Topic1Heartbeat-1 with log end offset 0 (kafka.log.Log)
[2016-05-25 09:05:52,831] INFO Created log for partition [Topic1Heartbeat,1] in /tmp/kafka-logs with properties {compression.type -> producer, file.delete.delay.ms -> 6, max.message.bytes -> 112, min.insync.replicas -> 1, segment.jitter.ms -> 0, preallocate -> false, min.cleanable.dirty.ratio -> 0.5, index.interval.bytes -> 4096, unclean.leader.election.enable -> true, retention.bytes -> -1, delete.retention.ms -> 8640, cleanup.policy -> delete, flush.ms -> 9223372036854775807, segment.ms -> 60480, segment.bytes -> 1073741824, retention.ms -> 60480, segment.index.bytes -> 10485760, flush.messages -> 9223372036854775807}. (kafka.log.LogManager)
[2016-05-25 09:05:52,831] INFO Partition [Topic1Heartbeat,1] on broker 1002: No checkpointed highwatermark is found for partition [Topic1Heartbeat,1] (kafka.cluster.Partition)
[2016-05-25 09:14:12,189] INFO [GroupCoordinator 1002]: Preparing to restabilize group Topic1 with old generation 0 (kafka.coordinator.GroupCoordinator)
[2016-05-25 09:14:12,190] INFO [GroupCoordinator 1002]: Stabilized group Topic1 generation 1 (kafka.coordinator.GroupCoordinator)
[2016-05-25 09:14:12,195] INFO [GroupCoordinator 1002]: Assignment received from leader for group Topic1 for generation 1 (kafka.coordinator.GroupCoordinator)
[2016-05-25 09:14:12,749] FATAL [Replica Manager on Broker 1002]: Halting due to unrecoverable I/O error while handling produce request: (kafka.server.ReplicaManager)
kafka.common.KafkaStorageException: I/O exception in append to log '__consumer_offsets-0'
        at kafka.log.Log.append(Log.scala:318)
        at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:442)
        at kafka.cluster.Partition$$anonfun$9.apply(Partition.scala:428)
        at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
        at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:268)
        at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:428)
        at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:401)
        at kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:386)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:245)
        at scala.collection.immutable.Map$Map1.foreach(Map.scala:116)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:245)
        at scala.collection.AbstractTraversable.map(Traversable.scala:104)
        at kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:386)
        at kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:322)
        at kafka.coordinator.GroupMetadataManager.store(GroupMetadataManager.scala:228)
        at kafka.coordinator.GroupCoordinator$$anonfun$handleCommitOffsets$9.apply(GroupCoordinator.scala:429)
        at kafka.coordinator.GroupCoordinator$$anonfun$handleCommitOffsets$9.apply(GroupCoordinator.scala:429)
        at scala.Option.foreach(Option.scala:257)
        at kafka.coordinator.GroupCoordinator.handleCommitOffsets(GroupCoordinator.scala:429)
        at kafka.server.KafkaApis.handleOffsetCommitRequest(KafkaApis.scala:280)
        at kafka.server.KafkaApis.handle(KafkaApis.scala:76)
        at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.FileNotFoundException: /tmp/kafka-logs/__consumer_offsets-0/.index (No such file or directory)
        at java.io.RandomAccessFile.open0(Native Method)
        at java.io.RandomAccessFile.open(RandomAccessFile.java:316)
        at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243)
        at kafka.log.OffsetIndex$$anonfun$resize$1.apply(OffsetIndex.scala:277)
        at
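A note on why the ids moved from 1001/1002 to 1003/1004: with automatic id generation (no broker.id set), Kafka 0.9 hands out ids above reserved.broker.max.id (default 1000) from a ZooKeeper sequence and persists the result in meta.properties under log.dirs. If that file is wiped along with /tmp, a restarted broker simply takes the next ids from the sequence. Pinning the id in server.properties avoids this; the values below are illustrative:

```
# config/server.properties on the first broker
broker.id=1001

# meta.properties, as Kafka writes it under each log dir, for reference:
#   version=0
#   broker.id=1001
```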