broker randomly shuts down
I know I'm reviving an old thread, but did the original poster ever find the cause of this issue and figure out a fix? I am running a cluster of 18 Kafka 0.9 brokers, and three of them behave exactly this way about once a week. It is pretty scary: they fully resign from the ISR, drop all of their partitions, and stop, then require a full rebuild when restarted. There are no smoking-gun errors in the log, just a clean shutdown triggered for no apparent reason.
Re: broker randomly shuts down
This is somewhat specific to your runtime environment. Check whatever script is used to bring up Kafka, and where the stderr of the java command is being redirected (hopefully not /dev/null!).

On Thu, Jun 30, 2016 at 5:24 PM, allen chan wrote:
> Hi Shikhar,
> I do not see a stderr log file anywhere. Can you point me to where Kafka
> would write such a file?
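One way to follow the suggestion above on Linux is to ask the kernel where the running broker's stderr actually points. The sketch below assumes a Linux host with /proc mounted and that you already know the broker's process ID; the function name and the `proc_root` parameter are illustrative, not part of Kafka.

```python
import os

def stderr_target(pid, proc_root="/proc"):
    """Return the path that file descriptor 2 (stderr) of `pid` points to.

    On Linux, /proc/<pid>/fd/2 is a symbolic link to whatever the process
    has open as stderr (a file, a pipe, a terminal, or /dev/null).
    `proc_root` is only a parameter so the lookup can be pointed elsewhere.
    """
    return os.readlink(os.path.join(proc_root, str(pid), "fd", "2"))
```

Calling `stderr_target(broker_pid)` as the broker's user (or root) returns something like a log file path if stderr is captured, or `/dev/null` if it is being discarded.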
Re: broker randomly shuts down
Hi Shikhar,

I do not see a stderr log file anywhere. Can you point me to where Kafka would write such a file?

On Thu, Jun 30, 2016 at 5:10 PM, Shikhar Bhushan wrote:
> Perhaps it's a JVM crash? You might not see anything in the standard
> application-level logs, you'd need to look for the stderr.
Re: broker randomly shuts down
Perhaps it's a JVM crash? You might not see anything in the standard application-level logs; you'd need to look for the stderr.

On Thu, Jun 30, 2016 at 5:07 PM, allen chan wrote:
> Anyone else have ideas?
>
> This is still happening. I moved ZooKeeper off the server to its own
> dedicated VMs.
> Kafka starts with 4G of heap and had consumed nowhere near that much when
> it crashed.
> I bumped up the ZooKeeper timeout settings but that has not solved it.
>
> I also disconnected all the producers and consumers. This points to something
> between Kafka and ZooKeeper, right?
>
> Again, the logs are no help as to why Kafka decided to shut itself down:
> https://gist.github.com/allenmchan/f9331e54bb4fd77cc5bc0b031a7a6206
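If the JVM itself crashed, HotSpot normally leaves a fatal error log named hs_err_pid<pid>.log, by default in the working directory of the java process (the location can be changed with -XX:ErrorFile). The sketch below looks for such files; which directories to search is an assumption you need to fill in for your own install, and the function name is illustrative.

```python
import glob
import os

def find_jvm_crash_logs(directories):
    """Return hs_err_pid*.log files found directly in `directories`, newest first."""
    found = []
    for directory in directories:
        # HotSpot's default fatal error log name is hs_err_pid<pid>.log
        found.extend(glob.glob(os.path.join(directory, "hs_err_pid*.log")))
    return sorted(found, key=os.path.getmtime, reverse=True)
```

For example, `find_jvm_crash_logs(["/opt/kafka", "/tmp"])` (both paths hypothetical) returns an empty list if no crash log exists in those places, which would argue against a JVM crash.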
Re: broker randomly shuts down
Anyone else have ideas?

This is still happening. I moved ZooKeeper off the server to its own dedicated VMs. Kafka starts with 4G of heap and had consumed nowhere near that much when it crashed. I bumped up the ZooKeeper timeout settings but that has not solved it.

I also disconnected all the producers and consumers. This points to something between Kafka and ZooKeeper, right?

Again, the logs are no help as to why Kafka decided to shut itself down:
https://gist.github.com/allenmchan/f9331e54bb4fd77cc5bc0b031a7a6206

On Thu, Jun 2, 2016 at 4:22 PM, Russ Lavoie wrote:
> What about in dmesg? I have run into this issue and it was the OOM
> killer. I also ran into a heap issue using too much of the direct memory
> (JVM). Reducing the fetcher threads helped with that problem.
Re: broker randomly shuts down
What about in dmesg? I have run into this issue and it was the OOM killer. I also ran into a heap issue using too much of the direct memory (JVM). Reducing the fetcher threads helped with that problem.

On Jun 2, 2016 12:19 PM, "allen chan" wrote:
> Hi Tom,
>
> That is one of the first things that I checked. Active memory never goes
> above 50% of overall available. File cache uses the rest of the memory, but
> I do not think that triggers the OOM killer.
> Either way, there are no entries in /var/log/messages (CentOS) to show an OOM is
> happening.
>
> Thanks
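To check dmesg or a syslog file for OOM-killer activity without reading it by eye, something like the following can help. It is only a sketch: the three phrases it searches for are ones the Linux kernel commonly prints when the OOM killer runs, but exact wording varies between kernel versions, so treat a miss as weak evidence rather than proof.

```python
import re

# Phrases commonly seen in kernel OOM-killer messages (wording varies by kernel).
OOM_PATTERN = re.compile(r"oom-killer|out of memory|killed process", re.IGNORECASE)

def find_oom_lines(lines):
    """Return the lines (without trailing newline) that look like OOM-killer messages."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]
```

It accepts any iterable of lines, for example an open /var/log/messages or /var/log/syslog file, or the output of dmesg split into lines.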
Re: broker randomly shuts down
Hi Tom,

That is one of the first things that I checked. Active memory never goes above 50% of overall available. File cache uses the rest of the memory, but I do not think that triggers the OOM killer. Either way, there are no entries in /var/log/messages (CentOS) to show an OOM is happening.

Thanks

On Thu, Jun 2, 2016 at 5:36 AM, Tom Crayford wrote:
> That looks like somebody is killing the process. I'd suspect either the
> linux OOM killer or something else automatically killing the JVM for some
> reason.
>
> For the OOM killer, assuming you're on ubuntu, it's pretty easy to find in
> /var/log/syslog (depending on your setup). I don't know about other
> operating systems.

--
Allen Michael Chan
Re: broker randomly shuts down
That looks like somebody is killing the process. I'd suspect either the linux OOM killer or something else automatically killing the JVM for some reason.

For the OOM killer, assuming you're on ubuntu, it's pretty easy to find in /var/log/syslog (depending on your setup). I don't know about other operating systems.

On Thu, Jun 2, 2016 at 5:54 AM, allen chan wrote:
> I have an issue where my brokers randomly shut themselves down.
> I turned on debug in log4j.properties but still do not see a reason why the
> shutdown is happening.
>
> Anyone seen this behavior before?
broker randomly shuts down
I have an issue where my brokers randomly shut themselves down. I turned on debug in log4j.properties but still do not see a reason why the shutdown is happening.

Anyone seen this behavior before?

version 0.10.0

log4j.properties
log4j.rootLogger=DEBUG, kafkaAppender
* I tried TRACE level but I do not see any additional log messages

snippet of log around shutdown
[2016-06-01 15:11:51,374] DEBUG Got ping response for sessionid: 0x2550a693b470001 after 1ms (org.apache.zookeeper.ClientCnxn)
[2016-06-01 15:11:53,376] DEBUG Got ping response for sessionid: 0x2550a693b470001 after 1ms (org.apache.zookeeper.ClientCnxn)
[2016-06-01 15:11:55,377] DEBUG Got ping response for sessionid: 0x2550a693b470001 after 1ms (org.apache.zookeeper.ClientCnxn)
[2016-06-01 15:11:57,380] DEBUG Got ping response for sessionid: 0x2550a693b470001 after 1ms (org.apache.zookeeper.ClientCnxn)
[2016-06-01 15:11:59,383] DEBUG Got ping response for sessionid: 0x2550a693b470001 after 1ms (org.apache.zookeeper.ClientCnxn)
[2016-06-01 15:12:01,386] DEBUG Got ping response for sessionid: 0x2550a693b470001 after 1ms (org.apache.zookeeper.ClientCnxn)
[2016-06-01 15:12:03,389] DEBUG Got ping response for sessionid: 0x2550a693b470001 after 1ms (org.apache.zookeeper.ClientCnxn)
[2016-06-01 15:12:04,121] INFO [Group Metadata Manager on Broker 2]: Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)
[2016-06-01 15:12:04,121] INFO [Group Metadata Manager on Broker 2]: Removed 0 expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)
[2016-06-01 15:12:05,390] DEBUG Got ping response for sessionid: 0x2550a693b470001 after 1ms (org.apache.zookeeper.ClientCnxn)
[2016-06-01 15:12:07,393] DEBUG Got ping response for sessionid: 0x2550a693b470001 after 1ms (org.apache.zookeeper.ClientCnxn)
[2016-06-01 15:12:09,396] DEBUG Got ping response for sessionid: 0x2550a693b470001 after 1ms (org.apache.zookeeper.ClientCnxn)
[2016-06-01 15:12:11,399] DEBUG Got ping response for sessionid: 0x2550a693b470001 after 1ms (org.apache.zookeeper.ClientCnxn)
[2016-06-01 15:12:13,334] INFO [Kafka Server 2], shutting down (kafka.server.KafkaServer)
[2016-06-01 15:12:13,334] INFO [Kafka Server 2], shutting down (kafka.server.KafkaServer)
[2016-06-01 15:12:13,336] INFO [Kafka Server 2], Starting controlled shutdown (kafka.server.KafkaServer)
[2016-06-01 15:12:13,336] INFO [Kafka Server 2], Starting controlled shutdown (kafka.server.KafkaServer)
[2016-06-01 15:12:13,338] DEBUG Added sensor with name connections-closed: (org.apache.kafka.common.metrics.Metrics)
[2016-06-01 15:12:13,338] DEBUG Added sensor with name connections-created: (org.apache.kafka.common.metrics.Metrics)
[2016-06-01 15:12:13,338] DEBUG Added sensor with name bytes-sent-received: (org.apache.kafka.common.metrics.Metrics)
[2016-06-01 15:12:13,338] DEBUG Added sensor with name bytes-sent: (org.apache.kafka.common.metrics.Metrics)
[2016-06-01 15:12:13,339] DEBUG Added sensor with name bytes-received: (org.apache.kafka.common.metrics.Metrics)
[2016-06-01 15:12:13,339] DEBUG Added sensor with name select-time: (org.apache.kafka.common.metrics.Metrics)

--
Allen Michael Chan
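When correlating a shutdown like the one in the snippet above with system logs (cron jobs, monitoring agents, kernel messages), it helps to pull out the exact shutdown time and how long the broker had been quiet before it. The following is a rough sketch that assumes one log entry per line in the "[YYYY-MM-DD HH:MM:SS,mmm]" format shown above; the function names and the default "shutting down" marker are illustrative choices.

```python
from datetime import datetime

def parse_timestamp(line):
    """Parse a leading '[YYYY-MM-DD HH:MM:SS,mmm]' stamp; return None if absent."""
    if not line.startswith("["):
        return None
    try:
        # Characters 1..23 hold the timestamp, e.g. '2016-06-01 15:12:13,334'
        return datetime.strptime(line[1:24], "%Y-%m-%d %H:%M:%S,%f")
    except ValueError:
        return None

def seconds_before_shutdown(lines, marker="shutting down"):
    """Seconds between the last stamped line before `marker` and the marker line.

    Returns None if the marker is never seen or nothing precedes it.
    """
    previous = None
    for line in lines:
        stamp = parse_timestamp(line)
        if stamp is None:
            continue
        if marker in line:
            return None if previous is None else (stamp - previous).total_seconds()
        previous = stamp
    return None
```

A short gap with healthy ZooKeeper pings right up to the shutdown line suggests the trigger came from outside the broker's normal request handling, which is consistent with the replies pointing at something external stopping the process.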