Re: High CPU usage for idle kafka server
Has this to do with KAFKA-1461? Can you see which thread is taking a lot
of CPU? A jconsole plugin can get that information.

Jiangjie (Becket) Qin

On 6/5/15, 2:57 PM, "pundlik.anuja" wrote:

>Hi Jay,
>
>Good to hear from you. I met you at the Kafka meetup at LinkedIn.
>
>- No, I am running kafka_2.11-0.8.2.1
>
>Are there any logs or other info that I can provide that will help you
>understand what the issue could be?
>
>Thanks,
>Anuja
>
>On Fri, Jun 5, 2015 at 2:36 PM, Jay Kreps wrote:
>
>> This sounds a lot like a bug we fixed in 0.8.2.0. No chance you are
>> running that pre-release version, is there?
>>
>> -Jay
>>
>> On Wed, Jun 3, 2015 at 9:43 PM, Anuja Pundlik (apundlik) <
>> apund...@cisco.com> wrote:
>>
>> > Hi,
>> >
>> > I am using Kafka 0.8.2.1.
>> > We have 1 zookeeper and 3 kafka brokers.
>> > We have 9 topics; 1 topic has 18 partitions, another has 12
>> > partitions, and all other topics have 1 partition each.
>> >
>> > We see that idle kafka brokers (not carrying any messages) are
>> > using more than 50% of CPU. See top output below.
>> >
>> > Is this a known issue?
>> >
>> > Thanks
>> >
>> > top - 04:42:30 up 2:07, 1 user, load average: 1.50, 1.31, 0.92
>> > Tasks: 177 total, 1 running, 176 sleeping, 0 stopped, 0 zombie
>> > Cpu(s): 13.5%us, 4.5%sy, 0.0%ni, 81.3%id, 0.2%wa, 0.0%hi, 0.1%si, 0.4%st
>> > Mem:  65974296k total, 22310524k used, 43663772k free,   112688k buffers
>> > Swap:        0k total,        0k used,        0k free, 13382460k cached
>> >
>> >  PID USER PR NI  VIRT  RES SHR S %CPU %MEM    TIME+ COMMAND
>> > 9295 wae  20  0 5212m 894m 12m S   62  1.4 22:50.99 java
>> > 9323 wae  20  0 5502m 894m 12m S   56  1.4 24:28.69 java
>> > 9353 wae  20  0 5072m 896m 12m S   54  1.4 17:04.31 java
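Becket's question (which thread is hot?) can also be answered without a jconsole plugin, using the JDK's stock tools. A minimal sketch, assuming the broker PID 9295 from the top output above:

```shell
# Map a hot JVM thread back to its name using top and the JDK's jstack.
# On a live system, against broker PID 9295:
#
#   top -H -p 9295                        # per-thread CPU; note the hot TID
#   jstack 9295 | grep -B1 'nid=0x244f'   # find that thread's name and stack
#
# top shows thread ids in decimal, while jstack prints them as hex
# ("nid=0x..."), so convert the TID before grepping:
printf '0x%x\n' 9295    # prints 0x244f
```

The thread name in the jstack output (e.g. a network or replica-fetcher thread) usually points directly at the subsystem burning CPU.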
Re: High CPU usage for idle kafka server
Hi Jay,

Good to hear from you. I met you at the Kafka meetup at LinkedIn.

- No, I am running kafka_2.11-0.8.2.1

Are there any logs or other info that I can provide that will help you
understand what the issue could be?

Thanks,
Anuja

On Fri, Jun 5, 2015 at 2:36 PM, Jay Kreps wrote:

> This sounds a lot like a bug we fixed in 0.8.2.0. No chance you are
> running that pre-release version, is there?
>
> -Jay
>
> On Wed, Jun 3, 2015 at 9:43 PM, Anuja Pundlik (apundlik) <
> apund...@cisco.com> wrote:
>
> > Hi,
> >
> > I am using Kafka 0.8.2.1.
> > We have 1 zookeeper and 3 kafka brokers.
> > We have 9 topics; 1 topic has 18 partitions, another has 12
> > partitions, and all other topics have 1 partition each.
> >
> > We see that idle kafka brokers (not carrying any messages) are using
> > more than 50% of CPU (top output quoted in full earlier in the thread).
> >
> > Is this a known issue?
> >
> > Thanks
Re: High CPU usage for idle kafka server
This sounds a lot like a bug we fixed in 0.8.2.0. No chance you are
running that pre-release version, is there?

-Jay

On Wed, Jun 3, 2015 at 9:43 PM, Anuja Pundlik (apundlik) wrote:

> Hi,
>
> I am using Kafka 0.8.2.1.
> We have 1 zookeeper and 3 kafka brokers.
> We have 9 topics; 1 topic has 18 partitions, another has 12
> partitions, and all other topics have 1 partition each.
>
> We see that idle kafka brokers (not carrying any messages) are using
> more than 50% of CPU (top output quoted in full earlier in the thread).
>
> Is this a known issue?
>
> Thanks
Re: High CPU usage for idle kafka server
There are no messages being sent or received; the system is idle.
However, there seems to be some GC going on in the kafka broker, plus
some socket reads and writes. It is using approximately 500MB of memory.

On Fri, Jun 5, 2015 at 1:44 PM, pundlik.anuja wrote:

> Hi Otis,
>
> How do I check garbage collection on the kafka broker?
>
> On Thu, Jun 4, 2015 at 1:24 PM, Otis Gospodnetic <
> otis.gospodne...@gmail.com> wrote:
>
>> How's their garbage collection doing?
>>
>> Otis
>> --
>> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
>> Solr & Elasticsearch Support * http://sematext.com/
>>
>> On Thu, Jun 4, 2015 at 12:43 AM, Anuja Pundlik (apundlik) <
>> apund...@cisco.com> wrote:
>>
>> > Hi,
>> >
>> > I am using Kafka 0.8.2.1.
>> > We have 1 zookeeper and 3 kafka brokers.
>> > We have 9 topics; 1 topic has 18 partitions, another has 12
>> > partitions, and all other topics have 1 partition each.
>> >
>> > We see that idle kafka brokers (not carrying any messages) are
>> > using more than 50% of CPU (top output quoted in full earlier in
>> > the thread).
>> >
>> > Is this a known issue?
>> >
>> > Thanks
Re: High CPU usage for idle kafka server
Hi Otis,

How do I check garbage collection on the kafka broker?

On Thu, Jun 4, 2015 at 1:24 PM, Otis Gospodnetic wrote:

> How's their garbage collection doing?
>
> Otis
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
> On Thu, Jun 4, 2015 at 12:43 AM, Anuja Pundlik (apundlik) <
> apund...@cisco.com> wrote:
>
> > Hi,
> >
> > I am using Kafka 0.8.2.1.
> > We have 1 zookeeper and 3 kafka brokers.
> > We have 9 topics; 1 topic has 18 partitions, another has 12
> > partitions, and all other topics have 1 partition each.
> >
> > We see that idle kafka brokers (not carrying any messages) are using
> > more than 50% of CPU (top output quoted in full earlier in the thread).
> >
> > Is this a known issue?
> >
> > Thanks
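To answer the question above for future readers: two common ways to inspect broker GC, sketched below. The PID, log path, and flags are examples, not values from this thread; `jstat` ships with the JDK, and the Kafka launch scripts honor the `KAFKA_GC_LOG_OPTS` environment variable.

```shell
# Option 1: sample GC activity of a running broker with the JDK's jstat
# (9295 is a placeholder PID, e.g. from top):
#
#   jstat -gcutil 9295 5000    # heap occupancy + GC counts every 5 seconds
#
# Option 2: restart Kafka with GC logging enabled; kafka-run-class.sh
# picks this variable up (flags shown are the classic pre-JDK9 ones):
export KAFKA_GC_LOG_OPTS="-Xloggc:/tmp/kafka-gc.log -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps"
echo "$KAFKA_GC_LOG_OPTS"
```

With option 2, long or frequent collections show up directly in the GC log; with option 1, a FGC count that climbs steadily on an idle broker is the red flag.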
Query on kafka topic metadata
Hi,

I am new to Kafka and I have a question: how do I read statistics for a
specific topic from the Kafka server? I want to read the following
parameters about an existing topic:

1) How many activeMessages
2) How many activeSubscriptions
3) How many totalMessages
4) How many totalSubscriptions
5) How many deliveryFaults
6) How many pendingDelivery

Please do the needful.

Thanks & Regards,
Pavan
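A note for readers: Kafka does not keep JMS-style counters such as activeSubscriptions or deliveryFaults, so those exact statistics don't exist. What the stock 0.8.x tools can report is topic layout and per-partition offsets; a sketch, with placeholder topic name and addresses:

```shell
# Partition/replica/leader layout of a topic:
#
#   bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic mytopic
#
# Earliest/latest offset of each partition (--time -1 = latest,
# --time -2 = earliest):
#
#   bin/kafka-run-class.sh kafka.tools.GetOffsetShell \
#       --broker-list localhost:9092 --topic mytopic --time -1
#
# Messages currently retained in a partition = latest - earliest;
# e.g. with a hypothetical latest=1500 and earliest=200:
echo $((1500 - 200))    # prints 1300
```

Subscription-side numbers (what a JMS broker would call pending delivery) come from consumer-group offsets instead, since Kafka consumers track their own position.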
Re: Consumer lag lies - orphaned offsets?
On Fri, Jun 05, 2015 at 12:53:00AM -0400, Otis Gospodnetić wrote:

> Hi Joel,
>
> On Thu, Jun 4, 2015 at 8:52 PM, Joel Koshy wrote:
>
> > Hi Otis,
> >
> > Yes, this is a limitation in the old consumer: a number of
> > per-topic/partition mbeans remain registered even after a rebalance.
> > Those need to be de-registered. So if you stop consuming from some
> > partition after a rebalance, that lag mbean currently remains, which
> > is why it stays flat. This is a known issue.
>
> I see. Is / should this be considered a bug? Something worth fixing
> for 0.8.3?

Yes, I would call it a bug, but it hasn't been a high priority so far,
mainly because (I think) most users monitor lag with committed offsets.
This is what we do at LinkedIn, for instance, as Todd mentioned in his
reply.

> Also, you say this is a limitation of the old consumer. Does that mean
> that this problem goes away completely if one uses the new consumer?

This is sort of n/a at the moment, as per-partition lag has not been
added yet to the new consumer. It does have the equivalent of max-lag.
If we add per-partition lag sensors, we would need to be able to remove
those sensors, if applicable, after a rebalance.

> > On a restart, the lag goes down to zero because the mbeans get
> > recreated and the consumer starts fetching. If the fetch request
> > reads up to the end of the log, then the mbean will report zero.
> > Your actual committed offset may be behind, though, which is why
> > your true lag is > 0.
> >
> > The lag mbeans are useful, but have a number of limitations - it
> > depends on active fetches in progress;
>
> What do you mean by this?

If the fetcher threads die for any reason, then fetches stop and the
consumer continues to report lag off the last fetched offset and the
last reported log end offset. So it will stay flat when it should be
increasing (since the log end offset on the broker is increasing if
producers are still sending to that partition).

Also, the old consumer pre-fetches chunks and buffers them internally.
If the chunk queue is full, fetches stop; and if the consumer is
extremely slow in actually processing the messages off each chunk, then
lag can stay flat (perhaps even at zero) until the next chunk, while the
consumer is iterating over messages from the previous chunk.

> > it also does not exactly correspond with your actual processed (and
> > committed) offset.
>
> Right. Though it should be updated in near real-time, so it will
> approximately match the reality, no?

Yes - I think it is fair to say that in most cases the lag mbeans should
be accurate within a small delta of the true lag. Although we are trying
to avoid further non-critical development on the old consumer, it is
convenient to have these mbeans. So I think it may be worth fixing this
issue (i.e., deregistering mbeans on a rebalance). Can you file a jira
for this?

Thanks,

Joel

> Thanks,
> Otis
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/
>
> > The most reliable way to monitor application lag is to use the
> > committed offsets and the current log end offsets. Todd has been
> > doing a lot of interesting work in making lag monitoring less
> > painful and can comment more.
> >
> > Joel
> >
> > On Thu, Jun 04, 2015 at 04:55:44PM -0400, Otis Gospodnetić wrote:
> > > Hi,
> > >
> > > On Thu, Jun 4, 2015 at 4:26 PM, Scott Reynolds wrote:
> > >
> > > > I believe the JMX metrics reflect the consumer PRIOR to
> > > > committing offsets to Kafka / Zookeeper. But when you query from
> > > > the command line using the kafka tools, you are just getting the
> > > > committed offsets.
> > >
> > > Even if that were the case, and maybe it is, it doesn't explain
> > > why the ConsumerLag in JMX often remains *completely constant*...
> > > forever... until the consumer is restarted. You see what I mean?
> > >
> > > Otis
> > > --
> > > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > > Solr & Elasticsearch Support * http://sematext.com/
> > >
> > > > On Thu, Jun 4, 2015 at 1:23 PM, Otis Gospodnetic <
> > > > otis.gospodne...@gmail.com> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > Here's something potentially useful.
> > > > >
> > > > > 1) Before: https://apps.sematext.com/spm-reports/s/eQ9WhLegW9
> > > > > - the "flat Lag situation"
> > > > >
> > > > > 2) I restarted the consumer whose lag is shown in the above graph
> > > > >
> > > > > 3) After restart: https://apps.sematext.com/spm-reports/s/4YGkcUP9ms
> > > > > - NO lag at all!?
> > > > >
> > > > > So that 81560 Lag value that was stuck in JMX is gone. Went
> > > > > down to 0. Kind of makes sense - the whole consumer was
> > > > > restarted, consumer/java process was restarted, everything
> > > > > that was in JMX got reset, and if there is truly no consumer
> > > > > lag it makes sense that the values in JMX a
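Joel's advice to monitor lag from committed offsets and log end offsets maps to the stock 0.8.x ConsumerOffsetChecker tool; a sketch, with placeholder group name and ZooKeeper address:

```shell
# Per-partition committed offset, log end offset ("logSize"), and lag
# for a consumer group, read from ZooKeeper rather than consumer JMX:
#
#   bin/kafka-run-class.sh kafka.tools.ConsumerOffsetChecker \
#       --zkconnect localhost:2181 --group my-group
#
# The reported Lag column is just logSize minus the committed offset;
# e.g. with a hypothetical logSize=90210 and committed offset 88650:
logSize=90210
committed=88650
echo $((logSize - committed))    # prints 1560
```

Because this reads the committed offsets rather than the consumer's in-process mbeans, it keeps moving even when the consumer's fetchers have stalled, which is exactly the failure mode described above.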
Re: Multiple instances of HL Consumer
You can keep the same consumer group and Kafka will balance partitions
across the two instances automatically. When one of them dies, its
partitions are rebalanced and assigned to the remaining live consumers.

_____________________________
From: Panda, Samaresh
Sent: Friday, June 5, 2015 7:42 PM
Subject: Multiple instances of HL Consumer
To:

I've a HL consumer receiving messages using four threads (four
partitions). This is a stand-alone Java client. For fail-safe reasons,
I want to run another instance of the exact same Java client on a
different box. Here are my questions:

> Can I keep the same consumer group name, or must it be different for
> the 2nd instance?
> If it's the same consumer group, will the 2nd client receive the same
> set of messages again?
> In general, what's the best practice for designing fail-safe clients?

Thanks
Sam
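The key point in the answer above, the shared group, comes down to one line of consumer configuration. A minimal sketch with placeholder values (the 0.8.x high-level consumer reads these from a properties file):

```shell
# Both boxes run the same client with an identical consumer config;
# sharing group.id is what makes Kafka split the four partitions
# between the instances and rebalance when one of them dies:
cat > /tmp/consumer.properties <<'EOF'
zookeeper.connect=localhost:2181
group.id=my-failsafe-group
EOF
grep '^group.id' /tmp/consumer.properties
```

With the same `group.id`, each message is delivered to only one instance; a different `group.id` on the second box would instead make it re-receive every message as an independent subscriber.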
Multiple instances of HL Consumer
I've a HL consumer receiving messages using four threads (four
partitions). This is a stand-alone Java client. For fail-safe reasons,
I want to run another instance of the exact same Java client on a
different box. Here are my questions:

> Can I keep the same consumer group name, or must it be different for
> the 2nd instance?
> If it's the same consumer group, will the 2nd client receive the same
> set of messages again?
> In general, what's the best practice for designing fail-safe clients?

Thanks
Sam