Re: Rebalancing stuck, never finishes
Peter,

It does seem like KAFKA-9752 is the most likely suspect, although if your clients were upgraded to 2.6.1 then I don't believe they would be on an early enough version of JoinGroup to run into this. I'm not 100% sure though, so it may be a good idea to leave a comment on that ticket and ping Jason directly, since he implemented the fix.

Murilo,

I agree that your problem is not likely to be KAFKA-9752, since that was caused by KAFKA-9232 and that code is not present in 2.2.1. But maybe you're hitting the issue which KAFKA-9232 was originally intended to fix? In any case, 2.2.1 is quite old now, so there may be other known bugs which have since been fixed. I know it's not always possible or easy, but I would still recommend upgrading your brokers to a more recent version if you can.

On Fri, Feb 26, 2021 at 7:19 AM Murilo Tavares wrote:

> Just to provide a bit more detail, I noticed Peter's pattern:
> "Rebalance failed. org.apache.kafka.common.errors.DisconnectException: null"
> "(Re-)joining group"
>
> But I also get a different pattern, interchangeably:
> Group coordinator broker-1:9092 (id: 2147483646 rack: null) is unavailable or invalid due to cause: null.isDisconnected: true. Rediscovery will be attempted.
> Followed by
> Discovered group coordinator broker-1:9092 (id: 2147483646 rack: null)
Re: high CPU usage after Kafka upgrade
Not sure if this applies only to this 0.10 version.

[image: image.png]

Regards.

On Fri, Feb 26, 2021 at 2:54 PM Péter Sinóros-Szabó wrote:

> Hi,
>
> No, CPU increase shouldn't be there. Upgrades usually bring lower CPU usage.
>
> And yes, I followed the upgrade protocol as it is described in the documentation, I got the CPU increase when I upgraded the 1st instance as the first step.
>
> Cheers,
> Peter
Re: high CPU usage after Kafka upgrade
Hi,

No, the CPU increase shouldn't be there. Upgrades usually bring lower CPU usage.

And yes, I followed the upgrade protocol as described in the documentation. I got the CPU increase when I upgraded the first instance as the first step.

Cheers,
Peter

On Fri, 26 Feb 2021 at 18:35, Jhanssen Fávaro wrote:

> Hi Peter,
>
> I am in the same situation, with a lot of questions about Kafka's upgrade process. But it looks like this CPU increase is expected, at least until you finish upgrading every broker.
>
> In this case, when you say you didn't change the version, you mean that for all brokers, right?
>
> Basically, you should upgrade the binaries on every broker, but before the restart, change its configuration to reflect your old version:
>
> - inter.broker.protocol.version=2.4.1
> - log.message.format.version=2.4.1
>
> Then, after you finish the binary upgrade on every host/broker and the clients (consumers/producers) have upgraded their .jar/library versions to 2.6.1, you should just comment out those two version lines (log.message.format.version and inter.broker.protocol.version) on each of the brokers and restart them one by one.
>
> That's what I understood from reading the documentation:
>
> https://kafka.apache.org/documentation/#upgrade
>
> Best Regards!
Re: high CPU usage after Kafka upgrade
Hi,

thanks, yes, I planned to run a profiler on it (Opsian, to be exact) to see what's going on, but async-profiler is a good option as well. I just wanted to ask if anyone had experienced this before.

I will get back here if I find something useful.

Peter

On Fri, 26 Feb 2021 at 18:34, Alex Woolford wrote:

> It might be worth attaching a profiler to see what's eating up all the cycles, Peter.
>
> I used this recently, and it turned out that my Prometheus monitoring was the culprit: https://github.com/jvm-profiling-tools/async-profiler
>
> From my terminal history:
>
>     cd /tmp
>     wget https://github.com/jvm-profiling-tools/async-profiler/releases/download/v1.8.3/async-profiler-1.8.3-linux-x64.tar.gz
>     tar xvf async-profiler-1.8.3-linux-x64.tar.gz
>     cd async-profiler-1.8.3-linux-x64
>     ./profiler.sh -d 30 -f /tmp/flamegraph.svg 8983
>
> ... where 8983 is the pid of the Kafka process.
>
> ... and then it spat out a beautiful interactive flame chart.
Re: high CPU usage after Kafka upgrade
Hi Peter,

I am in the same situation, with a lot of questions about Kafka's upgrade process. But it looks like this CPU increase is expected, at least until you finish upgrading every broker.

In this case, when you say you didn't change the version, you mean that for all brokers, right?

Basically, you should upgrade the binaries on every broker, but before the restart, change its configuration to reflect your old version:

- inter.broker.protocol.version=2.4.1
- log.message.format.version=2.4.1

Then, after you finish the binary upgrade on every host/broker and the clients (consumers/producers) have upgraded their .jar/library versions to 2.6.1, you should just comment out those two version lines (log.message.format.version and inter.broker.protocol.version) on each of the brokers and restart them one by one.

That's what I understood from reading the documentation:

https://kafka.apache.org/documentation/#upgrade

Best Regards!

On Fri, Feb 26, 2021 at 2:19 PM Péter Sinóros-Szabó wrote:

> Hi,
>
> I just upgraded from Kafka 2.4.1 to 2.6.1 and I see huge CPU usage on the broker after the upgrade. Upgrade in this case means that I only bumped the broker version on 1 of the brokers out of the 6 and didn't change the protocol or message format versions. Before the upgrade, it used about 35% CPUs. After the upgrade it uses 200% but if I add two more CPUs to the host, it is happy to use about 350%.
>
> I tried 2.5.1 and 2.7.0 versions too. All of those versions show the same.
>
> Any idea what may be wrong?
>
> Thanks,
> Peter
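As a rough illustration of the two phases described above (versions pinned while brokers are being upgraded, pins commented out once every broker runs the new binaries), here is a minimal Python sketch that reports which pins are active in a broker config. The helper name `active_version_pins` and the sample config texts are made up for this example and are not part of Kafka. The parser only handles simple `key=value` lines, not the full Java properties syntax.

```python
# Hypothetical helper (not part of Kafka): report which version pins are
# active in the text of a broker's server.properties file.
PINNED_KEYS = ("inter.broker.protocol.version", "log.message.format.version")


def active_version_pins(properties_text):
    """Return {key: value} for the pin settings that are set and not commented out."""
    pins = {}
    for raw_line in properties_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comments ('#' and '!' both start a comment
        # in Java properties files).
        if not line or line.startswith(("#", "!")):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() in PINNED_KEYS:
            pins[key.strip()] = value.strip()
    return pins


# Sample config while the rolling binary upgrade is still in progress.
during_rolling_upgrade = """
broker.id=1
inter.broker.protocol.version=2.4.1
log.message.format.version=2.4.1
"""

# Sample config once every broker runs the new binaries.
after_all_brokers_upgraded = """
broker.id=1
#inter.broker.protocol.version=2.4.1
#log.message.format.version=2.4.1
"""

print(active_version_pins(during_rolling_upgrade))
print(active_version_pins(after_all_brokers_upgraded))
```

Running a check like this against each broker's config before the final round of restarts is one way to confirm that no broker still carries a stale pin.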
Re: high CPU usage after Kafka upgrade
It might be worth attaching a profiler to see what's eating up all the cycles, Peter.

I used this recently, and it turned out that my Prometheus monitoring was the culprit: https://github.com/jvm-profiling-tools/async-profiler

From my terminal history:

    cd /tmp
    wget https://github.com/jvm-profiling-tools/async-profiler/releases/download/v1.8.3/async-profiler-1.8.3-linux-x64.tar.gz
    tar xvf async-profiler-1.8.3-linux-x64.tar.gz
    cd async-profiler-1.8.3-linux-x64
    ./profiler.sh -d 30 -f /tmp/flamegraph.svg 8983

... where 8983 is the pid of the Kafka process.

... and then it spat out a beautiful interactive flame chart.

On Fri, Feb 26, 2021 at 10:26 AM Péter Sinóros-Szabó wrote:

> Hi,
>
> I just upgraded from Kafka 2.4.1 to 2.6.1 and I see huge CPU usage on the broker after the upgrade. Upgrade in this case means that I only bumped the broker version on 1 of the brokers out of the 6 and didn't change the protocol or message format versions. Before the upgrade, it used about 35% CPUs. After the upgrade it uses 200% but if I add two more CPUs to the host, it is happy to use about 350%.
>
> I tried 2.5.1 and 2.7.0 versions too. All of those versions show the same.
>
> Any idea what may be wrong?
>
> Thanks,
> Peter
high CPU usage after Kafka upgrade
Hi,

I just upgraded from Kafka 2.4.1 to 2.6.1 and I see huge CPU usage on the broker after the upgrade. Upgrade in this case means that I only bumped the broker version on 1 of the 6 brokers and didn't change the protocol or message format versions. Before the upgrade, it used about 35% CPU. After the upgrade it uses 200%, but if I add two more CPUs to the host, it is happy to use about 350%.

I tried versions 2.5.1 and 2.7.0 too. All of those versions show the same behaviour.

Any idea what may be wrong?

Thanks,
Peter
Re: Rebalancing stuck, never finishes
Just to provide a bit more detail, I noticed Peter's pattern:

"Rebalance failed. org.apache.kafka.common.errors.DisconnectException: null"
"(Re-)joining group"

But I also get a different pattern, interchangeably:

Group coordinator broker-1:9092 (id: 2147483646 rack: null) is unavailable or invalid due to cause: null.isDisconnected: true. Rediscovery will be attempted.

Followed by

Discovered group coordinator broker-1:9092 (id: 2147483646 rack: null)

On Fri, 26 Feb 2021 at 09:59, Murilo Tavares wrote:

> Hi
>
> I got the same behaviour yesterday while trying to upgrade my KafkaStreams app from 2.4.1 to 2.7.0. Our brokers are on 2.2.1.
>
> Looking at KAFKA-9752 it mentions the cause being two other tickets:
> https://issues.apache.org/jira/browse/KAFKA-7610
> https://issues.apache.org/jira/browse/KAFKA-9232
>
> Although the first ticket seems fixed in 2.2.0, the latter was just fixed in 2.2.3, so my brokers shouldn't have the code for KAFKA-9232.
> But what I don't understand is that KAFKA-9752 says:
> "Note that this is only possible if 1) we have a consumer using an old JoinGroup version, 2) the consumer times out and disconnects from its initial JoinGroup request."
> In this case, I guess my consumer is not using an old JoinGroup, as my consumers (KafkaStreams) are on 2.7.0...
>
> Thanks
> Murilo
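To see how often each of the two patterns above occurs in a client log, something like the following Python sketch could be used. It is only an illustration: the substrings are copied from the log lines quoted in this thread, while the category names and the `count_patterns` helper are invented for the example.

```python
from collections import Counter

# Substrings copied from the log lines quoted in this thread. The category
# names on the left are made up for this sketch.
PATTERNS = {
    "rebalance_failed": "Rebalance failed.",
    "rejoining": "(Re-)joining group",
    "coordinator_lost": "is unavailable or invalid due to cause",
    "coordinator_found": "Discovered group coordinator",
}


def count_patterns(log_lines):
    """Count how many lines match each known pattern (first match wins)."""
    counts = Counter()
    for line in log_lines:
        for name, needle in PATTERNS.items():
            if needle in line:
                counts[name] += 1
                break
    return counts


sample = [
    "Rebalance failed. org.apache.kafka.common.errors.DisconnectException: null",
    "(Re-)joining group",
    "Group coordinator broker-1:9092 (id: 2147483646 rack: null) is unavailable "
    "or invalid due to cause: null.isDisconnected: true. Rediscovery will be attempted.",
    "Discovered group coordinator broker-1:9092 (id: 2147483646 rack: null)",
]

print(dict(count_patterns(sample)))
```

Feeding it the lines of a real log file (for example via `open(path)`) would show whether the disconnect/rejoin loop or the coordinator rediscovery loop dominates.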
Re: Rebalancing stuck, never finishes
Hi

I got the same behaviour yesterday while trying to upgrade my KafkaStreams app from 2.4.1 to 2.7.0. Our brokers are on 2.2.1.

Looking at KAFKA-9752 it mentions the cause being two other tickets:
https://issues.apache.org/jira/browse/KAFKA-7610
https://issues.apache.org/jira/browse/KAFKA-9232

Although the first ticket seems fixed in 2.2.0, the latter was just fixed in 2.2.3, so my brokers shouldn't have the code for KAFKA-9232.
But what I don't understand is that KAFKA-9752 says:
"Note that this is only possible if 1) we have a consumer using an old JoinGroup version, 2) the consumer times out and disconnects from its initial JoinGroup request."
In this case, I guess my consumer is not using an old JoinGroup, as my consumers (KafkaStreams) are on 2.7.0...

Thanks
Murilo

On Fri, 26 Feb 2021 at 06:06, Péter Sinóros-Szabó wrote:

> Hey Sophie,
>
> thanks for the link, I was checking that ticket, but I was not sure if it is relevant for our case.
> Eventually we "fixed" our problem with reducing the session.timeout.ms (it was set to a high value for other reasons).
>
> But today, in another service, we faced the same problem when upgrading the Kafka Client from 2.5.1 to 2.6.1. We are still using 2.4.1 on the brokers.
> Do you think the same problem (KAFKA-9752) might cause this problem too? It's hard to judge just based on the description of that ticket.
>
> Thanks,
> Peter
Re: Rebalancing stuck, never finishes
Hey Sophie,

thanks for the link. I was checking that ticket, but I was not sure if it is relevant to our case.
Eventually we "fixed" our problem by reducing session.timeout.ms (it was set to a high value for other reasons).

But today, in another service, we faced the same problem when upgrading the Kafka client from 2.5.1 to 2.6.1. We are still using 2.4.1 on the brokers.
Do you think the same bug (KAFKA-9752) might cause this problem too? It's hard to judge just based on the description of that ticket.

Thanks,
Peter