Re: Rebalancing stuck, never finishes

2021-02-26 Thread Sophie Blee-Goldman
Peter,

It does seem like KAFKA-9752 is the most likely suspect, although if your
clients were upgraded to 2.6.1 then I don't believe they would be on an early
enough version of the JoinGroup request to run into this. I'm not 100% sure
though, so it may be a good idea to leave a comment on that ticket and ping
Jason directly, since he implemented the fix.

Murilo,

I agree that your problem is not likely to be KAFKA-9752, since that was
caused by KAFKA-9232 and that code is not present in 2.2.1. But maybe you're
hitting the issue which KAFKA-9232 was originally intended to fix? In any
case, 2.2.1 is quite old now, so there may be other known bugs which have
since been fixed.

I know it's not always possible or easy, but I would still recommend upgrading
your brokers to a more recent version if you can.


On Fri, Feb 26, 2021 at 7:19 AM Murilo Tavares  wrote:

> Just to provide a bit more detail, I noticed Peter's pattern:
> "Rebalance failed. org.apache.kafka.common.errors.DisconnectException:
> null"
> "(Re-)joining group"
>
> But I also get a different pattern, interchangeably:
> Group coordinator broker-1:9092 (id: 2147483646 rack: null) is unavailable
> or invalid due to cause: null.isDisconnected: true. Rediscovery will be
> attempted.
> Followed by
> Discovered group coordinator broker-1:9092 (id: 2147483646 rack: null)
>
>
>
> On Fri, 26 Feb 2021 at 09:59, Murilo Tavares  wrote:
>
> > Hi
> > I got the same behaviour yesterday while trying to upgrade my
> > KafkaStreams app from 2.4.1 to 2.7.0. Our brokers are on 2.2.1.
> >
> > Looking at KAFKA-9752 it mentions the cause being two other tickets:
> > https://issues.apache.org/jira/browse/KAFKA-7610
> > https://issues.apache.org/jira/browse/KAFKA-9232
> >
> > Although the first ticket seems fixed in 2.2.0, the latter was just fixed
> > in 2.2.3, so my brokers shouldn't have the code for KAFKA-9232.
> > But what I don't understand is that KAFKA-9752 says:
> > "Note that this is only possible if 1) we have a consumer using an old
> > JoinGroup version, 2) the consumer times out and disconnects from its
> > initial JoinGroup request."
> > In this case, I guess my consumer is not using an old JoinGroup, as my
> > consumers (KafkaStreams) are on 2.7.0...
> >
> > Thanks
> > Murilo
> >
> > On Fri, 26 Feb 2021 at 06:06, Péter Sinóros-Szabó
> >  wrote:
> >
> >> Hey Sophie,
> >>
> >> thanks for the link, I was checking that ticket, but I was not sure if
> >> it is relevant for our case.
> >> Eventually we "fixed" our problem with reducing the session.timeout.ms
> >> (it was set to a high value for other reasons).
> >>
> >> But today, in another service, we faced the same problem when upgrading
> >> the Kafka Client from 2.5.1 to 2.6.1. We are still using 2.4.1 on the
> >> brokers.
> >> Do you think the same problem (KAFKA-9752) might cause this problem too?
> >> It's hard to judge just based on the description of that ticket.
> >>
> >> Thanks,
> >> Peter
> >>
> >
>


Re: high CPU usage after Kafka upgrade

2021-02-26 Thread Jhanssen Fávaro
Not sure if this applies only to this 0.10 version.

[image: image.png]
Regards.


On Fri, Feb 26, 2021 at 2:54 PM Péter Sinóros-Szabó
 wrote:

> Hi,
>
> No, CPU increase shouldn't be there. Upgrades usually bring lower CPU
> usage.
>
> And yes, I followed the upgrade protocol as it is described in the
> documentation, I got the CPU increase when I upgraded the 1st instance as
> the first step.
>
> Cheers,
> Peter
>
> On Fri, 26 Feb 2021 at 18:35, Jhanssen Fávaro 
> wrote:
>
> > Hi Peter,
> > I am in the same situation, with a lot of questions about Kafka's upgrade
> > process. But it looks like this CPU increase is expected, at least until
> > you finish upgrading every broker.
> >
> > In this case, when you say you didn't change the version, you mean that you
> > didn't change it on any of the brokers, right?
> >
> > Basically, you should upgrade the binaries on every broker, but before each
> > restart, change the broker's configuration to pin your old version:
> >
> >    - inter.broker.protocol.version=2.4.1
> >    - log.message.format.version=2.4.1
> >
> > Then, after you finish the binary upgrade on every host/broker and the
> > clients (consumers/producers) have upgraded their .jar/library versions to
> > 2.6.1, you should comment out those two version lines
> > (log.message.format.version and inter.broker.protocol.version) on each of
> > the brokers and restart them one by one.
> >
> > That's what I understood from reading the documentation:
> >
> > https://kafka.apache.org/documentation/#upgrade
> >
> > Best Regards!
> >
> >
> > On Fri, Feb 26, 2021 at 2:19 PM Péter Sinóros-Szabó
> >  wrote:
> >
> > > Hi,
> > >
> > > I just upgraded from Kafka 2.4.1 to 2.6.1 and I see huge CPU usage on
> > > the broker after the upgrade. Upgrade in this case means that I only
> > > bumped the broker version on 1 of the brokers out of the 6 and didn't
> > > change the protocol or message format versions. Before the upgrade, it
> > > used about 35% CPUs. After the upgrade it uses 200% but if I add two more
> > > CPUs to the host, it is happy to use about 350%.
> > >
> > > I tried 2.5.1 and 2.7.0 versions too. All of those versions show the
> > > same.
> > >
> > > Any idea what may be wrong?
> > >
> > > Thanks,
> > > Peter
> > >
> >
>


Re: high CPU usage after Kafka upgrade

2021-02-26 Thread Péter Sinóros-Szabó
Hi,

No, CPU increase shouldn't be there. Upgrades usually bring lower CPU usage.

And yes, I followed the upgrade protocol as it is described in the
documentation, I got the CPU increase when I upgraded the 1st instance as
the first step.

Cheers,
Peter

On Fri, 26 Feb 2021 at 18:35, Jhanssen Fávaro 
wrote:

> Hi Peter,
> I am in the same situation, with a lot of questions about Kafka's upgrade
> process. But it looks like this CPU increase is expected, at least until
> you finish upgrading every broker.
>
> In this case, when you say you didn't change the version, you mean that you
> didn't change it on any of the brokers, right?
>
> Basically, you should upgrade the binaries on every broker, but before each
> restart, change the broker's configuration to pin your old version:
>
>    - inter.broker.protocol.version=2.4.1
>    - log.message.format.version=2.4.1
>
> Then, after you finish the binary upgrade on every host/broker and the
> clients (consumers/producers) have upgraded their .jar/library versions to
> 2.6.1, you should comment out those two version lines
> (log.message.format.version and inter.broker.protocol.version) on each of
> the brokers and restart them one by one.
>
> That's what I understood from reading the documentation:
>
> https://kafka.apache.org/documentation/#upgrade
>
> Best Regards!
>
>
> On Fri, Feb 26, 2021 at 2:19 PM Péter Sinóros-Szabó
>  wrote:
>
> > Hi,
> >
> > I just upgraded from Kafka 2.4.1 to 2.6.1 and I see huge CPU usage on the
> > broker after the upgrade. Upgrade in this case means that I only bumped
> > the broker version on 1 of the brokers out of the 6 and didn't change the
> > protocol or message format versions. Before the upgrade, it used about
> > 35% CPUs. After the upgrade it uses 200% but if I add two more CPUs to the
> > host, it is happy to use about 350%.
> >
> > I tried 2.5.1 and 2.7.0 versions too. All of those versions show the
> > same.
> >
> > Any idea what may be wrong?
> >
> > Thanks,
> > Peter
> >
>


Re: high CPU usage after Kafka upgrade

2021-02-26 Thread Péter Sinóros-Szabó
Hi,

Thanks. Yes, I planned to run a profiler on it (Opsian, to be exact) to see
what's going on, but async-profiler is a good option as well.
I just wanted to ask if anyone has experienced this before.

I will get back here if I find something useful.

Peter

On Fri, 26 Feb 2021 at 18:34, Alex Woolford  wrote:

> It might be worth attaching a profiler to see what's eating up all the
> cycles, Peter.
>
> I used this recently, and it turned out that my Prometheus monitoring was
> the culprit: https://github.com/jvm-profiling-tools/async-profiler
>
> From my terminal history:
>
> cd /tmp
> wget https://github.com/jvm-profiling-tools/async-profiler/releases/download/v1.8.3/async-profiler-1.8.3-linux-x64.tar.gz
> tar xvf async-profiler-1.8.3-linux-x64.tar.gz
> cd async-profiler-1.8.3-linux-x64
> ./profiler.sh -d 30 -f /tmp/flamegraph.svg 8983
>
>
> ... where 8983 is the pid of the Kafka process.
>
> ... and then it spat out a beautiful interactive flame chart.
>
>
> On Fri, Feb 26, 2021 at 10:26 AM Péter Sinóros-Szabó
>  wrote:
>
> > Hi,
> >
> > I just upgraded from Kafka 2.4.1 to 2.6.1 and I see huge CPU usage on the
> > broker after the upgrade. Upgrade in this case means that I only bumped
> > the broker version on 1 of the brokers out of the 6 and didn't change the
> > protocol or message format versions. Before the upgrade, it used about
> > 35% CPUs. After the upgrade it uses 200% but if I add two more CPUs to the
> > host, it is happy to use about 350%.
> >
> > I tried 2.5.1 and 2.7.0 versions too. All of those versions show the
> > same.
> >
> > Any idea what may be wrong?
> >
> > Thanks,
> > Peter
> >
>


Re: high CPU usage after Kafka upgrade

2021-02-26 Thread Jhanssen Fávaro
Hi Peter,
I am in the same situation, with a lot of questions about Kafka's upgrade
process. But it looks like this CPU increase is expected, at least until you
finish upgrading every broker.

In this case, when you say you didn't change the version, you mean that you
didn't change it on any of the brokers, right?

Basically, you should upgrade the binaries on every broker, but before each
restart, change the broker's configuration to pin your old version:

   - inter.broker.protocol.version=2.4.1
   - log.message.format.version=2.4.1

Then, after you finish the binary upgrade on every host/broker and the
clients (consumers/producers) have upgraded their .jar/library versions to
2.6.1, you should comment out those two version lines
(log.message.format.version and inter.broker.protocol.version) on each of the
brokers and restart them one by one.

That's what I understood from reading the documentation:

https://kafka.apache.org/documentation/#upgrade
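
As a rough illustration of the two phases described above, the following
Python sketch classifies the text of a broker's server.properties as pinned to
the old version (phase one) or unpinned (phase two, after the two lines are
commented out). The helper names parse_properties and upgrade_phase are
hypothetical, the parsing only handles plain key=value lines, and none of this
is part of Kafka itself.

```python
# Keys named in the message above; both are pinned during phase one.
PINNED_KEYS = ("inter.broker.protocol.version", "log.message.format.version")


def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and '#' comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def upgrade_phase(text, old_version):
    """Return 'pinned', 'unpinned', or 'mixed' for one broker's config text."""
    props = parse_properties(text)
    pinned = {k: props[k] for k in PINNED_KEYS if k in props}
    if not pinned:
        # Neither key is set: the lines were removed or commented out.
        return "unpinned"
    if all(pinned.get(k) == old_version for k in PINNED_KEYS):
        # Both keys are present and still point at the old version.
        return "pinned"
    # Only one key is set, or a key points at some other version.
    return "mixed"
```

For example, a config containing both lines set to 2.4.1 is reported as
"pinned", and the same config with both lines prefixed by "#" is reported as
"unpinned".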

Best Regards!


On Fri, Feb 26, 2021 at 2:19 PM Péter Sinóros-Szabó
 wrote:

> Hi,
>
> I just upgraded from Kafka 2.4.1 to 2.6.1 and I see huge CPU usage on the
> broker after the upgrade. Upgrade in this case means that I only bumped the
> broker version on 1 of the brokers out of the 6 and didn't change the
> protocol or message format versions. Before the upgrade, it used about 35%
> CPUs. After the upgrade it uses 200% but if I add two more CPUs to the
> host, it is happy to use about 350%.
>
> I tried 2.5.1 and 2.7.0 versions too. All of those versions show the same.
>
> Any idea what may be wrong?
>
> Thanks,
> Peter
>


Re: high CPU usage after Kafka upgrade

2021-02-26 Thread Alex Woolford
It might be worth attaching a profiler to see what's eating up all the
cycles, Peter.

I used this recently, and it turned out that my Prometheus monitoring was
the culprit: https://github.com/jvm-profiling-tools/async-profiler

From my terminal history:

cd /tmp
wget https://github.com/jvm-profiling-tools/async-profiler/releases/download/v1.8.3/async-profiler-1.8.3-linux-x64.tar.gz
tar xvf async-profiler-1.8.3-linux-x64.tar.gz
cd async-profiler-1.8.3-linux-x64
./profiler.sh -d 30 -f /tmp/flamegraph.svg 8983


... where 8983 is the pid of the Kafka process.

... and then it spat out a beautiful interactive flame chart.
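
If the profiler needs to be run against several brokers, the invocation above
can be assembled programmatically. This is only a sketch in Python: the
profiler_command helper is hypothetical, and it uses just the -d (duration in
seconds) and -f (output file) options that appear in the command above.

```python
import shlex


def profiler_command(pid, seconds=30, output="/tmp/flamegraph.svg"):
    """Build the async-profiler invocation shown above as an argument list."""
    if pid <= 0:
        raise ValueError("pid must be a positive integer")
    return ["./profiler.sh", "-d", str(seconds), "-f", output, str(pid)]


# 8983 is the example pid from the message above; substitute the real one.
print(shlex.join(profiler_command(8983)))
```

The argument list can be passed directly to subprocess.run from inside the
unpacked async-profiler directory.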


On Fri, Feb 26, 2021 at 10:26 AM Péter Sinóros-Szabó
 wrote:

> Hi,
>
> I just upgraded from Kafka 2.4.1 to 2.6.1 and I see huge CPU usage on the
> broker after the upgrade. Upgrade in this case means that I only bumped the
> broker version on 1 of the brokers out of the 6 and didn't change the
> protocol or message format versions. Before the upgrade, it used about 35%
> CPUs. After the upgrade it uses 200% but if I add two more CPUs to the
> host, it is happy to use about 350%.
>
> I tried 2.5.1 and 2.7.0 versions too. All of those versions show the same.
>
> Any idea what may be wrong?
>
> Thanks,
> Peter
>


high CPU usage after Kafka upgrade

2021-02-26 Thread Péter Sinóros-Szabó
Hi,

I just upgraded from Kafka 2.4.1 to 2.6.1 and I see huge CPU usage on the
broker after the upgrade. Upgrade in this case means that I only bumped the
broker version on 1 of the brokers out of the 6 and didn't change the
protocol or message format versions. Before the upgrade, it used about 35%
CPUs. After the upgrade it uses 200% but if I add two more CPUs to the
host, it is happy to use about 350%.

I tried 2.5.1 and 2.7.0 versions too. All of those versions show the same.

Any idea what may be wrong?

Thanks,
Peter


Re: Rebalancing stuck, never finishes

2021-02-26 Thread Murilo Tavares
Just to provide a bit more detail, I noticed Peter's pattern:
"Rebalance failed. org.apache.kafka.common.errors.DisconnectException: null"
"(Re-)joining group"

But I also get a different pattern, interchangeably:
Group coordinator broker-1:9092 (id: 2147483646 rack: null) is unavailable
or invalid due to cause: null.isDisconnected: true. Rediscovery will be
attempted.
Followed by
Discovered group coordinator broker-1:9092 (id: 2147483646 rack: null)
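
To see how often each of the two patterns occurs, a client log can be tallied
with a short script. This is a minimal sketch in Python: the substrings are
taken from the messages quoted above, while the category names and the
count_patterns helper are made up for illustration.

```python
from collections import Counter

# Substrings copied from the log messages above; the keys are made-up labels.
PATTERNS = {
    "rebalance_failed": "Rebalance failed",
    "rejoining": "(Re-)joining group",
    "coordinator_lost": "is unavailable or invalid",
    "coordinator_found": "Discovered group coordinator",
}


def count_patterns(lines):
    """Count how many log lines contain each known substring."""
    counts = Counter()
    for line in lines:
        for name, needle in PATTERNS.items():
            if needle in line:
                counts[name] += 1
    return counts
```

Calling count_patterns(open(path)) on a client log file (path being wherever
your application writes its log) gives a quick picture of whether the
disconnect/rejoin loop or the coordinator rediscovery loop dominates.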



On Fri, 26 Feb 2021 at 09:59, Murilo Tavares  wrote:

> Hi
> I got the same behaviour yesterday while trying to upgrade my KafkaStreams
> app from 2.4.1 to 2.7.0. Our brokers are on 2.2.1.
>
> Looking at KAFKA-9752 it mentions the cause being two other tickets:
> https://issues.apache.org/jira/browse/KAFKA-7610
> https://issues.apache.org/jira/browse/KAFKA-9232
>
> Although the first ticket seems fixed in 2.2.0, the latter was just fixed
> in 2.2.3, so my brokers shouldn't have the code for KAFKA-9232.
> But what I don't understand is that KAFKA-9752 says:
> "Note that this is only possible if 1) we have a consumer using an old
> JoinGroup version, 2) the consumer times out and disconnects from its
> initial JoinGroup request."
> In this case, I guess my consumer is not using an old JoinGroup, as my
> consumers (KafkaStreams) are on 2.7.0...
>
> Thanks
> Murilo
>
> On Fri, 26 Feb 2021 at 06:06, Péter Sinóros-Szabó
>  wrote:
>
>> Hey Sophie,
>>
>> thanks for the link, I was checking that ticket, but I was not sure if it
>> is relevant for our case.
>> Eventually we "fixed" our problem with reducing the session.timeout.ms
>> (it was set to a high value for other reasons).
>>
>> But today, in another service, we faced the same problem when upgrading
>> the Kafka Client from 2.5.1 to 2.6.1. We are still using 2.4.1 on the
>> brokers.
>> Do you think the same problem (KAFKA-9752) might cause this problem too?
>> It's hard to judge just based on the description of that ticket.
>>
>> Thanks,
>> Peter
>>
>


Re: Rebalancing stuck, never finishes

2021-02-26 Thread Murilo Tavares
Hi
I got the same behaviour yesterday while trying to upgrade my KafkaStreams
app from 2.4.1 to 2.7.0. Our brokers are on 2.2.1.

Looking at KAFKA-9752 it mentions the cause being two other tickets:
https://issues.apache.org/jira/browse/KAFKA-7610
https://issues.apache.org/jira/browse/KAFKA-9232

Although the first ticket seems fixed in 2.2.0, the latter was just fixed
in 2.2.3, so my brokers shouldn't have the code for KAFKA-9232.
But what I don't understand is that KAFKA-9752 says:
"Note that this is only possible if 1) we have a consumer using an old
JoinGroup version, 2) the consumer times out and disconnects from its
initial JoinGroup request."
In this case, I guess my consumer is not using an old JoinGroup, as my
consumers (KafkaStreams) are on 2.7.0...

Thanks
Murilo

On Fri, 26 Feb 2021 at 06:06, Péter Sinóros-Szabó
 wrote:

> Hey Sophie,
>
> thanks for the link, I was checking that ticket, but I was not sure if it
> is relevant for our case.
> Eventually we "fixed" our problem with reducing the session.timeout.ms (it
> was set to a high value for other reasons).
>
> But today, in another service, we faced the same problem when upgrading the
> Kafka Client from 2.5.1 to 2.6.1. We are still using 2.4.1 on the brokers.
> Do you think the same problem (KAFKA-9752) might cause this problem too?
> It's hard to judge just based on the description of that ticket.
>
> Thanks,
> Peter
>


Re: Rebalancing stuck, never finishes

2021-02-26 Thread Péter Sinóros-Szabó
Hey Sophie,

Thanks for the link. I was checking that ticket, but I was not sure if it
is relevant to our case.
Eventually we "fixed" our problem by reducing session.timeout.ms (it
was set to a high value for other reasons).

But today, in another service, we faced the same problem when upgrading the
Kafka Client from 2.5.1 to 2.6.1. We are still using 2.4.1 on the brokers.
Do you think the same problem (KAFKA-9752) might cause this problem too?
It's hard to judge just based on the description of that ticket.

Thanks,
Peter