[jira] [Commented] (KAFKA-16430) The group-metadata-manager thread is always in a loading state and occupies one CPU, unable to end.

2024-04-03 Thread Chia-Ping Tsai (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833497#comment-17833497
 ] 

Chia-Ping Tsai commented on KAFKA-16430:


{quote}
what you mean?  Is the newer kafka script referring to the use of the new 
version of the kafka-consumer-group.sh client script? But now there is a 
problem with the kafka broker server side.
{quote}

Please ignore my previous comment :(

{quote}
At the same time, I found through the top command that the 
"group-metadata-manager-0" thread was constantly consuming 100% of the CPU 
resources. This loop could not be broken, resulting in the inability to consume 
topic partition data on that node. At this point, I suspected that the issue 
may be related to the __consumer_offsets partition data file loaded by this 
thread.
{quote}

Could you share more details? for example, the thread dump or hot path you 
observed

{quote}
We encountered this issue in our production environment using Kafka versions 
2.2.1 and 2.4.0, and I believe it may also affect other versions.
{quote}

As kafka 2.x is EOL, is it possible that your team use kafak 3.x to reproduce 
the issue?

> The group-metadata-manager thread is always in a loading state and occupies 
> one CPU, unable to end.
> ---
>
> Key: KAFKA-16430
> URL: https://issues.apache.org/jira/browse/KAFKA-16430
> Project: Kafka
>  Issue Type: Bug
>  Components: group-coordinator
>Affects Versions: 2.4.0
>Reporter: Gao Fei
>Priority: Blocker
>
> I deployed three broker instances and suddenly found that the client was 
> unable to consume data from certain topic partitions. I first tried to log in 
> to the broker corresponding to the group and used the following command to 
> view the consumer group:
> {code:java}
> ./bin/kafka-consumer-groups.sh --bootstrap-server localhost:9093 --describe 
> --group mygroup{code}
> and found the following error:
> {code:java}
> Error: Executing consumer group command failed due to 
> org.apache.kafka.common.errors.CoodinatorLoadInProgressException: The 
> coodinator is loading and hence can't process requests.{code}
> I then discovered that the broker may be stuck in a loop, which is constantly 
> in a loading state. At the same time, I found through the top command that 
> the "group-metadata-manager-0" thread was constantly consuming 100% of the 
> CPU resources. This loop could not be broken, resulting in the inability to 
> consume topic partition data on that node. At this point, I suspected that 
> the issue may be related to the __consumer_offsets partition data file loaded 
> by this thread.
> Finally, after restarting the broker instance, everything was back to normal. 
> It's very strange that if there was an issue with the __consumer_offsets 
> partition data file, the broker should have failed to start. Why was it able 
> to automatically recover after a restart? And why did this continuous loop 
> loading of the __consumer_offsets partition data occur?
> We encountered this issue in our production environment using Kafka versions 
> 2.2.1 and 2.4.0, and I believe it may also affect other versions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16430) The group-metadata-manager thread is always in a loading state and occupies one CPU, unable to end.

2024-04-02 Thread Gao Fei (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833389#comment-17833389
 ] 

Gao Fei commented on KAFKA-16430:
-

[~chia7712] what you mean?  Is the newer kafka script referring to the use of 
the new version of the kafka-consumer-group.sh client script? But now there is 
a problem with the kafka broker server side.

> The group-metadata-manager thread is always in a loading state and occupies 
> one CPU, unable to end.
> ---
>
> Key: KAFKA-16430
> URL: https://issues.apache.org/jira/browse/KAFKA-16430
> Project: Kafka
>  Issue Type: Bug
>  Components: group-coordinator
>Affects Versions: 2.4.0
>Reporter: Gao Fei
>Priority: Blocker
>
> I deployed three broker instances and suddenly found that the client was 
> unable to consume data from certain topic partitions. I first tried to log in 
> to the broker corresponding to the group and used the following command to 
> view the consumer group:
> {code:java}
> ./bin/kafka-consumer-groups.sh --bootstrap-server localhost:9093 --describe 
> --group mygroup{code}
> and found the following error:
> {code:java}
> Error: Executing consumer group command failed due to 
> org.apache.kafka.common.errors.CoodinatorLoadInProgressException: The 
> coodinator is loading and hence can't process requests.{code}
> I then discovered that the broker may be stuck in a loop, which is constantly 
> in a loading state. At the same time, I found through the top command that 
> the "group-metadata-manager-0" thread was constantly consuming 100% of the 
> CPU resources. This loop could not be broken, resulting in the inability to 
> consume topic partition data on that node. At this point, I suspected that 
> the issue may be related to the __consumer_offsets partition data file loaded 
> by this thread.
> Finally, after restarting the broker instance, everything was back to normal. 
> It's very strange that if there was an issue with the __consumer_offsets 
> partition data file, the broker should have failed to start. Why was it able 
> to automatically recover after a restart? And why did this continuous loop 
> loading of the __consumer_offsets partition data occur?
> We encountered this issue in our production environment using Kafka versions 
> 2.2.1 and 2.4.0, and I believe it may also affect other versions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16430) The group-metadata-manager thread is always in a loading state and occupies one CPU, unable to end.

2024-03-30 Thread Chia-Ping Tsai (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832443#comment-17832443
 ] 

Chia-Ping Tsai commented on KAFKA-16430:


[~jackin853] Could you use newer kafka script to test it again?

> The group-metadata-manager thread is always in a loading state and occupies 
> one CPU, unable to end.
> ---
>
> Key: KAFKA-16430
> URL: https://issues.apache.org/jira/browse/KAFKA-16430
> Project: Kafka
>  Issue Type: Bug
>  Components: group-coordinator
>Affects Versions: 2.4.0
>Reporter: Gao Fei
>Priority: Blocker
>
> I deployed three broker instances and suddenly found that the client was 
> unable to consume data from certain topic partitions. I first tried to log in 
> to the broker corresponding to the group and used the following command to 
> view the consumer group:
> {code:java}
> ./bin/kafka-consumer-groups.sh --bootstrap-server localhost:9093 --describe 
> --group mygroup{code}
> and found the following error:
> {code:java}
> Error: Executing consumer group command failed due to 
> org.apache.kafka.common.errors.CoodinatorLoadInProgressException: The 
> coodinator is loading and hence can't process requests.{code}
> I then discovered that the broker may be stuck in a loop, which is constantly 
> in a loading state. At the same time, I found through the top command that 
> the "group-metadata-manager-0" thread was constantly consuming 100% of the 
> CPU resources. This loop could not be broken, resulting in the inability to 
> consume topic partition data on that node. At this point, I suspected that 
> the issue may be related to the __consumer_offsets partition data file loaded 
> by this thread.
> Finally, after restarting the broker instance, everything was back to normal. 
> It's very strange that if there was an issue with the __consumer_offsets 
> partition data file, the broker should have failed to start. Why was it able 
> to automatically recover after a restart? And why did this continuous loop 
> loading of the __consumer_offsets partition data occur?
> We encountered this issue in our production environment using Kafka versions 
> 2.2.1 and 2.4.0, and I believe it may also affect other versions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)