[jira] [Commented] (KAFKA-16430) The group-metadata-manager thread is always in a loading state and occupies one CPU, unable to end.
[ https://issues.apache.org/jira/browse/KAFKA-16430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833497#comment-17833497 ]

Chia-Ping Tsai commented on KAFKA-16430:

{quote}
What do you mean? Does "newer Kafka script" refer to the new version of the kafka-consumer-groups.sh client script? But the problem here is on the Kafka broker (server) side.
{quote}

Please ignore my previous comment :(

{quote}
At the same time, I found through the top command that the "group-metadata-manager-0" thread was constantly consuming 100% of the CPU resources. This loop could not be broken, resulting in the inability to consume topic partition data on that node. At this point, I suspected that the issue may be related to the __consumer_offsets partition data file loaded by this thread.
{quote}

Could you share more details, for example, the thread dump or the hot path you observed?

{quote}
We encountered this issue in our production environment using Kafka versions 2.2.1 and 2.4.0, and I believe it may also affect other versions.
{quote}

As Kafka 2.x is EOL, is it possible for your team to reproduce the issue with Kafka 3.x?

> The group-metadata-manager thread is always in a loading state and occupies
> one CPU, unable to end.
> ---
>
> Key: KAFKA-16430
> URL: https://issues.apache.org/jira/browse/KAFKA-16430
> Project: Kafka
> Issue Type: Bug
> Components: group-coordinator
> Affects Versions: 2.4.0
> Reporter: Gao Fei
> Priority: Blocker
>
> I deployed three broker instances and suddenly found that the client was
> unable to consume data from certain topic partitions. I first tried to log in
> to the broker corresponding to the group and used the following command to
> view the consumer group:
> {code:java}
> ./bin/kafka-consumer-groups.sh --bootstrap-server localhost:9093 --describe --group mygroup{code}
> and found the following error:
> {code:java}
> Error: Executing consumer group command failed due to org.apache.kafka.common.errors.CoordinatorLoadInProgressException: The coordinator is loading and hence can't process requests.{code}
> I then discovered that the broker seemed to be stuck in a loop, constantly
> in a loading state. At the same time, I found through the top command that
> the "group-metadata-manager-0" thread was constantly consuming 100% of the
> CPU resources. This loop could not be broken, resulting in the inability to
> consume topic partition data on that node. At this point, I suspected that
> the issue may be related to the __consumer_offsets partition data file loaded
> by this thread.
> Finally, after restarting the broker instance, everything was back to normal.
> This is very strange: if there were an issue with the __consumer_offsets
> partition data file, the broker should have failed to start. Why was it able
> to recover automatically after a restart? And why did this continuous
> loading loop over the __consumer_offsets partition data occur?
> We encountered this issue in our production environment using Kafka versions
> 2.2.1 and 2.4.0, and I believe it may also affect other versions.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
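The thread dump or hot path requested above can be captured with the standard `java.lang.management` API. The sketch below assumes it runs inside the JVM being inspected, and `HotThreadInspector` is a hypothetical helper, not part of Kafka. It finds a thread by name (here `group-metadata-manager-0`), samples that thread's CPU time over an interval, and prints its stack. Against a live broker, the usual route is different: run `top -H -p <pid>` to spot the hot thread, then `jstack <pid>` to dump its stack. Taking a few dumps a few seconds apart usually shows whether the thread keeps returning to the same loading frames.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Hypothetical helper: inspects threads of the JVM it runs in.
public class HotThreadInspector {

    // Returns a ThreadInfo (including the stack trace) for the first live thread
    // whose name equals threadName, or null if no such thread exists.
    public static ThreadInfo findThread(String threadName) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        // dumpAllThreads(lockedMonitors, lockedSynchronizers) captures full stacks.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null && threadName.equals(info.getThreadName())) {
                return info;
            }
        }
        return null;
    }

    // Samples the thread's CPU time twice and returns the fraction of one core it
    // used during the interval (close to 1.0 means a fully busy core, as in top).
    // Returns -1.0 when per-thread CPU time is unavailable or the thread has died.
    public static double cpuShare(long threadId, long intervalMillis) throws InterruptedException {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported()) {
            return -1.0;
        }
        long before = mx.getThreadCpuTime(threadId);
        long start = System.nanoTime();
        Thread.sleep(intervalMillis);
        long after = mx.getThreadCpuTime(threadId);
        long elapsed = System.nanoTime() - start;
        if (before < 0 || after < 0) {
            return -1.0; // thread terminated, or CPU time measurement is disabled
        }
        return (double) (after - before) / elapsed;
    }

    public static void main(String[] args) throws InterruptedException {
        String name = args.length > 0 ? args[0] : "group-metadata-manager-0";
        ThreadInfo info = findThread(name);
        if (info == null) {
            System.out.println("No thread named " + name + " in this JVM");
            return;
        }
        double share = cpuShare(info.getThreadId(), 1000);
        System.out.printf("%s state=%s cpuShare=%.2f%n", name, info.getThreadState(), share);
        // The stack frames show the hot path the thread was executing.
        for (StackTraceElement frame : info.getStackTrace()) {
            System.out.println("    at " + frame);
        }
    }
}
```

A share that stays near 1.0 across several samples, with the stack repeatedly in the same frames, matches the "100% of one CPU" symptom reported in the issue.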
[jira] [Commented] (KAFKA-16430) The group-metadata-manager thread is always in a loading state and occupies one CPU, unable to end.
[ https://issues.apache.org/jira/browse/KAFKA-16430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17833389#comment-17833389 ]

Gao Fei commented on KAFKA-16430:

[~chia7712] What do you mean? Does "newer Kafka script" refer to the new version of the kafka-consumer-groups.sh client script? But the problem here is on the Kafka broker (server) side.
[jira] [Commented] (KAFKA-16430) The group-metadata-manager thread is always in a loading state and occupies one CPU, unable to end.
[ https://issues.apache.org/jira/browse/KAFKA-16430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17832443#comment-17832443 ]

Chia-Ping Tsai commented on KAFKA-16430:

[~jackin853] Could you use a newer Kafka script to test it again?