Re: Consumer group rebalancing
Check out the GroupCoordinator logs, they say why the group is rebalancing. Specifically the one that begins "Preparing to rebalance group" If you search for that in the logs you shared, it shows the reason: 2021-07-22 09:26:37,202 INFO [GroupCoordinator 1]: Preparing to rebalance > group shows_elk_Group in state PreparingRebalance with old generation 8 > (__consumer_offsets-3) (*reason: removing member > cms-kafka-service-11a6a273-9a7c-4053-b6dd-1ca27d06690f on heartbeat > expiration*) (kafka.coordinator.group.GroupCoordinator) > [executor-Heartbeat] > 2021-07-22 10:04:27,124 INFO [GroupCoordinator 1]: Preparing to rebalance > group shows_elk_Group in state PreparingRebalance with old generation 9 > (__consumer_offsets-3) (*reason: Adding new member > cms-kafka-service-9a363fe5-79a2-4dfb-9c69-4171abed14da with group instance > id None*) (kafka.coordinator.group.GroupCoordinator) > [data-plane-kafka-request-handler-5] 2021-07-22 10:05:23,657 INFO [GroupCoordinator 1]: Preparing to rebalance > group shows_elk_Group in state PreparingRebalance with old generation 10 > (__consumer_offsets-3) (*reason: removing member > cms-kafka-service-0dab8855-ebe2-4e03-bdca-b1351e122b42 on heartbeat > expiration*) (kafka.coordinator.group.GroupCoordinator) > [executor-Heartbeat] 2021-07-22 10:13:36,957 INFO [GroupCoordinator 1]: Preparing to rebalance > group shows_elk_Group in state PreparingRebalance with old generation 11 > (__consumer_offsets-3) (*reason: Adding new member > cms-kafka-service-612f2458-5b05-427e-8776-0e4270b2c215 with group instance > id None*) (kafka.coordinator.group.GroupCoordinator) > [data-plane-kafka-request-handler-4] 2021-07-22 10:14:33,966 INFO [GroupCoordinator 1]: Preparing to rebalance > group shows_elk_Group in state PreparingRebalance with old generation 12 > (__consumer_offsets-3) *(reason: removing member > cms-kafka-service-9a363fe5-79a2-4dfb-9c69-4171abed14da on heartbeat > expiration*) (kafka.coordinator.group.GroupCoordinator) > [executor-Heartbeat] Looks like a consumer is getting kicked from the group due to heartbeat expiration and then rejoining over this time period. Unclear from the server side logs whether it's dropping out because the group was scaled down, or the pod running that consumer was bounced, etc, or if the consumer was indeed alive that whole time but somehow couldn't get a heartbeat through. You'd need to look into the client logs to figure out which of those things it is. You can try increasing the session.timeout.ms if you believe there was no change in group membership (ie all consumers were up and running the whole time). On Fri, Jul 23, 2021 at 8:22 AM shreyas shetty wrote: > Hi we have around 70-75 topics created on kafka cluster. > And there is a consumer pod which is running in different namespace. This > consumer pod creates separate consumer group for each of the topic. when I > check the logs of consumer pod it shows the below error: > ``` > GroupId: shows_elk_Group, Topic: kafka-shows connected > GroupId: movies_elk_Group, Topic: kafka-movies connected > > {"level":"ERROR","timestamp":"2021-07-22T10:14:33.977Z","logger":"kafkajs","message":"[Runner] > The group is rebalancing, > re-joining","groupId":"shows_elk_Group","memberId":"cms-kafka-service-612f2458-5b05-427e-8776-0e4270b2c215","error":"The > group is rebalancing, so a rejoin is needed","retryCount":0,"retryTime":92} > > {"level":"ERROR","timestamp":"2021-07-22T10:14:33.165Z","logger":"kafkajs","message":"[Runner] > The group is rebalancing, > re-joining","groupId":"movies_elk_Group","memberId":"cms-kafka-service-56cda415-2414-4054-b285-6c794edf1700","error":"The > group is rebalancing, so a rejoin is needed","retryCount":0,"retryTime":102} > > ``` > I am getting same error as above for all other consumer group as well. > > Sharing the kafka broker logs: > ``` > kubectl logs my-kafka-cluster-kafka-1 -n kafka| grep shows_elk_Group > 2021-07-22 03:54:18,466 INFO Loaded member > MemberMetadata(memberId=cms-kafka-service-df1038e3-d5db-4e62-885f-9f8d011ae9ff, > groupInstanceId=None, clientId=cms-kafka-service, clientHost=/10.244.2.192, > sessionTimeoutMs=3, rebalanceTimeoutMs=6, > supportedProtocols=List(RoundRobinAssigner), ) in group shows_elk_Group > with generation 3. (kafka.coordinator.group.GroupMetadata$) > [group-metadata-manager-0] > 2021-07-22 03:54:18,471 INFO Loaded member > MemberMetadata(memberId=cms-kafka-service-df1038e3-d5db-4e62-885f-9f8d011ae9ff, > groupInstanceId=None, clientId=cms-kafka-service, clientHost=/10.244.2.192, > sessionTimeoutMs=3, rebalanceTimeoutMs=6, > supportedProtocols=List(RoundRobinAssigner), ) in group shows_elk_Group > with generation 4. (kafka.coordinator.group.GroupMetadata$) > [group-metadata-manager-0] > 2021-07-22 03:54:18,471 INFO Loaded member > MemberMetadata(memberId=cms-kafka-service-11a6a273-9a7c-4053-b6dd-1ca27d06690f, > groupInstanceId=None, clientId=cms-kafka-service, clientHost=/10.244.2.192, >
Consumer group rebalancing
Hi we have around 70-75 topics created on kafka cluster. And there is a consumer pod which is running in different namespace. This consumer pod creates separate consumer group for each of the topic. when I check the logs of consumer pod it shows the below error: ``` GroupId: shows_elk_Group, Topic: kafka-shows connected GroupId: movies_elk_Group, Topic: kafka-movies connected {"level":"ERROR","timestamp":"2021-07-22T10:14:33.977Z","logger":"kafkajs","message":"[Runner] The group is rebalancing, re-joining","groupId":"shows_elk_Group","memberId":"cms-kafka-service-612f2458-5b05-427e-8776-0e4270b2c215","error":"The group is rebalancing, so a rejoin is needed","retryCount":0,"retryTime":92} {"level":"ERROR","timestamp":"2021-07-22T10:14:33.165Z","logger":"kafkajs","message":"[Runner] The group is rebalancing, re-joining","groupId":"movies_elk_Group","memberId":"cms-kafka-service-56cda415-2414-4054-b285-6c794edf1700","error":"The group is rebalancing, so a rejoin is needed","retryCount":0,"retryTime":102} ``` I am getting same error as above for all other consumer group as well. Sharing the kafka broker logs: ``` kubectl logs my-kafka-cluster-kafka-1 -n kafka| grep shows_elk_Group 2021-07-22 03:54:18,466 INFO Loaded member MemberMetadata(memberId=cms-kafka-service-df1038e3-d5db-4e62-885f-9f8d011ae9ff, groupInstanceId=None, clientId=cms-kafka-service, clientHost=/10.244.2.192, sessionTimeoutMs=3, rebalanceTimeoutMs=6, supportedProtocols=List(RoundRobinAssigner), ) in group shows_elk_Group with generation 3. (kafka.coordinator.group.GroupMetadata$) [group-metadata-manager-0] 2021-07-22 03:54:18,471 INFO Loaded member MemberMetadata(memberId=cms-kafka-service-df1038e3-d5db-4e62-885f-9f8d011ae9ff, groupInstanceId=None, clientId=cms-kafka-service, clientHost=/10.244.2.192, sessionTimeoutMs=3, rebalanceTimeoutMs=6, supportedProtocols=List(RoundRobinAssigner), ) in group shows_elk_Group with generation 4. (kafka.coordinator.group.GroupMetadata$) [group-metadata-manager-0] 2021-07-22 03:54:18,471 INFO Loaded member MemberMetadata(memberId=cms-kafka-service-11a6a273-9a7c-4053-b6dd-1ca27d06690f, groupInstanceId=None, clientId=cms-kafka-service, clientHost=/10.244.2.192, sessionTimeoutMs=3, rebalanceTimeoutMs=6, supportedProtocols=List(RoundRobinAssigner), ) in group shows_elk_Group with generation 6. (kafka.coordinator.group.GroupMetadata$) [group-metadata-manager-0] 2021-07-22 03:54:18,475 INFO [GroupCoordinator 1]: Loading group metadata for shows_elk_Group with generation 6 (kafka.coordinator.group.GroupCoordinator) [group-metadata-manager-0] 2021-07-22 03:54:20,791 INFO [GroupCoordinator 1]: Preparing to rebalance group shows_elk_Group in state PreparingRebalance with old generation 6 (__consumer_offsets-3) (reason: leader cms-kafka-service-11a6a273-9a7c-4053-b6dd-1ca27d06690f re-joining group during Stable) (kafka.coordinator.group.GroupCoordinator) [data-plane-kafka-request-handler-7] 2021-07-22 03:54:20,791 INFO [GroupCoordinator 1]: Stabilized group shows_elk_Group generation 7 (__consumer_offsets-3) with 1 members (kafka.coordinator.group.GroupCoordinator) [data-plane-kafka-request-handler-7] 2021-07-22 03:54:20,793 INFO [GroupCoordinator 1]: Assignment received from leader for group shows_elk_Group for generation 7. The group has 1 members, 0 of which are static. (kafka.coordinator.group.GroupCoordinator) [data-plane-kafka-request-handler-0] 2021-07-22 09:25:41,999 INFO [GroupCoordinator 1]: Preparing to rebalance group shows_elk_Group in state PreparingRebalance with old generation 7 (__consumer_offsets-3) (reason: Adding new member cms-kafka-service-0dab8855-ebe2-4e03-bdca-b1351e122b42 with group instance id None) (kafka.coordinator.group.GroupCoordinator) [data-plane-kafka-request-handler-0] 2021-07-22 09:25:42,182 INFO [GroupCoordinator 1]: Stabilized group shows_elk_Group generation 8 (__consumer_offsets-3) with 2 members (kafka.coordinator.group.GroupCoordinator) [data-plane-kafka-request-handler-5] 2021-07-22 09:25:42,184 INFO [GroupCoordinator 1]: Assignment received from leader for group shows_elk_Group for generation 8. The group has 2 members, 0 of which are static. (kafka.coordinator.group.GroupCoordinator) [data-plane-kafka-request-handler-2] 2021-07-22 09:26:37,201 INFO [GroupCoordinator 1]: Member cms-kafka-service-11a6a273-9a7c-4053-b6dd-1ca27d06690f in group shows_elk_Group has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator) [executor-Heartbeat] 2021-07-22 09:26:37,202 INFO [GroupCoordinator 1]: Preparing to rebalance group shows_elk_Group in state PreparingRebalance with old generation 8 (__consumer_offsets-3) (reason: removing member cms-kafka-service-11a6a273-9a7c-4053-b6dd-1ca27d06690f on heartbeat expiration) (kafka.coordinator.group.GroupCoordinator) [executor-Heartbeat] 2021-07-22 09:26:37,270 INFO [GroupCoordinator 1]: Stabilized group shows_elk_Group generation 9
Re: Consumer Group rebalancing events
Hi Tess D'erberwill, Таисия Дорошенков, Thanks a lot for the reply on this, one major issue I found with our existing metrics is that they are not granular enough i.e if the rebalancing finishes under 10 seconds then I might lose that event in the metrics if my metrics scrape interval is 10 seconds. Any suggestions around how to log the exact events than metrics(major concern is around losing the events) Thanks On Tue, Aug 25, 2020 at 7:59 PM Tess D'erberwill wrote: > Hi, Srinivas Devaki ! > > There are Kafka JMX metrics - NumGroupsCompletingRebalance and > NumGroupsPreparingRebalance. I think you can monitor them by some > application and calculate statistic. > As for us - we added Jolokia agent to our Kafka docker image and Jolokia > expose all JMX metrics via HTTP. We run additional service which gets > necessary metrics from this endpoint and analyse them. > > On 2020/08/23 16:39:35, "Devaki, Srinivas" > wrote: > > Hi Folks, > > > > I’m trying to find the consumer group rebalancing events, so to plot how > > much time consumer groups usually take in rebalancing during application > > deployments. > > > > I’ve tried to check the logs of `server.log`, `controller.log`, > > `state-change.log`. but couldn’t find anything about rebalancing. > > > > > > Thanks & Regards > > Srinivas Devaki > > sre @ zomato > > >
Re: Consumer Group rebalancing events
Hi, Srinivas Devaki ! There are Kafka JMX metrics - NumGroupsCompletingRebalance and NumGroupsPreparingRebalance. I think you can monitor them by some application and calculate statistic. As for us - we added Jolokia agent to our Kafka docker image and Jolokia expose all JMX metrics via HTTP. We run additional service which gets necessary metrics from this endpoint and analyse them. On 2020/08/23 16:39:35, "Devaki, Srinivas" wrote: > Hi Folks, > > I’m trying to find the consumer group rebalancing events, so to plot how > much time consumer groups usually take in rebalancing during application > deployments. > > I’ve tried to check the logs of `server.log`, `controller.log`, > `state-change.log`. but couldn’t find anything about rebalancing. > > > Thanks & Regards > Srinivas Devaki > sre @ zomato >
Re: Consumer Group rebalancing events
Hi, Srinivas Devaki! There are Kafka JMX metrics - NumGroupsCompletingRebalance and NumGroupsPreparingRebalance. I think you can monitor them by some application and calculate statistic. As for us - we added Jolokia agent to our Kafka docker image and Jolokia expose all JMX metrics via HHTP. We run additional service which gets necessary metrics from this endpoint and analyse them. -- С уважением, Таисия Дорошенкова On 2020/08/23 16:39:35, "Devaki, Srinivas" < me@eightnoteight.space > wrote: > Hi Folks, > > I’m trying to find the consumer group rebalancing events, so to plot how > much time consumer groups usually take in rebalancing during application > deployments. > > I’ve tried to check the logs of `server.log`, `controller.log`, > `state-change.log`. but couldn’t find anything about rebalancing. > > > Thanks & Regards > Srinivas Devaki > sre @ zomato >
Consumer Group rebalancing events
Hi Folks, I’m trying to find the consumer group rebalancing events, so to plot how much time consumer groups usually take in rebalancing during application deployments. I’ve tried to check the logs of `server.log`, `controller.log`, `state-change.log`. but couldn’t find anything about rebalancing. Thanks & Regards Srinivas Devaki sre @ zomato