Re: Consumer group rebalancing

2021-07-23 Thread Sophie Blee-Goldman
Check out the GroupCoordinator logs, they say why the group is rebalancing.
Specifically the one that begins "Preparing to rebalance group"

If you search for that in the logs you shared, it shows the reason:

2021-07-22 09:26:37,202 INFO [GroupCoordinator 1]: Preparing to rebalance
> group shows_elk_Group in state PreparingRebalance with old generation 8
> (__consumer_offsets-3) (*reason: removing member
> cms-kafka-service-11a6a273-9a7c-4053-b6dd-1ca27d06690f on heartbeat
> expiration*) (kafka.coordinator.group.GroupCoordinator)
> [executor-Heartbeat]
> 2021-07-22 10:04:27,124 INFO [GroupCoordinator 1]: Preparing to rebalance
> group shows_elk_Group in state PreparingRebalance with old generation 9
> (__consumer_offsets-3) (*reason: Adding new member
> cms-kafka-service-9a363fe5-79a2-4dfb-9c69-4171abed14da with group instance
> id None*) (kafka.coordinator.group.GroupCoordinator)
> [data-plane-kafka-request-handler-5]

2021-07-22 10:05:23,657 INFO [GroupCoordinator 1]: Preparing to rebalance
> group shows_elk_Group in state PreparingRebalance with old generation 10
> (__consumer_offsets-3) (*reason: removing member
> cms-kafka-service-0dab8855-ebe2-4e03-bdca-b1351e122b42 on heartbeat
> expiration*) (kafka.coordinator.group.GroupCoordinator)
> [executor-Heartbeat]

2021-07-22 10:13:36,957 INFO [GroupCoordinator 1]: Preparing to rebalance
> group shows_elk_Group in state PreparingRebalance with old generation 11
> (__consumer_offsets-3) (*reason: Adding new member
> cms-kafka-service-612f2458-5b05-427e-8776-0e4270b2c215 with group instance
> id None*) (kafka.coordinator.group.GroupCoordinator)
> [data-plane-kafka-request-handler-4]

 2021-07-22 10:14:33,966 INFO [GroupCoordinator 1]: Preparing to rebalance
> group shows_elk_Group in state PreparingRebalance with old generation 12
> (__consumer_offsets-3) *(reason: removing member
> cms-kafka-service-9a363fe5-79a2-4dfb-9c69-4171abed14da on heartbeat
> expiration*) (kafka.coordinator.group.GroupCoordinator)
> [executor-Heartbeat]


Looks like a consumer is getting kicked from the group due to heartbeat
expiration and then rejoining over this time period. Unclear from the
server side logs whether it's dropping out because the group was scaled
down, or the pod running that consumer was bounced, etc, or if the consumer
was indeed alive that whole time but somehow couldn't get a heartbeat
through. You'd need to look into the client logs to figure out which of
those things it is. You can try increasing the session.timeout.ms if you
believe there was no change in group membership (ie all consumers were up
and running the whole time).

On Fri, Jul 23, 2021 at 8:22 AM shreyas shetty 
wrote:

> Hi  we have around 70-75 topics created on kafka cluster.
> And there is a consumer pod which is running in different namespace. This
> consumer pod creates separate consumer group for each of the topic. when I
> check the logs of consumer pod it shows the below error:
> ```
> GroupId: shows_elk_Group, Topic: kafka-shows connected
> GroupId: movies_elk_Group, Topic: kafka-movies connected
>
> {"level":"ERROR","timestamp":"2021-07-22T10:14:33.977Z","logger":"kafkajs","message":"[Runner]
> The group is rebalancing,
> re-joining","groupId":"shows_elk_Group","memberId":"cms-kafka-service-612f2458-5b05-427e-8776-0e4270b2c215","error":"The
> group is rebalancing, so a rejoin is needed","retryCount":0,"retryTime":92}
>
> {"level":"ERROR","timestamp":"2021-07-22T10:14:33.165Z","logger":"kafkajs","message":"[Runner]
> The group is rebalancing,
> re-joining","groupId":"movies_elk_Group","memberId":"cms-kafka-service-56cda415-2414-4054-b285-6c794edf1700","error":"The
> group is rebalancing, so a rejoin is needed","retryCount":0,"retryTime":102}
>
> ```
> I am getting same error as above for all other consumer group as well.
>
> Sharing the kafka broker logs:
> ```
>  kubectl logs my-kafka-cluster-kafka-1 -n kafka| grep shows_elk_Group
> 2021-07-22 03:54:18,466 INFO Loaded member
> MemberMetadata(memberId=cms-kafka-service-df1038e3-d5db-4e62-885f-9f8d011ae9ff,
> groupInstanceId=None, clientId=cms-kafka-service, clientHost=/10.244.2.192,
> sessionTimeoutMs=3, rebalanceTimeoutMs=6,
> supportedProtocols=List(RoundRobinAssigner), ) in group shows_elk_Group
> with generation 3. (kafka.coordinator.group.GroupMetadata$)
> [group-metadata-manager-0]
> 2021-07-22 03:54:18,471 INFO Loaded member
> MemberMetadata(memberId=cms-kafka-service-df1038e3-d5db-4e62-885f-9f8d011ae9ff,
> groupInstanceId=None, clientId=cms-kafka-service, clientHost=/10.244.2.192,
> sessionTimeoutMs=3, rebalanceTimeoutMs=6,
> supportedProtocols=List(RoundRobinAssigner), ) in group shows_elk_Group
> with generation 4. (kafka.coordinator.group.GroupMetadata$)
> [group-metadata-manager-0]
> 2021-07-22 03:54:18,471 INFO Loaded member
> MemberMetadata(memberId=cms-kafka-service-11a6a273-9a7c-4053-b6dd-1ca27d06690f,
> groupInstanceId=None, clientId=cms-kafka-service, clientHost=/10.244.2.192,
> 

Consumer group rebalancing

2021-07-23 Thread shreyas shetty
Hi  we have around 70-75 topics created on kafka cluster.
And there is a consumer pod which is running in different namespace. This 
consumer pod creates separate consumer group for each of the topic. when I 
check the logs of consumer pod it shows the below error:
```
GroupId: shows_elk_Group, Topic: kafka-shows connected
GroupId: movies_elk_Group, Topic: kafka-movies connected

{"level":"ERROR","timestamp":"2021-07-22T10:14:33.977Z","logger":"kafkajs","message":"[Runner]
 The group is rebalancing, 
re-joining","groupId":"shows_elk_Group","memberId":"cms-kafka-service-612f2458-5b05-427e-8776-0e4270b2c215","error":"The
 group is rebalancing, so a rejoin is needed","retryCount":0,"retryTime":92}

{"level":"ERROR","timestamp":"2021-07-22T10:14:33.165Z","logger":"kafkajs","message":"[Runner]
 The group is rebalancing, 
re-joining","groupId":"movies_elk_Group","memberId":"cms-kafka-service-56cda415-2414-4054-b285-6c794edf1700","error":"The
 group is rebalancing, so a rejoin is needed","retryCount":0,"retryTime":102}

```
I am getting same error as above for all other consumer group as well.

Sharing the kafka broker logs:
```
 kubectl logs my-kafka-cluster-kafka-1 -n kafka| grep shows_elk_Group
2021-07-22 03:54:18,466 INFO Loaded member 
MemberMetadata(memberId=cms-kafka-service-df1038e3-d5db-4e62-885f-9f8d011ae9ff, 
groupInstanceId=None, clientId=cms-kafka-service, clientHost=/10.244.2.192, 
sessionTimeoutMs=3, rebalanceTimeoutMs=6, 
supportedProtocols=List(RoundRobinAssigner), ) in group shows_elk_Group with 
generation 3. (kafka.coordinator.group.GroupMetadata$) 
[group-metadata-manager-0]
2021-07-22 03:54:18,471 INFO Loaded member 
MemberMetadata(memberId=cms-kafka-service-df1038e3-d5db-4e62-885f-9f8d011ae9ff, 
groupInstanceId=None, clientId=cms-kafka-service, clientHost=/10.244.2.192, 
sessionTimeoutMs=3, rebalanceTimeoutMs=6, 
supportedProtocols=List(RoundRobinAssigner), ) in group shows_elk_Group with 
generation 4. (kafka.coordinator.group.GroupMetadata$) 
[group-metadata-manager-0]
2021-07-22 03:54:18,471 INFO Loaded member 
MemberMetadata(memberId=cms-kafka-service-11a6a273-9a7c-4053-b6dd-1ca27d06690f, 
groupInstanceId=None, clientId=cms-kafka-service, clientHost=/10.244.2.192, 
sessionTimeoutMs=3, rebalanceTimeoutMs=6, 
supportedProtocols=List(RoundRobinAssigner), ) in group shows_elk_Group with 
generation 6. (kafka.coordinator.group.GroupMetadata$) 
[group-metadata-manager-0]
2021-07-22 03:54:18,475 INFO [GroupCoordinator 1]: Loading group metadata for 
shows_elk_Group with generation 6 (kafka.coordinator.group.GroupCoordinator) 
[group-metadata-manager-0]
2021-07-22 03:54:20,791 INFO [GroupCoordinator 1]: Preparing to rebalance group 
shows_elk_Group in state PreparingRebalance with old generation 6 
(__consumer_offsets-3) (reason: leader 
cms-kafka-service-11a6a273-9a7c-4053-b6dd-1ca27d06690f re-joining group during 
Stable) (kafka.coordinator.group.GroupCoordinator) 
[data-plane-kafka-request-handler-7]
2021-07-22 03:54:20,791 INFO [GroupCoordinator 1]: Stabilized group 
shows_elk_Group generation 7 (__consumer_offsets-3) with 1 members 
(kafka.coordinator.group.GroupCoordinator) [data-plane-kafka-request-handler-7]
2021-07-22 03:54:20,793 INFO [GroupCoordinator 1]: Assignment received from 
leader for group shows_elk_Group for generation 7. The group has 1 members, 0 
of which are static. (kafka.coordinator.group.GroupCoordinator) 
[data-plane-kafka-request-handler-0]
2021-07-22 09:25:41,999 INFO [GroupCoordinator 1]: Preparing to rebalance group 
shows_elk_Group in state PreparingRebalance with old generation 7 
(__consumer_offsets-3) (reason: Adding new member 
cms-kafka-service-0dab8855-ebe2-4e03-bdca-b1351e122b42 with group instance id 
None) (kafka.coordinator.group.GroupCoordinator) 
[data-plane-kafka-request-handler-0]
2021-07-22 09:25:42,182 INFO [GroupCoordinator 1]: Stabilized group 
shows_elk_Group generation 8 (__consumer_offsets-3) with 2 members 
(kafka.coordinator.group.GroupCoordinator) [data-plane-kafka-request-handler-5]
2021-07-22 09:25:42,184 INFO [GroupCoordinator 1]: Assignment received from 
leader for group shows_elk_Group for generation 8. The group has 2 members, 0 
of which are static. (kafka.coordinator.group.GroupCoordinator) 
[data-plane-kafka-request-handler-2]
2021-07-22 09:26:37,201 INFO [GroupCoordinator 1]: Member 
cms-kafka-service-11a6a273-9a7c-4053-b6dd-1ca27d06690f in group shows_elk_Group 
has failed, removing it from the group 
(kafka.coordinator.group.GroupCoordinator) [executor-Heartbeat]
2021-07-22 09:26:37,202 INFO [GroupCoordinator 1]: Preparing to rebalance group 
shows_elk_Group in state PreparingRebalance with old generation 8 
(__consumer_offsets-3) (reason: removing member 
cms-kafka-service-11a6a273-9a7c-4053-b6dd-1ca27d06690f on heartbeat expiration) 
(kafka.coordinator.group.GroupCoordinator) [executor-Heartbeat]
2021-07-22 09:26:37,270 INFO [GroupCoordinator 1]: Stabilized group 
shows_elk_Group generation 9 

Re: Consumer Group rebalancing events

2020-09-03 Thread Devaki, Srinivas
Hi Tess D'erberwill, Таисия Дорошенков,

Thanks a lot for the reply on this, one major issue I found with our
existing metrics is that they are not granular enough i.e if the
rebalancing finishes under 10 seconds then I might lose that event in the
metrics if my metrics scrape interval is 10 seconds.

Any suggestions around how to log the exact events than metrics(major
concern is around losing the events)

Thanks

On Tue, Aug 25, 2020 at 7:59 PM Tess D'erberwill 
wrote:

> Hi, Srinivas Devaki !
>
> There are Kafka JMX metrics - NumGroupsCompletingRebalance and
> NumGroupsPreparingRebalance. I think you can monitor them by some
> application and calculate statistic.
> As for us - we added Jolokia agent to our Kafka docker image and Jolokia
> expose all JMX metrics via HTTP. We run additional service which gets
> necessary metrics from this endpoint and analyse them.
>
> On 2020/08/23 16:39:35, "Devaki, Srinivas" 
> wrote:
> > Hi Folks,
> >
> > I’m trying to find the consumer group rebalancing events, so to plot how
> > much time consumer groups usually take in rebalancing during application
> > deployments.
> >
> > I’ve tried to check the logs of `server.log`, `controller.log`,
> > `state-change.log`. but couldn’t find anything about rebalancing.
> >
> >
> > Thanks & Regards
> > Srinivas Devaki
> > sre @ zomato
> >
>


Re: Consumer Group rebalancing events

2020-08-25 Thread Tess D'erberwill
Hi, Srinivas Devaki !

There are Kafka JMX metrics - NumGroupsCompletingRebalance and 
NumGroupsPreparingRebalance. I think you can monitor them by some application 
and calculate statistic. 
As for us - we added Jolokia agent to our Kafka docker image and Jolokia expose 
all JMX metrics via HTTP. We run additional service which gets necessary 
metrics from this endpoint and analyse them.

On 2020/08/23 16:39:35, "Devaki, Srinivas"  wrote: 
> Hi Folks,
> 
> I’m trying to find the consumer group rebalancing events, so to plot how
> much time consumer groups usually take in rebalancing during application
> deployments.
> 
> I’ve tried to check the logs of `server.log`, `controller.log`,
> `state-change.log`. but couldn’t find anything about rebalancing.
> 
> 
> Thanks & Regards
> Srinivas Devaki
> sre @ zomato
> 


Re: Consumer Group rebalancing events

2020-08-25 Thread Таисия Дорошенкова

 Hi,  Srinivas Devaki!

There are Kafka JMX metrics - NumGroupsCompletingRebalance and 
NumGroupsPreparingRebalance. I think you can monitor them by some application 
and calculate statistic.
As for us - we added Jolokia agent to our Kafka docker image and Jolokia expose 
all JMX metrics via HHTP. We run additional service which gets necessary 
metrics from this endpoint and analyse them.
--  
С уважением, Таисия Дорошенкова


On 2020/08/23 16:39:35, "Devaki, Srinivas" < me@eightnoteight.space > wrote:
> Hi Folks,
>
> I’m trying to find the consumer group rebalancing events, so to plot how
> much time consumer groups usually take in rebalancing during application
> deployments.
>
> I’ve tried to check the logs of `server.log`, `controller.log`,
> `state-change.log`. but couldn’t find anything about rebalancing.
>
>
> Thanks & Regards
> Srinivas Devaki
> sre @ zomato
>

Consumer Group rebalancing events

2020-08-23 Thread Devaki, Srinivas
Hi Folks,

I’m trying to find the consumer group rebalancing events, so to plot how
much time consumer groups usually take in rebalancing during application
deployments.

I’ve tried to check the logs of `server.log`, `controller.log`,
`state-change.log`. but couldn’t find anything about rebalancing.


Thanks & Regards
Srinivas Devaki
sre @ zomato