Re: Kafka Streams leave group behaviour

2021-08-18 Thread Sophie Blee-Goldman
As Boyang mentioned, Kafka Streams intentionally does not send a LeaveGroup
request
when shutting down. This is because often the shutdown is not due to a
scaling down event
but instead some transient closure, such as during a rolling bounce. In
cases where the
instance is expected to start up again shortly after, we originally wanted
to avoid that member's
tasks from being redistributed across the remaining group members since
this would disturb the
stable assignment and could cause unnecessary state migration and
restoration. We also hoped
to limit the disruption to just a single rebalance, rather than forcing the
group to rebalance once
when the member shuts down and then again when it comes back up. So it's
really an optimization
for the case in which the shutdown is temporary.

That said, many of those optimizations are no longer necessary or at least
much less useful given
recent features and improvements. For example rebalances are now
lightweight so skipping the
2nd rebalance is not as worth optimizing for, and the new assignor will
take into account the actual
underlying state for each task/partition assignment, rather than just the
previous assignment, so
the assignment should be considerably more stable across bounces and
rolling restarts.

Given that, it might be time to reconsider this optimization.
Alternatively, we could introduce another
form of the close() API that forces the member to leave the group, to be
used in event of actual scale
down rather than a transient bounce.

Filed https://issues.apache.org/jira/browse/KAFKA-13217

On Thu, Aug 12, 2021 at 6:14 PM Boyang Chen 
wrote:

> You are right Uwe, Kafka Streams won't leave group no matter dynamic or
> static membership. If you want to have fast scale down, consider trying
> static membership and use the admin command `removeMemberFromGroup` when
> you need to rescale.
>
> Boyang
>
> On Thu, Aug 12, 2021 at 4:37 PM Lerh Chuan Low 
> wrote:
>
> > I think you may have stumbled upon this:
> > https://issues.apache.org/jira/browse/KAFKA-4881. 1 thing that you could
> > try is using static membership - we have yet to try that though so can't
> > comment yet on how that might work out.
> >
> > On Thu, Aug 12, 2021 at 11:29 PM c...@uweeisele.eu 
> > wrote:
> >
> > > Hello all,
> > >
> > > I have a question about the Group Membership lifecycle of Kafka
> Streams,
> > > or more specific about when Kafka Streams does leave the consumer group
> > (in
> > > case of dynamic membership).
> > >
> > > My expectation was, that a call to the method KafkaStreams.close() also
> > > sends a LeaveGroup request to the coordination (if dynamic membership
> is
> > > used). However, its seems that this is not the case (at least in my
> case
> > > the request was not send). Only if I explicitly call
> > > KafkaStreams.removeStreamThread() a LeaveGroup request is sent to the
> > > coordinator. I used the WordCount example located in
> > > https://github.com/confluentinc/kafka-streams-examples to evaluate
> this.
> > >
> > > Is this how Kafka Streams is intended to work and if yes, what do you
> > > recommend to achieve that Kafka Streams leaves the group when shutting
> > down
> > > the application? For example, one situation where I don't want to wait
> > for
> > > the session timeout is when downscaling an application.
> > >
> > > Thanks.
> > >
> > > Best Regards,
> > > Uwe
> >
>


Re: Kafka metrics to calculate number of messages in a topic

2021-08-18 Thread Eric Azama
That will get you a good approximation, but it's not guaranteed to be
completely accurate. Offsets in Kafka are not guaranteed to be continuous.

For topics with log compaction enabled, the removed records will leave
(potentially very large) holes in the offsets.

Even for topics without log compaction, it's possible for there to be small
holes in the offsets. Transaction markers for Exactly Once processing will
also use offsets.

On Mon, Aug 9, 2021 at 11:28 PM Dhirendra Singh 
wrote:

> Hi All,
> I have a requirement to display the total number of messages in a topic in
> grafana dashboard.
> I am looking at the metrics exposed by kafka broker and came across the
> following metrics.
> kafka_log_log_logendoffset
> kafka_log_log_logstartoffset
>
> My understanding is that if I take the difference of
> kafka_log_log_logendoffset and kafka_log_log_logstartoffset it should
> result in the number of messages present in the topic.
> Is my understanding correct ?
>
> Thanks,
> Dhirendra.
>