Re: [DISCUSS] KIP-490: log when consumer groups lose a message because offset has been deleted

Stanislav Kozlovski Sat, 27 Jul 2019 07:22:35 -0700

Hey Jose,

Thanks for the KIP.


I think that Colin was referring to an existing client metric called
"kafka.consumer:type=consumer-fetch-manager-metrics,client-id="{client-id}",name=records-lag-max",
exposed on the consumer application.
You should be able to use that to get a sense of how far behind your
consumer is.

That may not solve the motivation for the KIP, though, which as far as I
understand is to have a way to know when a (and which) consumer has missed
messages because they have been deleted.
I also assume that this is happening because consumers are not long-running
but rather started every once in a while. Let me know if that is correct.
The consumer code does
```
    return topicPartitionState.highWatermark == null
        ? null
        : topicPartitionState.highWatermark - topicPartitionState.position.offset;
```
to calculate the partition lag. I believe
`topicPartitionState.position.offset` is the consumer's current position,
which on start-up is taken from the group's committed offset.
In the case of a start-up of a consumer group that has lagged behind and
lost messages due to retention, the very first reading of that metric may
be very high and perhaps misleading, since it would also count records that
have already been deleted, iff
`TopicPartitionState#logStartOffset > TopicPartitionState#position`.
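
To make that condition concrete, here is a minimal sketch in plain Java.
`OffsetGapSketch`, `partitionLag` and `missedRecords` are hypothetical names
used for illustration only; they are not part of the Kafka client. It also
assumes a non-compacted topic, so that offsets are contiguous and an offset
difference equals a record count:

```java
import java.util.OptionalLong;

// Hypothetical helper for illustration; not part of the Kafka client.
final class OffsetGapSketch {

    // Mirrors the lag expression quoted above: high watermark minus position,
    // or empty when the high watermark is not known yet.
    static OptionalLong partitionLag(Long highWatermark, long position) {
        return highWatermark == null
                ? OptionalLong.empty()
                : OptionalLong.of(highWatermark - position);
    }

    // Records in [position, logStartOffset) have been removed by retention.
    // Assumes contiguous offsets (no compaction).
    static long missedRecords(long logStartOffset, long position) {
        return Math.max(0L, logStartOffset - position);
    }

    public static void main(String[] args) {
        long position = 100L;        // where the consumer would resume
        long logStartOffset = 250L;  // oldest offset still on the broker
        Long highWatermark = 1_000L;

        System.out.println("lag=" + partitionLag(highWatermark, position).getAsLong());
        System.out.println("missed=" + missedRecords(logStartOffset, position));
    }
}
```

With a position of 100, a log start offset of 250 and a high watermark of
1000, the lag reads 900 even though only 750 records can still be fetched;
the other 150 are gone.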

It would be good to expand the KIP's motivation to explain when and why a
consumer would miss messages due to retention taking effect.

In any case, I think this might be better suited as a client metric. The
consumer should have all the necessary information, as it has the start
offset of the log and the group's latest offset.

Thanks,
Stanislav

On Wed, Jul 24, 2019 at 6:00 PM Jose M <yoz...@gmail.com> wrote:

> Hello Kamal,
>
> Compacted topics are excluded from the KIP, because users of compacted
> topics are mainly interested in the latest state for a given key, and can
> afford to miss intermediate states. Technically, it is possible to know
> whether a topic is compacted through the "log.config.compact" attribute.
> Thanks a lot for your feedback!
>
> I've updated the KIP to specify that:
>
>    - compacted topics are excluded from the KIP.
>    - instead of logging on the broker, I propose to create a new metric,
>    following Colin's comment (thanks a lot!)
>
> Thanks,
>
> Jose
>
> On Tue, Jul 23, 2019 at 11:45 AM Kamal Chandraprakash <
> kamal.chandraprak...@gmail.com> wrote:
>
> > Jose,
> >
> >     How do you differentiate the compaction topics from the time
> > retention topics? Deleting a message due to compaction policy is a
> > valid case and users won't be interested in monitoring/reading those
> > deleted messages.
> >
> > Thanks,
> > Kamal
> >
> > On Tue, Jul 23, 2019 at 4:00 AM Jose M <yoz...@gmail.com> wrote:
> >
> > > Hi Colin,
> > >
> > > > Thanks a lot for your feedback. Please note that I only propose to log
> > > > when a message is lost for a given set of consumer groups, not as the
> > > > default behaviour for all consumer groups.
> > > > But in fact, I agree with you that logging a line per expired message
> > > > can be quite a lot, and that it is not the best way to do it. I can
> > > > propose to add a dedicated JMX metric of type counter, "expired
> > > > messages", per consumer group.
> > > > What do you think?
> > > >
> > > > About monitoring the lag to ensure that messages are not lost, I know
> > > > that is what clients do: they set up alerting when the lag is above a
> > > > threshold. But even if the alert is triggered, we don't know whether
> > > > messages have been lost or not. With this KIP implemented, clients
> > > > would know whether something has been missed.
> > >
> > >
> > > Thanks,
> > >
> > >
> > > Jose
> > >
> > > > On Mon, Jul 22, 2019 at 5:51 PM Colin McCabe <cmcc...@apache.org> wrote:
> > >
> > > > Hi Jose,
> > > >
> > > > One issue that I see here is that the number of log messages could be
> > > > huge.  I've seen people create tens of thousands of consumer groups.
> > > > People can also have settings that create pretty small log files.  A
> > > > message per log file per group could be quite a lot of messages.
> > > >
> > > > > A log message on the broker is also not that useful for detecting bad
> > > > > client behavior.  People generally only look at the server logs after
> > > > > they become aware that something is wrong through some other means.
> > > > >
> > > > > Perhaps the clients should just monitor their lag?  There is a JMX
> > > > > metric for this, which means it can be hooked into traditional
> > > > > metrics / reporting systems.
> > > >
> > > > best,
> > > > Colin
> > > >
> > > >
> > > > On Mon, Jul 22, 2019, at 03:12, Jose M wrote:
> > > > > Hello,
> > > > >
> > > > > > I didn't get any feedback on this small KIP-490
> > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-490%3A+log+when+consumer+groups+lose+a+message+because+offset+has+been+deleted>.
> > > > > > In summary, I propose a way to be notified when messages are being
> > > > > > removed due to the retention policy without having been consumed by a
> > > > > > given consumer group.
> > > > > > It will be useful to realize that some important messages have been
> > > > > > lost.
> > > > > >
> > > > > > As I'm new to the codebase, I have technical questions about how to
> > > > > > achieve this, but before going deeper, I would like your feedback on
> > > > > > the feature.
> > > > >
> > > > > Thanks a lot,
> > > > >
> > > > >
> > > > > Jose Morales
> > > > >
> > > > > On Sun, Jul 14, 2019 at 12:51 AM Jose M <yoz...@gmail.com> wrote:
> > > > >
> > > > > > Hello,
> > > > > >
> > > > > > > I would like to know what you think of KIP-490:
> > > > > > >
> > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-490%3A+log+when+consumer+groups+lose+a+message+because+offset+has+been+deleted
> > > > > > > <https://cwiki.apache.org/confluence/display/KAFKA/KIP-490%3A+log+when+consumer+groups+lose+a+message+because+offset+has+expired>
> > > > > >
> > > > > >
> > > > > > Thanks a lot !
> > > > > > --
> > > > > > Jose M
> > > > > >
> > > > >
> > > > >
> > > > > --
> > > > > J
> > > > >
> > > >
> > >
> > >
> > > --
> > > J
> > >
> >
>
>
> --
> J
>
