Re: [DISCUSS] KIP-320: Allow fetchers to detect and handle log truncation

Guozhang Wang Wed, 27 Jun 2018 16:08:51 -0700

Sounds good to me.

On Wed, Jun 27, 2018 at 3:57 PM, Jason Gustafson <ja...@confluent.io> wrote:


> Hey Guozhang,
>
> That's fair. In fact, perhaps we do not need this API at all. We already
> have the new seek() in this KIP which can do the lookup based on epoch for
> this use case. I guess we should probably call it seekToNearest() though to
> make it clear that the final position may be different from what was
> requested.
>
> Thanks,
> Jason
>
> On Wed, Jun 27, 2018 at 3:20 PM, Guozhang Wang <wangg...@gmail.com> wrote:
>
> > Hi Jason,
> >
> > I think it is less worthwhile to add KafkaConsumer#
> offsetsForLeaderEpochs,
> > since probably only very advanced users are aware of the leaderEpoch, and
> > hence ever care to use it anyways. It is more like an admin client
> > operation than a consumer client operation: if the motivation is to
> > facility customized reset policy, maybe adding it as
> > AdminClient#offsetsForLeaderEpochs
> > is better as it is not an aggressive assumption that for such advanced
> > users they are willing to use some admin client to get further
> information?
> >
> >
> > Guozhang
> >
> >
> > On Wed, Jun 27, 2018 at 2:07 PM, Jason Gustafson <ja...@confluent.io>
> > wrote:
> >
> > > Thanks for the feedback. I've updated the KIP. Specifically I removed
> the
> > > "closest" reset option and the proposal to reset by timestamp when the
> > > precise truncation point cannot be determined. Instead, I proposed that
> > we
> > > always reset using the nearest epoch when a reset policy is defined
> > (either
> > > "earliest" or "latest"). Does that sound reasonable?
> > >
> > > One thing I am still debating is whether it would be better to have a
> > > separate API to find the closest offset using the leader epoch. In the
> > > current KIP, I suggested to piggyback this information on an exception,
> > but
> > > I'm beginning to think it would be better not to hide the lookup. It is
> > > awkward to implement since it means delaying the exception and the API
> > may
> > > actually be useful when customizing reset logic if no auto reset policy
> > is
> > > defined. I was thinking we can add an API like the following:
> > >
> > > Map<TopicPartition, OffsetAndEpoch>
> > > offsetsForLeaderEpochs(Map<TopicPartition, Integer> epochsToSearch)
> > >
> > > Thoughts?
> > >
> > > -Jason
> > >
> > >
> > >
> > >
> > > On Wed, Jun 27, 2018 at 11:12 AM, Jason Gustafson <ja...@confluent.io>
> > > wrote:
> > >
> > > > @Dong
> > > >
> > > > Those are fair points. Both approaches require some fuzziness to
> reset
> > > the
> > > > offset in these pathological scenarios and we cannot guarantee
> > > > at-least-once delivery either way unless we have the full history of
> > > leader
> > > > epochs that were consumed. The KIP-101 logic may actually be more
> > > accurate
> > > > than using timestamps because it does not depend on the messages
> which
> > > are
> > > > written after the unclean leader election. The case we're talking
> about
> > > > should be extremely rare in practice anyway. I also agree that we may
> > not
> > > > want to add new machinery if it only helps the old message format.
> Ok,
> > > > let's go ahead and drop the timestamp.
> > > >
> > > > @Guozhang
> > > >
> > > > * My current understanding is that, with unclean leader election
> turned
> > > on,
> > > >> exactly-once is out of the window since we cannot guarantee that all
> > > >> committed message markers will not be lost. And hence there is no
> need
> > > to
> > > >> have special handling logic for LOG_TRUNCATED or OOR error codes
> with
> > > >> read.committed turned on. Is that right?
> > > >
> > > >
> > > > Yes, that's right. EoS and unclean leader election don't mix well. It
> > may
> > > > be worth considering separately whether we should try to reconcile
> the
> > > > transaction log following an unclean leader election. At least we may
> > be
> > > > able to prevent dangling transactions from blocking consumers. This
> KIP
> > > > does not address this problem.
> > > >
> > > > * MINOR: "if the epoch is greater than the minimum expected epoch,
> that
> > > the
> > > >> new epoch does not begin at an earlier offset than the fetch offset.
> > In
> > > >> the latter case, the leader can respond with a new LOG_TRUNCATION
> > error
> > > >> code" should it be "does not begin at a later offset than the fetch
> > > >> offset"?
> > > >
> > > >
> > > > I think the comment is correct, though the phrasing may be confusing.
> > We
> > > > know truncation has occurred if there exists a larger epoch with a
> > > starting
> > > > offset that is lower than the fetch offset. Let me try to rephrase
> > this.
> > > >
> > > > Thanks,
> > > > Jason
> > > >
> > > > On Wed, Jun 27, 2018 at 9:17 AM, Guozhang Wang <wangg...@gmail.com>
> > > wrote:
> > > >
> > > >> Jason, thanks for the KIP. A few comments:
> > > >>
> > > >> * I think Dong's question about whether to use timestamp-based
> > approach
> > > >> v.s. start-offset-of-first-larger-epoch is valid; more
> specifically,
> > > with
> > > >> timestamp-based approach we may still be reseting to an offset
> falling
> > > >> into
> > > >> the truncated interval, and hence we may still miss some data, i.e.
> > not
> > > >> guaranteeing at-least-once still. With the
> > > >> start-offset-of-first-larger-epoch, I'm not sure if it will
> guarantee
> > > no
> > > >> valid data is missed when we have consecutive log truncations (maybe
> > we
> > > >> need to look back into details of KIP-101 to figure it out). If the
> > > latter
> > > >> can indeed guarantee at least once, we could consider using that
> > > approach.
> > > >>
> > > >> * My current understanding is that, with unclean leader election
> > turned
> > > >> on,
> > > >> exactly-once is out of the window since we cannot guarantee that all
> > > >> committed message markers will not be lost. And hence there is no
> need
> > > to
> > > >> have special handling logic for LOG_TRUNCATED or OOR error codes
> with
> > > >> read.committed turned on. Is that right?
> > > >>
> > > >> * MINOR: "if the epoch is greater than the minimum expected epoch,
> > that
> > > >> the
> > > >> new epoch does not begin at an earlier offset than the fetch offset.
> > In
> > > >> the latter case, the leader can respond with a new LOG_TRUNCATION
> > error
> > > >> code" should it be "does not begin at a later offset than the fetch
> > > >> offset"?
> > > >>
> > > >>
> > > >>
> > > >> Guozhang
> > > >>
> > > >>
> > > >> On Tue, Jun 26, 2018 at 6:51 PM, Dong Lin <lindon...@gmail.com>
> > wrote:
> > > >>
> > > >> > Hey Jason,
> > > >> >
> > > >> > Thanks for the explanation.
> > > >> >
> > > >> > Please correct me if this is wrong. The "unknown truncation
> offset"
> > > >> > scenario happens when consumer does not have the full leaderEpoch
> ->
> > > >> offset
> > > >> > mapping. In this case we can still use the KIP-101-based approach
> to
> > > >> > truncate offset to "start offset of the first Leader Epoch larger
> > than
> > > >> last
> > > >> > epoch of the consumer" but it may be inaccurate. So the KIP
> chooses
> > to
> > > >> use
> > > >> > the timestamp-based approach which is also best-effort.
> > > >> >
> > > >> > If this understanding is correct, for "closest" offset reset
> policy
> > > and
> > > >> > "unknown truncation offset" scenario, I am wondering whether it
> > maybe
> > > >> > better to replace timestamp-based approach with KIP-101 based
> > > approach.
> > > >> In
> > > >> > comparison to timestamp-based approach, the KIP-101-based approach
> > > >> seems to
> > > >> > simplify the API a bit since user does not need to understand
> > > timestamp.
> > > >> > Similar to the timestamp-based approach, both approaches are
> > > best-effort
> > > >> > and do not guarantee that consumer can consume all messages. It is
> > not
> > > >> like
> > > >> > KIP-279 which guarantees that follower broker can consume all
> > messages
> > > >> from
> > > >> > the leader.
> > > >> >
> > > >> > Then it seems that the remaining difference is mostly about
> > accuracy,
> > > >> i.e.
> > > >> > how much message will be duplicated or missed in the "unknown
> > > truncation
> > > >> > offset" scenario. Not sure either one is clearly better than the
> > > other.
> > > >> > Note that there are two scenarios mentioned in KIP-279 which are
> not
> > > >> > addressed by KIP-101. Both scenarios require quick leadership
> change
> > > >> > between brokers, which seems to suggest that the offset based
> > obtained
> > > >> > by "start
> > > >> > offset of the first Leader Epoch larger than last epoch of the
> > > consumer"
> > > >> > under these two scenarios may be very close to the offset obtained
> > by
> > > >> the
> > > >> > message timestamp. Does this sound reasonable?
> > > >> >
> > > >> > Good point that users on v1 format can get benefit with timestamp
> > > based
> > > >> > approach. On the other hand it seems like a short term benefit for
> > > users
> > > >> > who have not migrated. I am just not sure whether it is more
> > important
> > > >> than
> > > >> > designing a better API.
> > > >> >
> > > >> > Also, for both "latest" and "earliest" reset policy, do you think
> it
> > > >> would
> > > >> > make sense to also use the KIP-101 based approach to truncate
> offset
> > > for
> > > >> > the "unknown truncation offset" scenario?
> > > >> >
> > > >> >
> > > >> > Thanks,
> > > >> > Dong
> > > >> >
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> -- Guozhang
> > > >>
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>



-- 
-- Guozhang

Re: [DISCUSS] KIP-320: Allow fetchers to detect and handle log truncation

Reply via email to