Re: [DISCUSS] KIP-320: Allow fetchers to detect and handle log truncation

Dong Lin Fri, 29 Jun 2018 10:30:53 -0700

Hey Jason,

Thanks for the update. It looks pretty good. Just some minor comments below:


1) The KIP adds new error code "LOG_TRUNCATION" and new
exception TruncatedPartitionException. Can we make the name more
consistent, e.g. LogTruncationException?

2) Do we need to add UnknownLeaderEpochException as part of API change?

3) Not sure if the offset topic schema is also public API. If so, maybe we
should also include the schema change in the API?

4) For users who store offset externally, currently they get offset using
position(..), store the offset externally, and use seek(..) to initialize
the consumer next time. After this KIP they will need to store and use the
leaderEpoch together with the offset. Should we also update the API so that
user can also get leaderEpoch from position(...)? Not sure if it is OK to
ask user to track the latest leaderEpoch of ConsumerRecord by themselves.

5) Also for users who store offset externally, they need to call seek(..)
with leaderEpoch to initialize consumer. With current KIP users need to
call seekToNearest(), whose name suggests that the final position may be
different from what was requested. However, if users may want to avoid auto
offset reset and be notified explicitly when there is log truncation,
then seekToNearest()
probably does not help here. Would it make sense to replace seekToNearest()
with seek(offset, leaderEpoch) + AminClient.offsetsForLeaderEpochs(...)?


Thanks,
Dong


On Wed, Jun 27, 2018 at 3:57 PM, Jason Gustafson <[email protected]> wrote:

> Hey Guozhang,
>
> That's fair. In fact, perhaps we do not need this API at all. We already
> have the new seek() in this KIP which can do the lookup based on epoch for
> this use case. I guess we should probably call it seekToNearest() though to
> make it clear that the final position may be different from what was
> requested.
>
> Thanks,
> Jason
>
> On Wed, Jun 27, 2018 at 3:20 PM, Guozhang Wang <[email protected]> wrote:
>
> > Hi Jason,
> >
> > I think it is less worthwhile to add KafkaConsumer#
> offsetsForLeaderEpochs,
> > since probably only very advanced users are aware of the leaderEpoch, and
> > hence ever care to use it anyways. It is more like an admin client
> > operation than a consumer client operation: if the motivation is to
> > facility customized reset policy, maybe adding it as
> > AdminClient#offsetsForLeaderEpochs
> > is better as it is not an aggressive assumption that for such advanced
> > users they are willing to use some admin client to get further
> information?
> >
> >
> > Guozhang
> >
> >
> > On Wed, Jun 27, 2018 at 2:07 PM, Jason Gustafson <[email protected]>
> > wrote:
> >
> > > Thanks for the feedback. I've updated the KIP. Specifically I removed
> the
> > > "closest" reset option and the proposal to reset by timestamp when the
> > > precise truncation point cannot be determined. Instead, I proposed that
> > we
> > > always reset using the nearest epoch when a reset policy is defined
> > (either
> > > "earliest" or "latest"). Does that sound reasonable?
> > >
> > > One thing I am still debating is whether it would be better to have a
> > > separate API to find the closest offset using the leader epoch. In the
> > > current KIP, I suggested to piggyback this information on an exception,
> > but
> > > I'm beginning to think it would be better not to hide the lookup. It is
> > > awkward to implement since it means delaying the exception and the API
> > may
> > > actually be useful when customizing reset logic if no auto reset policy
> > is
> > > defined. I was thinking we can add an API like the following:
> > >
> > > Map<TopicPartition, OffsetAndEpoch>
> > > offsetsForLeaderEpochs(Map<TopicPartition, Integer> epochsToSearch)
> > >
> > > Thoughts?
> > >
> > > -Jason
> > >
> > >
> > >
> > >
> > > On Wed, Jun 27, 2018 at 11:12 AM, Jason Gustafson <[email protected]>
> > > wrote:
> > >
> > > > @Dong
> > > >
> > > > Those are fair points. Both approaches require some fuzziness to
> reset
> > > the
> > > > offset in these pathological scenarios and we cannot guarantee
> > > > at-least-once delivery either way unless we have the full history of
> > > leader
> > > > epochs that were consumed. The KIP-101 logic may actually be more
> > > accurate
> > > > than using timestamps because it does not depend on the messages
> which
> > > are
> > > > written after the unclean leader election. The case we're talking
> about
> > > > should be extremely rare in practice anyway. I also agree that we may
> > not
> > > > want to add new machinery if it only helps the old message format.
> Ok,
> > > > let's go ahead and drop the timestamp.
> > > >
> > > > @Guozhang
> > > >
> > > > * My current understanding is that, with unclean leader election
> turned
> > > on,
> > > >> exactly-once is out of the window since we cannot guarantee that all
> > > >> committed message markers will not be lost. And hence there is no
> need
> > > to
> > > >> have special handling logic for LOG_TRUNCATED or OOR error codes
> with
> > > >> read.committed turned on. Is that right?
> > > >
> > > >
> > > > Yes, that's right. EoS and unclean leader election don't mix well. It
> > may
> > > > be worth considering separately whether we should try to reconcile
> the
> > > > transaction log following an unclean leader election. At least we may
> > be
> > > > able to prevent dangling transactions from blocking consumers. This
> KIP
> > > > does not address this problem.
> > > >
> > > > * MINOR: "if the epoch is greater than the minimum expected epoch,
> that
> > > the
> > > >> new epoch does not begin at an earlier offset than the fetch offset.
> > In
> > > >> the latter case, the leader can respond with a new LOG_TRUNCATION
> > error
> > > >> code" should it be "does not begin at a later offset than the fetch
> > > >> offset"?
> > > >
> > > >
> > > > I think the comment is correct, though the phrasing may be confusing.
> > We
> > > > know truncation has occurred if there exists a larger epoch with a
> > > starting
> > > > offset that is lower than the fetch offset. Let me try to rephrase
> > this.
> > > >
> > > > Thanks,
> > > > Jason
> > > >
> > > > On Wed, Jun 27, 2018 at 9:17 AM, Guozhang Wang <[email protected]>
> > > wrote:
> > > >
> > > >> Jason, thanks for the KIP. A few comments:
> > > >>
> > > >> * I think Dong's question about whether to use timestamp-based
> > approach
> > > >> v.s. start-offset-of-first-larger-epoch is valid; more
> specifically,
> > > with
> > > >> timestamp-based approach we may still be reseting to an offset
> falling
> > > >> into
> > > >> the truncated interval, and hence we may still miss some data, i.e.
> > not
> > > >> guaranteeing at-least-once still. With the
> > > >> start-offset-of-first-larger-epoch, I'm not sure if it will
> guarantee
> > > no
> > > >> valid data is missed when we have consecutive log truncations (maybe
> > we
> > > >> need to look back into details of KIP-101 to figure it out). If the
> > > latter
> > > >> can indeed guarantee at least once, we could consider using that
> > > approach.
> > > >>
> > > >> * My current understanding is that, with unclean leader election
> > turned
> > > >> on,
> > > >> exactly-once is out of the window since we cannot guarantee that all
> > > >> committed message markers will not be lost. And hence there is no
> need
> > > to
> > > >> have special handling logic for LOG_TRUNCATED or OOR error codes
> with
> > > >> read.committed turned on. Is that right?
> > > >>
> > > >> * MINOR: "if the epoch is greater than the minimum expected epoch,
> > that
> > > >> the
> > > >> new epoch does not begin at an earlier offset than the fetch offset.
> > In
> > > >> the latter case, the leader can respond with a new LOG_TRUNCATION
> > error
> > > >> code" should it be "does not begin at a later offset than the fetch
> > > >> offset"?
> > > >>
> > > >>
> > > >>
> > > >> Guozhang
> > > >>
> > > >>
> > > >> On Tue, Jun 26, 2018 at 6:51 PM, Dong Lin <[email protected]>
> > wrote:
> > > >>
> > > >> > Hey Jason,
> > > >> >
> > > >> > Thanks for the explanation.
> > > >> >
> > > >> > Please correct me if this is wrong. The "unknown truncation
> offset"
> > > >> > scenario happens when consumer does not have the full leaderEpoch
> ->
> > > >> offset
> > > >> > mapping. In this case we can still use the KIP-101-based approach
> to
> > > >> > truncate offset to "start offset of the first Leader Epoch larger
> > than
> > > >> last
> > > >> > epoch of the consumer" but it may be inaccurate. So the KIP
> chooses
> > to
> > > >> use
> > > >> > the timestamp-based approach which is also best-effort.
> > > >> >
> > > >> > If this understanding is correct, for "closest" offset reset
> policy
> > > and
> > > >> > "unknown truncation offset" scenario, I am wondering whether it
> > maybe
> > > >> > better to replace timestamp-based approach with KIP-101 based
> > > approach.
> > > >> In
> > > >> > comparison to timestamp-based approach, the KIP-101-based approach
> > > >> seems to
> > > >> > simplify the API a bit since user does not need to understand
> > > timestamp.
> > > >> > Similar to the timestamp-based approach, both approaches are
> > > best-effort
> > > >> > and do not guarantee that consumer can consume all messages. It is
> > not
> > > >> like
> > > >> > KIP-279 which guarantees that follower broker can consume all
> > messages
> > > >> from
> > > >> > the leader.
> > > >> >
> > > >> > Then it seems that the remaining difference is mostly about
> > accuracy,
> > > >> i.e.
> > > >> > how much message will be duplicated or missed in the "unknown
> > > truncation
> > > >> > offset" scenario. Not sure either one is clearly better than the
> > > other.
> > > >> > Note that there are two scenarios mentioned in KIP-279 which are
> not
> > > >> > addressed by KIP-101. Both scenarios require quick leadership
> change
> > > >> > between brokers, which seems to suggest that the offset based
> > obtained
> > > >> > by "start
> > > >> > offset of the first Leader Epoch larger than last epoch of the
> > > consumer"
> > > >> > under these two scenarios may be very close to the offset obtained
> > by
> > > >> the
> > > >> > message timestamp. Does this sound reasonable?
> > > >> >
> > > >> > Good point that users on v1 format can get benefit with timestamp
> > > based
> > > >> > approach. On the other hand it seems like a short term benefit for
> > > users
> > > >> > who have not migrated. I am just not sure whether it is more
> > important
> > > >> than
> > > >> > designing a better API.
> > > >> >
> > > >> > Also, for both "latest" and "earliest" reset policy, do you think
> it
> > > >> would
> > > >> > make sense to also use the KIP-101 based approach to truncate
> offset
> > > for
> > > >> > the "unknown truncation offset" scenario?
> > > >> >
> > > >> >
> > > >> > Thanks,
> > > >> > Dong
> > > >> >
> > > >>
> > > >>
> > > >>
> > > >> --
> > > >> -- Guozhang
> > > >>
> > > >
> > > >
> > >
> >
> >
> >
> > --
> > -- Guozhang
> >
>

Re: [DISCUSS] KIP-320: Allow fetchers to detect and handle log truncation

Reply via email to