Re: [DISCUSS] KIP-1282: Prevent data loss during partition expansion for dynamically added partitions

Jun Rao via dev Mon, 20 Apr 2026 13:20:14 -0700

Hi, Jiunn-Yang and Chia-Ping,

Thanks for the reply.


The main concern I see with to_start_time is that its behavior on how much
data to consume when the offset is out of range is inconsistent and hard
to explain. If the group still exists, it will read from the earliest
offset. Otherwise, it will read from the latest.
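
A toy model of the two outcomes described above. It assumes (this is not
taken from the KIP text) that to_start_time resolves to the first record
whose timestamp is at or after the group creation time, and falls back to
the log end offset when no record qualifies. The function name and the
sample timestamps are hypothetical.

```python
from bisect import bisect_left


def resolve_reset_offset(record_timestamps, log_start_offset, group_creation_ts):
    """Return the first offset whose timestamp is >= group_creation_ts.

    record_timestamps[i] is the timestamp of the record stored at offset
    log_start_offset + i. Timestamps are assumed to be non-decreasing.
    If no record qualifies, the result is the log end offset.
    """
    idx = bisect_left(record_timestamps, group_creation_ts)
    return log_start_offset + idx


# Records that survive after retention removed offsets 0..99.
timestamps = [1_000, 1_010, 1_020]
log_start = 100

# Group still exists: its creation time (500) predates every surviving record,
# so the lookup lands on the log start offset (same as "earliest").
existing_group = resolve_reset_offset(timestamps, log_start, 500)

# Group expired and was recreated: the new creation time (2_000) is newer than
# every record, so the lookup lands on the log end offset (same as "latest").
recreated_group = resolve_reset_offset(timestamps, log_start, 2_000)

print(existing_group)   # 100
print(recreated_group)  # 103
```

The same rule is applied in both calls; only the creation timestamp
differs, which is what makes the outcome hard to predict from the
consumer's side.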

Jun

On Mon, Apr 20, 2026 at 10:13 AM Chia-Ping Tsai <[email protected]> wrote:

> hi all,
>
> Just a note for a potential latest_v2:
>
> Since the purpose is to read all records from extended partitions, we
> could leverage the group creation time to compare against the earliest
> record of a partition when there is no committed offset. If the group
> creation time is larger than the earliest record's timestamp, we assume it
> is not an extended partition. Otherwise, we treat it as an extended
> partition.
>
> This approach allows us to catch all "possible" extended partitions, which
> includes both "true" extended partitions and old but truncated partitions.
> While there is a rare edge case where the cost is reprocessing some records
> we don't necessarily want, it is very easy to implement and guarantees we
> will never miss the actual extended partitions.
>
> Best,
> Chia-Ping
>
> On 2026/04/20 13:33:31 黃竣陽 wrote:
> > Hello all,
> >
> > I have added a new "Future Work: latest_strict Policy" section to the
> > KIP. The idea is a future policy that uses latest semantics by default
> > but falls back to the group creation timestamp specifically for newly
> > added partitions during partition expansion. This would reuse the group
> > creation time anchor introduced by this KIP, making it a natural
> > extension with minimal additional protocol changes.
> >
> > Best Regards,
> > Jiunn-Yang
> >
> > > On Apr 18, 2026, at 4:09 PM, Chia-Ping Tsai <[email protected]> wrote:
> > >
> > > Hi all,
> > >
> > > It is practically NP-hard to guess everyone's ideal use case right now.
> > > Also, I believe we all want to avoid falling back to the intricate
> > > multi-policy approach proposed in KIP-842.
> > >
> > > I prefer to keep this KIP focused and discuss a "v2 latest" policy in a
> > > separate KIP. That future policy could build upon the to_start_time
> > > anchor to fix data loss specifically for extended partitions. We could
> > > call it something like latest_strict.
> > >
> > > Thoughts?
> > >
> > >
> > > On Sat, Apr 18, 2026 at 3:24 PM 黃竣陽 <[email protected]> wrote:
> > >
> > >> Hello Jun,
> > >>
> > >> Thanks for the reply.
> > >>
> > >> When the offset goes out of range, the user faces two options:
> > >>
> > >> 1. Skip to the end (latest behavior): risk losing data that was
> > >>    produced during the group's lifetime but not yet consumed.
> > >> 2. Seek back to the group creation time (to_start_time behavior):
> > >>    potentially reprocess some data, but guarantee that no data from
> > >>    the group's lifetime is silently lost.
> > >>
> > >> to_start_time chooses option 2 because its core promise is "never
> > >> silently lose data produced after the group started." If we fell back
> > >> to latest on out-of-range, we would break this guarantee.
> > >>
> > >> Users who prefer option 1 can simply use auto.offset.reset=latest.
> > >>
> > >> Best Regards,
> > >> Jiunn-Yang
> > >>
> > >>> On Apr 18, 2026, at 1:57 AM, Jun Rao via dev <[email protected]> wrote:
> > >>>
> > >>> Hi, Jiunn-Yang and Chia-Ping,
> > >>>
> > >>> Thanks for the reply.
> > >>>
> > >>> "The core semantic of to_start_time is to read all records since the
> > >>> creation of the group."
> > >>>
> > >>> I am just questioning whether this actually covers a common use case.
> > >>> If the offset doesn't go out of range, the logic makes sense to me.
> > >>> I'm not sure about the logic if the offset is out of range. If a user
> > >>> chooses to skip the historical data when starting the group, it seems
> > >>> the user likely wants to do the same if the offset is out of range.
> > >>>
> > >>> Jun
> > >>>
> > >>> On Fri, Apr 17, 2026 at 5:23 AM 黃竣陽 <[email protected]> wrote:
> > >>>
> > >>>> Hello Jun,
> > >>>>
> > >>>> Thanks for the feedback.
> > >>>>
> > >>>> Adding to the points above:
> > >>>>
> > >>>> Regarding by_duration as an alternative to Scenario 1: beyond clock
> > >>>> skew and retry issues, there is also a usability concern.
> > >>>> by_duration requires users to reason about operational timing ("how
> > >>>> long does partition discovery take in my environment?") and then
> > >>>> translate that into a configuration value. to_start_time requires no
> > >>>> such reasoning. It simply anchors to the group creation time
> > >>>> recorded by the broker.
> > >>>>
> > >>>> Regarding Scenario 2: I'd also like to clarify that to_start_time
> > >>>> does not branch between "use latest" and "use earliest." It applies
> > >>>> the same ListOffsetsRequest with the group creation timestamp in all
> > >>>> cases. The difference in outcome:
> > >>>> - skipping old data on first start
> > >>>> - consuming surviving data after truncation
> > >>>> is a natural consequence of what data exists in the partition at
> > >>>> that point, not of a different policy being applied. The rule is
> > >>>> always the same.
> > >>>>
> > >>>> Best Regards,
> > >>>> Jiunn-Yang
> > >>>>
> > >>>>> On Apr 17, 2026, at 9:48 AM, Chia-Ping Tsai <[email protected]> wrote:
> > >>>>>
> > >>>>>
> > >>>>>> On Apr 17, 2026, at 4:57 AM, Jun Rao via dev <[email protected]> wrote:
> > >>>>>>
> > >>>>>> Also, a group is deleted after the consumer has been idle longer
> > >>>>>> than offsets.retention.minutes. What are the semantics of
> > >>>>>> to_start_time if the group creation time is unavailable?
> > >>>>>
> > >>>>> If the group is recreated, a new creation time will be recorded.
> > >>>>> Hence, it acts like a new group. In addition, it throws an exception
> > >>>>> directly if the group truly has no creation time.
> > >>>>
> > >>>>
> > >>
> > >>
> >
> >
>
