Re: [DISCUSS] KIP-1282: Prevent data loss during partition expansion for dynamically added partitions

Jun Rao via dev Mon, 27 Apr 2026 09:49:34 -0700

Hi, Chia-Ping,

Thanks for the reply.


I agree that we should only add new options that cover a common use case.
For auto.offset.reset.latest.max.age, it would be useful to compare it with
by_duration.
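
To make that comparison concrete, here is a rough sketch of the two rules as
I read them from this thread. The function names are made up for illustration
and are not Kafka APIs, and treating "within the threshold" as <= is an
assumption, since the proposal does not pin that down.

```python
from datetime import timedelta
from typing import Optional


def reset_with_max_age(partition_age: Optional[timedelta], max_age: timedelta) -> str:
    """Proposed rule for a partition with no committed offset under `latest`."""
    if partition_age is None:
        # Age unavailable (e.g., an older broker): strictly adhere to latest.
        return "latest"
    # Assumption: "within the threshold" means age <= max_age.
    return "earliest" if partition_age <= max_age else "latest"


def by_duration_target_ms(now_ms: int, duration: timedelta) -> int:
    """by_duration: look up the offset at (now - duration), whatever the partition's age."""
    return now_ms - int(duration.total_seconds() * 1000)
```

With a 5-minute setting, the first rule reads a 2-minute-old partition from
the beginning and a 3-hour-old one from the end, while the second always
rewinds by the same 5 minutes on every reset, including first start and
out-of-range.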

Jun

On Sat, Apr 25, 2026 at 4:30 AM Chia-Ping Tsai <[email protected]> wrote:

> hi Jun
>
> Honestly, we've seen a similar "case storm" in our local community
> discussions. Some feel a new policy could revolutionize existing pipelines,
> while others find it overly complicated to mentally juggle all these offset
> edge cases.
>
> I also realize that introducing a completely new policy just to overcome
> the "data loss on partition expansion" issue might be a bit overkill for
> now. We can always revisit a brand-new policy later.
>
> For now, I'd like to pivot back to the original pain point: how to avoid
> losing "hot" records from newly expanded partitions when using the latest
> policy. The tricky part is that expanded partitions aren't always "hot" to
> consumers. For instance, if a partition is expanded while the consumer is
> offline for a long period, the user would likely prefer to skip to the end
> upon resuming, as those records are no longer fresh.
>
> Therefore, I'd like to propose a new consumer config:
> auto.offset.reset.latest.max.age (inspired by Ryan's discussion). When a
> consumer is using the latest policy, it can rely on this threshold to
> determine its behavior on partitions without a committed offset. If the
> partition's "age" is within this threshold (i.e., it's a recently expanded
> partition), we fall back to earliest to catch the hot data. If it exceeds
> the age, or if the age is unavailable (e.g., older broker versions), it
> strictly adheres to latest.
>
> This partition "age" could be returned via the consumer heartbeat. The age
> would be calculated server-side by the coordinator: coordinator's current
> time - partition creation time. This inherently means we would need to
> modify the partition records to store the creation time, as well as update
> the heartbeat RPC to pass this relative age.
>
> We plan to draft a separate KIP for auto.offset.reset.latest.max.age and
> start a new thread for it to keep things focused. We can leave this current
> thread open for any broader discussions on completely new policies.
>
> Any feedback on this new direction is highly welcome. Thanks everyone for
> the incredible brainstorming session!
>
> Best,
> Chia-Ping
>
> On 2026/04/23 20:44:41 Jun Rao via dev wrote:
> > Hi, Chia-Ping,
> >
> > Thanks for the reply.
> >
> > "read all records produced since the group's birth."
> > Let's consider this requirement a bit more. For CDC use cases, users don't
> > want to lose any data. The easiest option is to consume data with the
> > earliest offset. Sometimes, there are good reasons to skip the backlog. For
> > example, the downstream system already obtains a database snapshot through
> > another channel. However, in this case, the user usually needs to set the
> > initial offsets carefully to match the snapshot's timestamp and avoid data
> > loss. Starting from the group creation time doesn't seem to meet the
> > business need in this case.
> >
> > Jun
> >
> >
> > On Thu, Apr 23, 2026 at 11:49 AM Chia-Ping Tsai <[email protected]> wrote:
> >
> > > hi Jun
> > >
> > > > This seems to fit the current auto.offset.reset framework more
> > > > naturally.
> > >
> > > Your point about the existing framework is well-taken, but it highlights a
> > > key distinction this KIP aims to address.
> > >
> > > If a user simply wants a "Smarter Latest" (one that avoids data loss from
> > > extended partitions), they could indeed use by_duration=5mins as a
> > > reasonable workaround.
> > >
> > > However, there is currently no workaround for a policy that guarantees
> > > "read all records produced since the group's birth." This is a critical
> > > requirement for data pipelines like OLTP (MySQL/Postgres) -> Kafka -> OLAP
> > > (ClickHouse/Snowflake). These users often use latest initially to avoid a
> > > massive historical backlog, but they have a "Zero Data Loss" requirement
> > > once the pipeline is active.
> > >
> > > When these users encounter an "out-of-range" error, they want to consume
> > > every surviving record in Kafka that belongs to their group's lifetime. If
> > > we force them to jump to the end, it means they have to manually re-load
> > > and backfill significantly more "lost records" from the source OLTP, which
> > > is a high-cost operational burden.
> > >
> > > In short, the policy offered by this KIP is not just another option; it
> > > provides a deterministic lifecycle anchor that cannot be emulated by the
> > > current policies.
> > >
> > > Jun Rao via dev <[email protected]> wrote on Fri, Apr 24, 2026 at 1:38 AM:
> > >
> > >> Hi, Chia-Ping, Jiunn-Yang, and Jian,
> > >>
> > >> Thanks for the reply. I appreciate your effort in trying to address a
> > >> common issue.
> > >>
> > >> To me, history and data are the same as the backlog. It's just that the
> > >> amount of backlog can vary. When the group is first created or when the
> > >> offset is out of range, the backlog is large. When a new partition is
> > >> created and discovered by the consumer, the backlog is small (5 seconds of
> > >> data for the new consumer, 5 minutes for the classic consumer). The
> > >> question is how much backlog a user can tolerate. The to_start_time option
> > >> implicitly assumes that a user can tolerate 0 backlog in one case but 5
> > >> seconds or 5 minutes in another. This may or may not be what a user wants,
> > >> but at least it seems inconsistent. An alternative is to document all
> > >> cases where a backlog can occur and let the user choose how much backlog
> > >> they can tolerate, configuring it with the existing by_duration option.
> > >> This seems to fit the current auto.offset.reset framework more naturally.
> > >>
> > >> Jun
> > >>
> > >>
> > >> On Thu, Apr 23, 2026 at 6:23 AM jian fu <[email protected]> wrote:
> > >>
> > >> > Hi All:
> > >> >
> > >> > Since Jun Yang referenced my earlier discussion, I'd also like to join in
> > >> > and share some of my thoughts.
> > >> >
> > >> > The key area of minor divergence is how to handle this case:
> > >> > "When the user starts the group for the first time, it faces a choice on
> > >> > whether to process the backlog or not. When the offset is out-of-range,
> > >> > the user faces the same choice regarding backlog processing."
> > >> >
> > >> > So I think we have four options for handling these two key choices:
> > >> > 1. latest: drop history + drop the data
> > >> > 2. earliest: keep history + keep the data
> > >> > 3. the mode proposed by the KIP: drop history + keep the data
> > >> > 4. unreasonable mode: keep history + drop the data
> > >> >
> > >> > I think option 3 is a reasonable mode for users (setting aside the naming
> > >> > and implementation). Imagine a real-life case: you may subscribe to a
> > >> > magazine without buying the back issues, but you should not lose any
> > >> > issues published after you subscribe just because you didn't buy the
> > >> > earlier ones.
> > >> >
> > >> > Regards
> > >> > Jian
> > >> >
> > >> >
> > >> > 黃竣陽 <[email protected]> wrote on Thu, Apr 23, 2026 at 19:17:
> > >> >
> > >> > > Hello all,
> > >> > >
> > >> > > Thanks for the feedback. I'd like to advocate for keeping the original
> > >> > > to_start_time semantics.
> > >> > >
> > >> > > Earlier in this thread, both Jian and Ryan highlighted that branched
> > >> > > logic is the main UX concern:
> > >> > >
> > >> > > Jian: "If we can define one basic rule… it would make it easier for
> > >> > >         everyone to stay on the same page."
> > >> > > Ryan: "The documentation might be difficult if it has to
> > >> > >         list and explain all the cases."
> > >> > > Chia-Ping: "Having an opinionated config with branched logic makes it
> > >> > >         hard to document and reason about."
> > >> > >
> > >> > > to_start_time already follows this principle: it consistently issues a
> > >> > > ListOffsets request anchored to the group creation timestamp. Differences
> > >> > > in outcome are simply due to what data the broker retains, not different
> > >> > > rules being applied. Changing out-of-range to latest would be the real
> > >> > > inconsistency, since the policy would then branch based on the reset
> > >> > > scenario.
> > >> > >
> > >> > > Additionally, out-of-range and no-offset (group GC'd) are fundamentally
> > >> > > different situations. When the group exists, the creation timestamp is
> > >> > > available and should be honored. When the group is GC'd, the metadata is
> > >> > > gone; this is an orthogonal problem that affects all reset policies
> > >> > > equally.
> > >> > >
> > >> > > The strength of to_start_time is precisely its single, clean rule:
> > >> > > "Always seek to the group’s creation time, and let ListOffsets resolve
> > >> > > the rest."
> > >> > >
> > >> > > Best Regards,
> > >> > > Jiunn-Yang
> > >> > >
> > >> > > > Chia-Ping Tsai <[email protected]> wrote on Apr 23, 2026 at 3:24 PM:
> > >> > > >
> > >> > > > Hi all,
> > >> > > >
> > >> > > > BTW, regardless of where we land on the "out-of-range" debate, the
> > >> > > > underlying infrastructure of persisting the "group creation time" is
> > >> > > > still highly valuable and worth merging.
> > >> > > >
> > >> > > > From my conversations with users, there are diverse needs: some love
> > >> > > > the "better earliest" idea to safely skip massive historical backlogs,
> > >> > > > while others only care about fixing the data loss in latest during
> > >> > > > partition expansion.
> > >> > > >
> > >> > > > Simply having the creation time persisted and exposed is already a
> > >> > > > massive step forward, as it gives users a reliable, objective anchor to
> > >> > > > manually fix the issue via a ConsumerRebalanceListener. However, much
> > >> > > > like the concept of a DLQ (Dead Letter Queue), while users could
> > >> > > > implement it manually, providing a built-in reset policy makes the
> > >> > > > developer experience significantly more convenient, robust, and
> > >> > > > out-of-the-box.
> > >> > > >
> > >> > > > I believe Ken might chime in later with a different perspective as
> > >> > > > well :)
> > >> > > >
> > >> > > > Best,
> > >> > > > Chia-Ping
> > >> > > >
> > >> > > >
> > >> > > >> Chia-Ping Tsai <[email protected]> wrote on Apr 23, 2026 at 3:59 AM:
> > >> > > >>
> > >> > > >> Hi Jun,
> > >> > > >>
> > >> > > >> Thanks for the feedback. I agree that shifting this policy toward a
> > >> > > >> "Smarter Latest" (rather than a better Earliest) is a more elegant
> > >> > > >> path.
> > >> > > >>
> > >> > > >> The refined behavior would be:
> > >> > > >>
> > >> > > >> Out-of-range: Strictly follow latest semantics. This ensures a
> > >> > > >> predictable "skip to end" behavior when users fall behind retention.
> > >> > > >>
> > >> > > >> No-offset (Initial Start & Expansion): Leverage Group Creation Time
> > >> > > >> for lookup.
> > >> > > >>
> > >> > > >> • For new groups, this naturally results in latest behavior since
> > >> > > >> creation time is "now".
> > >> > > >>
> > >> > > >> • For existing groups discovering new partitions, this results in
> > >> > > >> earliest behavior for those specific partitions.
> > >> > > >>
> > >> > > >> Group GC: If a group is purged, it is treated as a brand-new group
> > >> > > >> with a creation time of "now," consistently skipping to the end.
> > >> > > >>
> > >> > > >> WDYT?
> > >> > > >>
> > >> > > >>
> > >> > > >>> Jun Rao via dev <[email protected]> wrote on Apr 23, 2026 at 1:34 AM:
> > >> > > >>>
> > >> > > >>> Hi, Chia-Ping,
> > >> > > >>>
> > >> > > >>> Thanks for the reply.
> > >> > > >>>
> > >> > > >>> Let's try to understand from the user's perspective. When the user
> > >> > > >>> starts the group for the first time, it faces a choice on whether to
> > >> > > >>> process the backlog or not. When the offset is out-of-range, the user
> > >> > > >>> faces the same choice regarding backlog processing. It seems that most
> > >> > > >>> users want to make the same choice regarding backlog processing.
> > >> > > >>>
> > >> > > >>> "Users who explicitly choose the to_start_time policy do so precisely
> > >> > > >>> because they do not want to skip any records when encountering an
> > >> > > >>> out-of-range scenario."
> > >> > > >>> This argument is weak because that's how to_start_time is designed,
> > >> > > >>> but we need to justify why it is a good choice in the first place.
> > >> > > >>>
> > >> > > >>> Jun
> > >> > > >>>
> > >> > > >>>>> On Tue, Apr 21, 2026 at 12:35 PM Chia-Ping Tsai <[email protected]> wrote:
> > >> > > >>>>
> > >> > > >>>> Hi Jun,
> > >> > > >>>>
> > >> > > >>>> Thanks for the clarification. I think I misunderstood your previous
> > >> > > >>>> point. Let me summarize the scenarios to ensure we are fully aligned.
> > >> > > >>>>
> > >> > > >>>> There are essentially three scenarios when a consumer needs to reset
> > >> > > >>>> offsets:
> > >> > > >>>>
> > >> > > >>>> 1. Out-of-range (The group exists, but the offset has expired).
> > >> > > >>>> 2. Extended partition (The group exists, but encounters a newly added
> > >> > > >>>>    partition with no committed offset).
> > >> > > >>>> 3. No-offset (The group is completely new, or an existing group was
> > >> > > >>>>    deleted by the GC).
> > >> > > >>>>
> > >> > > >>>> We all agree that the primary goal of this KIP is to catch up on all
> > >> > > >>>> records for scenario 2. There are no objections here.
> > >> > > >>>>
> > >> > > >>>> Regarding the inconsistency you pointed out between 1) and 3) under
> > >> > > >>>> the current to_start_time design, I completely see your point. If
> > >> > > >>>> users are not fully aware that to_start_time is designed to read all
> > >> > > >>>> records since the creation of the group, they might get confused.
> > >> > > >>>>
> > >> > > >>>> However, to me, this "inconsistency" is actually a matter of
> > >> > > >>>> predictability. Users who explicitly choose the to_start_time policy
> > >> > > >>>> do so precisely because they do not want to skip any records when
> > >> > > >>>> encountering an out-of-range scenario.
> > >> > > >>>>
> > >> > > >>>> (I would prefer to set aside the topic of group GC for a moment. It is
> > >> > > >>>> much more important that we first focus our discussion on the
> > >> > > >>>> "out-of-range" scenario)
> > >> > > >>>>
> > >> > > >>>> Best,
> > >> > > >>>>
> > >> > > >>>> Chia-Ping
> > >> > > >>>>
> > >> > > >>>> Jun Rao via dev <[email protected]> wrote on Wed, Apr 22, 2026 at 1:13 AM:
> > >> > > >>>>
> > >> > > >>>>> Hi, Chia-Ping,
> > >> > > >>>>>
> > >> > > >>>>> Hmm, is that true? With the earliest policy, we treat an out-of-range
> > >> > > >>>>> offset the same as no offset (because the group is deleted) and always
> > >> > > >>>>> set it to the earliest offset, right? With to_start_time, an
> > >> > > >>>>> out-of-range offset is treated differently from no offset.
> > >> > > >>>>>
> > >> > > >>>>> Thanks,
> > >> > > >>>>>
> > >> > > >>>>> Jun
> > >> > > >>>>>
> > >> > > >>>>> On Tue, Apr 21, 2026 at 12:54 AM Chia-Ping Tsai <[email protected]> wrote:
> > >> > > >>>>>
> > >> > > >>>>>> hi Jun
> > >> > > >>>>>>
> > >> > > >>>>>> Nice point. Group GC is definitely an issue for to_start_time, but it
> > >> > > >>>>>> is actually an issue for other policies as well.
> > >> > > >>>>>>
> > >> > > >>>>>> For example, a consumer using the earliest policy will suddenly read
> > >> > > >>>>>> all historical records from scratch if it sleeps for a long while and
> > >> > > >>>>>> gets GC'd; otherwise, it just resumes from previous offsets if the
> > >> > > >>>>>> group still exists. It is equally hard to explain to users: "Oh, your
> > >> > > >>>>>> group was GC'd, so your offset behavior changed."
> > >> > > >>>>>>
> > >> > > >>>>>> Therefore, it seems to me the right approach to fix this
> > >> > > >>>>>> "inconsistency" is to offer a group-level GC timeout in a future KIP,
> > >> > > >>>>>> allowing users to explicitly protect critical groups from GC. This
> > >> > > >>>>>> saves not only to_start_time, but all other reset policies too.
> > >> > > >>>>>>
> > >> > > >>>>>> Best,
> > >> > > >>>>>> Chia-Ping
> > >> > > >>>>>>
> > >> > > >>>>>> On 2026/04/20 20:19:47 Jun Rao via dev wrote:
> > >> > > >>>>>>> Hi, Jiunn-Yang and Chia-Ping,
> > >> > > >>>>>>>
> > >> > > >>>>>>> Thanks for the reply.
> > >> > > >>>>>>>
> > >> > > >>>>>>> The main concern I see with to_start_time is that its behavior on how
> > >> > > >>>>>>> much data to consume when the offset is out of range is not consistent
> > >> > > >>>>>>> and is hard to explain. If the group still exists, it will read from
> > >> > > >>>>>>> the earliest offset. Otherwise, it will read from the latest.
> > >> > > >>>>>>>
> > >> > > >>>>>>> Jun
> > >> > > >>>>>>>
> > >> > > >>>>>>> On Mon, Apr 20, 2026 at 10:13 AM Chia-Ping Tsai <[email protected]> wrote:
> > >> > > >>>>>>>
> > >> > > >>>>>>>> hi all,
> > >> > > >>>>>>>>
> > >> > > >>>>>>>> Just a note for a potential latest_v2:
> > >> > > >>>>>>>>
> > >> > > >>>>>>>> Since the purpose is to read all records from extended partitions, we
> > >> > > >>>>>>>> could leverage the group creation time to compare against the
> > >> > > >>>>>>>> earliest record of a partition when there is no committed offset. If
> > >> > > >>>>>>>> the group creation time is larger than the earliest record's
> > >> > > >>>>>>>> timestamp, we assume it is not an extended partition. Otherwise, we
> > >> > > >>>>>>>> treat it as an extended partition.
> > >> > > >>>>>>>>
> > >> > > >>>>>>>> This approach allows us to catch all "possible" extended partitions,
> > >> > > >>>>>>>> which includes both "true" extended partitions and old but truncated
> > >> > > >>>>>>>> partitions. While there is a rare edge case where the cost is
> > >> > > >>>>>>>> reprocessing some records we don't necessarily want, it is very easy
> > >> > > >>>>>>>> to implement and guarantees we will never miss the actual extended
> > >> > > >>>>>>>> partitions.
> > >> > > >>>>>>>>
> > >> > > >>>>>>>> Best,
> > >> > > >>>>>>>> Chia-Ping
> > >> > > >>>>>>>>
> > >> > > >>>>>>>> On 2026/04/20 13:33:31 黃竣陽 wrote:
> > >> > > >>>>>>>>> Hello all,
> > >> > > >>>>>>>>>
> > >> > > >>>>>>>>> I have added a new "Future Work: latest_strict Policy" section to the
> > >> > > >>>>>>>>> KIP. The idea is a future policy that uses latest semantics by default
> > >> > > >>>>>>>>> but falls back to the group creation timestamp specifically for newly
> > >> > > >>>>>>>>> added partitions during partition expansion. This would reuse the
> > >> > > >>>>>>>>> group creation time anchor introduced by this KIP, making it a natural
> > >> > > >>>>>>>>> extension with minimal additional protocol changes.
> > >> > > >>>>>>>>>
> > >> > > >>>>>>>>> Best Regards,
> > >> > > >>>>>>>>> Jiunn-Yang
> > >> > > >>>>>>>>>
> > >> > > >>>>>>>>>> Chia-Ping Tsai <[email protected]> wrote on Apr 18, 2026 at 4:09 PM:
> > >> > > >>>>>>>>>>
> > >> > > >>>>>>>>>> Hi all,
> > >> > > >>>>>>>>>>
> > >> > > >>>>>>>>>> It is practically NP-hard to guess everyone's ideal use case right
> > >> > > >>>>>>>>>> now. Also, I believe we all want to avoid falling back to the
> > >> > > >>>>>>>>>> intricate multi-policy approach proposed in KIP-842.
> > >> > > >>>>>>>>>>
> > >> > > >>>>>>>>>> I prefer to keep this KIP focused and discuss a "v2 latest" policy in
> > >> > > >>>>>>>>>> a separate KIP. That future policy could build upon the to_start_time
> > >> > > >>>>>>>>>> anchor to fix data loss specifically for extended partitions. We
> > >> > > >>>>>>>>>> could call it something like latest_strict.
> > >> > > >>>>>>>>>>
> > >> > > >>>>>>>>>> Thoughts?
> > >> > > >>>>>>>>>>
> > >> > > >>>>>>>>>>
> > >> > > >>>>>>>>>> 黃竣陽 <[email protected]> wrote on Sat, Apr 18, 2026 at 3:24 PM:
> > >> > > >>>>>>>>>>
> > >> > > >>>>>>>>>>> Hello Jun,
> > >> > > >>>>>>>>>>>
> > >> > > >>>>>>>>>>> Thanks for the reply.
> > >> > > >>>>>>>>>>>
> > >> > > >>>>>>>>>>> When the offset goes out of range, the user faces two options:
> > >> > > >>>>>>>>>>>
> > >> > > >>>>>>>>>>> 1. Skip to the end (latest behavior): risk losing data that was
> > >> > > >>>>>>>>>>> produced during the group's lifetime but not yet consumed.
> > >> > > >>>>>>>>>>> 2. Seek back to the group creation time (to_start_time behavior):
> > >> > > >>>>>>>>>>> potentially reprocess some data, but guarantee no data from the
> > >> > > >>>>>>>>>>> group's lifetime is silently lost.
> > >> > > >>>>>>>>>>>
> > >> > > >>>>>>>>>>> to_start_time chooses option 2 because its core promise is "never
> > >> > > >>>>>>>>>>> silently lose data produced after the group started." If we fell back
> > >> > > >>>>>>>>>>> to latest on out-of-range, we would break this guarantee.
> > >> > > >>>>>>>>>>>
> > >> > > >>>>>>>>>>> I think users who prefer option 1 can simply use
> > >> > > >>>>>>>>>>> auto.offset.reset=latest.
> > >> > > >>>>>>>>>>>
> > >> > > >>>>>>>>>>> Best Regards,
> > >> > > >>>>>>>>>>> Jiunn-Yang
> > >> > > >>>>>>>>>>>
> > >> > > >>>>>>>>>>>> Jun Rao via dev <[email protected]> wrote on Apr 18, 2026 at 1:57 AM:
> > >> > > >>>>>>>>>>>>
> > >> > > >>>>>>>>>>>> Hi, Jiunn-Yang and Chia-Ping,
> > >> > > >>>>>>>>>>>>
> > >> > > >>>>>>>>>>>> Thanks for the reply.
> > >> > > >>>>>>>>>>>>
> > >> > > >>>>>>>>>>>> "The core semantic of to_start_time is to read all records since the
> > >> > > >>>>>>>>>>>> creation of the group."
> > >> > > >>>>>>>>>>>>
> > >> > > >>>>>>>>>>>> I am just questioning whether this actually covers a common use case.
> > >> > > >>>>>>>>>>>> If the offset doesn't go out of range, the logic makes sense to me.
> > >> > > >>>>>>>>>>>> I'm not sure about the logic if the offset is out of range. If a user
> > >> > > >>>>>>>>>>>> chooses to skip the historical data when starting the group, it seems
> > >> > > >>>>>>>>>>>> the user likely wants to do the same if the offset is out of range.
> > >> > > >>>>>>>>>>>>
> > >> > > >>>>>>>>>>>> Jun
> > >> > > >>>>>>>>>>>>
> > >> > > >>>>>>>>>>>> On Fri, Apr 17, 2026 at 5:23 AM 黃竣陽 <[email protected]> wrote:
> > >> > > >>>>>>>>>>>>
> > >> > > >>>>>>>>>>>>> Hello Jun,
> > >> > > >>>>>>>>>>>>>
> > >> > > >>>>>>>>>>>>> Thanks for the feedback.
> > >> > > >>>>>>>>>>>>>
> > >> > > >>>>>>>>>>>>> Adding to the points above:
> > >> > > >>>>>>>>>>>>>
> > >> > > >>>>>>>>>>>>> Regarding by_duration as an alternative to Scenario 1: beyond clock
> > >> > > >>>>>>>>>>>>> skew and retry issues, there is also a usability concern. by_duration
> > >> > > >>>>>>>>>>>>> requires users to reason about operational timing ("how long does
> > >> > > >>>>>>>>>>>>> partition discovery take in my environment?") and then translate that
> > >> > > >>>>>>>>>>>>> into a configuration value. to_start_time requires no such reasoning.
> > >> > > >>>>>>>>>>>>> It simply anchors to the group creation time recorded by the broker.
> > >> > > >>>>>>>>>>>>>
> > >> > > >>>>>>>>>>>>> Regarding Scenario 2: I'd also like to clarify that to_start_time
> > >> > > >>>>>>>>>>>>> does not branch between "use latest" and "use earliest." It applies
> > >> > > >>>>>>>>>>>>> the same ListOffsetsRequest with the group creation timestamp in all
> > >> > > >>>>>>>>>>>>> cases. The difference in outcome:
> > >> > > >>>>>>>>>>>>> - skipping old data on first start
> > >> > > >>>>>>>>>>>>> - consuming surviving data after truncation
> > >> > > >>>>>>>>>>>>> is a natural consequence of what data exists in the partition at that
> > >> > > >>>>>>>>>>>>> point, not a different policy being applied. The rule is always the
> > >> > > >>>>>>>>>>>>> same.
> > >> > > >>>>>>>>>>>>>
> > >> > > >>>>>>>>>>>>> Best Regards,
> > >> > > >>>>>>>>>>>>> Jiunn-Yang
> > >> > > >>>>>>>>>>>>>
> > >> > > >>>>>>>>>>>>>> Chia-Ping Tsai <[email protected]> wrote on Apr 17, 2026 at 9:48 AM:
> > >> > > >>>>>>>>>>>>>>
> > >> > > >>>>>>>>>>>>>>
> > >> > > >>>>>>>>>>>>>>> Jun Rao via dev <[email protected]> wrote on Apr 17, 2026 at 4:57 AM:
> > >> > > >>>>>>>>>>>>>>>
> > >> > > >>>>>>>>>>>>>>> Also, a group is deleted after the consumer has been idle longer
> > >> > > >>>>>>>>>>>>>>> than offsets.retention.minutes. What's the semantic of to_start_time
> > >> > > >>>>>>>>>>>>>>> if the group creation time is unavailable?
> > >> > > >>>>>>>>>>>>>>
> > >> > > >>>>>>>>>>>>>> If the group is recreated, a new creation time will be recorded.
> > >> > > >>>>>>>>>>>>>> Hence, it acts like a new group. Plus, it throws an exception
> > >> > > >>>>>>>>>>>>>> directly if the group truly has no creation time.
> > >> > > >>>>>>>>>>>>>
> > >> > > >>>>>>>>>>>>>
> > >> > > >>>>>>>>>>>
> > >> > > >>>>>>>>>>>
> > >> > > >>>>>>>>>
> > >> > > >>>>>>>>>
> > >> > > >>>>>>>>
> > >> > > >>>>>>>
> > >> > > >>>>>>
> > >> > > >>>>>
> > >> > > >>>>
> > >> > >
> > >> > >
> > >> >
> > >>
> > >
> >
>
