Re: [DISCUSS] KIP-1282: Prevent data loss during partition expansion for dynamically added partitions

Chia-Ping Tsai Sat, 25 Apr 2026 04:30:05 -0700

hi Jun

Honestly, we've seen a similar "case storm" in our local community discussions. 
Some feel a new policy could revolutionize existing pipelines, while others 
find it overly complicated to mentally juggle all these offset edge cases.


I also realize that introducing a completely new policy just to overcome the 
"data loss on partition expansion" issue might be a bit overkill for now. We 
can always revisit a brand-new policy later.

For now, I'd like to pivot back to the original pain point: how to avoid losing 
"hot" records from newly expanded partitions when using the latest policy. The 
tricky part is that expanded partitions aren't always "hot" to consumers. For 
instance, if a partition is expanded while the consumer is offline for a long 
period, the user would likely prefer to skip to the end upon resuming, as those 
records are no longer fresh.

Therefore, I'd like to propose a new consumer config: 
auto.offset.reset.latest.max.age (inspired by Ryan's discussion). When a 
consumer is using the latest policy, it can rely on this threshold to determine 
its behavior on partitions without a committed offset. If the partition's "age" 
is within this threshold (i.e., it's a recently expanded partition), we fall 
back to earliest to catch the hot data. If the age exceeds the threshold, or if
it is unavailable (e.g., older broker versions), the consumer strictly adheres
to latest.
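
To make the rule above concrete, here is a minimal sketch of the decision in
Python. It is an illustration under assumptions, not a proposed implementation:
the helper name resolve_reset, the millisecond units, the use of None for an
unavailable age, and the inclusive boundary are all hypothetical and would be
settled in the KIP.

```python
from typing import Optional


def resolve_reset(partition_age_ms: Optional[int], max_age_ms: int) -> str:
    """Effective reset for a partition with no committed offset when
    auto.offset.reset=latest (hypothetical helper, not a Kafka API)."""
    # Age unavailable (e.g. older broker versions): strictly adhere to latest.
    if partition_age_ms is None:
        return "latest"
    # Recently expanded partition: fall back to earliest to catch hot data.
    # Treating the boundary as inclusive is an assumption of this sketch.
    if partition_age_ms <= max_age_ms:
        return "earliest"
    # Partition is older than the threshold: its records are no longer fresh.
    return "latest"
```

For example, with a threshold of 60000 ms, a partition that is 5000 ms old
would resolve to earliest, while one that is 120000 ms old, or one with no
reported age, would resolve to latest.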

This partition "age" could be returned via the consumer heartbeat. The 
coordinator would calculate it server-side as its current time minus the 
partition creation time. This means we would need to modify the partition 
records to store the creation time, and update the heartbeat RPC to pass this 
relative age.
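
As a rough sketch of the server-side calculation (again in Python, and again
hypothetical): the function name, the millisecond units, returning None when no
creation time was stored, and clamping negative values to zero are assumptions
of this example rather than part of the proposal.

```python
from typing import Optional


def partition_age_ms(now_ms: int, creation_time_ms: Optional[int]) -> Optional[int]:
    """Relative age the coordinator could attach to a heartbeat response
    (hypothetical helper, not an existing Kafka API)."""
    # A partition whose creation time was never recorded has no known age.
    if creation_time_ms is None:
        return None
    # Age = coordinator's current time minus partition creation time.
    # Clamping at zero (an assumption) guards against a stored creation
    # time that is slightly ahead of the coordinator's clock.
    return max(0, now_ms - creation_time_ms)
```

Because both timestamps in this sketch come from the server side, the consumer
only ever compares the returned age with its configured threshold and does not
need to consult its own clock.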

We plan to draft a separate KIP for auto.offset.reset.latest.max.age and start 
a new thread for it to keep things focused. We can leave this current thread 
open for any broader discussions on completely new policies.

Any feedback on this new direction is highly welcome. Thanks everyone for the 
incredible brainstorming session!

Best,
Chia-Ping

On 2026/04/23 20:44:41 Jun Rao via dev wrote:
> Hi, Chia-Ping,
> 
> Thanks for the reply.
> 
> "read all records produced since the group's birth."
> Let's consider this requirement a bit more. For CDC use cases, users don't
> want to lose any data. The easiest option is to consume data with the
> earliest offset. Sometimes, there are good reasons to skip the backlog. For
> example, the downstream system already obtains a database snapshot through
> another channel. However, in this case, the user usually needs to set the
> initial offsets carefully to match the snapshot's timestamp and avoid data
> loss. Starting from the group creation time doesn't seem to meet the
> business need in this case.
> 
> Jun
> 
> 
> On Thu, Apr 23, 2026 at 11:49 AM Chia-Ping Tsai <[email protected]> wrote:
> 
> > hi Jun
> >
> > > This seems to
> > fit the current auto.offset.reset framework more naturally.
> >
> > Your point about the existing framework is well-taken, but it highlights a
> > key distinction this KIP aims to address.
> >
> > If a user simply wants a "Smarter Latest" (one that avoids data loss from
> > extended partitions), they could indeed use by_duration=5mins as a
> > reasonable workaround.
> >
> > However, there is currently no workaround for a policy that guarantees
> > "read all records produced since the group's birth." This is a critical
> > requirement for data pipelines like OLTP (MySQL/Postgres) -> Kafka -> OLAP
> > (ClickHouse/Snowflake). These users often use latest initially to avoid a
> > massive historical backlog, but they have a "Zero Data Loss" requirement
> > once the pipeline is active.
> >
> > When these users encounter an "out-of-range" error, they want to consume
> > every surviving record in Kafka that belongs to their group's lifetime. If
> > we force them to jump to the end, it means they have to manually re-load
> > and backfill significantly more "lost records" from the source OLTP, which
> > is a high-cost operational burden.
> >
> > In short, the policy offered by this KIP is not just another option; it
> > provides a deterministic lifecycle anchor that cannot be emulated by the
> > current policies.
> >
> > On Fri, Apr 24, 2026 at 1:38 AM Jun Rao via dev <[email protected]> wrote:
> >
> >> Hi, Chia-Ping, Jiunn-Yang, and Jian,
> >>
> >> Thanks for the reply. I appreciate your effort in trying to address a
> >> common issue.
> >>
> >> To me, history and data are the same as the backlog. It's just that the
> >> amount of backlog can vary. When the group is first created or when the
> >> offset is out of range, the backlog is large. When a new partition is
> >> created and discovered by the consumer, the backlog is small (5 seconds of
> >> data for the new consumer, 5 minutes for the classic consumer). The
> >> question is how much backlog a user can tolerate. The to_start_time option
> >> implicitly assumes that a user can tolerate 0 backlog in one case but 5
> >> seconds or 5 minutes in another. This may or may not be what a user wants,
> >> but at least it seems inconsistent. An alternative is to document all cases
> >> where a backlog can occur and let the user choose how much backlog they can
> >> tolerate, configuring it with the existing by_duration option. This seems to
> >> fit the current auto.offset.reset framework more naturally.
> >>
> >> Jun
> >>
> >>
> >> On Thu, Apr 23, 2026 at 6:23 AM jian fu <[email protected]> wrote:
> >>
> >> > Hi All:
> >> >
> >> > Since Jun Yang referenced my earlier discussion, I’d also like to join in
> >> > and share some of my thoughts.
> >> >
> >> > The key area of minor divergence is how to handle this case:
> >> > "When the user starts the group for the first time, it faces a choice on
> >> > whether to process the backlog or not. When the offset is out-of-range, the
> >> > user faces the same choice regarding backlog processing."
> >> >
> >> > So I think we have four options for handling these two key choices:
> >> > 1 latest:  drop history + drop the data
> >> > 2 earliest:  do not drop history + do not drop the data
> >> > 3 the mode proposed by the KIP:   drop history + do not drop the data
> >> > 4 unreasonable mode:   do not drop history + drop the data
> >> >
> >> > I think option 3 is a reasonable mode for users (setting aside naming and
> >> > implementation). Imagine a real-life case: you may subscribe to a magazine
> >> > without buying the back issues, but you must not lose any issues published
> >> > after you subscribe just because you didn't buy the back issues.
> >> >
> >> > Regards
> >> > Jian
> >> >
> >> >
> >> > On Thu, Apr 23, 2026 at 7:17 PM 黃竣陽 <[email protected]> wrote:
> >> >
> >> > > Hello all,
> >> > >
> >> > > Thanks for the feedback. I'd like to advocate for keeping the original
> >> > > to_start_time semantics.
> >> > >
> >> > > Earlier in this thread, both Jian and Ryan highlighted that branched logic
> >> > > is the main UX concern:
> >> > >
> >> > > Jian: "If we can define one basic rule… it would make it easier for
> >> > >         everyone to stay on the same page."
> >> > > Ryan: "The documentation might be difficult if it has to
> >> > >         list and explain all the cases."
> >> > > Chia-Ping: "Having an opinionated config with branched logic makes it hard
> >> > >         to document and reason about."
> >> > >
> >> > > to_start_time already follows this principle: it consistently issues a
> >> > > ListOffsets request anchored to the group creation timestamp. Differences in
> >> > > outcome are simply due to what data the broker retains, not different rules
> >> > > being applied. Changing out-of-range to latest would be the real
> >> > > inconsistency, since the policy would then branch based on the reset
> >> > > scenario.
> >> > >
> >> > > Additionally, out-of-range and no-offset (group GC'd) are fundamentally
> >> > > different situations. When the group exists, the creation timestamp is
> >> > > available and should be honored. When the group is GC'd, the metadata is
> >> > > gone; this is an orthogonal problem that affects all reset policies equally.
> >> > >
> >> > > The strength of to_start_time is precisely its single, clean rule: "Always
> >> > > seek to the group’s creation time, and let ListOffsets resolve the rest."
> >> > >
> >> > > Best Regards,
> >> > > Jiunn-Yang
> >> > >
> >> > > > On Apr 23, 2026, at 3:24 PM, Chia-Ping Tsai <[email protected]> wrote:
> >> > > >
> >> > > > Hi all,
> >> > > >
> >> > > > BTW, regardless of where we land on the "out-of-range" debate, the
> >> > > underlying infrastructure of persisting the "group creation time" is
> >> > still
> >> > > highly valuable and worth merging.
> >> > > >
> >> > > > From my conversations with users, there are diverse needs: some love
> >> > the
> >> > > "better earliest" idea to safely skip massive historical backlogs,
> >> while
> >> > > others only care about fixing the data loss in latest during partition
> >> > > expansion.
> >> > > >
> >> > > > Simply having the creation time persisted and exposed is already a
> >> > > > massive step forward, as it gives users a reliable, objective anchor to
> >> > > > manually fix the issue via a ConsumerRebalanceListener. However, much like
> >> > > > the concept of a DLQ (Dead Letter Queue), while users could implement it
> >> > > > manually, providing a built-in reset policy makes the developer experience
> >> > > > significantly more convenient, robust, and out-of-the-box.
> >> > > >
> >> > > > I believe Ken might chime in later with a different perspective as
> >> well
> >> > > :)
> >> > > >
> >> > > > Best,
> >> > > > Chia-Ping
> >> > > >
> >> > > >
> >> > > >> On Apr 23, 2026, at 3:59 AM, Chia-Ping Tsai <[email protected]> wrote:
> >> > > >>
> >> > > >> Hi Jun,
> >> > > >>
> >> > > >> Thanks for the feedback. I agree that shifting this policy toward a
> >> > > "Smarter Latest" (rather than a better Earliest) is a more elegant
> >> path.
> >> > > >>
> >> > > >> The refined behavior would be:
> >> > > >>
> >> > > >> Out-of-range: Strictly follow latest semantics. This ensures a
> >> > > predictable "skip to end" behavior when users fall behind retention.
> >> > > >>
> >> > > >> No-offset (Initial Start & Expansion): Leverage Group Creation Time
> >> > for
> >> > > lookup.
> >> > > >>
> >> > > >> • For new groups, this naturally results in latest behavior since
> >> > > creation time is "now".
> >> > > >>
> >> > > >> • For existing groups discovering new partitions, this results in
> >> > > earliest behavior for those specific partitions.
> >> > > >>
> >> > > >> Group GC: If a group is purged, it is treated as a brand-new group
> >> > with
> >> > > a creation time of "now," consistently skipping to the end.
> >> > > >>
> >> > > >> WDYT?
> >> > > >>
> >> > > >>
> >> > > >>> On Apr 23, 2026, at 1:34 AM, Jun Rao via dev <[email protected]> wrote:
> >> > > >>>
> >> > > >>> Hi, Chia-Ping,
> >> > > >>>
> >> > > >>> Thanks for the reply.
> >> > > >>>
> >> > > >>> Let's try to understand from the user's perspective. When the user
> >> > > starts
> >> > > >>> the group for the first time, it faces a choice on whether to
> >> process
> >> > > the
> >> > > >>> backlog or not. When the offset is out-of-range, the user faces
> >> the
> >> > > same
> >> > > >>> choice regarding backlog processing. It seems that most users
> >> want to
> >> > > make
> >> > > >>> the same choice regarding backlog processing.
> >> > > >>>
> >> > > >>> "Users who explicitly choose the to_start_time policy do so
> >> precisely
> >> > > >>> because they do not want to skip any records when encountering an
> >> > > >>> out-of-range scenario."
> >> > > >>> This argument is weak because that's how to_start_time is
> >> designed,
> >> > > but we
> >> > > >>> need to justify why it is a good choice in the first place.
> >> > > >>>
> >> > > >>> Jun
> >> > > >>>
> >> > > >>>>> On Tue, Apr 21, 2026 at 12:35 PM Chia-Ping Tsai <
> >> > [email protected]>
> >> > > wrote:
> >> > > >>>>
> >> > > >>>> Hi Jun,
> >> > > >>>>
> >> > > >>>> Thanks for the clarification. I think I misunderstood your
> >> previous
> >> > > point.
> >> > > >>>> Let me summarize the scenarios to ensure we are fully aligned.
> >> > > >>>>
> >> > > >>>> There are essentially three scenarios when a consumer needs to
> >> reset
> >> > > >>>> offsets:
> >> > > >>>>
> >> > > >>>> 1.
> >> > > >>>>
> >> > > >>>> Out-of-range (The group exists, but the offset has expired).
> >> > > >>>> 2.
> >> > > >>>>
> >> > > >>>> Extended partition (The group exists, but encounters a newly
> >> added
> >> > > >>>> partition with no committed offset).
> >> > > >>>> 3.
> >> > > >>>>
> >> > > >>>> No-offset (The group is completely new, or an existing group was
> >> > > >>>> deleted by the GC).
> >> > > >>>>
> >> > > >>>> We all agree that the primary goal of this KIP is to catch up on
> >> all
> >> > > >>>> records for scenario 2. There are no objections here.
> >> > > >>>>
> >> > > >>>> Regarding the inconsistency you pointed out between 1) and 3)
> >> under
> >> > > the
> >> > > >>>> current to_start_time design, I completely see your point. If
> >> users
> >> > > are
> >> > > >>>> not fully aware that to_start_time is designed to read all
> >> records
> >> > > since
> >> > > >>>> the creation of the group, they might get confused.
> >> > > >>>>
> >> > > >>>> However, to me, this "inconsistency" is actually a matter of
> >> > > >>>> predictability. Users who explicitly choose the to_start_time
> >> policy
> >> > > do
> >> > > >>>> so precisely because they do not want to skip any records when
> >> > > encountering
> >> > > >>>> an out-of-range scenario.
> >> > > >>>>
> >> > > >>>> (I would prefer to set aside the topic of group GC for a moment.
> >> It
> >> > is
> >> > > >>>> much more important that we first focus our discussion on the
> >> > > >>>> "out-of-range" scenario)
> >> > > >>>>
> >> > > >>>> Best,
> >> > > >>>>
> >> > > >>>> Chia-Ping
> >> > > >>>>
> >> > > >>>> On Wed, Apr 22, 2026 at 1:13 AM Jun Rao via dev <[email protected]> wrote:
> >> > > >>>>
> >> > > >>>>> Hi, Chia-Ping,
> >> > > >>>>>
> >> > > >>>>> Hmm, is that true? With the earliest policy, we treat an
> >> > out-of-range
> >> > > >>>>> offset the same as no offset (because the group is deleted) and
> >> > > always set
> >> > > >>>>> it to the earliest offset, right? With to_start_time, an
> >> > out-of-range
> >> > > >>>>> offset is treated differently from no offset.
> >> > > >>>>>
> >> > > >>>>> Thanks,
> >> > > >>>>>
> >> > > >>>>> Jun
> >> > > >>>>>
> >> > > >>>>> On Tue, Apr 21, 2026 at 12:54 AM Chia-Ping Tsai <
> >> > [email protected]
> >> > > >
> >> > > >>>>> wrote:
> >> > > >>>>>
> >> > > >>>>>> hi Jun
> >> > > >>>>>>
> >> > > >>>>>> Nice point. Group GC is definitely an issue for to_start_time,
> >> but
> >> > > it is
> >> > > >>>>>> actually an issue for other policies as well.
> >> > > >>>>>>
> >> > > >>>>>> For example, a consumer using the earliest policy will suddenly
> >> > > read all
> >> > > >>>>>> historical records from scratch if it sleeps for a long while
> >> and
> >> > > gets
> >> > > >>>>>> GC'd; otherwise, it just resumes from previous offsets if the
> >> > group
> >> > > >>>>> still
> >> > > >>>>>> exists. It is equally hard to explain to users: "Oh, your group
> >> > was
> >> > > >>>>> GC'd,
> >> > > >>>>>> so your offset behavior changed."
> >> > > >>>>>>
> >> > > >>>>>> Therefore, it seems to me the right approach to fix this
> >> > > "inconsistency"
> >> > > >>>>>> is to offer a group-level GC timeout in a future KIP, allowing
> >> > > users to
> >> > > >>>>>> explicitly protect critical groups from GC. This saves not only
> >> > > >>>>>> to_start_time, but all other reset policies too.
> >> > > >>>>>>
> >> > > >>>>>> Best,
> >> > > >>>>>> Chia-Ping
> >> > > >>>>>>
> >> > > >>>>>> On 2026/04/20 20:19:47 Jun Rao via dev wrote:
> >> > > >>>>>>> Hi, Jiunn-Yang and Chia-Ping,
> >> > > >>>>>>>
> >> > > >>>>>>> Thanks for the reply.
> >> > > >>>>>>>
> >> > > >>>>>>> The main concern I see with to_start_time is that its behavior on how much
> >> > > >>>>>>> data to consume when the offset is out of range is not consistent and is
> >> > > >>>>>>> hard to explain. If the group still exists, it will read from the earliest
> >> > > >>>>>>> offset. Otherwise, it will read from the latest.
> >> > > >>>>>>>
> >> > > >>>>>>> Jun
> >> > > >>>>>>>
> >> > > >>>>>>> On Mon, Apr 20, 2026 at 10:13 AM Chia-Ping Tsai <
> >> > > [email protected]>
> >> > > >>>>>> wrote:
> >> > > >>>>>>>
> >> > > >>>>>>>> hi all,
> >> > > >>>>>>>>
> >> > > >>>>>>>> Just a note for a potential latest_v2:
> >> > > >>>>>>>>
> >> > > >>>>>>>> Since the purpose is to read all records from extended
> >> > partitions,
> >> > > >>>>> we
> >> > > >>>>>>>> could leverage the group creation time to compare against the
> >> > > >>>>> earliest
> >> > > >>>>>>>> record of a partition when there is no committed offset. If
> >> the
> >> > > >>>>> group
> >> > > >>>>>>>> creation time is larger than the earliest record's
> >> timestamp, we
> >> > > >>>>>> assume it
> >> > > >>>>>>>> is not an extended partition. Otherwise, we treat it as an
> >> > > extended
> >> > > >>>>>>>> partition.
> >> > > >>>>>>>>
> >> > > >>>>>>>> This approach allows us to catch all "possible" extended
> >> > > partitions,
> >> > > >>>>>> which
> >> > > >>>>>>>> includes both "true" extended partitions and old but
> >> truncated
> >> > > >>>>>> partitions.
> >> > > >>>>>>>> While there is a rare edge case where the cost is
> >> reprocessing
> >> > > some
> >> > > >>>>>> records
> >> > > >>>>>>>> we don't necessarily want, it is very easy to implement and
> >> > > >>>>> guarantees
> >> > > >>>>>> we
> >> > > >>>>>>>> will never miss the actual extended partitions.
> >> > > >>>>>>>>
> >> > > >>>>>>>> Best,
> >> > > >>>>>>>> Chia-Ping
> >> > > >>>>>>>>
> >> > > >>>>>>>> On 2026/04/20 13:33:31 黃竣陽 wrote:
> >> > > >>>>>>>>> Hello all,
> >> > > >>>>>>>>>
> >> > > >>>>>>>>> I have added a new "Future Work: latest_strict Policy"
> >> section
> >> > to
> >> > > >>>>> the
> >> > > >>>>>>>> KIP.
> >> > > >>>>>>>>> The idea is a future policy that uses latest semantics by
> >> > default
> >> > > >>>>> but
> >> > > >>>>>>>> falls
> >> > > >>>>>>>>> back to the group creation timestamp specifically for newly
> >> > added
> >> > > >>>>>>>> partitions
> >> > > >>>>>>>>> during partition expansion. This would reuse the group
> >> creation
> >> > > >>>>> time
> >> > > >>>>>>>> anchor
> >> > > >>>>>>>>> introduced by this KIP, making it a natural extension with
> >> > > minimal
> >> > > >>>>>>>> additional
> >> > > >>>>>>>>> protocol changes.
> >> > > >>>>>>>>>
> >> > > >>>>>>>>> Best Regards,
> >> > > >>>>>>>>> Jiunn-Yang
> >> > > >>>>>>>>>
> >> > > >>>>>>>>>> On Apr 18, 2026, at 4:09 PM, Chia-Ping Tsai <[email protected]> wrote:
> >> > > >>>>>>>>>>
> >> > > >>>>>>>>>> Hi all,
> >> > > >>>>>>>>>>
> >> > > >>>>>>>>>> It is practically NP-hard to guess everyone's ideal use
> >> case
> >> > > >>>>> right
> >> > > >>>>>> now.
> >> > > >>>>>>>>>> Also, I believe we all want to avoid falling back to the
> >> > > >>>>> intricate
> >> > > >>>>>>>>>> multi-policy approach proposed in KIP-842.
> >> > > >>>>>>>>>>
> >> > > >>>>>>>>>> I prefer to keep this KIP focused and discuss a "v2 latest"
> >> > > >>>>> policy
> >> > > >>>>>> in a
> >> > > >>>>>>>>>> separate KIP. That future policy could build upon the
> >> > > >>>>> to_start_time
> >> > > >>>>>>>> anchor
> >> > > >>>>>>>>>> to fix data loss specifically for extended partitions. We
> >> > could
> >> > > >>>>>> call it
> >> > > >>>>>>>>>> something like latest_strict.
> >> > > >>>>>>>>>>
> >> > > >>>>>>>>>> Thoughts?
> >> > > >>>>>>>>>>
> >> > > >>>>>>>>>>
> >> > > >>>>>>>>>> On Sat, Apr 18, 2026 at 3:24 PM 黃竣陽 <[email protected]> wrote:
> >> > > >>>>>>>>>>
> >> > > >>>>>>>>>>> Hello Jun,
> >> > > >>>>>>>>>>>
> >> > > >>>>>>>>>>> Thanks for the reply,
> >> > > >>>>>>>>>>>
> >> > > >>>>>>>>>>> When the offset goes out of range, the user faces two
> >> > options:
> >> > > >>>>>>>>>>>
> >> > > >>>>>>>>>>> 1. Skip to the end (latest behavior) — risk losing data
> >> that
> >> > > >>>>> was
> >> > > >>>>>>>> produced
> >> > > >>>>>>>>>>> during
> >> > > >>>>>>>>>>> the group's lifetime but not yet consumed.
> >> > > >>>>>>>>>>> 2. Seek back to the group creation time (to_start_time
> >> > > >>>>> behavior) —
> >> > > >>>>>>>>>>> potentially
> >> > > >>>>>>>>>>> reprocess some data, but guarantee no data from the
> >> group's
> >> > > >>>>>> lifetime
> >> > > >>>>>>>> is
> >> > > >>>>>>>>>>> silently lost.
> >> > > >>>>>>>>>>>
> >> > > >>>>>>>>>>> to_start_time chooses option 2 because its core promise is
> >> > > >>>>> "never
> >> > > >>>>>>>> silently
> >> > > >>>>>>>>>>> lose data
> >> > > >>>>>>>>>>> produced after the group started." If we fell back to
> >> latest
> >> > on
> >> > > >>>>>>>>>>> out-of-range, we would
> >> > > >>>>>>>>>>> break this guarantee.
> >> > > >>>>>>>>>>>
> >> > > >>>>>>>>>>> Users who prefer option 1 can simply use
> >> > > >>>>>>>>>>> auto.offset.reset=latest.
> >> > > >>>>>>>>>>>
> >> > > >>>>>>>>>>> Best Regards,
> >> > > >>>>>>>>>>> Jiunn-Yang
> >> > > >>>>>>>>>>>
> >> > > >>>>>>>>>>>> On Apr 18, 2026, at 1:57 AM, Jun Rao via dev <[email protected]> wrote:
> >> > > >>>>>>>>>>>>
> >> > > >>>>>>>>>>>> Hi, Jiunn-Yang and Chia-Ping,
> >> > > >>>>>>>>>>>>
> >> > > >>>>>>>>>>>> Thanks for the reply.
> >> > > >>>>>>>>>>>>
> >> > > >>>>>>>>>>>> "The core semantic of to_start_time is to read all
> >> records
> >> > > >>>>> since
> >> > > >>>>>> the
> >> > > >>>>>>>>>>>> creation of the group."
> >> > > >>>>>>>>>>>>
> >> > > >>>>>>>>>>>> I am just questioning whether this actually covers a
> >> common
> >> > > >>>>> use
> >> > > >>>>>>>> case. If
> >> > > >>>>>>>>>>>> the offset doesn't go out of range, the logic makes
> >> sense to
> >> > > >>>>> me.
> >> > > >>>>>> I'm
> >> > > >>>>>>>> not
> >> > > >>>>>>>>>>>> sure about the logic if the offset is out of range. If a
> >> > user
> >> > > >>>>>>>> chooses to
> >> > > >>>>>>>>>>>> skip the historical data when starting the group, it
> >> seems
> >> > the
> >> > > >>>>>> user
> >> > > >>>>>>>>>>> likely
> >> > > >>>>>>>>>>>> wants to do the same if the offset is out of range.
> >> > > >>>>>>>>>>>>
> >> > > >>>>>>>>>>>> Jun
> >> > > >>>>>>>>>>>>
> >> > > >>>>>>>>>>>> On Fri, Apr 17, 2026 at 5:23 AM 黃竣陽 <[email protected]>
> >> > > >>>>> wrote:
> >> > > >>>>>>>>>>>>
> >> > > >>>>>>>>>>>>> Hello Jun,
> >> > > >>>>>>>>>>>>>
> >> > > >>>>>>>>>>>>> Thanks for the feedback,
> >> > > >>>>>>>>>>>>>
> >> > > >>>>>>>>>>>>> Adding to the points above:
> >> > > >>>>>>>>>>>>>
> >> > > >>>>>>>>>>>>> Regarding by_duration as an alternative to Scenario 1:
> >> > beyond
> >> > > >>>>>> clock
> >> > > >>>>>>>> skew
> >> > > >>>>>>>>>>>>> and retry issues, there is also a usability concern.
> >> > > >>>>> by_duration
> >> > > >>>>>>>>>>> requires
> >> > > >>>>>>>>>>>>> users
> >> > > >>>>>>>>>>>>> to reason about operational timing — "how long does
> >> > partition
> >> > > >>>>>>>> discovery
> >> > > >>>>>>>>>>>>> take
> >> > > >>>>>>>>>>>>> in my environment?", and then translate that into a
> >> > > >>>>>> configuration
> >> > > >>>>>>>> value.
> >> > > >>>>>>>>>>>>> to_start_time
> >> > > >>>>>>>>>>>>> requires no such reasoning. It simply anchors to the
> >> group
> >> > > >>>>>> creation
> >> > > >>>>>>>> time
> >> > > >>>>>>>>>>>>> recorded
> >> > > >>>>>>>>>>>>> by the broker.
> >> > > >>>>>>>>>>>>>
> >> > > >>>>>>>>>>>>> Regarding Scenario 2: I'd also like to clarify that
> >> > > >>>>>> to_start_time
> >> > > >>>>>>>> does
> >> > > >>>>>>>>>>> not
> >> > > >>>>>>>>>>>>> branch between
> >> > > >>>>>>>>>>>>> "use latest" and "use earliest." It applies the same
> >> > > >>>>>>>> ListOffsetsRequest
> >> > > >>>>>>>>>>>>> with the group creation
> >> > > >>>>>>>>>>>>> timestamp in all cases. The difference in outcome:
> >> > > >>>>>>>>>>>>> - skipping old data on first start
> >> > > >>>>>>>>>>>>> - consuming surviving data after truncation
> >> > > >>>>>>>>>>>>> is a natural consequence of what data exists in the
> >> > > >>>>> partition at
> >> > > >>>>>>>> that
> >> > > >>>>>>>>>>>>> point, not a different policy
> >> > > >>>>>>>>>>>>> being applied. The rule is always the same.
> >> > > >>>>>>>>>>>>>
> >> > > >>>>>>>>>>>>> Best Regards,
> >> > > >>>>>>>>>>>>> Jiunn-Yang
> >> > > >>>>>>>>>>>>>
> >> > > >>>>>>>>>>>>>> On Apr 17, 2026, at 9:48 AM, Chia-Ping Tsai <[email protected]> wrote:
> >> > > >>>>>>>>>>>>>>
> >> > > >>>>>>>>>>>>>>
> >> > > >>>>>>>>>>>>>>> On Apr 17, 2026, at 4:57 AM, Jun Rao via dev <[email protected]> wrote:
> >> > > >>>>>>>>>>>>>>>
> >> > > >>>>>>>>>>>>>>> Also, a group is deleted after the consumer has been
> >> idle
> >> > > >>>>>> longer
> >> > > >>>>>>>>>>>>>>> than offsets.retention.minutes. What's the semantic of
> >> > > >>>>>>>> to_start_time
> >> > > >>>>>>>>>>> if
> >> > > >>>>>>>>>>>>> the
> >> > > >>>>>>>>>>>>>>> group creation time is unavailable?
> >> > > >>>>>>>>>>>>>>
> >> > > >>>>>>>>>>>>>> If the group is recreated, a new creation time will be
> >> > > >>>>>> recorded.
> >> > > >>>>>>>> Hence,
> >> > > >>>>>>>>>>>>> it acts like a new group. Plus, it throws an exception
> >> > > >>>>> directly
> >> > > >>>>>> if
> >> > > >>>>>>>> the
> >> > > >>>>>>>>>>>>> group truly has no creation time.
> >> > > >>>>>>>>>>>>>
> >> > > >>>>>>>>>>>>>
> >> > > >>>>>>>>>>>
> >> > > >>>>>>>>>>>
> >> > > >>>>>>>>>
> >> > > >>>>>>>>>
> >> > > >>>>>>>>
> >> > > >>>>>>>
> >> > > >>>>>>
> >> > > >>>>>
> >> > > >>>>
> >> > >
> >> > >
> >> >
> >>
> >
>
