Hello all, I've created a new KIP to introduce this config. Here is the discussion thread: <https://lists.apache.org/thread/bp4zk31zr1sdxjsspg7b7bqddmm9t4gn>
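For readers skimming the thread, here is a rough sketch of the decision rule described in Chia-Ping's quoted proposal below. It is illustrative Python, not consumer code: the names `ResetTarget`, `compute_partition_age_ms`, and `resolve_reset_target` are hypothetical and do not come from the KIP, and treating an age exactly equal to the threshold as "within" it is an assumption.

```python
from enum import Enum
from typing import Optional


class ResetTarget(Enum):
    EARLIEST = "earliest"
    LATEST = "latest"


def compute_partition_age_ms(coordinator_now_ms: int, partition_created_ms: int) -> int:
    """Age as described in the thread: coordinator's current time minus
    partition creation time. Clamping at zero is an assumption."""
    return max(0, coordinator_now_ms - partition_created_ms)


def resolve_reset_target(partition_age_ms: Optional[int], max_age_ms: int) -> ResetTarget:
    """Pick the reset target for a partition with no committed offset
    when the consumer uses the latest policy.

    partition_age_ms is None when the age is unavailable
    (e.g. an older broker that does not report it).
    """
    if partition_age_ms is None:
        # Age unavailable: strictly adhere to latest.
        return ResetTarget.LATEST
    if partition_age_ms <= max_age_ms:
        # Recently expanded partition: fall back to earliest to catch hot data.
        return ResetTarget.EARLIEST
    # Partition is older than the threshold: skip to the end.
    return ResetTarget.LATEST


if __name__ == "__main__":
    age = compute_partition_age_ms(coordinator_now_ms=10_000, partition_created_ms=4_000)
    print(age, resolve_reset_target(age, max_age_ms=60_000).value)
```

With a 60-second threshold, a partition created 6 seconds ago resolves to earliest, while a missing age or an age above the threshold resolves to latest.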
Feedback and comments are very welcome!

Best Regards,
Jiunn-Yang

> Jun Rao via dev <[email protected]> wrote on April 28, 2026, 12:49 AM:
>
> Hi, Chia-Ping,
>
> Thanks for the reply.
>
> I agree that we should only add new options that cover a common use case. For auto.offset.reset.latest.max.age, it would be useful to compare it with by_duration.
>
> Jun
>
> On Sat, Apr 25, 2026 at 4:30 AM Chia-Ping Tsai <[email protected]> wrote:
>
>> hi Jun
>>
>> Honestly, we've seen a similar "case storm" in our local community discussions. Some feel a new policy could revolutionize existing pipelines, while others find it overly complicated to mentally juggle all these offset edge cases.
>>
>> I also realize that introducing a completely new policy just to overcome the "data loss on partition expansion" issue might be a bit overkill for now. We can always revisit a brand-new policy later.
>>
>> For now, I'd like to pivot back to the original pain point: how to avoid losing "hot" records from newly expanded partitions when using the latest policy. The tricky part is that expanded partitions aren't always "hot" to consumers. For instance, if a partition is expanded while the consumer is offline for a long period, the user would likely prefer to skip to the end upon resuming, as those records are no longer fresh.
>>
>> Therefore, I'd like to propose a new consumer config: auto.offset.reset.latest.max.age (inspired by Ryan's discussion). When a consumer is using the latest policy, it can rely on this threshold to determine its behavior on partitions without a committed offset. If the partition's "age" is within this threshold (i.e., it's a recently expanded partition), we fall back to earliest to catch the hot data. If it exceeds the threshold, or if the age is unavailable (e.g., older broker versions), it strictly adheres to latest.
>>
>> This partition "age" could be returned via the consumer heartbeat.
>> The age would be calculated server-side by the coordinator: coordinator's current time - partition creation time. This inherently means we would need to modify the partition records to store the creation time, as well as update the heartbeat RPC to pass this relative age.
>>
>> We plan to draft a separate KIP for auto.offset.reset.latest.max.age and start a new thread for it to keep things focused. We can leave this current thread open for any broader discussions on completely new policies.
>>
>> Any feedback on this new direction is highly welcome. Thanks everyone for the incredible brainstorming session!
>>
>> Best,
>> Chia-Ping
>>
>> On 2026/04/23 20:44:41 Jun Rao via dev wrote:
>>> Hi, Chia-Ping,
>>>
>>> Thanks for the reply.
>>>
>>> "read all records produced since the group's birth."
>>> Let's consider this requirement a bit more. For CDC use cases, users don't want to lose any data. The easiest option is to consume data with the earliest offset. Sometimes, there are good reasons to skip the backlog. For example, the downstream system already obtains a database snapshot through another channel. However, in this case, the user usually needs to set the initial offsets carefully to match the snapshot's timestamp and avoid data loss. Starting from the group creation time doesn't seem to meet the business need in this case.
>>>
>>> Jun
>>>
>>> On Thu, Apr 23, 2026 at 11:49 AM Chia-Ping Tsai <[email protected]> wrote:
>>>
>>>> hi Jun
>>>>
>>>>> This seems to fit the current auto.offset.reset framework more naturally.
>>>>
>>>> Your point about the existing framework is well-taken, but it highlights a key distinction this KIP aims to address.
>>>>
>>>> If a user simply wants a "Smarter Latest" (one that avoids data loss from extended partitions), they could indeed use by_duration=5mins as a reasonable workaround.
>>>>
>>>> However, there is currently no workaround for a policy that guarantees "read all records produced since the group's birth." This is a critical requirement for data pipelines like OLTP (MySQL/Postgres) -> Kafka -> OLAP (ClickHouse/Snowflake). These users often use latest initially to avoid a massive historical backlog, but they have a "Zero Data Loss" requirement once the pipeline is active.
>>>>
>>>> When these users encounter an "out-of-range" error, they want to consume every surviving record in Kafka that belongs to their group's lifetime. If we force them to jump to the end, it means they have to manually re-load and backfill significantly more "lost records" from the source OLTP, which is a high-cost operational burden.
>>>>
>>>> In short, the policy offered by this KIP is not just another option; it provides a deterministic lifecycle anchor that cannot be emulated by the current policies.
>>>>
>>>> Jun Rao via dev <[email protected]> wrote on Fri, April 24, 2026, 1:38 AM:
>>>>
>>>>> Hi, Chia-Ping, Jiunn-Yang, and Jian,
>>>>>
>>>>> Thanks for the reply. I appreciate your effort in trying to address a common issue.
>>>>>
>>>>> To me, history and data are the same as the backlog. It's just that the amount of backlog can vary. When the group is first created or when the offset is out of range, the backlog is large. When a new partition is created and discovered by the consumer, the backlog is small (5 seconds of data for the new consumer, 5 minutes for the classic consumer). The question is how much backlog a user can tolerate. The to_start_time option implicitly assumes that a user can tolerate 0 backlog in one case but 5 seconds or 5 minutes in another. This may or may not be what a user wants, but at least it seems inconsistent.
>>>>> An alternative is to document all cases where a backlog can occur and let the user choose how much backlog they can tolerate, configuring it with the existing by_duration option. This seems to fit the current auto.offset.reset framework more naturally.
>>>>>
>>>>> Jun
>>>>>
>>>>> On Thu, Apr 23, 2026 at 6:23 AM jian fu <[email protected]> wrote:
>>>>>
>>>>>> Hi All:
>>>>>>
>>>>>> Since Jiunn-Yang referenced my earlier discussion, I'd also like to join in and share some of my thoughts.
>>>>>>
>>>>>> The key area of minor divergence is how to handle this case:
>>>>>> "When the user starts the group for the first time, it faces a choice on whether to process the backlog or not. When the offset is out-of-range, the user faces the same choice regarding backlog processing."
>>>>>>
>>>>>> So I think we have four options for handling the two key choices:
>>>>>> 1 latest: drop history + drop the data
>>>>>> 2 earliest: not drop history + not drop the data
>>>>>> 3 the mode proposed by the KIP: drop history + not drop the data
>>>>>> 4 unreasonable mode: not drop history + drop the data
>>>>>>
>>>>>> I think option 3 is a reasonable mode for users (setting aside naming and implementation). Imagine a real-life case: you may subscribe to a magazine without buying the older issues, but you must not lose any issues after subscribing just because you didn't buy the back issues.
>>>>>>
>>>>>> Regards
>>>>>> Jian
>>>>>>
>>>>>> 黃竣陽 <[email protected]> wrote on Thu, April 23, 2026, 19:17:
>>>>>>
>>>>>>> Hello all,
>>>>>>>
>>>>>>> Thanks for the feedback. I'd like to advocate for keeping the original to_start_time semantics.
>>>>>>>
>>>>>>> Earlier in this thread, both Jian and Ryan highlighted that branched logic is the main UX concern:
>>>>>>>
>>>>>>> Jian: "If we can define one basic rule… it would make it easier for everyone to stay on the same page."
>>>>>>> Ryan: "The documentation might be difficult if it has to list and explain all the cases."
>>>>>>> Chia-Ping: "Having an opinionated config with branched logic makes it hard to document and reason about."
>>>>>>>
>>>>>>> to_start_time already follows this principle: it consistently issues a ListOffsets request anchored to the group creation timestamp. Differences in outcome are simply due to what data the broker retains, not different rules being applied. Changing out-of-range to latest would be the real inconsistency, since the policy would then branch based on the reset scenario.
>>>>>>>
>>>>>>> Additionally, out-of-range and no-offset (group GC'd) are fundamentally different situations. When the group exists, the creation timestamp is available and should be honored. When the group is GC'd, the metadata is gone; this is an orthogonal problem that affects all reset policies equally.
>>>>>>>
>>>>>>> The strength of to_start_time is precisely its single, clean rule: "Always seek to the group’s creation time, and let ListOffsets resolve the rest."
>>>>>>>
>>>>>>> Best Regards,
>>>>>>> Jiunn-Yang
>>>>>>>
>>>>>>>> Chia-Ping Tsai <[email protected]> wrote on April 23, 2026, 3:24 PM:
>>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> BTW, regardless of where we land on the "out-of-range" debate, the underlying infrastructure of persisting the "group creation time" is still highly valuable and worth merging.
>>>>>>>>
>>>>>>>> From my conversations with users, there are diverse needs: some love the "better earliest" idea to safely skip massive historical backlogs, while others only care about fixing the data loss in latest during partition expansion.
>>>>>>>>
>>>>>>>> Simply having the creation time persisted and exposed is already a massive step forward, as it gives users a reliable, objective anchor to manually fix the issue via a ConsumerRebalanceListener. However, much like the concept of a DLQ (Dead Letter Queue), while users could implement it manually, providing a built-in reset policy makes the developer experience significantly more convenient, robust, and out-of-the-box.
>>>>>>>>
>>>>>>>> I believe Ken might chime in later with a different perspective as well :)
>>>>>>>>
>>>>>>>> Best,
>>>>>>>> Chia-Ping
>>>>>>>>
>>>>>>>>> Chia-Ping Tsai <[email protected]> wrote on April 23, 2026, 3:59 AM:
>>>>>>>>>
>>>>>>>>> Hi Jun,
>>>>>>>>>
>>>>>>>>> Thanks for the feedback. I agree that shifting this policy toward a "Smarter Latest" (rather than a better Earliest) is a more elegant path.
>>>>>>>>>
>>>>>>>>> The refined behavior would be:
>>>>>>>>>
>>>>>>>>> Out-of-range: Strictly follow latest semantics. This ensures a predictable "skip to end" behavior when users fall behind retention.
>>>>>>>>>
>>>>>>>>> No-offset (Initial Start & Expansion): Leverage Group Creation Time for lookup.
>>>>>>>>>
>>>>>>>>> • For new groups, this naturally results in latest behavior since creation time is "now".
>>>>>>>>>
>>>>>>>>> • For existing groups discovering new partitions, this results in earliest behavior for those specific partitions.
>>>>>>>>>
>>>>>>>>> Group GC: If a group is purged, it is treated as a brand-new group with a creation time of "now," consistently skipping to the end.
>>>>>>>>>
>>>>>>>>> WDYT?
>>>>>>>>>
>>>>>>>>>> Jun Rao via dev <[email protected]> wrote on April 23, 2026, 1:34 AM:
>>>>>>>>>>
>>>>>>>>>> Hi, Chia-Ping,
>>>>>>>>>>
>>>>>>>>>> Thanks for the reply.
>>>>>>>>>>
>>>>>>>>>> Let's try to understand from the user's perspective. When the user starts the group for the first time, it faces a choice on whether to process the backlog or not. When the offset is out-of-range, the user faces the same choice regarding backlog processing. It seems that most users want to make the same choice regarding backlog processing.
>>>>>>>>>>
>>>>>>>>>> "Users who explicitly choose the to_start_time policy do so precisely because they do not want to skip any records when encountering an out-of-range scenario."
>>>>>>>>>> This argument is weak because that's how to_start_time is designed, but we need to justify why it is a good choice in the first place.
>>>>>>>>>>
>>>>>>>>>> Jun
>>>>>>>>>>
>>>>>>>>>> On Tue, Apr 21, 2026 at 12:35 PM Chia-Ping Tsai <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi Jun,
>>>>>>>>>>>
>>>>>>>>>>> Thanks for the clarification. I think I misunderstood your previous point. Let me summarize the scenarios to ensure we are fully aligned.
>>>>>>>>>>>
>>>>>>>>>>> There are essentially three scenarios when a consumer needs to reset offsets:
>>>>>>>>>>>
>>>>>>>>>>> 1. Out-of-range (The group exists, but the offset has expired).
>>>>>>>>>>> 2. Extended partition (The group exists, but encounters a newly added partition with no committed offset).
>>>>>>>>>>> 3. No-offset (The group is completely new, or an existing group was deleted by the GC).
>>>>>>>>>>>
>>>>>>>>>>> We all agree that the primary goal of this KIP is to catch up on all records for scenario 2. There are no objections here.
>>>>>>>>>>>
>>>>>>>>>>> Regarding the inconsistency you pointed out between 1) and 3) under the current to_start_time design, I completely see your point. If users are not fully aware that to_start_time is designed to read all records since the creation of the group, they might get confused.
>>>>>>>>>>>
>>>>>>>>>>> However, to me, this "inconsistency" is actually a matter of predictability. Users who explicitly choose the to_start_time policy do so precisely because they do not want to skip any records when encountering an out-of-range scenario.
>>>>>>>>>>>
>>>>>>>>>>> (I would prefer to set aside the topic of group GC for a moment. It is much more important that we first focus our discussion on the "out-of-range" scenario)
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>>
>>>>>>>>>>> Chia-Ping
>>>>>>>>>>>
>>>>>>>>>>> Jun Rao via dev <[email protected]> wrote on Wed, April 22, 2026, 1:13 AM:
>>>>>>>>>>>
>>>>>>>>>>>> Hi, Chia-Ping,
>>>>>>>>>>>>
>>>>>>>>>>>> Hmm, is that true? With the earliest policy, we treat an out-of-range offset the same as no offset (because the group is deleted) and always set it to the earliest offset, right? With to_start_time, an out-of-range offset is treated differently from no offset.
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>
>>>>>>>>>>>> Jun
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, Apr 21, 2026 at 12:54 AM Chia-Ping Tsai <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> hi Jun
>>>>>>>>>>>>>
>>>>>>>>>>>>> Nice point. Group GC is definitely an issue for to_start_time, but it is actually an issue for other policies as well.
>>>>>>>>>>>>>
>>>>>>>>>>>>> For example, a consumer using the earliest policy will suddenly read all historical records from scratch if it sleeps for a long while and gets GC'd; otherwise, it just resumes from previous offsets if the group still exists. It is equally hard to explain to users: "Oh, your group was GC'd, so your offset behavior changed."
>>>>>>>>>>>>>
>>>>>>>>>>>>> Therefore, it seems to me the right approach to fix this "inconsistency" is to offer a group-level GC timeout in a future KIP, allowing users to explicitly protect critical groups from GC. This saves not only to_start_time, but all other reset policies too.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>> Chia-Ping
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 2026/04/20 20:19:47 Jun Rao via dev wrote:
>>>>>>>>>>>>>> Hi, Jiunn-Yang and Chia-Ping,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Thanks for the reply.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The main concern I see with to_start_time is that its behavior on how much data to consume when the offset is out of range is not consistent and is hard to explain. If the group still exists, it will read from the earliest offset. Otherwise, it will read from the latest.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jun
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Mon, Apr 20, 2026 at 10:13 AM Chia-Ping Tsai <[email protected]> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> hi all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Just a note for a potential latest_v2:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Since the purpose is to read all records from extended partitions, we could leverage the group creation time to compare against the earliest record of a partition when there is no committed offset. If the group creation time is larger than the earliest record's timestamp, we assume it is not an extended partition. Otherwise, we treat it as an extended partition.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> This approach allows us to catch all "possible" extended partitions, which includes both "true" extended partitions and old but truncated partitions. While there is a rare edge case where the cost is reprocessing some records we don't necessarily want, it is very easy to implement and guarantees we will never miss the actual extended partitions.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Chia-Ping
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 2026/04/20 13:33:31 黃竣陽 wrote:
>>>>>>>>>>>>>>>> Hello all,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I have added a new "Future Work: latest_strict Policy" section to the KIP. The idea is a future policy that uses latest semantics by default but falls back to the group creation timestamp specifically for newly added partitions during partition expansion.
>>>>>>>>>>>>>>>> This would reuse the group creation time anchor introduced by this KIP, making it a natural extension with minimal additional protocol changes.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>> Jiunn-Yang
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Chia-Ping Tsai <[email protected]> wrote on April 18, 2026, 4:09 PM:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It is practically NP-hard to guess everyone's ideal use case right now. Also, I believe we all want to avoid falling back to the intricate multi-policy approach proposed in KIP-842.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I prefer to keep this KIP focused and discuss a "v2 latest" policy in a separate KIP. That future policy could build upon the to_start_time anchor to fix data loss specifically for extended partitions. We could call it something like latest_strict.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thoughts?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 黃竣陽 <[email protected]> wrote on Sat, April 18, 2026, 3:24 PM:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Hello Jun,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Thanks for the reply,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> When the offset goes out of range, the user faces two options:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> 1. Skip to the end (latest behavior): risk losing data that was produced during the group's lifetime but not yet consumed.
>>>>>>>>>>>>>>>>>> 2. Seek back to the group creation time (to_start_time behavior): potentially reprocess some data, but guarantee no data from the group's lifetime is silently lost.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> to_start_time chooses option 2 because its core promise is "never silently lose data produced after the group started." If we fell back to latest on out-of-range, we would break this guarantee.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I think users who prefer option 1 can simply use auto.offset.reset=latest.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>> Jiunn-Yang
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Jun Rao via dev <[email protected]> wrote on April 18, 2026, 1:57 AM:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi, Jiunn-Yang and Chia-Ping,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks for the reply.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> "The core semantic of to_start_time is to read all records since the creation of the group."
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I am just questioning whether this actually covers a common use case. If the offset doesn't go out of range, the logic makes sense to me. I'm not sure about the logic if the offset is out of range. If a user chooses to skip the historical data when starting the group, it seems the user likely wants to do the same if the offset is out of range.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Jun
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On Fri, Apr 17, 2026 at 5:23 AM 黃竣陽 <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Hello Jun,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Thanks for the feedback,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Adding to the points above:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Regarding by_duration as an alternative to Scenario 1: beyond clock skew and retry issues, there is also a usability concern. by_duration requires users to reason about operational timing ("how long does partition discovery take in my environment?") and then translate that into a configuration value. to_start_time requires no such reasoning. It simply anchors to the group creation time recorded by the broker.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Regarding Scenario 2: I'd also like to clarify that to_start_time does not branch between "use latest" and "use earliest." It applies the same ListOffsetsRequest with the group creation timestamp in all cases. The difference in outcome:
>>>>>>>>>>>>>>>>>>>> - skipping old data on first start
>>>>>>>>>>>>>>>>>>>> - consuming surviving data after truncation
>>>>>>>>>>>>>>>>>>>> is a natural consequence of what data exists in the partition at that point, not a different policy being applied. The rule is always the same.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>>>>>>>> Jiunn-Yang
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Chia-Ping Tsai <[email protected]> wrote on April 17, 2026, 9:48 AM:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Jun Rao via dev <[email protected]> wrote on April 17, 2026, 4:57 AM:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Also, a group is deleted after the consumer has been idle longer than offsets.retention.minutes. What's the semantic of to_start_time if the group creation time is unavailable?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> If the group is recreated, a new creation time will be recorded. Hence, it acts like a new group. Plus, it throws an exception directly if the group truly has no creation time.
