Hi, Chia-Ping,

Thanks for the reply.

Let's try to understand this from the user's perspective. When a user starts the group for the first time, they face a choice on whether to process the backlog or not. When the offset is out of range, the user faces the same choice regarding backlog processing. It seems that most users would want to make the same choice in both cases.

"Users who explicitly choose the to_start_time policy do so precisely because they do not want to skip any records when encountering an out-of-range scenario."

This argument is weak because it only restates how to_start_time is designed; we still need to justify why that design is a good choice in the first place.

Jun

On Tue, Apr 21, 2026 at 12:35 PM Chia-Ping Tsai <[email protected]> wrote:

> Hi Jun,
>
> Thanks for the clarification. I think I misunderstood your previous point. Let me summarize the scenarios to ensure we are fully aligned.
>
> There are essentially three scenarios when a consumer needs to reset offsets:
>
> 1. Out-of-range (The group exists, but the offset has expired).
> 2. Extended partition (The group exists, but encounters a newly added partition with no committed offset).
> 3. No-offset (The group is completely new, or an existing group was deleted by the GC).
>
> We all agree that the primary goal of this KIP is to catch up on all records for scenario 2. There are no objections here.
>
> Regarding the inconsistency you pointed out between 1) and 3) under the current to_start_time design, I completely see your point. If users are not fully aware that to_start_time is designed to read all records since the creation of the group, they might get confused.
>
> However, to me, this "inconsistency" is actually a matter of predictability. Users who explicitly choose the to_start_time policy do so precisely because they do not want to skip any records when encountering an out-of-range scenario.
>
> (I would prefer to set aside the topic of group GC for a moment.
> It is much more important that we first focus our discussion on the "out-of-range" scenario)
>
> Best,
> Chia-Ping
>
> Jun Rao via dev <[email protected]> wrote on Wed, Apr 22, 2026 at 1:13 AM:
>
>> Hi, Chia-Ping,
>>
>> Hmm, is that true? With the earliest policy, we treat an out-of-range offset the same as no offset (because the group is deleted) and always set it to the earliest offset, right? With to_start_time, an out-of-range offset is treated differently from no offset.
>>
>> Thanks,
>>
>> Jun
>>
>> On Tue, Apr 21, 2026 at 12:54 AM Chia-Ping Tsai <[email protected]> wrote:
>>
>> > hi Jun
>> >
>> > Nice point. Group GC is definitely an issue for to_start_time, but it is actually an issue for other policies as well.
>> >
>> > For example, a consumer using the earliest policy will suddenly read all historical records from scratch if it sleeps for a long while and gets GC'd; otherwise, it just resumes from previous offsets if the group still exists. It is equally hard to explain to users: "Oh, your group was GC'd, so your offset behavior changed."
>> >
>> > Therefore, it seems to me the right approach to fix this "inconsistency" is to offer a group-level GC timeout in a future KIP, allowing users to explicitly protect critical groups from GC. This saves not only to_start_time, but all other reset policies too.
>> >
>> > Best,
>> > Chia-Ping
>> >
>> > On 2026/04/20 20:19:47 Jun Rao via dev wrote:
>> > > Hi, Jiunn-Yang and Chia-Ping,
>> > >
>> > > Thanks for the reply.
>> > >
>> > > The main concern I see with to_start_time is that its behavior on how much data to consume when the offset is out of range is not consistent and is hard to explain. If the group still exists, it will read from the earliest offset. Otherwise, it will read from the latest.
>> > >
>> > > Jun
>> > >
>> > > On Mon, Apr 20, 2026 at 10:13 AM Chia-Ping Tsai <[email protected]> wrote:
>> > >
>> > > > hi all,
>> > > >
>> > > > Just a note for a potential latest_v2:
>> > > >
>> > > > Since the purpose is to read all records from extended partitions, we could leverage the group creation time to compare against the earliest record of a partition when there is no committed offset. If the group creation time is later than the earliest record's timestamp, we assume it is not an extended partition. Otherwise, we treat it as an extended partition.
>> > > >
>> > > > This approach allows us to catch all "possible" extended partitions, which includes both "true" extended partitions and old but truncated partitions. While there is a rare edge case where the cost is reprocessing some records we don't necessarily want, it is very easy to implement and guarantees we will never miss the actual extended partitions.
>> > > >
>> > > > Best,
>> > > > Chia-Ping
>> > > >
>> > > > On 2026/04/20 13:33:31 黃竣陽 wrote:
>> > > > > Hello all,
>> > > > >
>> > > > > I have added a new "Future Work: latest_strict Policy" section to the KIP. The idea is a future policy that uses latest semantics by default but falls back to the group creation timestamp specifically for newly added partitions during partition expansion. This would reuse the group creation time anchor introduced by this KIP, making it a natural extension with minimal additional protocol changes.
>> > > > >
>> > > > > Best Regards,
>> > > > > Jiunn-Yang
>> > > > >
>> > > > > > Chia-Ping Tsai <[email protected]> wrote on Apr 18, 2026 at 4:09 PM:
>> > > > > >
>> > > > > > Hi all,
>> > > > > >
>> > > > > > It is practically NP-hard to guess everyone's ideal use case right now. Also, I believe we all want to avoid falling back to the intricate multi-policy approach proposed in KIP-842.
>> > > > > >
>> > > > > > I prefer to keep this KIP focused and discuss a "v2 latest" policy in a separate KIP. That future policy could build upon the to_start_time anchor to fix data loss specifically for extended partitions. We could call it something like latest_strict.
>> > > > > >
>> > > > > > Thoughts?
>> > > > > >
>> > > > > > 黃竣陽 <[email protected]> wrote on Sat, Apr 18, 2026 at 3:24 PM:
>> > > > > >
>> > > > > >> Hello Jun,
>> > > > > >>
>> > > > > >> Thanks for the reply,
>> > > > > >>
>> > > > > >> When the offset goes out of range, the user faces two options:
>> > > > > >>
>> > > > > >> 1. Skip to the end (latest behavior): risk losing data that was produced during the group's lifetime but not yet consumed.
>> > > > > >> 2. Seek back to the group creation time (to_start_time behavior): potentially reprocess some data, but guarantee no data from the group's lifetime is silently lost.
>> > > > > >>
>> > > > > >> to_start_time chooses option 2 because its core promise is "never silently lose data produced after the group started." If we fell back to latest on out-of-range, we would break this guarantee.
>> > > > > >>
>> > > > > >> I think users who prefer option 1 can simply use auto.offset.reset=latest.
>> > > > > >>
>> > > > > >> Best Regards,
>> > > > > >> Jiunn-Yang
>> > > > > >>
>> > > > > >>> Jun Rao via dev <[email protected]> wrote on Apr 18, 2026 at 1:57 AM:
>> > > > > >>>
>> > > > > >>> Hi, Jiunn-Yang and Chia-Ping,
>> > > > > >>>
>> > > > > >>> Thanks for the reply.
>> > > > > >>>
>> > > > > >>> "The core semantic of to_start_time is to read all records since the creation of the group."
>> > > > > >>>
>> > > > > >>> I am just questioning whether this actually covers a common use case. If the offset doesn't go out of range, the logic makes sense to me. I'm not sure about the logic if the offset is out of range. If a user chooses to skip the historical data when starting the group, it seems the user likely wants to do the same if the offset is out of range.
>> > > > > >>>
>> > > > > >>> Jun
>> > > > > >>>
>> > > > > >>> On Fri, Apr 17, 2026 at 5:23 AM 黃竣陽 <[email protected]> wrote:
>> > > > > >>>
>> > > > > >>>> Hello Jun,
>> > > > > >>>>
>> > > > > >>>> Thanks for the feedback,
>> > > > > >>>>
>> > > > > >>>> Adding to the points above:
>> > > > > >>>>
>> > > > > >>>> Regarding by_duration as an alternative to Scenario 1: beyond clock skew and retry issues, there is also a usability concern. by_duration requires users to reason about operational timing ("how long does partition discovery take in my environment?") and then translate that into a configuration value. to_start_time requires no such reasoning. It simply anchors to the group creation time recorded by the broker.
>> > > > > >>>>
>> > > > > >>>> Regarding Scenario 2: I'd also like to clarify that to_start_time does not branch between "use latest" and "use earliest." It applies the same ListOffsetsRequest with the group creation timestamp in all cases.
>> > > > > >>>> The difference in outcome:
>> > > > > >>>> - skipping old data on first start
>> > > > > >>>> - consuming surviving data after truncation
>> > > > > >>>> is a natural consequence of what data exists in the partition at that point, not a different policy being applied. The rule is always the same.
>> > > > > >>>>
>> > > > > >>>> Best Regards,
>> > > > > >>>> Jiunn-Yang
>> > > > > >>>>
>> > > > > >>>>> Chia-Ping Tsai <[email protected]> wrote on Apr 17, 2026 at 9:48 AM:
>> > > > > >>>>>
>> > > > > >>>>>> Jun Rao via dev <[email protected]> wrote on Apr 17, 2026 at 4:57 AM:
>> > > > > >>>>>>
>> > > > > >>>>>> Also, a group is deleted after the consumer has been idle longer than offsets.retention.minutes. What's the semantic of to_start_time if the group creation time is unavailable?
>> > > > > >>>>>
>> > > > > >>>>> If the group is recreated, a new creation time will be recorded. Hence, it acts like a new group. Plus, it throws an exception directly if the group truly has no creation time.
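
For readers following the thread, here is a minimal sketch of the single rule described above: look up the first offset at or after the group creation timestamp, whatever the reset scenario. It is a toy model, not the KIP's implementation. The names `offset_for_timestamp` and `reset_to_start_time`, the in-memory record list, and the fallback to the log end offset when no record qualifies are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# (offset, timestamp_ms) pairs ordered by offset; a toy stand-in for a partition log.
Record = Tuple[int, int]


def offset_for_timestamp(records: List[Record], timestamp_ms: int) -> Optional[int]:
    """Return the first offset whose timestamp is >= timestamp_ms, or None."""
    for offset, record_ts in records:
        if record_ts >= timestamp_ms:
            return offset
    return None


def reset_to_start_time(records: List[Record], log_end_offset: int,
                        group_creation_ms: int) -> int:
    """Hypothetical model of to_start_time: one lookup, no branching on scenario."""
    found = offset_for_timestamp(records, group_creation_ms)
    # Assumption: with no record at or after the creation time, start at the log end.
    return log_end_offset if found is None else found


GROUP_CREATED_MS = 1000

# First start: every record predates the group, so the backlog is skipped.
backlog = [(0, 100), (1, 200), (2, 300)]
first_start = reset_to_start_time(backlog, 3, GROUP_CREATED_MS)

# Out-of-range: older records were truncated; survivors postdate the group.
truncated = [(50, 1500), (51, 1600)]
out_of_range = reset_to_start_time(truncated, 52, GROUP_CREATED_MS)

# Extended partition: the partition was added after the group was created.
extended = [(0, 1200), (1, 1300)]
new_partition = reset_to_start_time(extended, 2, GROUP_CREATED_MS)

# Group recreated after GC: a newer creation time moves the anchor forward.
recreated = reset_to_start_time(truncated, 52, 2000)

print(first_start, out_of_range, new_partition, recreated)
```

Under these assumptions, the same log contents (`truncated`) resolve to the earliest surviving offset while the group exists and to the log end once the group has been recreated, which is the inconsistency raised earlier in the thread.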
