Hi, Chia-Ping,

Hmm, is that true? With the earliest policy, we treat an out-of-range offset the same as no offset (because the group is deleted) and always set it to the earliest offset, right? With to_start_time, an out-of-range offset is treated differently from no offset.

Thanks,

Jun

On Tue, Apr 21, 2026 at 12:54 AM Chia-Ping Tsai <[email protected]> wrote:

> hi Jun
>
> Nice point. Group GC is definitely an issue for to_start_time, but it is
> actually an issue for other policies as well.
>
> For example, a consumer using the earliest policy will suddenly read all
> historical records from scratch if it sleeps for a long while and gets
> GC'd; otherwise, it just resumes from previous offsets if the group still
> exists. It is equally hard to explain to users: "Oh, your group was GC'd,
> so your offset behavior changed."
>
> Therefore, it seems to me the right approach to fix this "inconsistency"
> is to offer a group-level GC timeout in a future KIP, allowing users to
> explicitly protect critical groups from GC. This saves not only
> to_start_time, but all other reset policies too.
>
> Best,
> Chia-Ping
>
> On 2026/04/20 20:19:47 Jun Rao via dev wrote:
> > Hi, Jiunn-Yang and Chia-Ping,
> >
> > Thanks for the reply.
> >
> > The main concern I see with to_start_time is that its behavior on how much
> > data to consume when the offset is out of range is not consistent and is
> > hard to explain. If the group still exists, it will read from the earliest
> > offset. Otherwise, it will read from the latest.
> >
> > Jun
> >
> > On Mon, Apr 20, 2026 at 10:13 AM Chia-Ping Tsai <[email protected]> wrote:
> >
> > > hi all,
> > >
> > > Just a note for a potential latest_v2:
> > >
> > > Since the purpose is to read all records from extended partitions, we
> > > could leverage the group creation time to compare against the earliest
> > > record of a partition when there is no committed offset. If the group
> > > creation time is larger than the earliest record's timestamp, we assume it
> > > is not an extended partition. Otherwise, we treat it as an extended
> > > partition.
> > >
> > > This approach allows us to catch all "possible" extended partitions, which
> > > includes both "true" extended partitions and old but truncated partitions.
> > > While there is a rare edge case where the cost is reprocessing some records
> > > we don't necessarily want, it is very easy to implement and guarantees we
> > > will never miss the actual extended partitions.
> > >
> > > Best,
> > > Chia-Ping
> > >
> > > On 2026/04/20 13:33:31 黃竣陽 wrote:
> > > > Hello all,
> > > >
> > > > I have added a new "Future Work: latest_strict Policy" section to the KIP.
> > > > The idea is a future policy that uses latest semantics by default but falls
> > > > back to the group creation timestamp specifically for newly added partitions
> > > > during partition expansion. This would reuse the group creation time anchor
> > > > introduced by this KIP, making it a natural extension with minimal additional
> > > > protocol changes.
> > > >
> > > > Best Regards,
> > > > Jiunn-Yang
> > > >
> > > > > Chia-Ping Tsai <[email protected]> wrote on April 18, 2026 at 4:09 PM:
> > > > >
> > > > > Hi all,
> > > > >
> > > > > It is practically NP-hard to guess everyone's ideal use case right now.
> > > > > Also, I believe we all want to avoid falling back to the intricate
> > > > > multi-policy approach proposed in KIP-842.
> > > > >
> > > > > I prefer to keep this KIP focused and discuss a "v2 latest" policy in a
> > > > > separate KIP. That future policy could build upon the to_start_time anchor
> > > > > to fix data loss specifically for extended partitions. We could call it
> > > > > something like latest_strict.
> > > > >
> > > > > Thoughts?
> > > > >
> > > > > 黃竣陽 <[email protected]> wrote on Saturday, April 18, 2026 at 3:24 PM:
> > > > >
> > > > >> Hello Jun,
> > > > >>
> > > > >> Thanks for the reply,
> > > > >>
> > > > >> When the offset goes out of range, the user faces two options:
> > > > >>
> > > > >> 1. Skip to the end (latest behavior): risk losing data that was produced
> > > > >>    during the group's lifetime but not yet consumed.
> > > > >> 2. Seek back to the group creation time (to_start_time behavior):
> > > > >>    potentially reprocess some data, but guarantee no data from the group's
> > > > >>    lifetime is silently lost.
> > > > >>
> > > > >> to_start_time chooses option 2 because its core promise is "never silently
> > > > >> lose data produced after the group started." If we fell back to latest on
> > > > >> out-of-range, we would break this guarantee.
> > > > >>
> > > > >> I think users who prefer option 1 can simply use auto.offset.reset=latest.
> > > > >>
> > > > >> Best Regards,
> > > > >> Jiunn-Yang
> > > > >>
> > > > >>> Jun Rao via dev <[email protected]> wrote on April 18, 2026 at 1:57 AM:
> > > > >>>
> > > > >>> Hi, Jiunn-Yang and Chia-Ping,
> > > > >>>
> > > > >>> Thanks for the reply.
> > > > >>>
> > > > >>> "The core semantic of to_start_time is to read all records since the
> > > > >>> creation of the group."
> > > > >>>
> > > > >>> I am just questioning whether this actually covers a common use case. If
> > > > >>> the offset doesn't go out of range, the logic makes sense to me. I'm not
> > > > >>> sure about the logic if the offset is out of range. If a user chooses to
> > > > >>> skip the historical data when starting the group, it seems the user likely
> > > > >>> wants to do the same if the offset is out of range.
> > > > >>>
> > > > >>> Jun
> > > > >>>
> > > > >>> On Fri, Apr 17, 2026 at 5:23 AM 黃竣陽 <[email protected]> wrote:
> > > > >>>
> > > > >>>> Hello Jun,
> > > > >>>>
> > > > >>>> Thanks for the feedback,
> > > > >>>>
> > > > >>>> Adding to the points above:
> > > > >>>>
> > > > >>>> Regarding by_duration as an alternative to Scenario 1: beyond clock skew
> > > > >>>> and retry issues, there is also a usability concern.
> > > > >>>> by_duration requires users to reason about operational timing ("how long
> > > > >>>> does partition discovery take in my environment?") and then translate that
> > > > >>>> into a configuration value. to_start_time requires no such reasoning. It
> > > > >>>> simply anchors to the group creation time recorded by the broker.
> > > > >>>>
> > > > >>>> Regarding Scenario 2: I'd also like to clarify that to_start_time does not
> > > > >>>> branch between "use latest" and "use earliest." It applies the same
> > > > >>>> ListOffsetsRequest with the group creation timestamp in all cases. The
> > > > >>>> difference in outcome:
> > > > >>>> - skipping old data on first start
> > > > >>>> - consuming surviving data after truncation
> > > > >>>> is a natural consequence of what data exists in the partition at that
> > > > >>>> point, not a different policy being applied. The rule is always the same.
> > > > >>>>
> > > > >>>> Best Regards,
> > > > >>>> Jiunn-Yang
> > > > >>>>
> > > > >>>>> Chia-Ping Tsai <[email protected]> wrote on April 17, 2026 at 9:48 AM:
> > > > >>>>>
> > > > >>>>>> Jun Rao via dev <[email protected]> wrote on April 17, 2026 at 4:57 AM:
> > > > >>>>>>
> > > > >>>>>> Also, a group is deleted after the consumer has been idle longer
> > > > >>>>>> than offsets.retention.minutes. What's the semantic of to_start_time if
> > > > >>>>>> the group creation time is unavailable?
> > > > >>>>>
> > > > >>>>> If the group is recreated, a new creation time will be recorded. Hence,
> > > > >>>>> it acts like a new group. Plus, it throws an exception directly if the
> > > > >>>>> group truly has no creation time.
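
For readers following the thread, the "same rule in all cases" argument made for to_start_time can be sketched roughly as below. This is only an illustration, not the KIP's implementation: the in-memory log model, the function name reset_to_start_time, and the fallback to the log end offset when no record is new enough are all assumptions made for the example.

```python
from typing import List, Tuple

# Hypothetical in-memory model of a partition: (offset, timestamp_ms) pairs,
# ordered by offset. Not a Kafka API.
Record = Tuple[int, int]


def reset_to_start_time(log: List[Record], log_end_offset: int,
                        group_created_ms: int) -> int:
    """Return the offset of the first surviving record whose timestamp is
    greater than or equal to the group creation time.

    Assumption: if no such record exists, fall back to the log end offset.
    """
    for offset, timestamp_ms in log:
        if timestamp_ms >= group_created_ms:
            return offset
    return log_end_offset


GROUP_CREATED_MS = 1_000

# First start: every record predates the group, so the old data is skipped.
first_start = reset_to_start_time(
    [(0, 100), (1, 500), (2, 900)], 3, GROUP_CREATED_MS)

# Out of range after truncation: offsets 0..4 are gone, so the lookup lands
# on the earliest surviving record produced during the group's lifetime.
after_truncation = reset_to_start_time(
    [(5, 1_300), (6, 1_400)], 7, GROUP_CREATED_MS)
```

The two outcomes differ only because the partition contents differ; the lookup itself is identical. The sketch does not model group GC: a recreated group would simply carry a newer group_created_ms.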

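
The latest_v2 heuristic floated in the April 20 message (compare the group creation time against the earliest record's timestamp when there is no committed offset) might look roughly like this. The function name and the string return values are hypothetical, and the sketch assumes the partition has at least one record.

```python
def choose_reset_latest_v2(group_created_ms: int, earliest_record_ms: int) -> str:
    """Pick a reset position for a partition with no committed offset.

    Hypothetical helper following the rule described in the thread:
    - group creation time larger than the earliest record's timestamp:
      assume the partition is not an extended one and use "latest";
    - otherwise: treat it as a possibly extended partition and read from
      "earliest" so that no record is missed.
    """
    if group_created_ms > earliest_record_ms:
        return "latest"
    return "earliest"


# Partition that existed before the group: keep latest semantics.
old_partition = choose_reset_latest_v2(1_000, 500)

# Partition added after the group was created: read everything.
extended_partition = choose_reset_latest_v2(1_000, 1_500)
```

An old partition whose early records have been truncated would also return "earliest" if its earliest surviving record is newer than the group, which is the reprocessing edge case acknowledged in the same message.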