Hi Jun,

Nice point. Group GC is definitely an issue for to_start_time, but it is actually an issue for other policies as well.
For example, a consumer using the earliest policy will suddenly re-read all historical records from scratch if it sleeps for a long while and its group gets GC'd; otherwise, it simply resumes from its previous offsets if the group still exists. That is equally hard to explain to users: "Oh, your group was GC'd, so your offset behavior changed."

Therefore, it seems to me that the right way to fix this "inconsistency" is to offer a group-level GC timeout in a future KIP, allowing users to explicitly protect critical groups from GC. This would help not only to_start_time but all other reset policies too.

Best,
Chia-Ping

On 2026/04/20 20:19:47 Jun Rao via dev wrote:
> Hi, Jiunn-Yang and Chia-Ping,
>
> Thanks for the reply.
>
> The main concern I see with to_start_time is that its behavior regarding how
> much data to consume when the offset is out of range is not consistent and is
> hard to explain. If the group still exists, it will read from the earliest
> offset. Otherwise, it will read from the latest.
>
> Jun
>
> On Mon, Apr 20, 2026 at 10:13 AM Chia-Ping Tsai <[email protected]> wrote:
>
> > hi all,
> >
> > Just a note for a potential latest_v2:
> >
> > Since the purpose is to read all records from extended partitions, we
> > could leverage the group creation time to compare against the earliest
> > record of a partition when there is no committed offset. If the group
> > creation time is larger than the earliest record's timestamp, we assume it
> > is not an extended partition. Otherwise, we treat it as an extended
> > partition.
> >
> > This approach allows us to catch all "possible" extended partitions, which
> > includes both "true" extended partitions and old but truncated partitions.
> > While there is a rare edge case where the cost is reprocessing some records
> > we don't necessarily want, it is very easy to implement and guarantees we
> > will never miss the actual extended partitions.
> >
> > Best,
> > Chia-Ping
> >
> > On 2026/04/20 13:33:31 黃竣陽 wrote:
> > > Hello all,
> > >
> > > I have added a new "Future Work: latest_strict Policy" section to the KIP.
> > > The idea is a future policy that uses latest semantics by default but falls
> > > back to the group creation timestamp specifically for newly added partitions
> > > during partition expansion. This would reuse the group creation time anchor
> > > introduced by this KIP, making it a natural extension with minimal additional
> > > protocol changes.
> > >
> > > Best Regards,
> > > Jiunn-Yang
> > >
> > > > Chia-Ping Tsai <[email protected]> wrote on Apr 18, 2026, at 4:09 PM:
> > > >
> > > > Hi all,
> > > >
> > > > It is practically NP-hard to guess everyone's ideal use case right now.
> > > > Also, I believe we all want to avoid falling back to the intricate
> > > > multi-policy approach proposed in KIP-842.
> > > >
> > > > I prefer to keep this KIP focused and discuss a "v2 latest" policy in a
> > > > separate KIP. That future policy could build upon the to_start_time anchor
> > > > to fix data loss specifically for extended partitions. We could call it
> > > > something like latest_strict.
> > > >
> > > > Thoughts?
> > > >
> > > >
> > > > 黃竣陽 <[email protected]> wrote on Sat, Apr 18, 2026, at 3:24 PM:
> > > >
> > > >> Hello Jun,
> > > >>
> > > >> Thanks for the reply,
> > > >>
> > > >> When the offset goes out of range, the user faces two options:
> > > >>
> > > >> 1. Skip to the end (latest behavior): risk losing data that was produced
> > > >> during the group's lifetime but not yet consumed.
> > > >> 2. Seek back to the group creation time (to_start_time behavior):
> > > >> potentially reprocess some data, but guarantee no data from the group's
> > > >> lifetime is silently lost.
> > > >>
> > > >> to_start_time chooses option 2 because its core promise is "never silently
> > > >> lose data produced after the group started."
> > > >> If we fell back to latest on out-of-range, we would break this guarantee.
> > > >>
> > > >> I think users who prefer option 1 can simply use
> > > >> auto.offset.reset=latest.
> > > >>
> > > >> Best Regards,
> > > >> Jiunn-Yang
> > > >>
> > > >>> Jun Rao via dev <[email protected]> wrote on Apr 18, 2026, at 1:57 AM:
> > > >>>
> > > >>> Hi, Jiunn-Yang and Chia-Ping,
> > > >>>
> > > >>> Thanks for the reply.
> > > >>>
> > > >>> "The core semantic of to_start_time is to read all records since the
> > > >>> creation of the group."
> > > >>>
> > > >>> I am just questioning whether this actually covers a common use case. If
> > > >>> the offset doesn't go out of range, the logic makes sense to me. I'm not
> > > >>> sure about the logic if the offset is out of range. If a user chooses to
> > > >>> skip the historical data when starting the group, it seems the user likely
> > > >>> wants to do the same if the offset is out of range.
> > > >>>
> > > >>> Jun
> > > >>>
> > > >>> On Fri, Apr 17, 2026 at 5:23 AM 黃竣陽 <[email protected]> wrote:
> > > >>>
> > > >>>> Hello Jun,
> > > >>>>
> > > >>>> Thanks for the feedback,
> > > >>>>
> > > >>>> Adding to the points above:
> > > >>>>
> > > >>>> Regarding by_duration as an alternative to Scenario 1: beyond clock skew
> > > >>>> and retry issues, there is also a usability concern. by_duration requires
> > > >>>> users to reason about operational timing ("how long does partition
> > > >>>> discovery take in my environment?") and then translate that into a
> > > >>>> configuration value. to_start_time requires no such reasoning. It simply
> > > >>>> anchors to the group creation time recorded by the broker.
> > > >>>>
> > > >>>> Regarding Scenario 2: I'd also like to clarify that to_start_time does not
> > > >>>> branch between "use latest" and "use earliest."
> > > >>>> It applies the same ListOffsetsRequest with the group creation
> > > >>>> timestamp in all cases. The difference in outcome:
> > > >>>> - skipping old data on first start
> > > >>>> - consuming surviving data after truncation
> > > >>>> is a natural consequence of what data exists in the partition at that
> > > >>>> point, not a different policy being applied. The rule is always the same.
> > > >>>>
> > > >>>> Best Regards,
> > > >>>> Jiunn-Yang
> > > >>>>
> > > >>>>> Chia-Ping Tsai <[email protected]> wrote on Apr 17, 2026, at 9:48 AM:
> > > >>>>>
> > > >>>>>
> > > >>>>>> Jun Rao via dev <[email protected]> wrote on Apr 17, 2026, at 4:57 AM:
> > > >>>>>>
> > > >>>>>> Also, a group is deleted after the consumer has been idle longer
> > > >>>>>> than offsets.retention.minutes. What are the semantics of to_start_time
> > > >>>>>> if the group creation time is unavailable?
> > > >>>>>
> > > >>>>> If the group is recreated, a new creation time will be recorded. Hence,
> > > >>>>> it acts like a new group. Plus, it throws an exception directly if the
> > > >>>>> group truly has no creation time.
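
For illustration only, the two rules discussed in this thread can be sketched in a few lines of Python: the "same timestamp lookup in all cases" rule for to_start_time, and the latest_v2 check that compares the group creation time with the earliest record. This is a hypothetical in-memory model, not Kafka code: Record, PartitionLog, resolve_to_start_time and treat_as_extended_partition are made-up names. Falling back to the log end offset when no record matches, and returning None for an empty partition, are assumptions of the sketch; the thread does not specify either case.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Record:
    offset: int
    timestamp_ms: int


@dataclass(frozen=True)
class PartitionLog:
    records: List[Record]  # surviving records, in offset order
    log_end_offset: int    # offset the next produced record would receive


def resolve_to_start_time(log: PartitionLog, group_created_ms: int) -> int:
    """Apply one rule in every case: pick the first surviving offset whose
    timestamp is at or after the group creation time."""
    for record in log.records:
        if record.timestamp_ms >= group_created_ms:
            return record.offset
    # Assumption of this sketch: nothing newer than the group exists yet,
    # so start at the log end.
    return log.log_end_offset


def treat_as_extended_partition(log: PartitionLog,
                                group_created_ms: int) -> Optional[bool]:
    """latest_v2 idea: a partition is NOT extended only when the group was
    created after the partition's earliest surviving record."""
    if not log.records:
        return None  # assumption: the empty-partition case is left undecided
    earliest_ts = log.records[0].timestamp_ms
    return not (group_created_ms > earliest_ts)


GROUP_CREATED_MS = 250

# First start: records older than the group are skipped.
first_start = PartitionLog(
    [Record(0, 100), Record(1, 200), Record(2, 300), Record(3, 400)], 4)

# After truncation: offsets 0-4 were deleted, only newer records survive.
truncated = PartitionLog([Record(5, 600), Record(6, 700), Record(7, 800)], 8)

# Partition added after the group was created.
new_partition = PartitionLog([Record(0, 500)], 1)

print(resolve_to_start_time(first_start, GROUP_CREATED_MS))        # 2
print(resolve_to_start_time(truncated, GROUP_CREATED_MS))          # 5
print(treat_as_extended_partition(first_start, GROUP_CREATED_MS))  # False
print(treat_as_extended_partition(truncated, GROUP_CREATED_MS))    # True
print(treat_as_extended_partition(new_partition, GROUP_CREATED_MS))  # True
```

In this model the same lookup yields "skip old data" on first start and "read everything that survived" after truncation, without any branching between policies. The latest_v2 check flags the truncated partition as extended as well, which is the "old but truncated" false positive mentioned in the thread.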
