Hello all, I’d like to bump this thread. If there is no further feedback, I will call a vote.
Best Regards, Jiunn-Yang > 黃竣陽 <[email protected]> wrote on 2026/03/18 at 7:28 PM: > > Hello all, > > I have updated the KIP. Any feedback or suggestions are welcome. > > Best Regards, > Jiunn-Yang > >> Chia-Ping Tsai <[email protected]> wrote on 2026/03/16 at 11:19 PM: >> >> Hi Jiunn-Yang, >> >> Thanks for the updates. I really love the simplification! :) >> >> It seems the KIP page is a bit out-of-date now. Would you mind updating the >> KIP to match the above discussion? >> >> For example, the KIP still uses two timestamps. Plus, we need to describe >> the risk caused by server time skew that you mentioned. >> >> Best, Chia-Ping >> >> 黃竣陽 <[email protected]> wrote on Mon, 2026/03/16 at 10:53 PM: >> >>> Hello all, >>> >>> Thanks for all the feedback. >>> >>> I believe we can simplify the approach further. Instead of comparing group >>> creation >>> time against partition creation time to decide between EARLIEST and >>> LATEST, we >>> can just always use the group creation timestamp to issue a >>> ListOffsetsRequest for >>> all partitions — whether there is no committed offset or the committed >>> offset is out of >>> range. If the partition was created after the group, the lookup will >>> naturally resolve to >>> the beginning of that partition. If the partition predates the group, the >>> lookup will find the >>> first message at or after the group's creation time. Even for the >>> out-of-range case (e.g., >>> AdminClient.deleteRecords or log segment deletion), the same rule applies. >>> >>> This approach is also more semantically accurate. The name by_start_time >>> implies >>> "start consuming from when my group started." Using the group creation >>> timestamp >>> as the sole reference point for all scenarios is exactly what this name >>> promises — >>> no hidden branching between EARLIEST and LATEST, no conditional >>> comparisons. >>> The behavior directly reflects the name.
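The unified rule described above can be sketched as a small model. This is hypothetical Python, not Kafka's implementation; the log layout, function name, and values are purely illustrative of how a timestamp lookup resolves each scenario:

```python
# Hypothetical model of the proposed by_start_time rule (NOT Kafka code).
# A partition's log is a list of (timestamp_ms, offset) pairs sorted by
# timestamp. A ListOffsetsRequest for a target timestamp returns the
# earliest offset whose record timestamp is >= the target, or the
# log-end offset if every retained record is older.

def resolve_start_offset(log, group_created_ms, log_end_offset):
    """Always seek to the group's creation time -- the single rule."""
    for ts, offset in log:
        if ts >= group_created_ms:
            return offset
    return log_end_offset  # nothing at/after the target: start at the end

# Partition created AFTER the group: every record is newer, so the
# lookup naturally resolves to the beginning of the partition.
new_partition = [(2000, 0), (2100, 1)]
assert resolve_start_offset(new_partition, 1000, 2) == 0

# Partition that PREDATES the group: first record at/after creation time.
old_partition = [(500, 0), (900, 1), (1500, 2)]
assert resolve_start_offset(old_partition, 1000, 3) == 2

# Out-of-range committed offset (deleteRecords, segment deletion): the
# same rule applies, since the lookup never consults committed offsets.
print("all cases resolved by one rule")
```

The same three cases the email walks through (new partition, pre-existing partition, out-of-range offset) fall out of the one lookup, with no branching between EARLIEST and LATEST.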
>>> >>> This eliminates all conditional logic — the behavior becomes a single, >>> predictable rule: >>> always seek to the group's creation time. This simplification also aligns >>> well with the basic >>> rule proposed earlier in the discussion: the strategy only affects the >>> initial startup decision, >>> and after startup, the behavior remains consistent — we always try to >>> consume as much >>> data as possible regardless of the scenario. >>> >>> Additionally, this reduces the scope of the protocol change, since we no >>> longer need to pass >>> partition creation timestamps in the heartbeat response. Only the group >>> creation time is required. >>> >>> There are two points still worth discussing: >>> >>> 1. Server clock accuracy: the broker's clock may have some drift, which could affect the >>> precision of the timestamp-based >>> lookup. However, this is equally true for the previous approach that >>> compared two server-side >>> timestamps, so it does not introduce a new concern. >>> >>> 2. User data timestamps: if producers set custom timestamps on their >>> records that do not reflect real >>> time, the ListOffsetsRequest may return unexpected results. However, this >>> is fundamentally a data >>> quality issue that users should be responsible for. >>> >>> >>> Best Regards, >>> Jiunn-Yang >>> >>>> jian fu <[email protected]> wrote on 2026/03/16 at 8:10 PM: >>>> >>>> Hi all, >>>> >>>> Thank you for introducing and discussing this KIP to address a long-standing >>>> issue for users. >>>> >>>> I think the current issue with the strategy is not only the data loss >>>> during partition expansion, but also that it is difficult for users to >>>> understand or remember in different scenarios. >>>> >>>> If we can define one basic rule and get it agreed upon in the KIP, it would make >>> it >>>> easier for everyone to stay on the same page.
>>>> >>>> The basic rule would be: >>>> >>>> 1. The strategy only affects the initial startup of the consumer, to >>> decide >>>> whether historical data should be discarded. >>>> 2. After startup, no matter which strategy is used, the behavior should >>>> remain the same: >>>> >>>> 2.1 We try not to lose any data. For example, in cases like partition >>>> expansion, >>>> we should still attempt to consume all available data in the future. >>>> 2.2 If >>>> some data loss is unavoidable, such as when the start offset changes >>> (e.g., >>>> due to slow consumption or user-triggered data deletion), we should lose >>>> the same amount of data regardless of the strategy. This would require a >>>> small behavior change. Previously, when an out-of-range situation >>> occurred >>>> because of a start offset change, the latest strategy could lose more >>> data >>>> than earliest. After the change, both strategies would behave the same: they >>>> would reset to the start offset (the earliest available offset), which >>>> also reduces the number of dropped messages. >>>> >>>> I think this rule would significantly reduce the difficulty for users in >>>> understanding the behavior and avoid confusion in some cases. Users would only need >>> to >>>> focus on the initial start of the consumer group, and there would be no >>>> differences in behavior in future scenarios. >>>> >>>> WDYT? Thanks >>>> >>>> Regards >>>> >>>> Jian >>>> >>>> Ryan Leslie (BLOOMBERG/ NEW YORK) <[email protected]> wrote on Mon, 2026/03/16 >>>> at 08:39: >>>> >>>>> Hey Chia, >>>>> >>>>> Sorry, I see the KIP had some recent updates. I see the current idea is >>> to >>>>> store partition time and group start time. I think this is good. For >>> the 2a >>>>> case (stale offset) it mentions the user is explicitly reset to >>> earliest if >>>>> they use by_start_time. As you said, they could use by_duration instead >>> to >>>>> avoid a backlog, but according to the KIP there is a possibility of data >>> loss >>>>> during partition expansion.
But I agree with you that it's fair to >>> assume >>>>> most using this feature are interested in data completeness, and users >>> are >>>>> still free to manually adjust offsets. >>>>> >>>>> One other comment is the feature may be tricky for users to fully grasp >>>>> since it combines multiple things. Most users won't really know what >>> their >>>>> exact group startup time is. If an offset was deleted, when consuming >>>>> starts again it might end up being reset to earliest, or some other time >>>>> that they are not too aware of. It's less deterministic than before. >>> And if >>>>> the offset is out of range, then instead they just always get earliest, >>>>> independent of the group start time. And that behavior is a bit >>> different >>>>> than the other one. While this may be sound reasoning to kafka devs, >>> having >>>>> users understand what behavior to expect can be harder. The >>> documentation >>>>> might be difficult if it has to list out and explain all the cases as >>> the >>>>> KIP does. >>>>> >>>>> The deleteRecords / stale offset use case is perhaps a separate one from >>>>> partition expansion. We have sort of combined them into the one >>> opinionated >>>>> feature now. >>>>> >>>>> Regardless, thank you guys once again for focusing on this long-lived >>>>> issue in Kafka :) >>>>> >>>>> From: [email protected] At: 03/15/26 18:30:45 UTC-4:00To: >>>>> [email protected] >>>>> Subject: Re: [DISCUSS] KIP-1282: Prevent data loss during partition >>>>> expansion for dynamically added partitions >>>>> >>>>> Hi Ryan, >>>>> >>>>> It seems to me the by_duration policy is already perfectly suited to >>> tackle >>>>> the long downtime scenario (Case 2a). >>>>> >>>>> If a user wants to avoid a massive backlog after being offline for a >>> long >>>>> time, they can configure something like by_duration=10m. 
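For readers unfamiliar with the option Chia-Ping references, the configuration might look like the fragment below. The `by_duration:<ISO-8601 duration>` syntax comes from KIP-1106; the exact values, and the commented `by_start_time` line, are illustrative of this thread's proposal rather than a finalized spec:

```properties
# Skip any backlog older than 10 minutes at reset time, while still
# picking up recent records, including newly expanded partitions.
auto.offset.reset=by_duration:PT10M

# Proposed by this KIP (illustrative): anchor all resets to the
# group's creation time instead of a relative duration.
# auto.offset.reset=by_start_time
```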
>>>>> >>>>> This allows them to skip the massive backlog of older records, while >>> still >>>>> capturing the most recent data (including anything in newly expanded >>>>> partitions). >>>>> >>>>> Since by_duration and latest already serve the users who want to skip >>> data, >>>>> shouldn't by_start_time remain strictly focused on users whose primary >>> goal >>>>> is data completeness (even if it means processing a backlog)? >>>>> >>>>> Best, >>>>> Chia-Ping >>>>> >>>>> Ryan Leslie (BLOOMBERG/ NEW YORK) <[email protected]> wrote on >>> Mon, 2026/03/16 >>>>> at 4:57 AM: >>>>> >>>>>> Hey Chia, >>>>>> >>>>>> That's an interesting point that there is another subcase, 2c, caused >>> by >>>>>> the user having deleted records. While the current behavior of 'latest' >>>>>> would not be changed by adding something like 'auto.no.offset.reset' >>>>>> mentioned earlier, the current suggestion of by_start_time would >>> address >>>>>> not only the case of newly added partitions, but also this additional case >>>>> when >>>>>> records are deleted and the offset is invalid. It may be worth >>> explicitly >>>>>> highlighting this as another intended use case in the KIP. >>>>>> >>>>>> Also, have you given any more thought to the case where a consumer >>> crashes >>>>>> during the partition expansion and its timestamp is lost or incorrect >>>>> when >>>>>> it restarts? If we decided to do something where the timestamp is just >>>>>> stored by the broker as the creation time of the consumer group, then >>>>> after >>>>>> a short while (a typical retention period) it may just degenerate into >>>>>> 'earliest' anyway. In that case the behavior of 2a is also implicitly >>>>>> changed to 'earliest', which again the user may not want since they >>>>>> intentionally had the consumer down. This is why I feel it can get kind >>>>> of >>>>>> confusing to try to cover all reset cases in a single tunable, >>>>>> auto.offset.reset. It's because there are several different cases.
>>>>>> >>>>>> From: [email protected] At: 03/15/26 10:44:06 UTC-4:00To: >>>>>> [email protected] >>>>>> Subject: Re: [DISCUSS] KIP-1282: Prevent data loss during partition >>>>>> expansion for dynamically added partitions >>>>>> >>>>>> Hi Ryan, >>>>>> >>>>>> Thanks for the detailed breakdown! Categorizing the scenarios into 1a, >>>>> 1b, >>>>>> and >>>>>> 2a is incredibly helpful. We are completely aligned that Case 1b (a >>> newly >>>>>> added >>>>>> partition) is the true issue and the primary motivation for this KIP. >>>>>> >>>>>> Regarding the "out of range" scenario (Case 2a), the situation you >>>>>> described (a >>>>>> consumer being offline for a long time) is very real. However, I'd like >>>>> to >>>>>> offer another common operational perspective: >>>>>> >>>>>> What if a user explicitly uses AdminClient.deleteRecords to advance the >>>>>> start >>>>>> offset just to skip a batch of corrupted/poison-pill records? >>>>>> >>>>>> In this scenario, the consumer hasn't been hanging or offline for long; >>>>> it >>>>>> simply hits an OffsetOutOfRange error. If we force the policy to jump >>> to >>>>>> latest, the consumer will unexpectedly skip all the remaining healthy >>>>>> records >>>>>> up to the High Watermark. This would cause massive, unintended data >>> loss >>>>>> for an >>>>>> operation that merely meant to skip a few bad records. >>>>>> >>>>>> Therefore, to make the semantic guarantee accurate and predictable: if >>> a >>>>>> user >>>>>> explicitly opts into policy=by_start_time (meaning "I want every record >>>>>> since I >>>>>> started"), our contract should be to let them process as much surviving >>>>>> data as >>>>>> possible. Falling back to earliest during an out-of-range event >>> fulfills >>>>>> this >>>>>> promise safely. >>>>>> >>>>>> On 2026/03/13 23:07:04 "Ryan Leslie (BLOOMBERG/ NEW YORK)" wrote: >>>>>>> Hi Chia, >>>>>>> >>>>>>> I have a similar thought that knowing if the consumer group is new is >>>>>> helpful. 
It may be possible to avoid having to know the age of the >>>>>> partition. >>>>>> Thinking out loud a bit here, but: >>>>>>> >>>>>>> The two cases for auto.offset.reset are: >>>>>>> >>>>>>> 1. no offset >>>>>>> 2. offset is out of range >>>>>>> >>>>>>> These have subcases such as: >>>>>>> >>>>>>> 1. no offset >>>>>>> a. group is new >>>>>>> b. partition is new >>>>>>> c. offset previously existed but was explicitly deleted >>>>>>> >>>>>>> 2. offset is out of range >>>>>>> a. group has stopped consuming or had no consumers for a while, >>>>> offset >>>>>> is >>>>>> stale >>>>>>> b. invalid offset was explicitly committed >>>>>>> >>>>>>> Users of auto.offset.reset LATEST are perhaps mostly interested in not >>>>>> processing a backlog of messages when they first start up. But if there >>>>> is >>>>>> a >>>>>> new partition they don't want to lose data. A typical user might want: >>>>>>> >>>>>>> 1a - latest >>>>>>> >>>>>>> 1b - earliest >>>>>>> >>>>>>> 1c - Unclear, but this may be an advanced user. Since they may have >>>>>> explicitly used AdminClient to remove the offset, they can explicitly >>> use >>>>>> it to >>>>>> configure a new one >>>>>>> >>>>>>> 2a - latest may be appropriate - the group has most likely already >>> lost >>>>>> messages >>>>>>> >>>>>>> 2b - Similar to 1c, this may be an advanced user already in control >>>>> over >>>>>> offsets with AdminClient >>>>>>> >>>>>>> If we worry most about cases 1a, 1b, and 2a, only 1b is the exception >>>>>> (which >>>>>> is why this KIP exists). What we really want to know is whether the partition is >>>>>> new, >>>>>> but perhaps as long as we ensure that 1a and 2a are more explicitly >>>>>> configurable, >>>>>>> then we may get support for 1b without a partition timestamp. For >>>>>> example, if >>>>>> the following new config existed: >>>>>>> >>>>>>> Applies if the group already exists and no offset is committed for a >>>>>> partition.
Group existence is defined as the group already having some >>>>>> offsets >>>>>> committed for partitions. Otherwise, auto.offset.reset applies. >>>>>>> Note: we could possibly make this not applicable to static partition >>>>>> assignment cases >>>>>>> >>>>>>> auto.no.offset.reset = earliest >>>>>>> >>>>>>> Handles all other cases as it does today. auto.no.offset.reset takes >>>>>> precedence >>>>>>> >>>>>>> auto.offset.reset = latest >>>>>>> >>>>>>> The question is whether the consumer can reliably determine if the group is >>>>>> new >>>>>> (has offsets or not), since there may be multiple consumers all >>> committing >>>>>> at >>>>>> the same time around startup. Otherwise we would have to rely on the >>> fact >>>>>> that >>>>>> the new consumer protocol centralizes things >>>>>>> through the coordinator, but this may require more significant changes >>>>>> as you >>>>>> mentioned... >>>>>>> >>>>>>> I agree with Jiunn that we could consider supporting the >>>>>> auto.offset.reset >>>>>> timestamp separately, but I'm still concerned about the case where a >>>>>> consumer >>>>>> crashes during the partition expansion and its timestamp is lost or >>>>>> incorrect >>>>>> when it restarts. A feature with a false sense of security can be >>>>>> problematic. >>>>>>> >>>>>>> From: [email protected] At: 03/10/26 04:02:58 UTC-4:00 To: >>>>>> [email protected] >>>>>>> Subject: Re: [DISCUSS] KIP-1282: Prevent data loss during partition >>>>>> expansion >>>>>> for dynamically added partitions >>>>>>> >>>>>>> Hi Ryan, >>>>>>> >>>>>>> Thanks for sharing your thoughts! We initially tried to leave the RPCs >>>>> as >>>>>>> they were, but you're absolutely right that the current KIP feels a >>> bit >>>>>>> like a kludge. >>>>>>> >>>>>>> If we are open to touching the protocol (and can accept the potential >>>>>> time >>>>>>> skew between nodes), I have another idea: what if we add a creation >>>>>>> timestamp to both the partition and the group?
>>>>>>> >>>>>>> This information could be returned to the consumer via the new >>>>> Heartbeat >>>>>>> RPC. The async consumer could then seamlessly leverage these >>> timestamps >>>>>> to >>>>>>> make a deterministic decision between using "latest" or "earliest." >>>>>>> >>>>>>> This approach would only work for the AsyncConsumer using subscribe() >>>>>>> (since it relies on the group state), which actually serves as another >>>>>>> great incentive for users to migrate to the new consumer! >>>>>>> >>>>>>> Thoughts on this direction? >>>>>>> >>>>>>> Best, >>>>>>> >>>>>>> Chia-Ping >>>>>>> >>>>>>> Ryan Leslie (BLOOMBERG/ NEW YORK) <[email protected]> wrote on >>>>> Tue, 2026/03/10 >>>>>>> at 6:38 AM: >>>>>>> >>>>>>>> Hi Chia and Jiunn, >>>>>>>> >>>>>>>> Thanks for the response. I agree that the explicit timestamp gives >>>>>> enough >>>>>>>> flexibility for the user to avoid the issue I mentioned with the >>>>>> implicit >>>>>>>> timestamp at startup not matching the time the group instance >>>>> started. >>>>>>>> >>>>>>>> One potential downside is that the user may have to store this >>>>>> timestamp >>>>>>>> somewhere in between restarts. For the group instance id, that is not >>>>>>>> always the case, since sometimes it can be derived from the >>>>> environment >>>>>> such >>>>>>>> as the hostname, or hardcoded in an environment variable where it >>>>>> typically >>>>>>>> doesn't need to be updated. >>>>>>>> >>>>>>>> Also, since static instances may be long-lived, preserving just the >>>>>>>> initial timestamp of the first instance might feel a bit awkward, >>>>>> since you >>>>>>>> may end up with static instances restarting and passing timestamps >>>>> that >>>>>>>> could be months old. The user could instead store >>>>>> something >>>>>>>> like the last time of restart (and subtract metadata max age from it >>>>>> to be >>>>>>>> safe), but it can be considered a burden and may fail if shutdown was >>>>>> not >>>>>>>> graceful, i.e. a crash.
>>>>>>>> >>>>>>>> I agree that this KIP provides a workable solution to avoid data loss >>>>>>>> without protocol or broker changes, so I'm +1. But it does still >>>>> feel a >>>>>>>> little like a kludge, since what the user really needs is an easy, >>>>>> almost >>>>>>>> implicit, way to not lose data when a recently added partition is >>>>>>>> discovered, and currently there is no metadata for the creation time >>>>>> of a >>>>>>>> partition. The user may not even want the same policy applied >>>>>> to >>>>>>>> older partitions for which their offset was deleted. >>>>>>>> >>>>>>>> Even for a consumer group not using static membership, suppose >>>>>> partitions >>>>>>>> are added by a producer and new messages are published. If at the >>>>> same >>>>>> time >>>>>>>> there is a consumer group, e.g. with 1 consumer only, and it has >>>>> crashed, >>>>>>>> when it comes back up it may lose messages unless it knows what >>>>>> timestamp >>>>>>>> to pass. >>>>>>>> >>>>>>>> Thanks, >>>>>>>> >>>>>>>> Ryan >>>>>>>> >>>>>>>> From: [email protected] At: 03/07/26 02:46:28 UTC-5:00 To: >>>>>>>> [email protected] >>>>>>>> Subject: Re: [DISCUSS] KIP-1282: Prevent data loss during partition >>>>>>>> expansion for dynamically added partitions >>>>>>>> >>>>>>>> Hello Sikka, >>>>>>>> >>>>>>>>> If the consumer restarts (app crash, bounce, etc.) after dynamically >>>>>> adding >>>>>>>> partitions, >>>>>>>>> it would consume unread messages from the last committed offset for >>>>>> existing >>>>>>>>> partitions but would still miss the messages from the new partition. >>>>>>>> >>>>>>>> For dynamic consumers, a restart inherently means leaving and >>>>> rejoining >>>>>>>> the >>>>>>>> group >>>>>>>> as a new member, so recalculating startupTimestamp = now() is >>>>>> semantically >>>>>>>> correct — >>>>>>>> the consumer is genuinely starting fresh.
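The distinction drawn here (a dynamic member's restart recalculates the timestamp, while a static member could pin an explicit one) can be sketched as follows. This is an illustrative Python model; the `pinned.start.timestamp` key is hypothetical and stands in for the explicit-timestamp option discussed in the thread, only `group.instance.id` being a real Kafka config:

```python
# Illustrative sketch of which reference timestamp each membership style
# would use under by_start_time. NOT Kafka's API: "pinned.start.timestamp"
# is a hypothetical stand-in for the proposed explicit timestamp.

def startup_timestamp(config, now_ms):
    """Pick the reference point for timestamp-based offset resets."""
    if config.get("group.instance.id") and "pinned.start.timestamp" in config:
        # Static member: a restart does not leave the group, so keep the
        # explicitly pinned reference point across restarts.
        return config["pinned.start.timestamp"]
    # Dynamic member: a restart means leaving and rejoining as a new
    # member, so "now" is semantically the member's start time.
    return now_ms

dynamic = {"group.id": "g1"}
static = {"group.id": "g1", "group.instance.id": "host-a",
          "pinned.start.timestamp": 1_700_000_000_000}

assert startup_timestamp(dynamic, 1_800_000_000_000) == 1_800_000_000_000
assert startup_timestamp(static, 1_800_000_000_000) == 1_700_000_000_000
```

The sketch makes the gap concrete: without the pinned value, a static member restarting would also fall into the "now" branch and lose its original reference point.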
>>>>>>>> >>>>>>>> The gap you described only applies to static membership, where the >>>>>>>> consumer can >>>>>>>> restart >>>>>>>> without triggering a rebalance, yet the local timestamp still gets >>>>>> reset. >>>>>>>> For >>>>>>>> this scenario, as >>>>>>>> Chia suggested, we could extend the configuration to accept an >>>>> explicit >>>>>>>> timestamp. >>>>>>>> This would allow users to pin a fixed reference point across >>>>> restarts, >>>>>>>> effectively closing the gap for >>>>>>>> static membership. For dynamic consumers, the default by_start_time >>>>>>>> without an >>>>>>>> explicit timestamp >>>>>>>> already provides the correct behavior and a significant improvement >>>>>> over >>>>>>>> latest, which would miss >>>>>>>> data even without a restart. >>>>>>>> >>>>>>>>> If the offsets are deleted as mentioned in Scenario 2 (log >>>>>>>> truncation), >>>>>>>>> how would this solution address that scenario? >>>>>>>> >>>>>>>> For the log truncation scenario, when segments are deleted and the >>>>>>>> consumer's >>>>>>>> committed >>>>>>>> offset becomes out of range, auto.offset.reset is triggered. With >>>>>> latest, >>>>>>>> the >>>>>>>> consumer simply jumps >>>>>>>> to the end of the partition, skipping all remaining available data. >>>>>> With >>>>>>>> by_start_time, the consumer looks up >>>>>>>> the position based on the startup timestamp rather than relying on >>>>>>>> offsets. >>>>>>>> Since the lookup is timestamp-based, >>>>>>>> it is not affected by offset invalidation due to truncation. Any data >>>>>> with >>>>>>>> timestamps at or after the startup time >>>>>>>> will still be found and consumed. >>>>>>>> >>>>>>>>> Do we need to care about clock skew or system time issues on the >>>>> consumer >>>>>>>> client >>>>>>>> side? >>>>>>>>> Should we use a timestamp on the server/broker side? >>>>>>>> >>>>>>>> Clock skew is a fair concern, but using a server-side timestamp does >>>>>> not >>>>>>>> necessarily make things safer.
>>>>>>>> It would mean comparing the Group Coordinator's time against the >>>>>> Partition >>>>>>>> Leader's time, which often run on >>>>>>>> different nodes. Without strict clock synchronization across the >>>>> Kafka >>>>>>>> cluster, >>>>>>>> this "happens-before" relationship >>>>>>>> remains fundamentally unpredictable. On the other hand, >>>>>> auto.offset.reset >>>>>>>> is >>>>>>>> strictly a client-level configuration — >>>>>>>> consumers within the same group can intentionally use different >>>>>> policies. >>>>>>>> Tying >>>>>>>> the timestamp to a global server-side >>>>>>>> state would be a semantic mismatch. A local timestamp aligns much >>>>>> better >>>>>>>> with >>>>>>>> the client-level nature of this config. >>>>>>>> >>>>>>>>> Do you plan to have any metrics or observability for when the consumer >>>>> resets >>>>>>>> offsets via >>>>>>>> by_start_time? >>>>>>>> >>>>>>>> That's a great suggestion. I plan to expose the startup timestamp >>>>> used >>>>>> by >>>>>>>> by_start_time as a client-level metric, >>>>>>>> so users can easily verify which reference point the consumer is >>>>> using >>>>>>>> during >>>>>>>> debugging. >>>>>>>> >>>>>>>> Best Regards, >>>>>>>> Jiunn-Yang >>>>>>>> >>>>>>>>> Chia-Ping Tsai <[email protected]> wrote on 2026/03/06 at 10:44 PM: >>>>>>>>> >>>>>>>>> Hi Ryan, >>>>>>>>> >>>>>>>>> That is a fantastic point. A static member restarting and >>>>> capturing a >>>>>>>> newer >>>>>>>> local timestamp is definitely a critical edge case. >>>>>>>>> >>>>>>>>> Since users already need to inject a unique group.instance.id into >>>>>> the >>>>>>>> configuration for static members, my idea is to allow the >>>>>>>> auto.offset.reset >>>>>>>> policy to carry a dedicated timestamp to explicitly "lock" the >>>>> startup >>>>>>>> time >>>>>>>> (for example, using a format like >>>>>> auto.offset.reset=startup:<timestamp>).
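The `startup:<timestamp>` format suggested here could be parsed along these lines. This is a sketch only, assuming the format from the email above; the accepted policy names, the resolved `by_start_time` label, and the error handling are illustrative, not a spec:

```python
# Sketch of parsing the proposed "startup:<timestamp>" form of
# auto.offset.reset. Hypothetical: the set of policies and the
# returned labels are illustrative, not Kafka's actual validation.

VALID_POLICIES = {"earliest", "latest", "none", "by_start_time"}

def parse_offset_reset(value):
    """Return (policy, pinned_timestamp_ms_or_None)."""
    if value.startswith("startup:"):
        ts = int(value[len("startup:"):])  # epoch millis locked by the user
        if ts < 0:
            raise ValueError("timestamp must be non-negative")
        return "by_start_time", ts
    if value in VALID_POLICIES:
        return value, None
    raise ValueError(f"unknown auto.offset.reset value: {value}")

assert parse_offset_reset("latest") == ("latest", None)
assert parse_offset_reset("startup:1767139200000") == ("by_start_time", 1767139200000)
```

As the thread notes, a static deployment would then inject exactly two fixed values: the `group.instance.id` and this locked timestamp.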
>>>>>>>>> >>>>>>>>> This means that if users want to leverage this new policy with >>>>> static >>>>>>>> membership, their deployment configuration would simply include these >>>>>> two >>>>>>>> specific injected values (the static ID and the locked timestamp). >>>>>>>>> >>>>>>>>> This approach elegantly maintains the configuration semantics at >>>>> the >>>>>>>> member level, and most importantly, it avoids any need to update the >>>>>> RPC >>>>>>>> protocol. >>>>>>>>> >>>>>>>>> What do you think of this approach for the static membership >>>>>> scenario? >>>>>>>>> >>>>>>>>> On 2026/03/05 19:45:13 "Ryan Leslie (BLOOMBERG/ NEW YORK)" wrote: >>>>>>>>>> Hey Jiunn, >>>>>>>>>> >>>>>>>>>> Glad to see some progress around this issue. >>>>>>>>>> >>>>>>>>>> I had a similar thought to David, that if the time is only known >>>>>> client >>>>>>>> side >>>>>>>> there are still edge cases for data loss. One case is static >>>>> membership >>>>>>>> where >>>>>>>> from the perspective of a client they are free to restart their >>>>>> consumer >>>>>>>> task >>>>>>>> without actually having left or meaningfully affected the group. >>>>>> However, >>>>>>>> I >>>>>>>> think with the proposed implementation the timestamp is still reset >>>>>> here. >>>>>>>> So if >>>>>>>> the restart happens just after a partition is added and published to, >>>>>> but >>>>>>>> before the consumer metadata is refreshed, the group still runs the risk >>>>>> of >>>>>>>> data >>>>>>>> loss. >>>>>>>>>> >>>>>>>>>> It could be argued that keeping the group 'stable' is a >>>>> requirement >>>>>> for >>>>>>>> this >>>>>>>> feature to work, but sometimes it's not possible to accomplish. >>>>>>>>>> >>>>>>>>>> From: [email protected] At: 03/05/26 14:32:20 UTC-5:00 To: >>>>>>>> [email protected] >>>>>>>>>> Subject: Re: [DISCUSS] KIP-1282: Prevent data loss during >>>>> partition >>>>>>>> expansion for dynamically added partitions >>>>>>>>>> >>>>>>>>>> Hi Jiunn, >>>>>>>>>> >>>>>>>>>> Thanks for the KIP!
>>>>>>>>>> >>>>>>>>>> I was also considering this solution while we discussed in the >>>>>> JIRA. It >>>>>>>>>> seems to work in most of the cases but not in all. For instance, >>>>>> let's >>>>>>>>>> imagine a partition created just before a new consumer joins or >>>>>> rejoins >>>>>>>> the >>>>>>>>>> group and this consumer gets the new partition. In this case, the >>>>>>>> consumer >>>>>>>>>> will have a start time which is older than the partition creation >>>>>> time. >>>>>>>>>> This could also happen with the truncation case. It makes the >>>>>> behavior >>>>>>>> kind >>>>>>>>>> of unpredictable again. >>>>>>>>>> >>>>>>>>>> Instead of relying on a local timestamp, one idea would be to rely >>>>> on a >>>>>>>>>> timestamp provided by the server. For instance, we could define >>>>> the >>>>>> time >>>>>>>>>> since the group became non-empty. This would define the >>>>> subscription >>>>>>>> time >>>>>>>>>> for the consumer group. The downside is that it only works if the >>>>>>>> consumer >>>>>>>>>> is part of a group. >>>>>>>>>> >>>>>>>>>> In your missing semantic section, I don't fully understand how the >>>>>> 4th >>>>>>>>>> point is improved by the KIP. It says start from earliest, but with >>>>>> the >>>>>>>>>> change it would start from the application start time. Could you >>>>>> elaborate? >>>>>>>>>> >>>>>>>>>> Best, >>>>>>>>>> David >>>>>>>>>> >>>>>>>>>> On Thu, Mar 5, 2026 at 12:47, 黃竣陽 <[email protected]> wrote: >>>>>>>>>> >>>>>>>>>>> Hello Chia, >>>>>>>>>>> >>>>>>>>>>> Thanks for your feedback. I have updated the KIP. >>>>>>>>>>> >>>>>>>>>>> Best Regards, >>>>>>>>>>> Jiunn-Yang >>>>>>>>>>> >>>>>>>>>>>> Chia-Ping Tsai <[email protected]> wrote on 2026/03/05 at 7:24 PM: >>>>>>>>>>>> >>>>>>>>>>>> hi Jiunn >>>>>>>>>>>> >>>>>>>>>>>> chia_00: Would you mind mentioning KAFKA-19236 in the KIP?
It >>>>>> would be >>>>>>>>>>> helpful to let readers know that "Dynamically at partition >>>>>> discovery" >>>>>>>> is >>>>>>>>>>> being tracked as a separate issue >>>>>>>>>>>> >>>>>>>>>>>> Best, >>>>>>>>>>>> Chia-Ping >>>>>>>>>>>> >>>>>>>>>>>> On 2026/03/05 11:14:31 黃竣陽 wrote: >>>>>>>>>>>>> Hello everyone, >>>>>>>>>>>>> >>>>>>>>>>>>> I would like to start a discussion on KIP-1282: Prevent data >>>>> loss >>>>>>>>>>> during partition expansion for dynamically added partitions >>>>>>>>>>>>> <https://cwiki.apache.org/confluence/x/mIY8G> >>>>>>>>>>>>> >>>>>>>>>>>>> This proposal aims to introduce a new auto.offset.reset policy >>>>>>>>>>> by_start_time, anchoring the >>>>>>>>>>>>> offset reset to the consumer's startup timestamp rather than >>>>>>>> partition >>>>>>>>>>> discovery time, to prevent >>>>>>>>>>>>> silent data loss during partition expansion. >>>>>>>>>>>>> >>>>>>>>>>>>> Best regards, >>>>>>>>>>>>> Jiunn-Yang >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>>>> >>> >>> >
