Hello Jun, Thanks for the feedback, I have updated the KIP motivation section.
Best Regards, Jiunn-Yang > Jun Rao via dev <[email protected]> 於 2026年5月30日 凌晨1:12 寫道: > > Hi, Jiunn-Yang, > > Thanks for the reply. I think we need a stronger motivation for the KIP. > > The KIP says "The core insight is that not all partitions without a > committed offset are the same. A newly expanded partition (hot) is > fundamentally different from a partition the consumer has never seen > because it predates the group (cold)." Why is the hot partition > fundamentally different from the cold? > > The KIP says "The existing by_duration policy is also insufficient because: > > - The calculated seek time (now() - duration) varies across nodes due to > clock skew. To be safe, users must set an overly large duration, causing > unnecessary reprocessing. > - On network errors, the client recalculates the seek time on retry, > shifting the target timestamp forward and risking data loss." > > However, both of these situations are rare. If these issues persist, more > severe problems likely exist elsewhere. Rare situations don't need a common > solution. If users care about those rare situations, they can implement > customized logic using ConsumerRebalanceListener.onPartitionsAssigned(). > > Jun > > > On Sun, May 17, 2026 at 6:50 AM 黃竣陽 <[email protected]> wrote: > >> Hello chia, >> >> Thanks for the feedback, >> >>> If the creation time exists, the returned value should always be greater >> than or equal to zero, right? >> I have explicitly mentioned this in the KIP. >> >>>> New Old (MetadataResponse v0–13) positive any field >> absent UnsupportedVersionException >> >> The earliest point at which we can detect the version mismatch is during >> the >> first metadata fetch after assignment, which occurs inside poll(). >> Therefore, the >> user would encounter an UnsupportedVersionException from poll(). I’ll >> clarify this in the KIP. >> >> Best Regards, >> Jiunn-Yang >> >>> Chia-Ping Tsai <[email protected]> 於 2026年5月17日 下午4:50 寫道: >>> >>> hi Jiunn >>> >>>> PartitionAgeMs (int64, default -1): The age of this partition in >> milliseconds, computed server-side by the broker as broker_current_time - >> partition_creation_time. Returns -1 if the broker does not support this >> feature or the partition creation time is unknown. >>> >>> If the creation time exists, the returned value should always be greater >> than or equal to zero, right? >>> >>>> New Old (MetadataResponse v0–13) positive any field >> absent UnsupportedVersionException >>> >>> Will user encounter UnsupportedVersionException when calling `poll()`? >>> >>> Best, >>> Chia-Ping >>> >>> >>> On 2026/05/16 04:30:49 黃竣陽 wrote: >>>> Hello Jun, chia, >>>> >>>> I've updated KIP-1327 with a design change based on the discussion >>>> feedback. >>>> >>>> The updated design decouples the new-partition reset behavior from >>>> the base auto.offset.reset policy: >>>> >>>> - auto.offset.reset.max.age.ms now applies to all auto.offset.reset >> values >>>> (latest, earliest, by_duration, none). >>>> - For new ("hot") partitions, the consumer resets to >> auto.offset.reset.new.partitions >>>> config setting >>>> - For existing ("cold") partitions, the base auto.offset.reset policy >> continues >>>> to apply unchanged. >>>> - The new-partition reset behavior is represented by a separate >> internal config >>>> (auto.offset.reset.new.partitions, currently fixed to earliest). This >> decoupled design makes >>>> it straightforward to promote the behavior to a public user-facing >> configuration in a future KIP. >>>> >>>> Best Regards, >>>> Jiunn-Yang >>>> >>>> >>>>> Chia-Ping Tsai <[email protected]> 於 2026年5月16日 清晨7:46 寫道: >>>>> >>>>> hi Jun >>>>> >>>>> I see what you mean now. The proposal from me is listed below: >>>>> >>>>> 1) Add auto.offset.reset.new.partitions with a default value of >> earliest. It fixes the data loss from both by_duration and latest, and it >> does not change the logic of auto.offset.reset=earliest. >>>>> 2) Mark auto.offset.reset.new.partitions as an internal >> configuration. auto.offset.reset.new.partitions=earliest already >> addresses the issue, and we can discuss the use cases of other values in a >> separate KIP. >>>>> 3) Both configs, auto.offset.reset.new.partitions and >> auto.offset.reset.latest.max.age.ms, will be applied to all for >> consistency. >>>>> >>>>> WDYT? >>>>> >>>>> On 2026/05/15 20:53:20 Jun Rao via dev wrote: >>>>>> Hi, Chia-Ping, >>>>>> >>>>>> Thanks for the reply. >>>>>> >>>>>> 1. In the motivation section, the KIP says "When a Kafka topic is >> expanded >>>>>> with new partitions, consumers using the latest auto offset reset >> policy >>>>>> will silently miss all records produced to those partitions before the >>>>>> consumer discovers them.". If a user sets >>>>>> auto.offset.reset=by_duration=1sec, the same record loss issue could >> also >>>>>> happen, right? >>>>>> >>>>>> 2. I was thinking auto.offset.reset.new.partitions will take the same >>>>>> values as auto.offset.reset. So a user could set it by_duration if >> needed. >>>>>> >>>>>> Jun >>>>>> >>>>>> On Thu, May 14, 2026 at 4:06 PM Chia-Ping Tsai <[email protected]> >> wrote: >>>>>> >>>>>>> hi Jun >>>>>>> >>>>>>> Thanks for the feedback. I might be missing something important from >> your >>>>>>> suggestion, so please bear with me as I try to clarify with a few >> questions: >>>>>>> >>>>>>> 1. Is there a strong use case for extending this logic to other reset >>>>>>> policies? Unlike latest, policies like earliest or by_duration don't >> seem >>>>>>> to suffer from the same silent data loss issue when a partition is >> expanded. >>>>>>> >>>>>>> 2. What values would we expect users to configure for >>>>>>> auto.offset.reset.new.partitions? If they set it to earliest or >> latest, >>>>>>> we might run into the exact same edge cases. For example, if a >> consumer is >>>>>>> offline for a while and a new partition is created during that >> downtime, >>>>>>> the user might actually want to skip to latest when resuming, rather >> than >>>>>>> reading from earliest just because the partition is technically >> "new" to >>>>>>> the group. >>>>>>> >>>>>>> This is exactly why we opted for introducing a max.age threshold. It >> gives >>>>>>> users a time-bound way to define what is genuinely "hot/new" and >> what is >>>>>>> just an old partition they haven't seen yet. >>>>>>> >>>>>>> Best, >>>>>>> Chia-Ping >>>>>>> >>>>>>> On 2026/05/14 20:48:09 Jun Rao via dev wrote: >>>>>>>> Hi, Jiunn-Yang, >>>>>>>> >>>>>>>> Thanks for the KIP. >>>>>>>> >>>>>>>> I find auto.offset.reset.latest.max.age a bit weird. It only >> applies when >>>>>>>> auto.offset.reset is latest. However, it seems that the motivation >>>>>>> equally >>>>>>>> applies when auto.offset.reset is set to other values like >> by_duration. >>>>>>> The >>>>>>>> intention is that we want to have a separate way to control newly >> created >>>>>>>> partitions vs existing partitions when the group starts. Have we >>>>>>> considered >>>>>>>> adding a new config like auto.offset.reset.new.partitions? If this >> new >>>>>>>> config is not set, the offset reset policy defaults to the policy >> used >>>>>>> for >>>>>>>> existing partitions. The user could set it explicitly to customize >> the >>>>>>>> behavior for new partitions. >>>>>>>> >>>>>>>> Jun >>>>>>>> >>>>>>>> On Thu, May 7, 2026 at 5:07 AM 黃竣陽 <[email protected]> wrote: >>>>>>>> >>>>>>>>> Hi all, >>>>>>>>> >>>>>>>>> I’d like to manually bump this thread. >>>>>>>>> >>>>>>>>> Best Regards, >>>>>>>>> Jiunn-Yang >>>>>>>>> >>>>>>>>>> 黃竣陽 <[email protected]> 於 2026年5月1日 晚上10:37 寫道: >>>>>>>>>> >>>>>>>>>> Hello all, >>>>>>>>>> >>>>>>>>>> Thanks for the feedback. >>>>>>>>>> >>>>>>>>>> DJ01/DJ02: >>>>>>>>>> >>>>>>>>>> MetadataResponse bumps from v13 to v14. The PartitionMetadata >> struct >>>>>>>>> gains a new >>>>>>>>>> field PartitionAgeMs (int64, default -1), computed server-side by >> the >>>>>>>>> broker as >>>>>>>>>> broker_current_time - partition_creation_time. >>>>>>>>>> >>>>>>>>>> Also add the consumer heartbeat flow. when MembershipManager >> detects >>>>>>> a >>>>>>>>> newly assigned >>>>>>>>>> partition, it explicitly invalidates the metadata for the affected >>>>>>> topic >>>>>>>>> and forces a fresh MetadataRequest >>>>>>>>>> before making the offset reset decision, even if the topic ID is >>>>>>> already >>>>>>>>> in the cache. >>>>>>>>>> >>>>>>>>>> MB0: >>>>>>>>>> >>>>>>>>>> The consumer learns the broker's maximum supported >> MetadataResponse >>>>>>>>> version via the >>>>>>>>>> ApiVersions negotiation at connection time. If the negotiated >>>>>>> version is >>>>>>>>> unsupported, the consumer >>>>>>>>>> knows the broker does not support PartitionAgeMs at all and can >>>>>>> throw an >>>>>>>>> UnsupportedVersionException >>>>>>>>>> immediately, rather than silently falling back to latest and >> risking >>>>>>>>> data loss without any operator-visible signal. >>>>>>>>>> >>>>>>>>>> MB1/MB2/MB3: >>>>>>>>>> >>>>>>>>>> I have addressed these changes in the KIP. >>>>>>>>>> >>>>>>>>>> Best Regards, >>>>>>>>>> Jiunn-Yang >>>>>>>>>> >>>>>>>>>>> Chia-Ping Tsai <[email protected]> 於 2026年4月29日 下午4:04 寫道: >>>>>>>>>>> >>>>>>>>>>> hi David >>>>>>>>>>> >>>>>>>>>>> I agree with the direction of moving the 'age' resolution from >> the >>>>>>>>> Heartbeat API to the Metadata API to keep the control plane clean. >> The >>>>>>> main >>>>>>>>> trade-off, as we noted before, is introducing inter-broker clock >> skew. >>>>>>> The >>>>>>>>> Group Coordinator approach provided a single source of truth for >> time. >>>>>>>>>>> >>>>>>>>>>> However, realistically, this time skew should be negligible. >> Given >>>>>>> that >>>>>>>>> the max.age threshold will likely be configured in minutes or >> hours, a >>>>>>>>> typical NTP skew (in milliseconds) between brokers won't impact the >>>>>>>>> fallback decision. >>>>>>>>>>> >>>>>>>>>>> Best, >>>>>>>>>>> Chia-Ping >>>>>>>>>>> >>>>>>>>>>>> David Jacot via dev <[email protected]> 於 2026年4月29日 下午3:29 >> 寫道: >>>>>>>>>>>> >>>>>>>>>>>> Hi all, >>>>>>>>>>>> >>>>>>>>>>>> Thanks for the KIP! >>>>>>>>>>>> >>>>>>>>>>>> Sorry, I haven't really followed the previous conversation but I >>>>>>> took a >>>>>>>>>>>> quick look at this one. >>>>>>>>>>>> >>>>>>>>>>>> DJ01: I don't clearly understand the flow with the >>>>>>>>> ConsumerGroupHeartbeat >>>>>>>>>>>> API after reading the KIP. There is a new boolean; the KIP >> states >>>>>>> that >>>>>>>>>>>> partition ages are returned only when this boolean is set. >>>>>>> Implicitly, >>>>>>>>> this >>>>>>>>>>>> means that when the consumer receives a new partition, it will >>>>>>> issue a >>>>>>>>> new >>>>>>>>>>>> HB request with the boolean set to receive the ages. Is my >>>>>>>>> understanding >>>>>>>>>>>> correct? We should perhaps clarify the flow and also explain >> how it >>>>>>>>> fits >>>>>>>>>>>> into the existing flow (e.g. list offsets, fetch offsets, etc.). >>>>>>>>>>>> DJ02: It my understanding is correct, I wonder if >>>>>>>>>>>> the ConsumerGroupHeartbeat API is the right place for this given >>>>>>> that >>>>>>>>> a new >>>>>>>>>>>> round trip is done anyway. Alternatively, it could simply >> include >>>>>>> the >>>>>>>>>>>> metadata. Generally, we should be rather cautious about not >>>>>>> overloading >>>>>>>>>>>> the ConsumerGroupHeartbeat API with unrelated concepts. The API >> is >>>>>>> a >>>>>>>>>>>> control plane API for assigning or revoking partitions. The fact >>>>>>> that >>>>>>>>> we >>>>>>>>>>>> don't want to add it to the corresponding Streams API also >> suggests >>>>>>>>>>>> something is not quite right. What would we do if we want to >>>>>>> support >>>>>>>>>>>> Streams in the future? >>>>>>>>>>>> >>>>>>>>>>>> Best, >>>>>>>>>>>> David >>>>>>>>>>>> >>>>>>>>>>>>> On Wed, Apr 29, 2026 at 12:28 AM Muralidhar Basani via dev < >>>>>>>>>>>>> [email protected]> wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> Hi Jiunn, >>>>>>>>>>>>> >>>>>>>>>>>>> Thank you for this great kip. Good to know about the gap. >>>>>>>>>>>>> >>>>>>>>>>>>> mb-0 - why a new v2 version bump for RequestPartitionAges >> field. >>>>>>> Can a >>>>>>>>>>>>> tagged field (for ex: on response, PartitionAges on >>>>>>> TopicPartitions) >>>>>>>>> be >>>>>>>>>>>>> used here and avoid version bump? >>>>>>>>>>>>> >>>>>>>>>>>>> mb-1 - For the new config, is there a recommended value or a >>>>>>> ConfigDef >>>>>>>>>>>>> validator? Probably it should based on the metadata.max.age.ms >> ? >>>>>>>>> Sizing >>>>>>>>>>>>> instructions can be part of javadocs I guess. >>>>>>>>>>>>> >>>>>>>>>>>>> mb-2 - (minor) As there are no changes to Kafka Streams, would >> it >>>>>>> be >>>>>>>>> better >>>>>>>>>>>>> to add this new config auto.offset.reset.latest.max.age to the >>>>>>>>>>>>> StreamsConfig block list >>>>>>> (NON_CONFIGURABLE_CONSUMER_DEFAULT_CONFIGS) >>>>>>>>> for a >>>>>>>>>>>>> clear warning, incase users configure it? This is the most >>>>>>> familiar >>>>>>>>>>>>> consumer config and users might easily mistakenly configure >> it. Or >>>>>>>>> may be >>>>>>>>>>>>> it's not worth it to add. >>>>>>>>>>>>> >>>>>>>>>>>>> mb-3 - (minor) The phrasing "the consumer falls back to >> earliest" >>>>>>>>> reads as >>>>>>>>>>>>> if the config were being changed per-partition which isn't >>>>>>> supported. >>>>>>>>> May >>>>>>>>>>>>> be rephrasing to something like "consumer resolves the initial >>>>>>>>> position to >>>>>>>>>>>>> start offset for that partition" as if earliest was applied to >>>>>>> that >>>>>>>>>>>>> partition only and auto.offset.reset config is unchanged. >>>>>>>>>>>>> >>>>>>>>>>>>> Thanks, >>>>>>>>>>>>> Murali >>>>>>>>>>>>> >>>>>>>>>>>>>> On Tue, Apr 28, 2026 at 2:48 PM 黃竣陽 <[email protected]> >> wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>> Hi chia, >>>>>>>>>>>>>> >>>>>>>>>>>>>> I have updated the KIP to include this change. >>>>>>>>>>>>>> >>>>>>>>>>>>>> Best Regards, >>>>>>>>>>>>>> Jiunn-Yang >>>>>>>>>>>>>> >>>>>>>>>>>>>>> Chia-Ping Tsai <[email protected]> 於 2026年4月28日 晚上8:03 寫道: >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> hi Jiunn-Yang >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> chia_0: Should we expose the partition creation time via the >>>>>>> Admin >>>>>>>>> API? >>>>>>>>>>>>>> I assume it would be valuable for users to diagnose and >>>>>>> troubleshoot >>>>>>>>> the >>>>>>>>>>>>>> behavior of auto.offset.reset.latest.max.age >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Best, >>>>>>>>>>>>>>> Chia-Ping >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> On 2026/04/28 10:47:58 黃竣陽 wrote: >>>>>>>>>>>>>>>> Hello everyone, >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> I would like to start a discussion on KIP-1327 Prevent Hot >> Data >>>>>>>>> Loss >>>>>>>>>>>>> on >>>>>>>>>>>>>> Partition Expansion for Latest Policy >>>>>>>>>>>>>>>> < >>>>>>>>>>>>> >>>>>>>>> >>>>>>> >> https://urldefense.com/v3/__https://cwiki.apache.org/confluence/x/KY4mGQ__;!!Ayb5sqE7!qF4q1QzF1RRgP61D7A2xuEai1ky7fepKDKFFvpNBuePikH-ULmT87TvuuZzy5kau5E4y5zMZAmfQQiwZomM$ >>>>>>>>>>>>>> >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> This proposal aims to introduces >>>>>>> auto.offset.reset.latest.max.age, >>>>>>>>> a >>>>>>>>>>>>>> consumer config that lets the >>>>>>>>>>>>>>>> latest reset policy distinguish newly expanded (hot) >> partitions >>>>>>>>> from >>>>>>>>>>>>>> long-existing (cold) ones. Partitions >>>>>>>>>>>>>>>> younger than the configured threshold automatically fall >> back >>>>>>> to >>>>>>>>>>>>>> earliest, preventing silent data loss >>>>>>>>>>>>>>>> during topic expansion without forcing a full historical >>>>>>> reprocess. >>>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> Best regards, >>>>>>>>>>>>>>>> Jiunn-Yang >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>> >>>> >> >>
