Hello Jun,

Thanks for the feedback, I have updated the KIP motivation section.

Best Regards,
Jiunn-Yang

> Jun Rao via dev <[email protected]> 於 2026年5月30日 凌晨1:12 寫道:
> 
> Hi, Jiunn-Yang,
> 
> Thanks for the reply. I think we need a stronger motivation for the KIP.
> 
> The KIP says "The core insight is that not all partitions without a
> committed offset are the same. A newly expanded partition (hot) is
> fundamentally different from a partition the consumer has never seen
> because it predates the group (cold)." Why is the hot partition
> fundamentally different from the cold?
> 
> The KIP says "The existing by_duration policy is also insufficient because:
> 
>   - The calculated seek time (now() - duration) varies across nodes due to
>   clock skew. To be safe, users must set an overly large duration, causing
>   unnecessary reprocessing.
>   - On network errors, the client recalculates the seek time on retry,
>   shifting the target timestamp forward and risking data loss."
> 
> However, both of these situations are rare. If these issues persist, more
> severe problems likely exist elsewhere. Rare situations don't need a common
> solution. If users care about those rare situations, they can implement
> customized logic using ConsumerRebalanceListener.onPartitionsAssigned().
> 
> Jun
> 
> 
> On Sun, May 17, 2026 at 6:50 AM 黃竣陽 <[email protected]> wrote:
> 
>> Hello chia,
>> 
>> Thanks for the feedback,
>> 
>>> If the creation time exists, the returned value should always be greater
>> than or equal to zero, right?
>> I have explicitly mentioned this in the KIP.
>> 
>>>> New  Old (MetadataResponse v0–13)    positive        any     field
>> absent    UnsupportedVersionException
>> 
>> The earliest point at which we can detect the version mismatch is during
>> the
>> first metadata fetch after assignment, which occurs inside poll().
>> Therefore, the
>> user would encounter an UnsupportedVersionException from poll(). I’ll
>> clarify this in the KIP.
>> 
>> Best Regards,
>> Jiunn-Yang
>> 
>>> Chia-Ping Tsai <[email protected]> 於 2026年5月17日 下午4:50 寫道:
>>> 
>>> hi Jiunn
>>> 
>>>> PartitionAgeMs (int64, default -1): The age of this partition in
>> milliseconds, computed server-side by the broker as broker_current_time -
>> partition_creation_time. Returns -1 if the broker does not support this
>> feature or the partition creation time is unknown.
>>> 
>>> If the creation time exists, the returned value should always be greater
>> than or equal to zero, right?
>>> 
>>>> New  Old (MetadataResponse v0–13)    positive        any     field
>> absent    UnsupportedVersionException
>>> 
>>> Will user encounter UnsupportedVersionException when calling `poll()`?
>>> 
>>> Best,
>>> Chia-Ping
>>> 
>>> 
>>> On 2026/05/16 04:30:49 黃竣陽 wrote:
>>>> Hello Jun, chia,
>>>> 
>>>> I've updated KIP-1327 with a design change based on the discussion
>>>> feedback.
>>>> 
>>>> The updated design decouples the new-partition reset behavior from
>>>> the base auto.offset.reset policy:
>>>> 
>>>> - auto.offset.reset.max.age.ms now applies to all auto.offset.reset
>> values
>>>> (latest, earliest, by_duration, none).
>>>> - For new ("hot") partitions, the consumer resets to
>> auto.offset.reset.new.partitions
>>>> config setting
>>>> - For existing ("cold") partitions, the base auto.offset.reset policy
>> continues
>>>> to apply unchanged.
>>>> - The new-partition reset behavior is represented by a separate
>> internal config
>>>> (auto.offset.reset.new.partitions, currently fixed to earliest). This
>> decoupled design makes
>>>> it straightforward to promote the behavior to a public user-facing
>> configuration in a future KIP.
>>>> 
>>>> Best Regards,
>>>> Jiunn-Yang
>>>> 
>>>> 
>>>>> Chia-Ping Tsai <[email protected]> 於 2026年5月16日 清晨7:46 寫道:
>>>>> 
>>>>> hi Jun
>>>>> 
>>>>> I see what you mean now. The proposal from me is listed below:
>>>>> 
>>>>> 1) Add auto.offset.reset.new.partitions with a default value of
>> earliest. It fixes the data loss from both by_duration and latest, and it
>> does not change the logic of auto.offset.reset=earliest.
>>>>> 2) Mark auto.offset.reset.new.partitions as an internal
>> configuration. auto.offset.reset.new.partitions=earliest already
>> addresses the issue, and we can discuss the use cases of other values in a
>> separate KIP.
>>>>> 3) Both configs, auto.offset.reset.new.partitions and
>> auto.offset.reset.latest.max.age.ms, will be applied to all for
>> consistency.
>>>>> 
>>>>> WDYT?
>>>>> 
>>>>> On 2026/05/15 20:53:20 Jun Rao via dev wrote:
>>>>>> Hi, Chia-Ping,
>>>>>> 
>>>>>> Thanks for the reply.
>>>>>> 
>>>>>> 1. In the motivation section, the KIP says "When a Kafka topic is
>> expanded
>>>>>> with new partitions, consumers using the latest auto offset reset
>> policy
>>>>>> will silently miss all records produced to those partitions before the
>>>>>> consumer discovers them.". If a user sets
>>>>>> auto.offset.reset=by_duration=1sec, the same record loss issue could
>> also
>>>>>> happen, right?
>>>>>> 
>>>>>> 2. I was thinking auto.offset.reset.new.partitions will take the same
>>>>>> values as auto.offset.reset. So a user could set it by_duration if
>> needed.
>>>>>> 
>>>>>> Jun
>>>>>> 
>>>>>> On Thu, May 14, 2026 at 4:06 PM Chia-Ping Tsai <[email protected]>
>> wrote:
>>>>>> 
>>>>>>> hi Jun
>>>>>>> 
>>>>>>> Thanks for the feedback. I might be missing something important from
>> your
>>>>>>> suggestion, so please bear with me as I try to clarify with a few
>> questions:
>>>>>>> 
>>>>>>> 1. Is there a strong use case for extending this logic to other reset
>>>>>>> policies? Unlike latest, policies like earliest or by_duration don't
>> seem
>>>>>>> to suffer from the same silent data loss issue when a partition is
>> expanded.
>>>>>>> 
>>>>>>> 2. What values would we expect users to configure for
>>>>>>> auto.offset.reset.new.partitions? If they set it to earliest or
>> latest,
>>>>>>> we might run into the exact same edge cases. For example, if a
>> consumer is
>>>>>>> offline for a while and a new partition is created during that
>> downtime,
>>>>>>> the user might actually want to skip to latest when resuming, rather
>> than
>>>>>>> reading from earliest just because the partition is technically
>> "new" to
>>>>>>> the group.
>>>>>>> 
>>>>>>> This is exactly why we opted for introducing a max.age threshold. It
>> gives
>>>>>>> users a time-bound way to define what is genuinely "hot/new" and
>> what is
>>>>>>> just an old partition they haven't seen yet.
>>>>>>> 
>>>>>>> Best,
>>>>>>> Chia-Ping
>>>>>>> 
>>>>>>> On 2026/05/14 20:48:09 Jun Rao via dev wrote:
>>>>>>>> Hi, Jiunn-Yang,
>>>>>>>> 
>>>>>>>> Thanks for the KIP.
>>>>>>>> 
>>>>>>>> I find auto.offset.reset.latest.max.age a bit weird. It only
>> applies when
>>>>>>>> auto.offset.reset is latest. However, it seems that the motivation
>>>>>>> equally
>>>>>>>> applies when auto.offset.reset is set to other values like
>> by_duration.
>>>>>>> The
>>>>>>>> intention is that we want to have a separate way to control newly
>> created
>>>>>>>> partitions vs existing partitions when the group starts. Have we
>>>>>>> considered
>>>>>>>> adding a new config like auto.offset.reset.new.partitions? If this
>> new
>>>>>>>> config is not set, the offset reset policy defaults to the policy
>> used
>>>>>>> for
>>>>>>>> existing partitions. The user could set it explicitly to customize
>> the
>>>>>>>> behavior for new partitions.
>>>>>>>> 
>>>>>>>> Jun
>>>>>>>> 
>>>>>>>> On Thu, May 7, 2026 at 5:07 AM 黃竣陽 <[email protected]> wrote:
>>>>>>>> 
>>>>>>>>> Hi all,
>>>>>>>>> 
>>>>>>>>> I’d like to manually bump this thread.
>>>>>>>>> 
>>>>>>>>> Best Regards,
>>>>>>>>> Jiunn-Yang
>>>>>>>>> 
>>>>>>>>>> 黃竣陽 <[email protected]> 於 2026年5月1日 晚上10:37 寫道:
>>>>>>>>>> 
>>>>>>>>>> Hello all,
>>>>>>>>>> 
>>>>>>>>>> Thanks for the feedback.
>>>>>>>>>> 
>>>>>>>>>> DJ01/DJ02:
>>>>>>>>>> 
>>>>>>>>>> MetadataResponse bumps from v13 to v14. The PartitionMetadata
>> struct
>>>>>>>>> gains a new
>>>>>>>>>> field PartitionAgeMs (int64, default -1), computed server-side by
>> the
>>>>>>>>> broker as
>>>>>>>>>> broker_current_time - partition_creation_time.
>>>>>>>>>> 
>>>>>>>>>> Also add the consumer heartbeat flow. when MembershipManager
>> detects
>>>>>>> a
>>>>>>>>> newly assigned
>>>>>>>>>> partition, it explicitly invalidates the metadata for the affected
>>>>>>> topic
>>>>>>>>> and forces a fresh MetadataRequest
>>>>>>>>>> before making the offset reset decision, even if the topic ID is
>>>>>>> already
>>>>>>>>> in the cache.
>>>>>>>>>> 
>>>>>>>>>> MB0:
>>>>>>>>>> 
>>>>>>>>>> The consumer learns the broker's maximum supported
>> MetadataResponse
>>>>>>>>> version via the
>>>>>>>>>> ApiVersions negotiation at connection time. If the negotiated
>>>>>>> version is
>>>>>>>>> unsupported, the consumer
>>>>>>>>>> knows the broker does not support PartitionAgeMs at all and can
>>>>>>> throw an
>>>>>>>>> UnsupportedVersionException
>>>>>>>>>> immediately, rather than silently falling back to latest and
>> risking
>>>>>>>>> data loss without any operator-visible signal.
>>>>>>>>>> 
>>>>>>>>>> MB1/MB2/MB3:
>>>>>>>>>> 
>>>>>>>>>> I have addressed these changes in the KIP.
>>>>>>>>>> 
>>>>>>>>>> Best Regards,
>>>>>>>>>> Jiunn-Yang
>>>>>>>>>> 
>>>>>>>>>>> Chia-Ping Tsai <[email protected]> 於 2026年4月29日 下午4:04 寫道:
>>>>>>>>>>> 
>>>>>>>>>>> hi David
>>>>>>>>>>> 
>>>>>>>>>>> I agree with the direction of moving the 'age' resolution from
>> the
>>>>>>>>> Heartbeat API to the Metadata API to keep the control plane clean.
>> The
>>>>>>> main
>>>>>>>>> trade-off, as we noted before, is introducing inter-broker clock
>> skew.
>>>>>>> The
>>>>>>>>> Group Coordinator approach provided a single source of truth for
>> time.
>>>>>>>>>>> 
>>>>>>>>>>> However, realistically, this time skew should be negligible.
>> Given
>>>>>>> that
>>>>>>>>> the max.age threshold will likely be configured in minutes or
>> hours, a
>>>>>>>>> typical NTP skew (in milliseconds) between brokers won't impact the
>>>>>>>>> fallback decision.
>>>>>>>>>>> 
>>>>>>>>>>> Best,
>>>>>>>>>>> Chia-Ping
>>>>>>>>>>> 
>>>>>>>>>>>> David Jacot via dev <[email protected]> 於 2026年4月29日 下午3:29
>> 寫道:
>>>>>>>>>>>> 
>>>>>>>>>>>> Hi all,
>>>>>>>>>>>> 
>>>>>>>>>>>> Thanks for the KIP!
>>>>>>>>>>>> 
>>>>>>>>>>>> Sorry, I haven't really followed the previous conversation but I
>>>>>>> took a
>>>>>>>>>>>> quick look at this one.
>>>>>>>>>>>> 
>>>>>>>>>>>> DJ01: I don't clearly understand the flow with the
>>>>>>>>> ConsumerGroupHeartbeat
>>>>>>>>>>>> API after reading the KIP. There is a new boolean; the KIP
>> states
>>>>>>> that
>>>>>>>>>>>> partition ages are returned only when this boolean is set.
>>>>>>> Implicitly,
>>>>>>>>> this
>>>>>>>>>>>> means that when the consumer receives a new partition, it will
>>>>>>> issue a
>>>>>>>>> new
>>>>>>>>>>>> HB request with the boolean set to receive the ages. Is my
>>>>>>>>> understanding
>>>>>>>>>>>> correct? We should perhaps clarify the flow and also explain
>> how it
>>>>>>>>> fits
>>>>>>>>>>>> into the existing flow (e.g. list offsets, fetch offsets, etc.).
>>>>>>>>>>>> DJ02: It my understanding is correct, I wonder if
>>>>>>>>>>>> the ConsumerGroupHeartbeat API is the right place for this given
>>>>>>> that
>>>>>>>>> a new
>>>>>>>>>>>> round trip is done anyway. Alternatively, it could simply
>> include
>>>>>>> the
>>>>>>>>>>>> metadata. Generally, we should be rather cautious about not
>>>>>>> overloading
>>>>>>>>>>>> the ConsumerGroupHeartbeat API with unrelated concepts. The API
>> is
>>>>>>> a
>>>>>>>>>>>> control plane API for assigning or revoking partitions. The fact
>>>>>>> that
>>>>>>>>> we
>>>>>>>>>>>> don't want to add it to the corresponding Streams API also
>> suggests
>>>>>>>>>>>> something is not quite right. What would we do if we want to
>>>>>>> support
>>>>>>>>>>>> Streams in the future?
>>>>>>>>>>>> 
>>>>>>>>>>>> Best,
>>>>>>>>>>>> David
>>>>>>>>>>>> 
>>>>>>>>>>>>> On Wed, Apr 29, 2026 at 12:28 AM Muralidhar Basani via dev <
>>>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Hi Jiunn,
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thank you for this great kip. Good to know about the gap.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> mb-0 - why a new v2 version bump for RequestPartitionAges
>> field.
>>>>>>> Can a
>>>>>>>>>>>>> tagged field (for ex: on response, PartitionAges on
>>>>>>> TopicPartitions)
>>>>>>>>> be
>>>>>>>>>>>>> used here and avoid version bump?
>>>>>>>>>>>>> 
>>>>>>>>>>>>> mb-1 - For the new config, is there a recommended value or a
>>>>>>> ConfigDef
>>>>>>>>>>>>> validator? Probably it should based on the metadata.max.age.ms
>> ?
>>>>>>>>> Sizing
>>>>>>>>>>>>> instructions can be part of javadocs I guess.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> mb-2 - (minor) As there are no changes to Kafka Streams, would
>> it
>>>>>>> be
>>>>>>>>> better
>>>>>>>>>>>>> to add this new config auto.offset.reset.latest.max.age to the
>>>>>>>>>>>>> StreamsConfig block list
>>>>>>> (NON_CONFIGURABLE_CONSUMER_DEFAULT_CONFIGS)
>>>>>>>>> for a
>>>>>>>>>>>>> clear warning, incase users configure it? This is the most
>>>>>>> familiar
>>>>>>>>>>>>> consumer config and users might easily mistakenly configure
>> it. Or
>>>>>>>>> may be
>>>>>>>>>>>>> it's not worth it to add.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> mb-3 - (minor) The phrasing "the consumer falls back to
>> earliest"
>>>>>>>>> reads as
>>>>>>>>>>>>> if the config were being changed per-partition which isn't
>>>>>>> supported.
>>>>>>>>> May
>>>>>>>>>>>>> be rephrasing to something like "consumer resolves the initial
>>>>>>>>> position to
>>>>>>>>>>>>> start offset for that partition" as if earliest was applied to
>>>>>>> that
>>>>>>>>>>>>> partition only and auto.offset.reset config is unchanged.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Murali
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On Tue, Apr 28, 2026 at 2:48 PM 黃竣陽 <[email protected]>
>> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Hi chia,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> I have updated the KIP to include this change.
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> Best Regards,
>>>>>>>>>>>>>> Jiunn-Yang
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Chia-Ping Tsai <[email protected]> 於 2026年4月28日 晚上8:03 寫道:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> hi Jiunn-Yang
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> chia_0: Should we expose the partition creation time via the
>>>>>>> Admin
>>>>>>>>> API?
>>>>>>>>>>>>>> I assume it would be valuable for users to diagnose and
>>>>>>> troubleshoot
>>>>>>>>> the
>>>>>>>>>>>>>> behavior of auto.offset.reset.latest.max.age
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>> Chia-Ping
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On 2026/04/28 10:47:58 黃竣陽 wrote:
>>>>>>>>>>>>>>>> Hello everyone,
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> I would like to start a discussion on KIP-1327 Prevent Hot
>> Data
>>>>>>>>> Loss
>>>>>>>>>>>>> on
>>>>>>>>>>>>>> Partition Expansion for Latest Policy
>>>>>>>>>>>>>>>> <
>>>>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>> https://urldefense.com/v3/__https://cwiki.apache.org/confluence/x/KY4mGQ__;!!Ayb5sqE7!qF4q1QzF1RRgP61D7A2xuEai1ky7fepKDKFFvpNBuePikH-ULmT87TvuuZzy5kau5E4y5zMZAmfQQiwZomM$
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> This proposal aims to introduces
>>>>>>> auto.offset.reset.latest.max.age,
>>>>>>>>> a
>>>>>>>>>>>>>> consumer config that lets the
>>>>>>>>>>>>>>>> latest reset policy distinguish newly expanded (hot)
>> partitions
>>>>>>>>> from
>>>>>>>>>>>>>> long-existing (cold) ones. Partitions
>>>>>>>>>>>>>>>> younger than the configured threshold automatically fall
>> back
>>>>>>> to
>>>>>>>>>>>>>> earliest, preventing silent data loss
>>>>>>>>>>>>>>>> during topic expansion without forcing a full historical
>>>>>>> reprocess.
>>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>>> Best regards,
>>>>>>>>>>>>>>>> Jiunn-Yang
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>> 
>>>> 
>> 
>> 

Reply via email to