Hi Manan,

Thanks for the discussion.

MG1: Good point. I've updated the KIP to add topic-level overrides for
both configs so operators can tune the threshold per topic if the
cluster-wide default doesn't fit a particular workload.
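To make the override semantics concrete, here is a minimal Java sketch of how an effective per-topic setting could be resolved: a topic-level value wins, otherwise the cluster-wide default applies. The class, the `Settings` record, and the assumption that the topic-level keys reuse the cluster-level config names are all hypothetical, because this thread doesn't name the topic-level overrides. The threshold is also assumed to be a `long` offset distance, and 1000 is just an example value.

```java
import java.util.Map;

// Hypothetical sketch: not Kafka's actual config classes.
final class LeaderElectionConfigResolver {

    // Assumed topic-level keys; the real override names are not given in this thread.
    static final String PREFER_KEY = "leader.election.prefer.early.local.log.start.offset";
    static final String THRESHOLD_KEY = "leader.election.local.log.start.offset.threshold";

    // Effective settings for one topic (Java 16+ record).
    record Settings(boolean preferEarlyLocalLogStartOffset, long threshold) {}

    // A topic-level value, when present, overrides the cluster-wide default.
    static Settings resolve(Settings clusterDefault, Map<String, String> topicConfig) {
        boolean prefer = topicConfig.containsKey(PREFER_KEY)
                ? Boolean.parseBoolean(topicConfig.get(PREFER_KEY))
                : clusterDefault.preferEarlyLocalLogStartOffset();
        long threshold = topicConfig.containsKey(THRESHOLD_KEY)
                ? Long.parseLong(topicConfig.get(THRESHOLD_KEY))
                : clusterDefault.threshold();
        return new Settings(prefer, threshold);
    }

    public static void main(String[] args) {
        Settings cluster = new Settings(true, 1000L);
        // This topic opts out of the new ordering but inherits the default threshold.
        Settings topic = resolve(cluster, Map.of(PREFER_KEY, "false"));
        System.out.println(topic); // prints Settings[preferEarlyLocalLogStartOffset=false, threshold=1000]
    }
}
```

Each config is resolved independently, so a topic can override just one of the two and inherit the other.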

MG2: This shouldn't cause skew. The localLogStartOffset sorting only
pushes down newly bootstrapped replicas that have significantly less
local data. Existing replicas with comparable local data fall within
the threshold and keep the original assignment order, which is the
same behavior as today. We're not introducing a new dimension that
would systematically favor certain racks or brokers.
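A minimal Java sketch of the ordering described above, under one assumed reading of the threshold: a replica counts as "comparable" if its localLogStartOffset is within the threshold of the lowest localLogStartOffset among the candidates. The class and method names are hypothetical and this is not the controller's actual code; the KIP may define the threshold differently.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the candidate ordering; not the controller's actual code.
final class LeaderCandidateOrdering {

    // Assumption: "comparable" means within `threshold` (>= 0) of the lowest
    // localLogStartOffset among the candidates.
    static List<Integer> order(List<Integer> assignment,
                               Map<Integer, Long> localLogStartOffset,
                               long threshold) {
        long min = Long.MAX_VALUE;
        for (int replica : assignment) {
            min = Math.min(min, localLogStartOffset.get(replica));
        }
        final long best = min;
        List<Integer> ordered = new ArrayList<>(assignment);
        // Comparable replicas share the sort key `best`, so they tie; the others
        // are keyed by their own (strictly larger) offset and move later.
        // List.sort is stable, so tied replicas keep their assignment order.
        ordered.sort(Comparator.comparingLong((Integer replica) -> {
            long offset = localLogStartOffset.get(replica);
            return offset - best <= threshold ? best : offset;
        }));
        return ordered;
    }

    public static void main(String[] args) {
        // Replica 1 was just bootstrapped and holds little local data.
        Map<Integer, Long> offsets = Map.of(1, 1000L, 2, 50L, 3, 0L);
        System.out.println(order(List.of(1, 2, 3), offsets, 100L)); // prints [2, 3, 1]
    }
}
```

Anchoring the threshold to the best candidate keeps the sort key a plain `long`. A pairwise "within threshold of each other" comparator would not be transitive and could violate the `Comparator` contract.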

MG3: When tiered storage is not enabled for a topic, we will not send
AlterPartition requests to report localLogStartOffset, so there is no
extra control-plane overhead for clusters or topics that don't use
tiered storage. When it is enabled, the additional AlterPartition calls
only fire when an ISR member's localLogStartOffset actually changes.
They reuse the existing protocol and should be infrequent, since a
replica's localLogStartOffset typically only advances when local
retention deletes segments.
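A minimal leader-side Java sketch of the reporting rule above, assuming the follower's localLogStartOffset arrives on its fetch and the leader tracks the last value seen per ISR member. The class and method names are hypothetical. The constructor flag stands in for whether tiered storage is enabled on the topic (in Kafka, the topic-level `remote.storage.enable` config). Treating a member's first report as a change is also an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical leader-side sketch; not Kafka's actual partition/replica code.
final class LocalLogStartOffsetReporter {

    private final boolean remoteStorageEnabled;
    // Last localLogStartOffset seen per ISR member (simplification: updated
    // before the controller acknowledges the AlterPartition).
    private final Map<Integer, Long> lastSeen = new HashMap<>();

    LocalLogStartOffsetReporter(boolean remoteStorageEnabled) {
        this.remoteStorageEnabled = remoteStorageEnabled;
    }

    // Called when a follower's fetch carries its localLogStartOffset.
    // Returns true if the leader should send an AlterPartition to the controller.
    boolean onFollowerFetch(int replicaId, long localLogStartOffset, Set<Integer> isr) {
        if (!remoteStorageEnabled || !isr.contains(replicaId)) {
            return false; // no extra control-plane traffic
        }
        Long previous = lastSeen.put(replicaId, localLogStartOffset);
        return previous == null || previous.longValue() != localLogStartOffset;
    }

    public static void main(String[] args) {
        LocalLogStartOffsetReporter reporter = new LocalLogStartOffsetReporter(true);
        Set<Integer> isr = Set.of(1, 2);
        System.out.println(reporter.onFollowerFetch(2, 500L, isr)); // true: first report
        System.out.println(reporter.onFollowerFetch(2, 500L, isr)); // false: unchanged
        System.out.println(reporter.onFollowerFetch(2, 800L, isr)); // true: offset advanced
    }
}
```

Steady-state fetches with an unchanged offset return `false`, which is why the extra AlterPartition traffic should stay low.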

Thanks,
Tom

On Wed, Apr 29, 2026 at 5:42 PM Thomas Thornton wrote:
>
> Hi Ivan,
>
> Thanks for the feedback.
>
> IY1: Yes, the sort is stable. Replicas within the threshold are
> considered equivalent and retain their original assignment order.
> We're only reordering replicas that have significantly less local
> data; the existing replicas keep the same relative ordering as before.
> I've updated that part of the KIP to reflect this.
>
> IY2: Good idea. I've updated the KIP to add topic-level overrides for
> both configs. This follows the standard Kafka pattern (e.g. the
> topic-level `retention.ms` overriding the broker-level
> `log.retention.ms`). The cluster-wide default applies unless
> overridden per topic.
>
> Thanks,
> Tom
>
> On Fri, Apr 24, 2026 at 3:18 PM Ivan Yurchenko wrote:
> >
> > Hi Thomas,
> >
> > Thank you for the KIP. The motivation makes sense to me. I have a couple of 
> > comments:
> >
> > IY1:
> > > When `leader.election.prefer.early.local.log.start.offset` is enabled,
> > > the key change is to sort targetReplicas by local-log-start-offset 
> > > (ascending) before selecting a leader. This ensures replicas with more 
> > > local data (lower local-log-start-offset) are considered first in both 
> > > election paths.
> >
> > I assume here it meant to say "sort stably", to preserve the original 
> > preference order as much as possible?
> >
> > IY2:
> > Is there a reason for a particular topic not to follow the new leader
> > election algorithm, or is it strictly better, so that once enabled it's
> > not expected to be disabled? If there is such a reason, would you
> > consider adding topic-level versions of the new configs
> > leader.election.prefer.early.local.log.start.offset and
> > leader.election.local.log.start.offset.threshold?
> >
> > Best,
> > Ivan
> >
> >
> > On Mon, Mar 30, 2026, at 20:43, Thomas Thornton via dev wrote:
> >
> > Hi all,
> >
> > We want to start a discussion thread for KIP-1303: Deprioritize Tiered
> > Storage Followers In Leader Election.
> >
> > The adopted KIP-1023 introduced an optimization allowing followers to
> > skip replicating data already in remote storage, dramatically reducing
> > ISR join time. However, as noted in KIP-1023, this creates a risk: if
> > such a follower becomes leader, it may need to serve consumer requests
> > from remote storage, impacting performance.
> >
> > This KIP proposes to mitigate this risk by preferring replicas with
> > more local data (lower localLogStartOffset) during leader election.
> > Key changes include:
> > 1) New config leader.election.prefer.early.local.log.start.offset to
> > enable the feature
> > 2) New config leader.election.local.log.start.offset.threshold to
> > avoid leader churn from minor retention timing differences
> > 3) Extending FetchRequest and AlterPartition to propagate
> > localLogStartOffset from followers → leader → controller
> >
> > The full KIP is available here:
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1303%3A+Deprioritize+Tiered+Storage+Followers+In+Leader+Election
> >
> > Thanks,
> > Tom
> >
> >