Hi Thomas,

Thanks for the detailed write-up. I like the direction of using local log coverage to steer leader election and reduce the KIP-1023 risk of electing leaders with little local data, without hard-blocking election when the cluster has no better option. A few follow-ups would help me understand how this will behave in mixed production environments:

MG1: How does the default message-based gap behave across very different partition workloads? On quiet topics versus hot partitions, the same offset delta can represent a very different amount of local data in practice. Would you consider adding topic-level versions of the new configs, as Ivan suggested in IY2?

MG2: If a subset of replicas (or a rack) systematically has a lower localLogStartOffset and is therefore preferred for leadership more often, do you see that as a material risk for cross-rack traffic or per-broker load skew? If so, is any guidance in scope for this KIP, or is there a pointer to follow-up work (e.g. placement, reassignment, or monitoring) so teams can keep leadership balanced while still using this feature?

MG3: How much extra control-plane and metadata work do you expect from localLogStartOffset updates and the added AlterPartition calls, particularly in large clusters? Also, when the feature is off or tiered storage isn’t used, does this path remain completely quiet?

Regards,
Manan Gupta

On Fri, Apr 24, 2026 at 6:48 PM Ivan Yurchenko <[email protected]> wrote:

> Hi Thomas,
>
> Thank you for the KIP. The motivation makes sense to me. I have a couple
> of comments:
>
> IY1:
> > When `leader.election.prefer.early.local.log.start.offset` is enabled,
> > the key change is to sort targetReplicas by local-log-start-offset
> > (ascending) before selecting a leader. This ensures replicas with more
> > local data (lower local-log-start-offset) are considered first in both
> > election paths.
>
> I assume here it meant to say "sort stably", to preserve the original
> preference order as much as possible?
>
> IY2:
> Can we find a reason for a particular topic to not follow the new leader
> election algorithm, or it is strictly better and once enabled it's not
> expected to be disabled? If the answer is yes, would you consider adding
> the topic-level versions of the new configs
> `leader.election.prefer.early.local.log.start.offset` and
> `leader.election.local.log.start.offset.threshold`?
>
> Best,
> Ivan
>
>
> On Mon, Mar 30, 2026, at 20:43, Thomas Thornton via dev wrote:
> > Hi all,
> >
> > We want to start a discussion thread for KIP-1303: Deprioritize Tiered
> > Storage Followers In Leader Election.
> >
> > The adopted KIP-1023 introduced an optimization allowing followers to
> > skip replicating data already in remote storage, dramatically reducing
> > ISR join time. However, as noted in KIP-1023, this creates a risk: if
> > such a follower becomes leader, it may need to serve consumer requests
> > from remote storage, impacting performance.
> >
> > This KIP proposes to mitigate this risk by preferring replicas with
> > more local data (lower localLogStartOffset) during leader election.
> > Key changes include:
> > 1) New config leader.election.prefer.early.local.log.start.offset to
> > enable the feature
> > 2) New config leader.election.local.log.start.offset.threshold to
> > avoid leader churn from minor retention timing differences
> > 3) Extending FetchRequest and AlterPartition to propagate
> > localLogStartOffset from followers → leader → controller
> >
> > The full KIP is available here:
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-1303%3A+Deprioritize+Tiered+Storage+Followers+In+Leader+Election
> >
> > Thanks,
> > Tom
> >
>
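
P.S. To make the ordering discussed in IY1 concrete, here is a minimal Python sketch of a stable sort on localLogStartOffset. It is not taken from the KIP or from Kafka's code. The function name, the dictionary of offsets, the handling of replicas with no reported offset, and in particular the threshold semantics (offsets within the threshold of the lowest offset are treated as ties) are all my assumptions for illustration; the KIP may define the threshold differently.

```python
from typing import Dict, List


def order_election_candidates(
    target_replicas: List[int],
    local_log_start_offsets: Dict[int, int],
    threshold: int = 0,
) -> List[int]:
    """Return replica ids in the order they would be tried for leadership.

    Hypothetical helper: names and threshold semantics are assumptions,
    not the KIP's definition.
    """
    if not target_replicas:
        return []

    # Assumption: a replica with no reported offset sorts last.
    unknown = float("inf")
    best = min(local_log_start_offsets.get(r, unknown) for r in target_replicas)

    def sort_key(replica: int) -> float:
        offset = local_log_start_offsets.get(replica, unknown)
        # Assumption: offsets within `threshold` of the lowest one are
        # treated as ties, so small retention timing differences do not
        # reorder the candidates.
        if offset - best <= threshold:
            return best
        return offset

    # sorted() is stable, so tied replicas keep their original
    # preference order.
    return sorted(target_replicas, key=sort_key)


offsets = {1: 150, 2: 100, 3: 5000}
print(order_election_candidates([1, 2, 3], offsets, threshold=100))  # [1, 2, 3]
print(order_election_candidates([1, 2, 3], offsets, threshold=0))    # [2, 1, 3]
```

With a threshold of 100, replicas 1 and 2 count as equivalent and the original preference order is kept; with a threshold of 0, replica 2 moves ahead of replica 1. Under this reading, the stable sort is what keeps the preferred replica in front whenever the offsets are close.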
