[
https://issues.apache.org/jira/browse/KAFKA-15354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755805#comment-17755805
]
Sagar Rao commented on KAFKA-15354:
-----------------------------------
[~dengziming], I took a look at this. I believe it happens because, when we
look for the first replica of a new partition
[here|https://github.com/apache/kafka/blob/trunk/metadata/src/main/java/org/apache/kafka/metadata/placement/StripedReplicaPlacer.java#L362],
we reset the index to 0 whenever the epochs don't match
[here|https://github.com/apache/kafka/blob/trunk/metadata/src/main/java/org/apache/kafka/metadata/placement/StripedReplicaPlacer.java#L190].
In the test case you supplied, when partition 2 is added, the epoch known to
the brokers in rack 1 is 1 but the incoming epoch is 2, so the index is reset
to 0. I think that's why broker 1 is assigned as the leader again in this
round. WDYT?
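
To make the suspected mechanism concrete, here is a toy model of it. This is only a sketch of the behavior described above, not the StripedReplicaPlacer code: EpochResetSketch, RackCursor and firstPicks are hypothetical names, and the real class may carry additional state that this model leaves out. It assumes a per-rack cursor whose index goes back to 0 every time it sees a new epoch, so the first pick in every epoch is the same broker.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: NOT the Kafka implementation. All names are hypothetical.
class EpochResetSketch {

    /** Per-rack cursor whose position is reset whenever a new epoch is seen. */
    static class RackCursor {
        private final List<Integer> brokers;
        private int epoch = 0;
        private int index = 0;

        RackCursor(List<Integer> brokers) {
            this.brokers = new ArrayList<>(brokers);
        }

        /** Returns the next broker for this epoch, or -1 if the rack is exhausted. */
        int next(int epoch) {
            if (this.epoch != epoch) {
                // Assumed behavior: an epoch mismatch rewinds the cursor to the start.
                this.epoch = epoch;
                this.index = 0;
            }
            if (index >= brokers.size()) {
                return -1;
            }
            return brokers.get(index++);
        }
    }

    /** Takes the first pick from a single rack for each epoch in 1..epochs. */
    static List<Integer> firstPicks(List<Integer> rackBrokers, int epochs) {
        RackCursor cursor = new RackCursor(rackBrokers);
        List<Integer> picks = new ArrayList<>();
        for (int epoch = 1; epoch <= epochs; epoch++) {
            picks.add(cursor.next(epoch));
        }
        return picks;
    }

    public static void main(String[] args) {
        // One rack holding brokers 0 and 1, asked for a first pick in 3 epochs.
        System.out.println(firstPicks(Arrays.asList(0, 1), 3)); // prints [0, 0, 0]
    }
}
```

In this model the second broker of the rack is never picked first, which matches the skew seen in the test output if the reset is indeed the cause.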
> Partition leader is not evenly distributed in kraft mode
> --------------------------------------------------------
>
> Key: KAFKA-15354
> URL: https://issues.apache.org/jira/browse/KAFKA-15354
> Project: Kafka
> Issue Type: Bug
> Reporter: Deng Ziming
> Priority: Major
>
> In StripedReplicaPlacerTest, the test below reproduces this bug.
> {code:java}
> @Test
> public void testReplicaDistribution() {
>     MockRandom random = new MockRandom();
>     StripedReplicaPlacer placer = new StripedReplicaPlacer(random);
>     // 4 partitions with 2 replicas each; brokers 0-1 are in rack "0", brokers 2-3 in rack "1"
>     TopicAssignment assignment = place(placer, 0, 4, (short) 2, Arrays.asList(
>         new UsableBroker(0, Optional.of("0"), false),
>         new UsableBroker(1, Optional.of("0"), false),
>         new UsableBroker(2, Optional.of("1"), false),
>         new UsableBroker(3, Optional.of("1"), false)));
>     System.out.println(assignment);
> }
> {code}
> In StripedReplicaPlacer, we only ensure that leaders are distributed evenly
> across racks; we don't ensure that they are distributed evenly across nodes.
> In the test above, we have 4 nodes (1 2 3 4) and create 4 partitions, but the
> leaders are 1 2 1 2. In ZK mode this is ensured, see
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-36+Rack+aware+replica+assignment