[ https://issues.apache.org/jira/browse/KAFKA-12900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Georgy updated KAFKA-12900:
---------------------------
    Attachment: KAFKA-12900.patch

> JBOD: Partitions count calculation does not take into account topic name
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-12900
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12900
>             Project: Kafka
>          Issue Type: Bug
>          Components: core, jbod
>    Affects Versions: 2.8.0
>            Reporter: Georgy
>            Priority: Major
>         Attachments: KAFKA-12900.patch
>
>
> Support for multiple data directories was implemented in
> [KAFKA-188|https://issues.apache.org/jira/browse/KAFKA-188]. New partitions
> are spread across the log dirs based on partition count: the log dir with
> the fewest partitions is selected as the next dir.
> The problem is that this calculation does not take topic names into account.
> As a result, the partitions of a "fat" topic can end up on fewer disks than
> they should.
> Example:
> Fat topic "F" with partitions: F1, F2, ... , F6
> Thin topic "t" with partitions: t1, t2, ... , t6
> Log dirs on broker: dir1, dir2, dir3
> What we have now in some cases:
> dir1: t1 t2 t4 t6
> dir2: F1 F3 F4 F5
> dir3: F2 t3 t5 F6
> The load is skewed (dir1 holds no "F" partitions, dir2 holds four), yet in
> terms of partition count it is "balanced" because every log dir holds the
> same number of partitions.
> It would be better to count, in each log dir, only the partitions of the
> topic whose partition is being created, and to select the log dir with the
> fewest partitions of that topic. The partitions from the example above could
> then be spread like this:
> dir1: t1 F1 t6 F6
> dir2: F2 t2 t4 F4
> dir3: F3 t3 t5 F5
> In my case there will be no skew because the producer's partitioner is "round
> robin" by default and partition sizes are the same.
> I've prepared a patch, please check it.
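
To make the difference between the two selection rules concrete, here is a minimal Java sketch. It is not the attached patch and not Kafka's actual code: the class and method names (LogDirChooser, pickByTotalCount, pickByTopicCount), the "<topic>-<id>" partition naming, and the tie-break on total count are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; names and layout are not taken from Kafka.
final class LogDirChooser {

    // Assumption: partition names look like "<topic>-<partitionId>".
    static String topicOf(String partition) {
        return partition.substring(0, partition.lastIndexOf('-'));
    }

    // Number of partitions of the given topic in one log dir.
    static long countForTopic(List<String> partitions, String topic) {
        return partitions.stream().filter(p -> topicOf(p).equals(topic)).count();
    }

    // Rule described as the current behaviour: fewest partitions overall.
    // Ties go to the first dir in iteration order. Returns null if there are no dirs.
    static String pickByTotalCount(Map<String, List<String>> dirs) {
        String best = null;
        for (String dir : dirs.keySet()) {
            if (best == null || dirs.get(dir).size() < dirs.get(best).size()) {
                best = dir;
            }
        }
        return best;
    }

    // Proposed rule: fewest partitions of the topic being placed.
    // Assumed tie-break: fewer partitions overall, then iteration order.
    static String pickByTopicCount(Map<String, List<String>> dirs, String topic) {
        String best = null;
        for (String dir : dirs.keySet()) {
            if (best == null) {
                best = dir;
                continue;
            }
            long candidate = countForTopic(dirs.get(dir), topic);
            long current = countForTopic(dirs.get(best), topic);
            boolean fewerOfTopic = candidate < current;
            boolean tieButEmptier = candidate == current
                    && dirs.get(dir).size() < dirs.get(best).size();
            if (fewerOfTopic || tieButEmptier) {
                best = dir;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // The skewed layout from the example, just before F6 is created.
        Map<String, List<String>> dirs = new LinkedHashMap<>();
        dirs.put("dir1", List.of("t-1", "t-2", "t-4", "t-6"));
        dirs.put("dir2", List.of("F-1", "F-3", "F-4", "F-5"));
        dirs.put("dir3", List.of("F-2", "t-3", "t-5"));

        System.out.println(pickByTotalCount(dirs));      // prints "dir3"
        System.out.println(pickByTopicCount(dirs, "F")); // prints "dir1"
    }
}
```

With the skewed layout above, the total-count rule sends F6 to dir3 because it has the fewest partitions, while the per-topic rule sends it to dir1, the only dir that holds no "F" partition yet.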