[
https://issues.apache.org/jira/browse/KAFKA-12900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Georgy updated KAFKA-12900:
---------------------------
Attachment: KAFKA-12900.patch
> JBOD: Partitions count calculation does not take into account topic name
> ------------------------------------------------------------------------
>
> Key: KAFKA-12900
> URL: https://issues.apache.org/jira/browse/KAFKA-12900
> Project: Kafka
> Issue Type: Bug
> Components: core, jbod
> Affects Versions: 2.8.0
> Reporter: Georgy
> Priority: Major
> Attachments: KAFKA-12900.patch
>
>
> In [KAFKA-188|https://issues.apache.org/jira/browse/KAFKA-188] support for
> multiple data directories was implemented. New partitions are spread across
> the log dirs based on partition counts: the log dir with the fewest
> partitions is selected as the next dir.
> The problem is that topic names are not taken into account in this
> calculation. As a result, the partitions of a "fat" topic can end up on
> fewer disks than they should.
> Example:
> Fat topic "F" with partitions: F1, F2, ... , F6
> Thin topic "t" with partitions: t1, t2, ... , t6
> Log dirs on broker: dir1, dir2, dir3
> What we have now in some cases:
> dir1: t1 t2 t4 t6
> dir2: F1 F3 F4 F5
> dir3: F2 t3 t5 F6
> There is a skew, but in terms of partition counts it is "balanced"
> because all of the log dirs have the same partition count.
> It would be better to count, in each log dir, only the partitions of the
> topic whose partition is about to be created. The log dir with the fewest
> partitions of that topic should then be the next one. As a result, the
> partitions from the example above could be spread like this:
> dir1: t1 F1 t6 F6
> dir2: F2 t2 t4 F4
> dir3: F3 t3 t5 F5
> In my case there would be no skew, because the producer's partitioner is
> "round robin" by default and partition sizes are the same.
> I've prepared a patch; please check it.
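
The two selection rules described above can be compared with a small simulation. This is only an illustrative sketch: it is neither Kafka's log manager code nor the attached patch. The creation order, the function names, and the tie-break (fewest total partitions, then log dir order) are assumptions made for the example; the issue text only states that the log dir with the fewest partitions of the topic should be chosen.

```python
LOG_DIRS = ["dir1", "dir2", "dir3"]

# Hypothetical creation order (an assumption): one interleaving of the
# F and t partitions that reproduces the skewed layout from the example.
CREATION_ORDER = [
    ("t", 1), ("F", 1), ("F", 2), ("t", 2), ("F", 3), ("t", 3),
    ("t", 4), ("F", 4), ("t", 5), ("t", 6), ("F", 5), ("F", 6),
]


def pick_by_total_count(assignment, topic):
    """Rule described as current: fewest partitions overall, topic ignored."""
    return min(LOG_DIRS, key=lambda d: len(assignment[d]))


def pick_by_topic_count(assignment, topic):
    """Proposed rule: fewest partitions of this topic.

    Ties are broken by total partition count (an assumption, not taken
    from the issue), then by position in LOG_DIRS.
    """
    def key(d):
        same_topic = sum(1 for t, _ in assignment[d] if t == topic)
        return (same_topic, len(assignment[d]))
    return min(LOG_DIRS, key=key)


def simulate(picker, order=CREATION_ORDER):
    """Place each partition in the dir chosen by `picker`, in creation order."""
    assignment = {d: [] for d in LOG_DIRS}
    for topic, index in order:
        assignment[picker(assignment, topic)].append((topic, index))
    return assignment


def fat_per_dir(assignment):
    """Number of partitions of the fat topic "F" in each log dir."""
    return {d: sum(1 for t, _ in parts if t == "F")
            for d, parts in assignment.items()}


if __name__ == "__main__":
    for picker in (pick_by_total_count, pick_by_topic_count):
        layout = simulate(picker)
        print(picker.__name__)
        for d in LOG_DIRS:
            print(" ", d + ":", " ".join(f"{t}{i}" for t, i in layout[d]))
```

With this particular order, the total-count rule yields the skewed layout from the example (dir1 holds only t partitions, dir2 only F partitions), while the per-topic rule places two F and two t partitions in every log dir. The exact placement within each dir differs from the hand-written layout in the issue, since that depends on creation order and tie-breaking.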
--
This message was sent by Atlassian Jira
(v8.3.4#803005)