[jira] [Updated] (KAFKA-12900) JBOD: Partitions count calculation does not take into account topic name

Georgy (Jira) Sat, 05 Jun 2021 17:02:04 -0700


     [ 
https://issues.apache.org/jira/browse/KAFKA-12900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Georgy updated KAFKA-12900:
---------------------------
    Attachment: KAFKA-12900.patch

> JBOD: Partitions count calculation does not take into account topic name
> ------------------------------------------------------------------------
>
>                 Key: KAFKA-12900
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12900
>             Project: Kafka
>          Issue Type: Bug
>          Components: core, jbod
>    Affects Versions: 2.8.0
>            Reporter: Georgy
>            Priority: Major
>         Attachments: KAFKA-12900.patch
>
>
> [KAFKA-188|https://issues.apache.org/jira/browse/KAFKA-188] implemented 
> support for multiple data directories. New partitions are spread across the 
> log dirs based on a partition count: the log dir holding the fewest 
> partitions is selected as the next dir.
> The problem is that this calculation does not take topic names into account. 
> As a result, the partitions of a "fat" topic can end up on fewer disks than 
> they should.
> Example:
> Fat topic "F" with partitions: F1, F2, ..., F6
> Thin topic "t" with partitions: t1, t2, ..., t6
> Log dirs on broker: dir1, dir2, dir3
> What we have now in some cases:
> dir1: t1  t2  t4  t6
> dir2: F1  F3  F4  F5
> dir3: F2  t3  t5  F6
> The layout is skewed (dir2 holds four of the six "F" partitions and dir1 
> holds none), yet by partition count it is "balanced" because all of the log 
> dirs hold the same number of partitions.
> It would be better to count, in each log dir, only the partitions of the 
> topic whose partition is about to be created, and then to select the log 
> dir with the fewest partitions of that topic. With that rule, the 
> partitions from the example above could be spread like this:
> dir1: t1  F1  t6  F6
> dir2: F2  t2  t4  F4
> dir3: F3  t3  t5  F5
> In my case there will be no skew, because the producer's partitioner is 
> "round robin" by default and the partition sizes are the same.
> I've prepared a patch; please check it.
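
The two selection rules described above can be compared with a small simulation. This is only an illustrative sketch, not the attached patch and not Kafka's actual code: the function names, the creation order, and the tie-breaking (first dir in listing order, then lowest total count for the per-topic rule) are all assumptions made for the example.

```python
# Illustrative model only: each log dir is a list of (topic, partition) pairs.
# Names and tie-breaking are hypothetical; they do not mirror Kafka's code.

def pick_by_total(dirs):
    """Current behaviour as described: dir with the fewest partitions overall."""
    return min(dirs, key=lambda d: len(dirs[d]))

def pick_by_topic(dirs, topic):
    """Proposed behaviour: dir with the fewest partitions of this topic.
    Ties fall back to the lowest total count (an assumption for this sketch)."""
    return min(
        dirs,
        key=lambda d: (sum(1 for t, _ in dirs[d] if t == topic), len(dirs[d])),
    )

def place(order, per_topic):
    """Assign partitions to three log dirs in the given creation order."""
    dirs = {"dir1": [], "dir2": [], "dir3": []}
    for topic, partition in order:
        target = pick_by_topic(dirs, topic) if per_topic else pick_by_total(dirs)
        dirs[target].append((topic, partition))
    return dirs

def topic_counts(dirs, topic):
    """Number of partitions of `topic` in each dir, in dir order."""
    return [sum(1 for t, _ in parts if t == topic) for parts in dirs.values()]

# One possible creation order (assumed) in which the two topics interleave.
ORDER = [("F", 1), ("t", 1), ("t", 2), ("F", 2), ("t", 3), ("t", 4),
         ("F", 3), ("t", 5), ("t", 6), ("F", 4), ("F", 5), ("F", 6)]

by_total = place(ORDER, per_topic=False)
by_topic = place(ORDER, per_topic=True)

print("total-count rule, F per dir:", topic_counts(by_total, "F"))
print("per-topic rule,   F per dir:", topic_counts(by_topic, "F"))
```

With this particular order, the total-count rule leaves every dir with four partitions but concentrates the "F" partitions on one dir, while the per-topic rule places two "F" and two "t" partitions on each dir. Other creation orders or tie-breaking choices would give different layouts.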



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
