[
https://issues.apache.org/jira/browse/KAFKA-188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14201821#comment-14201821
]
chenshangan edited comment on KAFKA-188 at 11/7/14 9:16 AM:
------------------------------------------------------------
Assigning a new partition to the dir with the fewest partitions works fine if
all of the partitions are more or less the same size. But if partition sizes
vary quite a lot, it will cause disk usage imbalance. So it's better to take
the disk usage into account.
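
To make the difference between the two policies concrete, here is a rough
sketch in Java. It is not taken from any of the attached patches; the class
DirChooser and both method names are made up for illustration, and the
per-dir figures are passed in as plain maps. In a real broker the byte counts
would have to come from somewhere else, for example by summing log segment
sizes.

```java
import java.util.Map;

// Hypothetical helper for illustration only; not Kafka's actual code.
final class DirChooser {

    // Count-based policy: pick the data dir that holds the fewest partitions.
    // Returns null if the map is empty.
    static String byFewestPartitions(Map<String, Integer> partitionCounts) {
        String best = null;
        for (Map.Entry<String, Integer> e : partitionCounts.entrySet()) {
            if (best == null || e.getValue() < partitionCounts.get(best)) {
                best = e.getKey();
            }
        }
        return best;
    }

    // Usage-based policy: pick the data dir with the fewest bytes in use.
    // Returns null if the map is empty.
    static String byLeastUsedBytes(Map<String, Long> usedBytes) {
        String best = null;
        for (Map.Entry<String, Long> e : usedBytes.entrySet()) {
            if (best == null || e.getValue() < usedBytes.get(best)) {
                best = e.getKey();
            }
        }
        return best;
    }
}
```

With two partitions totalling 900 bytes on one dir and three partitions
totalling 30 bytes on another, the count-based policy picks the first dir and
the usage-based policy picks the second, which is the imbalance described
above.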
> Support multiple data directories
> ---------------------------------
>
> Key: KAFKA-188
> URL: https://issues.apache.org/jira/browse/KAFKA-188
> Project: Kafka
> Issue Type: New Feature
> Reporter: Jay Kreps
> Assignee: Jay Kreps
> Fix For: 0.8.0
>
> Attachments: KAFKA-188-v2.patch, KAFKA-188-v3.patch,
> KAFKA-188-v4.patch, KAFKA-188-v5.patch, KAFKA-188-v6.patch,
> KAFKA-188-v7.patch, KAFKA-188-v8.patch, KAFKA-188.patch
>
>
> Currently we allow only a single data directory. This means that a multi-disk
> configuration needs to be a RAID array or LVM volume or something like that
> to be mounted as a single directory.
> For a high-throughput low-reliability configuration this would mean RAID0
> striping. Common wisdom in Hadoop land has it that a JBOD setup that just
> mounts each disk as a separate directory and does application-level balancing
> over these results in about a 30% write improvement. For example, see this claim
> here:
> http://old.nabble.com/Re%3A-RAID-vs.-JBOD-p21466110.html
> It is not clear to me why this would be the case--it seems the RAID
> controller should be able to balance writes as well as the application so it
> may depend on the details of the setup.
> Nonetheless, this would be really easy to implement: all you need to do is add
> multiple data directories and balance partition creation over these disks.
> One problem this might cause: if a particular topic is much larger than the
> others, it might unbalance the load across the disks. The partition->disk
> assignment policy should probably attempt to evenly spread each topic to
> avoid this, rather than just trying to keep the number of partitions balanced
> between disks.
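
The per-topic spreading suggested at the end of the description could look
roughly like the sketch below. This is only one possible reading of it, not
the policy from the attached patches: TopicAwareChooser and choose are
hypothetical names, and each dir is modelled as a list with one topic name
per partition it already holds. The dir with the fewest partitions of the
given topic wins, and ties go to the dir with fewer partitions overall.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not Kafka's actual code.
final class TopicAwareChooser {

    // topicsPerDir maps each data dir to the topic names of the partitions
    // it already holds (one list entry per partition).
    // Returns null if the map is empty.
    static String choose(String topic, Map<String, List<String>> topicsPerDir) {
        String best = null;
        int bestSameTopic = Integer.MAX_VALUE;
        int bestTotal = Integer.MAX_VALUE;
        for (Map.Entry<String, List<String>> e : topicsPerDir.entrySet()) {
            int sameTopic = Collections.frequency(e.getValue(), topic);
            int total = e.getValue().size();
            // Prefer fewer partitions of this topic; break ties on total count.
            if (sameTopic < bestSameTopic
                    || (sameTopic == bestSameTopic && total < bestTotal)) {
                best = e.getKey();
                bestSameTopic = sameTopic;
                bestTotal = total;
            }
        }
        return best;
    }
}
```

Under this sketch a new partition of a large topic goes to a dir that holds
none of that topic yet, even when that dir already has more partitions in
total.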