[ https://issues.apache.org/jira/browse/KAFKA-188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jay Kreps updated KAFKA-188:
----------------------------
Attachment: KAFKA-188-v2.patch
Updated patch:
1. Split HWM file per data directory
2. Move to a "least partitions" partition assignment strategy
3. Add a unit test for the assignment strategy
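For readers skimming the thread, here is a minimal sketch of what a "least
partitions" assignment could look like. It is not the code from the patch: the
function names, the directory paths, and the tie-breaking rule (first directory
listed wins) are all assumptions made for illustration.

```python
from collections import Counter


def pick_data_dir(data_dirs, assignments):
    """Return the data directory that currently holds the fewest partitions.

    data_dirs:   ordered list of directory paths (hypothetical names).
    assignments: dict mapping (topic, partition) -> directory.
    Ties go to the directory listed first (an assumption, not from the patch).
    """
    counts = Counter(assignments.values())
    # Counter returns 0 for directories that hold no partitions yet.
    return min(data_dirs, key=lambda d: counts[d])


def assign(data_dirs, assignments, topic, partition):
    """Place a new partition and record where it went."""
    chosen = pick_data_dir(data_dirs, assignments)
    assignments[(topic, partition)] = chosen
    return chosen


if __name__ == "__main__":
    dirs = ["/data1", "/data2", "/data3"]
    placed = {}
    for p in range(6):
        print(p, assign(dirs, placed, "events", p))
```

Starting from empty directories this behaves like round robin, which matches
the expectation that the two strategies rarely differ.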
I think I may have also fixed the transient failure in
LogManager.testTimeBasedFlush, though it remains a time-bomb due to its
reliance on the scheduler and wall-clock time.
One thing to think about is that "least partitions" does have a few corner
cases of its own. In general it won't differ much from round robin. Where it
will differ is when we add a new data directory to an existing server or lose a
single data directory on a server. In this case ALL new partitions will be
created in the empty data directory until its partition count catches up with
the others. The problem this could create is that any new topic created during
this period will have all of its partitions assigned to the empty data dir,
which may lead to an imbalance of load. Despite this, I think this strategy is
better than (1) round robin, (2) RAID, or (3) something more complicated we
might think of now.
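The corner case can be illustrated with a small simulation. This is only a
sketch: the partition counts and directory names are made up, and ties are
assumed to go to the first directory listed.

```python
def least_partitions(counts):
    """Pick the directory with the fewest partitions (first one wins ties)."""
    return min(counts, key=counts.get)


def simulate(counts, new_partitions):
    """Place new partitions one at a time and return where each one landed."""
    counts = dict(counts)  # work on a copy so the caller's dict is untouched
    placements = []
    for _ in range(new_partitions):
        chosen = least_partitions(counts)
        counts[chosen] += 1
        placements.append(chosen)
    return placements, counts


if __name__ == "__main__":
    # Two directories already hold 4 partitions each; /data3 was just added.
    existing = {"/data1": 4, "/data2": 4, "/data3": 0}
    placements, final = simulate(existing, 6)
    print(placements)
    print(final)
```

With these numbers the first four new partitions all land in /data3, so a
four-partition topic created right after the directory was added would live
entirely on one disk. Only once the counts are level do placements spread out
again.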
This patch is ready for review.
> Support multiple data directories
> ---------------------------------
>
> Key: KAFKA-188
> URL: https://issues.apache.org/jira/browse/KAFKA-188
> Project: Kafka
> Issue Type: New Feature
> Reporter: Jay Kreps
> Attachments: KAFKA-188.patch, KAFKA-188-v2.patch
>
>
> Currently we allow only a single data directory. This means that a multi-disk
> configuration needs to be a RAID array or LVM volume or something like that
> to be mounted as a single directory.
> For a high-throughput low-reliability configuration this would mean RAID0
> striping. Common wisdom in Hadoop land has it that a JBOD setup that just
> mounts each disk as a separate directory and does application-level balancing
> over these results in about 30% write-improvement. For example see this claim
> here:
> http://old.nabble.com/Re%3A-RAID-vs.-JBOD-p21466110.html
> It is not clear to me why this would be the case--it seems the RAID
> controller should be able to balance writes as well as the application so it
> may depend on the details of the setup.
> Nonetheless this would be really easy to implement: all you need to do is add
> multiple data directories and balance partition creation over these disks.
> One problem this might cause is if a particular topic is much larger than the
> others it might unbalance the load across the disks. The partition->disk
> assignment policy should probably attempt to evenly spread each topic to
> avoid this, rather than just trying to keep the number of partitions balanced
> between disks.
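The per-topic spreading suggested in the issue description could be sketched
roughly as follows. This is an illustration only, not something taken from
either patch: the function name, the paths, and the choice to break ties by
total partition count are assumptions.

```python
from collections import Counter


def pick_dir_for_topic(data_dirs, assignments, topic):
    """Prefer the directory holding the fewest partitions of this topic.

    assignments: dict mapping (topic, partition) -> directory.
    Ties are broken by total partition count, then by list order
    (both tie-breaks are assumptions made for this sketch).
    """
    per_topic = Counter(d for (t, _), d in assignments.items() if t == topic)
    total = Counter(assignments.values())
    return min(data_dirs, key=lambda d: (per_topic[d], total[d]))


if __name__ == "__main__":
    dirs = ["/data1", "/data2"]
    # /data1 already holds three partitions of another topic.
    placed = {("logs", 0): "/data1", ("logs", 1): "/data1", ("logs", 2): "/data1"}
    for p in range(2):
        d = pick_dir_for_topic(dirs, placed, "big")
        placed[("big", p)] = d
        print("big", p, d)
```

In this example the two partitions of the new topic end up on different disks,
whereas a policy that only counts total partitions would put both on /data2.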