[jira] [Updated] (KAFKA-188) Support multiple data directories

Jay Kreps (JIRA) Mon, 29 Oct 2012 15:12:14 -0700

     [ 
https://issues.apache.org/jira/browse/KAFKA-188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


Jay Kreps updated KAFKA-188:
----------------------------

    Attachment: KAFKA-188-v6.patch

Okay this patch addresses Jun's comments:

50. The zeros are actually correct. Basically I am initializing to 0 and then 
overwriting with the count if there is one. The goal is to ensure that there is 
an entry for each directory even if it has no logs (otherwise it would never 
get any logs assigned). It is possible to do this with some kind of case 
statement, but I think this is more readable.

51. Okay I used the logic above. The logic is now slightly different from what 
was there before. Now I filter any partition which has no replica from the 
file. I also filter any replica which has no log, though my understanding is 
that that shouldn't happen.

52. Left this. The idea is that previously for tests you could do new 
Properties so it makes sense to be able to do new VerifiableProperties. Not 
essential so happy either way.
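
To illustrate the zero-initialization described in point 50, here is a minimal sketch. It is not the code from the patch: it is written in Python for brevity, and the function name `next_log_dir` and its arguments are hypothetical. It assumes the directory for a new log is the one currently holding the fewest logs.

```python
from collections import Counter


def next_log_dir(log_dirs, existing_log_parents):
    """Pick the configured data directory that currently holds the fewest logs.

    log_dirs: configured data directories (hypothetical argument).
    existing_log_parents: the parent directory of each existing log.
    """
    # Start every configured directory at zero so that a directory with no
    # logs still has an entry and can be chosen.
    counts = {d: 0 for d in log_dirs}
    # Overwrite the zero with the real count wherever logs already exist.
    counts.update(Counter(p for p in existing_log_parents if p in counts))
    # Fewest logs wins; on a tie, min() keeps the directory listed first.
    return min(log_dirs, key=lambda d: counts[d])


print(next_log_dir(["/d1", "/d2", "/d3"], ["/d1", "/d1", "/d2"]))
```

Without the zero entries, `/d3` would never appear in the counts and so would never receive a log.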
                
> Support multiple data directories
> ---------------------------------
>
>                 Key: KAFKA-188
>                 URL: https://issues.apache.org/jira/browse/KAFKA-188
>             Project: Kafka
>          Issue Type: New Feature
>            Reporter: Jay Kreps
>         Attachments: KAFKA-188.patch, KAFKA-188-v2.patch, KAFKA-188-v3.patch, 
> KAFKA-188-v4.patch, KAFKA-188-v5.patch, KAFKA-188-v6.patch
>
>
> Currently we allow only a single data directory. This means that a multi-disk 
> configuration needs to be a RAID array or LVM volume or something like that 
> to be mounted as a single directory.
> For a high-throughput low-reliability configuration this would mean RAID0 
> striping. Common wisdom in Hadoop land has it that a JBOD setup that just 
> mounts each disk as a separate directory and does application-level balancing 
> over these results in about 30% write-improvement. For example see this claim 
> here:
>   http://old.nabble.com/Re%3A-RAID-vs.-JBOD-p21466110.html
> It is not clear to me why this would be the case--it seems the RAID 
> controller should be able to balance writes as well as the application so it 
> may depend on the details of the setup.
> Nonetheless this would be really easy to implement: all you need to do is add 
> multiple data directories and balance partition creation over these disks.
> One problem this might cause is if a particular topic is much larger than the 
> others it might unbalance the load across the disks. The partition->disk 
> assignment policy should probably attempt to evenly spread each topic to 
> avoid this, rather than just trying to keep the number of partitions balanced 
> between disks.
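
One possible reading of the topic-aware policy suggested in the description, as a hedged sketch rather than what any patch on this issue does: prefer the directory with the fewest partitions of the topic being created, and fall back to the total partition count only to break ties. The name `choose_dir` and the `(topic, directory)` pair representation are assumptions made for illustration.

```python
def choose_dir(log_dirs, placements, topic):
    """Choose a data directory for a new partition of `topic`.

    log_dirs: configured data directories.
    placements: (topic, directory) pairs for partitions that already exist.
    """
    # Every configured directory starts at zero in both tallies.
    per_topic = {d: 0 for d in log_dirs}
    total = {d: 0 for d in log_dirs}
    for t, d in placements:
        if d not in total:
            continue  # ignore placements outside the configured directories
        total[d] += 1
        if t == topic:
            per_topic[d] += 1
    # Spread this topic evenly first, then balance overall partition count.
    return min(log_dirs, key=lambda d: (per_topic[d], total[d]))


placements = [("big", "/d1"), ("small", "/d2"), ("small", "/d2")]
print(choose_dir(["/d1", "/d2"], placements, "big"))
```

In this example a policy that only balanced partition counts would pick `/d1` (one partition versus two), stacking a second partition of the large topic on the same disk; the topic-aware key picks `/d2` instead.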

