[jira] [Commented] (KAFKA-188) Support multiple data directories

Jun Rao (JIRA) Tue, 30 Oct 2012 07:54:17 -0700

    [ https://issues.apache.org/jira/browse/KAFKA-188?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13486921#comment-13486921 ]


Jun Rao commented on KAFKA-188:
-------------------------------

Our system tests fail with the latest patch. To reproduce, run:

python -B system_test_runner.py 2>&1 | tee test.out

I saw the following in the broker log:

[2012-10-30 07:40:19,682] FATAL Fatal error during KafkaServerStable startup. Prepare to shutdown (kafka.server.KafkaServerStartable)
java.io.IOException: No such file or directory
        at java.io.UnixFileSystem.createFileExclusively(Native Method)
        at java.io.File.createNewFile(File.java:883)
        at kafka.utils.FileLock.<init>(FileLock.scala:12)
        at kafka.log.LogManager$$anonfun$10.apply(LogManager.scala:64)
        at kafka.log.LogManager$$anonfun$10.apply(LogManager.scala:64)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
        at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:206)
        at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:34)
        at scala.collection.mutable.ArrayOps.foreach(ArrayOps.scala:34)
        at scala.collection.TraversableLike$class.map(TraversableLike.scala:206)
        at scala.collection.mutable.ArrayOps.map(ArrayOps.scala:34)
        at kafka.log.LogManager.<init>(LogManager.scala:64)
        at kafka.server.KafkaServer.startup(KafkaServer.scala:60)
        at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:34)
        at kafka.Kafka$.main(Kafka.scala:46)
        at kafka.Kafka.main(Kafka.scala)
[2012-10-30 07:40:19,683] INFO [Kafka Server 1], shutting down (kafka.server.KafkaServer)
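
The trace shows File.createNewFile failing with "No such file or directory" inside the FileLock constructor. On Unix that error is what an exclusive create reports when the parent directory does not exist, so one plausible reading (an assumption, since the patch itself is not shown here) is that the lock file is created before its log directory. The Python sketch below illustrates the ordering only. It is not Kafka's code: create_lock_file and the ".lock" file name are hypothetical, and it creates the file without taking an OS-level lock.

```python
import os

LOCK_FILE_NAME = ".lock"  # hypothetical name, not taken from the patch


def create_lock_file(log_dir, name=LOCK_FILE_NAME):
    """Create log_dir if needed, then exclusively create a lock file in it.

    Returns True if the file was created and False if it already existed,
    mirroring the boolean result of java.io.File.createNewFile.
    """
    # Without this call, the exclusive create below fails with ENOENT
    # ("No such file or directory") when log_dir is missing.
    os.makedirs(log_dir, exist_ok=True)
    path = os.path.join(log_dir, name)
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False
    os.close(fd)
    return True
```

If this reading is right, the equivalent change on the JVM side would be to make sure each configured data directory exists before constructing its lock.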

                
> Support multiple data directories
> ---------------------------------
>
>                 Key: KAFKA-188
>                 URL: https://issues.apache.org/jira/browse/KAFKA-188
>             Project: Kafka
>          Issue Type: New Feature
>            Reporter: Jay Kreps
>         Attachments: KAFKA-188.patch, KAFKA-188-v2.patch, KAFKA-188-v3.patch, 
> KAFKA-188-v4.patch, KAFKA-188-v5.patch, KAFKA-188-v6.patch
>
>
> Currently we allow only a single data directory. This means that a multi-disk 
> configuration needs to be a RAID array or LVM volume or something like that 
> to be mounted as a single directory.
> For a high-throughput, low-reliability configuration this would mean RAID0 
> striping. Common wisdom in Hadoop land has it that a JBOD setup that just 
> mounts each disk as a separate directory and does application-level balancing 
> over these results in about a 30% improvement in write performance. For 
> example, see this claim here:
>   http://old.nabble.com/Re%3A-RAID-vs.-JBOD-p21466110.html
> It is not clear to me why this would be the case--it seems the RAID 
> controller should be able to balance writes as well as the application so it 
> may depend on the details of the setup.
> Nonetheless, this would be really easy to implement: all you need to do is add 
> multiple data directories and balance partition creation over these disks.
> One problem is that if a particular topic is much larger than the others, it 
> might unbalance the load across the disks. The partition->disk assignment 
> policy should probably attempt to spread each topic evenly to avoid this, 
> rather than just trying to keep the number of partitions balanced between 
> disks.
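
The per-topic spreading policy suggested in the quoted description could be sketched roughly as follows. This is an illustration in Python, not the implementation in any of the attached patches. The names choose_data_dir and assign_partition and the in-memory assignments mapping are hypothetical, and the tie-break on total partition count is an added assumption.

```python
def choose_data_dir(data_dirs, assignments, topic):
    """Pick the data directory for a new partition of `topic`.

    assignments maps a directory to a list with one topic name per
    partition it already hosts (hypothetical bookkeeping structure).
    """
    def load(data_dir):
        hosted = assignments.get(data_dir, [])
        # Primary key: partitions of this topic already in the directory.
        # Secondary key: total partitions, to keep overall counts even.
        return (hosted.count(topic), len(hosted))

    # min() returns the first directory with the smallest load, so ties
    # go to the directory listed earliest in data_dirs.
    return min(data_dirs, key=load)


def assign_partition(data_dirs, assignments, topic):
    """Choose a directory for one new partition and record the choice."""
    chosen = choose_data_dir(data_dirs, assignments, topic)
    assignments.setdefault(chosen, []).append(topic)
    return chosen
```

For example, calling assign_partition six times for one large topic over three directories places two of its partitions in each directory, so that topic's load is spread across all disks even if other topics are added later.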

