Improve the disk utilization of HDFS
------------------------------------

                 Key: HDFS-738
                 URL: https://issues.apache.org/jira/browse/HDFS-738
             Project: Hadoop HDFS
          Issue Type: Improvement
            Reporter: Zheng Shao


The HDFS data node currently assigns writers to disks randomly. This is fine when 
there are a large number of concurrent readers/writers on a single data node, but 
it can create a lot of contention when there are only, say, 4 readers/writers on 
a 4-disk node.

A better approach is to introduce a base class DiskHandler that registers all disk 
operations (reads/writes) and selects the best disk for writing a new block. A good 
strategy for the DiskHandler would be to direct writes to the disks with more free 
space and less recent activity. Many strategies are possible.

This could significantly improve HDFS multi-threaded write throughput. We are 
seeing <25MB/s/disk on a 4-disk/node, 4-node cluster (replication already 
accounted for) with 8 concurrent writers (24 writers counting replication). I 
believe we can roughly double that.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
