DFS read performance suboptimal when client co-located on nodes with data
-------------------------------------------------------------------------
Key: HADOOP-4801
URL: https://issues.apache.org/jira/browse/HADOOP-4801
Project: Hadoop Core
Issue Type: Improvement
Components: dfs
Affects Versions: 0.19.0
Reporter: George Porter
One of the major strategies Hadoop uses to achieve scalable data processing is
moving the code to the data. However, putting the DFSClient on the same
physical node as the data blocks it acts on doesn't improve read performance as
much as expected.
After looking at Hadoop and O/S traces (via HADOOP-4049), I think the problem
is due to the HDFS streaming protocol causing many more read I/O operations
(iops) than necessary. Consider the case of a DFSClient fetching a 64-MB disk
block from the DataNode process (running in a separate JVM) on the same
machine. The DataNode satisfies the single block request by sending
data back to the HDFS client in 64-KB chunks. In BlockSender.java, this is
done in the sendChunk() method, relying on Java's transferTo() method.
Depending on the host O/S and JVM implementation, transferTo() is implemented
as either a sendfilev() syscall or an mmap()/write() pair; in either case,
each chunk is read from disk with a separate I/O operation. The result is
that the single request for a 64-MB block hits the disk as over a thousand
smaller 64-KB requests.
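To make the chunking concrete, here is a minimal sketch of such a chunked
transferTo() loop; it is not the actual BlockSender code, and the class name,
CHUNK_SIZE constant, and file-name argument are illustrative only:
{code:java}
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;

public class ChunkedSendSketch {
  private static final int CHUNK_SIZE = 64 * 1024;   // 64-KB chunks

  // Sends one block file to the client channel, one chunk at a time.
  // Each transferTo() call may become a separate sendfilev() or
  // mmap()/write() underneath, i.e. a separate disk iop per chunk.
  static void sendBlock(String blockFile, WritableByteChannel out)
      throws IOException {
    try (FileInputStream in = new FileInputStream(blockFile)) {
      FileChannel ch = in.getChannel();
      long pos = 0;
      long len = ch.size();                          // e.g. 64 MB
      while (pos < len) {
        long sent = ch.transferTo(pos, Math.min(CHUNK_SIZE, len - pos), out);
        if (sent <= 0) {
          break;                                     // defensive: avoid spinning
        }
        pos += sent;                                 // ~1024 iterations for 64 MB
      }
    }
  }
}
{code}
The loop count tracks the iop count directly: a 64-MB block at 64 KB per
iteration means on the order of a thousand trips through transferTo().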
Since the DFSClient runs in a different JVM and process than the DataNode,
shuttling data from the disk to the DFSClient also incurs context switches
each time network packets are sent (in this case, each 64-KB chunk turns into
a large number of 1500-byte packet send operations). Thus we see a large
number of context switches for each block send operation.
I'd like to get some feedback on the best way to address this, but I think
the answer is to provide a mechanism for a DFSClient to directly open data
blocks that happen to be on the same machine. It could do this by examining
the set of LocatedBlocks returned by the NameNode and marking those that
should be resident on the local host. Since the DataNode and DFSClient
(probably) share the same hadoop configuration, the DFSClient should be able
to find the files holding the block data and open them directly, reading the
data itself. This would avoid the context switches imposed by the network
layer, and would allow for much larger read buffers than 64 KB, which should
reduce the number of iops imposed by each block read operation.
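As a hedged sketch of what that local-read path might look like: none of
these names exist in Hadoop today; isLocal(), openLocalBlock(), the flat
blk_<id> lookup (real data directories nest block files under current/), and
the 4-MB buffer size are all assumptions for illustration:
{code:java}
import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;

public class LocalBlockReaderSketch {
  private static final int READ_BUFFER = 4 * 1024 * 1024;  // far larger than 64 KB

  // Does this block live on the local host?  Hypothetical check against
  // the datanode hostnames the NameNode returned in the LocatedBlock.
  static boolean isLocal(String[] blockHosts) throws IOException {
    String me = InetAddress.getLocalHost().getHostName();
    for (String h : blockHosts) {
      if (me.equals(h)) {
        return true;
      }
    }
    return false;
  }

  // Look for the block file under the data directories taken from the
  // shared hadoop configuration (dfs.data.dir) and open it with a large
  // read buffer, bypassing the DataNode's streaming protocol entirely.
  // (Simplified: assumes block files sit flat in each data directory.)
  static InputStream openLocalBlock(String[] dataDirs, long blockId)
      throws IOException {
    for (String dir : dataDirs) {
      File f = new File(dir, "blk_" + blockId);
      if (f.exists()) {
        return new BufferedInputStream(new FileInputStream(f), READ_BUFFER);
      }
    }
    throw new IOException("block " + blockId + " not found locally");
  }
}
{code}
With a read path like this, the per-block read cost becomes a handful of
large sequential reads instead of a thousand 64-KB network round trips.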