[ 
https://issues.apache.org/jira/browse/HADOOP-4801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12656701#action_12656701
 ] 

George Porter commented on HADOOP-4801:
---------------------------------------

Thanks for the comments.  I'm working on a macro benchmark with a much larger 
dataset now.  I think that some of the performance gain as the number of reads 
increases is due to reads hitting an already-warm cache (that would explain 
why more reads == higher performance, even on larger files).  I definitely 
agree that adding additional datapaths is not desirable.  However, if local 
reads are the common case, then we may want to look into optimizing that 
case.  Like I said, I'm not really attached to this approach (it's mostly a 
hack), but I wanted to at least get a sense of what gains might be possible as 
a point of reference, especially if we up the number of disks per core to the 
double digits (e.g., 16 disks per core).

In terms of security, wouldn't the ideal approach be to encrypt each data 
block with its own key?  The namenode would keep that key as part of the HDFS 
metadata, and when an appropriately authorized client issues a 
getBlockLocations(), the keys would be sent back to the client.  Then the 
DataNodes wouldn't have to worry about security at all (it would be enforced 
in the namenode, a logically centralized place).
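
To make the per-block key idea concrete, here is a minimal sketch.  It is 
hypothetical throughout: PerBlockKeySketch and its methods do not exist in 
Hadoop, the in-memory map only stands in for NameNode metadata, keyFor() 
stands in for keys riding along with getBlockLocations(), and AES-GCM from 
the standard javax.crypto API is just one assumed choice of cipher.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; none of this is an existing Hadoop API.
final class PerBlockKeySketch {
    private static final int IV_BYTES = 12;   // IV length commonly used with GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    // Stand-in for NameNode metadata: block id -> that block's key.
    private final Map<Long, SecretKey> blockKeys = new HashMap<>();

    // Called when a block is allocated: generate and remember its key.
    SecretKey createKeyForBlock(long blockId) {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            SecretKey key = gen.generateKey();
            blockKeys.put(blockId, key);
            return key;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Stand-in for handing keys back to an authorized client.
    SecretKey keyFor(long blockId, boolean authorized) {
        if (!authorized) {
            throw new SecurityException("not authorized for block " + blockId);
        }
        SecretKey key = blockKeys.get(blockId);
        if (key == null) {
            throw new IllegalArgumentException("unknown block " + blockId);
        }
        return key;
    }

    // Client side, before writing: the result is IV followed by ciphertext.
    static byte[] encrypt(SecretKey key, byte[] plain) {
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv);
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = c.doFinal(plain);
            byte[] out = new byte[IV_BYTES + ct.length];
            System.arraycopy(iv, 0, out, 0, IV_BYTES);
            System.arraycopy(ct, 0, out, IV_BYTES, ct.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Client side, after reading: the DataNode only ever saw the stored bytes.
    static byte[] decrypt(SecretKey key, byte[] stored) {
        try {
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.DECRYPT_MODE, key,
                   new GCMParameterSpec(TAG_BITS, stored, 0, IV_BYTES));
            return c.doFinal(stored, IV_BYTES, stored.length - IV_BYTES);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

In this sketch the DataNode never holds a key, so whatever is left on its 
disks is unreadable without going back through the namenode.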

What's also nice about that is that if the DataNodes were to be compromised, 
hacked, or just decommissioned, you wouldn't have to worry about leftover data 
floating around out there.  This would especially be true in virtual 
datacenter or datacenter-on-demand environments, where machines are spun up 
for peak loads and then released when no longer needed.

> DFS read performance suboptimal when client co-located on nodes with data
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-4801
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4801
>             Project: Hadoop Core
>          Issue Type: Improvement
>          Components: dfs
>    Affects Versions: 0.19.0
>            Reporter: George Porter
>         Attachments: HADOOP-4801.1.patch
>
>
> One of the major strategies Hadoop uses to get scalable data processing is to 
> move the code to the data.  However, putting the DFS client on the same 
> physical node as the data blocks it acts on doesn't improve read performance 
> as much as expected.
> After looking at Hadoop and O/S traces (via HADOOP-4049), I think the problem 
> is due to the HDFS streaming protocol causing many more read I/O operations 
> (iops) than necessary.  Consider the case of a DFSClient fetching a 64 MB 
> disk block from the DataNode process (running in a separate JVM) running on 
> the same machine.  The DataNode will satisfy the single disk block request by 
> sending data back to the HDFS client in 64-KB chunks.  In BlockSender.java, 
> this is done in the sendChunk() method, relying on Java's transferTo() 
> method.  Depending on the host O/S and JVM implementation, transferTo() is 
> implemented as either a sendfilev() syscall or a pair of mmap() and write().  
> In either case, each chunk is read from the disk by issuing a separate I/O 
> operation for each chunk.  The result is that the single request for a 64-MB 
> block ends up hitting the disk as over a thousand smaller requests for 64-KB 
> each.
> Since the DFSClient runs in a different JVM and process than the DataNode, 
> shuttling data from the disk to the DFSClient also results in context 
> switches each time network packets get sent (in this case, the 64-KB chunk 
> turns into a large number of 1500 byte packet send operations).  Thus we see 
> a large number of context switches for each block send operation.
> I'd like to get some feedback on the best way to address this, but I think 
> providing a mechanism for a DFSClient to directly open data blocks that 
> happen to be on the same machine would help.  It could do this by examining 
> the set of LocatedBlocks returned by the NameNode, marking those that should 
> be resident on the local host.  Since the DataNode and DFSClient (probably) 
> share the same hadoop configuration, the DFSClient should be able to find 
> the files holding the block data, and it could directly open them and send 
> data back to the client.  This would avoid the context switches imposed by 
> the network layer, and would allow for much larger read buffers than 64 KB, 
> which should reduce the number of iops imposed by each read block operation.
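
The counts in the description quoted above can be checked with a little 
arithmetic.  This is only a sketch: ReadOpEstimate is a hypothetical helper, 
the 4 MB buffer is an assumed example of a "much larger" read buffer, and 
treating 1500 bytes as the full payload of each packet ignores protocol 
headers.

```java
// Hypothetical helper for back-of-the-envelope counts; not part of Hadoop.
final class ReadOpEstimate {
    // Number of operations needed to move totalBytes at bytesPerOp each,
    // rounding up so a partial final operation is counted.
    static long opsFor(long totalBytes, long bytesPerOp) {
        if (totalBytes < 0 || bytesPerOp <= 0) {
            throw new IllegalArgumentException("sizes must be positive");
        }
        return (totalBytes + bytesPerOp - 1) / bytesPerOp;
    }

    public static void main(String[] args) {
        long block = 64L * 1024 * 1024;       // 64 MB block
        long chunk = 64L * 1024;              // 64 KB chunk per sendChunk()
        long packet = 1500L;                  // packet size named in the issue
        long bigBuffer = 4L * 1024 * 1024;    // assumed larger local read buffer

        System.out.println("chunk reads per block:    " + opsFor(block, chunk));
        System.out.println("packets per chunk:        " + opsFor(chunk, packet));
        System.out.println("reads with larger buffer: " + opsFor(block, bigBuffer));
    }
}
```

Under these assumptions one 64 MB block costs 1024 chunk-sized reads, each 
chunk is split into 44 packets, and a 4 MB buffer would cut the read count 
per block to 16.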

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
