[ https://issues.apache.org/jira/browse/HDFS-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737713#action_12737713 ]

Jay Booth commented on HDFS-516:
--------------------------------

Here's an architectural overview and a general request for comments. I'll be 
away and busy for the next few days, but I should be able to get back to this 
in the middle of next week.

The basic design: I created a RadFileSystem (Random Access Distributed FS) 
which wraps DistributedFileSystem and delegates to it for everything except 
getFSDataInputStream. That method returns a custom FSDataInputStream which 
wraps a CachingByteService, which in turn wraps a RadFSByteService. The 
caching byte services share a cache managed by the RadFSClient class (that 
responsibility could maybe be factored out and moved into RadFileSystem 
instead). On a read, they first try the cache; on a miss, they call the 
underlying RadFSClientByteService to fetch the requested page plus a few 
pages of lookahead. The RadFSClientByteService calls the namenode to get the 
appropriate block locations (todo: cache these effectively) and then calls 
RadNode, which is embedded in the DataNode via ServicePlugin and maintains an 
IPC server and a set of FileChannels to the local blocks. On repeated 
requests for the same data, the RadFSClient tends to favor the same host, 
figuring that the benefit of hitting that DataNode's OS cache for the given 
bytes outweighs the latency penalty of hopping a rack (untested assumption).
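
To make the client read path concrete, here's a rough sketch of the 
cache-then-fetch-with-lookahead logic in Java. The names and signatures 
(ByteService, getPage, LOOKAHEAD_PAGES, the page-keyed LRU map) are 
illustrative assumptions, not the actual API from radfs.patch:

import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Rough sketch only: names and signatures are illustrative, not the
// actual API from radfs.patch.
interface ByteService {
  // Returns one fixed-size page of the file, or null past end of file.
  byte[] getPage(String file, long pageIndex);
}

class CachingByteServiceSketch implements ByteService {
  private static final int LOOKAHEAD_PAGES = 4;  // assumed lookahead depth

  private final ByteService underlying;           // e.g. the RadFS client service
  private final Map<String, byte[]> sharedCache;  // shared LRU page cache

  CachingByteServiceSketch(ByteService underlying, final int maxPages) {
    this.underlying = underlying;
    // A LinkedHashMap in access order gives a simple LRU eviction policy.
    this.sharedCache = Collections.synchronizedMap(
        new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
          protected boolean removeEldestEntry(Map.Entry<String, byte[]> e) {
            return size() > maxPages;
          }
        });
  }

  public byte[] getPage(String file, long pageIndex) {
    String key = file + "#" + pageIndex;
    byte[] page = sharedCache.get(key);
    if (page != null) {
      return page;  // cache hit
    }
    // Cache miss: fetch the requested page plus a few pages of lookahead.
    for (long i = pageIndex; i <= pageIndex + LOOKAHEAD_PAGES; i++) {
      byte[] fetched = underlying.getPage(file, i);
      if (fetched == null) {
        break;  // past end of file
      }
      sharedCache.put(file + "#" + i, fetched);
    }
    return sharedCache.get(key);
  }
}

And on the server side, the RadNode essentially serves positional reads out 
of open FileChannels. A minimal sketch of that, again with assumed names:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

// Sketch of how a RadNode-style server might serve one page of a local
// block file with a positional NIO read. Names are assumptions.
class BlockPageReader {
  private final FileChannel channel;  // held open across requests

  BlockPageReader(String blockFilePath) throws IOException {
    this.channel = FileChannel.open(Paths.get(blockFilePath),
                                    StandardOpenOption.READ);
  }

  // Reads up to pageSize bytes at offset; a positional read doesn't move a
  // shared stream position, so concurrent page requests don't interfere.
  byte[] readPage(long offset, int pageSize) throws IOException {
    ByteBuffer buf = ByteBuffer.allocate(pageSize);
    int n = channel.read(buf, offset);  // positional read
    if (n <= 0) {
      return new byte[0];  // past end of block
    }
    buf.flip();
    byte[] out = new byte[n];
    buf.get(out);
    return out;
  }
}

The positional read is what lets one open channel per block serve many 
concurrent page requests without any seek coordination.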

The intended use case is pretty different from MapReduce, so I think this 
should be a contrib module that clients have to invoke explicitly. It clearly 
underperforms DFS for streaming, but it should significantly outperform it 
for random reads (though I haven't tested extensively outside of localhost). 
For files with 'hot paths', such as Lucene indices or binary search over a 
regular file, the cache hit percentage is likely to be high, so it should 
perform well. Currently it makes a fresh request to the NameNode for every 
read, which is inefficient but more likely to be correct. Going forward, I'd 
like to tighten that up, make sure it plays nicely with append, and get it 
into a future Hadoop release.
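
For the "explicitly invoked by clients" part, usage might look something 
like the following. The radfs:// scheme, the fs.radfs.impl key, and the 
package name are placeholders for illustration, not necessarily what the 
patch wires up:

import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RadFsReadExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Hypothetical wiring: register the contrib filesystem under its own
    // scheme so only clients that ask for it get it.
    conf.set("fs.radfs.impl", "org.apache.hadoop.hdfs.radfs.RadFileSystem");

    FileSystem fs = FileSystem.get(URI.create("radfs://namenode:8020/"), conf);
    FSDataInputStream in = fs.open(new Path("/indexes/segments.gen"));
    try {
      byte[] buf = new byte[1024];
      in.readFully(12345L, buf);  // positioned random read; cheap when the page is hot
    } finally {
      in.close();
    }
  }
}

The point is that nothing changes for existing DFS users; you only get the 
caching behavior if you open the file through the separate scheme.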

> Low Latency distributed reads
> -----------------------------
>
>                 Key: HDFS-516
>                 URL: https://issues.apache.org/jira/browse/HDFS-516
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Jay Booth
>            Priority: Minor
>         Attachments: radfs.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I created a method for low latency random reads using NIO on the server side 
> and simulated OS paging with LRU caching and lookahead on the client side.  
> Some applications could include Lucene searching (term->doc and doc->offset 
> mappings are likely to be in local cache, thus much faster than Nutch's 
> current FsDirectory impl) and binary search through record files (bytes at 
> the 1/2, 1/4, 1/8 marks are likely to be cached).

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
