[jira] Commented: (HDFS-516) Low Latency distributed reads

Raghu Angadi (JIRA) Mon, 03 Aug 2009 13:31:38 -0700

    [ 
https://issues.apache.org/jira/browse/HDFS-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738565#action_12738565
 ]


Raghu Angadi commented on HDFS-516:
-----------------------------------

Jay, random read is an increasingly important feature for HDFS to 
support. Currently, latency is the biggest drawback; see HDFS-236. It is good 
to see your work on this. You could also run the simple benchmark in HDFS-236, 
which does simple random reads on a file and does not depend on a sequence file.

From your architecture description, this reduces latency through the following 
improvements:
  
   * Connection caching (through RPC).
   * File channel caching on the server.
   * Local cache on the client.
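
As a rough illustration of the second item, here is a minimal sketch of 
server-side file channel caching. It is not taken from radfs.patch; the class 
name FileChannelCache, its methods, and the maxOpen limit are hypothetical, 
and it assumes block files are plain local files served with positional reads.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical cache of open FileChannels, evicted in least-recently-used order. */
final class FileChannelCache {
    private final LinkedHashMap<Path, FileChannel> channels;

    FileChannelCache(final int maxOpen) {
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.channels = new LinkedHashMap<Path, FileChannel>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Path, FileChannel> eldest) {
                if (size() <= maxOpen) {
                    return false;
                }
                try {
                    // Close the evicted channel so file descriptors do not leak.
                    eldest.getValue().close();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
                return true;
            }
        };
    }

    /**
     * Reads into dst starting at the given file position, reusing an open
     * channel when one is cached. May return fewer bytes than dst can hold.
     * The lock is held during the read to keep the sketch simple, so reads
     * are serialized.
     */
    synchronized int read(Path file, long position, ByteBuffer dst) throws IOException {
        FileChannel channel = channels.get(file);
        if (channel == null) {
            channel = FileChannel.open(file, StandardOpenOption.READ);
            channels.put(file, channel);
        }
        // A positional read does not move the channel's own position.
        return channel.read(dst, position);
    }

    /** Number of channels currently held open. */
    synchronized int openCount() {
        return channels.size();
    }
}
```

The point of the sketch is only that a repeated random read on the same file 
skips the open() call; a real datanode would also need to handle block 
deletion and finer-grained locking.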

These are complementary to the existing datanode. It might be a lot simpler to 
add these features to the existing implementation rather than requiring a user 
to choose an implementation based on the access pattern. As it stands, you will 
have to re-implement many features: BlockLocations on the client, CRC 
verification, efficient bulk transfers (AVRO-24), etc.




> Low Latency distributed reads
> -----------------------------
>
>                 Key: HDFS-516
>                 URL: https://issues.apache.org/jira/browse/HDFS-516
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Jay Booth
>            Priority: Minor
>         Attachments: radfs.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I created a method for low latency random reads using NIO on the server side 
> and simulated OS paging with LRU caching and lookahead on the client side.  
> Some applications could include Lucene searching (term->doc and doc->offset 
> mappings are likely to be in local cache, thus much faster than Nutch's 
> current FsDirectory impl) and binary search through record files (bytes at 
> 1/2, 1/4, 1/8 marks are likely to be cached).
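
The client-side idea in the description above (LRU caching of pages with 
lookahead) can be sketched roughly as follows. This is an illustration, not 
the code in radfs.patch: PageFetcher and LruPageCache are hypothetical names, 
and the sketch assumes fixed-size pages and a caller that never reads past 
the end of the file.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical page source: fetches one fixed-size page from the remote server. */
interface PageFetcher {
    byte[] fetch(long pageIndex);
}

/** Hypothetical client-side page cache with LRU eviction and simple lookahead. */
final class LruPageCache {
    private final PageFetcher fetcher;
    private final int pageSize;
    private final int lookahead;
    private final LinkedHashMap<Long, byte[]> pages;

    LruPageCache(PageFetcher fetcher, int pageSize, final int maxPages, int lookahead) {
        this.fetcher = fetcher;
        this.pageSize = pageSize;
        this.lookahead = lookahead;
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.pages = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > maxPages;
            }
        };
    }

    /** Returns the byte at the given file offset, fetching pages on a miss. */
    synchronized byte readByte(long offset) {
        long index = offset / pageSize;
        byte[] page = pages.get(index);
        if (page == null) {
            // Miss: prefetch the following pages, farthest first, so that the
            // requested page ends up as the most recently used entry.
            for (long i = index + lookahead; i > index; i--) {
                if (!pages.containsKey(i)) {
                    pages.put(i, fetcher.fetch(i));
                }
            }
            page = fetcher.fetch(index);
            pages.put(index, page);
        }
        return page[(int) (offset % pageSize)];
    }
}
```

With a cache like this, a binary search over a record file touches the same 
pages at the 1/2, 1/4, 1/8 marks on every lookup, so those pages tend to stay 
resident and only the final few probes go to the server.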

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
