[ https://issues.apache.org/jira/browse/HDFS-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738565#action_12738565 ]
Raghu Angadi commented on HDFS-516:
-----------------------------------

Jay,

Random read is an increasingly important feature for HDFS to support. Currently latency is the biggest drawback. See HDFS-236.

It is good to see your work on this. You could also run the simple benchmark in HDFS-236, which does simple random reads on a file and does not depend on a sequence file.

From your architecture description, this reduces latency through the following improvements:

* Connection caching (through RPC).
* File channel caching on the server.
* Local cache on the client.

These are complementary to the existing datanode. It might be a lot simpler to add these features to the existing implementation rather than requiring a user to choose an implementation based on the access pattern. As it stands, you will have to re-implement many features (BlockLocations on the client, CRC verification, efficient bulk transfers (AVRO-24), etc.).

> Low Latency distributed reads
> -----------------------------
>
>                 Key: HDFS-516
>                 URL: https://issues.apache.org/jira/browse/HDFS-516
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Jay Booth
>            Priority: Minor
>         Attachments: radfs.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I created a method for low latency random reads using NIO on the server side
> and simulated OS paging with LRU caching and lookahead on the client side.
> Some applications could include lucene searching (term->doc and doc->offset
> mappings are likely to be in local cache, thus much faster than nutch's
> current FsDirectory impl) and binary search through record files (bytes at
> 1/2, 1/4, 1/8 marks are likely to be cached)
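
For illustration, the client-side idea in the issue description (LRU caching with lookahead) could be sketched roughly as below. This is a minimal sketch under stated assumptions, not the code in radfs.patch: the class name LruPageCache, the PageFetcher interface, and the fixed page-index addressing are all hypothetical, and a real client would likely fetch the lookahead range in one remote call rather than page by page.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical client-side page cache: LRU eviction plus a fixed lookahead. */
public class LruPageCache {

    /** Hypothetical stand-in for a remote read of a single page. */
    public interface PageFetcher {
        byte[] fetch(long pageIndex);
    }

    private final int lookahead;
    private final PageFetcher fetcher;
    private final Map<Long, byte[]> pages;

    public LruPageCache(final int capacity, int lookahead, PageFetcher fetcher) {
        this.lookahead = lookahead;
        this.fetcher = fetcher;
        // accessOrder=true keeps entries ordered from least to most recently accessed;
        // removeEldestEntry then drops the least recently used page on overflow.
        this.pages = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns the requested page, loading it and the following pages on a miss. */
    public synchronized byte[] read(long pageIndex) {
        byte[] page = pages.get(pageIndex); // a hit also marks the page as recently used
        if (page != null) {
            return page;
        }
        // Miss: load the lookahead pages first so that the requested page
        // is inserted last and ends up as the most recently used entry.
        for (long i = pageIndex + 1; i <= pageIndex + lookahead; i++) {
            if (!pages.containsKey(i)) { // containsKey does not change access order
                pages.put(i, fetcher.fetch(i));
            }
        }
        page = fetcher.fetch(pageIndex);
        pages.put(pageIndex, page);
        return page;
    }

    /** Reports whether a page is currently cached, without touching LRU order. */
    public synchronized boolean isCached(long pageIndex) {
        return pages.containsKey(pageIndex);
    }
}
```

With this layout, a sequential scan after a random seek is served from the lookahead pages, while repeatedly visited pages (for example the midpoints of a binary search) tend to stay resident because each hit refreshes their position in the LRU order.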