[ https://issues.apache.org/jira/browse/HDFS-516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jay Booth updated HDFS-516:
---------------------------

    Attachment: hdfs-516-20090831.patch

New patch. The IPC server was too slow for I/O operations (roughly 40 times
slower than DFS without caching), so I wrote a custom ByteServer that is
streamlined to avoid object creation and byte copying wherever possible and
enables TCP_NODELAY by default. Client connections are pooled using
commons-pool. All serialization goes through static methods in
hdfs.rad.ByteServiceProtocol, which is faster than reflection. On my laptop in
pseudo-distributed mode, I'm seeing random searches run 5x faster than with
plain DFS.
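
A minimal sketch of what hand-written static serialization plus TCP_NODELAY
can look like. It assumes a hypothetical request layout (block ID, offset,
length); the actual fields in hdfs.rad.ByteServiceProtocol are not shown in
this comment. The encoder writes into a caller-owned ByteBuffer that is reused
across requests, so nothing is allocated per call and no reflection is used.

```java
import java.net.Socket;
import java.net.SocketException;
import java.nio.ByteBuffer;

/**
 * Hypothetical codec in the spirit of hdfs.rad.ByteServiceProtocol.
 * The (blockId, offset, length) layout is an assumption for illustration.
 */
final class ReadRequestCodec {
    static final int REQUEST_SIZE = Long.BYTES + Long.BYTES + Integer.BYTES; // 20 bytes

    private ReadRequestCodec() {}

    /** Encodes a request into a reused buffer: no reflection, no per-call allocation. */
    static void writeRequest(ByteBuffer buf, long blockId, long offset, int length) {
        buf.clear();
        buf.putLong(blockId).putLong(offset).putInt(length);
        buf.flip(); // position 0, limit 20: ready for a channel write
    }

    // Absolute reads decode fields in place without creating intermediate objects.
    static long blockId(ByteBuffer buf) { return buf.getLong(0); }
    static long offset(ByteBuffer buf)  { return buf.getLong(Long.BYTES); }
    static int length(ByteBuffer buf)   { return buf.getInt(2 * Long.BYTES); }

    /** Disables Nagle's algorithm so small request packets are sent immediately. */
    static void configure(Socket socket) throws SocketException {
        socket.setTcpNoDelay(true);
    }
}
```

Reusing one buffer per connection is what lets this path avoid garbage; a
pooled client connection would typically own its buffer for its lifetime.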

I refactored a fair amount on the client side and eliminated a few redundant
classes. I still need to move lookahead onto a separate thread in the caching
byteservice and tweak a couple of things in ByteServer for performance; after
that, this thing should be pretty fast. I'm going to run some benchmarks on EC2
tonight or tomorrow and see what I come up with.
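
One way the pending "lookahead on a separate thread" could be structured, as a
sketch under stated assumptions: the fetch function is a hypothetical stand-in
for a remote block read, the cache is an unbounded ConcurrentHashMap rather
than an LRU cache, and a single background thread prefetches the next
`lookahead` blocks after each read so the caller never waits on them.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.LongFunction;

/** Sketch: block reader whose lookahead runs on a background thread. */
final class LookaheadReader {
    private final Map<Long, byte[]> cache = new ConcurrentHashMap<>();
    private final LongFunction<byte[]> fetcher; // hypothetical remote block fetch
    private final int lookahead;                // blocks to prefetch after each read
    private final ExecutorService prefetcher = Executors.newSingleThreadExecutor(r -> {
        Thread t = new Thread(r, "lookahead");
        t.setDaemon(true); // prefetching must not keep the JVM alive
        return t;
    });
    private volatile Future<?> lastPrefetch;

    LookaheadReader(LongFunction<byte[]> fetcher, int lookahead) {
        this.fetcher = fetcher;
        this.lookahead = lookahead;
    }

    /** Returns a block (fetching synchronously on a miss), then queues lookahead. */
    byte[] read(long blockId) {
        byte[] data = cache.computeIfAbsent(blockId, fetcher::apply);
        long next = blockId + 1;
        lastPrefetch = prefetcher.submit(() -> {
            for (long b = next; b < next + lookahead; b++) {
                cache.computeIfAbsent(b, fetcher::apply); // no-op if already cached
            }
        });
        return data;
    }

    /** Waits for queued prefetches; one worker thread means tasks finish in order. */
    void awaitPrefetch() {
        Future<?> f = lastPrefetch;
        if (f == null) return;
        try {
            f.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
    }

    void close() { prefetcher.shutdownNow(); }
}
```

A single prefetch thread keeps lookahead ordered and cheap; a real version
would also bound the cache and cancel stale prefetches when access jumps.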

I also migrated the unit tests to JUnit 4 and added some Javadoc; I probably
missed a bunch of places, and all of it could certainly be expanded. I haven't
added the license header to every file yet (the license is explicitly granted
here); I'll get to that in the next patch.

> Low Latency distributed reads
> -----------------------------
>
>                 Key: HDFS-516
>                 URL: https://issues.apache.org/jira/browse/HDFS-516
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Jay Booth
>            Priority: Minor
>         Attachments: hdfs-516-20090824.patch, hdfs-516-20090831.patch, radfs.patch
>
>   Original Estimate: 168h
>  Remaining Estimate: 168h
>
> I created a method for low-latency random reads using NIO on the server side
> and simulated OS paging with LRU caching and lookahead on the client side.
> Some applications could include Lucene searching (term->doc and doc->offset
> mappings are likely to be in the local cache, and thus much faster than
> Nutch's current FsDirectory impl) and binary search through record files
> (bytes at the 1/2, 1/4, 1/8 marks are likely to be cached).
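
The client-side "simulated paging" can be sketched with a LinkedHashMap in
access order, which evicts the least recently used block once a capacity is
exceeded. The block size, capacity and fetch function here are hypothetical,
and the demo file in the test is synthetic (nondecreasing byte values) rather
than a real record file. It shows why binary searches share cached blocks:
every search over the same file probes the 1/2 mark first, then 1/4 or about
3/4, and so on.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.LongFunction;

/** Sketch of a client-side LRU block cache ("simulated OS paging"). */
final class LruBlockCache {
    private final int blockSize;
    private final LongFunction<byte[]> fetcher; // hypothetical: fetches one block by ID
    private final Map<Long, byte[]> blocks;
    int hits;
    int misses;

    LruBlockCache(int blockSize, int capacity, LongFunction<byte[]> fetcher) {
        this.blockSize = blockSize;
        this.fetcher = fetcher;
        // accessOrder=true: get() moves an entry to the tail, so the head is the LRU block.
        this.blocks = new LinkedHashMap<Long, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Long, byte[]> eldest) {
                return size() > capacity; // evict the LRU block once over capacity
            }
        };
    }

    /** Reads one byte at an absolute file offset, fetching its block on a miss. */
    byte readByte(long pos) {
        long blockId = pos / blockSize;
        byte[] block = blocks.get(blockId);
        if (block == null) {
            misses++;
            block = fetcher.apply(blockId);
            blocks.put(blockId, block);
        } else {
            hits++;
        }
        return block[(int) (pos % blockSize)];
    }

    /** Lower-bound binary search over a file whose bytes are nondecreasing. */
    static long lowerBound(LruBlockCache cache, long length, int key) {
        long lo = 0, hi = length;
        while (lo < hi) {
            long mid = (lo + hi) >>> 1; // first probe is always the 1/2 mark
            if (cache.readByte(mid) < key) lo = mid + 1; else hi = mid;
        }
        return lo;
    }
}
```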

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.