[jira] Commented: (HDFS-516) Low Latency distributed reads
[ https://issues.apache.org/jira/browse/HDFS-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12756214#action_12756214 ] Raghu Angadi commented on HDFS-516:

When you get a chance, please point me to the streaming test/benchmark.

bq. After I get those, my roadmap for this is to add checksum support and better DatanodeInfo caching. User groups would come after that.

Unless you want to add checksums for a better comparison, I don't think it is essential. You need not spend much time on getting feature parity with HDFS. For more users to benefit from your work, I think it is better to extract the features that are complementary to HDFS, and we can work on getting those into HDFS.

Low Latency distributed reads
-----------------------------

Key: HDFS-516
URL: https://issues.apache.org/jira/browse/HDFS-516
Project: Hadoop HDFS
Issue Type: New Feature
Reporter: Jay Booth
Priority: Minor
Attachments: hdfs-516-20090912.patch
Original Estimate: 168h
Remaining Estimate: 168h

I created a method for low latency random reads using NIO on the server side and simulated OS paging with LRU caching and lookahead on the client side. Some applications could include Lucene searching (term-doc and doc-offset mappings are likely to be in local cache, thus much faster than Nutch's current FsDirectory impl) and binary search through record files (bytes at the 1/2, 1/4 and 1/8 marks are likely to be cached).

--
This message is automatically generated by JIRA. You can reply to this email to add a comment to the issue online.
[jira] Commented: (HDFS-516) Low Latency distributed reads
[ https://issues.apache.org/jira/browse/HDFS-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755627#action_12755627 ] Raghu Angadi commented on HDFS-516:

bq. somehow, from 213 seconds to 112 seconds to stream 1GB from a remote HDFS file.

This is 5 MB/s for HDFS and 9 MB/s for RadFS. Assuming 9 MB/s is probably the 100 Mbps network limit (is it?), 5 MB/s is too low for any FS. Since both reads are from the same physical files, this may not be hardware related. Could you check what is causing this delay? It might be affecting the other benchmarks as well. Checking netstat on the client while the read is going on might help.

Regarding reads in RadFS: does the client fetch 32KB each time (a single RPC), or does it pipeline (multiple outstanding requests for a single client's stream)?

@Todd, I essentially see this as a POC of what could/should be improved in HDFS for addressing latency issues. Contrib makes sense, but I would not expect this to go to production in this form, and it should be marked 'Experimental'. The benchmarks also help greatly in setting priorities for features. I don't think this needs a branch since it does not touch core at all.
[jira] Commented: (HDFS-516) Low Latency distributed reads
[ https://issues.apache.org/jira/browse/HDFS-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755640#action_12755640 ] Todd Lipcon commented on HDFS-516:

bq. I essentially see this as POC of what could/should be improved in HDFS for addressing latency issues. Contrib makes sense, but I would not expect this to go to production in this form and should be marked 'Experimental'.

+1
[jira] Commented: (HDFS-516) Low Latency distributed reads
[ https://issues.apache.org/jira/browse/HDFS-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755823#action_12755823 ] Jay Booth commented on HDFS-516:

Yeah, I was puzzled by the performance too. I dug through the DFS code and I'm saving a bit on new socket and object creation, maybe a couple of instructions here and there, but that shouldn't add up to 100 seconds for a gigabyte (approx 20 blocks). I'm calling read() a bajillion times in a row, so it's conceivable (although unlikely) that I'm pegging the CPU and that's the limiting factor.

I'm busy for a couple of days but will get back to you with some figures from netstat, top and whatever else I can think of, along with another streaming case that works with read(b, off, len) to see if that changes things. I'll do a little more digging into DFS as well to see if I can isolate the cause. I definitely did run them several times on the same machine, and another time on a different cluster with similar results, so it wasn't simply bad luck on rack placement in EC2 (well, maybe, but unlikely). Will report back when I have more numbers.

After I get those, my roadmap for this is to add checksum support and better DatanodeInfo caching. User groups would come after that.
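The bulk-read streaming case Jay mentions (read(b, off, len) instead of per-byte read()) can be sketched as a plain-Java microbenchmark. This is an illustrative sketch, not code from the patch: the class name, buffer size, and file size are assumptions, and it reads a local temp file rather than HDFS.

```java
import java.io.*;

public class StreamBench {
    // Drain a stream using bulk read(b, off, len) calls, returning the
    // total number of bytes read. A per-byte read() loop would make one
    // method call per byte; this makes one call per buffer.
    public static long bulkRead(InputStream in, int bufSize) throws IOException {
        byte[] buf = new byte[bufSize];
        long total = 0;
        int n;
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Illustrative local file standing in for an HDFS stream.
        File f = File.createTempFile("streambench", ".dat");
        f.deleteOnExit();
        try (OutputStream out = new FileOutputStream(f)) {
            out.write(new byte[1 << 20]); // 1 MB of zeros
        }
        long t0 = System.nanoTime();
        long bytes;
        try (InputStream in = new BufferedInputStream(new FileInputStream(f))) {
            bytes = bulkRead(in, 32 * 1024); // 32KB buffer, matching the RPC size discussed
        }
        long elapsedMs = (System.nanoTime() - t0) / 1_000_000;
        System.out.println("read " + bytes + " bytes in " + elapsedMs + " ms");
    }
}
```

Timing both variants over the same file would make the streaming tradeoff concrete before re-running the cluster benchmark.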
[jira] Commented: (HDFS-516) Low Latency distributed reads
[ https://issues.apache.org/jira/browse/HDFS-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755048#action_12755048 ] Todd Lipcon commented on HDFS-516:

I haven't had a chance to look over the patch as of yet, but I have one concern: is there a plan for deprecation in the event that HDFS itself achieves similar performance? I think having an entirely separate FS implementation that differs only in performance is not a good idea long term. Using this contrib project as an experimentation ground sounds great, but long term I think we should focus on improving DistributedFileSystem's performance itself, and not bifurcate the code into a fast version that we don't really support because it's contrib and a slow version that we do. I'll try to find a chance to look over the patch soon, but in the meantime do you have any thoughts on the above?
[jira] Commented: (HDFS-516) Low Latency distributed reads
[ https://issues.apache.org/jira/browse/HDFS-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755033#action_12755033 ] Raghu Angadi commented on HDFS-516:

Hi Jay, I will go through the patch. I hope a few others get a chance to look at it as well. Since it is contrib, that certainly makes it easier to include in trunk. I am not sure about the 0.21 timeline.
[jira] Commented: (HDFS-516) Low Latency distributed reads
[ https://issues.apache.org/jira/browse/HDFS-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755120#action_12755120 ] Jay Booth commented on HDFS-516:

Hey Todd, in short, I agree: we should be looking at moving performance improvements over to the main FS implementation. Right now, my version doesn't support user permissions or checksumming. I'd say it makes sense to keep it in contrib as a sandbox for now, and work towards full compatibility with the main DFS implementation, at which point we could consider swapping in the new reading subsystem. User permissioning would require some model changes but should be workable; checksumming probably won't be too bad, if I read the code right.

So, I suppose: keep it in contrib as a sandbox initially, with an explicit goal of moving it over to DFS when it reaches compatibility. It doesn't really lend itself to moving over piecemeal, as it has several components which all pretty much need each other. However, it's pretty well integrated with the DFS API and only replaces one method on the filesystem class.
[jira] Commented: (HDFS-516) Low Latency distributed reads
[ https://issues.apache.org/jira/browse/HDFS-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12755121#action_12755121 ] Todd Lipcon commented on HDFS-516:

Jay: that sounds good to me. Let's explicitly mark the API as "Experimental and likely to disappear in a future release with little or no warning - use at your own risk" :) I think especially as security and authentication begin to take form in the next couple of months, it will be a headache to try to maintain this code. If branching in SVN weren't such a pain, I'd suggest we maintain this in a separate branch that didn't go into releases... c'est la vie ;-)
[jira] Commented: (HDFS-516) Low Latency distributed reads
[ https://issues.apache.org/jira/browse/HDFS-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12750246#action_12750246 ] Jay Booth commented on HDFS-516:

I did some benchmarking; here are the results. Each test ran 1000 searches to warm, then 5000 searches to benchmark: a binary search of a 20GB sorted sequence file of 20 million 1KB records. Tests were run from the namenode in a 4-node EC2 medium cluster with 1.7 GB of RAM each: 1 namenode and 3 datanodes.

From HDFS to a 512MB-cache RadFS there was a 4x average improvement in search times, from 102ms to 24ms. Each search was, theoretically, 24.25 reads (log2 of 20 million); not actually measured. I only ran each set once. The 90th percentile trends the right way, although the max is a little spiky. I'll add a 99th percentile in future benchmarks.

Search times (ms) over 5000 random searches per configuration:

HDFS baseline (org.apache.hadoop.hdfs.DistributedFileSystem):
  Mean: 102.178415  Variance: 5939.660105  Median: 97.0  Max: 3095.0  Min: 33.0  90th pct: 130.0

RadFS, no cache (org.apache.hadoop.hdfs.rad.RadFileSystem):
  Mean: 68.556402  Variance: 233.833586  Median: 67.0  Max: 379.0  Min: 26.0  90th pct: 79.0

RadFS, 16MB cache:
  Mean: 42.039799  Variance: 237.838184  Median: 40.0  Max: 203.0  Min: 5.0  90th pct: 59.0

RadFS, 128MB cache:
  Mean: 29.850601  Variance: 202.081896  Median: 27.0  Max: 203.0  Min: 1.0  90th pct: 45.0

RadFS, 512MB cache:
  Mean: 24.274601  Variance: 250.305256  Median: 22.0  Max: 687.0  Min: 0.0  90th pct: 36.0

I could still shave a point or two by cleaning up my caching system to be more graceful with its lookahead mechanism, but it's not bad for now. I'll pretty it up and post a first attempt at a final patch soon.
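The access pattern benchmarked above, a binary search that seeks to the midpoint of a range on each probe, can be sketched in plain Java against a local file of fixed-width sorted records. This is an illustrative sketch, not the benchmark code: the class name, record width, and key layout are assumptions, and RadFS would serve each positioned read over the network (with the early, oft-repeated probe offsets likely in cache).

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public class RecordBinarySearch {
    // Binary search over a file of fixed-width, lexicographically sorted
    // records. Each probe is one positioned read: seek to the midpoint
    // record, read it, compare, and halve the range. A 20-million-record
    // file needs about log2(20M) = 24.25 probes per search.
    public static long search(RandomAccessFile file, int recordLen, String key)
            throws IOException {
        long lo = 0;
        long hi = file.length() / recordLen - 1;
        byte[] rec = new byte[recordLen];
        while (lo <= hi) {
            long mid = (lo + hi) >>> 1;
            file.seek(mid * recordLen);   // positioned read at the probe offset
            file.readFully(rec);
            String k = new String(rec, StandardCharsets.US_ASCII).trim();
            int cmp = k.compareTo(key);
            if (cmp == 0) return mid;     // record index of the match
            if (cmp < 0) lo = mid + 1;
            else hi = mid - 1;
        }
        return -1; // key not present
    }
}
```

The first ~20 probes of every search land on the same handful of offsets (the 1/2, 1/4, 1/8 marks), which is why a warmed client-side cache helps this workload so much.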
[jira] Commented: (HDFS-516) Low Latency distributed reads
[ https://issues.apache.org/jira/browse/HDFS-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738565#action_12738565 ] Raghu Angadi commented on HDFS-516:

Jay, random read is an (increasingly more) important feature for HDFS to support. Currently latency is the biggest drawback; see HDFS-236. It is good to see your work on this. You could also run the simple benchmark in HDFS-236, which does a simple random read on a file and does not depend on a sequence file.

From your architecture description, this reduces latency through the following improvements:

* Connection caching (through RPC).
* FileChannel caching on the server.
* Local cache on the client.

These are complementary to the existing datanode. It might be a lot simpler to add these features to the existing implementation rather than requiring a user to choose an implementation based on the access pattern. As it stands, you will have to re-implement many features (BlockLocations on the client, CRC verification, efficient bulk transfers (AVRO-24), etc.).
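The third item, a local LRU cache on the client, can be sketched with java.util.LinkedHashMap's access-order mode in a few lines. This is an illustrative sketch of the general technique, not the RadFS implementation: the class name and entry-count eviction policy are assumptions (a real page cache would more likely bound total bytes).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal LRU cache: LinkedHashMap in access-order mode keeps the
// least-recently-used entry at the head, and removeEldestEntry evicts
// it once the cache exceeds its capacity.
public class LruPageCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    public LruPageCache(int maxEntries) {
        super(16, 0.75f, true); // true = access order: get() refreshes recency
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries; // evict the least-recently-used entry
    }
}
```

With a page index as the key and a page of bytes as the value, repeated probes of a binary search's hot offsets keep those pages resident while cold pages age out.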
[jira] Commented: (HDFS-516) Low Latency distributed reads
[ https://issues.apache.org/jira/browse/HDFS-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12738651#action_12738651 ] Jay Booth commented on HDFS-516:

Wow, thanks Raghu, that's awesome and will save me a ton of time. A couple of points for discussion:

* The random 4KB byte grabber is awesome and I will be using it as part of my benchmarking at the first opportunity. However, I think it's worth also testing some likely applications to really show the strength of client-side caching. 10MB or so of properly warmed cache could mean your first 20 lookups in a binary search are almost free, and having the frontmost 10% of a Lucene index in cache would mean that almost all of the scoring portion of the search is computed against local memory. Meanwhile, for truly random reads, having a cache that's, say, 5-10% of the size of the data will only get you a small improvement. So I'd like to get some numbers for use cases that really thrive on caching, in addition to truly random access. But that benchmark will be extremely useful for tuning the IO layer and establishing a baseline for cache-miss performance, so thanks for the heads up.

* I have a feeling that my implementation is significantly slower than the default when it comes to streaming, since it relies on successive small positioned reads and a heavy memory footprint rather than a simple stream of bytes. Watching my unit tests run on my laptop, with a ton of confounding factors, it sure seemed that way, although that's not a scientific measurement (one more item to benchmark). So while I agree with the urge for simplicity, I feel we need to make that performance tradeoff clear. Otherwise, we could have a lot of very slow MapReduce jobs happening. Given that MapReduce is the primary use case for Hadoop, my instinct was to make RadFileSystem a non-default implementation.

Point very well taken about the BlockLocations and CRC verification. Maybe the best way to handle future integration with DataNode would be to develop separately, reuse as much code as possible, and then, when RadFileSystem is mature and benchmarked, revisit a merge with DistributedFileSystem? Thanks again, I'll try to write a post later tonight with an explicit plan for benchmarking, and then people can comment and poke holes in it as they see fit.
[jira] Commented: (HDFS-516) Low Latency distributed reads
[ https://issues.apache.org/jira/browse/HDFS-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737713#action_12737713 ] Jay Booth commented on HDFS-516:

Here's an architectural overview and a general request for comments on the matter. I'll be away and busy for the next few days but should be able to get back to this in the middle of next week.

The basic workflow: I created a RadFileSystem (RandomAccessDistributed FS) which wraps DistributedFileSystem and delegates to it for everything except getFSDataInputStream. That returns a custom FSDataInputStream which wraps a CachingByteService, which itself wraps a RadFSByteService. The caching byte services share a cache which is managed by the RadFSClient class (I could maybe factor that away and put it in RadFileSystem instead). They try to hit the cache, and if they miss, they call the underlying RadFSClientByteService to get the requested page plus a few pages of lookahead. The RadFSClientByteService calls the namenode to get the appropriate block locations (todo: cache these effectively) and then calls RadNode, which is embedded in DataNode via ServicePlugin and maintains an IPC server and a set of FileChannels to the local blocks. On repeated requests for the same data, the RadFSClient tends to favor going to the same host, figuring that the benefit of hitting the DataNode's OS cache for the given bytes outweighs the penalty of hopping a rack (an untested assumption).

The intended use case is pretty different from MapReduce, so I think this should be a contrib module that has to be explicitly invoked by clients. It really underperforms DFS in terms of streaming, but should (I haven't tested extensively outside of localhost) significantly outperform it for random reads. For files with 'hot paths', such as Lucene indices or binary search over a normal file, the cache hit percentage is likely to be pretty high, so it should perform pretty well. Currently, it makes a fresh request to the NameNode for every read, which is inefficient but more likely to be correct. Going forward, I'd like to tighten this up, make sure it plays nice with append, and get it into a future Hadoop release.
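The miss-then-lookahead read path described above might look roughly like this in plain Java. Everything here is a hedged stand-in, not the patch's code: PageFetcher, PAGE_SIZE, and LOOKAHEAD are hypothetical names, and the unbounded HashMap stands in for the shared LRU cache managed by RadFSClient.

```java
import java.util.HashMap;
import java.util.Map;

public class CachingByteService {
    static final int PAGE_SIZE = 32 * 1024; // illustrative page size
    static final int LOOKAHEAD = 3;         // extra pages fetched per miss

    // Stand-in for the remote fetch (namenode lookup + RadNode IPC).
    interface PageFetcher {
        byte[] fetchPage(long pageIndex);
    }

    private final Map<Long, byte[]> cache = new HashMap<>();
    private final PageFetcher fetcher;

    public CachingByteService(PageFetcher fetcher) {
        this.fetcher = fetcher;
    }

    // Return the page covering byte offset pos: serve from the cache when
    // possible, otherwise fetch the wanted page plus LOOKAHEAD pages so
    // sequential-ish follow-up reads become cache hits.
    public byte[] readPage(long pos) {
        long page = pos / PAGE_SIZE;
        byte[] hit = cache.get(page);
        if (hit != null) {
            return hit; // served from local memory, no RPC
        }
        for (long p = page; p <= page + LOOKAHEAD; p++) {
            cache.putIfAbsent(p, fetcher.fetchPage(p));
        }
        return cache.get(page);
    }
}
```

The lookahead is what makes warmed hot paths cheap, and it is also why pure streaming suffers: each miss costs several small positioned fetches instead of one long sequential read.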
[jira] Commented: (HDFS-516) Low Latency distributed reads
[ https://issues.apache.org/jira/browse/HDFS-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737797#action_12737797 ] Tsz Wo (Nicholas), SZE commented on HDFS-516:

bq. ... svn diff misses new files ...

For new files, run "svn add /path/to/new/files" before "svn diff".
[jira] Commented: (HDFS-516) Low Latency distributed reads
[ https://issues.apache.org/jira/browse/HDFS-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737799#action_12737799 ] Jay Booth commented on HDFS-516:

Ok, thanks. Is one big patch preferred to the tiny patch + tarball?
[jira] Commented: (HDFS-516) Low Latency distributed reads
[ https://issues.apache.org/jira/browse/HDFS-516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12737806#action_12737806 ] Tsz Wo (Nicholas), SZE commented on HDFS-516:

bq. Ok, thanks, is one big patch preferred to the tiny patch + tarball?

Yes. Otherwise, the automated build won't work.