Todd is right on here - on a typical system I see HDFS reads in HBase in the millisecond range. Disk caching on the DataNode can help somewhat.
Ultimately the real long pole that keeps things slow is disk seeks. I have some plans to use DirectByteBuffers to underlie the LRU block cache, and it might be possible to use memory mapping to get a kind of paging effect, but I'm not sure it would be worth it - you are just trading one disk (complete with checksums!) for another (that doesn't have checksums). (Rough sketches of both the small byte-range read and a local mmap'd cache are appended at the end of this message.)

On Mon, Apr 26, 2010 at 3:47 PM, Todd Lipcon <t...@cloudera.com> wrote:
> On Mon, Apr 26, 2010 at 3:36 PM, Geoff Hendrey <ghend...@decarta.com> wrote:
>
>> Let me preface this by saying that you all know much better than I do
>> what is best. I'm very impressed by what you've done, and so this isn't
>> criticism. Far from it. It's just curiosity.
>>
>> Memory indexes are "decent", because while they are fast, they don't
>> scale. At some point you run out of RAM. Are you implementing an LRU
>> cache? Since the table is orders of magnitude larger than the memory
>> available on any region server (even accounting for the fact that a
>> region server needs to cache only its "shard"), it's hard to understand
>> how I could support a 100% cache hit rate for a TB-sized table and a
>> reasonable number of region servers.
>>
>> When you get a cache miss, and you almost always will when the table is
>> orders of magnitude larger than the cache, you need to read a whole
>> block out of HDFS.
>
> This is a common misconception about HDFS. There's no need to read an
> entire HDFS block at a time. Although the blocks may be 64MB+, you can
> certainly read very small byte ranges, and that's exactly what HBase does.
>
> For a more efficient method of accessing local data blocks, I did some
> initial experimentation in HDFS-347, but the speedup was not an order of
> magnitude.
>
> -Todd
>
>> My thought with memory mapping was, as you noted, *not* to try to map
>> files that are inside of HDFS, but rather to copy as many blocks as
>> possible out of HDFS onto region server filesystems, and memory map the
>> file on the region server. TB drives are now common. The virtual memory
>> system of the operating system manages paging in and out of "real"
>> memory off disk when you use memory mapping. My experience with memory
>> mapped ByteBuffers in Java is that they are very fast and scalable. By
>> fast, I mean I have clocked reads in the microseconds using nanoTime.
>> So I was just wondering why you wouldn't at least make a 2nd-level
>> cache with memory mapping.
>>
>> -geoff
>>
>> -----Original Message-----
>> From: Ryan Rawson [mailto:ryano...@gmail.com]
>> Sent: Monday, April 26, 2010 1:24 PM
>> To: hbase-user@hadoop.apache.org
>> Subject: Re: optimizing for random access
>>
>> HFile uses in-memory indexes so it needs only one seek to access data.
>> How is this only "decent"?
>>
>> As for memory mapped files, given that HDFS files are not local, we
>> can't mmap() them. However, HBase does block caching in memory to
>> reduce the trips to HDFS.
>>
>> -ryan
>>
>> On Mon, Apr 26, 2010 at 11:33 AM, Geoff Hendrey <ghend...@decarta.com> wrote:
>> > Hi,
>> >
>> > Any pointers on how to optimize HBase for random access? My
>> > understanding is that HFile is decent at random access. Why doesn't it
>> > use memory mapped I/O? (My reading on it indicated it uses "something
>> > like NIO".) I'd like my entire table to be distributed across region
>> > servers, so that random reads are quickly served by a region server
>> > without having to transfer a block from HDFS. Is this the right
>> > approach?
>> > I would have thought that some sort of memory-mapped region file
>> > would be perfect for this. Anyway, just looking to understand the
>> > best practice(s).
>> >
>> > -geoff
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
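
To make Todd's point above concrete, here is a minimal sketch of a positional read against HDFS using the stock FSDataInputStream pread API. This is not the actual HBase read path - the file path, offset, and block size below are made up purely for illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PreadSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Hypothetical store file path and offsets, only to show the API shape.
    Path storeFile = new Path("/hbase/usertable/region1234/family/storefile");
    long blockOffset = 123456789L;      // offset of the ~64KB HFile block we want
    byte[] block = new byte[64 * 1024]; // read just that block, not the 64MB+ HDFS block

    FSDataInputStream in = fs.open(storeFile);
    try {
      // Positional read: fetches only this byte range from the DataNode.
      in.readFully(blockOffset, block, 0, block.length);
    } finally {
      in.close();
    }
    System.out.println("read " + block.length + " bytes with one pread");
  }
}

The DataNode only has to ship roughly the requested byte range over the wire, so the 64MB+ HDFS block size doesn't dictate how much data a random get pulls.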
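And here is a minimal sketch of the kind of local, memory-mapped second-level cache Geoff describes: a store file copied onto the region server's own disk and mapped with java.nio. Again, this is not existing HBase code; the local path, map size, and record offset are hypothetical:

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class LocalMmapSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical local copy of a store file on the region server's own disk.
    RandomAccessFile raf = new RandomAccessFile("/data/hbase-l2-cache/storefile", "r");
    FileChannel channel = raf.getChannel();
    try {
      // Map up to 256MB of the file; the OS pages it in and out on demand.
      long mapSize = Math.min(channel.size(), 256L * 1024 * 1024);
      MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_ONLY, 0, mapSize);

      // Random-access a small record with no explicit read() call per lookup
      // (assumes the mapped region is at least a couple of MB).
      byte[] record = new byte[512];
      long start = System.nanoTime();
      map.position(1024 * 1024);   // hypothetical offset of the record
      map.get(record);
      long elapsed = System.nanoTime() - start;
      System.out.println("mmap read took " + elapsed + " ns");
    } finally {
      channel.close();
      raf.close();
    }
  }
}

Whether the paging behaviour you get "for free" from the mapping beats just going back to HDFS on a cache miss is the open question - as noted above, you are trading checksummed reads for unchecksummed ones on the local copy.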