On Mon, Apr 26, 2010 at 3:36 PM, Geoff Hendrey <ghend...@decarta.com> wrote:
> Let me preface this by saying that you all know much better than I do what is best. I'm very impressed by what you've done, and so this isn't criticism. Far from it. It's just curiosity.
>
> Memory indexes are "decent" because, while they are fast, they don't scale. At some point you run out of RAM. Are you implementing an LRU cache? Since the table is orders of magnitude larger than the memory available on any region server (even accounting for the fact that a region server needs to cache only its "shard"), it's hard to understand how I could support a 100% cache hit rate for a TB-sized table and a reasonable number of region servers.
>
> When you get a cache miss, and you almost always will when the table is orders of magnitude larger than the cache, you need to read a whole block out of HDFS.

This is a common misconception about HDFS. There's no need to read an entire HDFS block at a time. Although the blocks may be 64MB+, you can certainly read very small byte ranges, and that's exactly what HBase does.

For a more efficient method of accessing local data blocks, I did some initial experimentation in HDFS-347, but the speedup was not an order of magnitude.

-Todd

> My thought with memory mapping was, as you noted, *not* to try to map files that are inside of HDFS, but rather to copy as many blocks as possible out of HDFS onto region server filesystems and memory-map the file on the region server. TB drives are now common. The virtual memory system of the operating system manages paging in and out of "real" memory off disk when you use memory mapping. My experience with memory-mapped ByteBuffers in Java is that they are very fast and scalable. By fast, I mean I have clocked reads in the microseconds using nanoTime. So I was just wondering why you wouldn't at least make a second-level cache with memory mapping.
>
> -geoff
>
> -----Original Message-----
> From: Ryan Rawson [mailto:ryano...@gmail.com]
> Sent: Monday, April 26, 2010 1:24 PM
> To: hbase-user@hadoop.apache.org
> Subject: Re: optimizing for random access
>
> HFile uses in-memory indexes so that only one seek is needed to access data. How is this only "decent"?
>
> As for memory-mapped files, given that HDFS files are not local, we can't mmap() them. However, HBase does block caching in memory to reduce the trips to HDFS.
>
> -ryan
>
>
> On Mon, Apr 26, 2010 at 11:33 AM, Geoff Hendrey <ghend...@decarta.com> wrote:
> > Hi,
> >
> > Any pointers on how to optimize HBase for random access? My understanding is that HFile is decent at random access. Why doesn't it use memory-mapped I/O? (My reading on it indicated it uses "something like NIO".) I'd like my entire table to be distributed across region servers, so that random reads are quickly served by a region server without having to transfer a block from HDFS. Is this the right approach? I would have thought that some sort of memory-mapped region file would be perfect for this. Anyway, just looking to understand the best practice(s).
> >
> > -geoff

--
Todd Lipcon
Software Engineer, Cloudera
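
As a concrete illustration of the byte-range reads Todd describes: the minimal sketch below uses the standard Hadoop FileSystem API's positioned read to pull a small range out of a large HDFS file. The path, offset, and buffer size are hypothetical placeholders, and this is only a sketch of the general technique, not a description of HBase's internal read path.

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class RangeReadExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Hypothetical HFile path and offset, purely for illustration.
    Path path = new Path("/hbase/mytable/1234/family/somehfile");
    long offset = 123456789L;          // anywhere inside a 64MB+ block
    byte[] buf = new byte[64 * 1024];  // fetch only 64KB, not the whole block

    FSDataInputStream in = fs.open(path);
    try {
      // Positioned read: only this byte range travels from the DataNode.
      in.readFully(offset, buf, 0, buf.length);
    } finally {
      in.close();
    }
  }
}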
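
And a minimal sketch, under the same caveats, of the local memory-mapped read Geoff describes: map a file that has been copied out of HDFS onto the region server's local disk with java.nio and read at an arbitrary offset, letting the OS page cache handle paging. The local path and offsets are made-up placeholders; a single MappedByteBuffer is capped at 2GB, so a large file would need several mappings.

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class MmapReadExample {
  public static void main(String[] args) throws IOException {
    // Hypothetical local copy of an HFile, purely for illustration.
    RandomAccessFile raf = new RandomAccessFile("/data/local-cache/somehfile", "r");
    try {
      FileChannel ch = raf.getChannel();
      // Map the first 1GB here; a larger file would need several mappings.
      long mapSize = Math.min(ch.size(), 1L << 30);
      MappedByteBuffer map = ch.map(FileChannel.MapMode.READ_ONLY, 0, mapSize);

      byte[] value = new byte[128];
      map.position(42 * 1024);  // arbitrary offset inside the mapping
      map.get(value);           // served from the OS page cache when resident
    } finally {
      raf.close();
    }
  }
}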