Hey Todd, by saying that HDFS is able to read just small byte ranges, are you talking about the capability described in the original Bigtable paper? I mean the ability to read just part of a compressed SSTable block and use it in a block-cache kind of way. Thanks.
Renato M.

2010/4/26 Todd Lipcon <t...@cloudera.com>

> On Mon, Apr 26, 2010 at 3:36 PM, Geoff Hendrey <ghend...@decarta.com> wrote:
>
> > Let me preface this by saying that you all know much better than I do what
> > is best. I'm very impressed by what you've done, and so this isn't
> > criticism. Far from it. It's just curiosity.
> >
> > Memory indexes are "decent" because, while they are fast, they don't scale.
> > At some point you run out of RAM. Are you implementing an LRU cache? Since
> > the table is orders of magnitude larger than the memory available on any
> > region server (even accounting for the fact that a region server needs to
> > cache only its "shard"), it's hard to understand how I could support a 100%
> > cache hit rate for a TB-sized table and a reasonable number of region
> > servers.
> >
> > When you get a cache miss, and you almost always will when the table is
> > orders of magnitude larger than the cache, you need to read a whole block
> > out of HDFS.
>
> This is a common misconception about HDFS. There's no need to read an entire
> HDFS block at a time. Although the blocks may be 64MB+, you can certainly
> read very small byte ranges, and that's exactly what HBase does.
>
> For a more efficient method of accessing local data blocks, I did some
> initial experimentation in HDFS-347, but the speedup was not an order of
> magnitude.
>
> -Todd
>
> > My thought with memory mapping was, as you noted, *not* to try to map files
> > that are inside of HDFS, but rather to copy as many blocks as possible out
> > of HDFS onto region server filesystems, and memory map the file on the
> > region server. TB drives are now common. The virtual memory system of the
> > operating system manages paging in and out of "real" memory off disk when
> > you use memory mapping. My experience with memory-mapped ByteBuffer in Java
> > is that it is very fast and scalable. By fast, I mean I have clocked reads
> > in the microseconds using nanoTime. So I was just wondering why you
> > wouldn't at least make a second-level cache with memory mapping.
> >
> > -geoff
> >
> > -----Original Message-----
> > From: Ryan Rawson [mailto:ryano...@gmail.com]
> > Sent: Monday, April 26, 2010 1:24 PM
> > To: hbase-user@hadoop.apache.org
> > Subject: Re: optimizing for random access
> >
> > HFile uses in-memory indexes, so it needs only one seek to access data.
> > How is this only "decent"?
> >
> > As for memory-mapped files, given that HDFS files are not local, we can't
> > mmap() them. However, HBase does block caching in memory to reduce the
> > trips to HDFS.
> >
> > -ryan
> >
> > On Mon, Apr 26, 2010 at 11:33 AM, Geoff Hendrey <ghend...@decarta.com> wrote:
> > > Hi,
> > >
> > > Any pointers on how to optimize HBase for random access? My
> > > understanding is that HFile is decent at random access. Why doesn't it
> > > use memory-mapped I/O? (My reading on it indicated it uses "something
> > > like NIO".) I'd like my entire table to be distributed across region
> > > servers, so that random reads are quickly served by a region server
> > > without having to transfer a block from HDFS. Is this the right
> > > approach? I would have thought that some sort of memory-mapped region
> > > file would be perfect for this. Anyway, just looking to understand the
> > > best practice(s).
> > >
> > > -geoff
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
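
[Editor's note: for anyone following along, below is a minimal Java sketch of the two approaches discussed above. The first method uses the Hadoop FileSystem positioned-read API that Todd refers to (reading a small byte range without pulling the whole HDFS block); the second shows the kind of local memory-mapping Geoff describes, which only works on local files, which is Ryan's point about why HDFS files can't be mmap()ed directly. The file path and offsets are made up for illustration, and this is not HBase's actual read path.]

import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SmallRangeReadSketch {

  // Positioned read of a small byte range from a file in HDFS. Only roughly
  // 'length' bytes travel from the datanode, even though the underlying
  // HDFS block may be 64MB or larger.
  static byte[] readRange(FileSystem fs, Path path, long offset, int length)
      throws IOException {
    byte[] buf = new byte[length];
    try (FSDataInputStream in = fs.open(path)) {
      in.readFully(offset, buf, 0, length);
    }
    return buf;
  }

  // Geoff's alternative: memory-map a byte range of a *local* copy of the
  // data. The mapping stays valid after the channel is closed; the OS pages
  // the data in and out on demand.
  static MappedByteBuffer mapLocalRange(String localPath, long offset, int length)
      throws IOException {
    try (RandomAccessFile raf = new RandomAccessFile(localPath, "r");
         FileChannel ch = raf.getChannel()) {
      return ch.map(FileChannel.MapMode.READ_ONLY, offset, length);
    }
  }

  public static void main(String[] args) throws IOException {
    FileSystem fs = FileSystem.get(new Configuration());
    // Hypothetical HFile path and offsets, purely for illustration.
    byte[] block = readRange(fs, new Path("/hbase/usertable/region/cf/hfile"),
        1234567L, 64 * 1024);
    System.out.println("read " + block.length + " bytes");
  }
}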