Todd is right on here - on a typical system I see HDFS reads in HBase in the millisecond range. Disk caching on the DataNode can help somewhat.
Ultimately the real long pole that keeps things slow is disk seeks. I have some plans to use DirectByteBuffers to underlie the LRU block cache, and it might be possible to use memory mapping to get a kind of paging effect, but I'm not sure it would be worth it - you are just trading one disk (complete with checksums!) for another (that doesn't have checksums). (Rough sketches of both the small byte-range read and a local mmap'd cache are appended at the end of this message.)

On Mon, Apr 26, 2010 at 3:47 PM, Todd Lipcon <t...@cloudera.com> wrote:
> On Mon, Apr 26, 2010 at 3:36 PM, Geoff Hendrey <ghend...@decarta.com> wrote:
>
>> Let me preface this by saying that you all know much better than I do
>> what is best. I'm very impressed by what you've done, and so this isn't
>> criticism. Far from it. It's just curiosity.
>>
>> Memory indexes are "decent", because while they are fast, they don't
>> scale. At some point you run out of RAM. Are you implementing an LRU
>> cache? Since the table is orders of magnitude larger than the memory
>> available on any region server (even accounting for the fact that a
>> region server needs to cache only its "shard"), it's hard to understand
>> how I could support a 100% cache hit rate for a TB-sized table and a
>> reasonable number of region servers.
>>
>> When you get a cache miss, and you almost always will when the table is
>> orders of magnitude larger than the cache, you need to read a whole
>> block out of HDFS.
>
> This is a common misconception about HDFS. There's no need to read an
> entire HDFS block at a time. Although the blocks may be 64MB+, you can
> certainly read very small byte ranges, and that's exactly what HBase does.
>
> For a more efficient method of accessing local data blocks, I did some
> initial experimentation in HDFS-347, but the speedup was not an order of
> magnitude.
>
> -Todd
>
>> My thought with memory mapping was, as you noted, *not* to try to map
>> files that are inside of HDFS, but rather to copy as many blocks as
>> possible out of HDFS onto region server filesystems, and memory map the
>> file on the region server. TB drives are now common. The virtual memory
>> system of the operating system manages paging in and out of "real"
>> memory off disk when you use memory mapping. My experience with memory
>> mapped ByteBuffers in Java is that they are very fast and scalable. By
>> fast, I mean I have clocked reads in the microseconds using nanoTime.
>> So I was just wondering why you wouldn't at least make a 2nd-level
>> cache with memory mapping.
>>
>> -geoff
>>
>> -----Original Message-----
>> From: Ryan Rawson [mailto:ryano...@gmail.com]
>> Sent: Monday, April 26, 2010 1:24 PM
>> To: hbase-user@hadoop.apache.org
>> Subject: Re: optimizing for random access
>>
>> HFile uses in-memory indexes so it needs only one seek to access data.
>> How is this only "decent"?
>>
>> As for memory mapped files, given that HDFS files are not local, we
>> can't mmap() them. However, HBase does block caching in memory to
>> reduce the trips to HDFS.
>>
>> -ryan
>>
>> On Mon, Apr 26, 2010 at 11:33 AM, Geoff Hendrey <ghend...@decarta.com> wrote:
>> > Hi,
>> >
>> > Any pointers on how to optimize HBase for random access? My
>> > understanding is that HFile is decent at random access. Why doesn't it
>> > use memory mapped I/O? (My reading on it indicated it uses "something
>> > like NIO".) I'd like my entire table to be distributed across region
>> > servers, so that random reads are quickly served by a region server
>> > without having to transfer a block from HDFS. Is this the right
>> > approach?
>> > I would have thought that some sort of memory-mapped region file
>> > would be perfect for this. Anyway, just looking to understand the
>> > best practice(s).
>> >
>> > -geoff
>
> --
> Todd Lipcon
> Software Engineer, Cloudera
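
To make Todd's point above concrete, here is a minimal sketch of a positional read against HDFS using the stock FSDataInputStream pread API. This is not the actual HBase read path - the file path, offset, and block size below are made up purely for illustration:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PreadSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);

    // Hypothetical store file path and offsets, only to show the API shape.
    Path storeFile = new Path("/hbase/usertable/region1234/family/storefile");
    long blockOffset = 123456789L;      // offset of the ~64KB HFile block we want
    byte[] block = new byte[64 * 1024]; // read just that block, not the 64MB+ HDFS block

    FSDataInputStream in = fs.open(storeFile);
    try {
      // Positional read: fetches only this byte range from the DataNode.
      in.readFully(blockOffset, block, 0, block.length);
    } finally {
      in.close();
    }
    System.out.println("read " + block.length + " bytes with one pread");
  }
}

The DataNode only has to ship roughly the requested byte range over the wire, so the 64MB+ HDFS block size doesn't dictate how much data a random get pulls.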
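And here is a minimal sketch of the kind of local, memory-mapped second-level cache Geoff describes: a store file copied onto the region server's own disk and mapped with java.nio. Again, this is not existing HBase code; the local path, map size, and record offset are hypothetical:

import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class LocalMmapSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical local copy of a store file on the region server's own disk.
    RandomAccessFile raf = new RandomAccessFile("/data/hbase-l2-cache/storefile", "r");
    FileChannel channel = raf.getChannel();
    try {
      // Map up to 256MB of the file; the OS pages it in and out on demand.
      long mapSize = Math.min(channel.size(), 256L * 1024 * 1024);
      MappedByteBuffer map = channel.map(FileChannel.MapMode.READ_ONLY, 0, mapSize);

      // Random-access a small record with no explicit read() call per lookup
      // (assumes the mapped region is at least a couple of MB).
      byte[] record = new byte[512];
      long start = System.nanoTime();
      map.position(1024 * 1024);   // hypothetical offset of the record
      map.get(record);
      long elapsed = System.nanoTime() - start;
      System.out.println("mmap read took " + elapsed + " ns");
    } finally {
      channel.close();
      raf.close();
    }
  }
}

Whether the paging behaviour you get "for free" from the mapping beats just going back to HDFS on a cache miss is the open question - as noted above, you are trading checksummed reads for unchecksummed ones on the local copy.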