Re: HBase reading performance

Jean-Daniel Cryans Mon, 01 Mar 2010 19:17:38 -0800

Alvin,

That feature doesn't exist currently, and I don't see a clean way of
doing it, since those regions will change location over time (though on
a normal production system they shouldn't move much). But someone
motivated could do the following:

- Add a new method to HConnectionManager.TableServers that dumps the
content of its cachedRegionLocations map to a specified table/row
key/fam:qual
- Add another method that loads that same content back from the
specified location.
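The dump/load idea above can be sketched in plain Java. This is a minimal, self-contained model, not HBase code: `RegionLocationCache`, `cache`, `locate`, `dump` and `load` are hypothetical names, the per-table sorted map from region start key to server only approximates what cachedRegionLocations holds, and the dump goes to a String rather than to a table/row/fam:qual cell. Each entry is written as three Base64 fields so arbitrary row keys survive the round trip.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical stand-in for a client's region location cache: for each
 * table, a sorted map from region start key to the server hosting it.
 * This is NOT the real HConnectionManager internals.
 */
final class RegionLocationCache {
    private final Map<String, TreeMap<String, String>> byTable = new TreeMap<>();

    void cache(String table, String startKey, String server) {
        byTable.computeIfAbsent(table, t -> new TreeMap<>()).put(startKey, server);
    }

    /** Server of the region with the greatest start key <= row, or null if nothing is cached. */
    String locate(String table, String row) {
        TreeMap<String, String> regions = byTable.get(table);
        if (regions == null) return null;
        Map.Entry<String, String> e = regions.floorEntry(row);
        return e == null ? null : e.getValue();
    }

    /** One line per entry: table, start key and server, each Base64-encoded. */
    String dump() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, TreeMap<String, String>> t : byTable.entrySet()) {
            for (Map.Entry<String, String> r : t.getValue().entrySet()) {
                sb.append(enc(t.getKey())).append(' ')
                  .append(enc(r.getKey())).append(' ')
                  .append(enc(r.getValue())).append('\n');
            }
        }
        return sb.toString();
    }

    /** Rebuilds a cache from the output of dump(), e.g. in another JVM. */
    static RegionLocationCache load(String dumped) {
        RegionLocationCache cache = new RegionLocationCache();
        for (String line : dumped.split("\n")) {
            if (line.isEmpty()) continue;
            String[] f = line.split(" ", -1); // -1 keeps empty fields (e.g. the first region's empty start key)
            cache.cache(dec(f[0]), dec(f[1]), dec(f[2]));
        }
        return cache;
    }

    private static String enc(String s) {
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    private static String dec(String s) {
        return new String(Base64.getDecoder().decode(s), StandardCharsets.UTF_8);
    }
}
```

Because regions move over time, a cache loaded this way is best treated as a set of hints: a real client would still need to drop an entry and go back to .META. when a request to the cached server fails.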

That's hacky, but I'm pretty sure it wouldn't be that hard to do.
Another way of doing it would be to add a public getter for
cachedRegionLocations and use a normal HTable to do exactly the same
thing.

J-D

2010/3/1 Alvin C.L Huang <alvincl.hu...@gmail.com>:
> @J-D
> I like the idea of 'warm up'.
>
> I was wondering whether it is possible to clone client caches across JVMs
> (a cache of hot regions, or the cache of a running job).
>
> --
> Alvin C.-L., Huang / 黃俊龍
> ATC, ICL, ITRI, Taiwan
> T: 886-3-59-14625
> This email may contain confidential ITRI information. If you are not the
> intended recipient, please do not use or disclose it in any way, and
> delete it.
>
>
> On 2 March 2010 10:00, Jean-Daniel Cryans <jdcry...@apache.org> wrote:
>
>> Ah, I understand now, thanks for the context. So I misinterpreted your
>> first test: you are basically hitting .META. with a lot of random
>> reads from many clients whose caches are completely empty when the
>> test begins.
>>
>> So here you hit some pain points we currently have with random reads,
>> but first I'd like to point out that HBase isn't your typical RDBMS,
>> where you just point the client at the machine to read from and you're
>> done. Here the client has to figure out the region locations by itself,
>> doing location discovery through the .META. table. Normally that would
>> be fast, but a few issues are slowing down concurrent reads on hot rows:
>>
>> - We don't use pread: https://issues.apache.org/jira/browse/HBASE-2180
>> - Random reads lock whole rows (among other things); that will be fixed
>>   in: https://issues.apache.org/jira/browse/HBASE-2248
>> - Reading from .META. is much slower when it has more than one
>>   store file: https://issues.apache.org/jira/browse/HBASE-2175
>>
>> The first two are being actively worked on; the third still needs
>> investigation and may just be a symptom.
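For context on the pread point: a positional read ("pread") takes the file offset as an argument instead of first seeking a shared stream, so concurrent readers do not have to serialize on one lock. The sketch below contrasts the two patterns using java.io.RandomAccessFile and java.nio's FileChannel; it only illustrates the general difference, since HBase reads store files through HDFS input streams rather than these classes. `SharedFileReader`, `seekAndRead` and `pread` are hypothetical names.

```java
import java.io.EOFException;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Two ways for many threads to read from one shared, already-open file. */
final class SharedFileReader implements AutoCloseable {
    private final RandomAccessFile raf;  // has a single shared file pointer
    private final FileChannel channel;   // supports positional reads

    SharedFileReader(Path path) throws IOException {
        this.raf = new RandomAccessFile(path.toFile(), "r");
        this.channel = FileChannel.open(path, StandardOpenOption.READ);
    }

    /** Seek + read: seek() moves the shared pointer, so every reader must hold the lock. */
    synchronized byte[] seekAndRead(long pos, int len) throws IOException {
        byte[] buf = new byte[len];
        raf.seek(pos);
        raf.readFully(buf); // throws EOFException if fewer than len bytes remain
        return buf;
    }

    /** Positional read: the offset is passed in and the channel's own position is untouched, so no lock. */
    byte[] pread(long pos, int len) throws IOException {
        ByteBuffer buf = ByteBuffer.allocate(len);
        while (buf.hasRemaining()) {
            int n = channel.read(buf, pos + buf.position());
            if (n < 0) {
                throw new EOFException("end of file at offset " + (pos + buf.position()));
            }
        }
        return buf.array();
    }

    @Override
    public void close() throws IOException {
        try {
            raf.close();
        } finally {
            channel.close();
        }
    }
}
```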
>>
>> What this means for you is that, if possible, you should reuse JVMs
>> across jobs so they benefit from already warmed-up caches. For example,
>> run the same test but call the same code twice: each new HTable should
>> be much faster in the second batch.
>>
>> Another option would be to implement a new feature in the HBase client
>> that warms it up using scanners (I think there's a JIRA issue about it).
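Both the JVM-reuse advice and the scanner warm-up idea can be seen in a toy model of client-side location caching. In this sketch every name is hypothetical: the `metaLookup` callback stands in for a .META. round trip, and `prewarm` stands in for scanning all of a table's .META. rows up front. Only cache misses reach .META., so a second batch of reads in the same JVM costs no lookups, and a prewarmed locator costs none even on its first batch.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

/** One region: rows in [startKey, endKey) are served by `server`; an empty endKey means "end of table". */
final class CachedRegion {
    final String startKey, endKey, server;

    CachedRegion(String startKey, String endKey, String server) {
        this.startKey = startKey;
        this.endKey = endKey;
        this.server = server;
    }

    boolean contains(String row) {
        return row.compareTo(startKey) >= 0 && (endKey.isEmpty() || row.compareTo(endKey) < 0);
    }
}

/** Toy client-side locator: consults its cache first and only asks "meta" on a miss. */
final class CachingLocator {
    private final TreeMap<String, CachedRegion> cache = new TreeMap<>(); // start key -> region
    private final Function<String, CachedRegion> metaLookup;            // row -> region (simulated .META. read)
    private int metaLookups = 0;

    CachingLocator(Function<String, CachedRegion> metaLookup) {
        this.metaLookup = metaLookup;
    }

    /** Bulk-loads regions, standing in for one scan over the table's .META. rows. */
    void prewarm(List<CachedRegion> metaRows) {
        for (CachedRegion r : metaRows) cache.put(r.startKey, r);
    }

    String locate(String row) {
        // The only candidate is the cached region with the greatest start key <= row.
        Map.Entry<String, CachedRegion> e = cache.floorEntry(row);
        if (e != null && e.getValue().contains(row)) {
            return e.getValue().server; // cache hit: no .META. traffic
        }
        CachedRegion r = metaLookup.apply(row); // cache miss: one simulated .META. lookup
        metaLookups++;
        cache.put(r.startKey, r);
        return r.server;
    }

    int metaLookups() {
        return metaLookups;
    }
}
```

The prewarm path trades one upfront pass over .META. for no per-region misses later; whether that pays off depends on how many of a table's regions a job actually touches.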
>>
>> J-D
>>
>
