Well, you don't have to do this often; just try it once to see. You can do it
in the HBase shell:

  major_compact '.META.'

and it takes 3-4 seconds.

J-D
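For the API route Adam asks about below, the same compaction can presumably be
triggered from Java through HBaseAdmin rather than through the HDFS API. A
minimal sketch, assuming the 0.20-era client (the call queues the compaction
rather than blocking until it finishes):

  // Hedged sketch: force a major compaction of .META. from Java,
  // assuming the HBase 0.20 client API.
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HBaseAdmin;

  public class CompactMeta {
    public static void main(String[] args) throws Exception {
      HBaseConfiguration conf = new HBaseConfiguration(); // reads hbase-site.xml
      HBaseAdmin admin = new HBaseAdmin(conf);
      // Equivalent to `major_compact '.META.'` in the shell; the compaction
      // request is queued, so the call returns before the work completes.
      admin.majorCompact(".META.");
    }
  }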
On Tue, Oct 6, 2009 at 12:31 PM, Adam Silberstein <[email protected]> wrote:
> Hi J-D,
> Thanks for the tips. Tweaking the multiplier looks easy enough. I'm not
> sure how to force a major compaction, though. Since you did it during an
> M/R job, does that mean you used the HDFS/Hadoop API? Any guess on how
> long that major compaction takes? Just wondering what it does to
> availability.
>
> Thanks,
> Adam
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Jean-Daniel Cryans
> Sent: Tuesday, October 06, 2009 9:11 AM
> To: [email protected]
> Subject: Re: random read/write performance
>
> Adam,
>
> A few thoughts:
>
> - Do you use LZO?
> - Instead of disabling the WAL, first try tweaking the safety net that's
>   already in place. For example, setting
>   hbase.regionserver.logroll.multiplier to 1.5 or even higher will make it
>   roll less often. The current value of 0.95 means you roll after every
>   ~62MB inserted into a regionserver. You can also set
>   hbase.regionserver.maxlogs to something higher than 32, like 64.
> - We flush the .META. table very, very often, and after a big upload this
>   sometimes leaves it with a lot of store files. Once I force-major-compacted
>   it during an MR job and the job ran about 500% faster, since all the
>   clients had been contending on .META.
>
> J-D
>
> On Tue, Oct 6, 2009 at 11:59 AM, Adam Silberstein
> <[email protected]> wrote:
>> Hi,
>>
>> Just wanted to give a quick update on our HBase benchmarking efforts at
>> Yahoo. The basic use case we're looking at is:
>>
>> 1K records
>>
>> 20GB of records per node (and 6GB of memory per node, so data is not
>> memory resident)
>>
>> Workloads that do random reads/writes (e.g. 95% reads, 5% writes)
>>
>> Multiple clients doing the reads/writes (i.e. 50-200)
>>
>> Measure throughput vs. latency, and see how high we can push the
>> throughput.
>>
>> Note that although we want to see where throughput maxes out, the
>> workload is random rather than scan-oriented.
>>
>> I've been tweaking our HBase installation based on advice I've
>> read/gotten from a few people. Currently, I'm running 0.20.0, have the
>> heap size set to 6GB per server, and have iCMS off. I'm still using the
>> REST server instead of the Java client. We're about to move our
>> benchmarking tool to Java, so at that point we can use the Java API. At
>> that point, I also want to turn off the WAL. If anyone has more
>> suggestions for this workload (either things to try while still using
>> REST, or things to try once I have a Java client), please let me know.
>>
>> Given all that, I'm currently seeing a maximal throughput of about 300
>> ops/sec/server. Has anyone with a similar disk-resident and random
>> workload seen drastically different numbers, or guesses for what I can
>> expect with the Java client?
>>
>> Thanks!
>> Adam
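For reference, the log-roll settings J-D mentions are regionserver-side
properties, so they would live in each regionserver's hbase-site.xml rather
than in client code. A minimal sketch using the values quoted in the thread
(an illustration, not a tuned or tested configuration):

  <!-- hbase-site.xml: roll the write-ahead log less often. The thread quotes
       the current values as 0.95 and 32; 1.5 and 64 are J-D's suggestions. -->
  <property>
    <name>hbase.regionserver.logroll.multiplier</name>
    <value>1.5</value>
  </property>
  <property>
    <name>hbase.regionserver.maxlogs</name>
    <value>64</value>
  </property>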

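Finally, a rough sketch of what moving the benchmark client from REST to the
Java API could look like for this random read/write workload, assuming the
0.20-era client. The table and column names ("usertable", "f", "data") are
placeholders, and skipping the WAL per Put assumes the deployed client
exposes setWriteToWAL; this is an illustration, not the actual benchmark code:

  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.util.Bytes;

  public class RandomReadWriteSketch {
    public static void main(String[] args) throws Exception {
      HBaseConfiguration conf = new HBaseConfiguration(); // reads hbase-site.xml
      HTable table = new HTable(conf, "usertable");       // hypothetical table

      // Write path: a ~1K record. setWriteToWAL(false) trades durability for
      // write latency (the step Adam plans to try once off REST) and is
      // assumed to be available in the deployed 0.20 client.
      Put put = new Put(Bytes.toBytes("user0000123"));
      put.setWriteToWAL(false);
      put.add(Bytes.toBytes("f"), Bytes.toBytes("data"), new byte[1024]);
      table.put(put);

      // Read path: a single random-key lookup, the common case in a
      // 95% read / 5% write mix.
      Get get = new Get(Bytes.toBytes("user0000123"));
      Result result = table.get(get);
      byte[] value = result.getValue(Bytes.toBytes("f"), Bytes.toBytes("data"));
      System.out.println("read " + (value == null ? 0 : value.length) + " bytes");
    }
  }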