https://issues.apache.org/jira/browse/HBASE-5479
JD - apologies if that was unrelated to your email.

On Sat, Feb 25, 2012 at 4:03 PM, Matt Corgan <[email protected]> wrote:

> Yeah. You would also want a mechanism to prevent queuing the same CF
> multiple times, and probably want the completion of one compaction to
> trigger a check to see if it should queue another.
>
> A possibly different architecture than the current style of queues would
> be to have each Store (all open in memory) keep a compactionPriority score
> up to date after events like flushes, compactions, schema changes, etc.
> Then you create a "CompactionPriorityComparator implements
> Comparator<Store>" and stick all the Stores into a PriorityQueue. The
> async compaction threads would keep pulling off the head of that queue as
> long as the head has compactionPriority > X.
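A minimal sketch of that queue-of-Stores idea, using hypothetical names taken from the email (Store, compactionPriority, CompactionPriorityComparator); this is not the 0.92 code, just one way the suggestion could look, and a real version would need to re-rank a Store whenever its priority changes, not only when it is re-inserted:

    import java.util.Comparator;
    import java.util.PriorityQueue;

    // Hypothetical stand-in for a Store; the Store itself keeps its
    // compactionPriority current after flushes, compactions, schema changes, etc.
    class Store {
        volatile double compactionPriority;
    }

    // Highest-priority Store sorts to the head of the queue.
    class CompactionPriorityComparator implements Comparator<Store> {
        public int compare(Store a, Store b) {
            return Double.compare(b.compactionPriority, a.compactionPriority);
        }
    }

    class CompactionScheduler {
        static final double X = 1.0; // only compact while the head is above this

        private final PriorityQueue<Store> stores =
            new PriorityQueue<Store>(64, new CompactionPriorityComparator());

        synchronized void add(Store store) {
            stores.add(store);
        }

        // Async compaction threads call this in a loop: take the head only if
        // it is urgent enough, then pick the actual files to compact at
        // execution time, against the Store's current state.
        synchronized Store pollIfUrgent() {
            Store head = stores.peek();
            return (head != null && head.compactionPriority > X) ? stores.poll() : null;
        }

        // Called whenever a Store's priority changes (for example when a
        // compaction on it completes); re-inserting re-ranks it, which also
        // gives the "completion triggers a check for another compaction"
        // behaviour and avoids queuing the same CF more than once.
        synchronized void reRank(Store store) {
            stores.remove(store);
            stores.add(store);
        }
    }

Because a worker only picks files when it actually polls a Store, the selection is always made against the Store's current set of files, which would also address the stale-selection problem Matt describes further down the thread.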
> On Sat, Feb 25, 2012 at 3:44 PM, lars hofhansl <[email protected]> wrote:
>
>> Interesting. So a compaction request would hold no information beyond the
>> CF, really, but is just a promise to do a compaction as soon as possible.
>> I agree with Ted, we should explore this in a jira.
>>
>> -- Lars
>>
>> ----- Original Message -----
>> From: Matt Corgan <[email protected]>
>> To: [email protected]
>> Cc:
>> Sent: Saturday, February 25, 2012 3:18 PM
>> Subject: Re: Follow-up to my HBASE-4365 testing
>>
>> I've been meaning to look into something regarding compactions for a
>> while now that may be relevant here. It could be that this is already how
>> it works, but just to be sure I'll spell out my suspicions...
>>
>> I did a lot of large uploads when we moved to .92. Our biggest dataset
>> is time series data (partitioned 16 ways with a row prefix). The actual
>> inserting and flushing went extremely quickly, and the parallel
>> compactions were churning away. However, when the compactions inevitably
>> started falling behind I noticed a potential problem. The compaction
>> queue would get up to, say, 40, which represented, say, an hour's worth
>> of requests. The problem was that by the time a compaction request
>> started executing, the CompactionSelection that it held was terribly out
>> of date. It was compacting a small selection (3-5) of the 50 files that
>> were now there. Then the next request would compact another (3-5), etc,
>> etc, until the queue was empty. It would have been much better if a
>> CompactionRequest decided what files to compact when it got to the head
>> of the queue. Then it could see that there are now 50 files needing
>> compacting and possibly compact the 30 smallest ones, not just 5. When
>> the insertions were done after many hours, I would have preferred it to
>> do one giant major compaction, but it sat there and worked through its
>> compaction queue compacting all sorts of different combinations of files.
>>
>> Said differently, it looks like .92 picks the files to compact at
>> compaction request time rather than compaction execution time, which is
>> problematic when these times grow far apart. Is that the case? Maybe
>> there are some other effects that are mitigating it...
>>
>> Matt
>>
>> On Sat, Feb 25, 2012 at 10:05 AM, Jean-Daniel Cryans <[email protected]> wrote:
>>
>> > Hey guys,
>> >
>> > So in HBASE-4365 I ran multiple uploads and the latest one I reported
>> > was a 5TB import on 14 RS and it took 18h with Stack's patch. Now one
>> > thing we can see is that apart from some splitting, there's a lot of
>> > compacting going on. Stack was wondering exactly how much that IO
>> > costs us, so we devised a test where we could upload 5TB with 0
>> > compactions. Here are the results:
>> >
>> > The table was pre-split with 14 regions, 1 per region server.
>> > hbase.hstore.compactionThreshold=100
>> > hbase.hstore.blockingStoreFiles=110
>> > hbase.regionserver.maxlogs=64 (the block size is 128MB)
>> > hfile.block.cache.size=0.05
>> > hbase.regionserver.global.memstore.lowerLimit=0.40
>> > hbase.regionserver.global.memstore.upperLimit=0.74
>> > export HBASE_REGIONSERVER_OPTS="$HBASE_JMX_BASE -Xmx14G
>> > -XX:CMSInitiatingOccupancyFraction=75 -XX:NewSize=256m
>> > -XX:MaxNewSize=256m"
>> >
>> > The table had:
>> > MAX_FILESIZE => '549755813888', MEMSTORE_FLUSHSIZE => '549755813888'
>> >
>> > Basically what I'm trying to do is to never block and almost always be
>> > flushing. You'll probably notice the big difference between the lower
>> > and upper barriers and think "le hell?"; it's because it takes so long
>> > to flush that you have to have enough room to take on more data while
>> > this is happening (and we are able to flush faster than we take on
>> > write).
>> >
>> > The test reports the following:
>> > Wall time: 34984.083 s
>> > Aggregate Throughput: 156893.07 queries/s
>> > Aggregate Throughput: 160030935.29 bytes/s
>> >
>> > That's 2x faster than when we wait for compactions and splits, not too
>> > bad, but I'm pretty sure we can do better:
>> >
>> > - The QPS was very uneven; it seems that when it's flushing it takes
>> > a big toll and queries drop to ~100k/s, while the rest of the time it's
>> > more like 200k/s. Need to figure out what's going on there and if it's
>> > really just caused by flush-related IO.
>> > - The logs were rolling every 6 seconds, and since this takes a global
>> > write lock, I can see how we could be slowing down a lot across 14
>> > machines.
>> > - The load was a bit uneven; I miscalculated my split points and the
>> > last region always had 2-3k more queries per second.
>> >
>> > Stay tuned for more.
>> >
>> > J-D
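For reference, a rough sketch of how a pre-split table like the one in J-D's test could be created with the 0.92-era client API. The table name, family name and split keys below are placeholders (J-D doesn't list his actual split points, and notes he miscalculated them); the hbase.* properties above would still go into hbase-site.xml on the region servers.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CreatePresplitTable {
        public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);

            // Table-level settings from J-D's test: 512GB max file size and
            // memstore flush size.
            HTableDescriptor desc = new HTableDescriptor("load_test");   // placeholder name
            desc.setMaxFileSize(549755813888L);       // MAX_FILESIZE
            desc.setMemStoreFlushSize(549755813888L); // MEMSTORE_FLUSHSIZE
            desc.addFamily(new HColumnDescriptor("f")); // placeholder family

            // 13 split keys give 14 regions, one per region server. These keys
            // are placeholders; the real ones depend on the row key distribution.
            byte[][] splitKeys = new byte[13][];
            for (int i = 0; i < 13; i++) {
                splitKeys[i] = Bytes.toBytes(String.format("%02d", i + 1));
            }
            admin.createTable(desc, splitKeys);
        }
    }

With MAX_FILESIZE and MEMSTORE_FLUSHSIZE both at 512GB, splits and per-region size-based flushes are effectively out of the picture, so flushing ends up driven by the global memstore lowerLimit/upperLimit settings, which matches the stated goal of never blocking and almost always flushing.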
