https://issues.apache.org/jira/browse/HBASE-5479
JD - apologies if that was unrelated to your email.

On Sat, Feb 25, 2012 at 4:03 PM, Matt Corgan <[email protected]> wrote:

> Yeah. You would also want a mechanism to prevent queuing the same CF
> multiple times, and probably want the completion of one compaction to
> trigger a check to see if it should queue another.
>
> A possibly different architecture than the current style of queues would
> be to have each Store (all open in memory) keep a compactionPriority score
> up to date after events like flushes, compactions, schema changes, etc.
> Then you create a "CompactionPriorityComparator implements
> Comparator<Store>" and stick all the Stores into a PriorityQueue. The
> async compaction threads would keep pulling off the head of that queue as
> long as the head has compactionPriority > X.
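A minimal sketch of that queue-of-Stores idea, using hypothetical names taken from the email (Store, compactionPriority, CompactionPriorityComparator); this is not the 0.92 code, just one way the suggestion could look, and a real version would need to re-rank a Store whenever its priority changes, not only when it is re-inserted:

    import java.util.Comparator;
    import java.util.PriorityQueue;

    // Hypothetical stand-in for a Store; the Store itself keeps its
    // compactionPriority current after flushes, compactions, schema changes, etc.
    class Store {
        volatile double compactionPriority;
    }

    // Highest-priority Store sorts to the head of the queue.
    class CompactionPriorityComparator implements Comparator<Store> {
        public int compare(Store a, Store b) {
            return Double.compare(b.compactionPriority, a.compactionPriority);
        }
    }

    class CompactionScheduler {
        static final double X = 1.0; // only compact while the head is above this

        private final PriorityQueue<Store> stores =
            new PriorityQueue<Store>(64, new CompactionPriorityComparator());

        synchronized void add(Store store) {
            stores.add(store);
        }

        // Async compaction threads call this in a loop: take the head only if
        // it is urgent enough, then pick the actual files to compact at
        // execution time, against the Store's current state.
        synchronized Store pollIfUrgent() {
            Store head = stores.peek();
            return (head != null && head.compactionPriority > X) ? stores.poll() : null;
        }

        // Called whenever a Store's priority changes (for example when a
        // compaction on it completes); re-inserting re-ranks it, which also
        // gives the "completion triggers a check for another compaction"
        // behaviour and avoids queuing the same CF more than once.
        synchronized void reRank(Store store) {
            stores.remove(store);
            stores.add(store);
        }
    }

Because a worker only picks files when it actually polls a Store, the selection is always made against the Store's current set of files, which would also address the stale-selection problem Matt describes further down the thread.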
> On Sat, Feb 25, 2012 at 3:44 PM, lars hofhansl <[email protected]> wrote:
>
>> Interesting. So a compaction request would hold no information beyond the
>> CF, really, but is just a promise to do a compaction as soon as possible.
>> I agree with Ted, we should explore this in a jira.
>>
>> -- Lars
>>
>> ----- Original Message -----
>> From: Matt Corgan <[email protected]>
>> To: [email protected]
>> Cc:
>> Sent: Saturday, February 25, 2012 3:18 PM
>> Subject: Re: Follow-up to my HBASE-4365 testing
>>
>> I've been meaning to look into something regarding compactions for a
>> while now that may be relevant here. It could be that this is already how
>> it works, but just to be sure I'll spell out my suspicions...
>>
>> I did a lot of large uploads when we moved to .92. Our biggest dataset
>> is time series data (partitioned 16 ways with a row prefix). The actual
>> inserting and flushing went extremely quickly, and the parallel
>> compactions were churning away. However, when the compactions inevitably
>> started falling behind I noticed a potential problem. The compaction
>> queue would get up to, say, 40, which represented, say, an hour's worth
>> of requests. The problem was that by the time a compaction request
>> started executing, the CompactionSelection that it held was terribly out
>> of date. It was compacting a small selection (3-5) of the 50 files that
>> were now there. Then the next request would compact another (3-5), etc,
>> etc, until the queue was empty. It would have been much better if a
>> CompactionRequest decided what files to compact when it got to the head
>> of the queue. Then it could see that there are now 50 files needing
>> compacting and possibly compact the 30 smallest ones, not just 5. When
>> the insertions were done after many hours, I would have preferred it to
>> do one giant major compaction, but it sat there and worked through its
>> compaction queue compacting all sorts of different combinations of files.
>>
>> Said differently, it looks like .92 picks the files to compact at
>> compaction request time rather than compaction execution time, which is
>> problematic when these times grow far apart. Is that the case? Maybe
>> there are some other effects that are mitigating it...
>>
>> Matt
>>
>> On Sat, Feb 25, 2012 at 10:05 AM, Jean-Daniel Cryans <[email protected]> wrote:
>>
>> > Hey guys,
>> >
>> > So in HBASE-4365 I ran multiple uploads and the latest one I reported
>> > was a 5TB import on 14 RS and it took 18h with Stack's patch. Now one
>> > thing we can see is that apart from some splitting, there's a lot of
>> > compacting going on. Stack was wondering exactly how much that IO
>> > costs us, so we devised a test where we could upload 5TB with 0
>> > compactions. Here are the results:
>> >
>> > The table was pre-split with 14 regions, 1 per region server.
>> > hbase.hstore.compactionThreshold=100
>> > hbase.hstore.blockingStoreFiles=110
>> > hbase.regionserver.maxlogs=64 (the block size is 128MB)
>> > hfile.block.cache.size=0.05
>> > hbase.regionserver.global.memstore.lowerLimit=0.40
>> > hbase.regionserver.global.memstore.upperLimit=0.74
>> > export HBASE_REGIONSERVER_OPTS="$HBASE_JMX_BASE -Xmx14G
>> > -XX:CMSInitiatingOccupancyFraction=75 -XX:NewSize=256m
>> > -XX:MaxNewSize=256m"
>> >
>> > The table had:
>> > MAX_FILESIZE => '549755813888', MEMSTORE_FLUSHSIZE => '549755813888'
>> >
>> > Basically what I'm trying to do is to never block and almost always be
>> > flushing. You'll probably notice the big difference between the lower
>> > and upper barriers and think "le hell?"; it's because it takes so long
>> > to flush that you have to have enough room to take on more data while
>> > this is happening (and we are able to flush faster than we take on
>> > write).
>> >
>> > The test reports the following:
>> > Wall time: 34984.083 s
>> > Aggregate Throughput: 156893.07 queries/s
>> > Aggregate Throughput: 160030935.29 bytes/s
>> >
>> > That's 2x faster than when we wait for compactions and splits, not too
>> > bad, but I'm pretty sure we can do better:
>> >
>> > - The QPS was very uneven; it seems that when it's flushing it takes
>> > a big toll and queries drop to ~100k/s, while the rest of the time it's
>> > more like 200k/s. Need to figure out what's going on there and if it's
>> > really just caused by flush-related IO.
>> > - The logs were rolling every 6 seconds, and since this takes a global
>> > write lock, I can see how we could be slowing down a lot across 14
>> > machines.
>> > - The load was a bit uneven; I miscalculated my split points and the
>> > last region always had 2-3k more queries per second.
>> >
>> > Stay tuned for more.
>> >
>> > J-D
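For reference, a rough sketch of how a pre-split table like the one in J-D's test could be created with the 0.92-era client API. The table name, family name and split keys below are placeholders (J-D doesn't list his actual split points, and notes he miscalculated them); the hbase.* properties above would still go into hbase-site.xml on the region servers.

    import java.io.IOException;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.util.Bytes;

    public class CreatePresplitTable {
        public static void main(String[] args) throws IOException {
            Configuration conf = HBaseConfiguration.create();
            HBaseAdmin admin = new HBaseAdmin(conf);

            // Table-level settings from J-D's test: 512GB max file size and
            // memstore flush size.
            HTableDescriptor desc = new HTableDescriptor("load_test");   // placeholder name
            desc.setMaxFileSize(549755813888L);       // MAX_FILESIZE
            desc.setMemStoreFlushSize(549755813888L); // MEMSTORE_FLUSHSIZE
            desc.addFamily(new HColumnDescriptor("f")); // placeholder family

            // 13 split keys give 14 regions, one per region server. These keys
            // are placeholders; the real ones depend on the row key distribution.
            byte[][] splitKeys = new byte[13][];
            for (int i = 0; i < 13; i++) {
                splitKeys[i] = Bytes.toBytes(String.format("%02d", i + 1));
            }
            admin.createTable(desc, splitKeys);
        }
    }

With MAX_FILESIZE and MEMSTORE_FLUSHSIZE both at 512GB, splits and per-region size-based flushes are effectively out of the picture, so flushing ends up driven by the global memstore lowerLimit/upperLimit settings, which matches the stated goal of never blocking and almost always flushing.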
