re: "A suggestion is perhaps to take all those example/explanation and add them to the book for future reference."
Absolutely! I've been watching this thread with great interest. (A few
illustrative config sketches are included below the quoted thread.)

On 1/14/12 4:30 PM, "Mikael Sitruk" <[email protected]> wrote:

> Wow, thank you very much for all those precious explanations, pointers, and
> examples. It's a lot to ingest... I will try them (at least what I can with
> 0.90.4 (yes, I'm upgrading from 0.90.1 to 0.90.4)) and keep you informed.
> BTW, I'm already using compression (GZ), but the current data is randomized,
> so I don't get as much gain as you mentioned (I think I'm around 30% only).
> It seems that BF is one of the major things I need to look at, together with
> compaction.ratio, and I need different settings for my different CFs (one CF
> has a small set of columns and each update changes 50% of them --> ROWCOL;
> the second CF gets a new column with every update --> ROW).
> I'm not keeping more than one version either, and you wrote that this is not
> a point query.
>
> A suggestion is perhaps to take all those examples/explanations and add them
> to the book for future reference.
>
> Regards,
> Mikael.S
>
>
> On Sat, Jan 14, 2012 at 4:06 AM, Nicolas Spiegelberg
> <[email protected]> wrote:
>
>> > I'm sorry, but I don't understand. Of course I have disk and network
>> > saturation, and the flush stops flushing because it is waiting for
>> > compaction to finish. Since a major compaction was triggered, all the
>> > stores (a large number) present on the disks (7 disks per RS) will be
>> > grabbed for major compaction, and the I/O is affected. Network is also
>> > affected since all are major compacting at the same time and replicating
>> > files at the same time (1GB network).
>>
>> When you have an IO problem, there are multiple pieces at play that you
>> can adjust:
>>
>> Write: HLog, Flush, Compaction
>> Read: Point Query, Scan
>>
>> If your writes are far more than your reads, then you should relax one of
>> the write pieces.
>> - HLog: You can't really adjust HLog IO outside of key compression
>> (HBASE-4608).
>> - Flush: You can adjust your compression. None->LZO == 5x compression.
>> LZO->GZ == 2x compression. Both are at the expense of CPU. HBASE-4241
>> minimizes flush IO significantly in the update-heavy use case (discussed
>> in the last email).
>> - Compaction: You can lower the compaction ratio to minimize the amount of
>> rewrites over time. That's why I suggested changing the ratio from 1.2 ->
>> 0.25. This gives a ~50% IO reduction (blog post on this forthcoming @
>> http://www.facebook.com/UsingHBase ).
>>
>> However, you may have a lot more reads than you think. For example, let's
>> say the read:write ratio is 1:10, so significantly write dominated. Without
>> any of the optimizations I listed in the previous email, your real read
>> ratio is multiplied by the StoreFile count (because you naively read all
>> StoreFiles). So let's say, during congestion, you have 20 StoreFiles.
>> 1*20:10 means that you're now 2:1 read dominated. You need features to
>> reduce the number of StoreFiles you scan when the StoreFile count is high.
>>
>> - Point Query: bloom filters (HBASE-1200, HBASE-2794), lazy seek
>> (HBASE-4465), and seek optimizations (HBASE-4433, HBASE-4434, HBASE-4469,
>> HBASE-4532).
>> - Scan: not as many optimizations here. They mostly revolve around proper
>> usage & seek-next optimization when using filters. I don't have JIRA
>> numbers here, but probably a half-dozen small tweaks were added to 0.92.
>>
>> > I don't have an increment workload (the workload either updates columns
>> > in a CF or adds a column to a CF for the same key), so how will those
>> > patches help?
>>
>> Increment & read->update workloads end up picking up roughly the same
>> optimizations. Adding a column to an existing row is no different from
>> adding a new row as far as optimizations are concerned, because there's
>> nothing to de-dupe.
>>
>> > I'm not saying this is a bad thing, it's just an observation from our
>> > test: HBase will slow down the flush when too many StoreFiles are
>> > present, and will add pressure on GC and memory, affecting performance.
>> > The update workload does not send all the row content for a given key,
>> > so only partial data is written. In order to get the whole row, I
>> > presume that reading the newest Store is not enough ("all" stores need
>> > to be read, collecting the most up-to-date fields to rebuild a full
>> > row), or am I missing something?
>>
>> Reading all row columns is the same as doing a scan. You're not doing a
>> point query if you don't specify the exact key (columns) you're looking
>> for. Setting versions to unlimited, then getting all versions of a
>> particular ROW+COL, would also be considered a scan rather than a point
>> query as far as optimizations are concerned.
>>
>> > 1. If I did not set a specific property for bloom filters (BF), does it
>> > mean that I'm not using them (the book only refers to BF with regard to
>> > CFs)?
>>
>> By default, bloom filters are disabled, so you need to enable them to get
>> the optimizations. This is by design. Bloom filters trade off cache
>> space for low-overhead probabilistic queries. The default is 8 bytes per
>> bloom entry (key) & a 1% false positive rate. You can use 'bin/hbase
>> org.apache.hadoop.hbase.io.hfile.HFile' (look at the help, then -f to
>> specify a StoreFile and -m for meta info) to see your StoreFile's average
>> KV size. If size(KV) == 100 bytes, then blooms use 8% of the space in
>> cache, which is better than loading a StoreFile block only to get a miss.
>>
>> Whether to use a ROW or ROWCOL bloom filter depends on your write & read
>> pattern. If you read the entire row at a time, use a ROW bloom. If you
>> point query, ROW or ROWCOL are both options. If you write all columns for
>> a row at the same time, definitely use a ROW bloom. If you have a small
>> column range and you update the columns at different rates/times, then a
>> ROWCOL bloom filter may be more helpful. ROWCOL is really useful if a scan
>> query for a ROW will normally return results, but a point query for a
>> ROWCOL may have a high miss rate. A perfect example is storing unique hash
>> values for a user on disk. You'd use 'user' as the row & the hash as the
>> column. In most instances, the hash won't be a duplicate, so a ROWCOL
>> bloom would be better.
>>
>> > 3. How can we ensure that compaction will not suck up too much I/O if we
>> > cannot control major compaction?
>>
>> TCP congestion control will ensure that a single TCP socket won't consume
>> too much bandwidth, so that part of compactions is handled automatically.
>> The part that you need to handle is the number of simultaneous TCP sockets
>> (currently 1, until multi-threaded compactions) & the aggregate data
>> volume transferred over time. As I said, this is controlled by
>> compaction.ratio. If temporarily high StoreFile counts cause you to
>> bottleneck, the slight latency variance is an annoyance of the current
>> compaction algorithm, but the underlying problem you should be looking at
>> solving is the system's inability to filter out the unnecessary
>> StoreFiles.
>>
>>
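
For the book write-up, a minimal hbase shell sketch of the per-CF settings
discussed above. The table name 't1' and family names 'small' and 'wide' are
made up to match the two CFs Mikael described; the bloom settings only affect
StoreFiles written after the change, and they assume a release that actually
ships the bloom work (HBASE-1200 / HBASE-2794, i.e. 0.92):

    disable 't1'
    # CF with a small, fixed set of columns, ~50% updated per write -> ROWCOL bloom
    alter 't1', {NAME => 'small', BLOOMFILTER => 'ROWCOL', COMPRESSION => 'GZ', VERSIONS => 1}
    # CF that gets a brand-new column on every update -> ROW bloom
    alter 't1', {NAME => 'wide', BLOOMFILTER => 'ROW', COMPRESSION => 'GZ', VERSIONS => 1}
    enable 't1'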
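
To size the blooms against your actual data, the StoreFile inspection Nicolas
mentions looks roughly like this (the path is just a placeholder; pick a real
StoreFile out of your HBase root directory, e.g. via 'hadoop fs -lsr /hbase/t1'):

    bin/hbase org.apache.hadoop.hbase.io.hfile.HFile
    # (no args prints the usage/help text)
    bin/hbase org.apache.hadoop.hbase.io.hfile.HFile -m -f /hbase/t1/<region>/small/<storefile>

The -m meta output should show the average key and value lengths; per the
8-bytes-per-entry math above, an average KV around 100 bytes means the bloom
costs roughly 8% extra cache for that data.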
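
The compaction ratio change (1.2 -> 0.25) discussed above is a server-side
setting rather than a shell command. Assuming the property name in your
release matches (check hbase-default.xml), it goes into hbase-site.xml on the
RegionServers and needs a restart to take effect:

    <property>
      <name>hbase.hstore.compaction.ratio</name>
      <!-- default is 1.2; lower means fewer rewrites, but more StoreFiles kept around -->
      <value>0.25</value>
    </property>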
