[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations

stack (JIRA) Mon, 20 Apr 2015 16:32:25 -0700

    [ 
https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503943#comment-14503943
 ]


stack commented on HBASE-13389:
-------------------------------

bq. This may be hard to achieve because out of order puts can be flushed at 
different time.

Do 'out of order' puts happen at DLR (distributed log replay) time only 
[~jeffreyz]? i.e. WALs can be replayed in any order since they are farmed out 
over the cluster. We also cannot guarantee when a region that is receiving DLR 
edits will flush hfiles; e.g. we could get row1/logSeqId=2 during DLR and flush 
because we had memory pressure, but then later row1/logSeqId=1 might arrive and 
be flushed into a newer hfile. The fix for this is to not let compactions 
happen when a region is in recovery -- this is probably the case already (or 
let compactions go on but preserve mvcc while in recovery)?
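
To make the row1 scenario above concrete, here is a rough sketch in plain Java. 
Everything in it (DlrOrderSketch, Cell, HFile, flushOrder) is a made-up 
stand-in for illustration, not an HBase class, and it assumes that a 
later-flushed file is treated as newer when per-cell seqids are gone.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-ins for illustration only; these are not HBase classes.
final class DlrOrderSketch {
  static final class Cell {
    final String row;
    final long seqId;
    final String value;
    Cell(String row, long seqId, String value) {
      this.row = row;
      this.seqId = seqId;
      this.value = value;
    }
  }

  static final class HFile {
    final int flushOrder; // assumption: a higher number means flushed later
    final List<Cell> cells;
    HFile(int flushOrder, Cell... cells) {
      this.flushOrder = flushOrder;
      this.cells = Arrays.asList(cells);
    }
  }

  // Resolve a row using the per-cell seqid: the highest seqid wins.
  static String latestByCellSeqId(List<HFile> files, String row) {
    Cell best = null;
    for (HFile f : files) {
      for (Cell c : f.cells) {
        if (c.row.equals(row) && (best == null || c.seqId > best.seqId)) {
          best = c;
        }
      }
    }
    return best == null ? null : best.value;
  }

  // Resolve a row using file order only, as if per-cell seqids had been
  // dropped: the cell in the most recently flushed file wins.
  static String latestByFileOrder(List<HFile> files, String row) {
    Cell best = null;
    int bestOrder = Integer.MIN_VALUE;
    for (HFile f : files) {
      for (Cell c : f.cells) {
        if (c.row.equals(row) && f.flushOrder > bestOrder) {
          best = c;
          bestOrder = f.flushOrder;
        }
      }
    }
    return best == null ? null : best.value;
  }

  // row1/seqId=2 is flushed first (memory pressure); row1/seqId=1 arrives
  // later during replay and lands in a newer file.
  static List<HFile> scenario() {
    return Arrays.asList(
        new HFile(1, new Cell("row1", 2, "newer-edit")),
        new HFile(2, new Cell("row1", 1, "older-edit")));
  }

  public static void main(String[] args) {
    List<HFile> files = scenario();
    System.out.println(latestByCellSeqId(files, "row1")); // newer-edit
    System.out.println(latestByFileOrder(files, "row1")); // older-edit
  }
}
```

With per-cell seqids kept, the seqId=2 edit wins as it should. With only file 
order to go on, the stale seqId=1 edit wins, which is why mvcc would need 
preserving while in recovery.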

So, the Lars fix would be to drop mvcc if no scanner outstanding with a span 
that includes mvcc in current hfile AND we are not in DLR recovery mode? 
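
As a sketch only, the condition above could be written as a single predicate. 
The class, method, and parameter names below are hypothetical (not HBase API), 
and it assumes "no scanner outstanding with a span that includes mvcc" can be 
expressed as the file's largest mvcc being at or below the smallest read point 
of any open scanner.

```java
// Hypothetical helper for illustration only; not an HBase API.
final class MvccDropSketch {
  // True when mvcc could be dropped for the file being written: every open
  // scanner can already see all of its edits AND the region is not in
  // DLR recovery.
  static boolean canDropMvcc(long maxMvccInFile, long smallestReadPoint,
      boolean inRecovery) {
    boolean visibleToAllScanners = maxMvccInFile <= smallestReadPoint;
    return visibleToAllScanners && !inRecovery;
  }

  public static void main(String[] args) {
    System.out.println(canDropMvcc(10, 12, false)); // true
    System.out.println(canDropMvcc(10, 8, false));  // false: a scanner still needs mvcc
    System.out.println(canDropMvcc(10, 12, true));  // false: region is in recovery
  }
}
```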

Are there other places where we might have out-of-order puts? (Flushes are 
single threaded and edits go into FSHLog and MemStore in order, with the caveat 
of Elliott and Nate's recent find: 
https://issues.apache.org/jira/browse/HBASE-12751?focusedCommentId=14377157&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14377157).

bq. ...and only keep mvcc around during region recovery time so that we can 
still keep HBASE-12600 goal

Yes.

On keeping seqid in the KV in hfiles so we can do "...out of order in minor 
compactions..... "

...don't we mean compacting non-adjacent files rather than out-of-order here?

So, yeah, if we preserved mvcc always, we could do any order and non-adjacent. 
Would be nice.

Otherwise, as I see it, if we want to do non-adjacent compactions (which as 
[~lhofhansl] says above, we do not currently have), then we could do it if all 
files under a Store have zero for mvcc and we just order the edits by the hfile 
meta data mvcc number. When there are files with an mvcc per KV, then we should 
probably merge those first...  Would have to think it through more.
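
A tentative sketch of that selection logic, in plain Java. StoreFileInfo, 
metaSeqId, and hasPerKvMvcc are hypothetical names for illustration, not HBase 
classes or fields, and the "merge those first" step is only the guess made 
above, not a worked-out policy.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch for illustration only; not HBase classes.
final class CompactionOrderSketch {
  static final class StoreFileInfo {
    final String name;
    final long metaSeqId;       // assumption: seqid kept in hfile meta data
    final boolean hasPerKvMvcc; // true if KVs carry a non-zero mvcc
    StoreFileInfo(String name, long metaSeqId, boolean hasPerKvMvcc) {
      this.name = name;
      this.metaSeqId = metaSeqId;
      this.hasPerKvMvcc = hasPerKvMvcc;
    }
  }

  // Non-adjacent compaction is only considered when every file under the
  // Store has zero for mvcc in its KVs.
  static boolean canCompactNonAdjacent(List<StoreFileInfo> files) {
    for (StoreFileInfo f : files) {
      if (f.hasPerKvMvcc) {
        return false;
      }
    }
    return true;
  }

  // Order files (and so their edits) by the hfile meta data seqid.
  static List<StoreFileInfo> orderByMetaSeqId(List<StoreFileInfo> files) {
    List<StoreFileInfo> sorted = new ArrayList<>(files);
    sorted.sort(Comparator.comparingLong((StoreFileInfo f) -> f.metaSeqId));
    return sorted;
  }

  // Files that still carry an mvcc per KV; candidates to merge first.
  static List<StoreFileInfo> mergeFirst(List<StoreFileInfo> files) {
    List<StoreFileInfo> out = new ArrayList<>();
    for (StoreFileInfo f : files) {
      if (f.hasPerKvMvcc) {
        out.add(f);
      }
    }
    return out;
  }

  public static void main(String[] args) {
    List<StoreFileInfo> files = new ArrayList<>();
    files.add(new StoreFileInfo("c", 30, false));
    files.add(new StoreFileInfo("a", 10, false));
    files.add(new StoreFileInfo("b", 20, true));
    System.out.println(canCompactNonAdjacent(files));         // false
    System.out.println(mergeFirst(files).get(0).name);        // b
    System.out.println(orderByMetaSeqId(files).get(0).name);  // a
  }
}
```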

It gets a little complicated though if the Store has some files with an hfile 
meta data mvcc number but other files have an mvcc per KV. We could not do a 
file that has an mvcc per KV with a non-adjacent 

But if we have the Lars optimization and the files have zero for mvcc, we could 
also do non-adjacent compactions so long as we respected the hfile seqid order. 
It gets tricky if one file has mvcc in its KVs and all the rest do not. Files 
with mvcc in their KVs need to be compacted together ahead of 

> [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
> -------------------------------------------------------------
>
>                 Key: HBASE-13389
>                 URL: https://issues.apache.org/jira/browse/HBASE-13389
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Performance
>            Reporter: stack
>         Attachments: 13389.txt
>
>
> HBASE-12600 moved the edit sequenceid from tags to instead exploit the 
> mvcc/sequenceid slot in a key. Now Cells near-always have an associated 
> mvcc/sequenceid where previously it was rare or the mvcc was kept up at the 
> file level. This is sort of how it should be, many of us would argue, but as 
> a side-effect, read-time optimizations that helped speed scans were undone.
> In this issue, let's see if we can get the optimizations back -- or just 
> remove the optimizations altogether.
> The parse of mvcc/sequenceid is expensive. It was noticed over in HBASE-13291.
> The optimizations undone by this change are (to quote the optimizer himself, 
> Mr [~lhofhansl]):
> {quote}
> Looks like this undoes all of HBASE-9751, HBASE-8151, and HBASE-8166.
> We're always storing the mvcc readpoints, and we never compare them against 
> the actual smallestReadpoint, and hence we're always performing all the 
> checks, tests, and comparisons that these jiras removed in addition to 
> actually storing the data - which with up to 8 bytes per Cell is not trivial.
> {quote}
> This is the 'breaking' change: 
> https://github.com/apache/hbase/commit/2c280e62530777ee43e6148fd6fcf6dac62881c0#diff-07c7ac0a9179cedff02112489a20157fR96



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
