[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16566424#comment-16566424 ]

Anoop Sam John commented on HBASE-13389:
----------------------------------------

I think so. It looks like we now tend to keep this mvcc for much longer, for one reason or another. Can we keep the mvcc in long format rather than vlong? Because it is a vlong, every cell read has to read the vlong byte by byte, which hurts performance. For a random read, the seek needs to skip many cells.

> [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
> -------------------------------------------------------------
>
>                 Key: HBASE-13389
>                 URL: https://issues.apache.org/jira/browse/HBASE-13389
>             Project: HBase
>          Issue Type: Sub-task
>          Components: Performance
>            Reporter: stack
>            Priority: Major
>         Attachments: 13389.txt
>
>
> HBASE-12600 moved the edit sequenceid from tags to instead exploit the
> mvcc/sequenceid slot in a key. Now Cells near-always have an associated
> mvcc/sequenceid where previously it was rare or the mvcc was kept up at the
> file level. This is sort of how it should be, many of us would argue, but as a
> side-effect of this change, read-time optimizations that helped speed scans
> were undone.
> In this issue, let's see if we can get the optimizations back -- or just
> remove the optimizations altogether.
> The parse of mvcc/sequenceid is expensive. It was noticed over in HBASE-13291.
> The optimizations undone by this change are (to quote the optimizer himself,
> Mr [~lhofhansl]):
> {quote}
> Looks like this undoes all of HBASE-9751, HBASE-8151, and HBASE-8166.
> We're always storing the mvcc readpoints, and we never compare them against
> the actual smallestReadpoint, and hence we're always performing all the
> checks, tests, and comparisons that these jiras removed in addition to
> actually storing the data - which with up to 8 bytes per Cell is not trivial.
> {quote}
> This is the 'breaking' change:
> https://github.com/apache/hbase/commit/2c280e62530777ee43e6148fd6fcf6dac62881c0#diff-07c7ac0a9179cedff02112489a20157fR96

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
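Anoop's point about vlong versus long can be illustrated with a small sketch. The encoding below is a hypothetical length-prefixed variable-length format, not HBase's or Hadoop's actual vlong layout, and all function names are made up for illustration. It only shows the general trade-off: to skip a variable-length field a reader must inspect at least its first byte to learn the size, and decoding walks the payload byte by byte, whereas a fixed 8-byte long can be skipped with a constant offset.

```python
def encode_varlong(value: int) -> bytes:
    """Encode a non-negative 64-bit value in a hypothetical variable-length format.

    Values up to 127 take one byte. Larger values take a header byte
    (0x80 | payload length) followed by the big-endian payload.
    """
    if not 0 <= value < 2 ** 63:
        raise ValueError("value must fit in a non-negative signed 64-bit long")
    if value <= 127:
        return bytes([value])
    payload = value.to_bytes((value.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(payload)]) + payload


def decode_varlong(buf: bytes, pos: int):
    """Return (value, next_pos); the payload is consumed one byte at a time."""
    first = buf[pos]
    if first <= 127:
        return first, pos + 1
    length = first & 0x0F
    value = 0
    for i in range(length):
        value = (value << 8) | buf[pos + 1 + i]
    return value, pos + 1 + length


def skip_varlong(buf: bytes, pos: int) -> int:
    """Skipping still requires reading the first byte to learn the length."""
    first = buf[pos]
    return pos + 1 if first <= 127 else pos + 1 + (first & 0x0F)


def skip_fixed_long(pos: int) -> int:
    """A fixed-width long is skipped without touching the buffer at all."""
    return pos + 8
```

The variable-length form saves space for small values, which is the usual argument for it; the fixed form trades up to 8 bytes per cell for a branch-free skip.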
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16564192#comment-16564192 ]

Lars Hofhansl commented on HBASE-13389:
---------------------------------------

I was just pointed to this again... It's not just DLR, it's also replication, right [~anoop.hbase]?
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15116696#comment-15116696 ]

Anoop Sam John commented on HBASE-13389:
----------------------------------------

It is HBASE-15020. We can remove all of the code added for DLR; it is disabled anyway and still known to have bugs. I feel we may be able to undo HBASE-12600 right now, before that code is removed.
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15116678#comment-15116678 ]

Lars Hofhansl commented on HBASE-13389:
---------------------------------------

Thanks [~anoop.hbase]. Which jira are you referring to? Can we undo HBASE-12600 right now, or does it depend on this other jira being implemented first?
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15116641#comment-15116641 ]

Anoop Sam John commented on HBASE-13389:
----------------------------------------

There is already a jira to remove DLR and clean up the code. After that we can undo HBASE-12600 and get back the old mvcc parse optimization.
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632376#comment-14632376 ]

Lars Hofhansl commented on HBASE-13389:
---------------------------------------

So where are we with this?

To answer your question above [~stack], in the subtask I just put part of the optimization back, namely: if all involved HFiles have a max timestamp of 0, then there is no need to write the timestamp into the new HFile (as all would be 0 anyway). (Previously it did that if all timestamps were older than the oldest running scanner, but as discussed here, we can't do that any longer.)

So how do we proceed with this one?
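The partial optimization Lars describes can be sketched as a simple check over the compaction inputs. This is an illustration only: `HFileInfo`, `max_mvcc`, and `output_needs_per_cell_mvcc` are hypothetical names, not HBase classes or methods, and the sketch assumes that the "max timestamp" in the comment refers to the largest mvcc/sequenceid recorded for a file.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class HFileInfo:
    """Hypothetical summary of one compaction input file."""
    name: str
    max_mvcc: int  # largest per-cell mvcc in the file; 0 if all cells are zeroed


def output_needs_per_cell_mvcc(inputs: Iterable[HFileInfo]) -> bool:
    """Per-cell mvcc must be written only if some input still carries a non-zero one.

    The decision looks only at the files' own data, not at open scanners.
    """
    return any(f.max_mvcc > 0 for f in inputs)
```

Because the check ignores the oldest running scanner, it never drops information; it only avoids writing a column of zeros.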
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510470#comment-14510470 ]

Jeffrey Zhong commented on HBASE-13389:
---------------------------------------

{quote}
I don't see the WALEdit sequenceid being used when we replicate. Is this something to implement? (Sounds like a good idea... )
{quote}
[~saint@gmail.com] I thought we already used it, because intra-replication did; otherwise I can take a first try at this.
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14509175#comment-14509175 ]

stack commented on HBASE-13389:
-------------------------------

bq. After 6 days (currently) we are zero'ing mvcc during compactions.

[~lhofhansl] Pardon me, where is this enforced (must all hfiles participating in a compaction be 6 days old)? 6 days is arbitrary, right? It means that there cannot be a WAL outstanding that has an edit that has not yet been flushed, right? It also means that there cannot be an edit in a WAL that has not been replicated and flushed on the remote side? Is there a trigger we could use rather than an arbitrary timing?

[~jeffreyz] Thanks for chiming in. I don't see the WALEdit sequenceid being used when we replicate. Is this something to implement? (Sounds like a good idea... )
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508508#comment-14508508 ]

Lars Hofhansl commented on HBASE-13389:
---------------------------------------

Feel free to +1 the patch on HBASE-13497 :)
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508507#comment-14508507 ]

Lars Hofhansl commented on HBASE-13389:
---------------------------------------

Thanks for the explanation.

After 6 days (currently) we _are_ zero'ing mvcc during compactions. In HBASE-13497 I allow a compaction to not store mvcc stuff when it's all 0 anyway (not looking at the current scanner, but only going by the HFile's data). So that's safe at least.

I agree we cannot put the original optimization back.
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14508495#comment-14508495 ]

Jeffrey Zhong commented on HBASE-13389:
---------------------------------------

[~saint@gmail.com] Well said and good examples! As of today, there are two cases where we could have out-of-order puts: DLR or replication, where the order of the wal files to be replayed isn't guaranteed.

For non-adjacent hfile compactions, it seems that we have to keep mvcc at the KV level. For example, take hfile1 (max mvcc=1), hfile2 (max mvcc=2) and hfile3 (max mvcc=3). If we compact just hfile1 and hfile3, we can't set the newly compacted hfile's max mvcc=3, because hfile2 may have the same rows as either hfile1 or hfile3.

Keeping mvcc will make the "haunting" out-of-order issue go away and leave one less concern. Let me know which option we should go with and I can also help on the fix.
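Jeffrey's hfile1/hfile2/hfile3 example suggests a rule of thumb: once per-cell mvcc has been dropped and only a file-level maximum remains, a compaction should only pick files that are adjacent in that file-level order. A minimal sketch of such a check follows. The function name `is_adjacent_selection` is hypothetical, files are represented only by their (assumed distinct) max mvcc, and this is not how HBase's compaction policies are actually implemented.

```python
from typing import Sequence


def is_adjacent_selection(store_max_mvccs: Sequence[int], selected: Sequence[int]) -> bool:
    """Return True if `selected` is a contiguous run of the store's files in mvcc order.

    Both arguments list files by their file-level max mvcc, assumed distinct.
    """
    ordered = sorted(store_max_mvccs)
    chosen = sorted(selected)
    if not chosen:
        return True
    if any(m not in ordered for m in chosen):
        raise ValueError("selected file is not part of the store")
    start = ordered.index(chosen[0])
    return ordered[start:start + len(chosen)] == chosen
```

With the three files from the example, selecting the files with max mvcc 1 and 3 is rejected because the file with max mvcc 2 sits between them, while selecting 1 and 2, or 2 and 3, is accepted.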
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14504361#comment-14504361 ]

stack commented on HBASE-13389:
-------------------------------

Thinking on it, during out-of-order DLR, there are a few ways in which we could lose data if we bring back the optimization that zeros all mvccs, promoting the highest mvcc seen to be the hfile's mvcc kept in the hfile metadata.

During recovery of a region during DLR, we may flush hfiles in a manner such that the older edits are in the most recently flushed file, or hfiles are made of edits that do not have a linearly increasing mvcc. This is a violation of tenets that hold when flushes always drop files that have an mvcc/sequenceid in excess of files currently present in the filesystem (and whose edits have increasing mvccs).

We have to be careful compacting these files dropped during recovery. We need to compact them all up together first -- after the region comes online -- before we can mix them in with zero'd mvcc files. (It has to be after the region comes online and not before, because the region may crash during recovery having dropped one or more out-of-order hfiles.)

Here is an illustration. A region is recovering. It comes under memory pressure so it flushes the edits it has received so far. It so happens that it mostly received older edits but a few new ones came in too. It dumps out (let the letters be keys and the numbers mvcc):

A 2
B 4
C 10

Recovery completes and it drops another hfile:

A 1
B 5
C 11

Now, if we compact the first file with a zero'd mvcc file with a sequenceid of 8, the product will be a zero'd mvcc hfile whose seqid is 10. If we then compact this '10' file with the second file flushed, we lose the 'B 5' edit because it is < '10'.

Even if we compacted all three files together -- the zero'd mvcc hfile and the two files dropped during recovery -- we could lose 'B 5' and 'A 2' since both have mvccs < '10'.
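The two-step scenario in stack's illustration can be replayed with a toy model. Everything here is an assumption made for illustration: `Cell`, `HFile`, `newest_per_key`, and `compact` are hypothetical names, a cell with mvcc 0 is ranked by its file's seqid, and the zero'd file with seqid 8 is given an unrelated key `D`. It is not HBase's compaction code.

```python
from typing import Dict, List, NamedTuple, Tuple


class Cell(NamedTuple):
    key: str
    mvcc: int   # 0 means "zeroed": rank by the file-level seqid instead
    value: str


class HFile(NamedTuple):
    seqid: int  # file-level max mvcc/sequenceid kept in file metadata
    cells: List[Cell]


def newest_per_key(files: List[HFile]) -> Dict[str, Tuple[int, str]]:
    """For each key, pick the (rank, value) with the highest effective rank."""
    best: Dict[str, Tuple[int, str]] = {}
    for f in files:
        for c in f.cells:
            seq = c.mvcc if c.mvcc != 0 else f.seqid
            if c.key not in best or seq > best[c.key][0]:
                best[c.key] = (seq, c.value)
    return best


def compact(files: List[HFile], zero_mvcc: bool) -> HFile:
    """Merge files; optionally zero per-cell mvcc and keep only the file-level max."""
    winners = newest_per_key(files)
    cells = [Cell(k, 0 if zero_mvcc else seq, v) for k, (seq, v) in sorted(winners.items())]
    return HFile(max(f.seqid for f in files), cells)


old_zeroed = HFile(8, [Cell("D", 0, "D@old")])
first_flush = HFile(10, [Cell("A", 2, "A@2"), Cell("B", 4, "B@4"), Cell("C", 10, "C@10")])
second_flush = HFile(11, [Cell("A", 1, "A@1"), Cell("B", 5, "B@5"), Cell("C", 11, "C@11")])

# Zeroing mvcc promotes every cell of the first flush to rank 10.
zeroed = compact([old_zeroed, first_flush], zero_mvcc=True)
lossy_view = newest_per_key([zeroed, second_flush])

# Preserving per-cell mvcc keeps the true order.
kept = compact([old_zeroed, first_flush], zero_mvcc=False)
correct_view = newest_per_key([kept, second_flush])
```

In this model `lossy_view["B"]` resolves to the older `B@4` edit, because its zeroed cell now ranks as 10 and shadows `B 5`, while `correct_view["B"]` resolves to `B@5`.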
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503943#comment-14503943 ]

stack commented on HBASE-13389:
-------------------------------

bq. This may be hard to achieve because out of order puts can be flushed at different time.

Do 'out of order' puts happen at DLR time only [~jeffreyz]? i.e. WALs can be replayed in any order since they are farmed out over the cluster. We also cannot guarantee when a region that is receiving DLR edits will flush hfiles; e.g. we could get row1/logSeqId=2 during DLR and flush because we had memory pressure, but then later row1/logSeqId=1 might arrive and be flushed into a newer hfile. The fix for this is to not let compactions happen when the region is in recovery -- this is probably the case already (or let compactions go on but preserve mvcc while in recovery)?

So, the Lars fix would be to drop mvcc if there is no scanner outstanding with a span that includes the mvcc in the current hfile AND we are not in DLR recovery mode?

Are there other places where we might have out-of-order puts? (Flushes are single threaded and edits go into FSHLog and MemStore in order, caveat Elliott and Nate's recent find: https://issues.apache.org/jira/browse/HBASE-12751?focusedCommentId=14377157&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14377157).

bq. ...and only keep mvcc around during region recovery time so that we can still keep HBASE-12600 goal

Yes.

On keeping seqid in the KV in hfiles so we can do "...out of order in minor compactions. " ...don't we mean compacting non-adjacent files rather than out-of-order here? So, yeah, if we preserved mvcc always, we could do any order and non-adjacent. Would be nice.

Otherwise, as I see it, if we want to do non-adjacent compactions (which, as [~lhofhansl] says above, we do not currently have), then we could do it if all files under a Store have zero for mvcc and we just order the edits by the hfile meta data mvcc number. When there are files with an mvcc per KV, then we should probably merge those first... Would have to think it through more.

It gets a little complicated though if the Store has some files with a hfile meta data mvcc number but other files have an mvcc per KV. We could not do a file that has an mvcc per KV with a non-adjacent But we could do it also if files with zero if we have the Lars optimization, we could do non-adjacent if we respected the hfile seqid order. It gets tricky if a file has mvcc in the KV and all the rest do not. Files with KVs in the mvcc need to be compacted together ahead of
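The condition stack proposes for "the Lars fix" can be written down as a small predicate. This is a sketch under stated assumptions: `can_zero_mvcc` and its parameters are hypothetical names, and "no scanner outstanding with a span that includes the mvcc" is modeled as the cell's mvcc being at or below the smallest read point of all open scanners.

```python
def can_zero_mvcc(cell_mvcc: int, smallest_read_point: int, region_recovering: bool) -> bool:
    """Decide whether a compaction may drop (zero) a cell's mvcc.

    Assumed rule: never while the region is in DLR recovery, and otherwise
    only when every open scanner can already see the cell.
    """
    if region_recovering:
        return False
    return cell_mvcc <= smallest_read_point
```

The recovery flag is the new part relative to the original optimization: it keeps per-cell mvcc around exactly while out-of-order edits may still be arriving.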
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14503806#comment-14503806 ]

Lars Hofhansl commented on HBASE-13389:
---------------------------------------

Thanks [~jeffreyz], just discussed a bit with [~stack]...

If we kept the in-order compactions, we wouldn't need MVCC stamps in the HFile beyond the oldest scanner, right? I feel like I am missing something. When you have some time, could you show an example of when we need MVCC stamps in the HFile beyond the oldest scanner?

The issue has to do with Puts/Deletes happening in the same millisecond, right?
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14502373#comment-14502373 ] Jeffrey Zhong commented on HBASE-13389: --- That sounds good. We can shorten the time period to 2 or 3 days. One case where keeping the mvcc longer can gain some performance is minor compaction, because it makes it possible to compact HFiles out of order.
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14502201#comment-14502201 ] Lars Hofhansl commented on HBASE-13389: --- Correctness > Performance :) We need to have this correct in all cases. OK. So let me file a sub-issue to apply the patch I attached here. Not as good as the original patch, but better. Here we need to have the discussion about how long to keep the mvcc stamps on the Cells. It seems we want this to be less than the minimum time between major compactions (which is currently 3.5 days: 1 week +- 1/2 week) for performance (but again, correctness is more important). We might also want to change the detection code for whether a major compaction is needed: if we can get rid of the MVCC stamps, we should major compact.
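The "3.5 days" figure in Lars's comment follows from the numbers he quotes (a 1 week period with +- 1/2 week of jitter). A minimal sketch of that arithmetic is below; the class and method names are hypothetical, and the rule "keep period must be shorter than the minimum gap" is an assumption drawn from the comment, not taken from HBase code.

```java
// Hypothetical helper; the names here are illustrative and not part of HBase.
final class CompactionWindowSketch {
    static final double PERIOD_DAYS = 7.0;     // "1 week", as quoted in the comment
    static final double JITTER_FRACTION = 0.5; // "+- 1/2 week", as quoted in the comment

    // Shortest possible gap between two major compactions.
    static double minGapDays() {
        return PERIOD_DAYS * (1.0 - JITTER_FRACTION);
    }

    // Longest possible gap between two major compactions.
    static double maxGapDays() {
        return PERIOD_DAYS * (1.0 + JITTER_FRACTION);
    }

    // Assumption: if mvcc stamps are kept for less than the shortest gap, every
    // major compaction can purge the stamps written before the previous one.
    static boolean purgedByEveryMajorCompaction(double keepDays) {
        return keepDays < minGapDays();
    }

    public static void main(String[] args) {
        System.out.println(minGapDays());
        System.out.println(maxGapDays());
        System.out.println(purgedByEveryMajorCompaction(3.0));
    }
}
```

With these numbers a keep period of 2 or 3 days, as suggested elsewhere in this thread, stays under the shortest gap.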
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14502173#comment-14502173 ] Jeffrey Zhong commented on HBASE-13389: --- {quote} All other cases we should be covering with metadata in the HFiles trailer, not on individual Cells. {quote} This may be hard to achieve because out-of-order puts can be flushed at different times. Let's say row1/logSeqId=2 is flushed earlier than row1/logSeqId=1. The mvcc ranges in the HFile trailer metadata will then overlap across multiple HFiles. One option is to reinstate your original code, checking against the oldest running scanner and only keeping the mvcc around during region recovery time, so that we still meet the HBASE-12600 goal. If overall read performance does not degrade much (because this part may not be the bottleneck in the read path), I think it's better to keep the current way so that all cases work correctly for out-of-order puts. What do you guys think? Thanks.
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14500883#comment-14500883 ] Lars Hofhansl commented on HBASE-13389: --- Sure. After the discussions we had ([~stack], [~apurtell], and I), I think we can reinstate my original code, where we check against the oldest running scanner, and if all Cells in an HFile are older than that scanner (in terms of MVCC timestamp) we can set their MVCC stamps to 0 and not store them upon compaction. The observation is that we only need MVCC stamps in the HFile to cover flushes/compactions that happen while a current scanner is running. All other cases we should be covering with metadata in the HFile's trailer, not on individual Cells. [~jeffreyz], do you agree? HBASE-12600 changed that, and I bet you had a very good reason.
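The check Lars describes (compare against the oldest running scanner, then zero the stamps at compaction) could be sketched roughly as follows. This is an illustration only, not the attached patch: `MvccCompactionSketch`, `seqIdsToWrite` and the plain `long[]` of stamps are hypothetical stand-ins, and "older than the scanner" is assumed to mean "less than or equal to the smallest read point".

```java
// Hypothetical sketch; not HBase's compactor code.
final class MvccCompactionSketch {
    // Returns the stamps to write for the compacted output.
    // If every cell is already visible to the oldest running scanner,
    // the stamps carry no information and can all be written as 0.
    static long[] seqIdsToWrite(long[] cellSeqIds, long smallestReadPoint) {
        long max = 0L;
        for (long seqId : cellSeqIds) {
            max = Math.max(max, seqId);
        }
        if (max > smallestReadPoint) {
            // Some scanner may still need to filter on these stamps: keep them.
            return cellSeqIds.clone();
        }
        // A new long[] is zero-filled, which models "set them to 0".
        return new long[cellSeqIds.length];
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(seqIdsToWrite(new long[] {3, 5, 7}, 10)));
        System.out.println(java.util.Arrays.toString(seqIdsToWrite(new long[] {3, 5, 12}, 10)));
    }
}
```

The sketch decides per file rather than per cell because the comment talks about "all Cells in an HFile"; a per-cell variant would be a different design.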
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492743#comment-14492743 ] stack commented on HBASE-13389: --- [~lhofhansl] Yes, in a subtask. Let's figure out this 6 days vs 3 days vs a couple of hours, and the other items raised here, as other subtasks or issues.
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14492722#comment-14492722 ] Lars Hofhansl commented on HBASE-13389: --- Should we apply my patch?
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14486820#comment-14486820 ] Lars Hofhansl commented on HBASE-13389: --- Misunderstood the comparison order. All good :)
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482558#comment-14482558 ] Jeffrey Zhong commented on HBASE-13389: --- Changing the comparison order to row -> column -> ts -> seqId -> type would make things more consistent and doesn't change HBase's current idempotence. For example, for puts with the same timestamp the last put wins, whereas if we do put, delete, put or delete, put, put, the delete always wins. I think it's better that a delete be treated like a put so users can have the same expectations as for puts. Otherwise, on an OS with low time resolution or when a put is missing, we often have to check whether there is a delete overshadowing newer puts. Yeah, keeping mvcc for 3 days is good enough.
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482550#comment-14482550 ] stack commented on HBASE-13389: --- [~lhofhansl] 3 days is arbitrary. If someone fails to observe a backed-up replication and doesn't notice it until a week has gone by, will they lose data? Can we not have the optimization instead cut in when it is guaranteed the mvcc is no longer needed? In-cluster, it seems a matter of hours will do it as long as we check that local flushing is working. Below are a few 'statements' of why I think it should be fine. For replication, as I read it, edits get a new seqid when applied to the sink cluster, so the source cluster seqid doesn't factor in at all. Maybe I misread. As per Enis, I don't get how idempotency is affected. Notes on in-cluster:
* For log replay in-cluster, when doing Distributed Log Replay, if a Region gets an edit with a seqid that fits inside a range covered by an existing hfile, then we can just drop it because it is already persisted. This would be for the case where an old WAL is being replayed though its edits have already been flushed out to hfiles (and the optimization dropping mvcc has been run).
* If a region can't flush, then we should not run the optimization. (This is probably ok... compaction will likely just not succeed if we can't flush, but the optimization should check the last flush time.)
* If there is no replication, the optimization can run on any file as long as there are no outstanding scanners and the read point is beyond the oldest edit in an hfile (the optimization does this now, I believe).
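Two of stack's in-cluster notes above lend themselves to a small sketch: dropping a replayed edit whose seqid falls inside a range already covered by an hfile, and refusing to run the optimization when the last flush is too old. Everything below is hypothetical (the `SeqIdRange` type, the method names, and the idea that an hfile exposes a simple min/max seqid range are assumptions for illustration), not HBase code.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical min/max seqid range recorded for one hfile.
final class SeqIdRange {
    final long min;
    final long max;

    SeqIdRange(long min, long max) {
        this.min = min;
        this.max = max;
    }

    boolean covers(long seqId) {
        return seqId >= min && seqId <= max;
    }
}

final class ReplayGuardSketch {
    // First note: a replayed edit whose seqid is inside a persisted range is dropped.
    static boolean alreadyPersisted(long editSeqId, List<SeqIdRange> hfileRanges) {
        for (SeqIdRange range : hfileRanges) {
            if (range.covers(editSeqId)) {
                return true;
            }
        }
        return false;
    }

    // Second note: only consider purging mvcc if a flush happened recently enough.
    static boolean flushLooksHealthy(long nowMs, long lastFlushMs, long maxFlushIntervalMs) {
        return nowMs - lastFlushMs <= maxFlushIntervalMs;
    }

    public static void main(String[] args) {
        List<SeqIdRange> ranges = Arrays.asList(new SeqIdRange(1, 10), new SeqIdRange(11, 20));
        System.out.println(alreadyPersisted(15, ranges));
        System.out.println(alreadyPersisted(25, ranges));
        System.out.println(flushLooksHealthy(10_000_000L, 9_000_000L, 3_600_000L));
    }
}
```

Note that Jeffrey Zhong points out elsewhere in this thread that out-of-order puts can make the ranges of different HFiles overlap, so a range check like this is weaker than it looks.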
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482334#comment-14482334 ] Enis Soztutar commented on HBASE-13389: --- bq. I would actually be against it, since it breaks the fact that all mutations in HBase are idempotent - when the client encounters any problem with a batch of updates, it can just do those again, and the outcome would be identical I don't understand how this is related to idempotent updates. The sort order proposed will still keep ts before the type/seqId. 3 days should be good enough for replication I say.
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482322#comment-14482322 ] Lars Hofhansl commented on HBASE-13389: --- I think we had comment overlap. :) bq. ...you are not against changing sort order so that seqid prevails over type are you...? I would actually be against it, since it breaks the fact that all mutations in HBase are idempotent - when the client encounters any problem with a batch of updates, it can just do those again, and the outcome would be identical - within the limits of what HBase defines, i.e. with ms resolution. Now we would complicate that, and would have some explaining to do. So with the discussion above in place, can we lower the default time to 3 days? So that we can be reasonably sure that major compactions would purge the mvcc cruft?
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482283#comment-14482283 ] stack commented on HBASE-13389: --- bq. So with all this I don't see any reason to keep these for more than a few hours. It's not log rolling, as per Enis. It is when the memstore is flushed. By default, memstores are flushed at least once an hour (the value is in milliseconds): public static final int DEFAULT_CACHE_FLUSH_INTERVAL = 3600000; So if an old edit comes in during distributed log replay, an edit that has already been flushed to an hfile, we need to be able to put it in the appropriate slot (as you say). This can happen if we are overplaying edits in the case where the Master does not have the last flush sequenceid on a region. If HFiles have all their seqids, it is easy. But if mvcc has been purged from hfiles (optimization) and we get an edit that falls into the hfile time range, we are going to be confused. Somehow the optimization purging mvcc should not run until we are sure old WALs with seqids older than those in hfiles for all regions have been let go. For replication, yeah, it needs a few days. The root of the lag may take a few days to fix. On the put -> delete -> put, you are not against changing sort order so that seqid prevails over type, are you [~lhofhansl]? Would be a good change for 2.0.
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482262#comment-14482262 ] Lars Hofhansl commented on HBASE-13389: --- Yeah, not related to log roll, sorry. I meant the max time before we force a memstore flush (1 hour by default)... HBASE-5930. I still have not heard a convincing reason why the time to keep the mvcc stuff around needs to be greater than an hour or two :)
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482163#comment-14482163 ]

Enis Soztutar commented on HBASE-13389:
---------------------------------------

bq. when we replay data due to recovery we want it to fall into the right place w.r.t. existing data. Why do we need more than the maximum time to roll a log (1h)?

I think the minimum time to keep is the maximum time an edit can live in the memstore without being flushed. This is not related to log roll (since we still replay edits from a previous log roll) but to how far back an edit can be replayed through distributed log replay, I think.

Case 3, as Jeff puts it, is an issue with the comparison order. We compare entries in {{row -> column -> ts -> type -> seqId}} order; however, we should compare entries in {{row -> column -> ts -> seqId -> type}} order so that Put, Delete, Put with the same TS works. If we had better resolution for timestamps, this would not be needed, though.
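The effect of the comparison order on the Put, Delete, Put case can be illustrated with a small standalone sketch. Everything here is hypothetical: `Entry`, both comparators, and the masking rule ("a delete hides whatever sorts after it") are simplifications for a single row/column/timestamp, not HBase's actual comparator or scan logic. It assumes newer timestamps, deletes, and higher seqIds sort first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a cell; not HBase's KeyValue.
final class Entry {
    final String row;
    final String column;
    final long ts;
    final boolean isDelete;
    final long seqId;
    final String value; // null for deletes

    Entry(String row, String column, long ts, boolean isDelete, long seqId, String value) {
        this.row = row;
        this.column = column;
        this.ts = ts;
        this.isDelete = isDelete;
        this.seqId = seqId;
        this.value = value;
    }
}

final class ComparisonOrderSketch {
    // Shared prefix: row, then column, then timestamp (newest first).
    private static int compareRowColumnTs(Entry a, Entry b) {
        int c = a.row.compareTo(b.row);
        if (c != 0) return c;
        c = a.column.compareTo(b.column);
        if (c != 0) return c;
        return Long.compare(b.ts, a.ts);
    }

    // row -> column -> ts -> type -> seqId (deletes first, then higher seqId first).
    static final Comparator<Entry> TYPE_THEN_SEQ = (a, b) -> {
        int c = compareRowColumnTs(a, b);
        if (c != 0) return c;
        c = Boolean.compare(b.isDelete, a.isDelete);
        if (c != 0) return c;
        return Long.compare(b.seqId, a.seqId);
    };

    // row -> column -> ts -> seqId -> type (higher seqId first, then deletes first).
    static final Comparator<Entry> SEQ_THEN_TYPE = (a, b) -> {
        int c = compareRowColumnTs(a, b);
        if (c != 0) return c;
        c = Long.compare(b.seqId, a.seqId);
        if (c != 0) return c;
        return Boolean.compare(b.isDelete, a.isDelete);
    };

    // Simplified single-cell masking: a delete hides every entry sorted after it.
    static List<String> visibleValues(List<Entry> entries, Comparator<Entry> order) {
        List<Entry> sorted = new ArrayList<>(entries);
        sorted.sort(order);
        List<String> visible = new ArrayList<>();
        for (Entry e : sorted) {
            if (e.isDelete) break;
            visible.add(e.value);
        }
        return visible;
    }

    // Put(seqId=1), Delete(seqId=2), Put(seqId=3), all with the same row, column and ts.
    static List<Entry> putDeletePut() {
        List<Entry> list = new ArrayList<>();
        list.add(new Entry("r", "c", 100L, false, 1L, "v1"));
        list.add(new Entry("r", "c", 100L, true, 2L, null));
        list.add(new Entry("r", "c", 100L, false, 3L, "v2"));
        return list;
    }

    public static void main(String[] args) {
        System.out.println(visibleValues(putDeletePut(), TYPE_THEN_SEQ));
        System.out.println(visibleValues(putDeletePut(), SEQ_THEN_TYPE));
    }
}
```

Under these assumptions, the type-first order sorts the delete ahead of both puts and hides the later put, while the seqId-first order lets the later put sort ahead of the delete.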
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14482106#comment-14482106 ]

stack commented on HBASE-13389:
-------------------------------

I have been trying to write up the life of a sequenceid: https://docs.google.com/document/d/16beczDie-KU1uSpJvd0GoUlQbPtQBL93rOOPqnE5Ma0/edit#

Let me pick it up again. I will add in the notes above. It would be sweet if we could backfill tests that verify our expectations align with the story we are telling.
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481916#comment-14481916 ]

Lars Hofhansl commented on HBASE-13389:
---------------------------------------

We do need to revisit the 6 days, right? Would 3 days be enough?

Lemme try to understand the cases:
# When we replay data due to recovery we want it to fall into the right place w.r.t. existing data. Why do we need more than the maximum time to roll a log (1h)?
# Replication... Yeah, that's important. I'd say if you have a replication lag of more than a few hours you have a larger problem anyway.
# This too... Although I do not actually agree that this is an advantage. Mutations (including deletes) being idempotent in HBase is a feature and not a problem.

So with all this I do not see any reason to keep these for more than a few hours. It's very possible that I am missing something.
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14481348#comment-14481348 ]

Jeffrey Zhong commented on HBASE-13389:
---------------------------------------

{quote}
To what does the above statement apply? To all three of your 'cases' or just to the last case, case #3?
{quote}
Just for case #3. The other two cases need the mvcc kept around for a little while.
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14396059#comment-14396059 ]

stack commented on HBASE-13389:
-------------------------------

bq. Seems to me not needed (before I thought we need to keep mvcc around till a major compaction)

[~jeffreyz] Please say more. To what does the above statement apply? To all three of your 'cases' or just to the last case, case #3?
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14396012#comment-14396012 ]

stack commented on HBASE-13389:
-------------------------------

@jeffrey zhong, when you say at the end of the first comment above "seems to me not needed...", I do not follow. Can you say more? Thanks.
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14395995#comment-14395995 ]

Jeffrey Zhong commented on HBASE-13389:
---------------------------------------

Here is another thought. If we can keep the mvcc as part of the key byte array (logically it is, but not in key serialization and deserialization), then we could use a lazy-read approach, because the mvcc is rarely used during key comparison.
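A minimal sketch of the lazy-read idea, under assumptions that are not in the thread: the mvcc is stored as a fixed-width 8-byte big-endian long directly after the key bytes, and `LazyMvccCell` with its `decodeCount` field is a hypothetical class for illustration, not an HBase type. Key comparison never touches the mvcc bytes; they are decoded only on the first call to `getSequenceId()`.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical cell layout: [key bytes][8-byte mvcc]. Not an HBase class.
final class LazyMvccCell {
    private final byte[] buf;
    private final int keyOffset;
    private final int keyLength;
    private long seqId;
    private boolean decoded;
    int decodeCount; // instrumentation for the example only

    LazyMvccCell(byte[] buf, int keyOffset, int keyLength) {
        this.buf = buf;
        this.keyOffset = keyOffset;
        this.keyLength = keyLength;
    }

    // Builds a cell for the example: key bytes followed by the 8-byte mvcc.
    static LazyMvccCell of(String key, long seqId) {
        byte[] k = key.getBytes(StandardCharsets.UTF_8);
        ByteBuffer b = ByteBuffer.allocate(k.length + Long.BYTES);
        b.put(k).putLong(seqId);
        return new LazyMvccCell(b.array(), 0, k.length);
    }

    // Unsigned lexicographic comparison of the key bytes only; mvcc is not read.
    int compareKeyTo(LazyMvccCell other) {
        int n = Math.min(keyLength, other.keyLength);
        for (int i = 0; i < n; i++) {
            int x = buf[keyOffset + i] & 0xff;
            int y = other.buf[other.keyOffset + i] & 0xff;
            if (x != y) return x - y;
        }
        return keyLength - other.keyLength;
    }

    // Decodes the mvcc on first use and caches it.
    long getSequenceId() {
        if (!decoded) {
            seqId = ByteBuffer.wrap(buf, keyOffset + keyLength, Long.BYTES).getLong();
            decoded = true;
            decodeCount++;
        }
        return seqId;
    }

    public static void main(String[] args) {
        LazyMvccCell a = of("row1", 42L);
        LazyMvccCell b = of("row2", 7L);
        System.out.println(a.compareKeyTo(b) < 0);
        System.out.println(a.getSequenceId());
    }
}
```

With a variable-length encoding the same caching applies, but finding where the next cell starts would still require scanning the encoded bytes, which is part of the cost discussed in this issue.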
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14395994#comment-14395994 ]

Jeffrey Zhong commented on HBASE-13389:
---------------------------------------

Thanks [~lhofhansl] for looking into this. I think your patch can help a bit.

{quote}
Do we need valid (non 0) mvcc readpoints for committed data (i.e. data that was flushed to an HFile and hence we'll never need to replay any HLogs for those)? Do we need these anywhere but in the memstore?
{quote}
There are three cases (that I can think of, and maybe more) where we need the logSeqId (mvcc) around to help us keep the put order, assuming all puts/deletes are of the same row and timestamp (version).

case 1) Region server recovery.
We need the mvcc (logSeqId) only while the region is in recovery mode, not after recovery.

case 2) Replication receiving side. We need the logSeqId to maintain the order, because a region move or recovery on the replication playing side can cause puts to arrive out of order.
We need the mvcc for a couple of days (to be safe) so that the data on the receiving side is at least eventually correct.

case 3) Put, delete, put. Currently the delete overshadows the later put, but with the logSeqId we can easily solve the issue because the logSeqId is the real version of a put.
Seems to me not needed (before, I thought we needed to keep the mvcc around until a major compaction).
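Cases 1 and 2 above both come down to edits for the same row and timestamp arriving in a different order than they were originally written. A minimal sketch of why a retained logSeqId helps, using a hypothetical `Edit` type and `winner` helper (neither is HBase code): choosing the edit with the highest seqId gives the same result regardless of arrival order.

```java
import java.util.Arrays;
import java.util.List;

final class ReplayOrderSketch {
    // Hypothetical edit for one row/column/timestamp; only the seqId distinguishes versions.
    static final class Edit {
        final long seqId;
        final String value;

        Edit(long seqId, String value) {
            this.seqId = seqId;
            this.value = value;
        }
    }

    // Picks the value of the edit with the highest seqId, independent of arrival order.
    static String winner(List<Edit> arrivals) {
        Edit best = null;
        for (Edit e : arrivals) {
            if (best == null || e.seqId > best.seqId) {
                best = e;
            }
        }
        return best == null ? null : best.value;
    }

    public static void main(String[] args) {
        Edit older = new Edit(1L, "old");
        Edit newer = new Edit(2L, "new");
        System.out.println(winner(Arrays.asList(older, newer)));
        System.out.println(winner(Arrays.asList(newer, older)));
    }
}
```

Once the seqIds have been reset to 0 the two edits are indistinguishable, which is why the length of the retention period matters for these cases.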
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14395875#comment-14395875 ]

Lars Hofhansl commented on HBASE-13389:
---------------------------------------

Lastly... The 6 days from HBASE-11315 is not right. We have major compaction set to be done every 7 days, with a jitter of 1/2 week. I.e. data might be major compacted as early as after _3.5_ days. The retention of mvcc data should be less than that. Maybe 3 days or so.
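The arithmetic behind the 3.5 days can be written out as a small sketch. The class and method names are hypothetical, and the jitter is modeled simply as a fraction of the period that may be subtracted from it; the 7-day period, 6-day retention and 3-day proposal are the numbers from the comment above.

```java
import java.util.concurrent.TimeUnit;

final class MvccRetentionWindow {
    // Earliest point at which a major compaction may run, given a period and a jitter fraction.
    static long earliestMajorCompactionMillis(long periodMillis, double jitterFraction) {
        return periodMillis - (long) (periodMillis * jitterFraction);
    }

    // True if mvcc retention expires before the earliest possible major compaction.
    static boolean retentionFits(long retentionMillis, long periodMillis, double jitterFraction) {
        return retentionMillis < earliestMajorCompactionMillis(periodMillis, jitterFraction);
    }

    public static void main(String[] args) {
        long week = TimeUnit.DAYS.toMillis(7);
        long earliest = earliestMajorCompactionMillis(week, 0.5);
        System.out.println("earliest major compaction after hours: " + TimeUnit.MILLISECONDS.toHours(earliest));
        System.out.println("6 days fits: " + retentionFits(TimeUnit.DAYS.toMillis(6), week, 0.5));
        System.out.println("3 days fits: " + retentionFits(TimeUnit.DAYS.toMillis(3), week, 0.5));
    }
}
```

With a 6-day retention, a major compaction that runs at the early end of the window still sees mvcc values it must keep, so the cleanup slips to a later compaction.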
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14395851#comment-14395851 ]

Lars Hofhansl commented on HBASE-13389:
---------------------------------------

[~jeffreyz], [~stack], I have to admit I do not quite follow the reasoning behind HBASE-12600.

The main question I have: Do we need valid (non-zero) mvcc readpoints for committed data (i.e. data that was flushed to an HFile, for which we will never need to replay any HLogs)? Do we need these anywhere but in the memstore?
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14395844#comment-14395844 ]

Lars Hofhansl commented on HBASE-13389:
---------------------------------------

It turns out that the optimization in HBASE-8151 and HBASE-9751 still works, but only after 6 days, when compactions allow setting mvcc readpoints to 0.

I think we can get the optimization for HBASE-8166 back and still have HBASE-12600 work correctly, if we replace this:
{code}
-    final boolean needMvcc = fd.maxMVCCReadpoint >= smallestReadPoint;
+    final Compression.Algorithm compression = store.getFamily().getCompactionCompression();
     StripeMultiFileWriter.WriterFactory factory = new StripeMultiFileWriter.WriterFactory() {
       @Override
       public Writer createWriter() throws IOException {
         return store.createWriterInTmp(
-            fd.maxKeyCount, compression, true, needMvcc, fd.maxTagsLength > 0);
+            fd.maxKeyCount, compression, true, true, fd.maxTagsLength > 0);
       }
{code}
With this:
{code}
-    final boolean needMvcc = fd.maxMVCCReadpoint >= smallestReadPoint;
+    final boolean needMvcc = fd.maxMVCCReadpoint > 0;
     final Compression.Algorithm compression = store.getFamily().getCompactionCompression();
     StripeMultiFileWriter.WriterFactory factory = new StripeMultiFileWriter.WriterFactory() {
       @Override
       public Writer createWriter() throws IOException {
         return store.createWriterInTmp(
             fd.maxKeyCount, compression, true, needMvcc, fd.maxTagsLength > 0);
       }
{code}
So when all mvcc readpoints are 0, the next compaction can still do the optimization for HBASE-8166 and not write the mvcc information at all.

It will just happen later... Before, we already did that when no open scanner had a readpoint older than any of the readpoints in the HFile; now we have to wait until compactions set them all to 0. It's not all that bad.

[~stack], if the data is older than 6 days I'd expect this to no longer show in the profiler.

Maybe we need to write some unit tests for this, although I assume that won't be easy.
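The three variants of the `needMvcc` decision discussed above can be compared side by side in a sketch. The class and method names are hypothetical, and the third variant assumes the intended check is "strictly greater than 0", which is what the prose describes (skip writing mvcc once every readpoint in the input files has been reset to 0).

```java
final class NeedMvccSketch {
    // Before HBASE-12600: write mvcc only if some open scanner could still need it.
    static boolean beforeHbase12600(long maxMvccReadpoint, long smallestReadPoint) {
        return maxMvccReadpoint >= smallestReadPoint;
    }

    // After HBASE-12600: mvcc is always written.
    static boolean afterHbase12600(long maxMvccReadpoint) {
        return true;
    }

    // Proposed here (assumed reading): write mvcc unless all readpoints are already 0.
    static boolean proposed(long maxMvccReadpoint) {
        return maxMvccReadpoint > 0;
    }

    public static void main(String[] args) {
        long smallestReadPoint = 100L;
        long[] maxReadpoints = {0L, 50L, 150L};
        for (long max : maxReadpoints) {
            System.out.println("max=" + max
                + " before=" + beforeHbase12600(max, smallestReadPoint)
                + " after=" + afterHbase12600(max)
                + " proposed=" + proposed(max));
        }
    }
}
```

The middle case (max readpoint 50, smallest readpoint 100) is where the proposal differs from the original optimization: the mvcc is still written until a later compaction has zeroed the values.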
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394693#comment-14394693 ]

stack commented on HBASE-13389:
-------------------------------

bq. I'm surprised the extra mvcc value caused so much perf regression.

Yeah, weirdly I see it costing us a bunch. I will report better over in HBASE-13291.

bq. Should we keep the time period configuration shorter or revert all related changes?

We'll figure it out, [~jeffreyz]. Thanks for the input.
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394114#comment-14394114 ]

Jeffrey Zhong commented on HBASE-13389:
---------------------------------------

Should we keep the time period configuration shorter or revert all related changes? Thanks.
[jira] [Commented] (HBASE-13389) [REGRESSION] HBASE-12600 undoes skip-mvcc parse optimizations
[ https://issues.apache.org/jira/browse/HBASE-13389?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14394111#comment-14394111 ]

Jeffrey Zhong commented on HBASE-13389:
---------------------------------------

[~stack] The performance regression is because we keep mvcc values longer (HBASE-11315), hence the later change https://github.com/apache/hbase/commit/2c280e62530777ee43e6148fd6fcf6dac62881c0#diff-07c7ac0a9179cedff02112489a20157fR96. I'm surprised the extra mvcc value caused so much of a perf regression.

Here is the code in Compactor.java that calculates the minSeqId to keep during compaction.
{code}
// when isAllFiles is true, all files are compacted so we can calculate the smallest
// MVCC value to keep
if (fd.minSeqIdToKeep < file.getMaxMemstoreTS()) {
  fd.minSeqIdToKeep = file.getMaxMemstoreTS();
}

// output to writer:
for (Cell c : cells) {
  if (cleanSeqId && c.getSequenceId() <= smallestReadPoint) {
    CellUtil.setSequenceId(c, 0);
  }
{code}
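The cleanup step in the quoted loop can be tried out in isolation with a sketch that replaces cells by bare sequence ids. `SeqIdCleanupSketch` and its methods are hypothetical; only the condition `cleanSeqId && seqId <= smallestReadPoint` is taken from the excerpt above.

```java
import java.util.Arrays;

final class SeqIdCleanupSketch {
    // Returns the seqIds as a compaction would write them: values at or below the
    // smallest read point are reset to 0 when cleanSeqId is set, others are kept.
    static long[] compact(long[] seqIds, boolean cleanSeqId, long smallestReadPoint) {
        long[] out = new long[seqIds.length];
        for (int i = 0; i < seqIds.length; i++) {
            long s = seqIds[i];
            out[i] = (cleanSeqId && s <= smallestReadPoint) ? 0L : s;
        }
        return out;
    }

    // Largest seqId left in the output; 0 means no per-cell mvcc information remains.
    static long max(long[] seqIds) {
        long m = 0L;
        for (long s : seqIds) {
            m = Math.max(m, s);
        }
        return m;
    }

    public static void main(String[] args) {
        long[] input = {5L, 9L, 12L};
        System.out.println(Arrays.toString(compact(input, true, 10L)));
        System.out.println(Arrays.toString(compact(input, false, 10L)));
        System.out.println(max(compact(new long[] {5L, 9L}, true, 10L)));
    }
}
```

While `cleanSeqId` stays false, which is the case during the retention period, every cell keeps its sequence id and readers keep paying the parse cost described in this issue.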