[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020272#comment-14020272 ] Hadoop QA commented on HBASE-8763: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12648663/hbase-8763-v5.2.patch against trunk revision . ATTACHMENT ID: 12648663 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 15 new or modified tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 3 warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. The patch failed these unit tests: Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/9708//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9708//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9708//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9708//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9708//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9708//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9708//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9708//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9708//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9708//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/9708//console This message is automatically generated. [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Assignee: Jeffrey Zhong Priority: Critical Attachments: HBase MVCC LogSeqId Combined.pdf, hbase-8736-poc.patch, hbase-8763-poc-v1.patch, hbase-8763-v1.patch, hbase-8763-v2.patch, hbase-8763-v3.patch, hbase-8763-v4.patch, hbase-8763-v5.1.patch, hbase-8763-v5.2.patch, hbase-8763-v5.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14020352#comment-14020352 ] stack commented on HBASE-8763: -- All above comments are good by me. This is good: getPreAssignedWriteNumber +1 Please fix the javadoc on commit and file follow on issues. Please also add a comment that explains why the Pair is needed and conditions under which we can remove it. Also explain in comment why the MutableLong is used though it seems it not needed. bq. We don't have setMvccVersion in Cell interface. Do you want to create one? Ugh. Cell Interface is read-only as it should be. Would it make sense having another Interface, MVCCSettable with a setMvccVersion in it? This would be server-side only? Is there ever a time when we know the mvcc constructing a Cell such that we can shove it in on Construction other than at deserializing or clone time? [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Assignee: Jeffrey Zhong Priority: Critical Attachments: HBase MVCC LogSeqId Combined.pdf, hbase-8736-poc.patch, hbase-8763-poc-v1.patch, hbase-8763-v1.patch, hbase-8763-v2.patch, hbase-8763-v3.patch, hbase-8763-v4.patch, hbase-8763-v5.1.patch, hbase-8763-v5.2.patch, hbase-8763-v5.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14019021#comment-14019021 ] Jeffrey Zhong commented on HBASE-8763: -- [~saint@gmail.com] Thanks for the review and good comments. {quote} Why return a Pair? We are passing in the Cell? Can't caller use passed in Cell and just leave addTo returning long as before (I am missing something I know... pardon my being slow). We are making a new object each time we add to memstore, the Pair. {quote} That's the issue on how we add cells to memestore today. Basically it does copy the passed in Cells and put newly created Cells into memstore. So I have to return those newly added Cells to update mvcc later(because JAVA doesn't support reference to a primitive type). In future, we shouldn't make copies when add cells into memstore. {quote} If so, a comment on the @param would help {quote} Sure. I'll do that. {quote} create a new one each time through? Just as we have WALEdit.EMPTY_WALEDIT? {quote} Good point! I'll do that. {quote} On beginMemstoreInsert, why take a value at all {quote} It's used by beginMemstoreInsertWithSeqNum. One way is that I can let beginMemstoreInsert to call beginMemstoreInsertWithSeqNum instead. After this, I think it can address several following comments. {quote} NO_SEQUENCE_ID Should be a define in your new SequenceId interface? A comment on wny you do 'w = null;' would be helpful in flush: e.g. Set to null to indicate success Change name of memstoreKVs to be memstoreCells (be forward thinking!) {quote} Ok, I'll change that. {quote} Do we need the MutableLong here still? {quote} The reason is that I want to hide the fact of bumping 1 billion number inside this function and keep the bumping in one place. I could define a constant for this purpose {quote} Can these be lists of Cells rather than private final transient ListKeyValue memstoreKVs;? You can do cell.setMvccVersion. {quote} We don't have setMvccVersion in Cell interface. Do you want to create one? {quote} Are we not passing the KVs twice? Once in WALEdits and then again in this new memstoreKVs argument? {quote} My firstly version(not published) is using KVs in WALEdits while HRegion#doProcessRowWithTimeout let clients to create WALEdits so it's impossible to merge those two lists. {quote} Man, the mvcc stuff should be redone w/ disruptor. Looks like ideal disruptor case. {quote} The issue lands in we have different durability modes so the disruptor way maybe hard. As in the earlier review thread, we could remove mvcc writeQueue while it needs to keep the sequence id of the last unflushed edit with sync_wal durability. Let the optimization be done later. [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Assignee: Jeffrey Zhong Priority: Critical Attachments: HBase MVCC LogSeqId Combined.pdf, hbase-8736-poc.patch, hbase-8763-poc-v1.patch, hbase-8763-v1.patch, hbase-8763-v2.patch, hbase-8763-v3.patch, hbase-8763-v4.patch, hbase-8763-v5.1.patch, hbase-8763-v5.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018163#comment-14018163 ] stack commented on HBASE-8763: -- Looking at 5.1: Why return a Pair? We are passing in the Cell? Can't caller use passed in Cell and just leave addTo returning long as before (I am missing something I know... pardon my being slow). We are making a new object each time we add to memstore, the Pair. Why would the below take a list of kvs to appendNoSyncNoAppend? When would it ever make sense passing a list of kvs? (Hmm... I think I see why -- when we want to just set an mvcc on the KV though we are not appending it -- is that right? If so, a comment on the @param would help) In the appendNoSyncNoAppend we make HLogKey. We call System.currentTimeMillis. This edit is never appended. Can we NOT call System.currentTimeMillis? Just pass a -1 or something instead? Or can we make a noop HLogKey defined as a static in HLogKey and just use it every time rather than create a new one each time through? Just as we have WALEdit.EMPTY_WALEDIT? On beginMemstoreInsert, why take a value at all? Wny not just return the WriteEntry that has special value for the write number? If we ever try to use this number advancing the read point, throw exceptions? Remove the current beginMemstoreInsert that does not take an entry? I see that in your new method, waitForPreviousTransactionsComplete, you put something into the mvcc queue w/ a HLog.NO_SEQUENCE_ID and wait for this edit to go around so you can be sure queue is cleared. So you have second use for special mvcc/sequenceid number. Should the NO_SEQUENCE_ID be it? and you just use it when beginMemstoreInsert is called setting it into the WriteEntry? Should the number even come from HLog? Could it be private to this class? When we do waitForPreviousTransactionsComplete, does WriteEntry have a writeNumber set? Or is it NO_SEQUENCE_ID? If it is the latter, yeah, just change beginMemstoreInsert to not take a param, or at least, not take this particular one because it is means of asking for a special behavior. If the writeNumber is set, where does that happen? NO_SEQUENCE_ID Should be a define in your new SequenceId interface? A comment on wny you do 'w = null;' would be helpful in flush: e.g. Set to null to indicate success Change name of memstoreKVs to be memstoreCells (be forward thinking!) I am not clear still on why the below is ok up in HRegion#doMiniBatchMutation (Do we need the MutableLong here still? Why not just set the sequenceid into beginMemstoreInsertWithSeqNum and you are doing this when you use it kv.setMvccVersion(mvccNum.longValue());) mvccNum.setValue(this.sequenceId.incrementAndGet()); At a minimum it needs a comment explaining why (Sorry if I am being dense here). You know why we add to memstore first before WAL? For speed IIRC. I should go research it. This rollback stuff could be tricky. So then here: mvccNum.setValue(this.sequenceId.incrementAndGet()); w = mvcc.beginMemstoreInsertWithSeqNum(mvccNum); We are getting a seqid and setting it as write number. We have not yet gone on the ring buffer. Every edit is getting a write number like this? MVCC read number happens only after the WAL append has happened. Man, the mvcc stuff should be redone w/ disruptor. Looks like ideal disruptor case. StoreFlusher change no longer needed? Can these be lists of Cells rather than private final transient ListKeyValue memstoreKVs;? You can do cell.setMvccVersion. Why this? long appendNoSync(HTableDescriptor htd, HRegionInfo info, HLogKey key, WALEdit edits, AtomicLong sequenceId, boolean inMemstore, ListKeyValue memstoreKVs) Unnecssary import in HLogSplitter? Unnecessary change in WALEdit? Are we not passing the KVs twice? Once in WALEdits and then again in this new memstoreKVs argument? I'm running tests now to see what this patch does for performance. After our chat yesterday, yes, I see, it should not have much of an impact (especially looking at what you did in FSHLog). That'd be cool. I'm excited about this patch coming in. Great work Mr. Zhong. [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Assignee: Jeffrey Zhong Priority: Critical Attachments: HBase MVCC LogSeqId Combined.pdf, hbase-8736-poc.patch, hbase-8763-poc-v1.patch, hbase-8763-v1.patch, hbase-8763-v2.patch, hbase-8763-v3.patch, hbase-8763-v4.patch, hbase-8763-v5.1.patch, hbase-8763-v5.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018165#comment-14018165 ] Liyin Tang commented on HBASE-8763: --- Hi, I am out of office since 9/1/2012 to 9/16/2012 and I cannot access to this email. In urgent case, please forward your email to liyint...@gmail.com Thanks a lot Liyin [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Assignee: Jeffrey Zhong Priority: Critical Attachments: HBase MVCC LogSeqId Combined.pdf, hbase-8736-poc.patch, hbase-8763-poc-v1.patch, hbase-8763-v1.patch, hbase-8763-v2.patch, hbase-8763-v3.patch, hbase-8763-v4.patch, hbase-8763-v5.1.patch, hbase-8763-v5.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14018385#comment-14018385 ] stack commented on HBASE-8763: -- This patch seems to get us our speed back. Good on one [~jeffreyz] Below do basic 1, 5, 25, and 200 threads test: Here is nopatch, the current master branch: {code} nopatch1.1.txt:2014-06-04 12:11:52,176 INFO [main] wal.HLogPerformanceEvaluation: Summary: threads=1, iterations=100, syncInterval=0 took 1224.313s 816.785ops/s nopatch5.1.txt:2014-06-04 12:29:02,864 INFO [main] wal.HLogPerformanceEvaluation: Summary: threads=5, iterations=100, syncInterval=0 took 1025.163s 4877.273ops/s nopatch25.1.txt:2014-06-04 12:53:02,973 INFO [main] wal.HLogPerformanceEvaluation: Summary: threads=25, iterations=100, syncInterval=0 took 1434.641s 17425.963ops/s nopatch200.1.txt:2014-06-04 13:40:30,333 INFO [main] wal.HLogPerformanceEvaluation: Summary: threads=200, iterations=100, syncInterval=0 took 2841.947s 70374.289ops/s {code} Here is w/ patch applied: {code} patch1.1.txt:2014-06-04 14:37:04,973 INFO [main] wal.HLogPerformanceEvaluation: Summary: threads=1, iterations=100, syncInterval=0 took 1228.775s 813.819ops/s patch5.1.txt:2014-06-04 14:53:53,623 INFO [main] wal.HLogPerformanceEvaluation: Summary: threads=5, iterations=100, syncInterval=0 took 1003.234s 4983.882ops/s patch25.1.txt:2014-06-04 15:17:17,952 INFO [main] wal.HLogPerformanceEvaluation: Summary: threads=25, iterations=100, syncInterval=0 took 1398.927s 17870.840ops/s patch200.1.txt:2014-06-04 15:47:36,297 INFO [main] wal.HLogPerformanceEvaluation: Summary: threads=200, iterations=100, syncInterval=0 took 1813.013s 110313.609ops/s {code} [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Assignee: Jeffrey Zhong Priority: Critical Attachments: HBase MVCC LogSeqId Combined.pdf, hbase-8736-poc.patch, hbase-8763-poc-v1.patch, hbase-8763-v1.patch, hbase-8763-v2.patch, hbase-8763-v3.patch, hbase-8763-v4.patch, hbase-8763-v5.1.patch, hbase-8763-v5.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14014466#comment-14014466 ] Hadoop QA commented on HBASE-8763: -- {color:green}+1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12647731/hbase-8763-v5.1.patch against trunk revision . ATTACHMENT ID: 12647731 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 15 new or modified tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/9654//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9654//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9654//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9654//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9654//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9654//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9654//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9654//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9654//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9654//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/9654//console This message is automatically generated. [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Assignee: Jeffrey Zhong Priority: Critical Attachments: HBase MVCC LogSeqId Combined.pdf, hbase-8736-poc.patch, hbase-8763-poc-v1.patch, hbase-8763-v1.patch, hbase-8763-v2.patch, hbase-8763-v3.patch, hbase-8763-v4.patch, hbase-8763-v5.1.patch, hbase-8763-v5.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010314#comment-14010314 ] stack commented on HBASE-8763: -- Patch looks great. On commit, remove this comment. Doesn't seem appropriate to the statement that follows: +// Record the mvcc for all transactions in progress. You need to do this? - + mvccNum.setValue(this.sequenceId.incrementAndGet()); I'm trying to keep it so all sequenceid increments happen inside HLog only. You do it in a few places. Does it have to increment? Could you just get current? Would that work? This should almost be a method because it happens more than a few times: + if(walKey == null){ +// Append a faked WALEdit in order for SKIP_WAL updates to get mvccNum assigned +walKey = this.appendNoSyncNoAppend(this.log, mvccNum); + } Rename SequenceNumberAssignor as SequenceNumber and method as getSequenceNumber? If no one else reviews in next few days, I'll give it another go (having trouble concentrating on this because I've looked at a few versions of this patch). Overall I think this is excellent. Maybe we should just commit and then tune up in new issues? Not sure we can get our speedup back looking at this patch at the mo. [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Assignee: Jeffrey Zhong Priority: Critical Attachments: HBase MVCC LogSeqId Combined.pdf, hbase-8736-poc.patch, hbase-8763-poc-v1.patch, hbase-8763-v1.patch, hbase-8763-v2.patch, hbase-8763-v3.patch, hbase-8763-v4.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010336#comment-14010336 ] Hadoop QA commented on HBASE-8763: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12646094/hbase-8763-v4.patch against trunk revision . ATTACHMENT ID: 12646094 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 12 new or modified tests. {color:red}-1 patch{color}. The patch command could not apply the patch. Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/9607//console This message is automatically generated. [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Assignee: Jeffrey Zhong Priority: Critical Attachments: HBase MVCC LogSeqId Combined.pdf, hbase-8736-poc.patch, hbase-8763-poc-v1.patch, hbase-8763-v1.patch, hbase-8763-v2.patch, hbase-8763-v3.patch, hbase-8763-v4.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010637#comment-14010637 ] stack commented on HBASE-8763: -- Then, does it have to be getSequenceId? There may be appends in flight inside in the disruptor and so you'll get a sequence id that may be superceded? [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Assignee: Jeffrey Zhong Priority: Critical Attachments: HBase MVCC LogSeqId Combined.pdf, hbase-8736-poc.patch, hbase-8763-poc-v1.patch, hbase-8763-v1.patch, hbase-8763-v2.patch, hbase-8763-v3.patch, hbase-8763-v4.patch, hbase-8763-v5.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010682#comment-14010682 ] Hadoop QA commented on HBASE-8763: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12647028/hbase-8763-v5.patch against trunk revision . ATTACHMENT ID: 12647028 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 12 new or modified tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:red}-1 release audit{color}. The applied patch generated 31 release audit warnings (more than the trunk's current 0 warnings). {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/9610//testReport/ Release audit warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9610//artifact/trunk/patchprocess/patchReleaseAuditProblems.txt Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9610//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9610//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9610//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9610//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9610//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9610//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9610//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9610//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9610//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/9610//console This message is automatically generated. [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Assignee: Jeffrey Zhong Priority: Critical Attachments: HBase MVCC LogSeqId Combined.pdf, hbase-8736-poc.patch, hbase-8763-poc-v1.patch, hbase-8763-v1.patch, hbase-8763-v2.patch, hbase-8763-v3.patch, hbase-8763-v4.patch, hbase-8763-v5.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14010789#comment-14010789 ] Jeffrey Zhong commented on HBASE-8763: -- {quote} There may be appends in flight inside in the disruptor and so you'll get a sequence id that may be superceded? {quote} Yes. The mvccNum.setValue(this.sequenceId.incrementAndGet()); here ensures the ordering of KVs when inserted into memstore for the same row key and their mvcc values will be replaced later with actual sequence id while the ordering is maintained. Because appendNoSync also happens while rowlock is held so their actual sequence Ids maintain the same ordering. [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Assignee: Jeffrey Zhong Priority: Critical Attachments: HBase MVCC LogSeqId Combined.pdf, hbase-8736-poc.patch, hbase-8763-poc-v1.patch, hbase-8763-v1.patch, hbase-8763-v2.patch, hbase-8763-v3.patch, hbase-8763-v4.patch, hbase-8763-v5.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14000131#comment-14000131 ] stack commented on HBASE-8763: -- [~jeffreyz] Excellent work. Thanks for the writeup. Helps. bq. After mvcc log sequence combin We can do a bunch of stuff... But I like your idea about clients being able to read at least to last flush. Sweet. Some comments on the patch: Should beginMemstoreInsert assign a write mvcc number at all? This won't work: +w = mvcc.beginMemstoreInsert(this.sequenceId.incrementAndGet()); Or at least I don't think you are getting what you think you are getting. See the note on sequenceid. I think you have to use the new getNextSequenceId. Then you do this + flushSeqId = getNextSequenceId(wal); at end of flush. Do we need the earlier mvcc? You do it thrice I see (There is 'lag' between adding of append to ring buffer and it being consumed and its edit/sequence id being updated... perhaps an edit is in the ring buffer, you update the sequence id thinking you are getting the last sequence id but subsequently, the ring buffer consumer runs...). Misspelling: waitForPreviousTransactoinsComplete I asked already and I think you explained but still not sure what is going on here: + mvcc.waitForPreviousTransactoinsComplete(w); + w = null; Call this appendNoSyncFakedWALEdit appendNoSyncNoAppend? It will make folks sit up and wonder why you would do such a thing and they'd read the javadoc. It is more explicit than appendNoSyncFakedWALEdit. I'd think the below is temporary? It is convenient updating once and all related get the update. Is that the only reason? Could we iterate the KVs that make up the edit and set their MVCC? The edits are not yet in the memstore, right? They get their mvcc before they are added to the MemStore? (Always?) - private long mvcc = 0; // this value is not part of a serialized KeyValue (not in HFiles) + private MutableLong mvcc; // this value is not part of a serialized KeyValue (not in HFiles) This is the clone that we need to get rid of, right (smile)? -newKv.setMvccVersion(kv.getMvccVersion()); +newKv.setMvccVersion(kv.getMvccVersionReference()); nit: don't need to set the memstoreRead... it is done in the declaration. -this.memstoreRead = this.memstoreWrite = 0; +memstoreRead = 0; Set top bits rather than add a big number?: +curSeqNum.setValue(originalVal + 10); Do we need to have MemStore know about HLogKeys? And that they have this odd 'waiting' thing that you can do on them? + long newSeqNum = walKey.waitForLogSeqNumAssigned(); + e.setWriteNumber(newSeqNum); Could we do the wait in the WAL system before we call completeMemstoreInsertWithSeqNum passing in the sequence id to use? Could the consumer on the ring buffer call the Memstore. completeMemstoreInsertWithSeqNum? Hmm... maybe we should make the writeQueue a disruptor too? Later. +1 on moving the latch from waledit to walkey. Nice. You fixing this test? TestMultiParallel Patch looks good. Can give a closer review later. This first pass should do for now. Great stuff Jeffrey [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Assignee: Jeffrey Zhong Priority: Critical Attachments: HBase MVCC LogSeqId Combined.pdf, hbase-8736-poc.patch, hbase-8763-poc-v1.patch, hbase-8763-v1.patch, hbase-8763-v2.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14000519#comment-14000519 ] stack commented on HBASE-8763: -- +1 on removing waitQueue in new JIRA bq. I'll pass HLog.NO_SEQUENCE_ID value then. Yes. I tried to make it so only the WAL system does the sequence id update so these this.sequenceId.incrementAndGet() make me go yuck (smile) +1 on MutableLong (I may ask you to make the same explanation again in the future so remember this bit of text so you can copy/paste it next time I ask). Ok on not setting top bits. Add comments on how large needs to be (theoretically). Helps others coming along after better understand. On MemStore knowing about WALKey, as in+ public void completeMemstoreInsertWithSeqNum(WriteEntry e, HLogKey walKey) throws IOException {, that is just butt ugly (pardon the technical term). Doing '+import org.apache.hadoop.hbase.regionserver.wal.HLogKey;' is so ugly, I'd suggest we have an Interface that has one method in it -- waitForLogSeqNumAssigned or getSequenceId or getLogSeqNumAssigned (and just hide the fact that we are waiting) -- and pass that instead. Good stuff Jay-Z [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Assignee: Jeffrey Zhong Priority: Critical Attachments: HBase MVCC LogSeqId Combined.pdf, hbase-8736-poc.patch, hbase-8763-poc-v1.patch, hbase-8763-v1.patch, hbase-8763-v2.patch, hbase-8763-v3.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14000515#comment-14000515 ] Hadoop QA commented on HBASE-8763: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12645323/hbase-8763-v3.patch against trunk revision . ATTACHMENT ID: 12645323 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 12 new or modified tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 6 warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:green}+1 core tests{color}. The patch passed unit tests in . Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/9526//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9526//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9526//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9526//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9526//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9526//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9526//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9526//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9526//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9526//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/9526//console This message is automatically generated. [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Assignee: Jeffrey Zhong Priority: Critical Attachments: HBase MVCC LogSeqId Combined.pdf, hbase-8736-poc.patch, hbase-8763-poc-v1.patch, hbase-8763-v1.patch, hbase-8763-v2.patch, hbase-8763-v3.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14000581#comment-14000581 ] Jeffrey Zhong commented on HBASE-8763: -- {quote} I'd suggest we have an Interface that has one method in it – waitForLogSeqNumAssigned or getSequenceId or getLogSeqNumAssigned (and just hide the fact that we are waiting) – and pass that instead. {quote} Ok, I'll do that in my next patch. The QA run succeeded except java doc warnings which will be cleared in my next patch. Thanks. [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Assignee: Jeffrey Zhong Priority: Critical Attachments: HBase MVCC LogSeqId Combined.pdf, hbase-8736-poc.patch, hbase-8763-poc-v1.patch, hbase-8763-v1.patch, hbase-8763-v2.patch, hbase-8763-v3.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14000455#comment-14000455 ] Jeffrey Zhong commented on HBASE-8763: -- Thanks [~saint@gmail.com] for the good comments! I'll make corresponding changes in my next patch. Please see my answers below: {quote} Hmm... maybe we should make the writeQueue a disruptor too? Later. {quote} That's a good idea. I could also remove the writeQueue even without disruptor. Basically return the previous highestUnsyncedSequence from disruptor consumer code where we stampRegionSequenceId in FSHLog#append() before current wal append, inside syncOrDefer we will wait for this previous highestUnsyncedSequence for skip_wal async_wal case. Let's do it later in a separate JIRA. {quote} Should beginMemstoreInsert assign a write mvcc number at all? {quote} No need to assign a write number and I could pass 0 here. The main purpose for the call is inserting a marker so that later code can wait all mvcc transactions before marker complete. this.sequenceId.incrementAndGet() is just a way to pass some good number without messing up the memstore read point later by mvcc.advanceMemstore(w) call. I'll pass HLog.NO_SEQUENCE_ID value then. {quote} Misspelling: waitForPreviousTransactoinsComplete {quote} Good catch. {quote} Call this appendNoSyncFakedWALEdit appendNoSyncNoAppend {quote} Ok. I'll make the change {quote} It is convenient updating once and all related get the update. Is that the only reason? {quote} It's not only for convenient but for correctness. Let's say we have two updates coming in the order of c1, c2 while c1 could have seqId=2 and c2 could have seqId=1. As you know wal syncer syncs all available pending appends and it's likely both wal entries for e1 and e2 are synced at same time. Therefore, the mvcc for c1 will advance memstore read point to 2. Since MutableLong is used here, so we know for sure c1's MVCC has been updated to 1 before c2 gets its seqid=2 otherwise we could end up with the situation that memstore read point has been set to 2 while c1's mvcc in menstore hasn't been updated yet depending on the caller thread scheduling.(Unless I pass MVCC into ring buffer where we keep the reference of all new KVs and update mvcc for them in disruptor consumer code) . {quote} Set top bits rather than add a big number?: + curSeqNum.setValue(originalVal + 10); {quote} How about the case if the first bit is already used for a large sequence number? In theory, I only need to bump the number to 2 * the number of rpc handlers because the writeQueue will be blocked if current mvcc isn't complete. The big number here is just to be safe. {quote} Do we need to have MemStore know about HLogKeys? Could we do the wait in the WAL system before we call completeMemstoreInsertWithSeqNum passing in the sequence id to use? Could the consumer on the ring buffer call the Memstore. completeMemstoreInsertWithSeqNum? {quote} Memstore doesn't know hlogkey but MVCC. I can do it outside of MVCC but that requires all places calling completeMemstoreInsertWithSeqNum need to remember to call the waitForLogSequence. It may leave hole in the future. [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Assignee: Jeffrey Zhong Priority: Critical Attachments: HBase MVCC LogSeqId Combined.pdf, hbase-8736-poc.patch, hbase-8763-poc-v1.patch, hbase-8763-v1.patch, hbase-8763-v2.patch, hbase-8763-v3.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997068#comment-13997068 ] Jeffrey Zhong commented on HBASE-8763: -- Thanks [~saint@gmail.com]! I'm migrating my patch on top of hbase-11135 now. [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Assignee: Jeffrey Zhong Priority: Critical Attachments: hbase-8736-poc.patch, hbase-8763-poc-v1.patch, hbase-8763-v1.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13996821#comment-13996821 ] stack commented on HBASE-8763: -- I committed hbase-11135 so hopefully this patch is cleaner. I opened HBASE-11160 for the case where we can hopefully let go of append having to wait on edit/sequence id updates (early-binding instead of late-binding). [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Assignee: Jeffrey Zhong Priority: Critical Attachments: hbase-8736-poc.patch, hbase-8763-poc-v1.patch, hbase-8763-v1.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13994785#comment-13994785 ] Jeffrey Zhong commented on HBASE-8763: -- {quote} Early- vs late-binding would change this patch? {quote} Yes, it makes the patch much easier for edits with durablity=SKIP_WAL ASYNC_WAL situation otherwise it would do similar things of HBASE-11135. {quote} I'm not sure I'm clear on what could be rolled back out of memstore around flush. Or, can there be more doc on how mvcc and sequence id are interacting here? {quote} During flush, we take a region write lock(prevent all writes coming into a region), append a marker in MVCC queue, get flush sequence Id, take a mem store snapshot and then release the region write lock. After lock release, we wait for the MVCC marker we appended while holding the write lock in order for all in-flight transactions before acquiring the region write lock to complete(either succeed or rollback). Since there is only one copy of MVCC which is a MutableLong object referenced by all related KVs, the rollback of KVs from mem store should have no issue. {quote} Is the best write up on how this is going to work going forward what is above in this issue? {quote} Sure. I'll modify this patch on top of HBASE-11135 and then a write up on the final implementation. [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Assignee: Jeffrey Zhong Priority: Critical Attachments: hbase-8736-poc.patch, hbase-8763-poc-v1.patch, hbase-8763-v1.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13992516#comment-13992516 ] stack commented on HBASE-8763: -- This changes with hbase-11109: {code} + flushSeqId = this.sequenceId.incrementAndGet(); +} else { + // use the provided sequence Id as WAL is not being used for this flush. + flushSeqId = myseqid; {code} ... but that is fine. Let 11109 worry about it. I'm not sure I'm clear on what could be rolled back out of memstore around flush. Or, can there be more doc on how mvcc and sequence id are interacting here? For those who come after us? This should be called sequenceId: +MutableLong seqNumber = new MutableLong(); A left shift would be better? +return beginMemstoreInsert(curSeqNum + 10); Did a quick skim. Early- vs late-binding would change this patch? Is the best write up on how this is going to work going forward what is above in this issue? Thanks [~jeffreyz] [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Assignee: Jeffrey Zhong Priority: Critical Attachments: hbase-8736-poc.patch, hbase-8763-poc-v1.patch, hbase-8763-v1.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13988533#comment-13988533 ] Hadoop QA commented on HBASE-8763: -- {color:red}-1 overall{color}. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12643167/hbase-8763.patch against trunk revision . ATTACHMENT ID: 12643167 {color:green}+1 @author{color}. The patch does not contain any @author tags. {color:green}+1 tests included{color}. The patch appears to include 6 new or modified tests. {color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings. {color:red}-1 javadoc{color}. The javadoc tool appears to have generated 4 warning messages. {color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 1.3.9) warnings. {color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings. {color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100 {color:green}+1 site{color}. The mvn site goal succeeds with this patch. {color:red}-1 core tests{color}. The patch failed these unit tests: org.apache.hadoop.hbase.io.TestHeapSize Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/9452//testReport/ Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9452//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-protocol.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9452//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-thrift.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9452//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-client.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9452//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9452//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-examples.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9452//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9452//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9452//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/9452//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/9452//console This message is automatically generated. [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Assignee: Jeffrey Zhong Priority: Critical Attachments: hbase-8736-poc.patch, hbase-8763-poc-v1.patch, hbase-8763.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13987070#comment-13987070 ] stack commented on HBASE-8763: -- bq. ...and I reset the value of the MutaleLong in one place so all new KVs in MemStore will see the updated sequence number... Nice. bq. If our WAL Sync doesn't late binding then I don't need to use MutableLong. IIUC, if we want the region sequence id (and hence the mvcc) to reflect the order in which edits appear in the WAL, we must do late binding. See my natterings over in HBASE-11099. bq. ...and reset their MVCC values in an extra loop Tell me more. Where the KVs be 'kept'? In the WALEdit? The extra loop would be after a sync or on append? bq. ...our pre post co-processor copies MVCC values... Ok. Ugly. Would be cool getting this in w/o breaking CPs in 1.0 but we should break the API after 1.0 if forcing inefficiency. [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Assignee: Jeffrey Zhong Priority: Critical Attachments: hbase-8736-poc.patch, hbase-8763-poc-v1.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13986354#comment-13986354 ] stack commented on HBASE-8763: -- (Pardon me; am excited about this one so went back to do some more study...) So yeah, why do a MutableLong in KV rather than just a volatile? Probably same in the end... I see you want to get a reference to a KV. You trying to tie a KV to something else in case the mvcc gets updated? Maybe this is it: -newKv.setMvccVersion(kv.getMvccVersion()); +newKv.setMvccVersion(kv.getMvccVersionReference()); We are cloning. The original gets updated later? You want to make sure clone is updated too? Would be cool if we could get rid of this clone one day. [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Assignee: Jeffrey Zhong Priority: Critical Attachments: hbase-8736-poc.patch, hbase-8763-poc-v1.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13986360#comment-13986360 ] Jeffrey Zhong commented on HBASE-8763: -- The reason using MutableLong object is that at the very beginning we don't know the real sync sequence number(due to the late binding) so I use MutableLong object which keeps a faked big sequence number. All new KVs and related of this transation reference this mvcc mutablelong object. Once after the corresponding WALEdit is synced(after SyncOrDefer call), we have the real sequence number and I reset the value of the MutaleLong in one place so all new KVs in MemStore will see the updated sequence number(because they keep the reference to this MVCC(MutableLong) instance. If our WAL Sync doesn't late binding then I don't need to use MutableLong. [~enis] is suggesting not to use MutableLong while keeping all new KVs and reset their MVCC values in an extra loop. This may be hard to implement because our pre post co-processor copies MVCC values as in the code you pasted above(where I changed to copy reference) {code} newKv.setMvccVersion(kv.getMvccVersionReference()); {code} My plan is to get all tests pass and then do enhancement/refactoring that you and [~enis] are suggesting. [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Assignee: Jeffrey Zhong Priority: Critical Attachments: hbase-8736-poc.patch, hbase-8763-poc-v1.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985080#comment-13985080 ] Enis Soztutar commented on HBASE-8763: -- Great to see progress Jeffrey. As talked offline, I think we do not want to allocate 2 objects on heap rather than one: {code} - private long mvcc = 0; // this value is not part of a serialized KeyValue (not in HFiles) + private MutableLong mvcc; // this value is not part of a serialized KeyValue (not in HFiles) {code} Although, changing the mvcc long after object construction may require it to be declared volatile. I think you are using the technique of double incrementing the seqId, once at trx start and once at end right? Did you try your append 1B to seqId for memstore approach? I think I also changed Cell.getMvcc() to Cell.getSeqId(). Maybe worth doing once we get the patch fully running and performant. [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Assignee: Jeffrey Zhong Priority: Critical Attachments: hbase-8736-poc.patch, hbase-8763-poc-v1.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13985127#comment-13985127 ] stack commented on HBASE-8763: -- You the man [~jeffreyz] Spelling here waitForPreviousTransactoinsComplete What is happening here? syncOrDefer(txid, durability); +// Get LogSequenceNumber from WAL Sync +if(walEdit.getLogKey() != null) { + seqNumber.setValue(walEdit.getLogKey().getLogSeqNum()); +} The seqNumber of the last wallEdit added to the WAL and sync'd? The seq number could be well beyond this in actually? Does that matter? Should we let you have access to last sync'd id out of WAL? You import but don't use? +import org.apache.commons.lang.mutable.MutableInt; +import org.apache.commons.lang.mutable.MutableLong; ... maybe a few times. Why you use the MutableLong instead of say a long or a volatile? Will do a closer review later. So far looks great. [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Assignee: Jeffrey Zhong Priority: Critical Attachments: hbase-8736-poc.patch, hbase-8763-poc-v1.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857630#comment-13857630 ] Liyin Tang commented on HBASE-8763: --- I totally vote for combining the MVCC and SeqID. Furthermore, it will be even more straightforward if the SeqID does not shared across all the Regions. Ideally, each region shall have its own monotonously increasing seq id. [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Priority: Critical Attachments: hbase-8736-poc.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857661#comment-13857661 ] Sergey Shelukhin commented on HBASE-8763: - I thought the latter was already done? [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Priority: Critical Attachments: hbase-8736-poc.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857665#comment-13857665 ] Ted Yu commented on HBASE-8763: --- Right. See HBASE-8741 [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Priority: Critical Attachments: hbase-8736-poc.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857680#comment-13857680 ] Liyin Tang commented on HBASE-8763: --- Thanks for jira ! If SeqID has already been per-region basis, and we want to combine the MVCC, then how do we want to handle the group commit across multiple regions ? [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Priority: Critical Attachments: hbase-8736-poc.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857813#comment-13857813 ] Jeffrey Zhong commented on HBASE-8763: -- I'm working with [~enis] on this though in a slow pace because the feature can most likely be released in 0.99 or later. There are work left to be done like upgrade handling performance evaluation. [~liyin] Not much special handing on log group commit which you can check on hbase-8741. Basically log sequence number won't monotonically increase in a WAL while it is for a single region. MVCC will become a region specific value not global in RS level so we need to maintain a map region - read point for scanning. [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Priority: Critical Attachments: hbase-8736-poc.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13857820#comment-13857820 ] Liyin Tang commented on HBASE-8763: --- [~jeffreyz], I see. Thanks for the clarification and it makes sense to me now ! [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Priority: Critical Attachments: hbase-8736-poc.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13856987#comment-13856987 ] Sergey Shelukhin commented on HBASE-8763: - Any update on this? Resolving this confusion would help greatly. [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Attachments: hbase-8736-poc.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839529#comment-13839529 ] stack commented on HBASE-8763: -- bq. 1) Memstore insert using long.max as the initial write number What will we do if two edits arrive with same coordinates? How will we distingush them if both have long.max during the time it takes to sync and converte long.max to a legit seqid? bq. Currently, we maintain an internal queue which might defer the read point bump up if transactions complete order is different than that of MVCC internal write queue. A reason to unify MVCC and WAL seqid (smile). bq. By doing above, it's possible to remove the logics maintaining writeQueue ... We need the writeQueue for performance reasons, right? We need to add edits in bulk under a lock and this lock is expensive to obtain (maybe I am missing something?) bq. ...so it means we can remove two locking and one queue loop in write code path. What are the two locks J? Otherwise, sounds great. Will look at patches... [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Attachments: hbase-8736-poc.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839728#comment-13839728 ] Jeffrey Zhong commented on HBASE-8763: -- {quote} What will we do if two edits arrive with same coordinates? How will we distingush them if both have long.max during the time it takes to sync and converte long.max to a legit seqid? {quote} Basically the MVCC write number only needs to make sure scanner can't see them before a write is done. Therefore we can assign them to Long.MAX. It means all in-progress writes belongs to one bucket and scanner can't see them. Once a write is done, we assign them the logSeqNumber in WAL appending order and then bump up the min read point so that all writes before current log sequence number are visible to scanners. In this case, client can see changes in the order we commit the writes. There are two orders in today's code because we assign the write number before a write starts: receiving order and commit order. For example, Put1 has write number 1 and Put2 has write number 2 while Put2 can finish earlier than Put1 but Put2 still need wait for Put1 to finish. This cause issues for replication and recovery because both replies on the order(commit order) in the WAL file. {quote} What are the two locks J? {quote} In file MultiVersionConsistencyControl, the locks guard the access to writeQueue. Since we don't need keep the receiving order(which have to today because large write number could complete earlier than smaller write number), we can remove the related code as you can see my proof-of-concept patch beginMemstoreInsertUseSeqNum advanceMemstoreUseSeqNum. I still keep a collection inProgressWrites because our Increment, Append etc needs all in-progress done but this part can be optimized by just keeping a hashmap for rows which row lock are released but not wal synced yet. Thanks. {code} public WriteEntry beginMemstoreInsert() { synchronized (writeQueue) { long nextWriteNumber = ++memstoreWrite; WriteEntry e = new WriteEntry(nextWriteNumber); writeQueue.add(e); return e; } } boolean advanceMemstore(WriteEntry e) { synchronized (writeQueue) { ... while (!writeQueue.isEmpty()) { ... } } } {code} [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Attachments: hbase-8736-poc.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13839816#comment-13839816 ] stack commented on HBASE-8763: -- bq. It means all in-progress writes belongs to one bucket and scanner can't see them. These will not be in the memstore, right? They will be held aside? We only add to memstore after they get their seqid? Else, the second edit overwrites the first? bq. ...Put1 has write number 1 and Put2 has write number 2 while Put2 can finish earlier than Put1 but Put2 still need wait for Put1 to finish. This cause issues for replication and recovery because both replies on the order(commit order) in the WAL file. Yes. I like the way you call this out. Please talk this fact up going forward. Sorry, let me go look at your code. That should help me follow along. [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Attachments: hbase-8736-poc.patch, hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837109#comment-13837109 ] Jeffrey Zhong commented on HBASE-8763: -- Today I had some discussion with [~enis] and [~te...@apache.org] on this topic and found it might be possible to handle the JIRA issue in a simpler way. Below are the steps: 1) Memstore insert using long.max as the initial write number 2) append no sync 3) sync 4) update WriteEntry's write number to the sequence number returned from Step 2 5) CompleteMemstoreInsert. In this step, make current read point to be = the sequence number from Step 2. The reasoning behind this is that once we sync till the sequence number, all changes with small sequence numbers are already synced into WAL. Therefore, we should be able to bump up read number to the last sequence number synced. Currently, we maintain an internal queue which might defer the read point bump up if transactions complete order is different than that of MVCC internal write queue. By doing above, it's possible to remove the logics maintaining writeQueue so it means we can remove two locking and one queue loop in write code path. Sounds too good to be true :-). Let me try to write a quick patch and run it against unit tests to see if the idea could fly. [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Attachments: hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837226#comment-13837226 ] Enis Soztutar commented on HBASE-8763: -- This may actually work with a very small amount of changes. It would be great if you can hack up a working patch and run the unit tests. In my case, I've seen the current set of UT's to cover most of the concurrency issues I ran into while developing this. [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Attachments: hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13837300#comment-13837300 ] Jeffrey Zhong commented on HBASE-8763: -- I tried a small patch. Since we support SKIP_WAL model, the MVCC.writeQueue is still needed to main the write order because there is no wal sync operation at all. Also there are quite a few test cases doesn't do appendNosync between mvcc.beginMemstoreInsert and mvcc.completeMemstoreInsert so they are needed to be adjusted. So far I didn't find block issues but still need to verify it thoroughly. [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Attachments: hbase-8763_wip1.patch HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13780489#comment-13780489 ] Himanshu Vashishtha commented on HBASE-8763: Hey Enis Jeffrey, could you guys please share the approach for this issue which is currently under testing. Thanks. [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Fix For: 0.98.0 HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13780598#comment-13780598 ] Enis Soztutar commented on HBASE-8763: -- Definitely. I have a patch which works on 90% of the unit tests :) Was going to create a RB, but never got to make it stable enough, perf test, handle old mvcc numbers in files, etc. The approach is that we get keep MVCC, but mvcc write number will be the log sequence number. So instead of: 1. get mvcc number 2. memstore insert 3. append no sync 4. sync we are doing: 1. append no sync 2. memstore insert 3. sync You can check the TrxManager, which keeps track of on the fly transactions. MVCC is an impl of TrxMngr. HLog will order the transactions by the update lock, and create and add a trx to the Trx manager. TrxManager still ensures that the transactions are committed (made visible) in serial order. row locking order is also not changed, meaning, read-write transcations still have to wait for on-flight transactions to finish, after they acquire the row locks. [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Fix For: 0.98.0 HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702436#comment-13702436 ] Sergey Shelukhin commented on HBASE-8763: - I have another idea, related to this, that intends to solve problems with clock skew also. If user doesn't supply ts, we can also combine ts with seqId and mvcc :) That way conflicts due to clock skew won't happen assuming seqId doesn't go back; which it doesn't under current assumptions such as compactions and log recovery. If user does, we can store user ts and seqId or just user ts (but then user has to avoid conflicts, same as now). This makes mixed scenarios more difficult though, and some people may rely in default ts actually storing time. [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Fix For: 0.98.0 HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13702546#comment-13702546 ] stack commented on HBASE-8763: -- Can we just do the mvcc and seqid unification? It would be a nice simplification. Only hitch I see is ensuring that we write the WAL in sequenceid order (might require a little sort before we add to the WAL done under lock). On merging in ts, that looks like we'd be asking for trouble (smile). That'd be radical. [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Fix For: 0.98.0 HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13687345#comment-13687345 ] Enis Soztutar commented on HBASE-8763: -- I also realized that, if I am not wrong, current multi row atomic mutations is broken with scanners + RS crashes. Since mvcc is not persisted, if a multi put changing r1,r100 happens where mvcc = 100, the scanner with mvcc = 90 will not see r1. Just after passing r1, the scanner might fail, and the new scanner in the new region server will get another mvcc, but since the changes for previous multi put has been persisted (in log recovery), the scanner will happily see r100 mutation. The underlying reason multi puts + scanner for a region has to see a snapshot of the region, but mvcc is ephemeral. This can also be fixed by saving the seqId's in hfiles, and when a region scanner is opened, the client obtains the scanner seqId (mvcc read point) and uses this number in case of failover. [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Fix For: 0.98.0 HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8763) [BRAINSTORM] Combine MVCC and SeqId
[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13687444#comment-13687444 ] Sergey Shelukhin commented on HBASE-8763: - +1 on idea after we have discussed it [BRAINSTORM] Combine MVCC and SeqId --- Key: HBASE-8763 URL: https://issues.apache.org/jira/browse/HBASE-8763 Project: HBase Issue Type: Improvement Components: regionserver Reporter: Enis Soztutar Fix For: 0.98.0 HBASE-8701 and a lot of recent issues include good discussions about mvcc + seqId semantics. It seems that having mvcc and the seqId complicates the comparator semantics a lot in regards to flush + WAL replay + compactions + delete markers and out of order puts. Thinking more about it I don't think we need a MVCC write number which is different than the seqId. We can keep the MVCC semantics, read point and smallest read points intact, but combine mvcc write number and seqId. This will allow cleaner semantics + implementation + smaller data files. We can do some brainstorming for 0.98. We still have to verify that this would be semantically correct, it should be so by my current understanding. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira