[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13839728#comment-13839728 ]
Jeffrey Zhong commented on HBASE-8763:
--------------------------------------

{quote}
What will we do if two edits arrive with same coordinates? How will we distinguish them if both have long.max during the time it takes to sync and convert long.max to a legit seqid?
{quote}

Basically, the MVCC write number only needs to make sure that scanners can't see a write before it is done. Therefore we can assign Long.MAX to every in-progress write. This means all in-progress writes belong to one bucket that scanners can't see. Once a write is done, we assign it the logSeqNumber in WAL appending order and then bump up the min read point so that all writes before the current log sequence number are visible to scanners. In this case, clients see changes in the order in which we commit the writes.

There are two orders in today's code because we assign the write number before a write starts: receiving order and commit order. For example, Put1 has write number 1 and Put2 has write number 2. Put2 can finish earlier than Put1, but Put2 still needs to wait for Put1 to finish. This causes issues for replication and recovery because both rely on the order (commit order) in the WAL file.

{quote}
What are the two locks J?
{quote}

In file MultiVersionConsistencyControl, the locks guard access to writeQueue. Since we no longer need to keep the receiving order (which we have to today, because a larger write number could complete earlier than a smaller one), we can remove the related code, as you can see in beginMemstoreInsertUseSeqNum & advanceMemstoreUseSeqNum in my proof-of-concept patch. I still keep a collection, inProgressWrites, because our Increment, Append, etc. need all in-progress writes to be done, but this part can be optimized by just keeping a hashmap of rows whose row locks are released but which are not yet WAL-synced.

Thanks.
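
As a rough illustration of the scheme described above, here is a minimal, self-contained sketch. It is not the proof-of-concept patch and does not use HBase classes: `SeqIdMvccSketch`, `Cell`, `beginWrite`, `completeWrite` and `isVisible` are hypothetical names, and the sketch assumes that completing a write (WAL append plus sequence number assignment) is serialized.

```java
// Hypothetical sketch, not HBase code: in-progress writes carry Long.MAX_VALUE
// and receive their real sequence id only when they complete.
class SeqIdMvccSketch {
    /** Placeholder sequence id for a write that has not completed yet. */
    static final long IN_PROGRESS = Long.MAX_VALUE;

    /** Stand-in for an edit in the memstore (hypothetical type). */
    static final class Cell {
        final String value;
        volatile long seqId = IN_PROGRESS;

        Cell(String value) {
            this.value = value;
        }
    }

    private long nextSeqId = 0;          // stands in for the WAL log sequence number
    private volatile long readPoint = 0; // highest sequence id visible to scanners

    /** Start a write: no number is handed out yet, so no receiving order exists. */
    Cell beginWrite(String value) {
        return new Cell(value);
    }

    /**
     * Complete a write: assign the next sequence id in commit order, then
     * advance the read point so the write becomes visible.
     * Assumption: this call is serialized, like a WAL append.
     */
    synchronized long completeWrite(Cell cell) {
        long seq = ++nextSeqId;
        cell.seqId = seq;
        readPoint = seq;
        return seq;
    }

    /** A scanner sees a cell only if its sequence id is at or below the read point. */
    boolean isVisible(Cell cell) {
        return cell.seqId <= readPoint;
    }
}
```

In this sketch, if Put2 completes before Put1, Put2 simply gets the smaller sequence id and becomes visible right away, so there is only one order (commit order) and no queue of pending write numbers to wait on.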
{code}
// Current code in MultiVersionConsistencyControl: both methods lock writeQueue.
public WriteEntry beginMemstoreInsert() {
  synchronized (writeQueue) {
    long nextWriteNumber = ++memstoreWrite;
    WriteEntry e = new WriteEntry(nextWriteNumber);
    writeQueue.add(e);
    return e;
  }
}

boolean advanceMemstore(WriteEntry e) {
  synchronized (writeQueue) {
    ...
    while (!writeQueue.isEmpty()) {
      ...
    }
  }
}
{code}

> [BRAINSTORM] Combine MVCC and SeqId
> -----------------------------------
>
>                 Key: HBASE-8763
>                 URL: https://issues.apache.org/jira/browse/HBASE-8763
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Enis Soztutar
>         Attachments: hbase-8736-poc.patch, hbase-8763_wip1.patch
>
>
> HBASE-8701 and a lot of recent issues include good discussions about mvcc +
> seqId semantics. It seems that having mvcc and the seqId complicates the
> comparator semantics a lot in regards to flush + WAL replay + compactions +
> delete markers and out of order puts.
> Thinking more about it I don't think we need a MVCC write number which is
> different than the seqId. We can keep the MVCC semantics, read point and
> smallest read points intact, but combine mvcc write number and seqId. This
> will allow cleaner semantics + implementation + smaller data files.
> We can do some brainstorming for 0.98. We still have to verify that this
> would be semantically correct, it should be so by my current understanding.

--
This message was sent by Atlassian JIRA
(v6.1#6144)