[ 
https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13839728#comment-13839728
 ] 

Jeffrey Zhong commented on HBASE-8763:
--------------------------------------

{quote}
What will we do if two edits arrive with the same coordinates? How will we 
distinguish them if both have long.max during the time it takes to sync and 
convert long.max to a legit seqid?
{quote}
Basically, the MVCC write number only needs to make sure that scanners can't 
see a write before it is done. Therefore we can assign Long.MAX_VALUE to every 
in-progress write. This means all in-progress writes belong to one bucket that 
scanners can't see. Once a write is done, we assign it the logSeqNumber in WAL 
append order and then bump up the min read point so that all writes up to the 
current log sequence number are visible to scanners. In this case, clients see 
changes in the order in which we commit the writes.
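
A minimal sketch of the idea above, not the proof-of-concept patch itself. All names here (SeqIdMvccSketch, Cell, begin, complete, scan) are hypothetical, and the WAL is reduced to a counter, so assigning the sequence number and finishing the sync collapse into one step:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration only; none of these names exist in HBase.
class SeqIdMvccSketch {
    static final long IN_PROGRESS = Long.MAX_VALUE;

    static final class Cell {
        final String value;
        long writeNumber = IN_PROGRESS; // above any read point, so invisible
        Cell(String value) { this.value = value; }
    }

    private final List<Cell> memstore = new ArrayList<>();
    private long lastSeqId = 0; // stands in for the WAL log sequence number
    private long readPoint = 0;

    // Start a write: the cell goes into the memstore but stays hidden.
    synchronized Cell begin(String value) {
        Cell c = new Cell(value);
        memstore.add(c);
        return c;
    }

    // Finish a write: take the next sequence number in commit order and
    // advance the read point to it. Nothing has to wait for older writes.
    synchronized long complete(Cell c) {
        long seqId = ++lastSeqId;
        c.writeNumber = seqId;
        readPoint = seqId;
        return seqId;
    }

    // Scanners see only cells at or below the read point, in commit order.
    synchronized List<String> scan() {
        List<Cell> visible = new ArrayList<>();
        for (Cell c : memstore) {
            if (c.writeNumber <= readPoint) {
                visible.add(c);
            }
        }
        visible.sort(Comparator.comparingLong(c -> c.writeNumber));
        List<String> values = new ArrayList<>();
        for (Cell c : visible) {
            values.add(c.value);
        }
        return values;
    }

    public static void main(String[] args) {
        SeqIdMvccSketch mvcc = new SeqIdMvccSketch();
        Cell put1 = mvcc.begin("put1");
        Cell put2 = mvcc.begin("put2");
        System.out.println(mvcc.scan()); // []
        mvcc.complete(put2);             // put2 commits first
        System.out.println(mvcc.scan()); // [put2]
        mvcc.complete(put1);
        System.out.println(mvcc.scan()); // [put2, put1]
    }
}
```

In this sketch two in-progress edits with the same coordinates are both hidden, so they only need to be told apart once each has received its own sequence number.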

There are two orders in today's code because we assign the write number before 
a write starts: receiving order and commit order. For example, Put1 has write 
number 1 and Put2 has write number 2. Put2 can finish earlier than Put1, but 
Put2 still needs to wait for Put1 to finish. This causes issues for replication 
and recovery because both rely on the order (commit order) in the WAL file.
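
The Put1/Put2 case can be illustrated with a small sketch. It is an assumption about how a queue-based scheme behaves, not the actual MultiVersionConsistencyControl code; the class and method names (ReceiveOrderMvccSketch, begin, complete, readPoint) are hypothetical:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical illustration only; this is not HBase code.
class ReceiveOrderMvccSketch {
    static final class WriteEntry {
        final long writeNumber;
        boolean completed;
        WriteEntry(long writeNumber) { this.writeNumber = writeNumber; }
    }

    private final Deque<WriteEntry> writeQueue = new ArrayDeque<>();
    private long memstoreWrite = 0;
    private long readPoint = 0;

    // Write numbers are handed out in receiving order, before the write runs.
    synchronized WriteEntry begin() {
        WriteEntry e = new WriteEntry(++memstoreWrite);
        writeQueue.add(e);
        return e;
    }

    // Mark e as done, then advance the read point over the completed prefix
    // of the queue. Returns true only if e itself is now visible.
    synchronized boolean complete(WriteEntry e) {
        e.completed = true;
        while (!writeQueue.isEmpty() && writeQueue.peekFirst().completed) {
            readPoint = writeQueue.removeFirst().writeNumber;
        }
        return readPoint >= e.writeNumber;
    }

    synchronized long readPoint() {
        return readPoint;
    }

    public static void main(String[] args) {
        ReceiveOrderMvccSketch mvcc = new ReceiveOrderMvccSketch();
        WriteEntry put1 = mvcc.begin(); // write number 1
        WriteEntry put2 = mvcc.begin(); // write number 2
        System.out.println(mvcc.complete(put2)); // false: Put1 is still pending
        System.out.println(mvcc.complete(put1)); // true: read point jumps to 2
    }
}
```

Here Put2 commits first but becomes visible only after Put1, so the visibility order (receiving order) differs from the commit order.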

{quote}
What are the two locks J?
{quote}

In MultiVersionConsistencyControl, the locks guard access to writeQueue. 
Since we no longer need to keep the receiving order (which we have to today 
because a larger write number could complete earlier than a smaller one), we 
can remove the related code, as you can see in my proof-of-concept patch 
(beginMemstoreInsertUseSeqNum & advanceMemstoreUseSeqNum). I still keep a 
collection, inProgressWrites, because our Increment, Append, etc. need all 
in-progress writes to be done, but this part can be optimized by keeping just 
a hashmap of rows whose row locks are released but not yet WAL synced.

Thanks.

{code}
  public WriteEntry beginMemstoreInsert() {
    synchronized (writeQueue) {
      long nextWriteNumber = ++memstoreWrite;
      WriteEntry e = new WriteEntry(nextWriteNumber);
      writeQueue.add(e);
      return e;
    }
  }

  boolean advanceMemstore(WriteEntry e) {
    synchronized (writeQueue) {
      ...
      while (!writeQueue.isEmpty()) {
        ...
      }
    }
  }
{code}



> [BRAINSTORM] Combine MVCC and SeqId
> -----------------------------------
>
>                 Key: HBASE-8763
>                 URL: https://issues.apache.org/jira/browse/HBASE-8763
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Enis Soztutar
>         Attachments: hbase-8736-poc.patch, hbase-8763_wip1.patch
>
>
> HBASE-8701 and a lot of recent issues include good discussions about mvcc + 
> seqId semantics. It seems that having mvcc and the seqId complicates the 
> comparator semantics a lot with regard to flush + WAL replay + compactions + 
> delete markers and out of order puts. 
> Thinking more about it, I don't think we need a MVCC write number which is 
> different than the seqId. We can keep the MVCC semantics, read point and 
> smallest read points intact, but combine mvcc write number and seqId. This 
> will allow cleaner semantics + implementation + smaller data files. 
> We can do some brainstorming for 0.98. We still have to verify that this 
> would be semantically correct; it should be so by my current understanding.



--
This message was sent by Atlassian JIRA
(v6.1#6144)
