Anoop Sam John created HBASE-11401:
--------------------------------------

             Summary: Issue with seqNo binding for KV mvcc
                 Key: HBASE-11401
                 URL: https://issues.apache.org/jira/browse/HBASE-11401
             Project: HBase
          Issue Type: Bug
    Affects Versions: 0.99.0
            Reporter: Anoop Sam John
            Priority: Critical
             Fix For: 0.99.0


After HBASE-8763, the KV mvcc and the HLog seqNo are combined. The current 
implementation of this is tricky.
In HRegion, on the write path, we first write to the memstore, then write to 
the HLog, and finally sync the log. So at the time of the memstore write we 
don't know the WAL seqNo. To overcome this, we hold references to the KV 
objects just added to the memstore and pass them to the WAL write call as well. 
Once the seqNo is obtained, we reset the mvcc in those KVs to this seqNo. 
(While writing to the memstore we write the KVs with a very high temporary 
mvcc value so that concurrent readers won't see them.)
This model works well with the DefaultMemstore. During the write there won't 
be any concurrent call to snapshot(). 
But now the memstore is a pluggable interface. The late-binding model above 
assumes that the memstore's internal data structure continues to refer to the 
same Java objects. This might not always be true. For example, in HBASE-10713 
the KVs can be converted into a CellBlock in between. If the memstore stops 
referring to the same KV Java objects, we will fail to get the seqNo assigned 
as the KV mvcc.
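
The failure mode can be illustrated with a minimal, self-contained sketch. It 
is not HBase code: Kv, TEMP_MVCC and storedMvccAfterLateBinding are hypothetical 
names, and a plain list stands in for the memstore. It only shows that updating 
the mvcc through a held reference reaches the stored cell when the store keeps 
the same Java object, and misses it when the store has made its own copy.

```java
import java.util.ArrayList;
import java.util.List;

class LateBindingSketch {
    /** Hypothetical stand-in for a KeyValue; only the mvcc field matters here. */
    static final class Kv {
        final String row;
        long mvcc;

        Kv(String row, long mvcc) {
            this.row = row;
            this.mvcc = mvcc;
        }
    }

    /** Very high temporary mvcc so that concurrent readers skip the cell. */
    static final long TEMP_MVCC = Long.MAX_VALUE;

    /**
     * Writes one cell with the temporary mvcc, then late-binds walSeqNo through
     * the reference the writer kept. Returns the mvcc the "memstore" ends up with.
     */
    static long storedMvccAfterLateBinding(boolean memstoreCopies, long walSeqNo) {
        List<Kv> memstore = new ArrayList<>();
        Kv written = new Kv("row1", TEMP_MVCC);

        // A pluggable memstore may keep the object itself or its own copy of it.
        memstore.add(memstoreCopies ? new Kv(written.row, written.mvcc) : written);

        // Late binding: the seqNo is only known after the WAL append.
        written.mvcc = walSeqNo;

        return memstore.get(0).mvcc;
    }

    public static void main(String[] args) {
        long walSeqNo = 42L; // assumed value handed back by the WAL append

        // Same object kept: the stored cell sees the real seqNo.
        System.out.println(storedMvccAfterLateBinding(false, walSeqNo) == walSeqNo);

        // Copy kept: the stored cell is stuck at the temporary value.
        System.out.println(storedMvccAfterLateBinding(true, walSeqNo) == TEMP_MVCC);
    }
}
```

Both lines print true. In the second case the cell would stay invisible to 
readers, because its mvcc never drops below the temporary value.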

If we wrote and synced to the WAL and only then wrote to the memstore, this 
would be solved. But we changed that model (in 0.94, I believe) for better 
performance. Under the HRegion-level lock, we write to the memstore and then to 
the WAL. Finally, outside the lock, we do the log sync. So we cannot change it 
now.

I tried changing the order of operations within the lock (i.e. write to the log 
and then to the memstore) so that we have the seqNo when writing to the 
memstore. But because of the new HLog write model, we are not guaranteed that 
the write is done immediately. 

One possible way is to add a new API at the log level that only returns the 
next seqNo. Call this first, then write to the memstore and then to the WAL, 
both using this seqNo. Just a random thought. Not tried.
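
A rough sketch of that idea, under stated assumptions: SketchLog, 
reserveNextSeqNo, SketchMemstore and write are hypothetical names for 
illustration only and are not part of the HLog API. The sketch is 
single-threaded and does not address log sync, mvcc read points, or the order 
of appends under concurrent writers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

class ReserveSeqNoSketch {
    /** Hypothetical log exposing a "next seqNo only" call. */
    static final class SketchLog {
        private final AtomicLong sequence = new AtomicLong(0);
        final List<String> entries = new ArrayList<>();

        /** Hands out the next seqNo without appending anything. */
        long reserveNextSeqNo() {
            return sequence.incrementAndGet();
        }

        /** Appends an edit under a seqNo that was reserved earlier. */
        void append(long seqNo, String edit) {
            entries.add(seqNo + ":" + edit);
        }
    }

    /** Hypothetical memstore; it is free to copy cells, so only values are kept. */
    static final class SketchMemstore {
        final List<Long> mvccs = new ArrayList<>();

        void add(String row, long mvcc) {
            mvccs.add(mvcc);
        }
    }

    /** Write path in the proposed order: reserve seqNo, memstore, then log. */
    static long write(SketchLog log, SketchMemstore memstore, String row) {
        long seqNo = log.reserveNextSeqNo(); // known before the memstore write
        memstore.add(row, seqNo);            // final mvcc, no late binding
        log.append(seqNo, row);              // same seqNo goes to the log
        return seqNo;
    }

    public static void main(String[] args) {
        SketchLog log = new SketchLog();
        SketchMemstore memstore = new SketchMemstore();
        write(log, memstore, "row1");
        write(log, memstore, "row2");
        System.out.println(memstore.mvccs); // [1, 2]
        System.out.println(log.entries);    // [1:row1, 2:row2]
    }
}
```

Because the memstore receives the final value up front, it no longer matters 
whether it keeps the original KV objects or converts them into another form.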



--
This message was sent by Atlassian JIRA
(v6.2#6252)
