[ https://issues.apache.org/jira/browse/HBASE-11401?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14045336#comment-14045336 ]
Jeffrey Zhong commented on HBASE-11401:
---------------------------------------

I think the sync you mention is only there to get the sequence number; it is not the WAL syncOrDefer, which may take a while. Thanks.

> Issue with seqNo binding for KV mvcc
> ------------------------------------
>
>                 Key: HBASE-11401
>                 URL: https://issues.apache.org/jira/browse/HBASE-11401
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.99.0
>            Reporter: Anoop Sam John
>            Priority: Critical
>             Fix For: 0.99.0
>
>         Attachments: memstore.txt
>
>
> After HBASE-8763, we have combined KV mvcc and HLog seqNo. This is
> implemented in a tricky way now.
> In HRegion, on the write path, we first write to the memstore, then write
> to the HLog, and finally sync the log. So at the time of the write to the
> memstore we don't know the WAL seqNo. To overcome this, we hold references
> to the KV objects just added to the memstore and pass those along to the
> write-to-WAL call. Once the seqNo is obtained, we reset the mvcc in those
> KVs to this seqNo. (While writing to the memstore we wrote the KVs with a
> very high temporary value for mvcc so that concurrent readers won't see
> them.)
> This model works well with the DefaultMemstore. During the write there
> won't be any concurrent call to snapshot().
> But now we have the memstore as a pluggable interface. The above model of
> late binding assumes that the memstore's internal data structure continues
> to refer to the same Java objects. This might not always be true. As in
> HBASE-10713, the KVs can be converted into a CellBlock in between. If the
> memstore no longer refers to the same KV Java objects, we will fail to get
> the seqNo assigned as the KV mvcc.
> If we were doing the write and sync to the WAL and then the write to the
> memstore, this would have been solved. But we changed this model (in 94, I
> believe) for better performance. Under the HRegion-level lock, we write to
> the memstore and then to the WAL. Finally, outside the lock, we do the log
> sync. So we cannot change it now.
> I tried changing the order of operations within the lock (i.e. write to
> the log and then to the memstore) so that we can get the seqNo when
> writing to the memstore. But because of the new HLog write model, we are
> not guaranteed that the write is done immediately.
> One possible way could be to add a new API at the log level to get the
> next seqNo alone. Call this first, then use that seqNo to write to the
> memstore and then to the WAL. Just a random thought. Not tried.
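
The last idea in the description (get the next seqNo first, then write to the memstore and then to the WAL with it) could look roughly like the sketch below. This is only an illustration and not HBase code: `SketchLog`, `reserveSeqNo`, `SketchCell`, `SketchMemstore` and `SeqNoFirstSketch` are all hypothetical names, and the region-level lock, the real WAL append path and the log sync are left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for a log that can hand out the next seqNo on its own. */
final class SketchLog {
    private final AtomicLong nextSeqNo = new AtomicLong(1);
    final List<String> appended = new ArrayList<>();

    /** The assumed "new API": returns the next seqNo without appending anything. */
    long reserveSeqNo() {
        return nextSeqNo.getAndIncrement();
    }

    /** Appends an edit under a seqNo that was reserved earlier. */
    synchronized void append(long seqNo, String edit) {
        appended.add(seqNo + ":" + edit);
    }
}

/** Hypothetical immutable cell: mvcc is fixed at construction, never reset later. */
final class SketchCell {
    final String value;
    final long mvcc;

    SketchCell(String value, long mvcc) {
        this.value = value;
        this.mvcc = mvcc;
    }
}

/** Hypothetical memstore; it may copy or re-encode cells freely. */
final class SketchMemstore {
    final List<SketchCell> cells = new ArrayList<>();

    synchronized void add(SketchCell cell) {
        cells.add(cell);
    }
}

final class SeqNoFirstSketch {
    /** Reserve the seqNo, write the memstore with it, then append to the log. */
    static long write(SketchLog log, SketchMemstore memstore, String value) {
        long seqNo = log.reserveSeqNo();             // 1. get the seqNo alone
        memstore.add(new SketchCell(value, seqNo));  // 2. memstore sees the final mvcc
        log.append(seqNo, value);                    // 3. log append uses the same seqNo
        return seqNo;
    }

    public static void main(String[] args) {
        SketchLog log = new SketchLog();
        SketchMemstore memstore = new SketchMemstore();
        long first = write(log, memstore, "row1");
        long second = write(log, memstore, "row2");
        System.out.println("seqNos: " + first + ", " + second);
        System.out.println("log entries: " + log.appended);
    }
}
```

Because the cell carries its final mvcc from the moment it is created, nothing has to hold on to the same Java object to rebind it later, so a memstore that converts cells into another representation would not lose the seqNo. The sketch says nothing about reader visibility or about how a reserved seqNo interacts with the ordering of appends in the real log; those would still need to be worked out.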