[ https://issues.apache.org/jira/browse/HBASE-11099?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986374#comment-13986374 ]

stack commented on HBASE-11099:
-------------------------------

[~jeffreyz] I don't think so. Down in append we do this:

{code}
...
    long sequence = this.disruptor.getRingBuffer().next();
    try {
      RingBufferTruck truck = this.disruptor.getRingBuffer().get(sequence);
      FSWALEntry entry =
        new FSWALEntry(sequence, logKey, edits, sequenceId, inMemstore, htd, info);
      truck.loadPayload(entry, scope.detach());
    } finally {
      this.disruptor.getRingBuffer().publish(sequence);
    }
...
{code}

So we get a slot on the ring buffer and load it up.  When ready to go, we 
publish to the ring.

Threads contend hereabouts, so publishing can happen in any order (that could be 
OK).

(Reading setAvailable, which is called when we publish, I can't tell how it works 
without running some tests; i.e., does publishing make a slot available for 
processing even though there are sequences ahead of this one not yet published? 
I could do that.)
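
For reference, here is the sort of standalone probe I mean (illustrative only, not 
HBase code; it drives the LMAX Disruptor RingBuffer directly and the class and 
field names are made up):

{code}
import java.util.concurrent.TimeUnit;

import com.lmax.disruptor.EventFactory;
import com.lmax.disruptor.RingBuffer;
import com.lmax.disruptor.SequenceBarrier;
import com.lmax.disruptor.TimeoutBlockingWaitStrategy;

public class PublishOrderProbe {
  /** Trivial event standing in for RingBufferTruck. */
  static final class Slot { long payload; }

  public static void main(String[] args) throws Exception {
    RingBuffer<Slot> ring = RingBuffer.createMultiProducer(
        (EventFactory<Slot>) Slot::new, 16,
        new TimeoutBlockingWaitStrategy(1, TimeUnit.SECONDS));
    SequenceBarrier barrier = ring.newBarrier();

    long first = ring.next();   // claimed but deliberately left unpublished
    long second = ring.next();
    ring.get(second).payload = 2L;
    ring.publish(second);       // published out of claim order

    // If out-of-order publishes were exposed to consumers, this would report
    // 'second'; if consumers only see contiguous published sequences, it
    // reports first - 1 (or times out, depending on wait strategy internals).
    try {
      System.out.println("available with gap: " + barrier.waitFor(first));
    } catch (com.lmax.disruptor.TimeoutException e) {
      System.out.println("nothing available while the gap is unpublished");
    }

    ring.get(first).payload = 1L;
    ring.publish(first);        // fill the gap
    System.out.println("available after gap filled: " + barrier.waitFor(second));
  }
}
{code}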

The ring buffer sequence number is an internal detail, not related to the region 
sequence id. Wouldn't I have to relate them to do the above (the ring buffer is 
regionserver-scoped)? Otherwise, I would have to synchronize -- i.e. block -- 
the disruptor so I could tie getting the disruptor id and upping the region 
sequence id together? Unless I used the disruptor id as the region sequence id? 
(Would need to check that publish respects the disruptor id.) The disruptor id is 
a long. Say 100k writes a second; I think it's ~3M years till we roll over 
(would have to check -- the disruptor might be using some of the higher-order 
bits as flags).
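
For the roll-over math, the back-of-envelope I am using (assuming the full 
positive range of the long is usable as sequence space, i.e. no high bits 
reserved for flags):

{code}
// Rough rollover estimate; assumes all 63 positive bits are usable sequence space.
public class SeqRollover {
  public static void main(String[] args) {
    double writesPerSecond = 100_000d;
    double seconds = Long.MAX_VALUE / writesPerSecond;   // ~9.2e13 seconds
    double years = seconds / (365.25 * 24 * 3600);       // ~2.9 million years
    System.out.printf("~%.1f million years to roll over%n", years / 1e6);
  }
}
{code}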

Also, at flush time, don't we want everything that could be in the snapshot 
sync'd rather than just appended?  I know sync is a pretty faint guarantee, but 
it would be better than us using the seqid of an edit that is not yet sync'd? 
Thinking on it, this might not be necessary.  If the flush succeeds, we probably 
had a sync come in in the meantime.  Could do a sync outside of the update lock 
to be sure.
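
The shape I am picturing is something like the below (illustrative only; the WAL 
interface and names here are stand-ins, not the FSHLog/HRegion API):

{code}
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class FlushSyncSketch {
  /** Stand-in for the WAL; not the HBase interface. */
  interface Wal { void sync(); }

  static long flushSeqIdWithSync(Wal wal, ReentrantReadWriteLock updatesLock,
      AtomicLong sequenceId) {
    long flushSeqId;
    updatesLock.writeLock().lock();
    try {
      flushSeqId = sequenceId.incrementAndGet();
      // ... take the memstore snapshot here, under the update lock ...
    } finally {
      updatesLock.writeLock().unlock();
    }
    // Sync outside the update lock so writers are not blocked while we wait;
    // the idea is to only rely on flushSeqId once appended edits have also
    // been sync'd, not merely appended.
    wal.sync();
    return flushSeqId;
  }
}
{code}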

What do you think boss?  (thanks for the help here).


> Two situations where we could open a region with smaller sequence number
> ------------------------------------------------------------------------
>
>                 Key: HBASE-11099
>                 URL: https://issues.apache.org/jira/browse/HBASE-11099
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.99.0
>            Reporter: Jeffrey Zhong
>             Fix For: 0.99.0
>
>
> Recently I happened to run into code where we could potentially open a region 
> with a smaller sequence number:
> 1) Inside HRegion#internalFlushcache. This is because we changed the way WAL 
> sync works: we use late binding (assign the sequence number right before the 
> WAL sync).
> The flushSeqId may be less than the sequence numbers of the changes included 
> in the flush, which may cause later region-opening code to use a smaller than 
> expected sequence number when we reopen the region.
> {code}
> flushSeqId = this.sequenceId.incrementAndGet();
> ...
> mvcc.waitForRead(w);
> {code}
> 2) HRegion#replayRecoveredEdits, where we have the following code:
> {code}
> ...
>           if (coprocessorHost != null) {
>             status.setStatus("Running pre-WAL-restore hook in coprocessors");
>             if (coprocessorHost.preWALRestore(this.getRegionInfo(), key, val)) {
>               // if bypass this log entry, ignore it ...
>               continue;
>             }
>           }
> ...
>           currentEditSeqId = key.getLogSeqNum();
> {code} 
> If a coprocessor skips some tail WALEdits, then the function will return a 
> smaller currentEditSeqId. In the end, a region may also open with a smaller 
> sequence number. This may cause data loss, because the Master may record a 
> larger flushed sequence id and some WALEdits may be skipped during recovery 
> if the region fails again.


