apurtell edited a comment on pull request #2574:
URL: https://github.com/apache/hbase/pull/2574#issuecomment-718875966


   @Apache9 I am having trouble responding to your comment above. 
   
   > After reading this discussion several times, I think the reason we do not 
want to niterrupte a WAL sync is that it may lead to a region server abort?
   > I would say this is not the case here. I checked the code again, the 
actual sync is done in the disruptor thread, in the rpc thread we just block on 
a SyncFuture(as Andrew mentioned above), the interruption on the rpc thread 
will just lead to an IOException tp client, the actual sync operation will not 
be interrupted so we are safe.
   > So I do not think we need to disable interrupts here?
   
   You are correct about the SyncFuture.
   
   My concern is different. I am concerned about this case:
   - We start the WAL append
   - SyncFuture is interrupted, client gets an exception, client thinks the 
mutation failed
   - WAL sync completes, mutation is in WAL, mutation is applied (e.g. during 
WAL replay for some reason)
   
   So we have a case where the client's understanding of what happened is 
incorrect. I would prefer to prevent interrupts from messing with our ability 
to return success indication to the client once we are far enough along the 
mutation code path that we are writing to the WAL. At that point it is a point 
of no return, what happens in the WAL and the client's understanding of what 
happend should be in sync. If the WAL sync fails the client should get an 
exception. If the WAL sync succeeds the client should see success. Exception 
would be the case where the server is shutting down, or the existing timeout 
case, those we can't control. We can control when interrupts happen, so we can 
be considerate. disableInterrupt/enableInterrupt is consideration for the 
client.
   
   I can add the above as a code comment next to the disableInterrupt() call, 
would that help?
   
   Here is another case of concern:
   
   - We start the WAL append
   - SyncFuture is interrupted, memstore is not updated, client gets an 
exception. Unless the WAL is replayed the local cluster will not have the 
mutation applied. 
   - WAL was written, though, so change will be replicated to a remote cluster. 
Remote cluster applies the edit. Remote cluster is out of sync with local 
cluster.
   
   The disableInterrupt/enableInterrupt pair combines the WAL append and 
memstore update into a single protected operation with respect to this proposed 
change.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to