apurtell commented on a change in pull request #2574:
URL: https://github.com/apache/hbase/pull/2574#discussion_r512965703



##########
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
##########
@@ -4569,13 +4665,29 @@ private void doMiniBatchMutate(BatchOperation<?> 
batchOp) throws IOException {
       // We should record the timestamp only after we have acquired the 
rowLock,
       // otherwise, newer puts/deletes/increment/append are not guaranteed to 
have a newer
       // timestamp
+
+      // Check for thread interrupt status in case we have been signaled from
+      // #interruptRegionOperation.
+      checkInterrupt();
+
       long now = EnvironmentEdgeManager.currentTime();
       batchOp.prepareMiniBatchOperations(miniBatchOp, now, acquiredRowLocks);
 
       // STEP 3. Build WAL edit
+
+      // Check for thread interrupt status in case we have been signaled from
+      // #interruptRegionOperation.
+      checkInterrupt();
+
       List<Pair<NonceKey, WALEdit>> walEdits = 
batchOp.buildWALEdits(miniBatchOp);
 
       // STEP 4. Append the WALEdits to WAL and sync.
+
+      // Check for thread interrupt status in case we have been signaled from
+      // #interruptRegionOperation. This is the last place we can do it 
"safely" before
+      // WAL appends.
+      checkInterrupt();
+

Review comment:
       Since we are on the topic of WAL syncs....
   
   We could protect against interrupting a sync using a couple of different 
approaches. The simplest in the context of this proposed change would be to 
remove the current handler thread from the regionLockHolders set whenever we 
decide interrupting it is no longer the best choice, like a point of no return 
on the mutation code path. 
   
       startRegionOperation(Operation.PUT); // add thread to the set so it is 
interruptible
       try {
           ...
           startWALOperation(); // remove thread from the set so it will not be 
interrupted
           try {
               ...
           } finally {
               endWALOperation(); // add thread to the set so it is eligible to 
be interrupted again
           }
           ...
       } finally {
           endRegionOperation();  // remove thread from the set as it is no 
longer actively handling RPC
       }
   
   However we already have issues with sync timeouts or regionserver aborts 
during sync such that we are not introducing anything new here. In other words, 
if it is problematic that a WAL sync can be aborted before it completes, this 
already happens for other reasons, like timeouts (presumably because HDFS is 
slow), or HDFS level  pipeline recovery failure, or server shutdown. 
   
   Thinking about this though makes me think of the English expression "opening 
a can of worms". I don't know of a Chinese equivalent. I'm going to implement 
the proposal above out of an abundance of caution and we can decide upon 
further review if it makes sense.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to