bharathv commented on a change in pull request #1755:
URL: https://github.com/apache/hbase/pull/1755#discussion_r432192279



##########
File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java
##########
@@ -2079,16 +2085,29 @@ private void unassign(final HRegionInfo region,
         }
 
         if (logRetries) {
-          LOG.info("Server " + server + " returned " + t + " for "
-            + region.getRegionNameAsString() + ", try=" + i
-            + " of " + this.maximumAttempts, t);
+          LOG.info("Server " + server + " returned " + t + " for " + 
region.getRegionNameAsString()
+              + ", try=" + i + " of " + this.maximumAttempts,
+            t);
           // Presume retry or server will expire.
         }
       }
     }
-    // Run out of attempts
-    if (state != null) {
-      regionStates.updateRegionState(region, State.FAILED_CLOSE);
+
+    long sleepTime = backoffPolicy.getBackoffTime(retryConfig,

Review comment:
       Right, so my point is this. If there is an exception in 
sendRegionClose() (say ServerTooBusy), we enter the catch() block in L2024, we 
don't match the if / else if in L2031 (failed server) / 2038 (region not 
serving) and that leaves us with just looping for "maximumAttempts" quickly in 
succession without a backoff (this is the problem we are trying to address). My 
point was why not do something like this.
   
   ```
    if (t instanceof RegionServerAbortedException || t instanceof 
RegionServerStoppedException
               || t instanceof RegionServerStoppedException     
               || t instanceof ServerNotRunningYetException) {          
   ......
   } else if ((t instanceof FailedServerException)....) {
   .....
   } else {
    // Handle any other remote exception
    // use your exponential backoff algorithm...
   } 
   ```
   
   Basically we have the loop for maxAttempts just that it is not using an 
exponential backoff. Just plug it in an else {} block and we are good? What is 
advantage of doing it with a delayed callable, doesn't that just complicate the 
state machine with async requests and book-keeping or is there something that 
I'm missing?




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


Reply via email to