thangTang commented on a change in pull request #1071: HBASE-23693 Split 
failure may cause region hole and data loss when use zk assign
URL: https://github.com/apache/hbase/pull/1071#discussion_r368797626
 
 

 ##########
 File path: 
hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
 ##########
 @@ -783,10 +801,20 @@ public void regionOffline(
             LOG.info("Found region in " + state +
               " to be reassigned by ServerCrashProcedure for " + sn);
             rits.add(hri);
-          } else if(state.isSplittingNew() || state.isMergingNew()) {
-            LOG.info("Offline/Cleanup region if no meta entry exists, hri: " + 
hri +
-                " state: " + state);
-            regionsToClean.add(state.getRegion());
+          } else if (state.isSplittingNew() || state.isMergingNew()) {
+            LOG.info(
+              "Offline/Cleanup region if no meta entry exists, hri: " + hri + 
" state: " + state);
+            if (daughter2Parent.containsKey(hri.getEncodedName())) {
+              HRegionInfo parent = daughter2Parent.get(hri.getEncodedName());
+              HRegionInfo info = getHRIFromMeta(parent);
+              if (info != null && info.isSplit() && info.isOffline()) {
+                regionsToClean.add(Pair.newPair(state.getRegion(), info));
 
 Review comment:
   > > If the RS machine crash when the SplitTransactionImpl step after PONR, 
master will handle the split rollback.
   > 
   > Yeah, but it seems to work only because Master has different info for the 
region than what is actual in meta. What if Active Master also crashes before 
it triggers SCP and daughters get cleaned up? The other master will load 
regions state as seen in meta, then it can run into same problem again.
   
   daughter cleanup is excuted in SCP, and in this patch it will do three 
things together:
   
   1. delete daughter region in meta
   2. update parent region in meta
   3. delete daughter region dir in HDFS
   ```
          if (regionPair != null) {
               MetaTableAccessor.deleteRegion(this.server.getConnection(), 
hri);                    
MetaTableAccessor.deleteRegion(this.server.getConnection(), hri);
             }            }
             if (parentInfo != null) {
               List<Mutation> mutations = new ArrayList<Mutation>();
               HRegionInfo copyOfParent = new HRegionInfo(parentInfo);
               copyOfParent.setOffline(false);
               copyOfParent.setSplit(false);
               Put putParent = 
MetaTableAccessor.makePutFromRegionInfo(copyOfParent);
               mutations.add(putParent);
               MetaTableAccessor.mutateMetaTable(this.server.getConnection(), 
mutations);
             }
             LOG.debug("Cleaning up HDFS since no meta entry exists, hri: " + 
hri);               LOG.debug("Cleaning up HDFS since no meta entry exists, 
hri: " + hri);
             FSUtils.deleteRegionDir(server.getConfiguration(), hri);           
  FSUtils.deleteRegionDir(server.getConfiguration(), hri);
   ```
   
   So if Active Master also crashes before it triggers SCP, the daughter won`t 
be deleted.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

Reply via email to