[ https://issues.apache.org/jira/browse/HBASE-5196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13185774#comment-13185774 ]
jirapos...@reviews.apache.org commented on HBASE-5196:
------------------------------------------------------

bq. On 2012-01-13 19:18:26, Michael Stack wrote:
bq. > +1 on patch so far. In the issue, when you say 'if master does not get a chance to fix it', when is that? Doesn't the master do it when it comes online? Good stuff Jimmy.

There are only 3 threads to do the cleanup. If many region servers die (most of the cluster), the shutdown handler may be stuck in log splitting for quite some time. If the master dies during this period, it won't be able to finish the cleanup.

In my case, I ran testLoadAndVerify and it brought HDFS to its knees. So I restarted the cluster and ended up with lots of holes in the region chain.

- Jimmy

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3488/#review4363
-----------------------------------------------------------

On 2012-01-13 19:11:36, Jimmy Xiang wrote:
bq.
bq. -----------------------------------------------------------
bq. This is an automatically generated e-mail. To reply, visit:
bq. https://reviews.apache.org/r/3488/
bq. -----------------------------------------------------------
bq.
bq. (Updated 2012-01-13 19:11:36)
bq.
bq. Review request for hbase.
bq.
bq. Summary
bq. -------
bq.
bq. When the master starts up, this patch scans all offline split parents and fixes up missing daughters, as the ServerShutdownHandler does.
bq.
bq. This addresses bug HBASE-5196.
bq. https://issues.apache.org/jira/browse/HBASE-5196
bq.
bq. Diffs
bq. -----
bq.
bq. src/main/java/org/apache/hadoop/hbase/master/HMaster.java cb2f084
bq. src/main/java/org/apache/hadoop/hbase/master/handler/ServerShutdownHandler.java 8f4f4b8
bq. src/main/java/org/apache/hadoop/hbase/regionserver/SplitRequest.java 41f5dff
bq.
bq. Diff: https://reviews.apache.org/r/3488/diff
bq.
bq. Testing
bq. -------
bq.
bq. I tested the fix on my real cluster and it does fix the problem.
bq.
bq. I am working on a unit test now.
bq.
bq. Thanks,
bq.
bq. Jimmy
bq.

> Failure in region split after PONR could cause region hole
> ----------------------------------------------------------
>
> Key: HBASE-5196
> URL: https://issues.apache.org/jira/browse/HBASE-5196
> Project: HBase
> Issue Type: Bug
> Components: master, regionserver
> Affects Versions: 0.92.0, 0.94.0
> Reporter: Jimmy Xiang
> Assignee: Jimmy Xiang
>
> If a region split fails after the PONR (point of no return), it relies on the
> master's ServerShutdownHandler to fix it. However, if the master doesn't get a
> chance to fix it, there will be a hole in the region chain.
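
For readers who want a concrete picture of the "hole" and of the startup scan described in the review summary, here is a minimal, self-contained sketch. It is not the patch and does not use any HBase API: the Region class, the in-memory catalog map, and the methods fixupDaughters, hasHole and demoCatalog are all hypothetical stand-ins invented for illustration. It assumes a split parent records both daughters, and that a daughter may be missing from the catalog if the split died after the PONR.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SplitParentFixup {

    /** Hypothetical, simplified catalog row; NOT HBase's real region descriptor. */
    static final class Region {
        final String name;
        final String startKey;            // "" means start of table
        final String endKey;              // "" means end of table
        final boolean offlineSplitParent;
        final Region daughterA;           // null unless this is a split parent
        final Region daughterB;

        Region(String name, String startKey, String endKey) {
            this(name, startKey, endKey, false, null, null);
        }

        Region(String name, String startKey, String endKey,
               boolean offlineSplitParent, Region daughterA, Region daughterB) {
            this.name = name;
            this.startKey = startKey;
            this.endKey = endKey;
            this.offlineSplitParent = offlineSplitParent;
            this.daughterA = daughterA;
            this.daughterB = daughterB;
        }
    }

    /** Scans offline split parents and re-adds daughters missing from the catalog. */
    static int fixupDaughters(Map<String, Region> catalog) {
        List<Region> missing = new ArrayList<>();
        for (Region r : catalog.values()) {
            if (!r.offlineSplitParent) {
                continue;
            }
            for (Region d : new Region[] {r.daughterA, r.daughterB}) {
                if (d != null && !catalog.containsKey(d.name)) {
                    missing.add(d);
                }
            }
        }
        // Insert after the scan so the map is not modified while iterating.
        for (Region d : missing) {
            catalog.put(d.name, d);
        }
        return missing.size();
    }

    /** True if the online regions do not cover the key space contiguously. */
    static boolean hasHole(Map<String, Region> catalog) {
        TreeMap<String, Region> byStart = new TreeMap<>();
        for (Region r : catalog.values()) {
            if (!r.offlineSplitParent) {
                byStart.put(r.startKey, r);
            }
        }
        if (byStart.isEmpty()) {
            return true;
        }
        String expectedStart = "";
        for (Region r : byStart.values()) {
            if (!r.startKey.equals(expectedStart)) {
                return true;
            }
            expectedStart = r.endKey;
        }
        return !expectedStart.equals("");
    }

    /** Catalog state after a split that failed past the PONR: daughterB is missing. */
    static Map<String, Region> demoCatalog() {
        Region a = new Region("daughterA", "", "m");
        Region b = new Region("daughterB", "m", "");
        Region parent = new Region("parent", "", "", true, a, b);
        Map<String, Region> catalog = new HashMap<>();
        catalog.put(parent.name, parent);
        catalog.put(a.name, a);
        return catalog;
    }

    public static void main(String[] args) {
        Map<String, Region> catalog = demoCatalog();
        System.out.println("hole before fixup: " + hasHole(catalog));
        System.out.println("daughters re-added: " + fixupDaughters(catalog));
        System.out.println("hole after fixup: " + hasHole(catalog));
    }
}
```

In this toy model the parent is offline and only one daughter is registered, so the key range from "m" to the end of the table is uncovered until the scan re-adds the second daughter. The actual patch does the equivalent work against the catalog table at master startup; the details are in the diff linked above.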