[ https://issues.apache.org/jira/browse/HBASE-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13791174#comment-13791174 ]
ramkrishna.s.vasudevan commented on HBASE-9740:
-----------------------------------------------

@Aditya
HBASE-9522 is a similar issue, and we have a patch for it. That patch does not deal with AM states, though; it collects the corrupt HFiles separately and just logs a message. The intention was that if one HFile among the set of HFiles associated with a region is corrupted, the region can at least still be opened with the remaining files (at the cost of data loss). This will be configurable. I can attach a rebased patch.

> A corrupt HFile could cause endless attempts to assign the region without a
> chance of success
> ---------------------------------------------------------------------------------------------
>
>                 Key: HBASE-9740
>                 URL: https://issues.apache.org/jira/browse/HBASE-9740
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Aditya Kishore
>            Assignee: Aditya Kishore
>
> As described in HBASE-9737, a corrupt HFile in a region could lead to an
> assignment storm in the cluster since the Master will keep trying to assign
> the region to each region server one after another and obviously none will
> succeed.
> The region server, upon detecting such a scenario should mark the region as
> "RS_ZK_REGION_FAILED_ERROR" (or something to the effect) in the Zookeeper
> which should indicate the Master to stop assigning the region until the error
> has been resolved (via an HBase shell command, probably "assign"?)

--
This message was sent by Atlassian JIRA
(v6.1#6144)