[ https://issues.apache.org/jira/browse/HBASE-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879403#comment-13879403 ]
Ping commented on HBASE-9740:
-----------------------------

Yes, @[~jxiang], you're right that we can't notice it from the master's status page. On the other hand, the application that accesses this table (region) will notice it. I am thinking of putting a warning message on the page too, if that is appropriate.

My reasoning was this: we are only considering the 0.94 branch. On this branch, if we move the region to the FAILED_OPEN state, the AM will assign it again and again, which blocks cluster balancing. In our production cluster we can't even disable the table to make a repair (all the tools, including close_region, hbck repair, disable table, ..., are not usable). So we think we can take this region offline for advanced repair or maintenance.

@[~adityakishore], thanks for your help. I will check my code style, fix it, and resubmit the patch if you and Jimmy agree with it.

> A corrupt HFile could cause endless attempts to assign the region without a chance of success
> ---------------------------------------------------------------------------------------------
>
>                 Key: HBASE-9740
>                 URL: https://issues.apache.org/jira/browse/HBASE-9740
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.16
>            Reporter: Aditya Kishore
>            Assignee: Aditya Kishore
>         Attachments: patch-9740_0.94.txt
>
>
> As described in HBASE-9737, a corrupt HFile in a region could lead to an
> assignment storm in the cluster since the Master will keep trying to assign
> the region to each region server one after another and obviously none will
> succeed.
> The region server, upon detecting such a scenario should mark the region as
> "RS_ZK_REGION_FAILED_ERROR" (or something to the effect) in the Zookeeper
> which should indicate the Master to stop assigning the region until the error
> has been resolved (via an HBase shell command, probably "assign"?)
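
For illustration only, the behavior proposed in the issue description (stop reassigning once no region server can open the region, and wait for an operator) might be sketched roughly as below. This is not HBase code and not the attached patch: `RegionState`, `assign_region`, and the `try_open` callback are hypothetical names invented for this sketch, and the real AssignmentManager and ZooKeeper interaction are far more involved.

```python
from enum import Enum


class RegionState(Enum):
    # Hypothetical states for this sketch; not the HBase enum.
    OPEN = "OPEN"
    FAILED_OPEN = "FAILED_OPEN"


def assign_region(region, servers, try_open):
    """Offer the region to each server at most once.

    try_open(server, region) -> bool is a hypothetical callback that
    returns False when the server cannot open the region (for example
    because of a corrupt HFile).

    Returns (state, server, attempts). When every server fails, the
    region is parked in FAILED_OPEN instead of being retried forever,
    so an operator can repair it and trigger a new assignment.
    """
    attempts = 0
    for server in servers:
        attempts += 1
        if try_open(server, region):
            return RegionState.OPEN, server, attempts
    return RegionState.FAILED_OPEN, None, attempts


if __name__ == "__main__":
    # Simulate a region whose HFile is corrupt: no server can open it.
    def always_fails(server, region):
        return False

    state, server, attempts = assign_region(
        "region-1", ["rs1", "rs2", "rs3"], always_fails
    )
    print(state.name, server, attempts)
```

The point of the sketch is only the termination condition: the loop is bounded by the number of servers, and the failure state is terminal until someone explicitly asks for another assignment.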