[jira] [Commented] (HBASE-9740) A corrupt HFile could cause endless attempts to assign the region without a chance of success

Ping (JIRA) Wed, 22 Jan 2014 19:10:50 -0800

    [ 
https://issues.apache.org/jira/browse/HBASE-9740?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13879403#comment-13879403
 ]


Ping commented on HBASE-9740:
-----------------------------

    Yes, @[~jxiang], you're right that we can't notice this from the master's 
status page. On the other hand, any application that accesses this table 
(region) will notice it. I am also thinking of putting a warning message on the 
page, if that is appropriate.
    My reasoning is as follows. We are only considering the 0.94 branch. On 
this branch, if we move the region to the FAILED_OPEN state, the AM 
(AssignmentManager) will assign it again and again, which blocks cluster 
balancing. In our production cluster we could not even disable the table to 
make a repair (none of the tools, including close_region, hbck repair and 
disable table, were usable). So we think the region should be taken offline to 
allow advanced repair or maintenance.
    @[~adityakishore], thanks for your help. I will check my code style, fix 
it, and resubmit the patch if you and Jimmy agree with this approach.
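
    To make the idea above concrete, here is a minimal, self-contained sketch 
of capping assignment attempts and then parking the region offline. It is not 
the attached patch and it does not use any HBase API: the class 
AssignmentSketch, the PARKED_OFFLINE state, the maxAttempts limit and the 
tryOpen callback are all hypothetical names chosen for illustration.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Illustration only: a toy model of the assignment loop, not HBase code.
class AssignmentSketch {
    // PARKED_OFFLINE is a hypothetical state meaning "stop retrying,
    // wait for an operator to repair the region".
    enum State { OFFLINE, OPEN, FAILED_OPEN, PARKED_OFFLINE }

    private final int maxAttempts; // hypothetical retry cap
    private final Map<String, State> states = new HashMap<>();

    AssignmentSketch(int maxAttempts) {
        this.maxAttempts = maxAttempts;
    }

    // Tries the servers one after another. tryOpen stands in for a region
    // server opening the region; it returns false when, for example, an
    // HFile is corrupt. After maxAttempts failures (or when the server list
    // is exhausted) the region is parked instead of being assigned again.
    State assign(String region, List<String> servers,
                 BiPredicate<String, String> tryOpen) {
        int attempts = 0;
        for (String server : servers) {
            if (attempts >= maxAttempts) {
                break;
            }
            attempts++;
            if (tryOpen.test(server, region)) {
                states.put(region, State.OPEN);
                return State.OPEN;
            }
            states.put(region, State.FAILED_OPEN);
        }
        states.put(region, State.PARKED_OFFLINE);
        return State.PARKED_OFFLINE;
    }

    State stateOf(String region) {
        return states.getOrDefault(region, State.OFFLINE);
    }

    public static void main(String[] args) {
        AssignmentSketch am = new AssignmentSketch(3);
        List<String> servers = Arrays.asList("rs1", "rs2", "rs3", "rs4");
        // Every open attempt fails, as it would with a corrupt HFile.
        State result = am.assign("region-a", servers, (server, region) -> false);
        System.out.println("region-a ended in state " + result);
    }
}
```

    With a cap like this, a region that cannot be opened anywhere stops 
consuming assignment attempts, so balancing and repair tools are no longer 
blocked by it.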

> A corrupt HFile could cause endless attempts to assign the region without a 
> chance of success
> ---------------------------------------------------------------------------------------------
>
>                 Key: HBASE-9740
>                 URL: https://issues.apache.org/jira/browse/HBASE-9740
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.16
>            Reporter: Aditya Kishore
>            Assignee: Aditya Kishore
>         Attachments: patch-9740_0.94.txt
>
>
> As described in HBASE-9737, a corrupt HFile in a region could lead to an 
> assignment storm in the cluster since the Master will keep trying to assign 
> the region to each region server one after another and obviously none will 
> succeed.
> The region server, upon detecting such a scenario, should mark the region as 
> "RS_ZK_REGION_FAILED_ERROR" (or something to that effect) in ZooKeeper, 
> which should signal the Master to stop assigning the region until the error 
> has been resolved (via an HBase shell command, probably "assign"?)



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
