[ 
https://issues.apache.org/jira/browse/HADOOP-2660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560641#action_12560641
 ] 

Billy Pearson commented on HADOOP-2660:
---------------------------------------

I can think of two options that would help solve this problem, and we might 
need to use both.

Option 1
Build in a backlog limit on how many pending opens any one region server can 
have before it stops accepting new opens.
For example, finding the maximum sequence id for a region takes a lot less time 
than doing a recovery on a region, so a recovering server's queue would fill up 
faster. That makes the master send some open requests to different servers 
while this one catches up, or loop until one of the region servers has open 
slots in its pending open queue. I think 60 secs is the default loop time, so 
a server should be able to handle 10 pending opens or something like that. 
Maybe make the limit an option in the conf.
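
A rough sketch of what the backlog limit in option 1 could look like on the 
region server side. This is not existing HBase code: the class name 
PendingOpenQueue, its methods, and the idea of returning false so the master 
can try another server are all assumptions made for illustration.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical sketch, not HBase code: a bounded backlog of pending region
// opens held by a single region server.
final class PendingOpenQueue {
    private final int limit; // assumed to come from a conf option
    private final Queue<String> pending = new ArrayDeque<String>();

    PendingOpenQueue(int limit) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        this.limit = limit;
    }

    // Returns true if the open was queued, false if the backlog is full and
    // the master should offer the region to a different server.
    synchronized boolean offer(String regionName) {
        if (pending.size() >= limit) {
            return false;
        }
        return pending.add(regionName);
    }

    // Removes and returns the oldest pending open once the server is ready
    // to work on it; returns null when nothing is pending.
    synchronized String poll() {
        return pending.poll();
    }

    synchronized int size() {
        return pending.size();
    }
}
```

With a limit of 10, the eleventh offer() returns false until the server has 
worked off at least one pending open with poll().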

Option 2

1. Confirm that we received the master's open request as soon as we receive it.

Once confirmed, the master should not reassign the region to any other region 
server unless the region server goes offline and loses its lease.

2. Confirm whether the open of the region succeeded or failed.

The master can make sure the region server is still alive by keeping up with 
its heartbeat.
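
The two confirmations in option 2 could be tracked on the master roughly like 
this. It is only a sketch under assumptions: AssignmentTracker, its State 
values, and its method names are made up for illustration and are not part of 
HBase, and what happens to an open that is never confirmed is left to whatever 
timeout the master already uses.

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

// Hypothetical sketch, not HBase code: master-side bookkeeping for the
// two-step confirmation of a region open.
final class AssignmentTracker {
    enum State { UNASSIGNED, OPEN_SENT, OPEN_ACKED, OPEN }

    private final Map<String, State> states = new HashMap<String, State>();
    private final Map<String, String> owners = new HashMap<String, String>();

    synchronized State stateOf(String region) {
        State s = states.get(region);
        return s == null ? State.UNASSIGNED : s;
    }

    // The master has sent an open request for the region to the server.
    synchronized void openSent(String region, String server) {
        states.put(region, State.OPEN_SENT);
        owners.put(region, server);
    }

    // Step 1: the server confirms it received the open request.
    synchronized void openAcked(String region, String server) {
        if (server.equals(owners.get(region))
                && stateOf(region) == State.OPEN_SENT) {
            states.put(region, State.OPEN_ACKED);
        }
    }

    // Step 2: the server reports whether the open succeeded or failed.
    synchronized void openFinished(String region, String server, boolean ok) {
        if (!server.equals(owners.get(region))) {
            return; // report from a server that does not own the region
        }
        if (ok) {
            states.put(region, State.OPEN);
        } else {
            states.remove(region);
            owners.remove(region);
        }
    }

    // The server missed its heartbeats and lost its lease: free its regions.
    synchronized void leaseExpired(String server) {
        Iterator<Map.Entry<String, String>> it = owners.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, String> e = it.next();
            if (e.getValue().equals(server)) {
                states.remove(e.getKey());
                it.remove();
            }
        }
    }

    // True once the open is confirmed: the master must not reassign the
    // region until leaseExpired() or a failed open releases it.
    synchronized boolean isPinned(String region) {
        State s = stateOf(region);
        return s == State.OPEN_ACKED || s == State.OPEN;
    }
}
```

The point of the design is that a slow recovery no longer matters after the 
first confirmation: the region stays with its server until the open fails or 
the lease is lost.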

> Regions getting messages from master to MSG_REGION_CLOSE_WITHOUT_REPORT
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-2660
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2660
>             Project: Hadoop
>          Issue Type: Bug
>          Components: contrib/hbase
>            Reporter: Billy Pearson
>             Fix For: 0.16.0
>
>
> I think we addressed this in HADOOP-2295, but I have found it showing up 
> again.
> My hlog size is set to 250,000, so on a recovery from a failed region server, 
> scanning the logs takes longer than the 
> hbase.hbasemaster.maxregionopen default of 30 secs.
> The master thinks the region is open, but the region server closes the 
> region when it is done recovering because the master sent a 
> MSG_REGION_CLOSE_WITHOUT_REPORT to the region server.
> I was able to get my table back online completely by adding 
> hbase.hbasemaster.maxregionopen with a value of 300000 milliseconds to my 
> hbase-site.xml file and restarting.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
