[ 
https://issues.apache.org/jira/browse/HBASE-5844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13464093#comment-13464093
 ] 

nkeywal commented on HBASE-5844:
--------------------------------

It's strange that I could not reproduce it. I should be able to, because the 
failure seems logical. I will look into it and create JIRAs.
In any case, there are bugs around this scenario. For example, when the start 
fails we are left with a new pid file whose pid does not match the process. 
This is true in 0.90 as well. If no process has that pid, the stop (in 0.96) 
fails with the error 
"no regionserver to stop because kill -0 of pid 49938 failed with status 1". 
If another process has taken that pid (which should not happen often), the 
kill will succeed even though that process is not the region server.
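
To make the pid-file problem concrete, here is a minimal Python sketch of the 
same `kill -0` style check. It is not the code from the HBase scripts; the 
function names and return strings are hypothetical, and it assumes a POSIX 
system where signal 0 only tests whether a process exists.

```python
import os
from typing import Optional


def pid_is_alive(pid: int) -> bool:
    """Rough equivalent of `kill -0 <pid>`: signal 0 tests existence only."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no process has this pid
    except PermissionError:
        return True   # a process exists but is owned by another user
    return True


def read_pid(pid_file: str) -> Optional[int]:
    """Return the pid stored in pid_file, or None if missing or unparsable."""
    try:
        with open(pid_file) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def check_pid_file(pid_file: str) -> str:
    """Classify a pid file (hypothetical helper, for illustration only)."""
    pid = read_pid(pid_file)
    if pid is None:
        return "no usable pid file"
    if not pid_is_alive(pid):
        return f"stale pid file: kill -0 of pid {pid} failed"
    # A live pid proves little: the id may have been reused by another process.
    return f"pid {pid} is alive (not necessarily the region server)"
```

The last branch is the second case described above: the check passes for any 
live process with that pid, so the pid file alone cannot tell a region server 
from an unrelated process that happens to reuse the id.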
                
> Delete the region servers znode after a regions server crash
> ------------------------------------------------------------
>
>                 Key: HBASE-5844
>                 URL: https://issues.apache.org/jira/browse/HBASE-5844
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver, scripts
>    Affects Versions: 0.96.0
>            Reporter: nkeywal
>            Assignee: nkeywal
>             Fix For: 0.96.0
>
>         Attachments: 5844.v1.patch, 5844.v2.patch, 5844.v3.patch, 
> 5844.v3.patch, 5844.v4.patch
>
>
> Today, if the region server crashes, its znode is not deleted in ZooKeeper, 
> so the recovery process starts only after a timeout, usually 30s.
> By deleting the znode in the start script, we remove this delay and the 
> recovery starts immediately.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
