[ 
https://issues.apache.org/jira/browse/HBASE-8105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13603957#comment-13603957
 ] 

philo vivero commented on HBASE-8105:
-------------------------------------

It occurs to me that I might not have understood the first question: "is the RS 
still running?" The RS process is still running on the node, but the logs seem 
to indicate that it's not doing anything until the restart.

Perhaps a good re-wording of this would be "RegionServer Process Doesn't Die on 
Abnormal Loss of Network Connectivity to the Cluster"? But then maybe this is 
considered normal (though I'd expect something in the logs along the lines of 
"Ceasing normal activity, but keeping process alive [for whatever reason].").

It seems the advice to move the discussion to the mailing list would be apropos 
if the RegionServer process staying alive under this circumstance is normal.
                
> RegionServer Doesn't Rejoin Cluster after Netsplit
> --------------------------------------------------
>
>                 Key: HBASE-8105
>                 URL: https://issues.apache.org/jira/browse/HBASE-8105
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>    Affects Versions: 0.92.1
>         Environment: Linux Ubuntu 10.04 LTS
>            Reporter: philo vivero
>
> Running a 15-node HBase cluster. Testing various failure scenarios. Segregate 
> one RegionServer from the cluster by firewalling off every port except SSH 
> (because we need to be able to re-enable the node later).
> After the RS is automatically removed from the cluster, we re-enable all 
> ports again, but RS never rejoins the cluster.
> I suspect this may be the desired behaviour, but haven't found proof so far. 
> The code doesn't have any comment indicating that this behaviour is 
> intended:
> http://grepcode.com/file/repo1.maven.org/maven2/org.apache.hbase/hbase/0.92.2/org/apache/hadoop/hbase/regionserver/HRegionServer.java/
> See lines starting at 624, public void run(). It makes it through the first 
> try/catch block, but then loops inside the second try/catch block. Our 
> hypothesis is that it never gets out naturally.
> If we bounce the RegionServer process, then it rejoins the cluster.
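
To make the hypothesis above concrete, here is a minimal sketch of the control 
flow being described. It is not the HRegionServer code: the class 
RunLoopSketch, the method run, the reachable array and the stopRequested flag 
are all hypothetical names made up for illustration. It only assumes what the 
report states, namely a one-time first block followed by a loop that absorbs 
network errors and has no exit of its own.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicBoolean;

// Toy model of the hypothesised control flow. NOT actual HBase code;
// every name here is an assumption made for illustration.
public class RunLoopSketch {

    // Returns {times the startup block ran, loop iterations that hit a simulated outage}.
    static int[] run(boolean[] reachable, AtomicBoolean stopRequested) {
        int setupRuns = 0;
        int failedTicks = 0;

        // First block: one-time startup, executed once before the loop.
        setupRuns++;

        // Second block: loops until a stop is requested. The array length
        // bounds the loop only so that this demo terminates.
        for (int tick = 0; tick < reachable.length && !stopRequested.get(); tick++) {
            try {
                if (!reachable[tick]) {
                    throw new IOException("cluster unreachable (simulated)");
                }
                // Normal per-iteration work would happen here.
            } catch (IOException e) {
                // The error is absorbed: nothing sets stopRequested and
                // nothing sends control back to the startup block.
                failedTicks++;
            }
        }
        return new int[] {setupRuns, failedTicks};
    }

    public static void main(String[] args) {
        // Simulated netsplit: reachable, cut off for two ticks, reachable again.
        boolean[] reachable = {true, false, false, true, true};
        int[] result = run(reachable, new AtomicBoolean(false));
        System.out.println("setup runs: " + result[0] + ", failed ticks: " + result[1]);
    }
}
```

In this toy model the startup block runs exactly once however connectivity 
changes, which matches the observation that only bouncing the process gets 
the node back into the cluster. Whether the real run() behaves this way is 
exactly the open question of this issue.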

