[ 
https://issues.apache.org/jira/browse/HBASE-4168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Anirudh Todi updated HBASE-4168:
--------------------------------

    Attachment: HBASE-4168(2).patch

@Ted - the patch I submitted is from the open-source trunk which I checked out 
here - https://svn.apache.org/repos/asf/hbase/trunk/

I see the source of your confusion now. In trunk's CatalogTracker, line 469 is:

{noformat}
} else if (cause != null && cause.getMessage() != null
{noformat}

The internal branch had:

{noformat}
} else if (cause.getMessage() != null)
{noformat}

When I conducted Experiment-4 using the internal branch, cause turned out 
to be null, and I received a NullPointerException at that line.
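To illustrate the difference between the two conditions, here is a minimal standalone sketch. It assumes nothing about the rest of CatalogTracker: the class name CauseCheck, the method messageContains, and the marker string are hypothetical and only stand in for whatever the real condition goes on to test.

```java
// Sketch only: CauseCheck and messageContains are hypothetical names,
// not part of HBase's CatalogTracker.
final class CauseCheck {
    /**
     * Returns true only when the throwable has a cause, the cause has a
     * message, and that message contains the given marker.
     */
    static boolean messageContains(Throwable t, String marker) {
        Throwable cause = t.getCause();
        // Check the cause itself before dereferencing it. Without the first
        // test, cause.getMessage() throws NullPointerException when the
        // exception carries no cause.
        return cause != null
            && cause.getMessage() != null
            && cause.getMessage().contains(marker);
    }

    public static void main(String[] args) {
        java.io.IOException noCause = new java.io.IOException("wrapper");
        java.io.IOException withCause = new java.io.IOException(
            "wrapper", new java.net.ConnectException("Connection refused"));
        System.out.println(messageContains(noCause, "Connection refused"));   // false
        System.out.println(messageContains(withCause, "Connection refused")); // true
    }
}
```

With the trunk-style guard, an exception that has no cause simply fails the test instead of raising a NullPointerException.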

However, would it still be better to return false and retry connecting to the 
META instead of throwing an exception there?
I have uploaded a new patch in which I am handling the IOException unwrapped 
from RemoteException in a similar manner.
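The "return false and retry" idea can be sketched roughly as follows. This is not the code in the attached patch: MetaRetry, Probe, verify, and verifyWithRetries are hypothetical names, and Hadoop's RemoteException is deliberately not modeled so that the sketch compiles on its own. It only shows an IOException being turned into a false result that the caller can retry on.

```java
// Sketch only: all names here are hypothetical and do not mirror the
// HBase API or the attached patch.
final class MetaRetry {
    /** A single connection attempt that may fail with an IOException. */
    interface Probe {
        void connect() throws java.io.IOException;
    }

    /** Returns false on an IOException instead of propagating it. */
    static boolean verify(Probe probe) {
        try {
            probe.connect();
            return true;
        } catch (java.io.IOException e) {
            // Treat the failure as "location not verified" so that the
            // caller can retry rather than abort.
            return false;
        }
    }

    /** Retries verify() up to maxAttempts times; real code would also back off. */
    static boolean verifyWithRetries(Probe probe, int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            if (verify(probe)) {
                return true;
            }
        }
        return false;
    }
}
```

Whether swallowing the exception is preferable to throwing depends on how callers distinguish a transient failure from a permanent one, which is the open question raised above.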

> A client continues to try and connect to a powered down regionserver
> --------------------------------------------------------------------
>
>                 Key: HBASE-4168
>                 URL: https://issues.apache.org/jira/browse/HBASE-4168
>             Project: HBase
>          Issue Type: Bug
>            Reporter: Anirudh Todi
>            Assignee: Anirudh Todi
>            Priority: Minor
>         Attachments: HBASE-4168(2).patch, HBASE-4168-revised.patch, 
> HBASE-4168.patch, hbase-hadoop-master-msgstore232.snc4.facebook.com.log
>
>
> Experiment-1
> Started a dev cluster - META is on the same regionserver as my key-value. I 
> kill the regionserver process but do not power down the machine.
> The META is able to migrate to a new regionserver and the regions are also 
> able to reopen elsewhere.
> The client is able to talk to the META and find the new kv location and get 
> it.
> Experiment-2
> Started a dev cluster - META is on a different regionserver from my key-value. 
> I kill the regionserver process but do not power down the machine.
> The META remains where it is and the regions are also able to reopen 
> elsewhere.
> The client is able to talk to the META and find the new kv location and get 
> it.
> Experiment-3
> Started a dev cluster - META is on a different regionserver from my key-value. 
> I power down the machine hosting this regionserver.
> The META remains where it is and the regions are also able to reopen 
> elsewhere.
> The client is able to talk to the META and find the new kv location and get 
> it.
> Experiment-4 (This is the problematic one)
> Started a dev cluster - META is on the same regionserver as my key-value. I 
> power down the machine hosting this regionserver.
> The META is able to migrate to a new regionserver; however, it takes a 
> really long time (~30 minutes).
> The regions on that regionserver DO NOT reopen (I waited for 1 hour).
> The client is able to find the new location of the META; however, the META 
> keeps redirecting the client to the powered-down regionserver as the 
> location of the key-value it is trying to get. Thus the client's get is 
> unsuccessful.

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        
