[ 
https://issues.apache.org/jira/browse/HBASE-10271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13863395#comment-13863395
 ] 

Jean-Daniel Cryans commented on HBASE-10271:
--------------------------------------------

bq. Any update?

I'm waiting to see if there's interest in building the patch up further. Sergey 
seems ok with the process, although he seems to think we should further prove 
that it's necessary to have two methods to expire region servers. Lars H. had 
questions, which I answered, but I didn't see any further interest.

IMO we can safely revert HBASE-9593 from all the branches and spend more time 
on a proper fix.

> [regression] Cannot use the wildcard address since HBASE-9593
> -------------------------------------------------------------
>
>                 Key: HBASE-10271
>                 URL: https://issues.apache.org/jira/browse/HBASE-10271
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.98.0, 0.94.13, 0.96.1
>            Reporter: Jean-Daniel Cryans
>            Priority: Critical
>             Fix For: 0.98.0, 0.94.16, 0.96.2, 0.99.0
>
>         Attachments: HBASE-10271.patch
>
>
> HBASE-9593 moved the creation of the ephemeral znode earlier in the region 
> server startup process such that we don't have access to the ServerName from 
> the Master's POV. HRS.getMyEphemeralNodePath() calls HRS.getServerName() 
> which at that point will return this.isa.getHostName(). If you set 
> hbase.regionserver.ipc.address to 0.0.0.0, you will create a znode with that 
> address.
> What happens next is that the RS will report for duty correctly but the 
> master will do this:
> {noformat}
> 2014-01-02 11:45:49,498 INFO  [master:172.21.3.117:60000] 
> master.ServerManager: Registering server=0:0:0:0:0:0:0:0%0,60020,1388691892014
> 2014-01-02 11:45:49,498 INFO  [master:172.21.3.117:60000] master.HMaster: 
> Registered server found up in zk but who has not yet reported in: 
> 0:0:0:0:0:0:0:0%0,60020,1388691892014
> {noformat}
> The cluster is then unusable.
> I think a better solution is to track the heartbeats for the region servers 
> and expire those that haven't checked in for some time. The 0.89-fb branch 
> has this concept, and they also use it to detect rack failures: 
> https://github.com/apache/hbase/blob/0.89-fb/src/main/java/org/apache/hadoop/hbase/master/ServerManager.java#L1224.
> In this JIRA's scope I would just add the heartbeat tracking and add a unit 
> test for the wildcard address.
> What do you think [~rajesh23]?
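
To make the heartbeat idea in the description concrete, here is a rough sketch 
of what tracking and expiring region servers by last check-in could look like. 
It is not the attached patch and not the 0.89-fb code: the HeartbeatTracker 
class, its method names, and the timeout parameter are all hypothetical, and 
the clock is passed in explicitly only to keep the example easy to test.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical helper, for illustration only. It is not part of HBase's
 * ServerManager; names and behavior here are assumptions.
 */
final class HeartbeatTracker {
    // Maximum allowed silence before a server is considered dead.
    private final long timeoutMillis;
    // Server name (as the master sees it) -> timestamp of the last heartbeat.
    private final Map<String, Long> lastHeartbeat = new ConcurrentHashMap<>();

    HeartbeatTracker(long timeoutMillis) {
        if (timeoutMillis <= 0) {
            throw new IllegalArgumentException("timeoutMillis must be positive");
        }
        this.timeoutMillis = timeoutMillis;
    }

    /** Records a check-in; an older timestamp never overwrites a newer one. */
    void recordHeartbeat(String serverName, long nowMillis) {
        lastHeartbeat.merge(serverName, nowMillis, Long::max);
    }

    /** Removes and returns (sorted) the servers silent for longer than the timeout. */
    List<String> expireStale(long nowMillis) {
        List<String> expired = new ArrayList<>();
        for (Map.Entry<String, Long> e : lastHeartbeat.entrySet()) {
            boolean stale = nowMillis - e.getValue() > timeoutMillis;
            // Conditional remove: skip the server if it checked in concurrently.
            if (stale && lastHeartbeat.remove(e.getKey(), e.getValue())) {
                expired.add(e.getKey());
            }
        }
        Collections.sort(expired);
        return expired;
    }

    boolean isTracked(String serverName) {
        return lastHeartbeat.containsKey(serverName);
    }
}
```

With something along these lines, the master would key liveness on the name a 
server reports for duty with rather than on the znode name, so a znode created 
under the wildcard address would not by itself register a server.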



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
