[ 
https://issues.apache.org/jira/browse/HBASE-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016198#comment-13016198
 ] 

Jean-Daniel Cryans commented on HBASE-3478:
-------------------------------------------

This situation happened to someone on the mailing list.

> HBase fails to recover from failed DNS resolution of stale meta connection 
> info
> -------------------------------------------------------------------------------
>
>                 Key: HBASE-3478
>                 URL: https://issues.apache.org/jira/browse/HBASE-3478
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.1
>            Reporter: James Kennedy
>             Fix For: 0.92.0
>
>
> This looks like a variant of HBASE-3445:
> One of our developers ran a seed program with configuration A to generate 
> some test data on his local machine. He then moved that data into a 
> development environment on the same machine with a different hbase 
> configuration B.
> On startup the HMaster waits for the new regionserver to register itself:
> [25/01/11 15:37:25] 162161 [  HRegionServer] INFO  
> ase.regionserver.HRegionServer  - Telling master at 10.0.1.4:7801 that we are 
> up
> [25/01/11 15:37:25] 162165 [ice-EventThread] DEBUG 
> .hadoop.hbase.zookeeper.ZKUtil  - master:7801-0x12dbf879abe0000 Retrieved 13 
> byte(s) of data from znode /hbase/rs/10.0.1.4,7802,1295998613814 and set 
> watcher; 10.0.1.4:7802
> Then ROOT region comes online at the right place: 10.0.1.4,7802
> [25/01/11 15:37:31] 168369 [yTasks:70236052] INFO  
> ase.catalog.RootLocationEditor  - Setting ROOT region location in ZooKeeper 
> as 10.0.1.4:7802
> 3:57 [25/01/11 15:37:31] 168408 [10.0.1.4:7801-0] DEBUG 
> er.handler.OpenedRegionHandler  - Opened region -ROOT-,,0.70236052 on 
> 10.0.1.4,7802,1295998613814
> But then HMaster chokes on the stale META region location.
> [25/01/11 15:37:31] 168448 [        HMaster] ERROR 
> he.hadoop.hbase.HServerAddress  - Could not resolve the DNS name of 
> warren:60020
> [25/01/11 15:37:31] 168448 [        HMaster] FATAL 
> he.hadoop.hbase.master.HMaster  - Unhandled exception. Starting shutdown.
> java.lang.IllegalArgumentException: Could not resolve the DNS name of 
> warren:60020
>    at 
> org.apache.hadoop.hbase.HServerAddress.checkBindAddressCanBeResolved(HServerAddress.java:105)
>    at org.apache.hadoop.hbase.HServerAddress.<init>(HServerAddress.java:66)
>    at 
> org.apache.hadoop.hbase.catalog.MetaReader.readLocation(MetaReader.java:344)
>    at 
> org.apache.hadoop.hbase.catalog.MetaReader.readMetaLocation(MetaReader.java:281)
>    at 
> org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:280)
>    at 
> org.apache.hadoop.hbase.catalog.CatalogTracker.verifyMetaRegionLocation(CatalogTracker.java:482)
>    at 
> org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:435)
>    at 
> org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:382)
>    at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:277)
>    at java.lang.Thread.run(Thread.java:680)
> First of all, we do not yet understand why in configuration A the RegionInfo 
> resolved to "warren:60020" whereas in configuration B we get "10.0.1.4:7802". 
> The port numbers make sense but not the "warren" hostname. It's probably 
> specific to Warren's Mac environment somehow, because no other developer gets 
> this problem when doing the same thing. "warren" isn't in his hosts file, so 
> that remains a mystery.
> But irrespective of that, since the ports differ we expect the stale meta 
> connection data to cause a connection failure anyway, perhaps in the form of 
> a SocketTimeoutException as in HBASE-3445.
> But shouldn't the HMaster handle that by catching the exception and letting 
> verifyMetaRegionLocation() fail so that meta regions get reassigned to the 
> new region server?
> Probably the safeguards in CatalogTracker.getCachedConnection() should move 
> up to getMetaServerConnection() so as to encompass 
> MetaReader.readMetaLocation() also. Essentially, if getMetaServerConnection() 
> encounters ANY exception while connecting to the meta RegionServer, it should 
> probably just return null to force meta region reassignment.
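
The behaviour proposed at the end of the description can be illustrated with a small standalone sketch. This is not the actual HBase code or the committed fix: the class MetaConnectionSketch, the helper connectionOrNull(), and the simplified verifyMetaRegionLocation() are hypothetical stand-ins, and a Callable takes the place of the real "read the meta location and open a connection" step. It only shows the idea of treating any failure in that step as "meta is not available" instead of letting the exception reach the master's main thread.

```java
import java.util.concurrent.Callable;

public class MetaConnectionSketch {

    /**
     * Hypothetical stand-in for getMetaServerConnection(): runs the
     * "read location and connect" step and returns null on ANY exception,
     * including a failed DNS lookup of a stale hostname.
     */
    public static Object connectionOrNull(Callable<Object> readLocationAndConnect) {
        try {
            return readLocationAndConnect.call();
        } catch (Exception e) {
            // Assumption: a stale or unresolvable location simply means
            // that the meta region is not reachable where we last saw it.
            return null;
        }
    }

    /**
     * Simplified stand-in for verifyMetaRegionLocation(): the location is
     * considered valid only if a connection could be obtained.
     */
    public static boolean verifyMetaRegionLocation(Callable<Object> readLocationAndConnect) {
        return connectionOrNull(readLocationAndConnect) != null;
    }

    public static void main(String[] args) {
        // Simulates the stale entry from the log above.
        boolean stale = verifyMetaRegionLocation(() -> {
            throw new IllegalArgumentException(
                "Could not resolve the DNS name of warren:60020");
        });
        // Simulates a location that can be read and connected to.
        boolean fresh = verifyMetaRegionLocation(() -> "connection to 10.0.1.4:7802");

        System.out.println("stale location verified: " + stale);
        System.out.println("fresh location verified: " + fresh);
    }
}
```

With this shape, a false result from the verification step would let the caller fall through to reassigning the meta region rather than shutting down.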

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
