[ https://issues.apache.org/jira/browse/HBASE-3478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016198#comment-13016198 ]
Jean-Daniel Cryans commented on HBASE-3478: ------------------------------------------- This situation happened to someone on the mailing list. > HBase fails to recover from failed DNS resolution of stale meta connection > info > ------------------------------------------------------------------------------- > > Key: HBASE-3478 > URL: https://issues.apache.org/jira/browse/HBASE-3478 > Project: HBase > Issue Type: Bug > Components: master > Affects Versions: 0.90.1 > Reporter: James Kennedy > Fix For: 0.92.0 > > > This looks like a variant of HBASE-3445: > One of our developers ran a seed program with configuration A to generate > some test data on his local machine. He then moved that data into a > development environment on the same machine with a different hbase > configuration B. > On startup the HMaster waits for new regionserver to register itself: > [25/01/11 15:37:25] 162161 [ HRegionServer] INFO > ase.regionserver.HRegionServer - Telling master at 10.0.1.4:7801 that we are > up > [25/01/11 15:37:25] 162165 [ice-EventThread] DEBUG > .hadoop.hbase.zookeeper.ZKUtil - master:7801-0x12dbf879abe0000 Retrieved 13 > byte(s) of data from znode /hbase/rs/10.0.1.4,7802,1295998613814 and set > watcher; 10.0.1.4:7802 > Then ROOT region comes online at the right place: 10.0.1.4,7802 > [25/01/11 15:37:31] 168369 [yTasks:70236052] INFO > ase.catalog.RootLocationEditor - Setting ROOT region location in ZooKeeper > as 10.0.1.4:7802 > 3:57 [25/01/11 15:37:31] 168408 [10.0.1.4:7801-0] DEBUG > er.handler.OpenedRegionHandler - Opened region -ROOT-,,0.70236052 on > 10.0.1.4,7802,1295998613814 > But then HMaster chokes on the stale META region location. > [25/01/11 15:37:31] 168448 [ HMaster] ERROR > he.hadoop.hbase.HServerAddress - Could not resolve the DNS name of > warren:60020 > [25/01/11 15:37:31] 168448 [ HMaster] FATAL > he.hadoop.hbase.master.HMaster - Unhandled exception. Starting shutdown. > java.lang.IllegalArgumentException: Could not resolve the DNS name of > warren:60020 > at > org.apache.hadoop.hbase.HServerAddress.checkBindAddressCanBeResolved(HServerAddress.java:105) > at org.apache.hadoop.hbase.HServerAddress.<init>(HServerAddress.java:66) > at > org.apache.hadoop.hbase.catalog.MetaReader.readLocation(MetaReader.java:344) > at > org.apache.hadoop.hbase.catalog.MetaReader.readMetaLocation(MetaReader.java:281) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:280) > at > org.apache.hadoop.hbase.catalog.CatalogTracker.verifyMetaRegionLocation(CatalogTracker.java:482) > at > org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:435) > at > org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:382) > at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:277) > at java.lang.Thread.run(Thread.java:680) > First of all, we do not yet understand why in configuration A the RegionInfo > resolved to "warren:60020" whereas in configuration B we get "10.0.1.4:7802". > The port numbers make sense but not the "warren" hostname. It's probably > specific to Warren's mac environment somehow because no other developer gets > this problem when doing the same thing. "warren" isn't in his hosts file so > that remains a mystery. > But irrespective of that, since the ports differ we expect the stale meta > connection data to cause connection failure anyway. Perhaps in the form of a > SocketTimeoutException as in hbase-3445. > But shouldn't the HMaster handle that by catching the exception and letting > verifyMetaRegionLocation() fail so that meta regions get reassigned to the > new region server? > Probably the safeguards in CatalogTracker.getCachedConnection() should move > up to getMetaServerConnection() so as to encompass > MetaReader.readMetaLocation() also. Essentially if getMetaServerConnection() > encounters ANY exception connection to meta RegionServer it should probably > just return null to force meta region reassignment. -- This message is automatically generated by JIRA. For more information on JIRA, see: http://www.atlassian.com/software/jira