[ 
https://issues.apache.org/jira/browse/HBASE-3445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-3445:
-------------------------

    Attachment: 3445-v2.txt

Here is a test that manufactures the condition James sees.  His patch fixes it
(I just added DEBUG logging to his patch).  I'm going to commit, though I'm not
going to include my test because of HBASE-3456 "Fix hardcoding of 20 second
socket timeout down in HBaseClient".  I don't want to add a gratuitous 20 second
wait to our test suite (not that anyone would notice the extra 20 seconds on
top of an hour-plus suite).
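
For readers following along, here is a rough illustration of the idea. It is 
not the code in 3445-v2.txt and does not use any HBase API.  It is a minimal 
sketch in plain java.net, assuming that a failed or timed-out connect to a 
recorded server address should be reported as "location not valid" instead of 
being allowed to propagate as a fatal error, and that the connect timeout is 
passed in by the caller, not hardcoded.  The class and method names 
(StaleLocationCheck, isLocationLive) are hypothetical.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class StaleLocationCheck {

    /**
     * Hypothetical helper: returns true only if a TCP connect to the
     * recorded address succeeds within timeoutMs. A timeout or any other
     * connect failure is reported as "not live" so the caller can wait
     * for a new location instead of aborting.
     */
    public static boolean isLocationLive(InetSocketAddress address, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(address, timeoutMs);
            return true;
        } catch (SocketTimeoutException e) {
            // Nothing answered in time, e.g. the recorded host no longer exists.
            return false;
        } catch (IOException e) {
            // Connection refused, unreachable network, and similar failures.
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        ServerSocket server = new ServerSocket(0, 50, loopback);
        InetSocketAddress recorded =
            new InetSocketAddress(loopback, server.getLocalPort());

        // While the server is listening, the recorded location verifies.
        System.out.println("listening: " + isLocationLive(recorded, 1000));

        // Once the server is gone, the same recorded address is stale.
        server.close();
        System.out.println("after close: " + isLocationLive(recorded, 1000));
    }
}
```

With a short timeout passed in like this, a test of the stale-location case 
would not need to sit through the full 20 second wait.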

> Master crashes on data that was moved from different host
> ---------------------------------------------------------
>
>                 Key: HBASE-3445
>                 URL: https://issues.apache.org/jira/browse/HBASE-3445
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.90.0
>            Reporter: James Kennedy
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.90.1
>
>         Attachments: 3445-refactor.txt, 3445-v2.txt, 3445_0.90.0.patch
>
>
> While testing an upgrade to 0.90.0 RC3 I noticed that if I seeded our test
> data on one machine and transferred it to another machine, the HMaster on the
> new machine dies on startup.
> Based on the following stack trace, it looks as though it is attempting to
> find the .meta region at the IP address of the original machine.  Instead
> of waiting around for RegionServers to register with new location data,
> HMaster throws its hands up with a FATAL exception.
> Note that deleting the zookeeper dir makes no difference.
> Also note that so far I have only reproduced this in my own environment using
> the hbase-trx extension of HBase and an ApplicationStarter that starts the
> Master and RegionServer together in the same JVM.  While the issue seems
> likely isolated from those factors, it is far from a vanilla HBase environment.
> I will spend some time trying to reproduce the issue in a proper hbase test.
> But perhaps someone can beat me to it?  How do I simulate the IP switch? It may
> require a data.tar upload.
> [14/01/11 10:45:20] 6396   [     Thread-298] ERROR server.quorum.QuorumPeerConfig  - Invalid configuration, only one server specified (ignoring)
> [14/01/11 10:45:21] 7178   [           main] INFO  ion.service.HBaseRegionService  - troove> region port:       60010
> [14/01/11 10:45:21] 7180   [           main] INFO  ion.service.HBaseRegionService  - troove> region interface:  org.apache.hadoop.hbase.ipc.IndexedRegionInterface
> [14/01/11 10:45:21] 7180   [           main] INFO  ion.service.HBaseRegionService  - troove> root dir: hdfs://localhost:8701/hbase
> [14/01/11 10:45:21] 7180   [           main] INFO  ion.service.HBaseRegionService  - troove> Initializing region server.
> [14/01/11 10:45:21] 7631   [           main] INFO  ion.service.HBaseRegionService  - troove> Starting region server thread.
> [14/01/11 10:46:54] 100764 [        HMaster] FATAL he.hadoop.hbase.master.HMaster  - Unhandled exception. Starting shutdown.
> java.net.SocketTimeoutException: 20000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=192.168.1.102/192.168.1.102:60020]
>       at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:213)
>       at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404)
>       at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:311)
>       at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:865)
>       at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:732)
>       at org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:258)
>       at $Proxy14.getProtocolVersion(Unknown Source)
>       at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:419)
>       at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:393)
>       at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:444)
>       at org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:349)
>       at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:954)
>       at org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:384)
>       at org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:283)
>       at org.apache.hadoop.hbase.catalog.CatalogTracker.verifyMetaRegionLocation(CatalogTracker.java:478)
>       at org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:435)
>       at org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:382)
>       at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:277)

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.