The first exception means that the master could not connect to the ZooKeeper server at 192.168.1.100:36963.
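One way to check each quorum member from the master host is ZooKeeper's `ruok` four-letter command, which a healthy server answers with `imok`. The following is a minimal Python sketch, not part of HBase or ZooKeeper: the function names are illustrative, and it assumes every entry in the connect string carries an explicit port. On newer ZooKeeper releases the four-letter commands may need to be whitelisted before the server answers them.

```python
import socket


def parse_connect_string(connect_string):
    """Split a ZooKeeper connect string into (host, port) pairs."""
    servers = []
    for entry in connect_string.split(","):
        entry = entry.strip()  # the logged string has a stray space after one comma
        if not entry:
            continue
        host, _, port = entry.rpartition(":")  # assumes "host:port" entries
        servers.append((host, int(port)))
    return servers


def ruok(host, port, timeout=2.0):
    """Send the 'ruok' command; True only if the server answers 'imok'."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"ruok")
            reply = b""
            while len(reply) < 4:
                chunk = sock.recv(4 - len(reply))
                if not chunk:  # peer closed the connection without a full reply
                    break
                reply += chunk
            return reply == b"imok"
    except OSError:  # refused, unreachable, or timed out
        return False


def check_quorum(connect_string, timeout=2.0):
    """Map 'host:port' to True/False for every server in the connect string."""
    return {
        "%s:%d" % (host, port): ruok(host, port, timeout)
        for host, port in parse_connect_string(connect_string)
    }
```

Calling `check_quorum(...)` with the `connectString` value from the master log, on the master host, would show which of the three servers answer. A server that accepts the TCP connection but never replies shows up as `False`, which a plain port check would miss.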
The second is "ok": the cluster still starts and everything is fine, but there's something wrong in how we handled the existing state node in that version. I fixed it for 0.20.3.

J-D

On Wed, Jan 13, 2010 at 5:24 PM, steven zhuang <[email protected]> wrote:
> thanks, you guys,
>
> yes, I removed the root after the HBase instance was shut
> down. Now the topology of the cluster looks like this:
>
> hmaster : 192.168.0.178
> regionserver: 192.168.1.98/100/104
> zookeepers: 192.168.1.98/100/104
>
> The instance can now boot with a few exceptions.
> There was an immediate IOException from ZooKeeper after
> the start-hbase.sh command; it looks like this:
>
> KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss
> for /hbase/master
> at org.apache......
>
> I checked the master log and the master .out log. It
> seems that the Master has some problem closing the session after
> initializing the ZooKeeper clients: there was always an IOException about
> some "Read error rc" after every client session ended.
> And that's just the start; after that a
> KeeperException$ConnectionLossException was thrown.
>
> After all these exceptions the cluster is started.
> Following is the master log: > > 2010-01-13 10:38:25,650 INFO org.apache.zookeeper.ZooKeeper: Initiating > client connection, connectString=192.168.1.104:36963,192.168.1.100:36963, > 192.168.1.98:36963 sessionTimeout=60000 watcher=Thread[Thread-1,5,main] > 2010-01-13 10:38:25,651 INFO org.apache.zookeeper.ClientCnxn: > zookeeper.disableAutoWatchReset is false > 2010-01-13 10:38:25,657 INFO org.apache.zookeeper.ClientCnxn: Attempting > connection to server /192.168.1.100:36963 > 2010-01-13 10:38:25,658 INFO org.apache.zookeeper.ClientCnxn: Priming > connection to java.nio.channels.SocketChannel[connected local=/ > 192.168.0.178:56731 remote=/192.168.1.100:36963] > 2010-01-13 10:38:25,667 INFO org.apache.zookeeper.ClientCnxn: Server > connection successful > 2010-01-13 10:38:25,669 WARN org.apache.zookeeper.ClientCnxn: Exception > closing session 0x0 to sun.nio.ch.selectionkeyi...@ba5bdb > java.io.IOException: Read error rc = -1 java.nio.DirectByteBuffer[pos=0 > lim=4 cap=4] > at > org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:701) > at > org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945) > 2010-01-13 10:38:25,671 WARN org.apache.zookeeper.ClientCnxn: Ignoring > exception during shutdown input > java.net.SocketException: Transport endpoint is not connected > at sun.nio.ch.SocketChannelImpl.shutdown(Native Method) > at > sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:640) > at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360) > at > org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:999) > at > org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970) > 2010-01-13 10:38:25,671 WARN org.apache.zookeeper.ClientCnxn: Ignoring > exception during shutdown output > java.net.SocketException: Transport endpoint is not connected > at sun.nio.ch.SocketChannelImpl.shutdown(Native Method) > at > sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:651) > at 
sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368) > at > org.apache.zookeeper.ClientCnxn$SendThread.cleanup(ClientCnxn.java:1004) > at > org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:970) > 2010-01-13 10:38:25,673 INFO org.apache.hadoop.hbase.master.RegionManager: > -ROOT- region unset (but not set to be reassigned) > 2010-01-13 10:38:25,674 INFO org.apache.hadoop.hbase.master.RegionManager: > ROOT inserted into regionsInTransition > 2010-01-13 10:38:26,243 INFO org.apache.zookeeper.ClientCnxn: Attempting > connection to server /192.168.1.104:36963 > 2010-01-13 10:38:26,244 INFO org.apache.zookeeper.ClientCnxn: Priming > connection to java.nio.channels.SocketChannel[connected local=/ > 192.168.0.178:45372 remote=/192.168.1.104:36963] > 2010-01-13 10:38:26,244 INFO org.apache.zookeeper.ClientCnxn: Server > connection successful > 2010-01-13 10:38:26,248 WARN org.apache.zookeeper.ClientCnxn: Exception > closing session 0x0 to sun.nio.ch.selectionkeyi...@1e7c5cb > java.io.IOException: Read error rc = -1 java.nio.DirectByteBuffer[pos=0 > lim=4 cap=4] > at > org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:701) > at > org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945) > 2010-01-13 10:38:26,251 WARN org.apache.zookeeper.ClientCnxn: Ignoring > exception during shutdown input > java.net.SocketException: Transport endpoint is not connected > ...... > 2010-01-13 10:38:26,251 WARN org.apache.zookeeper.ClientCnxn: Ignoring > exception during shutdown output > java.net.SocketException: Transport endpoint is not connected > ...... 
> 2010-01-13 10:38:26,353 WARN > org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Failed to create /hbase > -- check quorum servers, currently=192.168.1.104:36963,192.168.1.100:36963, > 192.168.1.98:36963 > org.apache.zookeeper.KeeperException$ConnectionLossException: > KeeperErrorCode = ConnectionLoss for /hbase > at > org.apache.zookeeper.KeeperException.create(KeeperException.java:90) > at > org.apache.zookeeper.KeeperException.create(KeeperException.java:42) > at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:608) > at > org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.ensureExists(ZooKeeperWrapper.java:343) > at > org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.ensureParentExists(ZooKeeperWrapper.java:366) > at > org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.writeMasterAddress(ZooKeeperWrapper.java:454) > at > org.apache.hadoop.hbase.master.HMaster.writeAddressToZooKeeper(HMaster.java:272) > at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:254) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) > at java.lang.reflect.Constructor.newInstance(Constructor.java:513) > at org.apache.hadoop.hbase.master.HMaster.doMain(HMaster.java:1218) > at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:1259) > 2010-01-13 10:38:26,383 INFO org.apache.zookeeper.ClientCnxn: Attempting > connection to server /192.168.1.98:36963 > 2010-01-13 10:38:26,384 INFO org.apache.zookeeper.ClientCnxn: Priming > connection to java.nio.channels.SocketChannel[connected local=/ > 192.168.0.178:53192 remote=/192.168.1.98:36963] > 2010-01-13 10:38:26,384 INFO org.apache.zookeeper.ClientCnxn: Server > connection successful > 2010-01-13 10:38:26,419 DEBUG org.apache.hadoop.hbase.master.HMaster: Got > event None with path null > 
2010-01-13 10:38:26,423 DEBUG > org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/master > got 192.168.0.178:60000 > 2010-01-13 10:38:26,423 DEBUG > org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Waiting for master > address ZNode to be deleted and watching the cluster state node > 2010-01-13 10:39:05,032 DEBUG > org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Got event NodeDeleted > with path /hbase/master > 2010-01-13 10:39:05,032 DEBUG > org.apache.hadoop.hbase.master.ZKMasterAddressWatcher: Master address ZNode > deleted, notifying waiting masters > 2010-01-13 10:39:05,092 DEBUG > org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Wrote master address > 192.168.0.178:60000 to ZooKeeper > 2010-01-13 10:39:05,096 WARN > org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Failed to set state node > in ZooKeeper > org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = > NodeExists for /hbase/shutdown > at > org.apache.zookeeper.KeeperException.create(KeeperException.java:110) > at > org.apache.zookeeper.KeeperException.create(KeeperException.java:42) > at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:608) > at > org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper.setClusterState(ZooKeeperWrapper.java:279) > at > org.apache.hadoop.hbase.master.HMaster.writeAddressToZooKeeper(HMaster.java:273) > at org.apache.hadoop.hbase.master.HMaster.<init>(HMaster.java:254) > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27) > at java.lang.reflect.Constructor.newInstance(Constructor.java:513) > at org.apache.hadoop.hbase.master.HMaster.doMain(HMaster.java:1218) > at org.apache.hadoop.hbase.master.HMaster.main(HMaster.java:1259) > 2010-01-13 10:39:05,097 DEBUG > 
org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/master got 192.168.0.178:60000
> 2010-01-13 10:39:05,097 INFO org.apache.hadoop.hbase.master.HMaster: HMaster initialized on 192.168.0.178:60000
> 2010-01-13 10:39:05,097 DEBUG org.apache.hadoop.hbase.master.HMaster: Checking cluster state...
> 2010-01-13 10:39:05,098 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/root-region-server got 192.168.1.104:60020
> 2010-01-13 10:39:05,102 DEBUG org.apache.hadoop.hbase.master.HMaster: This is a fresh start, proceeding with normal startup
> 2010-01-13 10:39:05,104 DEBUG org.apache.hadoop.hbase.master.HMaster: No log files to split, proceeding...
> 2010-01-13 10:39:05,106 INFO org.apache.hadoop.metrics.jvm.JvmMetrics: Initializing JVM Metrics with processName=Master, sessionId=HMaster
> 2010-01-13 10:39:05,106 INFO org.apache.hadoop.hbase.master.metrics.MasterMetrics: Initialized
>
>
> On Thu, Jan 14, 2010 at 3:57 AM, Andrew Purtell <[email protected]> wrote:
>
>> Yep, I did that same thing once by accident.
>>
>>
>> ----- Original Message ----
>> > From: Jean-Daniel Cryans <[email protected]>
>> > To: [email protected]
>> > Sent: Wed, January 13, 2010 9:49:06 AM
>> > Subject: Re: cannot build a fully distributed mode hbase instance.
>> >
>> > Don't feel bad, I think we all messed up our first HBase setup.
>> >
>> > Did you delete /hbase while HBase was running? If so, first shut it
>> > down/kill -9, clear out the folder and then the Master will take care
>> > of recreating the ROOT and META on restart.
>> >
>> > J-D
>> >
>> > On Tue, Jan 12, 2010 at 6:03 PM, steven zhuang wrote:
>> > > hi, Jean.
>> > > Thanks a lot.
>> > > I am really an idiot when it comes to HBase.
>> > > I removed the /hbase root directory from HDFS once, hoping it
>> > > would rebuild the whole META-regions thing. Then I found the exception is
>> > > still there every time I use the shell command.
>> > > Before all that I am gonna ask, I have one question :"Is it >> OK if >> > > we run hbase shell command on any slave/region server? >> > > I have checked the log, seems the master will request the >> wrong >> > > regionserver for a region it's not servicing: >> > > >> > > 2010-01-12 20:25:11,996 INFO org.apache.hadoop.ipc.HBaseServer: IPC >> Server >> > > handler 3 on 60020, call getRegionInfo([...@dc9766) from >> 192.168.1.98:55351: >> > > error: org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0 >> > > org.apache.hadoop.hbase.NotServingRegionException: -ROOT-,,0 >> > > at >> > > >> > >> org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2309) >> > > >> > > I am still analyzing the master log, for the most recent >> start, >> > > there seems no exception records in the log. >> > > >> > > >> > > >> > > >> > > On Wed, Jan 13, 2010 at 9:39 AM, Jean-Daniel Cryans >> > wrote: >> > > >> > >> It seems it found the ROOT region but META wasn't assigned. Either you >> > >> didn't wait enough after starting hbase or you should look at the >> > >> master's log for the reason why that region wasn't assigned. >> > >> >> > >> J-D >> > >> >> > >> On Tue, Jan 12, 2010 at 5:36 PM, steven zhuang >> > >> wrote: >> > >> > That's done, thanks, Jean. >> > >> > >> > >> > But now there is another problem. Now I can start the >> cluster >> > >> > without any exception(good!), but at any node, when I run >> list/create, I >> > >> > always get this exception, although afterwards I checked the table >> is >> > >> > created. 
>> > >> > >> > >> > 10/01/12 20:25:16 DEBUG client.HConnectionManager$TableServers: >> Found >> > >> ROOT >> > >> > at 192.168.1.104:60020 >> > >> > 10/01/12 20:25:16 DEBUG client.HConnectionManager$TableServers: >> > >> > locateRegionInMeta attempt 0 of 5 failed; retrying after sleep of >> 2000 >> > >> > org.apache.hadoop.hbase.client.NoServerForRegionException: No server >> > >> address >> > >> > listed in -ROOT- for region .META.,,1 >> > >> > at >> > >> > >> > >> >> > >> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:668) >> > >> > at >> > >> > >> > >> >> > >> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:590) >> > >> > at >> > >> > >> > >> >> > >> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.relocateRegion(HConnectionManager.java:563) >> > >> > at >> > >> > >> > >> >> > >> org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getRegionLocation(HConnectionManager.java:407) >> > >> > >> > >> > >> > >> > >> > >> > On Wed, Jan 13, 2010 at 8:57 AM, Jean-Daniel Cryans >> > >> >wrote: >> > >> > >> > >> >> Just make sure your OS doesn't resolve itself as 127.0.0.1, usual >> > >> >> suspect if you are using ubuntu is to look at /etc/hosts and make >> sure >> > >> >> your hostname resolves to your IP. >> > >> >> >> > >> >> J-D >> > >> >> >> > >> >> On Tue, Jan 12, 2010 at 4:52 PM, steven zhuang >> > >> > >> > >> >> wrote: >> > >> >> > thanks, Jean, >> > >> >> > I figured out that, in the netstat's output I >> can see >> > >> >> > 127.0.0.1:60000, I don't know if this means it only listen on >> > >> connection >> > >> >> > request from the same machine. >> > >> >> > About the hbase.master configuration, is there >> > >> anything >> > >> >> I >> > >> >> > can use to replace it? 
>> > >> >> >
>> > >> >> >
>> > >> >> > On Wed, Jan 13, 2010 at 1:36 AM, Jean-Daniel Cryans <[email protected]> wrote:
>> > >> >> >
>> > >> >> >> > 10/01/11 21:16:46 DEBUG zookeeper.ZooKeeperWrapper: Read ZNode /hbase/master got 127.0.1.1:60000
>> > >> >> >>
>> > >> >> >> This means that your master registered itself in ZooKeeper as
>> > >> >> >> 127.0.1.1; you seem to have a network configuration problem.
>> > >> >> >>
>> > >> >> >> Also, the hbase.master configuration is deprecated and unused.
>> > >> >> >>
>> > >> >> >> J-D
>> > >> >> >>
>> > >> >> >> On Tue, Jan 12, 2010 at 6:16 AM, steven zhuang <[email protected]> wrote:
>> > >> >> >> > hello, list,
>> > >> >> >> >
>> > >> >> >> > I am now setting up an HBase cluster using HBase version
>> > >> >> >> > 0.20.2, but I have met some problems which I googled a lot and got no
>> > >> >> >> > answer.
>> > >> >> >> > Please help me.
>> > >> >> >> >
>> > >> >> >> > I modified hbase-site.xml and copied the whole directory to
>> > >> >> >> > another machine.
>> > >> >> >> > Using one as the master, after I started the hbase server, I
>> > >> >> >> > CAN see HMaster / HQuorumPeer / HRegionServer running on the master
>> > >> >> >> > and HQuorumPeer / HRegionServer running on the slave node.
>> > >> >> >> > Here is what's weird:
>> > >> >> >> > I can enter the hbase shell on the master node, but on the other
>> > >> >> >> > region server I cannot execute any command; a "list" command would cause a
>> > >> >> >> > list of exceptions.
>> > >> >> >> > >> > >> >> >> > 10/01/11 21:16:46 DEBUG >> client.HConnectionManager$ClientZKWatcher: >> > >> Got >> > >> >> >> > ZooKeeper event, state: SyncConnected, type: None, path: null >> > >> >> >> > 10/01/11 21:16:46 DEBUG zookeeper.ZooKeeperWrapper: Read ZNode >> > >> >> >> /hbase/master >> > >> >> >> > got 127.0.1.1:60000 >> > >> >> >> > 10/01/11 21:16:46 INFO client.HConnectionManager$TableServers: >> > >> >> getMaster >> > >> >> >> > attempt 0 of 5 failed; retrying after sleep of 2000 >> > >> >> >> > java.net.ConnectException: Connection refused >> > >> >> >> > at sun.nio.ch.SocketChannelImpl.checkConnect(Native >> Method) >> > >> >> >> > at >> > >> >> >> > >> > >> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574) >> > >> >> >> > at >> > >> >> >> > >> > >> >> >> >> > >> >> >> > >> >> > >> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206) >> > >> >> >> > at >> org.apache.hadoop.net.NetUtils.connect(NetUtils.java:404) >> > >> >> >> > >> > >> >> >> > I can create table in the master node's Hbase >> shell, but >> > >> >> there >> > >> >> >> > sometime is some exception like: >> > >> >> >> > 10/01/12 06:08:15 DEBUG >> client.HConnectionManager$TableServers: >> > >> >> >> > locateRegionInMeta attempt 2 of 5 failed; retrying after sleep >> of >> > >> 2000 >> > >> >> >> > org.apache.hadoop.hbase.client.NoServerForRegionException: No >> > >> server >> > >> >> >> address >> > >> >> >> > listed in .META. 
for region t3,,1263305290760
>> > >> >> >> > at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:668)
>> > >> >> >> > at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:594)
>> > >> >> >> > at org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:557)
>> > >> >> >> >
>> > >> >> >> > But after this I can use list to see that the table HAS BEEN BUILT
>> > >> >> >> > inside the hdfs.
>> > >> >> >> >
>> > >> >> >> > the Hbase-site.xml I used:
>> > >> >> >> >
>> > >> >> >> > ------------------------------------------------------------------------
>> > >> >> >> > <configuration>
>> > >> >> >> > <property>
>> > >> >> >> > <name>hbase.rootdir</name>
>> > >> >> >> > <value>hdfs://sz:8998/hbase</value>
>> > >> >> >> > </property>
>> > >> >> >> >
>> > >> >> >> > <property>
>> > >> >> >> > <name>hbase.cluster.distributed</name>
>> > >> >> >> > <value>true</value>
>> > >> >> >> > </property>
>> > >> >> >> >
>> > >> >> >> > <property>
>> > >> >> >> > <name>hbase.master</name>
>> > >> >> >> > <value>sz:60000</value>
>> > >> >> >> > </property>
>> > >> >> >> >
>> > >> >> >> > <property>
>> > >> >> >> > <name>hbase.tmp.dir</name>
>> > >> >> >> > <value>/home/steven/data/hbase-${user.name}</value>
>> > >> >> >> > </property>
>> > >> >> >> >
>> > >> >> >> > <property>
>> > >> >> >> > <name>hbase.zookeeper.property.dataDir</name>
>> > >> >> >> > <value>${hbase.tmp.dir}/zookeeper</value>
>> > >> >> >> > </property>
>> > >> >> >> >
>> > >> >> >> > <property>
>> > >> >> >> > <name>hbase.zookeeper.quorum</name>
>> > >> >> >> > <value>sz,hadoop3</value>
>> > >> >> >> > </property>
>> > >> >> >> >
>> > >> >> >> > <property>
>> > >> >> >> > <name>hbase.zookeeper.peerport</name>
>> > >> >> >> > <value>2888</value>
>> > >> >> >> > </property>
>> > >> >> >> >
>> > >> >> >> > <property>
>> > >> >> >> > <name>hbase.zookeeper.leaderport</name>
>> > >> >> >> > <value>3888</value>
>> > >> >> >> > </property>
>> > >> >> >> > </configuration>
>> > >> >> >> > ------------------------------------------------------------------------
>> > >> >> >> >
>> > >> >> >> >
>> > >> >> >> > --
>> > >> >> >> > best wishes.
>> > >> >> >> > steven
>> > >> >> >> >
>> > >> >> >>
>> > >> >> >
>> > >> >> >
>> > >> >> >
>> > >> >> > --
>> > >> >> > best wishes.
>> > >> >> > steven
>> > >> >> >
>> > >> >>
>> > >> >
>> > >> >
>> > >> >
>> > >> > --
>> > >> > best wishes.
>> > >> > steven
>> > >> >
>> > >>
>> > >
>> > >
>> > >
>> > > --
>> > > best wishes.
>> > > steven
>> > >
>> >
>>
>>
>
>
> --
> best wishes.
> steven
>
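The loopback problem diagnosed earlier in this thread (the master registering itself in ZooKeeper as 127.0.1.1) can be checked on each node before starting HBase. This is a minimal Python sketch with illustrative function names that are not part of HBase. It assumes the OS resolver returns the same answer the HBase JVM would get, which is usually but not necessarily the case.

```python
import ipaddress
import socket


def is_loopback(address):
    """True for any loopback address, including Ubuntu's 127.0.1.1."""
    return ipaddress.ip_address(address).is_loopback


def hostname_resolution(hostname=None):
    """Return (hostname, resolved IPv4 address, is_loopback) for a hostname.

    With no argument, the local machine's own hostname is used.
    """
    hostname = hostname or socket.gethostname()
    address = socket.gethostbyname(hostname)  # consults /etc/hosts and DNS
    return hostname, address, is_loopback(address)
```

If `hostname_resolution()` reports a loopback address on the master, other nodes will be handed an address they cannot reach. The fix suggested in the thread is to edit /etc/hosts so the hostname maps to the machine's real IP.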
