[ https://issues.apache.org/jira/browse/HBASE-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jean-Daniel Cryans updated HBASE-1921:
--------------------------------------
Attachment: HBASE-1921.patch
Patch that does what I described: when the Master's session expires, it first tries to get its znode back. Here's what you will see in the Master log when that happens:
{code}
2009-10-20 10:53:38,708 DEBUG org.apache.hadoop.hbase.master.HMaster: Got event None with path null
2009-10-20 10:53:39,997 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server /10.10.1.58:2181
2009-10-20 10:53:39,998 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/10.10.1.58:56099 remote=/10.10.1.58:2181]
2009-10-20 10:53:39,998 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
2009-10-20 10:53:40,000 WARN org.apache.zookeeper.ClientCnxn: Exception closing session 0x12472fd41f10004 to sun.nio.ch.selectionkeyi...@2afb6c5f
java.io.IOException: Session Expired
at org.apache.zookeeper.ClientCnxn$SendThread.readConnectResult(ClientCnxn.java:589)
at org.apache.zookeeper.ClientCnxn$SendThread.doIO(ClientCnxn.java:709)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:945)
2009-10-20 10:53:40,000 DEBUG org.apache.hadoop.hbase.master.HMaster: Got event None with path null
2009-10-20 10:53:40,000 INFO org.apache.hadoop.hbase.master.HMaster: Master lost its znode, trying to get a new one
2009-10-20 10:53:40,000 INFO org.apache.zookeeper.ZooKeeper: Closing session: 0x12472fd41f10004
2009-10-20 10:53:40,000 INFO org.apache.zookeeper.ClientCnxn: Closing ClientCnxn for session: 0x12472fd41f10004
2009-10-20 10:53:40,001 INFO org.apache.zookeeper.ClientCnxn: Disconnecting ClientCnxn for session: 0x12472fd41f10004
2009-10-20 10:53:40,001 INFO org.apache.zookeeper.ZooKeeper: Session: 0x12472fd41f10004 closed
2009-10-20 10:53:40,001 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Closed connection with ZooKeeper
2009-10-20 10:53:40,003 INFO org.apache.zookeeper.ZooKeeper: Initiating client connection, connectString=10.10.1.58:2181 sessionTimeout=60000 watcher=Thread[HMaster,5,main]
2009-10-20 10:53:40,003 INFO org.apache.zookeeper.ClientCnxn: Attempting connection to server /10.10.1.58:2181
2009-10-20 10:53:40,005 INFO org.apache.zookeeper.ClientCnxn: Priming connection to java.nio.channels.SocketChannel[connected local=/10.10.1.58:56100 remote=/10.10.1.58:2181]
2009-10-20 10:53:40,006 INFO org.apache.zookeeper.ClientCnxn: Server connection successful
2009-10-20 10:53:40,009 DEBUG org.apache.hadoop.hbase.master.HMaster: Got event None with path null
2009-10-20 10:53:40,012 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Wrote master address 10.10.1.58:60000 to ZooKeeper
2009-10-20 10:53:40,016 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/master got 10.10.1.58:60000
2009-10-20 10:53:40,017 DEBUG org.apache.hadoop.hbase.master.HMaster: Checking cluster state...
2009-10-20 10:53:40,017 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/root-region-server got 10.10.1.58:60020
2009-10-20 10:53:40,019 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Read ZNode /hbase/rs/1256061062528 got 10.10.1.58:60020
2009-10-20 10:53:40,019 INFO org.apache.hadoop.hbase.master.HMaster: This is a failover, ZK inspection begins...
2009-10-20 10:53:40,020 DEBUG org.apache.hadoop.hbase.master.HMaster: Inspection found server 10.10.1.58
2009-10-20 10:53:40,022 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: Updated ZNode /hbase/rs/1256061062528 with data 10.10.1.58:60020
2009-10-20 10:53:40,028 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: SetData of ZNode /hbase/root-region-server with 10.10.1.58:60020
2009-10-20 10:53:40,029 INFO org.apache.hadoop.hbase.master.HMaster: Inspection found 3 regions, with -ROOT-
2009-10-20 10:53:40,029 INFO org.apache.hadoop.hbase.master.HMaster: Found log folder : 10.10.1.58,60020,1256061062528
2009-10-20 10:53:40,029 INFO org.apache.hadoop.hbase.master.HMaster: Log folder belongs to an existing region server
2009-10-20 10:53:40,029 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2009-10-20 10:54:38,601 INFO org.apache.hadoop.hbase.master.ServerManager: 1 region servers, 0 dead, average load 3.0
2009-10-20 10:54:38,602 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scanning meta region {server: 10.10.1.58:60020, regionname: -ROOT-,,0, startKey: <>}
2009-10-20 10:54:38,607 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scanning meta region {server: 10.10.1.58:60020, regionname: .META.,,1, startKey: <>}
2009-10-20 10:54:38,611 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.rootScanner scan of 1 row(s) of meta region {server: 10.10.1.58:60020, regionname: -ROOT-,,0, startKey: <>} complete
2009-10-20 10:54:38,615 INFO org.apache.hadoop.hbase.master.BaseScanner: RegionManager.metaScanner scan of 1 row(s) of meta region {server: 10.10.1.58:60020, regionname: .META.,,1, startKey: <>} complete
2009-10-20 10:54:38,615 INFO org.apache.hadoop.hbase.master.BaseScanner: All 1 .META. region(s) scanned
{code}
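For anyone who wants the gist without reading the patch, the sequence in the log above (session expires, the old handle is closed, a fresh session is opened, the master address is written back) can be sketched roughly as below. This is not the patch and it does not use the real ZooKeeper client: `FakeEnsemble`, `Master`, and all of their methods are hypothetical names made up for illustration, and the ephemeral master znode is simulated in memory.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

public class MasterZnodeRecovery {

    /** Hypothetical in-memory stand-in for a ZooKeeper ensemble holding one ephemeral master znode. */
    static final class FakeEnsemble {
        private long nextSessionId = 1;
        private String masterAddress;    // simulated contents of the master znode, or null
        private long ownerSession = -1;  // session that created the ephemeral znode

        long openSession() {
            return nextSessionId++;
        }

        /** Expiring a session deletes the ephemeral znode it owns, as ephemeral nodes do. */
        void expire(long sessionId) {
            if (ownerSession == sessionId) {
                masterAddress = null;
                ownerSession = -1;
            }
        }

        /** Creates the master znode if absent; returns false when some master already holds it. */
        boolean tryCreateMaster(long sessionId, String address) {
            if (masterAddress != null) {
                return false;
            }
            masterAddress = address;
            ownerSession = sessionId;
            return true;
        }

        Optional<String> readMaster() {
            return Optional.ofNullable(masterAddress);
        }
    }

    /** Hypothetical master that reacts to session expiry by trying to re-register. */
    static final class Master {
        private final FakeEnsemble zk;
        private final String address;
        private long sessionId;
        final List<String> log = new ArrayList<>();

        Master(FakeEnsemble zk, String address) {
            this.zk = zk;
            this.address = address;
            this.sessionId = zk.openSession();
        }

        long sessionId() {
            return sessionId;
        }

        boolean register() {
            return zk.tryCreateMaster(sessionId, address);
        }

        /** Called once the session is known to be expired. Returns true if we are master again. */
        boolean onSessionExpired() {
            log.add("Master lost its znode, trying to get a new one");
            sessionId = zk.openSession(); // the expired handle is unusable, so start a fresh session
            if (!zk.tryCreateMaster(sessionId, address)) {
                log.add("Another master holds the znode, giving up");
                return false;
            }
            log.add("Wrote master address " + address);
            return true;
        }
    }

    public static void main(String[] args) {
        FakeEnsemble zk = new FakeEnsemble();
        Master master = new Master(zk, "10.10.1.58:60000");
        master.register();
        zk.expire(master.sessionId());              // simulate the session timeout
        boolean stillMaster = master.onSessionExpired();
        System.out.println("still master: " + stillMaster
            + ", znode: " + zk.readMaster().orElse("<none>"));
    }
}
```

The point of the sketch is the single-master case: nobody else can grab the znode after the expiry, so the retry succeeds and the cluster keeps going. If a backup master had taken the znode in the meantime, `onSessionExpired()` returns false and the old master would presumably shut down as before.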
> When the Master's session times out and there's only one, cluster is wedged
> ---------------------------------------------------------------------------
>
> Key: HBASE-1921
> URL: https://issues.apache.org/jira/browse/HBASE-1921
> Project: Hadoop HBase
> Issue Type: Bug
> Affects Versions: 0.20.1
> Reporter: Jean-Daniel Cryans
> Assignee: Jean-Daniel Cryans
> Fix For: 0.20.2, 0.21.0
>
> Attachments: HBASE-1921.patch
>
>
> On IRC, some fella had a session expiration on his Master, and it was the only Master he had.
> Maybe in this case the Master should first try to re-get the znode?