[ https://issues.apache.org/jira/browse/HBASE-7034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13482739#comment-13482739 ]
stack commented on HBASE-7034: ------------------------------ Here is more from log Ram. Master side listing of all lines related to 1349052737638.9af7cfc9b15910a0b3d714bf40a3248f {code} 2012-10-22 02:20:01,351 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:10302-0xb3a862e57a503ba Async create of unassigned node for 9af7cfc9b15910a0b3d714bf40a3248f with OFFLINE state 2012-10-22 02:20:24,577 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:10302-0xb3a862e57a503ba Set watcher on existing znode /hbase/unassigned/9af7cfc9b15910a0b3d714bf40a3248f 2012-10-22 02:21:08,770 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback: rs=b9,\xEE\xAE\x9BiQO\x89]+a\xE0\x7F\xB7'X?,1349052737638.9af7cfc9b15910a0b3d714bf40a3248f. state=OFFLINE, ts=1350872401351, server=null, server=sv4r17s44,10304,1350872216778 2012-10-22 02:21:09,495 DEBUG org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback: rs=b9,\xEE\xAE\x9BiQO\x89]+a\xE0\x7F\xB7'X?,1349052737638.9af7cfc9b15910a0b3d714bf40a3248f. state=OFFLINE, ts=1350872401351, server=null 2012-10-22 02:21:40,673 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:10302-0xb3a862e57a503ba Set watcher on existing znode /hbase/unassigned/9af7cfc9b15910a0b3d714bf40a3248f 2012-10-22 02:23:47,089 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:10302-0xb3a862e57a503ba Set watcher on existing znode /hbase/unassigned/9af7cfc9b15910a0b3d714bf40a3248f 2012-10-22 02:24:34,154 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:10302-0xb3a862e57a503ba Received ZooKeeper Event, type=NodeDataChanged, state=SyncConnected, path=/hbase/unassigned/9af7cfc9b15910a0b3d714bf40a3248f 2012-10-22 02:24:34,176 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:10302-0xb3a862e57a503ba Retrieved 112 byte(s) of data from znode /hbase/unassigned/9af7cfc9b15910a0b3d714bf40a3248f and set watcher; region=b9,\xEE\xAE\x9BiQO\x89]+a\xE0\x7F\xB7'X?,1349052737638.9af7cfc9b15910a0b3d714bf40a3248f., origin=sv4r17s44,10304,1350872216778, state=RS_ZK_REGION_OPENED 2012-10-22 02:24:34,176 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: Handling transition=RS_ZK_REGION_OPENED, server=sv4r17s44,10304,1350872216778, region=9af7cfc9b15910a0b3d714bf40a3248f, which is more than 15 seconds late 2012-10-22 02:24:34,176 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED event for b9,\xEE\xAE\x9BiQO\x89]+a\xE0\x7F\xB7'X?,1349052737638.9af7cfc9b15910a0b3d714bf40a3248f. from sv4r17s44,10304,1350872216778; deleting unassigned node 2012-10-22 02:24:34,176 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:10302-0xb3a862e57a503ba Deleting existing unassigned node for 9af7cfc9b15910a0b3d714bf40a3248f that is in expected state RS_ZK_REGION_OPENED 2012-10-22 02:24:34,221 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: master:10302-0xb3a862e57a503ba Retrieved 112 byte(s) of data from znode /hbase/unassigned/9af7cfc9b15910a0b3d714bf40a3248f; data=region=b9,\xEE\xAE\x9BiQO\x89]+a\xE0\x7F\xB7'X?,1349052737638.9af7cfc9b15910a0b3d714bf40a3248f., origin=sv4r17s44,10304,1350872216778, state=RS_ZK_REGION_OPENED 2012-10-22 02:24:34,239 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: master:10302-0xb3a862e57a503ba Successfully deleted unassigned node for region 9af7cfc9b15910a0b3d714bf40a3248f in expected state RS_ZK_REGION_OPENED 2012-10-22 02:27:09,169 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: master:10302-0xb3a862e57a503ba Received ZooKeeper Event, type=NodeDeleted, state=SyncConnected, path=/hbase/unassigned/9af7cfc9b15910a0b3d714bf40a3248f 2012-10-22 02:27:09,169 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: The znode of region b9,\xEE\xAE\x9BiQO\x89]+a\xE0\x7F\xB7'X?,1349052737638.9af7cfc9b15910a0b3d714bf40a3248f. has been deleted. 2012-10-22 02:27:09,174 INFO org.apache.hadoop.hbase.master.AssignmentManager: The master has opened the region b9,\xEE\xAE\x9BiQO\x89]+a\xE0\x7F\xB7'X?,1349052737638.9af7cfc9b15910a0b3d714bf40a3248f. that was online on sv4r17s44,10304,1350872216778 {code} The above is being processed some minutes after the regionserver (see below) I guess because it is a bulk assign and there is a lot of other zk'ing going on at this time. Here is the regionserver side. {code} 2012-10-22 02:21:34,778 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:10304-0xd3a863aa2ee011a Attempting to transition node 9af7cfc9b15910a0b3d714bf40a3248f from RS_ZK_REGION_OPENING to RS_ZK_REGION_OPENED 2012-10-22 02:21:34,801 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: regionserver:10304-0xd3a863aa2ee011a Retrieved 112 byte(s) of data from znode /hbase/unassigned/9af7cfc9b15910a0b3d714bf40a3248f; data=region=b9,\xEE\xAE\x9BiQO\x89]+a\xE0\x7F\xB7'X?,1349052737638.9af7cfc9b15910a0b3d714bf40a3248f., origin=sv4r17s44,10304,1350872216778, state=RS_ZK_REGION_OPE NING 2012-10-22 02:21:48,240 WARN org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper: Possibly transient ZooKeeper exception: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /hbase/unassigned/9af7cfc9b15910a0b3d714bf40a3248f 2012-10-22 02:21:50,698 ERROR org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed transitioning node b9,\xEE\xAE\x9BiQO\x89]+a\xE0\x7F\xB7'X?,1349052737638.9af7cfc9b15910a0b3d714bf40a3248f. from OPENING to OPENED -- closing region org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = BadVersion for /hbase/unassigned/9af7cfc9b15910a0b3d714bf40a3248f 2012-10-22 02:21:50,698 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Closing b9,\xEE\xAE\x9BiQO\x89]+a\xE0\x7F\xB7'X?,1349052737638.9af7cfc9b15910a0b3d714bf40a3248f.: disabling compactions & flushes 2012-10-22 02:21:50,698 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Updates disabled for region b9,\xEE\xAE\x9BiQO\x89]+a\xE0\x7F\xB7'X?,1349052737638.9af7cfc9b15910a0b3d714bf40a3248f. 2012-10-22 02:21:50,703 INFO org.apache.hadoop.hbase.regionserver.HRegion: Closed b9,\xEE\xAE\x9BiQO\x89]+a\xE0\x7F\xB7'X?,1349052737638.9af7cfc9b15910a0b3d714bf40a3248f. 2012-10-22 02:43:35,980 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; Region is not online: b9,\xEE\xAE\x9BiQO\x89]+a\xE0\x7F\xB7'X?,1349052737638.9af7cfc9b15910a0b3d714bf40a3248f. 2012-10-22 02:43:36,392 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; Region is not online: b9,\xEE\xAE\x9BiQO\x89]+a\xE0\x7F\xB7'X?,1349052737638.9af7cfc9b15910a0b3d714bf40a3248f. 2012-10-22 02:43:37,012 DEBUG org.apache.hadoop.hbase.regionserver.HRegionServer: NotServingRegionException; Region is not online: b9,\xEE\xAE\x9BiQO\x89]+a\xE0\x7F\xB7'X?,1349052737638.9af7cfc9b15910a0b3d714bf40a3248f. {code} I think the client keeps coming here because it was here previously before the restart so that is what is in the .META. table for the location. > Bad version, failed OPENING to OPENED but master thinks it is open anyways > -------------------------------------------------------------------------- > > Key: HBASE-7034 > URL: https://issues.apache.org/jira/browse/HBASE-7034 > Project: HBase > Issue Type: Bug > Components: Region Assignment > Affects Versions: 0.94.2 > Reporter: stack > > I have this in RS log: > {code} > 2012-10-22 02:21:50,698 ERROR > org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed > transitioning node > b9,\xEE\xAE\x9BiQO\x89]+a\xE0\x7F\xB7'X?,1349052737638.9af7cfc9b15910a0b3d714bf40a3248f. > from OPENING to OPENED -- closing region > org.apache.zookeeper.KeeperException$BadVersionException: KeeperErrorCode = > BadVersion for /hbase/unassigned/9af7cfc9b15910a0b3d714bf40a3248f > {code} > Master says this (it is bulk assigning): > {code} > .... > 2012-10-22 02:21:40,673 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: > master:10302-0xb3a862e57a503ba Set watcher on existing znode > /hbase/unassigned/9af7cfc9b15910a0b3d714bf40a3248f > ... > then this > .... > 2012-10-22 02:23:47,089 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: > master:10302-0xb3a862e57a503ba Set watcher on existing znode > /hbase/unassigned/9af7cfc9b15910a0b3d714bf40a3248f > .... > 2012-10-22 02:24:34,176 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: > master:10302-0xb3a862e57a503ba Retrieved 112 byte(s) of data from znode > /hbase/unassigned/9af7cfc9b15910a0b3d714bf40a3248f and set watcher; > region=b9,\xEE\xAE\x9BiQO\x89]+a\xE0\x7F\xB7'X?,1349052737638.9af7cfc9b15910a0b3d714bf40a3248f., > origin=sv4r17s44,10304,1350872216778, state=RS_ZK_REGION_OPENED > etc. > {code} > Disagreement as to what is going on here. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira