[ 
https://issues.apache.org/jira/browse/HBASE-6406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13416673#comment-13416673
 ] 

Lars Hofhansl commented on HBASE-6406:
--------------------------------------

The first one fails due to a race condition:
{quote}
2012-07-16 03:36:04,631 INFO  [pool-1-thread-1] 
zookeeper.MiniZooKeeperCluster(193): Started MiniZK Cluster and connect 1 ZK 
server on client port: 61529
2012-07-16 03:36:04,760 DEBUG [pool-1-thread-1] zookeeper.ZKUtil(100): 
connection to cluster: clusterId opening connection to ZooKeeper with ensemble 
(localhost:61529)
2012-07-16 03:36:04,831 INFO  [pool-1-thread-1] 
zookeeper.RecoverableZooKeeper(97): The identifier of this process is 
22...@vesta.apache.org
2012-07-16 03:36:04,927 INFO  [pool-1-thread-1] hbase.ResourceChecker(145): 
before replication.TestReplicationPeer#testResetZooKeeperSession: 11 threads, 
86 file descriptors 0 connections, 
2012-07-16 03:36:07,918 DEBUG [pool-1-thread-1-EventThread] 
zookeeper.ZooKeeperWatcher(262): connection to cluster: clusterId Received 
ZooKeeper Event, type=None, state=SyncConnected, path=null
2012-07-16 03:36:07,926 INFO  [Thread-2] replication.TestReplicationPeer(54): 
Expiring ReplicationPeer ZooKeeper session.
2012-07-16 03:36:07,950 DEBUG [pool-1-thread-1-EventThread] 
zookeeper.ZooKeeperWatcher(339): connection to cluster: 
clusterId-0x1388ddb61410000 connected
2012-07-16 03:36:08,091 INFO  [Thread-2] hbase.HBaseTestingUtility(1344): ZK 
Closed Session 0x1388ddb61410000; sleeping=7000
2012-07-16 03:36:15,092 INFO  [Thread-2] replication.TestReplicationPeer(58): 
Attempting to use expired ReplicationPeer ZooKeeper session.
2012-07-16 03:36:15,095 INFO  [pool-1-thread-1] hbase.ResourceChecker(145): 
after replication.TestReplicationPeer#testResetZooKeeperSession: 11 threads 
(was 11), 89 file descriptors (was 89). 0 connections, 
{quote}

A successful run looks like this:
{quote}
2012-07-17 15:20:35,285 INFO  [main] zookeeper.MiniZooKeeperCluster(193): 
Started MiniZK Cluster and connect 1 ZK server on client port: 49834
2012-07-17 15:20:35,298 DEBUG [main] zookeeper.ZKUtil(100): connection to 
cluster: clusterId opening connection to ZooKeeper with ensemble 
(localhost:49834)
2012-07-17 15:20:35,312 INFO  [main] zookeeper.RecoverableZooKeeper(97): The 
identifier of this process is 26186@xxxx
2012-07-17 15:20:35,336 DEBUG [main-EventThread] 
zookeeper.ZooKeeperWatcher(262): connection to cluster: clusterId Received 
ZooKeeper Event, type=None, state=SyncConnected, path=null
2012-07-17 15:20:35,338 DEBUG [main-EventThread] 
zookeeper.ZooKeeperWatcher(339): connection to cluster: 
clusterId-0x1389707502a0000 connected
2012-07-17 15:20:35,348 INFO  [main] hbase.ResourceChecker(145): before 
replication.TestReplicationPeer#testResetZooKeeperSession: 10 threads, 87 file 
descriptors 0 connections, 
2012-07-17 15:20:35,356 INFO  [Thread-2] replication.TestReplicationPeer(56): 
Expiring ReplicationPeer ZooKeeper session.
2012-07-17 15:20:35,360 INFO  [Thread-2] hbase.HBaseTestingUtility(1344): ZK 
Closed Session 0x1389707502a0000; sleeping=7000
2012-07-17 15:20:35,459 DEBUG [main-EventThread] 
zookeeper.ZooKeeperWatcher(262): connection to cluster: 
clusterId-0x1389707502a0000 Received ZooKeeper Event, type=None, 
state=Disconnected, path=null
2012-07-17 15:20:35,459 DEBUG [main-EventThread] 
zookeeper.ZooKeeperWatcher(360): connection to cluster: 
clusterId-0x1389707502a0000 Received Disconnected from ZooKeeper, ignoring
2012-07-17 15:20:37,267 DEBUG [main-EventThread] 
zookeeper.ZooKeeperWatcher(262): connection to cluster: 
clusterId-0x1389707502a0000 Received ZooKeeper Event, type=None, state=Expired, 
path=null
2012-07-17 15:20:37,269 WARN  [main-EventThread] 
replication.ReplicationPeer(157): The ReplicationPeer coresponding to peer 
clusterKey was aborted for the following reason(s):connection to cluster: 
clusterId-0x1389707502a0000 connection to cluster: clusterId-0x1389707502a0000 
received expired from ZooKeeper, aborting
org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode = 
Session expired
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.connectionEvent(ZooKeeperWatcher.java:374)
        at org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher.process(ZooKeeperWatcher.java:271)
        at org.apache.zookeeper.ClientCnxn$EventThread.processEvent(ClientCnxn.java:521)
        at org.apache.zookeeper.ClientCnxn$EventThread.run(ClientCnxn.java:497)
2012-07-17 15:20:42,360 INFO  [Thread-2] replication.TestReplicationPeer(60): 
Attempting to use expired ReplicationPeer ZooKeeper session.
2012-07-17 15:20:42,362 DEBUG [Thread-2] zookeeper.ZKUtil(100): connection to 
cluster: clusterId opening connection to ZooKeeper with ensemble 
(localhost:49834)
2012-07-17 15:20:42,363 INFO  [Thread-2] zookeeper.RecoverableZooKeeper(97): 
The identifier of this process is 26186@xxxx
2012-07-17 15:20:42,364 INFO  [Thread-2] replication.TestReplicationPeer(69): 
Attempting to use refreshed ReplicationPeer ZooKeeper session.
2012-07-17 15:20:42,372 DEBUG [Thread-2-EventThread] 
zookeeper.ZooKeeperWatcher(262): connection to cluster: clusterId Received 
ZooKeeper Event, type=None, state=SyncConnected, path=null
2012-07-17 15:20:42,373 INFO  [main] hbase.ResourceChecker(145): after 
replication.TestReplicationPeer#testResetZooKeeperSession: 10 threads (was 10), 
88 file descriptors (was 88). 0 connections, 
2012-07-17 15:20:42,373 DEBUG [Thread-2-EventThread] 
zookeeper.ZooKeeperWatcher(339): connection to cluster: 
clusterId-0x1389707502a0001 connected
{quote}

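In the first case it seems the ZooKeeperWatcher (ZKW) connects to the cluster 
only after the test attempts to expire the session. The session is therefore 
never actually expired, because it wasn't connected in the first place: the 
failed run logs no Disconnected or Expired event, while the successful run 
logs both.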
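
One way to close a race of this shape is to block until the watcher has seen 
SyncConnected before trying to expire the session. The following is only a 
minimal sketch in plain Java under stated assumptions: FakeWatcher, 
expireAfterConnect and expireWithoutWaiting are hypothetical names that 
simulate the ordering problem with a CountDownLatch. It does not use the 
HBase or ZooKeeper APIs, and it is not necessarily the fix applied for this 
issue.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ConnectBeforeExpire {

    /** Hypothetical stand-in for a session watcher; not an HBase class. */
    static final class FakeWatcher {
        private final CountDownLatch connected = new CountDownLatch(1);
        private volatile boolean expired = false;

        /** Called by the simulated event thread when the session connects. */
        void onSyncConnected() {
            connected.countDown();
        }

        /** Blocks until connected or until the timeout elapses. */
        boolean awaitConnected(long timeoutMs) throws InterruptedException {
            return connected.await(timeoutMs, TimeUnit.MILLISECONDS);
        }

        /** Expiring only has an effect once the session is connected. */
        boolean expire() {
            if (connected.getCount() == 0) {
                expired = true;
            }
            return expired;
        }
    }

    /** Starts a simulated event thread that connects after delayMs. */
    private static Thread startEventThread(FakeWatcher w, long delayMs) {
        Thread t = new Thread(() -> {
            try {
                Thread.sleep(delayMs);
                w.onSyncConnected();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.start();
        return t;
    }

    /** Waits for the connection first, then expires: the safe ordering. */
    static boolean expireAfterConnect(long connectDelayMs, long timeoutMs)
            throws InterruptedException {
        FakeWatcher w = new FakeWatcher();
        Thread t = startEventThread(w, connectDelayMs);
        boolean ok = w.awaitConnected(timeoutMs) && w.expire();
        t.join();
        return ok;
    }

    /** Expires immediately, before the connection exists: the racy ordering. */
    static boolean expireWithoutWaiting(long connectDelayMs)
            throws InterruptedException {
        FakeWatcher w = new FakeWatcher();
        Thread t = startEventThread(w, connectDelayMs);
        boolean ok = w.expire();
        t.join();
        return ok;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("wait then expire: " + expireAfterConnect(50, 5000));
        System.out.println("expire at once:   " + expireWithoutWaiting(200));
    }
}
```

In a real test the latch would be counted down from the watcher's connection 
callback, and the await would carry a timeout so that a connection that 
never arrives fails the test instead of hanging it.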

                
> TestReplicationPeer.testResetZooKeeperSession and 
> TestZooKeeper.testClientSessionExpired fail frequently
> --------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-6406
>                 URL: https://issues.apache.org/jira/browse/HBASE-6406
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.94.1
>            Reporter: Lars Hofhansl
>            Assignee: Lars Hofhansl
>             Fix For: 0.94.2
>
>
> Looking back through the 0.94 test runs, these two tests accounted for 11 of 
> 34 failed tests.
> They should be fixed or (temporarily) disabled.


        
