[ https://issues.apache.org/jira/browse/HBASE-23919?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael Stack resolved HBASE-23919. ----------------------------------- Resolution: Not A Problem HBASE-23993 fixed this condition > [Flakey Test] Standalone Zookeeper won't start (minizookeepercluster won't > come up) > ----------------------------------------------------------------------------------- > > Key: HBASE-23919 > URL: https://issues.apache.org/jira/browse/HBASE-23919 > Project: HBase > Issue Type: Bug > Components: flakies > Reporter: Michael Stack > Priority: Major > > I've seen this on occasion across different hardwares; the standalone > zookeeper won't come up and then random unit test fails in its junit startup > phase as part of launching mini cluster. > I've been trying to track this w/ a while now adding in logging, using more > of zk client instead of the copy/paste we've had a long while now, and adding > in logging (I've been running locally w/ zk logging set to INFO). > It looks like this currently where the last thing out of the server launch is > this.... > {code:java} > 2020-03-02 07:57:46,129 INFO [Time-limited test] > server.ZooKeeperServer(854): maxSessionTimeout set to -1 > > 2020-03-02 07:57:46,139 INFO [Time-limited test] > server.NIOServerCnxnFactory(89): binding to port 0.0.0.0/0.0.0.0:49316 > > 2020-03-02 07:57:46,181 INFO [Time-limited test] > zookeeper.MiniZooKeeperCluster(256): Started connectionTimeout=30000, > dir=/Users/stack/checkouts/hbase.git/hbase-server/target/test-data/89d57393-fe97-9200-630f-7843ee406bd2/ > cluster_6b4d6f67-7978-dc67-a1a3-7b1b0c0e4268/zookeeper_0, > clientPort=49316, > dataDir=/Users/stack/checkouts/hbase.git/hbase-server/target/test-data/89d57393-fe97-9200-630f-7843ee406bd2/cluster_6b4d6f67-7978-dc67-a1a3-7b1b0c0e4268/ > zookeeper_0/version-2, > dataLogDir=/Users/stack/checkouts/hbase.git/hbase-server/target/test-data/89d57393-fe97-9200-630f-7843ee406bd2/cluster_6b4d6f67-7978-dc67-a1a3-7b1b0c0e4268/zookeeper_0/version-2, > tickTime=2000, maxClientCnxns=300, minSessionTimeout=4000, > maxSessionTimeout=40000, serverId=0{code} > ... then the client just does this over and over: > {code:java} > 2020-03-02 07:57:46,182 INFO [Time-limited test] > client.FourLetterWordMain(65): connecting to localhost 49316 > > 2020-03-02 07:57:46,213 INFO [Time-limited test] > zookeeper.MiniZooKeeperCluster(453): localhost:49316 not up > > java.net.SocketException: Connection reset > > > at > java.net.SocketInputStream.read(SocketInputStream.java:209) > > at > java.net.SocketInputStream.read(SocketInputStream.java:141) > > at > sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284) > > at > sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326) > > at > sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178) > > at > java.io.InputStreamReader.read(InputStreamReader.java:184) > > at > java.io.BufferedReader.fill(BufferedReader.java:161) > > at > java.io.BufferedReader.readLine(BufferedReader.java:324) > > at > java.io.BufferedReader.readLine(BufferedReader.java:389) > > at > org.apache.zookeeper.client.FourLetterWordMain.send4LetterWord(FourLetterWordMain.java:84) > > at > org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.waitForServerUp(MiniZooKeeperCluster.java:442) > > at > org.apache.hadoop.hbase.zookeeper.MiniZooKeeperCluster.startup(MiniZooKeeperCluster.java:259) > > at > org.apache.hadoop.hbase.HBaseZKTestingUtility.startMiniZKCluster(HBaseZKTestingUtility.java:130) > > at > org.apache.hadoop.hbase.HBaseZKTestingUtility.startMiniZKCluster(HBaseZKTestingUtility.java:103) > > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1107) > > at > org.apache.hadoop.hbase.HBaseTestingUtility.startMiniCluster(HBaseTestingUtility.java:1065) > > at > org.apache.hadoop.hbase.regionserver.TestRegionReplicasWithModifyTable.before(TestRegionReplicasWithModifyTable.java:62) > > at > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > > at > java.lang.reflect.Method.invoke(Method.java:498) > > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) > > at > org.junit.internal.runners.statements.RunBefores.invokeMethod(RunBefores.java:33) > > at > org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:24) > > at > org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) > > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288) > > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282) > > at > java.util.concurrent.FutureTask.run(FutureTask.java:266) > > at > java.lang.Thread.run(Thread.java:745) > {code} > > There is ZOOKEEPER-2714 but it is closed as not-reproducible and it looks a > little different going by the log emissions in that it registers the client > connections. > Noting this here in case others have seen similar or there are ideas out > there for how to fix. > Here for example is how the failure might look high-level: > {code:java} > > ------------------------------------------------------------------------------- > Test set: > org.apache.hadoop.hbase.regionserver.TestRegionReplicasWithModifyTable > > ------------------------------------------------------------------------------- > Tests run: 2, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 0.017 s <<< > FAILURE! - in > org.apache.hadoop.hbase.regionserver.TestRegionReplicasWithModifyTable > org.apache.hadoop.hbase.regionserver.TestRegionReplicasWithModifyTable Time > elapsed: 0.01 s <<< ERROR! > java.io.IOException: Waiting for startup of standalone server; server > isRunning=true > at > org.apache.hadoop.hbase.regionserver.TestRegionReplicasWithModifyTable.before(TestRegionReplicasWithModifyTable.java:62) > {code} > ... and then when you dig in, the test never even made it past cluster launch > in setup. -- This message was sent by Atlassian Jira (v8.3.4#803005)