[
https://issues.apache.org/jira/browse/CURATOR-535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Laverne Schrock updated CURATOR-535:
------------------------------------
Description:
When using one of the constructors for org.apache.curator.test.TestingServer
that doesn't take a port number, the org.apache.curator.test.InstanceSpec that
is constructed will choose random available ports to use. However, InstanceSpec
only binds those ports during construction and then unbinds them so that they
can be used when TestingServer.start() is called.
This gap between port selection and port binding creates a race condition where
some other process (or thread) could bind the port before TestingServer is
started.
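The window described above can be illustrated with plain java.net sockets. This
is only a minimal sketch of the pick-then-release pattern, not Curator's actual
InstanceSpec code; the class and method names (PortRaceSketch, pickFreePort,
laterBindFails) are hypothetical, and the "intruder" socket merely stands in
for whatever other process or thread happens to grab the port.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.BindException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

class PortRaceSketch {
    // Pick-then-release: bind port 0 so the OS assigns a free port,
    // remember the number, then close the socket. After this returns,
    // nothing reserves the port any more.
    static int pickFreePort() {
        try (ServerSocket probe = new ServerSocket(0)) {
            return probe.getLocalPort();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Simulates the race: an "intruder" binds the released port first,
    // then the intended owner tries to bind it. Returns true if the
    // owner's bind fails with BindException.
    static boolean laterBindFails(int port) {
        try (ServerSocket intruder = new ServerSocket(port)) {
            try (ServerSocket server = new ServerSocket()) {
                server.bind(new InetSocketAddress(port));
                return false; // bind unexpectedly succeeded
            } catch (BindException e) {
                return true;  // same failure mode as in the log below
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int port = pickFreePort();
        System.out.println("second bind failed: " + laterBindFails(port));
    }
}
```

In the sketch the intruder is forced deterministically; in practice the window
is short, which would be consistent with the failure showing up only rarely.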
I've seen this very rarely in our integration test suite that spins up and
tears down TestingServer many times. I've attached a simple class for
reproducing the issue. If you run it in an environment with log4j loaded and
the attached log4j.properties, you should see output like the following (though
it sometimes takes more iterations):
{{completed iteration: 0}}
{{completed iteration: 500}}
{{2019-08-02 09:47:06 ERROR TestingZooKeeperServer:162 - From testing server (random state: false) for instance: InstanceSpec\{dataDirectory=/tmp/1564753624792-1, port=34707, electionPort=33621, quorumPort=45995, deleteDataDirectoryOnClose=true, serverId=1286, tickTime=-1, maxClientCnxns=-1, customProperties={}, hostname=127.0.0.1} org.apache.curator.test.InstanceSpec@59c43d10}}
{{java.net.BindException: Address already in use}}
{{ at sun.nio.ch.Net.bind0(Native Method)}}
{{ at sun.nio.ch.Net.bind(Net.java:433)}}
{{ at sun.nio.ch.Net.bind(Net.java:425)}}
{{ at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)}}
{{ at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)}}
{{ at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)}}
{{ at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:687)}}
{{ at org.apache.zookeeper.server.ServerCnxnFactory.configure(ServerCnxnFactory.java:76)}}
{{ at org.apache.curator.test.TestingZooKeeperMain.internalRunFromConfig(TestingZooKeeperMain.java:239)}}
{{ at org.apache.curator.test.TestingZooKeeperMain.runFromConfig(TestingZooKeeperMain.java:132)}}
{{ at org.apache.curator.test.TestingZooKeeperServer$1.run(TestingZooKeeperServer.java:158)}}
{{ at java.lang.Thread.run(Thread.java:748)}}
{{java.lang.IllegalStateException: Timed out waiting for watch removal}}
{{ at org.apache.curator.test.TestingZooKeeperMain.blockUntilStarted(TestingZooKeeperMain.java:146)}}
{{ at org.apache.curator.test.TestingZooKeeperServer.start(TestingZooKeeperServer.java:167)}}
{{ at org.apache.curator.test.TestingServer.start(TestingServer.java:148)}}
{{ at BugReproducer.main(BugReproducer.java:15)}}
> TestServer random port selection has a race condition
> -----------------------------------------------------
>
> Key: CURATOR-535
> URL: https://issues.apache.org/jira/browse/CURATOR-535
> Project: Apache Curator
> Issue Type: Bug
> Affects Versions: 4.2.0
> Environment: Operating System:
> Fedora 30 (amd64)
> JVM:
> openjdk version "1.8.0_212"
> OpenJDK Runtime Environment (build 1.8.0_212-b04)
> OpenJDK 64-Bit Server VM (build 25.212-b04, mixed mode)
> Reporter: Laverne Schrock
> Assignee: Jordan Zimmerman
> Priority: Minor
> Attachments: BugReproducer.java
>
>
> When using one of the constructors for org.apache.curator.test.TestingServer
> that doesn't take a port number, the org.apache.curator.test.InstanceSpec
> that is constructed will choose random available ports to use. However,
> InstanceSpec only binds those ports during construction and then unbinds them
> so that they can be used when TestingServer.start() is called.
> This gap between port selection and port binding creates a race condition
> where some other process (or thread) could bind the port before TestingServer
> is started.
> I've seen this very rarely in our integration test suite that spins up and
> tears down TestingServer many times. I've attached a simple class for
> reproducing the issue. If you run it in an environment with log4j loaded and
> the attached log4j.properties, you should see output like the following
> (though it sometimes takes more iterations):
> {{completed iteration: 0}}
> {{completed iteration: 500}}
> {{2019-08-02 09:47:06 ERROR TestingZooKeeperServer:162 - From testing server (random state: false) for instance: InstanceSpec\{dataDirectory=/tmp/1564753624792-1, port=34707, electionPort=33621, quorumPort=45995, deleteDataDirectoryOnClose=true, serverId=1286, tickTime=-1, maxClientCnxns=-1, customProperties={}, hostname=127.0.0.1} org.apache.curator.test.InstanceSpec@59c43d10}}
> {{java.net.BindException: Address already in use}}
> {{ at sun.nio.ch.Net.bind0(Native Method)}}
> {{ at sun.nio.ch.Net.bind(Net.java:433)}}
> {{ at sun.nio.ch.Net.bind(Net.java:425)}}
> {{ at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)}}
> {{ at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)}}
> {{ at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)}}
> {{ at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:687)}}
> {{ at org.apache.zookeeper.server.ServerCnxnFactory.configure(ServerCnxnFactory.java:76)}}
> {{ at org.apache.curator.test.TestingZooKeeperMain.internalRunFromConfig(TestingZooKeeperMain.java:239)}}
> {{ at org.apache.curator.test.TestingZooKeeperMain.runFromConfig(TestingZooKeeperMain.java:132)}}
> {{ at org.apache.curator.test.TestingZooKeeperServer$1.run(TestingZooKeeperServer.java:158)}}
> {{ at java.lang.Thread.run(Thread.java:748)}}
> {{java.lang.IllegalStateException: Timed out waiting for watch removal}}
> {{ at org.apache.curator.test.TestingZooKeeperMain.blockUntilStarted(TestingZooKeeperMain.java:146)}}
> {{ at org.apache.curator.test.TestingZooKeeperServer.start(TestingZooKeeperServer.java:167)}}
> {{ at org.apache.curator.test.TestingServer.start(TestingServer.java:148)}}
> {{ at BugReproducer.main(BugReproducer.java:15)}}