[
https://issues.apache.org/jira/browse/CURATOR-535?focusedWorklogId=776067&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-776067
]
ASF GitHub Bot logged work on CURATOR-535:
------------------------------------------
Author: ASF GitHub Bot
Created on: 31/May/22 01:37
Start Date: 31/May/22 01:37
Worklog Time Spent: 10m
Work Description: paul8263 commented on PR #406:
URL: https://github.com/apache/curator/pull/406#issuecomment-1141583133
Hi @tisonkun @eolivelli and @Randgalt ,
Thank you for your reply. This problem is unusual. I ran into it when
running unit tests in another project that relies on ZooKeeper. The unit tests
run in parallel, so TestServer creation can hit a race condition when
allocating unused ports. My current solution is the steps
below:
1. Get a random unused port.
2. Acquire a file lock.
3. Allocate the port to TestServer.
4. Check whether TestServer starts properly. If it starts successfully, release
the file lock.
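A minimal sketch of how these steps might fit together in Java, not the code from the PR. The names `LockedPortAllocator`, `ServerStarter`, and the lock file `test-port-alloc.lock` are hypothetical. Unlike the ordering listed above, the sketch takes the lock before probing for a port, so that probing and binding happen inside one critical section; that ordering is an assumption. It also adds an in-JVM lock, because `FileChannel.lock()` throws `OverlappingFileLockException` when another thread in the same JVM already holds the lock.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.locks.ReentrantLock;

final class LockedPortAllocator {
    // Hypothetical lock file shared by every test JVM on the machine.
    private static final Path LOCK_FILE =
            Paths.get(System.getProperty("java.io.tmpdir"), "test-port-alloc.lock");

    // File locks are held per JVM, so threads inside one JVM need their own lock.
    private static final ReentrantLock JVM_LOCK = new ReentrantLock();

    /** Hypothetical callback: starts a server on the given port and returns it. */
    interface ServerStarter<T> {
        T start(int port) throws Exception;
    }

    static <T> T startWithLockedPort(ServerStarter<T> starter) throws Exception {
        JVM_LOCK.lock();
        try (FileChannel channel = FileChannel.open(LOCK_FILE,
                     StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock fileLock = channel.lock()) { // blocks until other JVMs release it
            int port = findFreePort();
            // The starter must return only once the server has actually bound the port;
            // the lock is released when this block exits, whether or not start succeeded.
            return starter.start(port);
        } finally {
            JVM_LOCK.unlock();
        }
    }

    // Ask the OS for an ephemeral port, then release it immediately.
    static int findFreePort() throws IOException {
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        }
    }
}
```

The lock only protects against other callers that go through the same lock file. A process that binds ports without taking the lock can still win the race, and every server start is serialized, which is the performance cost mentioned in this comment.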
I would like to move those steps into the TestServer creation and start
process. However, I worry that introducing a file lock might not be the best
solution, as it solves only an unusual problem at the cost of degraded
performance.
Do you have any better ideas? I think the file lock should be
considered a last resort. Correct me if I am wrong.
Issue Time Tracking
-------------------
Worklog Id: (was: 776067)
Time Spent: 50m (was: 40m)
> TestServer random port selection has a race condition
> -----------------------------------------------------
>
> Key: CURATOR-535
> URL: https://issues.apache.org/jira/browse/CURATOR-535
> Project: Apache Curator
> Issue Type: Bug
> Affects Versions: 4.2.0
> Environment: Operating System:
> Fedora 30 (amd64)
> JVM:
> openjdk version "1.8.0_212"
> OpenJDK Runtime Environment (build 1.8.0_212-b04)
> OpenJDK 64-Bit Server VM (build 25.212-b04, mixed mode)
> Reporter: Laverne Schrock
> Priority: Minor
> Attachments: BugReproducer.java, log4j.properties
>
> Time Spent: 50m
> Remaining Estimate: 0h
>
> When using one of the constructors for org.apache.curator.test.TestingServer
> that doesn't take a port number, the org.apache.curator.test.InstanceSpec
> that is constructed will choose random available ports to use. However,
> InstanceSpec only binds those ports during construction and then unbinds them
> so that they can be used when TestingServer.start() is called.
> This gap between port selection and the later bind creates a race condition
> where some other process (or thread) could bind the port before
> TestingServer is started.
> I've seen this very rarely in our integration test suite that spins up and
> tears down TestingServer many times. I've attached a simple class for
> reproducing the issue. If you run it in an environment with log4j loaded and
> the attached log4j.properties, you should see output like the following
> (though it sometimes takes more iterations):
> {{completed iteration: 0}}
> {{completed iteration: 500}}
> {{2019-08-02 09:47:06 ERROR TestingZooKeeperServer:162 - From testing server
> (random state: false) for instance:
> InstanceSpec\{dataDirectory=/tmp/1564753624792-1, port=34707,
> electionPort=33621, quorumPort=45995, deleteDataDirectoryOnClose=true,
> serverId=1286, tickTime=-1, maxClientCnxns=-1, customProperties={},
> hostname=127.0.0.1} org.apache.curator.test.InstanceSpec@59c43d10}}
> {{java.net.BindException: Address already in use}}
> {{ at sun.nio.ch.Net.bind0(Native Method)}}
> {{ at sun.nio.ch.Net.bind(Net.java:433)}}
> {{ at sun.nio.ch.Net.bind(Net.java:425)}}
> {{ at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)}}
> {{ at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)}}
> {{ at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)}}
> {{ at org.apache.zookeeper.server.NIOServerCnxnFactory.configure(NIOServerCnxnFactory.java:687)}}
> {{ at org.apache.zookeeper.server.ServerCnxnFactory.configure(ServerCnxnFactory.java:76)}}
> {{ at org.apache.curator.test.TestingZooKeeperMain.internalRunFromConfig(TestingZooKeeperMain.java:239)}}
> {{ at org.apache.curator.test.TestingZooKeeperMain.runFromConfig(TestingZooKeeperMain.java:132)}}
> {{ at org.apache.curator.test.TestingZooKeeperServer$1.run(TestingZooKeeperServer.java:158)}}
> {{ at java.lang.Thread.run(Thread.java:748)}}
> {{java.lang.IllegalStateException: Timed out waiting for watch removal}}
> {{ at org.apache.curator.test.TestingZooKeeperMain.blockUntilStarted(TestingZooKeeperMain.java:146)}}
> {{ at org.apache.curator.test.TestingZooKeeperServer.start(TestingZooKeeperServer.java:167)}}
> {{ at org.apache.curator.test.TestingServer.start(TestingServer.java:148)}}
> {{ at BugReproducer.main(BugReproducer.java:15)}}
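The race described in the issue can be illustrated without Curator or ZooKeeper. The sketch below is not the attached BugReproducer.java; `PortRaceSketch` and its method names are hypothetical. It probes for a free port by binding and releasing it, lets a second socket stand in for the other process that grabs the port in the gap, and then checks that the late bind fails with `java.net.BindException`, the same exception type as in the trace above.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.ServerSocket;

final class PortRaceSketch {
    // Bind to an OS-chosen port, then release it: the "select, then unbind" step.
    static int probeFreePort() throws IOException {
        try (ServerSocket probe = new ServerSocket(0)) {
            return probe.getLocalPort();
        } // the port is free again from here on
    }

    // Returns true if a bind issued after the port was taken fails.
    static boolean bindFailsAfterSteal() throws IOException {
        int port = probeFreePort();
        // Stands in for another process or thread that binds the port in the gap.
        try (ServerSocket intruder = new ServerSocket(port)) {
            // Stands in for the server that starts late on the port it selected.
            try (ServerSocket late = new ServerSocket(port)) {
                return false;
            } catch (BindException e) {
                return true;
            }
        }
    }
}
```

In a real test suite the intruder is nondeterministic, which is why the failure shows up only after many iterations.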
--
This message was sent by Atlassian Jira
(v8.20.7#820007)