Stig,
thanks for your information.

Unfortunately this is not doable in our test suite, because we have to
create server side configuration for an ensemble of servers before starting
them.

I am afraid that in order to implement a safe port-assignment we will have
to instrument all of the code that is opening server side sockets in order
to use mocks or pre-created sockets.
But this is not easy because we sometimes have tests that close and reopen
the ServerSocket, but doing so we will "lose" the port.

I saw in other projects (non OSS, so I cannot send pointers) that you can
receive from the Maven environment the if of the "forked JVM" and then you
can compute all of the ports used by the tests from that value.
For instance if you start 4 parallel tests you will have that JVM id
spanning from 0 to 3 and you receive that value as a System Property.
Then the test can use that I in order to compute static ports, like JVMID *
1000 + X .

I am cc'ing one of my former colleagues at EmailSuccess.com, Diego, that
probably can share his experience in EmailSuccess

Enrico

Il giorno sab 6 feb 2021 alle ore 16:32 Stig Rohde Døssing <s...@apache.org>
ha scritto:

> I'm not sure if this is easy to solve for Zookeeper, but going to make a
> suggestion here I've seen work on other projects.
>
> The "find a free port" code is inherently racy. I've seen that solution
> used in many other projects, and it never works that well.
>
> A much safer way to do it is to pass port 0 directly to the server you're
> trying to start. As you know from the port reservation code, this will
> cause the OS to use a free port. The port selected by the OS can be
> retrieved by the test by calling e.g.
>
> https://docs.oracle.com/javase/7/docs/api/java/net/ServerSocket.html#getLocalPort()
> .
>
>
> You go from this flow
> Open ServerSocket on 0
> Get port via ServerSocket.getLocalPort
> Close socket <- This is the dangerous bit, as after this, the port is again
> up for grabs for other tests to acquire
> Pass chosen port to server
> Server opens ServerSocket on port
>
> To
> Pass 0 to server
> Server opens ServerSocket on 0, getting a random port
> Test calls some method on the server that ends up in a call to
> ServerSocket.getLocalPort
>
> This won't work for cases where you're testing e.g. the client starting
> before the server, and connecting once the server has booted, but for the
> common use case of starting a server, and then starting a client connected
> to that server, this should work.
>
> On 2021/02/05 15:58:00, Christopher <c...@apache.org> wrote:
> > FWIW, the maven-surefire-plugin should be able to retry temporarily>
> > failing tests, but it isn't working with JUnit5 until the plugin is>
> > updated to a newer version. However, the newer version of the plugin>
> > didn't work with the versions of JUnit that ZK is using. When I tried>
> > to update maven-surefire-plugin, things got worse due to JUnit4/JUnit5>
> > stuff that I don't understand.>
> >
> > I have created a PR to trigger the tests to run on JDK8 instead of>
> > JDK11, to demonstrate that the tests are flaky there (or to prove me>
> > wrong), but it doesn't need to be merged, as it's just a test:>
> > https://github.com/apache/zookeeper/pull/1595>
> >
> > On Fri, Feb 5, 2021 at 10:48 AM Christopher <ct...@apache.org> wrote:>
> > >>
> > > These tests are flaky on JDK8, too, when I tried. It's my>
> > > understanding that's why they were not being run on Travis previously>
> > > (still aren't). Most of the tests that I see failing are due to the>
> > > "Address already in use" bind error. This may be more likely in a>
> > > virtualized environment like GitHub Actions (vs. non-vritualized>
> > > Jenkins), but I don't think it is unique to that environment... just>
> > > maybe more likely for whatever reason.>
> > >>
> > > I have spent quite a bit of time looking into this, and I think the>
> > > main cause is that the port reservation stuff isn't working the way it>
> > > should. It tries to bind to a port, and then it closes the>
> > > ServerSocket that was used to find an available port. What it should>
> > > do instead is return the ServerSocket itself for use in the calling>
> > > code, after it has successfully bound, rather than returning an>
> > > integer. But, that might be a big change, and there might be a simpler>
> > > fix.>
> > >>
> > > On Fri, Feb 5, 2021 at 10:25 AM Enrico Olivelli <eo...@gmail.com>
> wrote:>
> > > >>
> > > > Hi,>
> > > > I see that the new test workflow with Java 11 is very flaky.>
> > > >>
> > > > This is an example>
> > > >
>
> https://github.com/apache/zookeeper/pull/1592/checks?check_run_id=1830428694
> >
>
> > > >>
> > > > I would like to not consider it blocker for merging pull requests.>
> > > >>
> > > > That said, we should investigate further, the tests that are failing
> were>
> > > > not flaky on JDK8>
> > > >>
> > > > Thoughts ?>
> > > > Enrico>
> >
>

Reply via email to