Diego,
Thanks for sharing your fix.

I think it is the best way to go.

Would you like to try to implement it in the Zookeeper code base?

Enrico

Il Lun 8 Feb 2021, 12:35 Diego Salvi - Diennea <diego.sa...@diennea.com> ha
scritto:

> Hi Stig,
>
>
>
> Yes you can receive from maven the fork instance and the maximum fork
> count and then using such values to compute custom ports to run parallel
> tests.
>
> The boring part is to use configure every test to use computed ports and
> not default one.
>
>
>
> You have to configure surefire to setup some system properties:
>
>
>
> <properties>
>
>     <forkCount>4</forkCount>
>
> </properties>
>
>
>
> <plugins>
>
>   <plugin>
>
>     <groupId>org.apache.maven.plugins</groupId>
>
>     <artifactId>maven-surefire-plugin</artifactId>
>
>     <version>2.21.0</version>
>
>     <configuration>
>
>       <forkCount>${forkCount}</forkCount>
>
>       <runOrder>balanced</runOrder>
>
>       <systemPropertyVariables>
>
>         <!-- Due to a surefire bug we cannot get directly the fork number
> but we need a prefix (“current:”) or it will be null -->
>
>         <my.fork.num>current:${surefire.forkNumber}</my.fork.num>
>
>         <my.fork.max>${forkCount}</my.fork.max>
>
>       </systemPropertyVariables>
>
>     </configuration>
>
>   </plugin>
>
> </plugins>
>
>
>
> Then you can create a base test class o a test rule to read such system
> properties and evaluate a custom port when requested.
>
> I gone for the base class but the test rule is possible too:
>
>
>
> public abstract class BaseTest {
>
>
>
>     public static final String NUM_PROPERTY_PREFIX = "current:";
>
>
>
>     /**
>
>      * Fork "position" in [0,len)
>
>      */
>
>     protected final int pos;
>
>
>
>     /**
>
>      * Max fork count
>
>      */
>
>     protected final int len;
>
>
>
>     public BaseTestUtils() {
>
>         System.out.println("[TESTENV] my.fork.num: " +
> System.getProperty("my.fork.num"));
>
>         System.out.println("[TESTENV] my.fork.max: " +
> System.getProperty("my.fork.max"));
>
>
>
>         final String num = System.getProperty("my.fork.num",
> NUM_PROPERTY_PREFIX + "1");
>
>
>
>         pos = Integer.valueOf(num.substring(NUM_PROPERTY_PREFIX.length(),
> num.length())) - 1;
>
>         len = Integer.getInteger("my.fork.max", 1);
>
>
>
>     }
>
>
>
>     public int evaluatePort(int base) {
>
>         return evaluatePort(pos, len, base);
>
>     }
>
>
>
>     private static int evaluatePort(int pos, int len, int base) {
>
>
>
>         /* max: 65535, we use just first 16 bits */
>
>
>
>         if (len <= pos || len <= 0 || pos < 0) {
>
>             throw new IllegalArgumentException("Len or pos invalid");
>
>         }
>
>
>
>         int max = len - 1;
>
>         int prefixBitsNeeded = Integer.SIZE -
> Integer.numberOfLeadingZeros(max);
>
>
>
>         int bitsRemaining = 16 - prefixBitsNeeded;
>
>
>
>         int suffixBitNeeded = Integer.SIZE -
> Integer.numberOfLeadingZeros(base);
>
>
>
>         if (suffixBitNeeded > bitsRemaining) {
>
>             throw new IllegalArgumentException("Base too large: pos " +
> pos + " len " + len + " base " + base
>
>                     + "; needed bits " + suffixBitNeeded + " remaining " +
> bitsRemaining);
>
>         }
>
>
>
>         int port = (pos << bitsRemaining) + base;
>
>
>
>         return port;
>
>     }
>
>
>
> }
>
>
>
> In EmailSuccess we use a tricky “shift” logic to compute a new port but
> smarter or simpler ones could be written too (something like port + pos *
> 5000).
>
>
>
> Diego Salvi
>
>
>
>
>
> *Da: *Enrico Olivelli <eolive...@gmail.com>
> *Data: *lunedì 8 febbraio 2021 11:49
> *A: *DevZooKeeper <dev@zookeeper.apache.org>
> *Cc: *Diego Salvi - Diennea <diego.sa...@diennea.com>
> *Oggetto: *Re: Java 11 tests are very flaky on GitHub Actions
>
>
>
> Stig,
>
> thanks for your information.
>
> Unfortunately this is not doable in our test suite, because we have to
> create server side configuration for an ensemble of servers before starting
> them.
>
> I am afraid that in order to implement a safe port-assignment we will have
> to instrument all of the code that is opening server side sockets in order
> to use mocks or pre-created sockets.
> But this is not easy because we sometimes have tests that close and reopen
> the ServerSocket, but doing so we will "lose" the port.
>
> I saw in other projects (non OSS, so I cannot send pointers) that you can
> receive from the Maven environment the if of the "forked JVM" and then you
> can compute all of the ports used by the tests from that value.
> For instance if you start 4 parallel tests you will have that JVM id
> spanning from 0 to 3 and you receive that value as a System Property.
> Then the test can use that I in order to compute static ports, like JVMID
> * 1000 + X .
>
> I am cc'ing one of my former colleagues at EmailSuccess.com, Diego, that
> probably can share his experience in EmailSuccess
>
> Enrico
>
>
>
> Il giorno sab 6 feb 2021 alle ore 16:32 Stig Rohde Døssing <
> s...@apache.org> ha scritto:
>
> I'm not sure if this is easy to solve for Zookeeper, but going to make a
> suggestion here I've seen work on other projects.
>
> The "find a free port" code is inherently racy. I've seen that solution
> used in many other projects, and it never works that well.
>
> A much safer way to do it is to pass port 0 directly to the server you're
> trying to start. As you know from the port reservation code, this will
> cause the OS to use a free port. The port selected by the OS can be
> retrieved by the test by calling e.g.
>
> https://docs.oracle.com/javase/7/docs/api/java/net/ServerSocket.html#getLocalPort()
> .
>
>
> You go from this flow
> Open ServerSocket on 0
> Get port via ServerSocket.getLocalPort
> Close socket <- This is the dangerous bit, as after this, the port is again
> up for grabs for other tests to acquire
> Pass chosen port to server
> Server opens ServerSocket on port
>
> To
> Pass 0 to server
> Server opens ServerSocket on 0, getting a random port
> Test calls some method on the server that ends up in a call to
> ServerSocket.getLocalPort
>
> This won't work for cases where you're testing e.g. the client starting
> before the server, and connecting once the server has booted, but for the
> common use case of starting a server, and then starting a client connected
> to that server, this should work.
>
> On 2021/02/05 15:58:00, Christopher <c...@apache.org> wrote:
> > FWIW, the maven-surefire-plugin should be able to retry temporarily>
> > failing tests, but it isn't working with JUnit5 until the plugin is>
> > updated to a newer version. However, the newer version of the plugin>
> > didn't work with the versions of JUnit that ZK is using. When I tried>
> > to update maven-surefire-plugin, things got worse due to JUnit4/JUnit5>
> > stuff that I don't understand.>
> >
> > I have created a PR to trigger the tests to run on JDK8 instead of>
> > JDK11, to demonstrate that the tests are flaky there (or to prove me>
> > wrong), but it doesn't need to be merged, as it's just a test:>
> > https://github.com/apache/zookeeper/pull/1595>
> >
> > On Fri, Feb 5, 2021 at 10:48 AM Christopher <ct...@apache.org> wrote:>
> > >>
> > > These tests are flaky on JDK8, too, when I tried. It's my>
> > > understanding that's why they were not being run on Travis previously>
> > > (still aren't). Most of the tests that I see failing are due to the>
> > > "Address already in use" bind error. This may be more likely in a>
> > > virtualized environment like GitHub Actions (vs. non-vritualized>
> > > Jenkins), but I don't think it is unique to that environment... just>
> > > maybe more likely for whatever reason.>
> > >>
> > > I have spent quite a bit of time looking into this, and I think the>
> > > main cause is that the port reservation stuff isn't working the way it>
> > > should. It tries to bind to a port, and then it closes the>
> > > ServerSocket that was used to find an available port. What it should>
> > > do instead is return the ServerSocket itself for use in the calling>
> > > code, after it has successfully bound, rather than returning an>
> > > integer. But, that might be a big change, and there might be a simpler>
> > > fix.>
> > >>
> > > On Fri, Feb 5, 2021 at 10:25 AM Enrico Olivelli <eo...@gmail.com>
> wrote:>
> > > >>
> > > > Hi,>
> > > > I see that the new test workflow with Java 11 is very flaky.>
> > > >>
> > > > This is an example>
> > > >
>
> https://github.com/apache/zookeeper/pull/1592/checks?check_run_id=1830428694
> >
>
> > > >>
> > > > I would like to not consider it blocker for merging pull requests.>
> > > >>
> > > > That said, we should investigate further, the tests that are failing
> were>
> > > > not flaky on JDK8>
> > > >>
> > > > Thoughts ?>
> > > > Enrico>
> >
>
>
> ------------------------------
>
> CONFIDENTIALITY & PRIVACY NOTICE
> This e-mail (including any attachments) is strictly confidential and may
> also contain privileged information. If you are not the intended recipient
> you are not authorised to read, print, save, process or disclose this
> message. If you have received this message by mistake, please inform the
> sender immediately and destroy this e-mail, its attachments and any copies.
> Any use, distribution, reproduction or disclosure by any person other than
> the intended recipient is strictly prohibited and the person responsible
> may incur in penalties.
> The use of this e-mail is only for professional purposes; there is no
> guarantee that the correspondence towards this e-mail will be read only by
> the recipient, because, under certain circumstances, there may be a need to
> access this email by third subjects belonging to the Company.
>

Reply via email to