It’s not easy to debug the failing builds: Unit tests results cannot be uploaded for some reason and we only have surefire-reports.zip in the artifacts which contains all the test results zipped in XML format.
Test logs are completely missing or I cannot find them. Andor > On 2021. Jan 15., at 16:14, Andor Molnar <an...@apache.org> wrote: > > That’s not entirely true. > > First, we have PortAssignment class which is responsible for assigning unique > port numbers to test instances. It also has a retry logic to increment port > numbers which have failed to bind. > Second, ASF Jenkins running the same test suite and manages to get green > builds every now and then by running tests parallel on 4 threads. > > There must be some reason for that Github builds are much more unstable. > Do they fail always with the same error? > > Andor > > > >> On 2021. Jan 15., at 4:18, Christopher <ctubb...@apache.org> wrote: >> >> On Thu, Jan 14, 2021 at 3:52 PM Andor Molnar <an...@apache.org> wrote: >> >>> Nicely done guys. I've just noticed the changes. :) >>> >>> Does it work out well so far? I can only see failing builds here: (at >>> least 1 test of the matrix always fails) >>> >>> https://github.com/apache/zookeeper/actions >>> >>> >> The CI build for PR https://github.com/apache/zookeeper/pull/1579 looks >> useful, because it catches new checkstyle errors as a result of the >> proposed change: >> https://github.com/apache/zookeeper/runs/1705766219 >> >> The one matrix item that always fails is the one that runs all the Java >> tests. There are a lot of the JUnit Java tests that are flaky. Many of them >> are probably just failing because of collisions with other tests (port >> reuse is a common one). Many of the tests reuse the same port numbers for >> the integration test, so they can't run at the same time, but the test >> cases don't check for port in use or wait for it to be free, and the tests >> aren't designed to select a random or free port, either. A lot of these >> could be fixed with improvements to the testing code, so that port numbers >> don't collide when tests are run concurrently. >> >> There are other flaky tests, too, but I'm not that familiar with any of >> these tests, just reading the error messages. >> >> >>> I'm not saying that our ASF Jenkins build is rock solid, but at least >>> it's having green runs every now and then. >>> >>> >>> https://ci-hadoop.apache.org/view/ZooKeeper/job/zookeeper-multi-branch-build/ >>> >>> >> I suspect some of these flaky Java test cases probably fail on Jenkins a >> lot, too. If they can be tweaked to run more reliably in GitHub Actions, I >> bet the Jenkins precommit jobs would also pass more reliably, too. >> >> >>> Andor >>> >>> >>> >>> On Sun, 2020-11-01 at 00:31 -0400, Christopher wrote: >>>> Hi Enrico, et al, >>>> >>>> I think the situation is a little more complicated than simply >>>> selecting between "option 1" and "option 2". So, please allow me to >>>> provide some additional clarification about what my PR does and does >>>> not do (responses inline, see below). >>>> >>>> On Sat, Oct 31, 2020 at 12:00 PM Enrico Olivelli >>>> <eolive...@gmail.com> wrote: >>>>> >>>>> Hello, >>>>> Christopher send a patch that enabled PR validation in GitHub >>>>> Actions [1] >>>>> >>>>> I would like to start a discussion and explain what's going on. >>>>> I was talking with Andor about the lack of the "magic words" on >>>>> Pull >>>>> Request Validation that restart the build on the new Jenkins. >>>>> >>>>> I cited that in Apache BookKeeper and in Apache Pulsar we have a >>>>> "github >>>>> bot" that interacts with the comments in the Pull Requests and it >>>>> is able >>>>> to rerun the failure builds. >>>>> This kind of "bots" is becoming pretty common on github. >>>>> >>>>> Christopher followed up our discussion and created that patch. >>>> >>>> To be clear, the GitHub Actions doesn't enable any "magic words" to >>>> restart, but it does provide greater control over re-build from the >>>> GitHub UI. It's at most a few clicks away. >>>> >>>>> You can't see GH Action working on ZK repo, because we should >>>>> enable them >>>>> with the help of Infra. >>>> >>>> To clarify here as well, INFRA isn't involved in enabling GH Actions. >>>> It's simply a matter of adding the workflow to the repository (which >>>> my PR does). It is automatically enabled once the workflow is merged >>>> in. >>>> >>>>> >>>>> From my point of view using GitHub Actions will be interesting and >>>>> useful >>>>> if and only if we add the 'bot' that reruns the failures. >>>> >>>> I think there's other value to consider, also... like replacing >>>> Travis >>>> CI (which asks for too many permissions IMO), and having much greater >>>> control over builds, including having access to surefire reports from >>>> failed tests, using custom containers, and having access to thousands >>>> of user-generated recipes to perform custom steps in the build. There >>>> are multiple benefits. >>>> >>>>> >>>>> This is not possible on the new ASF Jenkins, so only committers (I >>>>> am not >>>>> sure that you also need some special additional karma) can trigger >>>>> a new >>>>> build by logging into Jenkins. >>>> >>>> Unfortunately, that's still true of GH Actions... only committers >>>> would be able to trigger a rebuild in the GitHub UI for the build >>>> that >>>> was automatically triggered when the PR was created (or updated). >>>> Non-committers can only retrigger by updating the PR, just like they >>>> do now with Jenkins. However, the one advantage that non-committers >>>> do >>>> get is that they can execute the same GH Actions builds in their own >>>> repo, without needing an account on Jenkins or Travis, during their >>>> development process, before they even issue a PR. So, they will be >>>> able to see if their branch is likely to get a green build before >>>> they >>>> even open a PR (if they take advantage of that option). >>>> >>>>> >>>>> The work of Christopher is not only about enabling GitHub Actions >>>>> but it is >>>>> also about cleaning up the validation process and running only part >>>>> of the >>>>> tests, this can be discussed on a separate thread (I would like to >>>>> run all >>>>> of the tests for instance, not only a selection). So please comment >>>>> on the >>>>> PR about the changes. >>>> >>>> Running only part of the tests was not my goal. I just did that as an >>>> example to show what was possible. I can change it to try to run the >>>> full build. However, I know that currently doesn't work (too many >>>> tests fail in the GH Actions runner, just like too many failed in >>>> Travis, which is why they were disabled there). Either way, follow-on >>>> work will need to be done to tweak the tests so they can build >>>> reliably in this environment and eventually replace the Jenkins >>>> precommit validation. This PR is not yet ready to replace the need >>>> for >>>> the Jenkins precommit. It only lays the groundwork for getting there >>>> eventually. >>>> >>>> What the PR does now is: >>>> 1. Replace Travis and run *some* tests (Travis ran none) >>>> 2. Demonstrates how to run multiple tasks in a matrix build >>>> 3. Lays the groundwork for incremental test improvements so the >>>> Jenkins precommit job can also be replaced by GH Actions validations, >>>> so all validations are (eventually) managed in one place. >>>> >>>> The PR can still be changed to do whatever you want. I've merely >>>> layed >>>> down the groundwork that can be built upon. If you want it to try to >>>> run all the tests, I can do that. If you merely want it to replace >>>> Travis and build no tests (yet), I can do that. If you want me to add >>>> jdk8 to the matrix build instead of just jdk11, I can do that. If you >>>> want to have a build with jdk11, but run the full ITs with jdk8, I >>>> can >>>> do that. If you want to try to run the full ITs with both jdk8 and >>>> jdk11, I can do that. Just let me know what you want and I will >>>> update >>>> the PR accordingly. >>>> >>>> In essence, there are *far* more options available to the project >>>> than >>>> the "option 1" and "option 2" described below. It is the enormous >>>> possibility space that GH Actions opens up, that I think is the real >>>> advantage to the project, far more than any other advantages. >>>> >>>>> >>>>> Here the discussion is more about >>>>> "Use GH actions and rerun the tests" (option 1) >>>>> vs >>>>> "Use ASFK jenkins for everything + Travis for PRs on additional >>>>> architectures" (option 2, that is to keep the current situation) >>>>> >>>> >>>> My PR does keep the Travis build for s390x builds (without tests), >>>> because getting that arch building in GH Actions is tricky, so I >>>> didn't try it on my first attempt. It may be possible to get that to >>>> build also. >>>> >>>>> Regards >>>>> Enrico >>>>> [1] https://github.com/apache/zookeeper/pull/1508 >>>> >>>> I hope that clarifies a little bit of what I was trying to accomplish >>>> in my PR, since my intentions may not be identical to what others >>>> were >>>> hoping it would achieve. Feel free to ask me additional questions, or >>>> to direct me to make changes to my PR in any way. I realize that this >>>> is something that may not get merged right away (or ever)... >>>> especially if it doesn't achieve what you were hoping it would >>>> achieve. But, I still think there's advantages to using GH Actions >>>> that are worth considering. >>>> >>>> Thanks, >>>> Christopher >>> >>> >>> >