Re: Pull Request Validation on GitHub Actions (ZOOKEEPER-3973)

Andor Molnar Fri, 15 Jan 2021 07:21:53 -0800

It’s not easy to debug the failing builds:

Unit test results cannot be uploaded for some reason, so the only artifact we 
have is surefire-reports.zip, which contains all the test results as zipped 
XML files.
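
For anyone who wants to dig through that zip locally, here is a minimal sketch 
that lists the failed test cases. It assumes the archive holds the usual 
Surefire per-class reports (TEST-<class>.xml, with <testcase> elements that 
carry a <failure> or <error> child when they fail). The class name 
SurefireZipSummary is made up for illustration and is not part of ZooKeeper or 
Surefire. It needs Java 9+ because of readAllBytes().

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration only.
final class SurefireZipSummary {

    /** Lists "classname.name" for each testcase that has a failure or error child. */
    static List<String> failedTests(InputStream zipStream) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        List<String> failed = new ArrayList<>();
        try (ZipInputStream zip = new ZipInputStream(zipStream)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                String file = name.substring(name.lastIndexOf('/') + 1);
                // Assumption: per-class XML reports are named TEST-<class>.xml
                if (entry.isDirectory() || !file.startsWith("TEST-") || !file.endsWith(".xml")) {
                    continue;
                }
                // Buffer the current entry so the XML parser cannot close the zip stream
                byte[] xml = zip.readAllBytes();
                Document doc = builder.parse(new ByteArrayInputStream(xml));
                NodeList cases = doc.getElementsByTagName("testcase");
                for (int i = 0; i < cases.getLength(); i++) {
                    Element tc = (Element) cases.item(i);
                    boolean broken = tc.getElementsByTagName("failure").getLength() > 0
                            || tc.getElementsByTagName("error").getLength() > 0;
                    if (broken) {
                        failed.add(tc.getAttribute("classname") + "." + tc.getAttribute("name"));
                    }
                }
            }
        }
        return failed;
    }

    public static void main(String[] args) throws Exception {
        // args[0]: path to the downloaded surefire-reports.zip
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            failedTests(in).forEach(System.out::println);
        }
    }
}
```

This only names the failing tests; the stack traces are in the text of the 
<failure> and <error> elements if more detail is needed.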


Test logs are completely missing or I cannot find them.

Andor




> On 2021. Jan 15., at 16:14, Andor Molnar <an...@apache.org> wrote:
> 
> That’s not entirely true.
> 
> First, we have the PortAssignment class, which is responsible for assigning 
> unique port numbers to test instances. It also has retry logic that 
> increments the port number when a port fails to bind.
> Second, ASF Jenkins runs the same test suite and manages to get green builds 
> every now and then while running tests in parallel on 4 threads.
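
To illustrate the increment-and-retry idea described above, here is a rough 
sketch. It is not the actual PortAssignment code. The class name PortPicker, 
its constructor arguments, and the probe-by-binding approach are all 
assumptions made for the example.

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; ZooKeeper's real PortAssignment may work differently.
final class PortPicker {
    private final AtomicInteger next;
    private final int max;

    PortPicker(int min, int max) {
        if (min < 1 || max > 65535 || min > max) {
            throw new IllegalArgumentException("invalid port range");
        }
        this.next = new AtomicInteger(min);
        this.max = max;
    }

    /** Hands out each port number at most once, skipping ports that fail to bind. */
    int unique() {
        while (true) {
            int candidate = next.getAndIncrement();
            if (candidate > max) {
                throw new IllegalStateException("no free port left in range");
            }
            // Probe by binding; the socket is closed again before returning
            try (ServerSocket probe = new ServerSocket(candidate)) {
                return candidate;
            } catch (IOException bindFailed) {
                // Port is in use (or not bindable): move on to the next number
            }
        }
    }
}
```

The counter guarantees uniqueness only within one JVM. Another process can 
still grab the port between the probe and the real bind, which is one way such 
tests can stay flaky even with retry logic.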
> 
> There must be some reason why the GitHub builds are so much more unstable. 
> Do they always fail with the same error?
> 
> Andor
> 
> 
> 
>> On 2021. Jan 15., at 4:18, Christopher <ctubb...@apache.org> wrote:
>> 
>> On Thu, Jan 14, 2021 at 3:52 PM Andor Molnar <an...@apache.org> wrote:
>> 
>>> Nicely done, guys. I've just noticed the changes. :)
>>> 
>>> Does it work out well so far? I can only see failing builds here: (at
>>> least 1 test of the matrix always fails)
>>> 
>>> https://github.com/apache/zookeeper/actions
>>> 
>>> 
>> The CI build for PR https://github.com/apache/zookeeper/pull/1579 looks
>> useful, because it catches new checkstyle errors as a result of the
>> proposed change:
>> https://github.com/apache/zookeeper/runs/1705766219
>> 
>> The one matrix item that always fails is the one that runs all the Java
>> tests. A lot of the JUnit Java tests are flaky. Many of them
>> are probably just failing because of collisions with other tests (port
>> reuse is a common one). Many of the tests reuse the same port numbers for
>> the integration test, so they can't run at the same time, but the test
>> cases don't check for port in use or wait for it to be free, and the tests
>> aren't designed to select a random or free port, either. A lot of these
>> could be fixed with improvements to the testing code, so that port numbers
>> don't collide when tests are run concurrently.
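
As a sketch of the "select a free port" option mentioned above: binding to 
port 0 asks the operating system for any currently unused port. The class name 
FreePort is hypothetical, and this is only one possible approach, not what the 
ZooKeeper tests do.

```java
import java.io.IOException;
import java.net.ServerSocket;

// Hypothetical helper for illustration only.
final class FreePort {

    /** Returns a port that was free at the moment of the call. */
    static int pick() throws IOException {
        // Port 0 means "let the OS choose an unused port"
        try (ServerSocket socket = new ServerSocket(0)) {
            return socket.getLocalPort();
        }
    }
}
```

The port is released before the caller uses it, so a small race remains. A 
test that can pass the already-bound socket to the server under test avoids 
that race entirely.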
>> 
>> There are other flaky tests, too, but I'm not that familiar with any of
>> these tests, just reading the error messages.
>> 
>> 
>>> I'm not saying that our ASF Jenkins build is rock solid, but at least
>>> it's having green runs every now and then.
>>> 
>>> 
>>> https://ci-hadoop.apache.org/view/ZooKeeper/job/zookeeper-multi-branch-build/
>>> 
>>> 
>> I suspect some of these flaky Java test cases probably fail on Jenkins a
>> lot, too. If they can be tweaked to run more reliably in GitHub Actions, I
>> bet the Jenkins precommit jobs would pass more reliably, too.
>> 
>> 
>>> Andor
>>> 
>>> 
>>> 
>>> On Sun, 2020-11-01 at 00:31 -0400, Christopher wrote:
>>>> Hi Enrico, et al,
>>>> 
>>>> I think the situation is a little more complicated than simply
>>>> selecting between "option 1" and "option 2". So, please allow me to
>>>> provide some additional clarification about what my PR does and does
>>>> not do (responses inline, see below).
>>>> 
>>>> On Sat, Oct 31, 2020 at 12:00 PM Enrico Olivelli
>>>> <eolive...@gmail.com> wrote:
>>>>> 
>>>>> Hello,
>>>>> Christopher sent a patch that enables PR validation in GitHub
>>>>> Actions [1]
>>>>> 
>>>>> I would like to start a discussion and explain what's going on.
>>>>> I was talking with Andor about the lack of the "magic words" on Pull
>>>>> Request Validation that restart the build on the new Jenkins.
>>>>> 
>>>>> I mentioned that in Apache BookKeeper and in Apache Pulsar we have a
>>>>> "github bot" that interacts with the comments in the Pull Requests and
>>>>> is able to rerun the failed builds.
>>>>> These kinds of "bots" are becoming pretty common on GitHub.
>>>>> 
>>>>> Christopher followed up on our discussion and created that patch.
>>>> 
>>>> To be clear, GitHub Actions doesn't enable any "magic words" to
>>>> restart, but it does provide greater control over rebuilds from the
>>>> GitHub UI. A rebuild is at most a few clicks away.
>>>> 
>>>>> You can't see GH Actions working on the ZK repo, because we have to
>>>>> enable them with the help of Infra.
>>>> 
>>>> To clarify here as well, INFRA isn't involved in enabling GH Actions.
>>>> It's simply a matter of adding the workflow to the repository (which
>>>> my PR does). It is automatically enabled once the workflow is merged
>>>> in.
>>>> 
>>>>> 
>>>>> From my point of view, using GitHub Actions will be interesting and
>>>>> useful if and only if we add the 'bot' that reruns the failures.
>>>> 
>>>> I think there's other value to consider, also... like replacing Travis
>>>> CI (which asks for too many permissions IMO), and having much greater
>>>> control over builds, including having access to surefire reports from
>>>> failed tests, using custom containers, and having access to thousands
>>>> of user-generated recipes to perform custom steps in the build. There
>>>> are multiple benefits.
>>>> 
>>>>> 
>>>>> This is not possible on the new ASF Jenkins, so only committers (I am
>>>>> not sure whether you also need some special additional karma) can
>>>>> trigger a new build by logging into Jenkins.
>>>> 
>>>> Unfortunately, that's still true of GH Actions... only committers
>>>> would be able to trigger a rebuild in the GitHub UI for the build that
>>>> was automatically triggered when the PR was created (or updated).
>>>> Non-committers can only retrigger by updating the PR, just like they
>>>> do now with Jenkins. However, the one advantage that non-committers do
>>>> get is that they can execute the same GH Actions builds in their own
>>>> repo, without needing an account on Jenkins or Travis, during their
>>>> development process, before they even issue a PR. So, they will be
>>>> able to see if their branch is likely to get a green build before they
>>>> even open a PR (if they take advantage of that option).
>>>> 
>>>>> 
>>>>> The work of Christopher is not only about enabling GitHub Actions; it
>>>>> is also about cleaning up the validation process and running only part
>>>>> of the tests. That can be discussed in a separate thread (I would like
>>>>> to run all of the tests, for instance, not only a selection). So please
>>>>> comment on the PR about the changes.
>>>> 
>>>> Running only part of the tests was not my goal. I just did that as an
>>>> example to show what was possible. I can change it to try to run the
>>>> full build. However, I know that currently doesn't work (too many
>>>> tests fail in the GH Actions runner, just like too many failed in
>>>> Travis, which is why they were disabled there). Either way, follow-on
>>>> work will need to be done to tweak the tests so they can build
>>>> reliably in this environment and eventually replace the Jenkins
>>>> precommit validation. This PR is not yet ready to replace the need for
>>>> the Jenkins precommit. It only lays the groundwork for getting there
>>>> eventually.
>>>> 
>>>> What the PR does now is:
>>>> 1. Replaces Travis and runs *some* tests (Travis ran none)
>>>> 2. Demonstrates how to run multiple tasks in a matrix build
>>>> 3. Lays the groundwork for incremental test improvements so the
>>>> Jenkins precommit job can also be replaced by GH Actions validations,
>>>> so all validations are (eventually) managed in one place.
>>>> 
>>>> The PR can still be changed to do whatever you want. I've merely laid
>>>> down the groundwork that can be built upon. If you want it to try to
>>>> run all the tests, I can do that. If you merely want it to replace
>>>> Travis and run no tests (yet), I can do that. If you want me to add
>>>> jdk8 to the matrix build instead of just jdk11, I can do that. If you
>>>> want to have a build with jdk11, but run the full ITs with jdk8, I can
>>>> do that. If you want to try to run the full ITs with both jdk8 and
>>>> jdk11, I can do that. Just let me know what you want and I will update
>>>> the PR accordingly.
>>>> 
>>>> In essence, there are *far* more options available to the project than
>>>> the "option 1" and "option 2" described below. It is the enormous
>>>> possibility space that GH Actions opens up that I think is the real
>>>> advantage to the project, far more than any other advantages.
>>>> 
>>>>> 
>>>>> Here the discussion is more about
>>>>> "Use GH actions and rerun the tests" (option 1)
>>>>> vs
>>>>> "Use ASF Jenkins for everything + Travis for PRs on additional
>>>>> architectures" (option 2, that is to keep the current situation)
>>>>> 
>>>> 
>>>> My PR does keep the Travis build for s390x builds (without tests),
>>>> because getting that arch building in GH Actions is tricky, so I
>>>> didn't try it on my first attempt. It may be possible to get that to
>>>> build also.
>>>> 
>>>>> Regards
>>>>> Enrico
>>>>> [1] https://github.com/apache/zookeeper/pull/1508
>>>> 
>>>> I hope that clarifies a little bit of what I was trying to accomplish
>>>> in my PR, since my intentions may not be identical to what others were
>>>> hoping it would achieve. Feel free to ask me additional questions, or
>>>> to direct me to make changes to my PR in any way. I realize that this
>>>> is something that may not get merged right away (or ever)...
>>>> especially if it doesn't achieve what you were hoping it would
>>>> achieve. But, I still think there are advantages to using GH Actions
>>>> that are worth considering.
>>>> 
>>>> Thanks,
>>>> Christopher
>>> 
>>> 
>>> 
>
