Re: RFR: 8287906: Rewrite of GitHub Actions (GHA) sanity tests [v12]

Christian Stein Mon, 13 Jun 2022 08:05:40 -0700

On Mon, 13 Jun 2022 06:45:49 GMT, Magnus Ihse Bursie <i...@openjdk.org> wrote:


>> With project Skara, the ability to run a set of sanity build and test jobs 
>> on selected platforms was added. This functionality was driven by 
>> `.github/workflows/submit.yml`. This file unfortunately lacks any real 
>> structure, and contains a lot of code duplication and redundancy. This has 
>> made it hard to add functionality, and new platforms to test, and it has 
>> made it even harder to debug issues. (This is hard enough as it is, since we 
>> have no direct access to the platforms that GHA runs on.)
>> 
>> Since the GHA tests are important for a large subset of the community, we 
>> need to do better. 
>> 
>> ## GitHub Actions framework rewrite
>>  
>> This is a complete overhaul of the GHA testing framework. I started out 
>> trying to just tease the old `submit.yml` apart, trying to de-duplicate 
>> code, but I soon realized a much more thorough rework was needed.
>> 
>> ### Design description
>> 
>> The principle for the new design was to avoid code duplication, and to 
>> improve readability of the code. The latter is extra important since the GHA 
>> "language" is very limited, needs a lot of quirks and workarounds, and is 
>> probably not well known by many OpenJDK developers. I've strived to find 
>> useful layers of abstraction to make the expressions as clear as possible.
>> 
>> Unfortunately, the Workflow/Action YAML language is quite limited. There are 
>> two ways to avoid duplication, "local composite actions" and "callable 
>> workflows". They both have several limitations:
>> 
>>  * "Callable workflows" can only be used with a single level of 
>> indirection. They are (apparently) inlined into the "calling workflow" at 
>> run time, and as such, they are present without having to check out the 
>> source code. (Which is a lengthy process.)
>> 
>>  * "Local composite actions" can use other actions, but you must start by 
>> checking out the repo.
>> 
>> To use the strength of both kinds of sub-modules, I'm using "callable 
>> workflows" from `main.yml` to call `build-<platform>.yml` and `test.yml`. It 
>> is not allowed to mix "strategies" (that is, the method of automatically 
>> creating a test matrix) when calling callable workflows, so I needed to have 
>> some amount of duplication in `main.yml` that could have been avoided 
>> otherwise.
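>> 
>> As a rough illustration of this layout, a caller and a callable workflow 
>> could look like the sketch below. This is not taken from the actual files 
>> in this PR: the `platform` input, the job names and the trigger are 
>> assumptions made up for the example.
>> 
>> ```yaml
>> # .github/workflows/main.yml (hypothetical excerpt): the calling workflow
>> name: 'Sanity checks'
>> on:
>>   push:
>> 
>> jobs:
>>   build-linux:
>>     # A job that calls a workflow has `uses` instead of `steps`
>>     uses: ./.github/workflows/build-linux.yml
>>     with:
>>       platform: linux-x64   # assumed input name
>> 
>>   test-linux:
>>     needs: build-linux
>>     uses: ./.github/workflows/test.yml
>>     with:
>>       platform: linux-x64
>> ---
>> # .github/workflows/build-linux.yml (hypothetical excerpt): the callee
>> on:
>>   workflow_call:
>>     inputs:
>>       platform:
>>         required: true
>>         type: string
>> 
>> jobs:
>>   build:
>>     runs-on: ubuntu-latest
>>     steps:
>>       - uses: actions/checkout@v3
>>       - run: echo "Building for ${{ inputs.platform }}"
>> ```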
>> 
>> All the callable workflows need to check out the source code anyway, so 
>> there is no real additional cost of using "local composite actions" for 
>> abstraction of these workflows. (A bit of a lucky break.) I've created "high 
>> level" actions, corresponding to something like a function call. The goal 
>> here was both to avoid duplication, and to improve readability of the 
>> workflows.
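>> 
>> A "local composite action" used as a function call could be sketched as 
>> follows. The action name `do-build`, its `make-target` input and the step 
>> names are hypothetical and only serve to show the mechanism; the real 
>> actions in the PR may look different.
>> 
>> ```yaml
>> # .github/actions/do-build/action.yml (hypothetical local composite action)
>> name: 'Build'
>> description: 'Run make with the given target'
>> inputs:
>>   make-target:
>>     description: 'Target to pass to make'
>>     required: true
>> runs:
>>   using: composite
>>   steps:
>>     - name: 'Build'
>>       run: make ${{ inputs.make-target }}
>>       shell: bash   # composite `run` steps must name their shell
>> ---
>> # Excerpt from a job in a callable workflow: the repo must be checked out
>> # before a local action can be referenced by path
>> steps:
>>   - name: 'Checkout the JDK source'
>>     uses: actions/checkout@v3
>>   - name: 'Build'
>>     uses: ./.github/actions/do-build
>>     with:
>>       make-target: 'product-bundles'
>> ```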
>> 
>> The four `build-<platform>.yml` files are very similar. But at the end of 
>> the day, only about 50% of the source code is shared, and the 
>> platform-specific changes permeate the files. So I decided to keep them 
>> separate, since mixing them all into one would have made a mess, due to the 
>> lack of proper abstraction mechanisms. But that also means that if we change 
>> platform-independent code in building, we need to remember to update it in 
>> all four places.
>> 
>> In the strictest sense, this is a "refactoring" in that the functionality 
>> should be equal to the old `submit.yml`. The same platforms should build, 
>> with the same arguments, and the same tests should run. When I look at the 
>> code now, I see lots of potential for improvement here, by rethinking what 
>> we do run. But let's save that discussion for the next PR.
>> 
>> There is one major change, though. Windows is no longer running on Cygwin, 
>> but on MSYS2. This was not really triggered by the recurring build issues on 
>> Cygwin (though that certainly did help me in thinking I made the right 
>> choice), but by the sheer impossibility of getting Cygwin to behave as a 
>> normal Unix shell on GHA Windows hosts. I spent countless hours trying to 
>> work around limitations, by setting `SHELLOPTS=igncr`, by running 
>> `set +x posix` to turn off the POSIX compliance mode that kept turning on by 
>> itself and made bash choke on several of our scripts, by playing tricks with 
>> the `PATH`, but in the end to no avail. There was no single combination of 
>> hacks and workarounds that could get us past the entire chain from 
>> configure, to build, to testing. (The old solution used PowerShell instead 
>> to get around 
>> these limitations.) I'm happy to report that I have had absolutely zero 
>> issues with MSYS2 since I made the switch (and understood how to set the 
>> PATH properly), and I'm seriously considering switching stance to recommend 
>> using MSYS2 instead of Cygwin as the primary winenv for building the JDK.
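>> 
>> For readers unfamiliar with MSYS2 on GHA, one common way to get an MSYS2 
>> bash on a Windows runner is the `msys2/setup-msys2` action. The sketch 
>> below is only an assumption about how such a job can be wired up, not a 
>> copy of what this PR does; the package list and the final command are 
>> placeholders.
>> 
>> ```yaml
>> jobs:
>>   build-windows:
>>     runs-on: windows-latest
>>     defaults:
>>       run:
>>         shell: msys2 {0}   # custom shell provided by msys2/setup-msys2
>>     steps:
>>       - uses: actions/checkout@v3
>>       - uses: msys2/setup-msys2@v2
>>         with:
>>           install: 'make autoconf'   # example package list only
>>           path-type: minimal         # controls how much of the Windows PATH is inherited
>>       - run: bash configure --help
>> ```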
>> 
>> ### Example run
>> 
>> A good example of what a run looks like with the new GHA system is [the run 
>> for this PR](https://github.com/magicus/jdk/actions/runs/2454577164).
>> 
>> ### New features
>> 
>> While the primary focus was to convert the old system to a new framework, 
>> more accommodating to development, and to wait with further enhancements for 
>> the future, I have added a few additional features already in this PR. Most 
>> of them are related to needs that arose during development of this PR.
>> 
>> * A build failure summary, similar to the recently added test failure 
>> summary, is added when the build step fails
>> 
>> * The test reporting has been extended to all platforms, including Windows
>> 
>> * Test reporting has been improved slightly, and gotten multiple bug fixes
>> 
>> * All artifacts are now available for individual download. This includes:
>> 
>>   * The build bundles, per platform
>>   * The test results, per platform and test suite
>>   * Build failure logs, in case of build failure
>> 
>>   The build bundles have a retention period of 24 h, but the rest uses 
>> GitHub's default retention period (currently 90 days). The idea is that you 
>> can use GHA to download builds for platforms you might not have access to, 
>> but after that, conserving the builds does not make sense. GitHub currently 
>> provides free, unlimited storage (within the retention period) for 
>> artifacts, so we can afford this.
>> 
>> * The GHA process starts up much faster, which means that e.g. a build 
>> failure on an exotic platform will show up earlier. This will not really 
>> affect the overall run time though, since it is bounded by variables such as 
>> queuing for workers, and waiting on tests with somewhat arbitrary run 
>> times to finish.
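>> 
>> The different retention periods for artifacts mentioned in the list above 
>> can be expressed with the `retention-days` input of 
>> `actions/upload-artifact`. The following is a minimal sketch; the artifact 
>> names and paths are hypothetical.
>> 
>> ```yaml
>> steps:
>>   - name: 'Upload bundles'
>>     uses: actions/upload-artifact@v3
>>     with:
>>       name: bundles-linux-x64        # hypothetical artifact name
>>       path: bundles                  # hypothetical directory
>>       retention-days: 1              # keep build bundles for 24 h only
>>   - name: 'Upload test results'
>>     uses: actions/upload-artifact@v3
>>     if: always()                     # also upload when an earlier step failed
>>     with:
>>       name: results-linux-x64-tier1  # hypothetical artifact name
>>       path: test-results             # hypothetical directory
>>       # no retention-days given: the repository default applies
>> ```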
>> 
>> ### Additional changes outside GHA
>> 
>> I also needed to make a few tweaks to the build system to play nice with the 
>> new GHA code.
>> 
>> * The build failure summary is now stored in 
>> build/$BUILD/make-support/failure-summary.log
>> 
>> * The configure summary now indicates what devkit or sysroot is used, if any
>> 
>> * The --with-sysroot argument is now properly normalized
>> 
>> ### Test failures
>> 
>> A handful of tests, which rely on shell behavior, turned out to fail on 
>> Windows when running under MSYS2. I have filed separate bugs, and submitted 
>> PRs, to get these fixed:
>> 
>> * https://bugs.openjdk.org/browse/JDK-8287902
>> 
>> * https://bugs.openjdk.org/browse/JDK-8287895
>
> Magnus Ihse Bursie has updated the pull request incrementally with one 
> additional commit since the last revision:
> 
>   Remove bundle artifacts

Splitting logic into smaller actions, scripts and workflow definitions is great!
`jtreg`-related changes look good to me.

-------------

Marked as reviewed by cstein (Author).

PR: https://git.openjdk.org/jdk/pull/9063
