Re: RFR: 8287906: Rewrite of GitHub Actions (GHA) sanity tests [v3]

Magnus Ihse Bursie Thu, 09 Jun 2022 05:45:34 -0700

> With project Skara, the ability to run a set of sanity build and test jobs on 
> selected platforms was added. This functionality was driven by 
> `.github/workflows/submit.yml`. This file unfortunately lacks any real 
> structure, and contains a lot of code duplication and redundancy. This has 
> made it hard to add functionality, and new platforms to test, and it has made 
> it even harder to debug issues. (This is hard enough as it is, since we have 
> no direct access to the platforms that GHA runs on.)
> 
> Since the GHA tests are important for a large subset of the community, we 
> need to do better. 
> 
> ## GitHub Actions framework rewrite
>  
> This is a complete overhaul of the GHA testing framework. I started out 
> trying to just tease the old `submit.yml` apart, trying to de-duplicate code, 
> but I soon realized a much more thorough rework was needed.
> 
> ### Design description
> 
> The principle for the new design was to avoid code duplication, and to 
> improve readability of the code. The latter is extra important since the GHA 
> "language" is very limited, needs a lot of quirks and workarounds, and is 
> probably not well known by many OpenJDK developers. I've strived to find 
> useful layers of abstraction to make the expressions as clear as possible.
> 
> Unfortunately, the Workflow/Action YAML language is quite limited. There are 
> two ways to avoid duplication, "local composite actions" and "callable 
> workflows". They both have several limitations:
> 
>  * "Callable workflows" can only be used with a single level of indirection 
> (a callable workflow cannot itself call another one). They are (apparently) 
> inlined into the "calling workflow" at run time, and as such, they are 
> present without having to check out the source code. (Which is a lengthy 
> process.)
> 
>  * "Local composite actions" can use other actions, but you must start by 
> checking out the repo.
> 
> To use the strength of both kinds of sub-modules, I'm using "callable 
> workflows" from `main.yml` to call `build-<platform>.yml` and `test.yml`. It 
> is not allowed to mix "strategies" (that is, the method of automatically 
> creating a test matrix) when calling callable workflows, so I needed to have 
> some amount of duplication in `main.yml` that could have been avoided 
> otherwise.
> 
> All the callable workflows need to check out the source code anyway, so there 
> is no real additional cost of using "local composite actions" for abstraction 
> of these workflows. (A bit of a lucky break.) I've created "high level" 
> actions, corresponding to something like a function call. The goal here was 
> both to avoid duplication, and to improve readability of the workflows.
> 
> The four `build-<platform>.yml` files are very similar. But at the end of the 
> day, only about 50% of the source code is shared, and the platform specific 
> changes permeate the files. So I decided to keep them separate, since 
> mixing them all into one would have made a mess, due to the lack of proper 
> abstraction mechanisms. But that also means that if we change platform 
> independent code in building, we need to remember to update it in all four 
> places.
> 
> In the strictest sense, this is a "refactoring" in that the functionality 
> should be equal to the old `submit.yml`. The same platforms should build, 
> with the same arguments, and the same tests should run. When I look at the 
> code now, I see lots of potential for improvement here, by rethinking what we 
> do run. But let's save that discussion for the next PR.
> 
> There is one major change, though. Windows is no longer running on Cygwin, 
> but on MSYS2. This was not really triggered by the recurring build issues on 
> Cygwin (though that certainly did help me in thinking I made the right 
> choice), but the sheer impossibility of getting Cygwin to behave as a normal 
> unix shell on GHA Windows hosts. I spent countless hours trying to work 
> around limitations, by setting `SHELLOPTS=igncr`, by running `set +o posix` 
> to turn off the POSIX compliance mode that kept turning on by itself and made 
> bash choke on several of our scripts, by playing tricks with the `PATH`, but 
> in the end to no avail. There was no single combination of hacks and 
> workarounds that could get us past the entire chain from configure, to build, 
> to testing. (The old solution used PowerShell instead to get around these 
> limitations.) I'm happy to report that I have had absolutely zero issues with 
> MSYS2 since I made the switch (and understood how to set the PATH properly), 
> and I'm seriously considering switching stance to recommend using MSYS2 
> instead of Cygwin as the primary winenv for building the JDK.
> 
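As a side note on the POSIX compliance mode mentioned above: a minimal
sketch, assuming a plain bash with no Cygwin specifics, of how that mode can
be inspected and switched off. The helper name `posix_state` is made up for
illustration and is not part of the PR.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not from the PR): print "on" if bash is currently
# running in POSIX compliance mode, "off" otherwise.
posix_state() {
  if [[ -o posix ]]; then
    echo on
  else
    echo off
  fi
}

# Turn POSIX mode on explicitly, then check it.
set -o posix
posix_state

# Turn POSIX mode off again; note the "+o" form, which disables an option.
set +o posix
posix_state
```

The same state is also visible as the `posix` entry in the colon-separated
`$SHELLOPTS` variable.
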
> ### Example run
> 
> A good example of what a run looks like with the new GHA system is [the run 
> for this PR](https://github.com/magicus/jdk/actions/runs/2454577164).
> 
> ### New features
> 
> While the primary focus was to convert the old system to a new framework, 
> more accommodating to development, and to wait with further enhancements for 
> the future, I have added a few features already in this PR. Most of 
> them are related to needs that arose during development of this PR.
> 
> * A build failure summary, similar to the recently added test failure 
> summary, is added when the build step fails
> 
> * The test reporting has been extended to all platforms, including Windows
> 
> * Test reporting has been improved slightly, and gotten multiple bug fixes
> 
> * All artifacts are now available for individual download. This includes:
> 
>   * The build bundles, per platform
>   * The test results, per platform and test suite
>   * Build failure logs, in case of build failure
> 
>   The build bundles have a retention period of 24 h, but the rest use 
> GitHub's default retention period (currently 90 days). The idea is that you 
> can use GHA to download builds for platforms you might not have access to, 
> but after that, conserving the builds does not make sense. GitHub currently 
> provides free, unlimited storage (within the retention period) for artifacts, 
> so we can afford this.
> 
> * The GHA process starts up much faster, which means that e.g. a build failure 
> on an exotic platform will show up earlier. This will not really affect the 
> overall run time though, since it is bounded by variables such as queuing for 
> workers, and waiting on tests with somewhat arbitrary run times to finish.
> 
> ### Additional changes outside GHA
> 
> I also needed to make a few tweaks to the build system to play nice with the 
> new GHA code.
> 
> * The build failure summary is now stored in 
> build/$BUILD/make-support/failure-summary.log
> 
> * The configure summary now indicates what devkit or sysroot is used, if any
> 
> * The --with-sysroot argument is now properly normalized
> 
> ### Test failures
> 
> A handful of tests, which rely on shell behavior, turned out to fail on 
> Windows when running under MSYS2. I have filed separate bugs, and submitted 
> PRs, to get these fixed:
> 
> * https://bugs.openjdk.org/browse/JDK-8287902
> 
> * https://bugs.openjdk.org/browse/JDK-8287895


Magnus Ihse Bursie has updated the pull request incrementally with one 
additional commit since the last revision:

  Use Java 11 to build jtreg. Also temporarily switch out jtreg source ref to 
PR that fixes build on msys2. (This will be updated once the fix is in an 
official jtreg release.)

-------------

Changes:
  - all: https://git.openjdk.java.net/jdk/pull/9063/files
  - new: https://git.openjdk.java.net/jdk/pull/9063/files/d75222c2..220f9209

Webrevs:
 - full: https://webrevs.openjdk.java.net/?repo=jdk&pr=9063&range=02
 - incr: https://webrevs.openjdk.java.net/?repo=jdk&pr=9063&range=01-02

  Stats: 2 lines in 1 file changed: 0 ins; 0 del; 2 mod
  Patch: https://git.openjdk.java.net/jdk/pull/9063.diff
  Fetch: git fetch https://git.openjdk.java.net/jdk pull/9063/head:pull/9063

PR: https://git.openjdk.java.net/jdk/pull/9063
