Re: RFR: 8287366: Improve test failure reporting in GHA

Magnus Ihse Bursie Mon, 06 Jun 2022 14:11:51 -0700

On Mon, 6 Jun 2022 12:57:25 GMT, Jaikiran Pai <j...@openjdk.org> wrote:


>> It is currently both tricky and tedious to figure out what went wrong when a 
>> jtreg test fails in GHA.
>> 
>> We should utilize the full potential of GitHub Action summaries and error 
>> annotations to make finding failures easier and more discoverable.
>> 
>> With this PR, the overview of failures is presented on the "Summary" page 
>> for the action (the top-most line to the left, with the outline house icon). 
>> Below the `submit.yml` dependency graph, you'll find the annotations, which 
>> will look like this:
>> 
>> 
>> Linux x86 (jdk/tier1 part 1)
>> Test run reported 34 test failure(s) and 0 error(s). See summary for details.
>> 
>> 
>> Below the annotations follow the summaries. Go have a look at the runs for 
>> this PR to see what it looks like! In short, there is a separate summary per 
>> test job. The first part lists the names of the failed tests. This will 
>> always be included. Below this (with links from the summary list) is 
>> detailed information for each failed test. This includes the jtreg output, 
>> and the `hs_err` file(s), if present. The latter part is subject to a 1 MB 
>> limit imposed by GitHub. If this limit is exceeded, no detailed information 
>> at all is presented (sorry 'bout that; GitHub's rules).
>> 
>> This PR is deliberately based on a commit prior to the fix for JDK-8287137 
>> (Problemlist failing x86_32 tests after Loom integration), so you can see 
>> for yourself how the GHA runs look in case of a "train wreck" testing 
>> situation, like on x86 after Loom. As you can see, most of the output part 
>> of the summaries got larger than the 1 MB limit, which means they were not 
>> shown. Only the summary for `Linux x86 (hs/tier1 runtime)` displays as 
>> intended. OTOH, this shows that the system has a "graceful degradation" mode 
>> even for a large number of failures like this. And, since I don't see a Loom 
>> v2.0 coming anytime soon, I believe this number of failed tests is unlikely 
>> to be a realistic scenario.
>> 
>> Finally: the duplication in submit.yml is really, really annoying. :-( I 
>> have copied the same code block to three places. The fourth place, for 
>> Windows, does not get any support at this time. Concurrently with this 
>> change, I have started a separate branch where I split up submit.yml into 
>> reusable parts, using "callable workflows" and "custom actions". As part of 
>> this effort, I will also change the Windows jobs to use Cygwin bash instead 
>> of PowerShell. Until then, I could not be bothered to even think about 
>> implementing this functionality in PS. When that change is integrated, 
>> Windows will get this functionality for free, too.
>
>> With this PR, the overview of failures is presented on the "Summary" page 
>> for the action (the top-most line to the left, with the outline house icon).
> 
> @magicus, thank you. This is really useful. I didn't even know that this 
> "Summary" page existed. I now checked this page on one of my PRs (which 
> includes this commit) and it does indeed make it much simpler to analyze 
> these failures.

@jaikiran Thanks for the kind words. I think I should perhaps do some tweaking 
to the Skara bots that link to the GHA runs, so it is easier to go to the 
summary page.
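
For readers who have not used job summaries before, here is a rough sketch of the idea described in the quoted text: the list of failed tests is always written, and the detailed output is dropped as a whole if the result would exceed the size limit. This is not the code from the PR. The function names, the Markdown layout and the exact byte count used for "1 MB" are assumptions made for illustration; only the `GITHUB_STEP_SUMMARY` file and the `::error` workflow command are standard GitHub Actions features.

```python
import os

# Assumption: "1 MB" is taken here as 1024 * 1024 bytes; the real limit is GitHub's.
SUMMARY_LIMIT_BYTES = 1024 * 1024


def build_summary(job_name, failed_tests, details):
    """Return Markdown for one test job (hypothetical layout).

    failed_tests: list of failed test names.
    details: dict mapping a test name to its captured output
             (for example jtreg output or hs_err contents).
    """
    header = [f"## {job_name}", "", "Failed tests:", ""]
    header += [f"- {name}" for name in failed_tests]
    short = "\n".join(header) + "\n"

    sections = [f"\n### {name}\n\n{details.get(name, '')}\n" for name in failed_tests]
    full = short + "".join(sections)

    # Graceful degradation: if the full text is too large, keep only the list.
    if len(full.encode("utf-8")) > SUMMARY_LIMIT_BYTES:
        return short
    return full


def error_annotation(job_name, failures, errors):
    """Format an error annotation using the `::error` workflow command."""
    return (f"::error title={job_name}::Test run reported {failures} test failure(s) "
            f"and {errors} error(s). See summary for details.")


def publish(job_name, failed_tests, details, errors=0):
    """Print the annotation and append the summary if running inside GHA."""
    print(error_annotation(job_name, len(failed_tests), errors))
    path = os.environ.get("GITHUB_STEP_SUMMARY")
    if path:  # only set inside a GitHub Actions step
        with open(path, "a", encoding="utf-8") as f:
            f.write(build_summary(job_name, failed_tests, details))
```

Checking the size of the complete text in one go, rather than per test, mirrors the all-or-nothing behaviour described above, where either every detail section is shown or none is.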

-------------

PR: https://git.openjdk.java.net/jdk/pull/8901
