cloud-fan opened a new pull request, #55563:
URL: https://github.com/apache/spark/pull/55563

   ### What changes were proposed in this pull request?
   
   Today, each of three common Spark CI failures cascades into ~7 red checks 
that surface only a generic "Process completed with exit code 1", while the 
actual file:line stays buried in a multi-megabyte sbt log. This PR teaches each 
of the three to emit GitHub Actions `::error` annotations so the real culprit 
shows up inline on the PR's "Files changed" tab.
   
   **1. `dev/scalastyle` — scalastyle violations as annotations.** When running 
under GitHub Actions, parse scalastyle's native `error file=PATH message=... 
line=N` console output and re-emit it as `::error 
file=...,line=...,title=Scalastyle::...`. A single violation in catalyst that 
previously cascaded into Linters / Java 17 Maven / Java 25 Maven / 
Documentation generation / sparkr / Docker integration / TPC-DS now appears as 
one inline annotation in every failing job.
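
   As an illustration of the conversion in item 1, here is a minimal Python 
sketch. It is not the code in `dev/scalastyle`. The names `SCALASTYLE_LINE` and 
`to_annotation`, the `repo_root` parameter, and the tolerated trailing 
`column=N` field are assumptions made for this example. It also assumes that 
annotation paths should be relative to the repository root.

```python
import posixpath
import re

# Assumed shape of a scalastyle console line:
#   error file=PATH message=TEXT line=N [column=M]
SCALASTYLE_LINE = re.compile(
    r"^error file=(?P<file>.+?) message=(?P<message>.*) line=(?P<line>\d+)"
    r"(?: column=\d+)?$"
)


def to_annotation(line, repo_root):
    """Return a `::error` workflow command for one scalastyle line, or None."""
    m = SCALASTYLE_LINE.match(line.strip())
    if m is None:
        return None  # not a violation line; pass it through untouched
    # Hypothetical choice: report the path relative to the checkout root.
    path = posixpath.relpath(m.group("file"), repo_root)
    return (
        f"::error file={path},line={m.group('line')},"
        f"title=Scalastyle::{m.group('message')}"
    )
```

   Lines that do not match return `None`, so the original scalastyle output 
can still be printed unchanged alongside any annotations.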
   
   **2. `docs/_plugins/build_api_docs.rb` — unidoc PR-scope hazard scan.** 
Extend SPARK-56630's `diagnose_unidoc_failure` so its third branch (javadoc 
died *before* per-class HTML generation, so no `Generating .../X.html` line is 
available to pin a culprit) also runs a small PR-scope scan. The scan diffs the 
changed files against the master tip (using the docs job's existing `Merged 
commit` layout) and reports doc-tag patterns known to crash the standard 
doclet's tree builder. Today only one pattern is checked: block-form 
`@inheritDoc` (which is not a valid Javadoc tag — Spark's custom block tag is 
the lowercase `@inheritdoc` registered via `SparkBuild.scala`'s `-tag 
inheritdoc`). Each hit is also emitted as a GitHub annotation.
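
   The pattern check in item 2 can be sketched as below. The real scan lives 
in the Ruby plugin and diffs against the master tip; this Python version only 
illustrates the matching step on text that has already been read. The regex, 
the function name `scan_source`, and the `Unidoc hazard` title are assumptions, 
not the PR's actual code.

```python
import re

# Block-form tag: a doc-comment line holding only `@inheritDoc`
# (optionally after the leading `*`). Matching is case-sensitive, so the
# lowercase `@inheritdoc` custom tag is not flagged, and the inline
# `{@inheritDoc}` form fails because of the leading brace.
BLOCK_INHERITDOC = re.compile(r"^\s*\*?\s*@inheritDoc\s*$")


def scan_source(path, text):
    """Return one `::error` command per block-form `@inheritDoc` line."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if BLOCK_INHERITDOC.match(line):
            hits.append(
                f"::error file={path},line={line_no},"
                "title=Unidoc hazard::block-form @inheritDoc is not a valid tag"
            )
    return hits
```

   Anchoring the pattern at both ends of the line is what excludes the 
near-miss forms: inline braces, lowercase spelling, and trailing text after the 
tag.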
   
   **3. `dev/run-tests.py` — Scala/Java compile errors as annotations.** 
`exec_sbt` already streams sbt output line by line. Add a side-channel that, in 
CI, matches sbt's canonical `[error] PATH:LINE[:COL]: message` shape and 
re-emits it as `::error file=...,line=...,title=Compile error::...`. 
genjavadoc-generated stub paths under `target/java/...` are filtered out — 
those errors are intentionally non-fatal (`--ignore-source-errors` is set for 
unidoc) and would otherwise drown the actually-actionable annotations.
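
   A minimal sketch of the matcher described in item 3, assuming plain sbt 
output without ANSI color codes. The names `SBT_ERROR` and 
`compile_error_annotation` and the `repo_root` parameter are hypothetical; the 
actual side-channel in `dev/run-tests.py` may be structured differently.

```python
import re

# sbt's compile-error shape: [error] PATH:LINE[:COL]: message
SBT_ERROR = re.compile(
    r"^\[error\]\s+(?P<file>[^\s:]+):(?P<line>\d+)(?::\d+)?:\s+(?P<message>.*)$"
)


def compile_error_annotation(line, repo_root=""):
    """Return a `::error` command for one sbt line, or None to skip it."""
    m = SBT_ERROR.match(line.rstrip("\r\n"))
    if m is None:
        return None
    path = m.group("file")
    # genjavadoc stubs live under target/java/; their errors are non-fatal.
    if "target/java/" in path:
        return None
    # Assumed convenience: strip the checkout prefix so the path is relative.
    prefix = repo_root.rstrip("/") + "/"
    if repo_root and path.startswith(prefix):
        path = path[len(prefix):]
    return (
        f"::error file={path},line={m.group('line')},"
        f"title=Compile error::{m.group('message')}"
    )
```

   Summary lines such as `[error] (module / Compile / compileIncremental) 
Compilation failed` carry no `PATH:LINE` prefix, so they fall through as `None` 
and produce no annotation.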
   
   All three sites use the same workflow command syntax, so any failing job 
annotated this way shows the exact violation inline next to the offending line, 
with no need to open the full job log or navigate across jobs.
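
   For reference, the shared workflow command syntax can be produced by a 
small helper like the one below. This is a sketch, not code from the PR: the 
function names are hypothetical, gating on the `GITHUB_ACTIONS` environment 
variable is an assumption about how "running in CI" is detected, and the 
percent-escaping follows the rules used by GitHub's own Actions toolkit.

```python
import os


def escape_data(value):
    """Escape the message part of a workflow command."""
    return value.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")


def escape_property(value):
    """Escape a property value, where `:` and `,` are also delimiters."""
    return escape_data(value).replace(":", "%3A").replace(",", "%2C")


def error_command(file, line, title, message):
    """Format one `::error` workflow command."""
    return (
        f"::error file={escape_property(file)},line={int(line)},"
        f"title={escape_property(title)}::{escape_data(message)}"
    )


def emit(file, line, title, message, env=os.environ):
    """Print the annotation only under GitHub Actions; report whether it ran."""
    if env.get("GITHUB_ACTIONS") != "true":
        return False
    print(error_command(file, line, title, message))
    return True
```

   Escaping matters mainly for multi-line compiler messages: an unescaped 
newline would end the command early and truncate the annotation.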
   
   ### Why are the changes needed?
   
   Cascade failures are the dominant source of CI debug time on this project. A 
single scalastyle violation in catalyst recently caused 7 red checks at once, 
each surfacing only a generic "exit code 1" annotation; the user had to grep 
through a multi-megabyte job log to find the actual file:line. Compile errors 
and the recent block-form `@inheritDoc` unidoc crash follow the same pattern. 
GitHub Actions has structured annotation support for exactly this case; Spark 
just isn't using it yet for these three failure modes.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No — CI-only ergonomics. No change to how the checks succeed or fail; only 
the in-PR rendering of failure detail.
   
   ### How was this patch tested?
   
   Each parser was unit-checked locally with representative real samples taken 
from prior CI runs:
   
   - **scalastyle:** `dev/scalastyle`'s post-processor against scalastyle's 
native console output — confirmed correct conversion to `::error 
file=...,line=...,title=Scalastyle::...` syntax.
   - **unidoc hazard scan:** the regex was checked against the pattern that 
caused the recent SPARK-52729 unidoc crash (block-form `@inheritDoc`) plus 
near-miss forms that must NOT match (`{@inheritDoc}` inline, lowercase 
`@inheritdoc`, trailing text after the tag).
   - **sbt compile errors:** matcher confirmed against real Spark `[error] 
PATH:N:C: msg` lines and against genjavadoc stubs (`target/java/...`) that must 
be filtered.
   
   End-to-end validation will come from this PR's own CI run; if any of the 
three emitters mis-classifies real output, the worst case is an extra noisy 
annotation — the underlying tool output is unchanged.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude (Anthropic)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

