cloud-fan opened a new pull request, #55563:
URL: https://github.com/apache/spark/pull/55563
### What changes were proposed in this pull request?
Today, three common Spark CI failures each cascade into ~7 red checks that
surface only a generic "Process completed with exit code 1", while the actual
file:line stays buried in a multi-megabyte sbt log. This PR teaches the three
scripts involved to emit GitHub Actions `::error` annotations so the real
culprit shows up inline on the PR's "Files changed" tab.
**1. `dev/scalastyle` — scalastyle violations as annotations.** When running
under GitHub Actions, parse scalastyle's native `error file=PATH message=...
line=N` console output and re-emit it as `::error
file=...,line=...,title=Scalastyle::...`. A single violation in catalyst that
previously cascaded into Linters / Java 17 Maven / Java 25 Maven /
Documentation generation / sparkr / Docker integration / TPC-DS now appears as
one inline annotation in every failing job.
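A minimal sketch of this conversion, written in Python for illustration (the actual post-processor lives in the `dev/scalastyle` shell script and may look quite different). The function names, the regex, the optional `column=` field, and the `repo_root` parameter are assumptions, not the PR's code; the escaping follows GitHub's workflow-command rules for properties and message data.

```python
import os
import re

# Assumed shape, per the description above:
#   error file=PATH message=... line=N [column=N]
SCALASTYLE_RE = re.compile(
    r"^error file=(?P<file>.+?) message=(?P<msg>.*?)"
    r"(?: line=(?P<line>\d+))?(?: column=(?P<col>\d+))?\s*$"
)


def escape_data(value):
    """Escape the message part of a workflow command."""
    return value.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")


def escape_property(value):
    """Escape a property value (file, title, ...) of a workflow command."""
    return escape_data(value).replace(":", "%3A").replace(",", "%2C")


def scalastyle_to_annotation(line, repo_root):
    """Return a `::error ...` command for one scalastyle line, or None."""
    match = SCALASTYLE_RE.match(line.strip())
    if match is None:
        return None
    # Annotations attach to files by repo-relative path.
    path = os.path.relpath(match.group("file"), repo_root)
    props = ["file=" + escape_property(path)]
    if match.group("line"):
        props.append("line=" + match.group("line"))
    if match.group("col"):
        props.append("col=" + match.group("col"))
    props.append("title=Scalastyle")
    return "::error " + ",".join(props) + "::" + escape_data(match.group("msg"))
```

In a real script this would presumably run only when the `GITHUB_ACTIONS` environment variable is `true`, with `repo_root` taken from `GITHUB_WORKSPACE`; lines that do not match are passed through untouched.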
**2. `docs/_plugins/build_api_docs.rb` — unidoc PR-scope hazard scan.**
Extend SPARK-56630's `diagnose_unidoc_failure` so its third branch (javadoc
died *before* per-class HTML generation, so no `Generating .../X.html` line is
available to pin a culprit) also runs a small PR-scope scan. The scan diffs the
changed files against the master tip (using the docs job's existing `Merged
commit` layout) and reports doc-tag patterns known to crash the standard
doclet's tree builder. Today only one pattern is checked: block-form
`@inheritDoc` (which is not a valid Javadoc tag — Spark's custom block tag is
the lowercase `@inheritdoc` registered via `SparkBuild.scala`'s `-tag
inheritdoc`). Each hit is also emitted as a GitHub annotation.
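The single pattern checked today can be illustrated with a small line-based matcher. This is a sketch in Python rather than the Ruby used by `build_api_docs.rb`; the regex, the function names, and the `Unidoc hazard` title are hypothetical, and collecting the PR's changed files (the diff against master) is deliberately left out.

```python
import re

# Block-form tag: a doc-comment line whose only content is `@inheritDoc`.
# Case-sensitive on purpose, so Spark's lowercase `@inheritdoc` does not match;
# the inline `{@inheritDoc}` form and a tag followed by text do not match either.
BLOCK_INHERITDOC_RE = re.compile(
    r"^\s*(?:/\*\*|\*)\s*@inheritDoc\s*(?:\*/)?\s*$"
)


def scan_for_block_inheritdoc(path, text):
    """Return (path, line_number) pairs for each block-form @inheritDoc."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if BLOCK_INHERITDOC_RE.match(line):
            hits.append((path, lineno))
    return hits


def format_annotation(path, lineno):
    """Render one hit as a GitHub Actions `::error` workflow command."""
    # Escape characters that are special inside a property value.
    safe_path = (
        path.replace("%", "%25").replace(":", "%3A").replace(",", "%2C")
    )
    message = (
        "block-form @inheritDoc can crash the standard doclet; "
        "Spark's block tag is lowercase @inheritdoc"
    )
    return f"::error file={safe_path},line={lineno},title=Unidoc hazard::{message}"
```

Matching whole lines keeps the scan cheap and makes the near-miss cases easy to reason about, at the cost of missing tags that share a line with other block tags.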
**3. `dev/run-tests.py` — Scala/Java compile errors as annotations.**
`exec_sbt` already streams sbt output line by line. Add a side-channel that, in
CI, matches sbt's canonical `[error] PATH:LINE[:COL]: message` shape and
re-emits it as `::error file=...,line=...,title=Compile error::...`.
genjavadoc-generated stub paths under `target/java/...` are filtered out —
those errors are intentionally non-fatal (`--ignore-source-errors` is set for
unidoc) and would otherwise drown the actually-actionable annotations.
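The matching and filtering described here could look roughly like the following. It is a sketch under stated assumptions, not the code added to `dev/run-tests.py`: the regex, the function name, the `repo_root` parameter, and the substring test for `target/java/` are all hypothetical, and it assumes sbt output without ANSI colour codes.

```python
import os
import re

# Assumed shape, per the description above: [error] PATH:LINE[:COL]: message
SBT_ERROR_RE = re.compile(
    r"^\[error\]\s+(?P<file>[^\s:][^:]*):(?P<line>\d+)"
    r"(?::(?P<col>\d+))?:\s*(?P<msg>.*)$"
)


def escape_data(value):
    """Escape the message part of a workflow command."""
    return value.replace("%", "%25").replace("\r", "%0D").replace("\n", "%0A")


def escape_property(value):
    """Escape a property value (file, title, ...) of a workflow command."""
    return escape_data(value).replace(":", "%3A").replace(",", "%2C")


def sbt_error_to_annotation(line, repo_root):
    """Return a `::error ...` command for one sbt output line, or None."""
    match = SBT_ERROR_RE.match(line.rstrip())
    if match is None:
        return None
    path = match.group("file")
    if os.path.isabs(path):
        path = os.path.relpath(path, repo_root)
    # Skip genjavadoc stubs: errors there are non-fatal for unidoc.
    if "/target/java/" in "/" + path.replace(os.sep, "/"):
        return None
    props = ["file=" + escape_property(path), "line=" + match.group("line")]
    if match.group("col"):
        props.append("col=" + match.group("col"))
    props.append("title=Compile error")
    return "::error " + ",".join(props) + "::" + escape_data(match.group("msg"))
```

Because the function only inspects a line and returns an optional extra string, it can sit beside the existing streaming loop: the original line is still printed as-is, and the annotation is printed in addition when one is produced.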
All three sites use the same workflow command syntax, so any failing job
annotated this way shows the exact violation inline next to the offending line
— no full job log needed, no cross-job navigation needed.
### Why are the changes needed?
Cascade failures are the dominant source of CI debug time on this project. A
single scalastyle violation in catalyst recently caused 7 red checks at once,
each surfacing only a generic "exit code 1" annotation; the user had to grep
through a multi-megabyte job log to find the actual file:line. Compile errors
and the recent block-form `@inheritDoc` unidoc crash fail the same way. GitHub
Actions has structured annotation support for exactly this case; Spark just
isn't using it yet for these three failure modes.
### Does this PR introduce _any_ user-facing change?
No — CI-only ergonomics. No change to how the checks succeed or fail; only
the in-PR rendering of failure detail.
### How was this patch tested?
Each parser was unit-checked locally with representative real samples taken
from prior CI runs:
- **scalastyle:** `dev/scalastyle`'s post-processor against scalastyle's
native console output — confirmed correct conversion to `::error
file=...,line=...,title=Scalastyle::...` syntax.
- **unidoc hazard scan:** the regex was checked against the pattern that
caused the recent SPARK-52729 unidoc crash (block-form `@inheritDoc`) plus
near-miss forms that must NOT match (`{@inheritDoc}` inline, lowercase
`@inheritdoc`, trailing text after the tag).
- **sbt compile errors:** matcher confirmed against real Spark `[error]
PATH:N:C: msg` lines and against genjavadoc stubs (`target/java/...`) that must
be filtered.
End-to-end validation will come from this PR's own CI run; if any of the
three emitters mis-classifies real output, the worst case is an extra noisy
annotation — the underlying tool output is unchanged.
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude (Anthropic)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]