[PR] [SPARK-56630][INFRA] Surface javadoc crash culprit in unidoc failure output [spark]

via GitHub Fri, 24 Apr 2026 20:40:00 -0700


cloud-fan opened a new pull request, #55548:
URL: https://github.com/apache/spark/pull/55548


   ### What changes were proposed in this pull request?
   
   Adds a diagnostic banner to the unidoc step in 
`docs/_plugins/build_api_docs.rb`. When `build/sbt unidoc` fails, the script 
now scans the captured sbt output and prints a framed summary naming the 
`<Class>.html` page that javadoc was generating when it died, the inferred 
source class to audit, and a one-paragraph hint about the usual scaladoc triggers.
   
   Implementation:
   - `stream_and_capture` tees sbt output to both stdout and 
`target/unidoc-build.log` (Ruby-only, no shell `pipefail` reliance).
   - `diagnose_unidoc_failure` finds the last `Generating .../<Class>.html...` 
line before `javadoc exited with exit code N` and prints a culprit-pointer 
banner. ANSI colour codes are stripped before regex matching.
   - When the failure mode doesn't match the mid-HTML-crash pattern (e.g. 
scaladoc failure, sbt env issue), the banner says so and points back to the 
full log.
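
A minimal sketch of the tee step described in the first bullet, assuming `Open3.popen2e` from the Ruby standard library is an acceptable way to merge stdout and stderr. The signature, the `out:` parameter, and the return value are illustrative assumptions, not the code in this PR.

```ruby
require "open3"
require "fileutils"

# Hypothetical sketch: run `cmd` (an array of strings), echo every output line
# to `out`, write the same line to `log_path`, and return [captured_text, status].
# No shell is involved, so there is no pipeline whose exit status could be masked.
def stream_and_capture(cmd, log_path, out: $stdout)
  FileUtils.mkdir_p(File.dirname(log_path))
  captured = +""
  status = nil
  File.open(log_path, "w") do |log|
    # popen2e merges the child's stdout and stderr into a single stream.
    Open3.popen2e(*cmd) do |stdin, out_err, wait_thr|
      stdin.close
      out_err.each_line do |line|
        out.print(line)   # live output for the CI console
        log.write(line)   # persistent copy for later inspection
        captured << line  # in-memory copy for the diagnostic pass
      end
      status = wait_thr.value
    end
  end
  [captured, status]
end
```

Under these assumptions the caller checks `status.success?` itself and only runs the diagnostic pass on failure.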
   
   ### Why are the changes needed?
   
   Today, when javadoc hard-exits during unidoc HTML generation -- typically 
because of a specific scaladoc construct (e.g. wiki-style `[[Class]]` links or 
backtick-inline code refs) in an exposed Scala source -- the failing PR's CI 
log shows ~100 `[error]` lines on `target/java/...` files. Those errors are 
benign: they come from genjavadoc-emitted Java stubs (`static public abstract R 
apply(T1, T2, T3, T4)`) that every PR produces, and `javadoc` always complains 
about them but normally still finishes. They are not the cause of the failure.
   
   The actual signal is the last `Generating .../<Class>.html...` line before 
`javadoc exited with exit code 1`, which a developer has to find by hand in a 
multi-thousand-line log. The error reporting does not differentiate the benign 
noise from the real crash, so the failure consistently looks like it's "in" 
`ErrorInfo.java` / `LexicalThreadLocal.java` / similar, when it's actually in a 
Scala source that none of those names point to.
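
The search for that signal can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: the regexes are guesses at the log line shapes quoted above, and `find_crash_page` is a hypothetical name.

```ruby
# Assumed line shapes, based only on the fragments quoted in this description.
ANSI_ESCAPE  = /\e\[[0-9;]*m/
GENERATING   = /Generating\s+(\S+\.html)\.\.\./
JAVADOC_EXIT = /javadoc exited with exit code \d+/

# Hypothetical helper: return the last HTML page javadoc reported generating
# before its exit line, or nil when the log does not match this failure mode.
def find_crash_page(log_text)
  last_page = nil
  log_text.each_line do |raw|
    line = raw.gsub(ANSI_ESCAPE, "") # strip colour codes before matching
    if (match = GENERATING.match(line))
      last_page = match[1]
    elsif JAVADOC_EXIT.match?(line)
      return last_page
    end
  end
  nil # no javadoc exit line seen: not the mid-HTML-crash pattern
end
```

A `nil` result corresponds to the fallback case, where the banner would point back to the full log instead of naming a class.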
   
   A recent example: PR #51419 hit this exact misdirection -- the log was full 
of errors on `common/utils/target/java/...` stubs, but the real culprit was a 
doc comment in `CatalogV2Implicits.IdentifierHelper` that triggered a hard exit 
during HTML generation. The diagnostic in this PR would have named that class 
directly.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. CI-only output change visible in the unidoc step of the doc-gen job.
   
   ### How was this patch tested?
   
   - Dry-ran the parser logic against the captured failing log from PR #51419 
-- it correctly extracts 
`org/apache/spark/sql/connector/catalog/CatalogV2Implicits.IdentifierHelper.html`
 as the crash class.
   - The second commit on this branch (`DO NOT MERGE: break a docstring to 
validate the unidoc diagnostic`) intentionally reintroduces the same 
`[[...]]`+backtick-inline scaladoc pattern in 
`CatalogV2Implicits.IdentifierHelper.asTableIdentifierOpt` so that this PR's CI 
run actually exercises the new path. Once the banner fires and names that class 
on the failing CI run, that commit will be dropped from this PR.
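
For the first check, the step from an extracted HTML path to a class name to audit could look like the sketch below. It assumes the path is relative to the API doc root, as in the example above, and that nested classes appear as `Outer.Inner.html`. `class_from_html_path` is a hypothetical helper, not a function from this PR.

```ruby
# Hypothetical helper: derive the qualified class name and its enclosing
# top-level class from a doc-root-relative javadoc HTML path.
def class_from_html_path(html_path)
  dir = File.dirname(html_path)
  prefix = dir == "." ? "" : "#{dir.tr('/', '.')}."
  nested = File.basename(html_path, ".html") # e.g. "Outer.Inner"
  {
    qualified_name: "#{prefix}#{nested}",
    top_level_class: "#{prefix}#{nested.split('.').first}"
  }
end

info = class_from_html_path(
  "org/apache/spark/sql/connector/catalog/CatalogV2Implicits.IdentifierHelper.html"
)
puts info[:top_level_class]
# prints "org.apache.spark.sql.connector.catalog.CatalogV2Implicits"
```

The top-level class is usually the more useful pointer, since the doc comment to audit lives in the source file that defines it.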
   
   ### Was this patch authored or co-authored using generative AI tooling?
   
   Generated-by: Claude (Anthropic)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

