cloud-fan opened a new pull request, #55548: URL: https://github.com/apache/spark/pull/55548
### What changes were proposed in this pull request?

Adds a diagnostic banner to the unidoc step in `docs/_plugins/build_api_docs.rb`. When `build/sbt unidoc` fails, the script now scans the captured sbt output and prints a framed summary naming the `<Class>.html` javadoc was generating when it died, the inferred source class to audit, and a one-paragraph hint about the usual scaladoc triggers.

Implementation:

- `stream_and_capture` tees sbt output to both stdout and `target/unidoc-build.log` (Ruby-only, no shell `pipefail` reliance).
- `diagnose_unidoc_failure` finds the last `Generating .../<Class>.html...` line before `javadoc exited with exit code N` and prints a culprit-pointer banner. ANSI colour codes are stripped before regex matching.
- When the failure mode doesn't match the mid-HTML-crash pattern (e.g. scaladoc failure, sbt env issue), the banner says so and points back to the full log.

### Why are the changes needed?

Today, when javadoc hard-exits during unidoc HTML generation -- typically because of a specific scaladoc construct (e.g. wiki-style `[[Class]]` links or backtick-inline code refs) in an exposed Scala source -- the failing PR's CI log shows ~100 `[error]` lines on `target/java/...` files. Those errors are benign: they're genjavadoc-emitted Java stubs (`static public abstract R apply(T1, T2, T3, T4)`) that every PR produces, and `javadoc` always complains about them but normally still finishes. They are not the cause of the failure.

The actual signal is the last `Generating .../<Class>.html...` line before `javadoc exited with exit code 1`, which a developer has to find by hand in a multi-thousand-line log. The error reporting does not differentiate the benign noise from the real crash, so the failure consistently looks like it's "in" `ErrorInfo.java` / `LexicalThreadLocal.java` / similar, when it's actually in a Scala source that none of those names point to.
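
The selection rule described above (take the last `Generating .../<Class>.html...` line seen before `javadoc exited with exit code N`, after stripping ANSI colour codes) can be sketched in Ruby roughly as follows. This is an illustrative sketch only, not the code in this PR: the helper names `find_crash_page` and `infer_outer_class`, the regexes, and the sample log lines are all assumptions made for the example.

```ruby
# Illustrative sketch; names and patterns below are hypothetical, not the PR's code.

ANSI_ESCAPE  = /\e\[[0-9;]*m/                        # SGR colour sequences
GENERATING   = %r{Generating\s+(\S+?\.html)\.\.\.}   # "Generating <path>.html..."
JAVADOC_EXIT = /javadoc exited with exit code (\d+)/

# Returns { page:, exit_code: } for the last HTML page javadoc started
# before it exited, or nil when the log does not match that pattern.
def find_crash_page(log_text)
  last_page = nil
  log_text.each_line do |raw|
    line = raw.gsub(ANSI_ESCAPE, "")   # strip colours before matching
    if (m = GENERATING.match(line))
      last_page = m[1]                 # remember the most recent page
    elsif (m = JAVADOC_EXIT.match(line))
      return nil if last_page.nil?     # exit seen, but no page to blame
      return { page: last_page, exit_code: m[1].to_i }
    end
  end
  nil
end

# Assumes nested classes are rendered as Outer.Inner.html, so the
# outermost class name is the first dot-separated segment.
def infer_outer_class(page_path)
  File.basename(page_path, ".html").split(".").first
end

# Made-up log excerpt, for illustration only.
sample = <<~LOG
  [info] Generating /tmp/api/org/example/Foo.html...
  [info] \e[0mGenerating /tmp/api/org/example/Outer.Inner.html...
  [error] javadoc exited with exit code 1
LOG

hit = find_crash_page(sample)
if hit
  puts "javadoc died while generating #{hit[:page]} (exit #{hit[:exit_code]})"
  puts "audit the scaladoc in: #{infer_outer_class(hit[:page])}"
else
  puts "no mid-HTML crash pattern found; see the full log"
end
```

Returning `nil` when the pattern is absent keeps the "failure mode doesn't match" case explicit, so the caller can fall back to pointing at the full log instead of guessing a culprit.
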

A recent example: PR #51419 hit this exact misdirection -- the log was full of errors on `common/utils/target/java/...` stubs, but the real culprit was a doc comment in `CatalogV2Implicits.IdentifierHelper` that triggered a hard exit during HTML generation. The diagnostic in this PR would have named that class directly.

### Does this PR introduce _any_ user-facing change?

No. CI-only output change visible in the unidoc step of the doc-gen job.

### How was this patch tested?

- Dry-ran the parser logic against the captured failing log from PR #51419 -- it correctly extracts `org/apache/spark/sql/connector/catalog/CatalogV2Implicits.IdentifierHelper.html` as the crash class.
- The second commit on this branch (`DO NOT MERGE: break a docstring to validate the unidoc diagnostic`) intentionally reintroduces the same `[[...]]`+backtick-inline scaladoc pattern in `CatalogV2Implicits.IdentifierHelper.asTableIdentifierOpt` so that this PR's CI run actually exercises the new path. Once the banner fires and names that class on the failing CI run, that commit will be dropped from this PR.

### Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude (Anthropic)
