bobbai00 opened a new pull request, #5417: URL: https://github.com/apache/texera/pull/5417
### What changes were proposed in this PR?

Auto-generates each module's `NOTICE-binary` from the third-party `META-INF/NOTICE` files in its bundled jars, replacing the hand-curated subsets introduced in #4668, and adds a CI drift-check so the committed files cannot silently go stale when dependencies change.

- **New generator (`bin/licensing/generate_notice_binary.py`):** walks a module's dist `lib/` dir, extracts every `META-INF/NOTICE` (and root-level `NOTICE`) from each bundled jar, skips first-party `org.apache.texera.*` jars, dedupes by content hash so jars sharing an upstream notice collapse into one block, prepends the project's own root `NOTICE`, and emits one block per unique notice with a synthesized heading plus the list of contributing jars. Output is deterministic (CRLF→LF normalized, stably sorted by jar count). An optional `--extras <file>` appends non-jar attributions.
- **`amber/NOTICE-binary-extras` (new):** the aiohttp and Matplotlib notices, which ship as Python wheels (not jars) and so can't be extracted from the `lib/` dir.
- **6 per-module `NOTICE-binary` files regenerated** from the actual bundled jars: `amber`, `access-control-service`, `config-service`, `file-service`, `computing-unit-managing-service`, `workflow-compiling-service`.
- **CI drift-check (`build.yml`):** after each dist is built and unzipped, a new step regenerates that module's `NOTICE-binary` and diffs it against the committed file, failing the build with a one-line fix-up command on any drift. The amber check runs in the scala job; the five platform services are each checked in the per-service `platform` matrix job, alongside the existing `LICENSE-binary` check.

`LICENSE-binary` stays hand-maintained (it needs human judgment on each license); only `NOTICE-binary`, a mechanical carry-forward of upstream notices, is generated. As a result, future dependency bumps fail CI with the exact command to regenerate, instead of silently drifting.

### Any related issues, documentation, discussions?
Closes #4674

Builds on #4668 (already merged). Slated for the v1.2 milestone, per the issue discussion.

ASF guidance: https://infra.apache.org/licensing-howto.html (Apache-2.0 §4(d)).

### How was this PR tested?

- Built all six module dists locally (`sbt <project>/Universal/stage`) and ran the generator against each freshly-built `lib/`; the committed `NOTICE-binary` files are byte-identical to the generator output, so the new CI drift-check passes for every module.
- Verified the existing `LICENSE-binary` checks (`check_binary_deps.py`, PR mode) still pass against the same libs for all six modules.
- `build.yml` validated as well-formed YAML.

### Was this PR authored or co-authored using generative AI tooling?

Generated-by: Claude Opus 4.8 (1M context)

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]
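
For readers who want a feel for the generator's core loop, the extract, dedupe, and sort steps described in this PR could be sketched roughly as below. This is an illustrative sketch and not the code in `bin/licensing/generate_notice_binary.py`. The function names (`normalize`, `collect_notices`, `render`), the `Bundled in:` heading, the use of SHA-256, matching first-party jars by file-name prefix, and sorting by descending jar count are all assumptions made for the example.

```python
import hashlib
import zipfile
from collections import defaultdict
from pathlib import Path

# Entry names taken from the PR description: META-INF/NOTICE and root-level NOTICE.
NOTICE_ENTRIES = ("META-INF/NOTICE", "NOTICE")


def normalize(raw: bytes) -> str:
    """Decode a notice and normalize CRLF/CR line endings to LF."""
    text = raw.decode("utf-8", errors="replace")
    return text.replace("\r\n", "\n").replace("\r", "\n").strip() + "\n"


def collect_notices(lib_dir: Path, skip_prefix: str = "org.apache.texera."):
    """Return {content hash: (notice text, sorted contributing jar names)}.

    Assumption: first-party jars are recognizable by a file-name prefix.
    """
    texts: dict[str, str] = {}
    jars: dict[str, set[str]] = defaultdict(set)
    for jar in sorted(lib_dir.glob("*.jar")):
        if jar.name.startswith(skip_prefix):
            continue  # skip first-party jars
        with zipfile.ZipFile(jar) as zf:
            names = set(zf.namelist())
            for entry in NOTICE_ENTRIES:
                if entry in names:
                    text = normalize(zf.read(entry))
                    # Identical notices hash to the same key and collapse into one block.
                    digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
                    texts[digest] = text
                    jars[digest].add(jar.name)
    return {d: (texts[d], sorted(jars[d])) for d in texts}


def render(notices, root_notice: str = "") -> str:
    """Emit the root notice first, then one block per unique notice."""
    # Assumed ordering: most jars first, ties broken by the first jar name.
    blocks = sorted(notices.values(), key=lambda b: (-len(b[1]), b[1][0]))
    parts = [root_notice.rstrip("\n")] if root_notice else []
    for text, jar_names in blocks:
        heading = "Bundled in: " + ", ".join(jar_names)  # hypothetical heading format
        parts.append(heading + "\n\n" + text.rstrip("\n"))
    return "\n\n".join(parts) + "\n"
```

Under the same assumptions, a drift check reduces to comparing `render(collect_notices(Path("lib")), root_notice)` with the contents of the committed `NOTICE-binary` and failing when they differ. Because the output depends only on jar contents and names, repeated runs over the same `lib/` dir produce identical text.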
