bobbai00 opened a new pull request, #4668:
URL: https://github.com/apache/texera/pull/4668
### What changes were proposed in this PR?
Splits the monolithic root `LICENSE-binary` and `NOTICE-binary` into
per-module ground-truth files so each Docker image's `/texera/LICENSE`
describes only the third-party components actually bundled in that image, per
ASF licensing guidance.
**Per-module files added** (root files kept unchanged for the source
tarball):
| Path | Contents |
|---|---|
| `access-control-service/LICENSE-binary` + `NOTICE-binary` | 113 jars / 18 NOTICE blocks |
| `config-service/LICENSE-binary` + `NOTICE-binary` | 115 jars / 18 NOTICE blocks |
| `file-service/LICENSE-binary` + `NOTICE-binary` | 310 jars / 25 NOTICE blocks |
| `workflow-compiling-service/LICENSE-binary` + `NOTICE-binary` | 319 jars / 26 NOTICE blocks |
| `computing-unit-managing-service/LICENSE-binary` + `NOTICE-binary` | 349 jars / 26 NOTICE blocks (only image bundling Bouncy Castle) |
| `amber/LICENSE-binary-java` + `NOTICE-binary` | 404 jars / 27 NOTICE blocks (`WorkflowExecutionService`, shared by web/master/runner) |
| `amber/LICENSE-binary-python` | 113 packages (master/runner only) |
| `frontend/LICENSE-binary` | 114 npm packages (Angular bundle, dashboard image) |
| `agent-service/LICENSE-binary` | 57 npm packages |
Counts were derived by enumerating each container's actual bundled jars (`ls
/texera/lib/`), pip-listed Python packages, and `node_modules` (recursively,
including `@scope/name` packages), then filtering the root `LICENSE-binary`
down to the entries found in each image. No new entries were invented: the
union of the per-module files is a subset of the root file (`combined ⊆ root`).
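The `combined ⊆ root` property can be sketched as a plain set comparison. This is a minimal illustration, not the PR's tooling: `compare_entries` and the sample ids are hypothetical, and it assumes each file has already been reduced to a flat list of entry ids.

```python
def compare_entries(combined, root):
    """Compare entry ids from the combined per-module files against the root file.

    Returns (invented, unused): ids present only in `combined`, and ids
    present only in `root`. `combined ⊆ root` holds when `invented` is empty.
    """
    combined_ids, root_ids = set(combined), set(root)
    invented = sorted(combined_ids - root_ids)
    unused = sorted(root_ids - combined_ids)
    return invented, unused


# Hypothetical sample ids, for illustration only.
root = ["example-core-1.0.jar", "example-util-2.0.jar", "example-legacy-0.9.jar"]
combined = ["example-util-2.0.jar", "example-core-1.0.jar"]

invented, unused = compare_entries(combined, root)
assert invented == []  # nothing was added beyond the root file
print(unused)          # root entries that no per-module file claims
```

Returning the root-only ids alongside the invented ones is a design choice for the sketch: the same comparison then also surfaces entries that are listed in the root file but bundled in no image.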
**New script** — `bin/licensing/concat_license_binary.py`:
- Style-matched to the existing `audit_jar_licenses.py` /
`check_binary_deps.py`.
- Merges multiple per-module LICENSE-binary files at the **license-group
level**: each Apache-2.0 / MIT / BSD / ... section in the output contains all
the ecosystem subsections (`Scala/Java jars:`, `Python packages:`, `Angular /
npm packages:`, `Agent service npm packages:`, `Source files derived from ...`)
inline, rather than stacking the inputs end-to-end.
- Reuses the Apache-2.0 license header verbatim, deduplicates entries by id,
emits a single trailer.
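The group-level merge described above might look roughly like the following. This is a sketch under stated assumptions, not the contents of `concat_license_binary.py`: it assumes each per-module file has already been parsed into a mapping of license group to ecosystem subsection to entry ids, and `merge_license_groups` and all sample ids are hypothetical.

```python
def merge_license_groups(modules):
    """Merge parsed per-module files at the license-group level.

    Each module is assumed to be a dict of the form
    {license_group: {subsection_heading: [entry_id, ...]}}.
    Entries are deduplicated by id; first-seen order is preserved.
    """
    merged = {}
    for module in modules:
        for group, subsections in module.items():
            merged_group = merged.setdefault(group, {})
            for heading, entries in subsections.items():
                bucket = merged_group.setdefault(heading, [])
                for entry in entries:
                    if entry not in bucket:  # deduplicate by id
                        bucket.append(entry)
    return merged


# Hypothetical parsed inputs, for illustration only.
amber_java = {
    "Apache-2.0": {"Scala/Java jars:": ["example-core-1.0.jar", "example-util-2.0.jar"]},
    "MIT": {"Scala/Java jars:": ["example-args-3.0.jar"]},
}
amber_python = {
    "Apache-2.0": {"Python packages:": ["example-pkg 1.2"]},
    "BSD": {"Python packages:": ["example-num 4.5"]},
}

merged = merge_license_groups([amber_java, amber_python])
# The Apache-2.0 group now holds both ecosystem subsections inline.
assert list(merged["Apache-2.0"]) == ["Scala/Java jars:", "Python packages:"]
```

Keying on the license group first is what keeps one Apache-2.0 section in the output, with the per-ecosystem lists nested inside it, instead of one Apache-2.0 section per input file.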
**Dockerfile updates** — 9 dockerfiles:
- 5 standalone Scala services + agent-service: copy only their own
per-module `LICENSE-binary` (and `NOTICE-binary` for the Scala ones) into
`/texera/LICENSE` and `/texera/NOTICE`.
- 3 multi-aspect images run `concat_license_binary.py` at build time:
  - `computing-unit-master` and `computing-unit-worker`: union
  `amber/LICENSE-binary-java` + `amber/LICENSE-binary-python`.
  - `texera-web-application`: union `amber/LICENSE-binary-java` +
  `frontend/LICENSE-binary` (cross-stage `COPY --from=build-frontend`).
- `python3-minimal` added to the Scala build stage of the 3 multi-aspect
dockerfiles to run the concat script.
**CI** — `.github/workflows/build.yml` (the workflow `required-checks.yml`
orchestrates):
- The four existing `check_binary_deps.py` invocations (frontend npm, scala
jar, python, agent-npm) now build a fresh combined LICENSE-binary from all 9
per-module files via `concat_license_binary.py /tmp/combined-LICENSE-binary …`,
then pass `--license-binary /tmp/combined-LICENSE-binary` to the existing
tooling. The per-module files become the authoritative claim source for dep
validation.
### Any related issues, documentation, discussions?
Closes #4667
ASF guidance: https://infra.apache.org/licensing-howto.html
### How was this PR tested?
- Local concat smoke tests across all merge scenarios — full union (15
license groups, 829 entries), amber java+python (13 groups, 495 entries), web
app java+frontend (12 groups, 496 entries).
- `combined ⊆ root` verified per ecosystem: jar (544 ⊆ 566), python (113 ==
113), npm (112 == 112), agent-npm (57 == 57). No invented entries.
- 22 stale `javax.*` / `jersey-2.x` / `hk2-2.x` jars currently listed in
root LICENSE-binary but absent from every container correctly drop out of
combined — converts current `STALE` warnings into clean `OK`.
- Container deps were enumerated against `IMAGE_REGISTRY=ghcr.io/apache`
`IMAGE_TAG=61ce334cb` images.
Full Docker build + CI workflow run will validate end-to-end on the PR.
### Was this PR authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Opus 4.7)