bobbai00 opened a new issue, #4674: URL: https://github.com/apache/texera/issues/4674
### What happened?

After #4668 lands, the per-module `NOTICE-binary` files describe each Docker image's bundled third-party content, but they're hand-curated subsets of the previously-curated root `NOTICE-binary`. Hand-curated NOTICE files rot fast: every dep bump silently drifts the committed content from what the jars' `META-INF/NOTICE` actually carry.

ASF compliance under Apache-2.0 §4(d) requires reproducing the attribution notices in every Apache-2.0 dep's bundled `NOTICE` file. Those notices live in each jar's `META-INF/NOTICE`. The right source of truth is the jars themselves.

### Proposed change

Add a generator that produces each `<module>/NOTICE-binary` from the actual bundled jars:

1. Walks the module's `lib/` dir.
2. For each jar, extracts every `META-INF/NOTICE`-style file.
3. Dedupes by content hash so jars sharing an upstream NOTICE collapse into one block.
4. Emits one block per unique blob with a synthesized project heading + the verbatim upstream content.
5. Optional `--extras` for non-jar attributions (Apache-2.0 Python wheels like aiohttp + Matplotlib that don't ship a NOTICE inside any jar).

Then add a CI check that regenerates `<module>/NOTICE-binary` against the freshly-built dist `lib/` and diffs against the committed file. Drift fails the build with a one-line fix-up command.

### Version

1.1.0-incubating (Pre-release/Master)

### Depends on

This change requires #4668 to land first (which introduces the per-module `NOTICE-binary` files in the first place).

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
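
For illustration only, the generator and drift check proposed in this issue might look roughly like the Python sketch below. It is not the project's actual script. The function names (`collect_notices`, `render_notice`, `check_drift`), the `NOTICE_RE` pattern for "NOTICE-style" files, the "Bundled in:" heading format, and the choice of SHA-256 are all assumptions made for the example. The sketch also strips surrounding whitespace before hashing, which the issue does not specify.

```python
import hashlib
import re
import zipfile
from pathlib import Path

# Assumption: "NOTICE-style" means META-INF/NOTICE with an optional extension
# (e.g. .txt, .md). The issue does not pin down the exact pattern.
NOTICE_RE = re.compile(r"^META-INF/NOTICE(\.[A-Za-z]+)?$", re.IGNORECASE)


def collect_notices(lib_dir):
    """Return {content hash: (jar names, notice text)} for all jars in lib_dir."""
    blobs = {}
    # Sorted iteration keeps the output deterministic across runs.
    for jar in sorted(Path(lib_dir).glob("*.jar")):
        with zipfile.ZipFile(jar) as zf:
            for name in zf.namelist():
                if not NOTICE_RE.match(name):
                    continue
                text = zf.read(name).decode("utf-8", errors="replace").strip()
                if not text:
                    continue
                # Dedupe by content hash: identical notices share one entry.
                digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
                jars, _ = blobs.setdefault(digest, ([], text))
                if jar.name not in jars:
                    jars.append(jar.name)
    return blobs


def render_notice(lib_dir, extras=""):
    """Build the NOTICE-binary text: one block per unique notice, then extras."""
    blocks = []
    # Order blocks by the first jar that carries each notice.
    for jars, text in sorted(collect_notices(lib_dir).values(), key=lambda b: b[0][0]):
        # Hypothetical heading format; the issue only says "synthesized".
        heading = "Bundled in: " + ", ".join(jars)
        blocks.append(f"{heading}\n{'-' * len(heading)}\n{text}\n")
    if extras:
        # Non-jar attributions (e.g. Python wheels) appended as supplied.
        blocks.append(extras.strip() + "\n")
    return "\n".join(blocks)


def check_drift(lib_dir, committed_path, extras=""):
    """Return True when the committed file matches freshly generated output."""
    expected = render_notice(lib_dir, extras)
    return Path(committed_path).read_text(encoding="utf-8") == expected
```

A CI job could call `check_drift` against the freshly built `lib/` and fail the build when it returns `False`, printing whatever regeneration command the project settles on.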
