yashmayya opened a new pull request, #17818: URL: https://github.com/apache/pinot/pull/17818
[This release wiki](https://cwiki.apache.org/confluence/display/PINOT/Creating+an+Apache+Release) describes a manual, error-prone process for updating `LICENSE-binary` and `NOTICE-binary` that involves temporarily hacking `pinot-distribution/pom.xml`, generating HTML dependency reports, copy-pasting from a browser, hacking POMs again to wire the `maven-shade-plugin`'s `ApacheNoticeResourceTransformer`, and then reverting all changes. This script replaces that entire process.

<hr>

<h3>What the script does:</h3>

- Parses `pinot-assembly.xml` to discover the exact set of modules shipped in the binary tarball - the single source of truth for what needs licensing - then runs `mvn dependency:list` on each to resolve the full transitive dependency set (compile + runtime scope, excluding `org.apache.pinot` artifacts).
- Diffs old vs new dependencies by parsing the current `LICENSE-binary`, computing added/removed/version-bumped deps, and auto-detecting licenses for new deps from POM metadata (the `<licenses>` element) to classify them into the correct license section.
- Generates an updated `LICENSE-binary` with deps sorted within each section, version bumps applied in-place, removals cleaned up, and new deps inserted under the right license heading - plus a human-readable report summarizing all changes.
- Generates an updated `NOTICE-binary` by extracting `META-INF/NOTICE` from each dependency JAR in `~/.m2/repository`, stripping duplicate ASF boilerplate, deduplicating by title, and merging into a single file.
- Flags items needing manual attention: deps whose license couldn't be auto-detected, deps that changed license type on a version bump, empty license sections, and orphaned files in `licenses-binary/`.

<hr>

<h3>Improvements over the manual process:</h3>

- More accurate: The manual process adds all `pinot-connectors` modules as temporary deps, which over-includes connectors not shipped in the binary (e.g. `pinot-flink-connector` is not in `pinot-assembly.xml`).
  The script only considers what is actually shipped. This also avoids the problem of the wiki going stale as new modules are added.
- Less error-prone: No temporary POM hacking, no copy-pasting from HTML, no risk of forgetting to revert changes or missing a module. The wiki is also out of date: it references the `-Ppresto-driver` profile, which no longer exists, and would miss `pinot-timeseries-m3ql`, which is in the assembly. The manual process also caused some Maven enforcer plugin issues because of the way the entire dependency list is bundled into `pinot-distribution`, and generating the notice binary the old way ran into problems due to some [pinot-cli](https://github.com/apache/pinot/blob/master/pinot-clients/pinot-cli/pom.xml) related changes.
- Reproducible and diffable: Running the script twice on the same codebase produces the same output. The generated report makes it easy to review exactly what changed.

<hr>

<h3>Usage:</h3>

```
# Build first (needed to populate ~/.m2 with dependency JARs)
mvn clean install -DskipTests -T1C
```

```
# Update both LICENSE-binary and NOTICE-binary (writes *.updated files)
python3 scripts/update-release-binary.py
```

```
# Update only LICENSE-binary or only NOTICE-binary
python3 scripts/update-release-binary.py --license-only
python3 scripts/update-release-binary.py --notice-only
```

```
# Preview changes without writing files
python3 scripts/update-release-binary.py --report-only
```

A few things still need human review:

- **TODO deps**: The report lists deps under `REQUIRES MANUAL LICENSE LOOKUP` whose license couldn't be auto-detected from POM metadata. Visit each project's website to determine the license, then add the dep to the appropriate section. If it's a new license type, create a new file in `licenses-binary/` and a new section in `LICENSE-binary`.
- **Review the diff**: Before replacing, diff the generated files against the originals (`diff LICENSE-binary LICENSE-binary.updated`) to sanity-check the changes: version bumps, additions, and removals should all make sense.
- **Replace the files**: Once satisfied, copy the updated files over the originals:
  ```bash
  cp LICENSE-binary.updated LICENSE-binary
  cp NOTICE-binary.updated NOTICE-binary
  ```
- **Empty sections and orphaned files**: The report flags license sections that became empty and files in `licenses-binary/` that are no longer referenced. Consider removing them.
- **Pre-existing duplicates**: The original `LICENSE-binary` has 3 duplicate entries (`jnr-x86asm` in two sections, and `netty-resolver-dns-native-macos` and `netty-transport-native-epoll` each listed twice). The script preserves these as-is; clean them up if desired.
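
For reviewers who want a feel for the dependency-diff step described under "What the script does", here is a minimal sketch. It is not the code in `scripts/update-release-binary.py`. The function names (`parse_dependency_line`, `collect_dependencies`, `diff_dependencies`), the `groupId:artifactId` key, and the group-prefix check used to exclude `org.apache.pinot` artifacts are all assumptions made for illustration. Keying on `groupId:artifactId` also collapses classifier variants of the same artifact, which the real script may or may not do.

```python
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class DepDiff(NamedTuple):
    added: Dict[str, str]                # key -> new version
    removed: Dict[str, str]              # key -> old version
    bumped: Dict[str, Tuple[str, str]]   # key -> (old version, new version)


def parse_dependency_line(line: str) -> Optional[Tuple[str, str]]:
    """Parse one line of `mvn dependency:list` output.

    Accepts `group:artifact:type:version:scope` and the classifier form
    `group:artifact:type:classifier:version:scope`. Returns
    ("group:artifact", version), or None for lines that should be ignored.
    """
    text = line.strip()
    if text.startswith("[INFO]"):
        text = text[len("[INFO]"):].strip()
    if not text:
        return None
    # Only the first token holds coordinates; newer plugin versions may
    # append extra text (e.g. module info) after it.
    parts = text.split()[0].split(":")
    if len(parts) == 5:
        group, artifact, _packaging, version, scope = parts
    elif len(parts) == 6:
        group, artifact, _packaging, _classifier, version, scope = parts
    else:
        return None
    if scope not in ("compile", "runtime"):
        return None
    # Assumption: Pinot's own artifacts are identified by group prefix.
    if group.startswith("org.apache.pinot"):
        return None
    return f"{group}:{artifact}", version


def collect_dependencies(lines: Iterable[str]) -> Dict[str, str]:
    """Build a {"group:artifact": version} map from raw Maven output lines."""
    deps: Dict[str, str] = {}
    for line in lines:
        parsed = parse_dependency_line(line)
        if parsed is not None:
            key, version = parsed
            deps[key] = version
    return deps


def diff_dependencies(old: Dict[str, str], new: Dict[str, str]) -> DepDiff:
    """Classify changes between two dependency maps."""
    added = {k: v for k, v in new.items() if k not in old}
    removed = {k: v for k, v in old.items() if k not in new}
    bumped = {
        k: (old[k], new[k])
        for k in old.keys() & new.keys()
        if old[k] != new[k]
    }
    return DepDiff(added, removed, bumped)
```

In this sketch, `old` would come from parsing the current `LICENSE-binary` and `new` from the resolved dependency set; sorting the keys of each result before printing keeps the report stable across runs.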

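
The `NOTICE-binary` step (extract `META-INF/NOTICE` from each JAR, deduplicate by title, merge) can be sketched in a similar way. Again, this is an illustration under stated assumptions rather than the script's implementation: the helper names are hypothetical, the "title" is assumed to be the first non-blank line of each notice, and the stripping of duplicate ASF boilerplate is left out entirely.

```python
import zipfile
from typing import Iterable, List, Optional, Set


def read_notice(jar_path: str) -> Optional[str]:
    """Return the text of META-INF/NOTICE inside the JAR, or None if absent."""
    with zipfile.ZipFile(jar_path) as jar:
        try:
            raw = jar.read("META-INF/NOTICE")
        except KeyError:
            # ZipFile.read raises KeyError when the entry does not exist.
            return None
    return raw.decode("utf-8", errors="replace")


def notice_title(text: str) -> str:
    """Assumption: a notice's title is its first non-blank line."""
    for line in text.splitlines():
        if line.strip():
            return line.strip()
    return ""


def merge_notices(jar_paths: Iterable[str]) -> str:
    """Merge notices from the given JARs, keeping the first notice per title."""
    seen: Set[str] = set()
    sections: List[str] = []
    for path in jar_paths:
        text = read_notice(path)
        if text is None:
            continue
        title = notice_title(text)
        if not title or title in seen:
            continue
        seen.add(title)
        sections.append(text.strip())
    return "\n\n".join(sections) + "\n"
```

Passing the JAR paths in sorted order is one simple way to make the merged output identical from run to run.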