yashmayya opened a new pull request, #17818:
URL: https://github.com/apache/pinot/pull/17818

   [This release wiki](https://cwiki.apache.org/confluence/display/PINOT/Creating+an+Apache+Release) 
describes a manual, error-prone process for updating `LICENSE-binary` and 
`NOTICE-binary` that involves temporarily hacking `pinot-distribution/pom.xml`, 
generating HTML dependency reports, copy-pasting from a browser, hacking POMs 
again to wire the `maven-shade-plugin`'s 
`ApacheNoticeResourceTransformer`, and then reverting all changes. This script 
replaces that entire process.
   
   <hr>
   
   <h3>What the script does:</h3>
   
     - Parses `pinot-assembly.xml` to discover the exact set of modules shipped 
in the binary tarball - the single source of truth for what needs licensing - 
then runs `mvn dependency:list` on each to resolve the full transitive 
dependency set (compile + runtime scope, excluding `org.apache.pinot` 
artifacts).
     - Diffs old vs new dependencies by parsing the current `LICENSE-binary`, 
computing added/removed/version-bumped deps, and auto-detecting licenses for 
new deps from POM metadata (`<licenses>` element) to classify them into the 
correct license section.
     - Generates an updated `LICENSE-binary` with deps sorted within each 
section, version bumps applied in-place, removals cleaned up, and new deps 
inserted under the right license heading - plus a human-readable report 
summarizing all changes.
     - Generates an updated `NOTICE-binary` by extracting `META-INF/NOTICE` 
from each dependency JAR in `~/.m2/repository`, stripping duplicate ASF 
boilerplate, deduplicating by title, and merging into a single file.
     - Flags items needing manual attention: deps whose license couldn't be 
auto-detected, deps that changed license type on a version bump, empty license 
sections, and orphaned files in `licenses-binary/`.
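
The dependency-diff step described above can be illustrated with a minimal sketch. This is not the code in `scripts/update-release-binary.py`: the function names `parse_dependency_list` and `diff_deps` are hypothetical, and the regular expression assumes the plain `groupId:artifactId:type:version:scope` form that `mvn dependency:list` prints on `[INFO]` lines. Artifacts with a classifier are skipped here for brevity.

```python
import re

# Assumption: lines look like "[INFO]    group:artifact:jar:version:compile".
# Only compile and runtime scopes match; classified artifacts do not.
COORD = re.compile(
    r"^\[INFO\]\s+([\w.\-]+):([\w.\-]+):[\w.\-]+:([\w.\-]+):(compile|runtime)\b"
)


def parse_dependency_list(output):
    """Return {"group:artifact": version}, skipping org.apache.pinot artifacts."""
    deps = {}
    for line in output.splitlines():
        match = COORD.match(line)
        if match is None:
            continue
        group, artifact, version = match.groups()[:3]
        if group == "org.apache.pinot" or group.startswith("org.apache.pinot."):
            continue
        deps[f"{group}:{artifact}"] = version
    return deps


def diff_deps(old, new):
    """Compare two {"group:artifact": version} maps."""
    added = sorted(key for key in new if key not in old)
    removed = sorted(key for key in old if key not in new)
    bumped = sorted(
        (key, old[key], new[key])
        for key in new
        if key in old and old[key] != new[key]
    )
    return added, removed, bumped
```

Feeding the parsed result of the current build as `new` and the entries parsed from the existing `LICENSE-binary` as `old` yields the three lists (added, removed, version-bumped) that the report summarizes. Sorting each list keeps the output stable across runs.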
   
   <hr>
   
   <h3>Improvements over the manual process:</h3>
   
     - More accurate: The manual process adds all `pinot-connectors` modules as 
temporary deps, which over-includes connectors not shipped in the binary (e.g. 
`pinot-flink-connector` is not in `pinot-assembly.xml`). The script only 
considers what is actually shipped. This also avoids the issue of the wiki 
getting outdated as new modules are added.
     - Less error-prone: No temporary POM hacking, no copy-pasting from HTML, 
no risk of forgetting to revert changes or missing a module. The wiki is also 
out of date: it references the `-Ppresto-driver` profile, which no longer 
exists, and would miss `pinot-timeseries-m3ql`, which is in the assembly. The 
manual process also introduced some issues related to the Maven enforcer plugin 
due to the way the entire dependency list is bundled into 
`pinot-distribution`. There were also some issues with generating 
`NOTICE-binary` the old manual way due to some 
[pinot-cli](https://github.com/apache/pinot/blob/master/pinot-clients/pinot-cli/pom.xml)-related 
changes.
     - Reproducible and diffable: Running the script twice on the same codebase 
produces the same output. The generated report makes it easy to review exactly 
what changed.
   
   <hr>
   
   <h3>Usage:</h3>
   
   ```
     # Build first (needed to populate ~/.m2 with dependency JARs)
     mvn clean install -DskipTests -T1C
   ```
   
   ```
     # Update both LICENSE-binary and NOTICE-binary (writes *.updated files)
     python3 scripts/update-release-binary.py
   ```
   
   ```
     # Update only LICENSE-binary or only NOTICE-binary
     python3 scripts/update-release-binary.py --license-only
     python3 scripts/update-release-binary.py --notice-only
   ```
   
   ```
     # Preview changes without writing files
     python3 scripts/update-release-binary.py --report-only
   ```
   
   A few things still need human review:
   
     - **TODO deps**: The report lists deps under `REQUIRES MANUAL LICENSE 
LOOKUP` whose license couldn't be auto-detected from POM metadata. Visit each 
project's website to determine the license, then add the dep to the appropriate 
section. If it's a new license type, create a new file in `licenses-binary/` 
and a new section in `LICENSE-binary`.
     - **Review the diff**: Before replacing, diff the generated files against 
the originals (`diff LICENSE-binary LICENSE-binary.updated`) to sanity-check 
the changes - version bumps, additions, removals should all make sense.
     - **Replace the files**: Once satisfied, copy the updated files over the 
originals:
       ```bash
       cp LICENSE-binary.updated LICENSE-binary
       cp NOTICE-binary.updated NOTICE-binary
       ```
     - **Empty sections and orphaned files**: The report flags license sections 
that became empty and files in `licenses-binary/` that are no longer 
referenced. Consider removing them.
     - **Pre-existing duplicates**: The original `LICENSE-binary` has 3 duplicate 
entries (`jnr-x86asm` in two sections, `netty-resolver-dns-native-macos` and 
`netty-transport-native-epoll` listed twice). The script preserves these as-is; 
clean them up if desired.
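
For the duplicate cleanup, a small helper along the following lines could list which entries appear more than once. It is only a sketch and not part of the script in this PR: `find_duplicates` is a hypothetical name, and it assumes the `LICENSE-binary` entries have already been grouped by section heading as `group:artifact:version` strings.

```python
from collections import defaultdict


def find_duplicates(sections):
    """Map each repeated "group:artifact" to the headings it appears under.

    sections: {section heading: ["group:artifact:version", ...]}
    (assumed input shape; the parsing step is not shown here).
    """
    seen = defaultdict(list)
    for heading, entries in sections.items():
        for entry in entries:
            group, artifact, _version = entry.split(":", 2)
            seen[f"{group}:{artifact}"].append(heading)
    return {key: headings for key, headings in seen.items() if len(headings) > 1}
```

An artifact listed twice in one section shows up with the same heading repeated, while one listed in two sections shows up with both headings, which covers both kinds of duplicate mentioned above.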


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

