Yicong-Huang opened a new issue, #4688:
URL: https://github.com/apache/texera/issues/4688

   ### What happened?
   
   `bin/licensing/check_binary_deps.py` compares bundled dependencies to `LICENSE-binary` claims using exact `name==version` strings. Whenever a claimed package publishes a new upstream release, even one whose license is unchanged, the check fails on every PR. It reports the package twice: as `NEW` (the freshly resolved version) and as `STALE` (the previously documented version). This blocks PRs that have nothing to do with dependencies.
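A minimal sketch of why one release trips both checks. It assumes the script reduces each side to a set of `name==version` strings; `exact_diff` is a hypothetical name, not the script's actual function:

```python
def exact_diff(claims: set[str], reality: set[str]) -> tuple[list[str], list[str]]:
    # Hypothetical reduction of the current check: pins are compared as whole strings.
    new = sorted(reality - claims)    # bundled but not claimed
    stale = sorted(claims - reality)  # claimed but not bundled
    return new, stale


claims = {"tifffile==2026.4.11", "s3fs==2025.9.0"}
reality = {"tifffile==2026.5.2", "s3fs==2025.9.0"}
new, stale = exact_diff(claims, reality)
print(new)    # ['tifffile==2026.5.2']
print(stale)  # ['tifffile==2026.4.11']
```

Because the two `tifffile` pins are different strings, each set difference contains one of them. A single upstream release therefore yields one `NEW` and one `STALE` entry.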
   
   Concrete example: PR #4687 just failed the Python license check on `tifffile`:
   
   ```
   NEW Python packages not claimed by LICENSE-binary:
     + tifffile==2026.5.2
   
   STALE Python packages claimed by LICENSE-binary but not actually bundled:
     - tifffile==2026.4.11
   ```
   
   `tifffile` uses calendar versioning and ships roughly monthly. Several other claimed packages also release frequently (`s3fs==2025.9.0`, `scikit-image==0.25.2`, etc.). The check therefore breaks on upstream release cadence, not on license-relevant changes.
   
   ### How to reproduce?
   
   1. Open any PR that doesn't touch `requirements.txt`.
   2. Wait for the upstream of any Python or npm package claimed in `LICENSE-binary` to publish a new release.
   3. Re-run the build job. The `Check installed Python packages against LICENSE-binary` step (or its npm counterpart) fails, listing the same package under both `NEW` and `STALE` with only the version string differing.
   
   ### Version
   
   1.1.0-incubating (Pre-release/Master)
   
   ### Commit Hash (Optional)
   
   Failure surfaced on PR #4687 
(https://github.com/apache/texera/actions/runs/25260954761/job/74068060023).
   
   ### Relevant log output
   
   ```
   NEW Python packages not claimed by LICENSE-binary:
     + tifffile==2026.5.2
   
   STALE Python packages claimed by LICENSE-binary but not actually bundled:
     - tifffile==2026.4.11
   
   ##[error]Process completed with exit code 1.
   ```
   
   ### Proposed fix
   
   Two layers:
   
   **Stop the churn (script change).** Modify `check_binary_deps.py` to compare packages by **name** (normalized to PEP 503 canonical form) and to treat version drift as informational rather than failing:
   
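   ```python
   import re

   def canonical(name: str) -> str:
       # PEP 503 normalization: lowercase, collapse runs of "-", "_", "." into "-"
       return re.sub(r"[-_.]+", "-", name).lower()

   def by_name(pins):
       # {"Name==1.2.3"} -> {"name": "1.2.3"}
       return {canonical(n): v for n, _, v in (p.partition("==") for p in pins)}

   claim_names   = by_name(claims)    # canonical name -> documented version
   reality_names = by_name(reality)   # canonical name -> bundled version

   added = sorted(reality_names.keys() - claim_names.keys())   # still hard-fail
   stale = sorted(claim_names.keys() - reality_names.keys())   # still hard-fail
   drift = sorted(
       name for name in claim_names.keys() & reality_names.keys()
       if claim_names[name] != reality_names[name]
   )  # informational only: print, don't exit non-zero
   ```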
   
   `added` (a newly bundled package) and `stale` (a package no longer bundled) still gate the build, because those changes need legal review. `drift` (same package, different version) prints a notice pointing to a refresh helper but does not fail the build.
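The gating rule can be sketched end to end as follows. This is a hypothetical, simplified `check()` that assumes both sides arrive as sets of `name==version` strings; the real script's input parsing and output format may differ. It returns the exit code instead of calling `sys.exit`, so the caller decides how to exit:

```python
import re


def canonical(name: str) -> str:
    # PEP 503 normalization: lowercase, collapse runs of "-", "_", "." into "-".
    return re.sub(r"[-_.]+", "-", name).lower()


def by_name(pins: set[str]) -> dict[str, str]:
    # {"Name==1.2.3"} -> {"name": "1.2.3"}
    return {canonical(n): v for n, _, v in (p.partition("==") for p in pins)}


def check(claims: set[str], reality: set[str]) -> int:
    """Return an exit code: 1 if a package was added or removed, 0 otherwise."""
    claimed, bundled = by_name(claims), by_name(reality)
    added = sorted(bundled.keys() - claimed.keys())
    stale = sorted(claimed.keys() - bundled.keys())
    drift = sorted(n for n in claimed.keys() & bundled.keys() if claimed[n] != bundled[n])

    for name in drift:
        # Informational only: the license claim still covers this package.
        print(f"NOTICE {name}: LICENSE-binary says {claimed[name]}, bundled is {bundled[name]}")
    if drift:
        print("Run bin/licensing/refresh_versions.py before the next release.")
    for name in added:
        print(f"NEW   + {name}=={bundled[name]}")
    for name in stale:
        print(f"STALE - {name}=={claimed[name]}")
    return 1 if added or stale else 0


exit_code = check({"tifffile==2026.4.11", "scikit_image==0.25.2"},
                  {"tifffile==2026.5.2", "scikit-image==0.25.2"})
print(exit_code)  # 0: version drift alone does not fail the build
```

In this run `tifffile` appears only as drift, and `scikit_image` matches `scikit-image` after normalization, so nothing gates the build.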
   
   **Refresh on cadence (separate helper).** Add a small `bin/licensing/refresh_versions.py` script that rewrites the version strings in `LICENSE-binary` to match whatever is currently bundled. Run it manually before each release so the documented versions stay accurate without paying that cost on every PR.
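A minimal sketch of the core of such a helper. It assumes `LICENSE-binary` spells Python pins as `name==version` tokens and that the caller already has the bundled versions keyed by PEP 503 name (for example, from the same resolution step the check uses). `PIN` and `refresh_versions` are hypothetical names:

```python
import re

# Assumption: LICENSE-binary mentions Python packages as "name==version" tokens.
PIN = re.compile(r"(?P<name>[A-Za-z0-9][A-Za-z0-9._-]*)==(?P<version>[^\s,)]+)")


def canonical(name: str) -> str:
    # PEP 503 normalization, so "scikit_image" matches "scikit-image".
    return re.sub(r"[-_.]+", "-", name).lower()


def refresh_versions(license_text: str, bundled: dict[str, str]) -> str:
    """Rewrite claimed versions to the bundled ones; keep names and other text as written."""
    def swap(m: re.Match) -> str:
        new_version = bundled.get(canonical(m["name"]))
        if new_version is None:
            return m.group(0)  # not bundled: leave it for the STALE check to report
        return f"{m['name']}=={new_version}"

    return PIN.sub(swap, license_text)


text = "- tifffile==2026.4.11\n- s3fs==2025.9.0\n"
print(refresh_versions(text, {"tifffile": "2026.5.2", "s3fs": "2025.9.0"}), end="")
```

Unbundled entries are deliberately left untouched: dropping a claim is a licensing decision, which the `STALE` check should still surface.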
   
   The same logic applies to the npm and agent-npm bullets; extend the approach there as well.
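For npm, the main difference is parsing. The following sketch assumes npm bullets are written as `name@version`, which the issue does not confirm. Splitting on the last `@` keeps scoped names such as `@scope/ui` intact, and the name-keyed diff is the same as on the Python side (names are compared as written, with no PEP 503 step). `split_npm`, `npm_diff`, and the package names are placeholders:

```python
def split_npm(pin: str) -> tuple[str, str]:
    # Split on the last "@" so "@scope/ui@1.2.0" keeps its scope in the name.
    name, sep, version = pin.rpartition("@")
    if not name or not sep:
        return pin, ""  # no version present (e.g. "lodash" or "@scope/ui")
    return name, version


def npm_diff(claims: set[str], reality: set[str]) -> tuple[list[str], list[str], list[str]]:
    claimed = dict(split_npm(p) for p in claims)
    bundled = dict(split_npm(p) for p in reality)
    added = sorted(bundled.keys() - claimed.keys())   # hard-fail
    stale = sorted(claimed.keys() - bundled.keys())   # hard-fail
    drift = sorted(n for n in claimed.keys() & bundled.keys() if claimed[n] != bundled[n])
    return added, stale, drift


print(npm_diff({"@scope/ui@1.2.0", "util-lib@3.0.0"},
               {"@scope/ui@1.3.0", "util-lib@3.0.0"}))  # ([], [], ['@scope/ui'])
```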


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
