Yicong-Huang opened a new issue, #4688:
URL: https://github.com/apache/texera/issues/4688
### What happened?
`bin/licensing/check_binary_deps.py` compares bundled deps to
`LICENSE-binary` claims using exact `name==version` strings. Whenever any
claimed package gets a new upstream release (the license itself unchanged), the
check fails on every PR — both as `NEW` (the freshly resolved version) and
`STALE` (the previously documented version). This blocks PRs that have nothing
to do with dependencies.
Concrete example: PR #4687 just failed the Python license check on
`tifffile`:
```
NEW Python packages not claimed by LICENSE-binary:
+ tifffile==2026.5.2
STALE Python packages claimed by LICENSE-binary but not actually bundled:
- tifffile==2026.4.11
```
`tifffile` uses calendar versioning and ships roughly monthly; several other
claimed packages (`s3fs==2025.9.0`, `scikit-image==0.25.2`, etc.) also release
often. The check breaks on release cadence, not on license-relevant changes.
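To illustrate the failure mode, here is a minimal sketch of an exact-string comparison. It is not the actual code in `check_binary_deps.py`; the function name `exact_diff` and the two input sets are made up for the example, with versions taken from the log above.

```python
# Hypothetical inputs: what LICENSE-binary claims vs. what the build resolved.
claims = {"tifffile==2026.4.11", "s3fs==2025.9.0"}
reality = {"tifffile==2026.5.2", "s3fs==2025.9.0"}


def exact_diff(claims, reality):
    """Set difference on full "name==version" strings (assumed current behavior)."""
    new = sorted(reality - claims)    # bundled but not claimed
    stale = sorted(claims - reality)  # claimed but not bundled
    return new, stale


new, stale = exact_diff(claims, reality)
print("NEW:", new)
print("STALE:", stale)
```

Because the version is part of the compared string, a single upstream release puts the same package in both lists.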
### How to reproduce?
1. Open any PR that doesn't touch `requirements.txt`.
2. Wait for the upstream of any Python or npm package claimed in
`LICENSE-binary` to publish a new release.
3. Re-run the build job — `Check installed Python packages against
LICENSE-binary` (or its npm counterpart) fails with `NEW` + `STALE` for the
same package, only the version string differing.
### Version
1.1.0-incubating (Pre-release/Master)
### Commit Hash (Optional)
Failure surfaced on PR #4687
(https://github.com/apache/texera/actions/runs/25260954761/job/74068060023).
### Relevant log output
```
NEW Python packages not claimed by LICENSE-binary:
+ tifffile==2026.5.2
STALE Python packages claimed by LICENSE-binary but not actually bundled:
- tifffile==2026.4.11
##[error]Process completed with exit code 1.
```
### Proposed fix
Two layers:
**Stop the churn (script change).** Modify `check_binary_deps.py` to do
membership comparison on package **name** (PEP 503 canonical form), and treat
version drift as informational, not failing:
```python
import re

def by_name(specs):
    """Map PEP 503-normalized name -> version for "name==version" strings."""
    pairs = (s.split("==", 1) for s in specs)
    return {re.sub(r"[-_.]+", "-", n).lower(): v for n, v in pairs}

# `claims` and `reality` are assumed to be iterables of "name==version" strings.
claim_names = by_name(claims)
reality_names = by_name(reality)
added = sorted(reality_names.keys() - claim_names.keys())  # still hard-fail
stale = sorted(claim_names.keys() - reality_names.keys())  # still hard-fail
drift = sorted(
    name for name in claim_names.keys() & reality_names.keys()
    if claim_names[name] != reality_names[name]
)  # informational only: print, don't exit non-zero
```
`added` (a brand-new package) and `stale` (a removed package) still gate the
build — those need legal review. `drift` (same package, new version) prints a
notice with a pointer to a refresh helper but does not fail.
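The gating rule described above could be sketched roughly as follows. The function name `report` and the message wording are assumptions for illustration, not existing code; the only point is that `drift` never influences the exit code.

```python
import sys


def report(added, stale, drift, out=sys.stdout):
    """Print findings and return the exit code (hypothetical helper).

    added / stale: package names that appeared or disappeared -> fail.
    drift: package names whose version changed -> notice only.
    """
    for name in added:
        print(f"NEW   + {name}", file=out)
    for name in stale:
        print(f"STALE - {name}", file=out)
    if drift:
        print("NOTICE: version drift (not failing): " + ", ".join(drift), file=out)
        print("        consider running bin/licensing/refresh_versions.py", file=out)
    return 1 if (added or stale) else 0


# A version bump alone passes; a new or removed package still fails.
drift_only = report(added=[], stale=[], drift=["tifffile"])
new_package = report(added=["newpkg"], stale=[], drift=[])
```

In the script itself the return value would presumably be passed to `sys.exit`.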
**Refresh on cadence (separate helper).** Add a small
`bin/licensing/refresh_versions.py` that rewrites the version strings in
`LICENSE-binary` to whatever is currently bundled. Run it manually before each
release so the documented versions stay accurate, without paying that cost on
every PR.
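A rough sketch of what the refresh helper might do, assuming `LICENSE-binary` lists Python packages as `name==version` tokens (the exact file layout is an assumption here, as are the function name and its signature). It rewrites only the version part and leaves unknown packages untouched.

```python
import re


def refresh_versions(license_text: str, bundled: dict[str, str]) -> str:
    """Rewrite `name==old` tokens to `name==new` for names found in `bundled`.

    `bundled` maps PEP 503-normalized names to the currently bundled version.
    """
    def canonical(name: str) -> str:
        return re.sub(r"[-_.]+", "-", name).lower()

    def swap(match: re.Match) -> str:
        name = match.group(1)
        new = bundled.get(canonical(name))
        # Leave the token as-is when the package is not in `bundled`.
        return f"{name}=={new}" if new else match.group(0)

    # Name, then "==", then a version that stops at whitespace or , ; )
    return re.sub(r"([A-Za-z0-9][A-Za-z0-9._-]*)==([^\s,;)]+)", swap, license_text)


before = "  - tifffile==2026.4.11\n  - s3fs==2025.9.0\n"
after = refresh_versions(before, {"tifffile": "2026.5.2", "s3fs": "2025.9.0"})
print(after)
```

Keeping the rewrite as a pure text-to-text function makes it easy to diff the result before writing the file back.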
The same logic applies to the npm and agent-npm bullets; extend the approach
there.
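For the npm side, a name-only comparison might look like the sketch below. It assumes the npm bullets use `name@version` specs, which this issue does not confirm; the package names and versions are hypothetical. Splitting on the last `@` keeps scoped names such as `@scope/pkg` intact.

```python
def split_spec(spec: str, sep: str) -> tuple[str, str]:
    """Split "name<sep>version" on the LAST separator (hypothetical helper)."""
    name, found, version = spec.rpartition(sep)
    if not found or not name:
        raise ValueError(f"not a name{sep}version spec: {spec!r}")
    return name, version


def names(specs, sep):
    """Set of package names, ignoring versions."""
    return {split_spec(s, sep)[0] for s in specs}


# Hypothetical data: one scoped package unchanged, one package with a version bump.
npm_claims = ["@angular/core@17.0.0", "lodash@4.17.20"]
npm_reality = ["@angular/core@17.0.0", "lodash@4.17.21"]

added = sorted(names(npm_reality, "@") - names(npm_claims, "@"))
stale = sorted(names(npm_claims, "@") - names(npm_reality, "@"))
print(added, stale)
```

The same `split_spec` works for the Python side with `sep="=="`, so one code path could serve both ecosystems.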