Hi Richard,

Just want to provide a bit of context for why this script was added back in the day (6 years ago or so), in case it helps you make a decision on what to do about it today.

Based on the advice at https://infra.apache.org/licensing-howto.html#binary, and looking at a few other ASF projects (Kafka, Hadoop), the project needed to maintain at least 4 files:

- LICENSE and NOTICE for Storm's source distribution
- LICENSE-binary and NOTICE-binary for Storm's binary distribution

In addition, Storm at the time included some category B dependencies (https://www.apache.org/legal/resolved.html#category-b), and those are required to be listed in a particular way that users are likely to notice (https://www.apache.org/legal/resolved.html#appropriately-labelled-condition). Rather than make a listing of only category B dependencies, we added the DEPENDENCY-LICENSES file listing all dependencies plus licenses, and added a link to that file to the README.

It was a bit of a pain to ensure that these files were up to date when doing a release, and it was very easy to forget to update the files when adding/updating/removing a dependency, so I added the validate-license-files.py script to ensure that PRs that updated dependencies also kept these files up to date. At the time, dependency bumps were done manually and infrequently.

So it wasn't really about keeping category X licenses out (we'd catch that in PR reviews even without these scripts); it was just about ensuring that these files accurately reflected the dependencies we were actually including in the distributions. Since dependency bumps were (comparatively) rare and not automated, it was less effort at the time to ask PRs to keep these files up to date as part of changing the dependencies, rather than ask the people doing releases to validate the files later.

On Thu, Oct 23, 2025 at 10:23, Richard Zowalla <[email protected]> wrote:

> Hi,
>
> After reviewing validate-license-files.py, it seems we already generate
> the two license files, compare them with the existing ones, and fail the
> check if any differences are found.
>
> Currently, most of our PRs involve dependency updates, and each time we
> spend several cycles manually updating these files.
>
> I was wondering if we could adopt a similar approach to what we do in
> StormCrawler (see here):
> https://github.com/apache/stormcrawler/blob/main/.github/workflows/main.yml#L46
> automatically generate the license files and open a PR whenever
> differences are detected.
>
> I assume the current license check was introduced to prevent accidentally
> introducing a category X license or similar issue.
>
> However, I think the time saved by automating these updates outweighs the
> minor additional review effort required during release preparation, since a
> full license review happens at that stage anyway.
>
> This goes in the direction of https://github.com/apache/storm/issues/7751
>
> What do you think?
>
> Regards,
> Richard
