Hi Richard,

Just want to provide a bit of context for why this script was added back in
the day (6 years ago or so), in case it helps you make a decision on what
to do about it today.

Based on the advice at https://infra.apache.org/licensing-howto.html#binary,
and looking at a few other ASF projects (Kafka, Hadoop), the project needed
to maintain at least 4 files:

LICENSE and NOTICE for Storm's source distribution
LICENSE-binary and NOTICE-binary for Storm's binary distribution

In addition, Storm at the time included some category B dependencies
(https://www.apache.org/legal/resolved.html#category-b), and those are
required to be listed in a particular way that users are likely to notice
(https://www.apache.org/legal/resolved.html#appropriately-labelled-condition).
Rather than make a listing of only category B dependencies, we added the
DEPENDENCY-LICENSES file listing all dependencies plus licenses, and added
a link to that file to the README.

It was a bit of a pain to ensure that these files were up to date when
doing a release, because it was very easy to forget to update them when
adding/updating/removing a dependency. So I added the
validate-license-files.py script to ensure that PRs that updated
dependencies also kept these files up to date. At the time, dependency
bumps were done manually and infrequently.
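
For anyone who hasn't looked at the script, the idea can be sketched roughly
as follows. This is a simplified illustration, not the actual
validate-license-files.py: the function names, the input dict, and the
listing format are all made up for the example.

```python
import difflib
import sys


def render_dependency_licenses(deps):
    """Render a sorted listing of dependencies and their licenses.

    `deps` maps dependency coordinates to a license name. Both the input
    shape and the output format are hypothetical.
    """
    lines = [f"{coords} ({license_name})" for coords, license_name in sorted(deps.items())]
    return "\n".join(lines) + "\n"


def check_up_to_date(committed_text, deps):
    """Diff the committed file against a fresh rendering.

    Returns the unified diff as a list of lines; an empty list means the
    committed file matches the current dependencies.
    """
    expected = render_dependency_licenses(deps)
    return list(difflib.unified_diff(
        committed_text.splitlines(keepends=True),
        expected.splitlines(keepends=True),
        fromfile="DEPENDENCY-LICENSES (committed)",
        tofile="DEPENDENCY-LICENSES (generated)",
    ))


if __name__ == "__main__":
    # Made-up data: the committed file is missing a newly added dependency.
    committed = "org.example:a:1.0 (Apache-2.0)\n"
    current = {
        "org.example:a:1.0": "Apache-2.0",
        "org.example:b:2.0": "MIT",
    }
    diff = check_up_to_date(committed, current)
    if diff:
        sys.stdout.writelines(diff)
        sys.exit(1)  # fail the PR check so the file gets updated in the same PR
```

The point is only that the check fails whenever the committed file and the
generated one differ, which pushes the update into the PR that changed the
dependency.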

So it wasn't really about keeping category X licenses out (we'd catch that
in PR reviews even without these scripts); it was just about ensuring that
these files accurately reflected the dependencies we were actually
including in the distributions. Since dependency bumps were (comparatively)
rare and not automated, it was less effort at the time to ask PRs to keep
these files up to date as part of changing the dependencies, rather than
ask the people doing releases to validate the files later.

On Thu, Oct 23, 2025 at 10:23, Richard Zowalla <[email protected]> wrote:

> Hi,
> After reviewing validate-license-files.py, it seems we already generate
> the two license files, compare them with the existing ones, and fail the
> check if any differences are found.
>
> Currently, most of our PRs involve dependency updates, and each time we
> spend several cycles manually updating these files.
>
> I was wondering if we could adopt a similar approach to what we do in
> StormCrawler (see here):
> https://github.com/apache/stormcrawler/blob/main/.github/workflows/main.yml#L46
> automatically generate the license files and open a PR whenever
> differences are detected.
>
> I assume the current license check was introduced to prevent accidentally
> introducing a category X license or similar issue.
>
> However, I think the time saved by automating these updates outweighs the
> minor additional review effort required during release preparation, since a
> full license review happens at that stage anyway.
>
> This goes in the direction of https://github.com/apache/storm/issues/7751
>
> What do you think?
>
> Regards
> Richard
