Yes, back then we were using Travis for CI, which IIRC didn't have much
access to GitHub beyond what was needed to check out the repo. I'm pretty
sure it couldn't open PRs.

Opening a PR when the dependencies change sounds like a good idea. I
think you'll probably get a follow-up PR any time you merge a dependency
change though (i.e. every Dependabot PR will trigger another PR when
merged), since both DEPENDENCY-LICENSES and LICENSE-binary contain the
version numbers of the dependencies. Maybe that's fine?

If you'd like to avoid such PRs, I believe only the LICENSE and NOTICE
files need to exist in the repo. DEPENDENCY-LICENSES only needs to exist in
the source and binary release artifacts. LICENSE-binary/NOTICE-binary only
need to exist in binary release artifacts. So if you could figure out
generating those last 3 files as part of running releases, you wouldn't
need to worry about them in PRs.

DEPENDENCY-LICENSES is already a generated file, see
https://github.com/apache/storm/blob/master/DEVELOPER.md#auditing-licenses-for-licensenotice

I believe NOTICE-binary is generated with collect_license_files.sh, with
some manual editing for readability. Maybe the output of the generator
could be improved to avoid the need for manual cleanup.

LICENSE-binary is manually updated, but the validate-license-files.py
script "knows" what the right list of dependencies is supposed to be, so
maybe this could be generated as well with a little more effort (if nothing
else, the repo's LICENSE-binary could probably just be a template that a
script populates at release time?).
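
To make the template idea a bit more concrete, here is a minimal Python
sketch. It assumes the repo's LICENSE-binary would contain a placeholder
line (the @@DEPENDENCIES@@ marker and the render_license_binary name are
made up for illustration, nothing like this exists in Storm today), and
that the release script already has the dependency list as
(coordinates, license name) pairs:

```python
# Hypothetical marker that the checked-in LICENSE-binary template would contain.
PLACEHOLDER = "@@DEPENDENCIES@@"


def render_license_binary(template_text, dependencies):
    """Return the template with the placeholder replaced by one line per dependency.

    dependencies is an iterable of (coordinates, license_name) tuples,
    e.g. ("org.example:lib:1.2.3", "Apache License 2.0").
    """
    if PLACEHOLDER not in template_text:
        raise ValueError("template does not contain " + PLACEHOLDER)
    # Sort so the output is stable regardless of the order the build reports.
    lines = [
        f"  * {coords} ({license_name})"
        for coords, license_name in sorted(dependencies)
    ]
    return template_text.replace(PLACEHOLDER, "\n".join(lines))
```

Since the versions would only be filled in at release time, the checked-in
template wouldn't change on ordinary dependency bumps.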

In case you end up keeping DEPENDENCY-LICENSES in the repo, there's a
different simplification you could make: it doesn't technically have to
list all dependencies, only the category B ones. So you could potentially
reduce how often this file needs to be changed by adding some filtering to
the script that outputs it.
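
The filtering could look roughly like the Python sketch below. The
only_category_b name and the (coordinates, license name) input format are
assumptions, and the license names listed are only examples; the
authoritative list is at
https://www.apache.org/legal/resolved.html#category-b and the strings would
need to match whatever the generator actually emits:

```python
# Illustrative subset of category B license names; not a complete list.
CATEGORY_B_MARKERS = (
    "Common Development and Distribution License",
    "Eclipse Public License",
    "Mozilla Public License",
)


def only_category_b(dependencies):
    """Keep only the (coordinates, license_name) pairs under a category B license."""
    kept = []
    for coords, license_name in dependencies:
        lowered = license_name.lower()
        # Substring match so version suffixes like "1.0" or "Version 2.0" still match.
        if any(marker.lower() in lowered for marker in CATEGORY_B_MARKERS):
            kept.append((coords, license_name))
    return kept
```

With a filter like this, bumping an Apache-licensed dependency would leave
the file untouched; only changes to category B dependencies would show up.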

Anyway, just some loose thoughts in case they're helpful. Hope you figure
out something that works for you :)

On Fri, 24 Oct 2025 at 18:54, Richard Zowalla <[email protected]> wrote:

> Hi!
>
> Thanks a lot for the detailed explanation. I really appreciate you taking
> the time to provide the historical context after all these years,
> especially around how dependency management was handled back then.
> It definitely helps to understand the reasoning behind the script’s
> introduction.
> These days, we could have an automated process in place (similar to TomEE,
> StormCrawler, …):
>
> Our CI pipeline could generate the updated license files during the build
> and automatically open a PR against the main branch via GitHub Actions if
> they differ. This would ensure that the license information is always
> current.
>
> I’m guessing such tooling wasn’t available at the time, so the current
> approach made perfect sense back then.
>
> Best,
> Richard
>
>
>
> > On 24 Oct 2025 at 18:22, Stig Rohde Døssing <[email protected]> wrote:
> >
> > Hi Richard,
> >
> > Just want to provide a bit of context for why this script was added back in
> > the day (6 years ago or so), in case it helps you make a decision on what
> > to do about it today.
> >
> > Based on the advice at https://infra.apache.org/licensing-howto.html#binary,
> > and looking at a few other ASF projects (Kafka, Hadoop), the project needed
> > to maintain at least 4 files:
> >
> > LICENSE/NOTICE for Storm's source distribution
> > LICENSE-binary/NOTICE-binary for Storm's binary distribution
> >
> > In addition, Storm at the time included some category B dependencies
> > (https://www.apache.org/legal/resolved.html#category-b), and those are
> > required to be listed in a particular way that users are likely to notice
> > (https://www.apache.org/legal/resolved.html#appropriately-labelled-condition).
> > Rather than make a listing of only category B dependencies, we added the
> > DEPENDENCY-LICENSES file listing all dependencies plus licenses, and added
> > a link to that file to the README.
> >
> > It was a bit of a pain to ensure that these files were up to date when
> > doing a release, since it was very easy to forget to update them when
> > adding/updating/removing a dependency. So I added the
> > validate-license-files.py script to ensure that PRs that updated
> > dependencies also kept these files up to date. At the time, dependency
> > bumps were done manually and infrequently.
> >
> > So it wasn't really about keeping category X licenses out (we'd catch that
> > in PR reviews even without these scripts), it was just about ensuring that
> > these files accurately reflected the dependencies we were actually
> > including in the distributions. Since dependency bumps were (comparatively)
> > rare and not automated, it was less effort at the time to ask PRs to keep
> > these files up to date as part of changing the dependencies, rather than
> > ask the people doing releases to validate the files later.
> >
> > On Thu, 23 Oct 2025 at 10:23, Richard Zowalla <[email protected]> wrote:
> >
> >> Hi,
> >> After reviewing validate-license-files.py, it seems we already generate
> >> the two license files, compare them with the existing ones, and fail the
> >> check if any differences are found.
> >>
> >> Currently, most of our PRs involve dependency updates, and each time we
> >> spend several cycles manually updating these files.
> >>
> >> I was wondering if we could adopt a similar approach to what we do in
> >> StormCrawler (see
> >> https://github.com/apache/stormcrawler/blob/main/.github/workflows/main.yml#L46):
> >> automatically generate the license files and open a PR whenever
> >> differences are detected.
> >>
> >> I assume the current license check was introduced to prevent accidentally
> >> introducing a category X license or similar issue.
> >>
> >> However, I think the time saved by automating these updates outweighs the
> >> minor additional review effort required during release preparation, since a
> >> full license review happens at that stage anyway.
> >>
> >> This goes in the direction of https://github.com/apache/storm/issues/7751
> >>
> >> What do you think?
> >>
> >> Regards
> >> Richard
>
>
