On 2026-02-23, Anderson Torres wrote:
> On Mon, Feb 23, 2026 at 2:57 PM Vagrant Cascadian <[email protected]> wrote:
>> On 2026-02-23, Maxim Cournoyer wrote:
>> > Gabriel Wicki <[email protected]> writes:
>> >> On Fri, Feb 20, 2026 at 02:10:15PM +0900, Maxim Cournoyer wrote:
>> > 2. The git history becomes even more important: should
>> > we migrate to another system in the future it'd be critical to preserve
>> > it; it also means we can't prune the git history passed some threshold
>> > to e.g. reduce the git repository size (I'm not suggesting to do this,
>> > but that'd be an option we'd forego).
>>
>> the Author of a git commit != to the Author of the copyrightable
>> material != the holder of the copyright (same is true for the dates of
>> the copyrighted material, if you are into that sort of thing)... so I
>> guess I have my doubts about the inclusive accuracy of relying on git
>> history.
>>
>> Various *-by: seem to be a convention in git commits to reflect some of
>> those distinctions... though honestly, it becomes a bit of git
>> archaeology at that point, rather than simply reading the text in a file.
>
> I believe that when we (we understood here as a hypothetical group)
> choose to use a VCS,
> the metadata saved by it becomes part of the project itself - from the
> messages to the opaque
> machine-generated low-level bits of padding and organization.

Agreed!

> If we can use the messages and/or the machine-generated metadata to
> convey copyright info,
> then we should.

Even if... it is the wrong information?

To elaborate a bit ... A Committer in 2026 can push a change of some
other Author who wrote it in 2017, as long as the license permits, where
the Committer is the author of the commit message but not the code
itself, and the copyrighted code might even be owned by a third Entity
(e.g. if the Author was under a work-for-hire jurisdiction). Now, what
if a similarly complicated (but also different!) mix of parties with
different relationships to the code edits that code in another commit
later? ... and again... and again...

I am not sure how you would extract relevant Copyright information
purely out of such git history without considerable additional metadata
beyond what git provides out of the box... other than keeping that
information... in the code itself... which is what we are talking about
getting rid of.
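
A rough sketch of what I mean, assuming the commit text layout printed by
`git cat-file -p <commit>`. The sample commit, every name and address in
it, and the `parties` helper are made up for illustration; this is not a
proposal for real tooling. It collects every party a single commit
records: the author, the committer, and any *-by: trailers.

```python
import re

# Sample text in the layout printed by `git cat-file -p <commit>`.
# All names and addresses below are hypothetical.
RAW_COMMIT = """\
tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
author Alice Author <alice@example.org> 1500000000 +0000
committer Carol Committer <carol@example.org> 1771800000 +0000

gnu: Add example package.

Co-authored-by: Bob Helper <bob@example.org>
Signed-off-by: Carol Committer <carol@example.org>
"""

# Header lines look like: "<role> <name> <<email>> <unix time> <tz offset>".
IDENT = re.compile(r"^(author|committer) (.+) <(.*)> (\d+) ([+-]\d{4})$")
# Trailer lines that follow the "Something-by: value" convention.
TRAILER = re.compile(r"^([A-Za-z-]+-by): (.+)$")


def parties(raw):
    """Return the author, committer and *-by: trailers found in one commit."""
    # The first blank line separates the headers from the commit message.
    headers, _, message = raw.partition("\n\n")
    found = {"trailers": []}
    for line in headers.splitlines():
        match = IDENT.match(line)
        if match:
            role, name, email, when, _tz = match.groups()
            found[role] = (name, email, int(when))
    for line in message.splitlines():
        match = TRAILER.match(line)
        if match:
            found["trailers"].append((match.group(1), match.group(2)))
    return found


info = parties(RAW_COMMIT)
for role in ("author", "committer"):
    print(role, info[role])
for key, value in info["trailers"]:
    print(key, value)
```

That is everything the commit says about who was involved: two
identities with timestamps, plus free-form trailers. Nothing in there
names a copyright holder or the year the material was written; that
still has to come from somewhere else.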


I worry a bit that this discussion is going in a direction that would
require us to simplify the world first in order to make our code
simpler, but I fear that is not the world we have... as much as I share
the inclination for a more elegant solution.


My main experience looking at this sort of thing is in u-boot, a
codebase with over 100000 commits over 26 years and over 37000 current
files (not including files that were added and removed and renamed over
the history of the project, notably)... they switched to SPDX
identifiers, and ... good luck extracting reasonable copyright
information out of that soup; I've been trying to do so for over a
decade, perhaps passably given the circumstances, but I cannot say I am
pleased with the results.

The Linux kernel did something similar, at a scale at least one or two
orders of magnitude larger... and it is similarly a bit messy.


I think SPDX identifiers might be more important for large entities
trying to check boxes off of license compliance checklists and avoid
getting sued than for people trying to leverage copyleft (and hopefully
other strategies!) to encourage sharing and rebuilding the world
commons...

Of course there is non-zero overlap, but I think the accidental
omissions by lossy copyright tracking serve one interest far more than
the other... The laws are not written for collaboration, and it is
almost surely a(n anti-)feature that representing collaborative projects
is a Sisyphean task that would exhaust any reasonable person.

*sigh*


live well,
  vagrant
