On 2026-02-23, Anderson Torres wrote:
> On Mon, Feb 23, 2026 at 2:57 PM Vagrant Cascadian <[email protected]> wrote:
>> On 2026-02-23, Maxim Cournoyer wrote:
>> > Gabriel Wicki <[email protected]> writes:
>> >> On Fri, Feb 20, 2026 at 02:10:15PM +0900, Maxim Cournoyer wrote:
>> > 2. The git history becomes even more important: should
>> > we migrate to another system in the future it'd be critical to preserve
>> > it; it also means we can't prune the git history passed some threshold
>> > to e.g. reduce the git repository size (I'm not suggesting to do this,
>> > but that'd be an option we'd forego).
>>
>> the Author of a git commit != to the Author of the copyrightable
>> material != the holder of the copyright (same is true for the dates of
>> the copyrighted material, if you are into that sort of thing)... so I
>> guess I have my doubts about the inclusive accuracy of relying on git
>> history.
>>
>> Various *-by: seem to be a convention in git commits to reflect some of
>> those distinctions... though honestly, it becomes a bit of git
>> archaeology at that point, rather than simply reading the text in a file.
>
> I believe that when we (we understood here as a hypothetical group)
> choose to use a VCS,
> the metadata saved by it becomes part of the project itself - from the
> messages to the opaque
> machine-generated low-level bits of padding and organization.

Agreed!

> If we can use the messages and/or the machine-generated metadata to
> convey copyright info,
> then we should.

Even if... it is the wrong information?

To elaborate a bit ...

A Committer in 2026 can push a change by some other Author who wrote it
in 2017, as long as the license permits. The Committer is then the
author of the commit message but not of the code itself, and the
copyrighted code might even be owned by a third Entity (e.g. if the
Author was under a work-for-hire jurisdiction).

Now, what if a similarly complicated (but also different!) mix of
parties with different relationships to the code edits that code in
another commit later? ... and again... and again...

I am not sure how you would extract relevant Copyright information
purely out of such git history without considerable additional metadata
beyond what git provides out of the box... other than keeping that
information... in the code itself... which is what we are talking about
getting rid of.

I worry a bit that this discussion is going in a direction that would
require us to simplify the world first in order to make our code
simpler, but I fear that is not the world we have... as much as I share
the inclination for a more elegant solution.

My main experience looking at this sort of thing is in u-boot, a
codebase with over 100000 commits over 26 years and over 37000 current
files (not including files that were added and removed and renamed over
the history of the project, notably)... they switched to SPDX
identifiers, and ... good luck extracting reasonable copyright
information out of that soup; I've been trying to do so for over a
decade, perhaps passably given the circumstances, but I cannot say I am
pleased with the results.

The Linux kernel did something similar, at a scale at least one or two
orders of magnitude larger... and it is similarly a bit messy.

I think SPDX identifiers might be more important for large entities
trying to check boxes off of license compliance checklists and avoid
getting sued than for people trying to leverage copyleft (and hopefully
other strategies!) to encourage sharing and rebuilding the world
commons... Of course there is non-zero overlap, but I think the
accidental omissions by lossy copyright tracking serve one interest far
more than the other...

The laws are not written for collaboration, and it is almost surely
a(n anti-)feature that representing collaborative projects is a
Sisyphean task that would exhaust any reasonable person.

*sigh*

live well,
  vagrant
