One more thought on this, building on Bertho's point about code reproduction.
Consider this scenario: a contributor uses Copilot with training enabled, opens control.c to write a fix. Copilot ingests not just their new code, but the surrounding GPL context to generate suggestions. That GPL code can then resurface in someone else's proprietary project, with no license attached. That is a real problem.

But this happens in the IDE, not on the hosting platform. A contributor could clone our repo from Codeberg, open it in VS Code with Copilot active, and the exact same thing occurs. Moving the repository doesn't change anything about this.

The actual issue is that these AI tools don't check the license. The license file is right there in the repo. A responsible tool should read it before ingesting anything. They already have the ability to detect when suggestions match public code, so detecting a GPL license would be trivial. They chose not to.

This isn't really a problem for LinuxCNC to solve by moving platforms. It's a problem with the tools, and the responsibility sits with the tool makers. Our energy would be better spent supporting efforts to hold them accountable than reorganizing our own infrastructure.

In the meantime, a practical step we can take right now is adding a note to our contribution guidelines asking contributors who use Copilot and such to disable the training option in their IDE and GitHub privacy settings.

Best regards,
Luca

On March 27, 2026 9:09:42 PM GMT+08:00, gene heskett <[email protected]> wrote:
>On 3/27/26 07:12, Luca Toniolo wrote:
>> You said
>>
>> "We gain independence from a corporate entity controlling the infrastructure
>> and data we generate in development"
>>
>> I understand the appeal of independence from a corporate entity.
>>
>> But I think it's worth asking: what does that independence look like in
>> practice? If we move to Codeberg, we're still relying on a third party, just
>> a smaller, volunteer-run one.
>> If we self-host, someone in our community
>> needs to maintain servers, handle security, manage backups, and keep CI
>> running. Right now Codeberg's CI only supports amd64, so our arm64 and
>> multi-distro builds would need self-hosted runners that we'd have to
>> provision and maintain ourselves.
>>
>> As for the data we generate in development, it's an open source project. Our
>> commits, issues, and discussions are public by design. That's the deal we
>> made when we chose this model, and it's a good deal. What additional control
>> would a different platform give us over data that is meant to be open?
>>
>> I'm not dismissing the concern. I just want to make sure the cost of acting
>> on it is proportionate to what we actually gain.
>>
>> Best regards,
>> Luca
>>
>>
>>
>> On March 27, 2026 6:47:24 PM GMT+08:00, Luca Toniolo <[email protected]>
>> wrote:
>>> Hi Bertho,
>>>
>>> I think it’s important to clarify what "rebuilding" actually looks like in
>>> practice. Moving to Codeberg isn't just a matter of effort, it would be a
>>> significant technical downgrade. Their CI currently lacks native ARM
>>> support and runs on a single global queue, which would leave our
>>> infrastructure a shadow of what it is on GitHub.
>>>
>>> We also have to consider discoverability. GitHub is the de-facto home for
>>> open source, and most new contributors find projects where the code already
>>> lives. Moving to a niche platform risks cutting off that pipeline.
>>>
>>> If this is largely a response to the recent news about GitHub using
>>> interaction data for Copilot training by default, we could address that
>>> with a simple automated PR message reminding contributors to opt out in
>>> their settings. Is this specific policy change the main driver for you, or
>>> is there a more fundamental issue at play?
>>>
>>> Best, Luca
>>>
>>>
>>>
>>> On March 27, 2026 5:29:45 PM GMT+08:00, Bertho Stultiens
>>> <[email protected]> wrote:
>>>> On 3/27/26 9:27 AM, Luca Toniolo wrote:
>>>>> Copilot doing statistical analysis on publicly available GPL code is, if
>>>>> anything, less than what the GPL already explicitly permits.
>>>> Yes, as long as you abide by the license.
>>>>
>>>> But LLMs do much more than just statistical analysis. LLMs generate output
>>>> from the training set and people are encouraged to use that output.
>>>> The problem is that LLMs are known to reproduce their input/training data.
>>>> The problem is that they reproduce training/learned code and strip the
>>>> GPL license from that code. That is the real problem.
>>>>
>>>> The fact that we can't prevent these corporations from scraping and doing
>>>> this is a fact of how the Internet works. However, the fact that they did
>>>> it does not make it right or their use legal.
>>>>
>>>>
>>>>> Mailing list archives have been indexed by Google, crawled by the Wayback
>>>>> Machine, scraped by researchers, and read by recruiters for as long as
>>>>> they've existed. Our commit messages, review comments, and design
>>>>> discussions have been public and searchable for years. That was true
>>>>> before Copilot, and it would remain true if we moved to GitLab, Codeberg,
>>>>> or a self-hosted Gitea instance tomorrow. None of these platforms prevent
>>>>> scraping.
>>>> It is not only about what is publicly visible on the site(s). It is about
>>>> the use and the process of how you do things.
>>>>
>>>> The information that is available *inside* github about you and what you
>>>> are doing is considerably more extensive than what can be viewed from the
>>>> public record.
>>>>
>>>> The announcement from github makes, in principle, any and all data subject
>>>> to input into their LLMs. That I cannot accept and will seriously consider
>>>> my options.
>>>>
>>>>
>>>>> GPL enforcement, even in clear-cut cases of actual license violation, has
>>>>> historically been rare and difficult. The FSF and SFLC have pursued only
>>>>> the most egregious cases, and even those took years. LinuxCNC itself has
>>>>> never enforced the GPL against anyone.
>>>> The non-enforcement of copyright violations does _not_ make it alright to
>>>> become an infringer or to condone copyright infringement. Besides, the
>>>> cases that were enforced were victories for the GPL and made many an
>>>> infringer think twice or back off.
>>>>
>>>> That is not to say that there aren't many uncaught infringers. There are,
>>>> and we should all discourage that wherever and however we can.
>>>>
>>>>
>>>>> The idea of taking drastic action over something that may not even
>>>>> constitute a violation seems disproportionate.
>>>> That is unsettled case law.
>>>>
>>>> However, the action is not just taken over copyrights. The action would
>>>> also be taken to prevent a commercial entity from exploiting internal
>>>> insights they acquire from us using the site.
>>>>
>>>> Besides, it sends a strong message that their (github's) behaviour will
>>>> result in users changing their ways.
>>>>
>>>>
>>>>> If we migrate off GitHub, what do we actually gain? We lose CI
>>>>> infrastructure that works, we lose contributor familiarity, we lose
>>>>> discoverability for new contributors, we lose issue and PR history, and
>>>>> we solve nothing, because the code was already scraped, the mailing lists
>>>>> were already indexed,
>>>> We gain independence from a corporate entity controlling the
>>>> infrastructure and data we generate in development.
>>>>
>>>> CI is not that difficult, but we'd need to rebuild. IMO a small price for
>>>> what we gain.
>>>>
>>>> Commit history is in git. We can extract issues and PR data. You know,
>>>> scrape it? ;-)
>>>>
>>>> Discoverability, hm... Use a search engine on the Internet: find
>>>> linuxcnc.org -> link to development.
>>>> How difficult is that? Not that we've
>>>> been very active at promoting ourselves in the past 20 years or so..
>>>>
>>>>
>>>>> and the next platform will face the same reality.
>>>> The next platform will not necessarily have that same reality. That is why
>>>> Codeberg is such a good option, they are a non-profit with an outspoken
>>>> goal to support and further FOSS
>>>> (https://docs.codeberg.org/getting-started/what-is-codeberg/).
>>>>
>>>> --
>>>> Greetings Bertho
>>>>
>>>> (disclaimers are disclaimed)
>Here is where I'd have to disagree, Bertho. The fact, mentioned previously
>in this thread, that github is stripping the GPLv2 license before spitting out
>our code to the rest of the world s/b grounds for a legal action. That, as M$
>knows well, will need deep pockets we don't have. Codeberg may be a good idea
>in 3 or 5 years as development proceeds but for our purposes NOW, not so much
>if it has no arm support.
>
>Keep looking for a github workalike that honors the GPLv2, or sell the farm
>for .1 cents a section.
>>>>
>>>> .
>>>>
>>>> _______________________________________________
>>>> Emc-developers mailing list
>>>> [email protected]
>>>> https://lists.sourceforge.net/lists/listinfo/emc-developers
>
>Cheers, Gene Heskett, CET.
>--
>"There are four boxes to be used in defense of liberty:
> soap, ballot, jury, and ammo. Please use in that order."
>-Ed Howdershelt (Author, 1940)
>If we desire respect for the law, we must first make the law respectable.
> - Louis D. Brandeis
>Don't poison our oceans, interdict drugs at the src.
