On 3/27/26 10:01, Luca Toniolo wrote:
One more thought on this, building on Bertho's point about code reproduction.
Consider this scenario: a contributor uses Copilot with training enabled, opens control.c to write a fix. Copilot ingests not just their new code, but the surrounding GPL context to generate suggestions. That GPL code can then resurface in someone else's proprietary project, with no license attached. That is a real problem.
One that needs to be addressed RIGHT NOW.
But this happens in the IDE, not on the hosting platform. A contributor could clone our repo from Codeberg, open it in VS Code with Copilot active, and the exact same thing occurs. Moving the repository doesn't change anything about this. The actual issue is that these AI tools don't check the license. The license file is right there in the repo. A responsible tool should read it before ingesting anything. They already have the ability to detect when suggestions match public code, so detecting a GPL license would be trivial. They chose not to.
Of course they did. Their lawyers said it was an acceptable risk, as they have pockets 100,000 times deeper than ours.
This isn't really a problem for LinuxCNC to solve by moving platforms. It's a problem with the tools, and the responsibility sits with the tool makers. Our energy would be better spent supporting efforts to hold them accountable than reorganizing our own infrastructure.
Precisely, Luca. You have identified the real problem.
In the meantime, a practical step we can take right now is adding a note to our contribution guidelines asking contributors who use Copilot and such to disable the training option in their IDE and GitHub privacy settings.
Which will work ONLY until the whitewash covering the leopard's spots wears off and a silent "code update" automatically ignores that switch. That is the EEE principle M$ has used for 50+ years, and one I have had personal experience with, because W95 did not have a working TCP network stack. Card makers were slow to understand that they had to supply that stack, and we were slow to find the right card maker.
We needed to get the teleprompter output of the AP Newsroom software out of the W95 server and over to the Amiga 2000, with a 68030 card in it, that we were using to serve up our station's web page from an auto-answer modem, so we could add the news to that page. We may have been the first TV station in the country to have a news page on our website, but we were forced to make that connection on floppy disks, and did so for almost 2 years.
That was when I called Redmond and demanded a working net stack, and got called a pie-rat by someone about 3 layers up the support crew. All they could see was the ability to use the net as a means to steal W95. Since then, no Windows anything has survived on my property longer than it took me to download the Linux of the week.
Best regards,
Luca
On March 27, 2026 9:09:42 PM GMT+08:00, gene heskett <[email protected]> wrote:
On 3/27/26 07:12, Luca Toniolo wrote:
You said "We gain independence from a corporate entity controlling the infrastructure and data we generate in development"
I understand the appeal of independence from a corporate entity. But I think it's worth asking: what does that independence look like in practice? If we move to Codeberg, we're still relying on a third party, just a smaller, volunteer-run one. If we self-host, someone in our community needs to maintain servers, handle security, manage backups, and keep CI running. Right now Codeberg's CI only supports amd64, so our arm64 and multi-distro builds would need self-hosted runners that we'd have to provision and maintain ourselves.
As for the data we generate in development, it's an open source project. Our commits, issues, and discussions are public by design. That's the deal we made when we chose this model, and it's a good deal. What additional control would a different platform give us over data that is meant to be open?
I'm not dismissing the concern. I just want to make sure the cost of acting on it is proportionate to what we actually gain.
Best regards,
Luca
On March 27, 2026 6:47:24 PM GMT+08:00, Luca Toniolo <[email protected]> wrote:
Hi Bertho,
I think it’s important to clarify what "rebuilding" actually looks like in practice. Moving to Codeberg isn't just a matter of effort, it would be a significant technical downgrade. Their CI currently lacks native ARM support and runs on a single global queue, which would leave our infrastructure a shadow of what it is on GitHub.
We also have to consider discoverability. GitHub is the de-facto home for open source, and most new contributors find projects where the code already lives. Moving to a niche platform risks cutting off that pipeline.
If this is largely a response to the recent news about GitHub using interaction data for Copilot training by default, we could address that with a simple automated PR message reminding contributors to opt out in their settings. Is this specific policy change the main driver for you, or is there a more fundamental issue at play?
Best,
Luca
On March 27, 2026 5:29:45 PM GMT+08:00, Bertho Stultiens <[email protected]> wrote:
On 3/27/26 9:27 AM, Luca Toniolo wrote:
Copilot doing statistical analysis on publicly available GPL code is, if anything, less than what the GPL already explicitly permits.
Yes, as long as you abide by the license. But LLMs do much more than just statistical analysis. LLMs generate output from the training set and people are encouraged to use that output. The problem is that LLMs are known to reproduce their input/training data, and that they reproduce the learned code with the GPL license stripped from it. That is the real problem. The fact that we can't prevent these corporations from scraping and doing this is a fact of how the Internet works. However, the fact that they did it does not make it right or their use legal.
Mailing list archives have been indexed by Google, crawled by the Wayback Machine, scraped by researchers, and read by recruiters for as long as they've existed. Our commit messages, review comments, and design discussions have been public and searchable for years. That was true before Copilot, and it would remain true if we moved to GitLab, Codeberg, or a self-hosted Gitea instance tomorrow. None of these platforms prevent scraping.
It is not only about what is publicly visible on the site(s). It is about the use and the process of how you do things. The information that is available *inside* GitHub about you and what you are doing is considerably more extensive than what can be viewed from the public record. The announcement from GitHub makes, in principle, any and all data subject to input into their LLMs.
That I cannot accept, and I will seriously consider my options.
GPL enforcement, even in clear-cut cases of actual license violation, has historically been rare and difficult. The FSF and SFLC have pursued only the most egregious cases, and even those took years. LinuxCNC itself has never enforced the GPL against anyone.
The non-enforcement of copyright violations does _not_ make it alright to become an infringer or to condone copyright infringement. Besides, the cases that were enforced were victories for the GPL and made many an infringer think twice or back off. That is not to say that there are not many uncaught infringers. There are, and we should all discourage that wherever and however we can.
The idea of taking drastic action over something that may not even constitute a violation seems disproportionate.
That is unsettled case law. However, the action is not just taken over copyrights. The action would also be taken to prevent a commercial entity from exploiting internal insights they acquire from us using the site. Besides, it sends a strong message that their (GitHub's) behaviour will result in users changing their ways.
If we migrate off GitHub, what do we actually gain? We lose CI infrastructure that works, we lose contributor familiarity, we lose discoverability for new contributors, we lose issue and PR history, and we solve nothing, because the code was already scraped, the mailing lists were already indexed,
We gain independence from a corporate entity controlling the infrastructure and data we generate in development. CI is not that difficult, but we'd need to rebuild. IMO a small price for what we gain. Commit history is in git. We can extract issues and PR data. You know, scrape it? ;-) Discoverability, hm... Use a search engine on the Internet: find linuxcnc.org -> link to development. How difficult is that?
Not that we've been very active at promoting ourselves in the past 20 years or so...
and the next platform will face the same reality.
The next platform will not necessarily have that same reality. That is why Codeberg is such a good option: they are a non-profit with an outspoken goal to support and further FOSS (https://docs.codeberg.org/getting-started/what-is-codeberg/).
--
Greetings Bertho
(disclaimers are disclaimed)
Here is where I'd have to disagree, Bertho. The fact, mentioned previously in this thread, that GitHub is stripping the GPLv2 license before spitting out our code to the rest of the world should be grounds for legal action. That, as M$ knows well, will need deep pockets we don't have. Codeberg may be a good idea in 3 or 5 years as development proceeds, but for our purposes NOW, not so much if it has no ARM support. Keep looking for a GitHub workalike that honors the GPLv2, or sell the farm for .1 cents a section..
_______________________________________________
Emc-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/emc-developers
Cheers, Gene Heskett, CET.
--
"There are four boxes to be used in defense of liberty: soap, ballot, jury, and ammo. Please use in that order." -Ed Howdershelt (Author, 1940)
If we desire respect for the law, we must first make the law respectable. - Louis D. Brandeis
Don't poison our oceans, interdict drugs at the src.
Cheers, Gene Heskett, CET.
