On 3/27/26 10:01, Luca Toniolo wrote:
One more thought on this, building on Bertho's point about code reproduction.

Consider this scenario: a contributor uses Copilot with training enabled, opens 
control.c to write a fix. Copilot ingests not just their new code, but the 
surrounding GPL context to generate suggestions. That GPL code can then 
resurface in someone else's proprietary project, with no license attached. That 
is a real problem.
One that needs to be addressed RIGHT NOW.
But this happens in the IDE, not on the hosting platform. A contributor could 
clone our repo from Codeberg, open it in VS Code with Copilot active, and the 
exact same thing occurs. Moving the repository doesn't change anything about 
this.

The actual issue is that these AI tools don't check the license. The license 
file is right there in the repo. A responsible tool should read it before 
ingesting anything. They already have the ability to detect when suggestions 
match public code, so detecting a GPL license would be trivial. They chose not 
to.
Of course they did; their lawyers said it was an acceptable risk, as they have pockets 100,000 times deeper than ours.
This isn't really a problem for LinuxCNC to solve by moving platforms. It's a 
problem with the tools, and the responsibility sits with the tool makers. Our 
energy would be better spent supporting efforts to hold them accountable than 
reorganizing our own infrastructure.
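To give a rough sense of why detecting a GPL license could be called trivial, here is a minimal Python sketch that looks for a top-level license file in a checkout and checks it for the GPL title line. The file names checked and the function name `repo_is_gpl` are assumptions chosen for illustration; a real tool would also have to consider per-file SPDX headers, dual licensing, and vendored code.

```python
from pathlib import Path

# Hypothetical list of common top-level license file names
# (an assumption, not an exhaustive or authoritative list).
LICENSE_NAMES = ("COPYING", "COPYING.txt", "LICENSE", "LICENSE.txt", "LICENSE.md")

# The GPLv2 and GPLv3 license texts both open with this title line.
GPL_MARKER = "GNU GENERAL PUBLIC LICENSE"


def repo_is_gpl(repo_root: str) -> bool:
    """Return True if a top-level license file in repo_root contains the GPL title.

    This is a crude substring check: it does not distinguish GPL versions
    and does not look at license headers inside individual source files.
    """
    root = Path(repo_root)
    for name in LICENSE_NAMES:
        candidate = root / name
        if candidate.is_file():
            text = candidate.read_text(encoding="utf-8", errors="replace")
            if GPL_MARKER in text:
                return True
    return False
```

Called as `repo_is_gpl("path/to/checkout")`, it returns a plain boolean, which is the kind of signal a tool could consult before ingesting a repository's code.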
Precisely Luca. You have identified the real problem.
In the meantime, a practical step we can take right now is adding a note to our 
contribution guidelines asking contributors who use Copilot and similar tools to 
disable the training option in their IDE and GitHub privacy settings.
Which will work ONLY until the whitewash covering the leopard's spots wears off and a silent "code update" automatically ignores that switch. That is the EEE (embrace, extend, extinguish) principle M$ has used for 50+ years, and one I have personal experience with, because W95 did not have a working TCP network stack. Card makers were slow to understand that they had to supply that stack, and we were slow to find the right card maker.

We needed to get the teleprompter output of the AP Newsroom software off the W95 server and onto the Amiga 2000 (with a 68030 card in it) that we were using to serve our station's web page from an auto-answer modem, so we could add the news to that page. We may have been the first TV station in the country to have a news page on our website, but we were forced to make that connection with floppy disks, and did so for almost 2 years.

That was when I called Redmond and demanded a working network stack, and got called a pie-rat by someone about 3 layers up the support chain. All they could see was the net as a means to steal W95. Since then, no Windows-anything has survived on my property longer than it took me to download the Linux of the week.
Best regards,
Luca

On March 27, 2026 9:09:42 PM GMT+08:00, gene heskett <[email protected]> wrote:
On 3/27/26 07:12, Luca Toniolo wrote:
You said

"We gain independence from a corporate entity controlling the infrastructure and 
data we generate in development"

I understand the appeal of independence from a corporate entity.

But I think it's worth asking: what does that independence look like in 
practice? If we move to Codeberg, we're still relying on a third party, just a 
smaller, volunteer-run one. If we self-host, someone in our community needs to 
maintain servers, handle security, manage backups, and keep CI running. Right 
now Codeberg's CI only supports amd64, so our arm64 and multi-distro builds 
would need self-hosted runners that we'd have to provision and maintain 
ourselves.

As for the data we generate in development, it's an open source project. Our 
commits, issues, and discussions are public by design. That's the deal we made 
when we chose this model, and it's a good deal. What additional control would a 
different platform give us over data that is meant to be open?

I'm not dismissing the concern. I just want to make sure the cost of acting on 
it is proportionate to what we actually gain.

Best regards,
Luca



On March 27, 2026 6:47:24 PM GMT+08:00, Luca Toniolo <[email protected]> wrote:
Hi Bertho,

I think it’s important to clarify what "rebuilding" actually looks like in 
practice. Moving to Codeberg isn't just a matter of effort; it would be a significant 
technical downgrade. Their CI currently lacks native ARM support and runs on a single 
global queue, which would leave our infrastructure a shadow of what it is on GitHub.

We also have to consider discoverability. GitHub is the de facto home for open 
source, and most new contributors find projects where the code already lives. 
Moving to a niche platform risks cutting off that pipeline.

If this is largely a response to the recent news about GitHub using interaction 
data for Copilot training by default, we could address that with a simple 
automated PR message reminding contributors to opt out in their settings. Is 
this specific policy change the main driver for you, or is there a more 
fundamental issue at play?

Best, Luca



On March 27, 2026 5:29:45 PM GMT+08:00, Bertho Stultiens <[email protected]> wrote:
On 3/27/26 9:27 AM, Luca Toniolo wrote:
Copilot doing statistical analysis on publicly available GPL code is, if 
anything, less than what the GPL already explicitly permits.
Yes, as long as you abide by the license.

But LLMs do much more than just statistical analysis. LLMs generate output from 
the training set and people are encouraged to use that output.
LLMs are known to reproduce their input/training data. The problem is that 
they reproduce training/learned code with the GPL license stripped from it. 
That is the real problem.

The fact that we can't prevent these corporations from scraping and doing this 
is a fact of how the Internet works. However, the fact that they did it does 
not make it right or their use legal.


Mailing list archives have been indexed by Google, crawled by the Wayback 
Machine, scraped by researchers, and read by recruiters for as long as they've 
existed. Our commit messages, review comments, and design discussions have been 
public and searchable for years. That was true before Copilot, and it would 
remain true if we moved to GitLab, Codeberg, or a self-hosted Gitea instance 
tomorrow. None of these platforms prevent scraping.
It is not only about what is publicly visible on the site(s). It is also about 
how you use the site and the process by which you do things.

The information available *inside* GitHub about you and what you are doing is 
far more extensive than what can be seen in the public record.

The announcement from GitHub makes, in principle, any and all data subject to 
input into their LLMs. That I cannot accept, and I will seriously consider my 
options.


GPL enforcement, even in clear-cut cases of actual license violation, has 
historically been rare and difficult. The FSF and SFLC have pursued only the 
most egregious cases, and even those took years. LinuxCNC itself has never 
enforced the GPL against anyone.
The non-enforcement of copyright violations does _not_ make it alright to 
become an infringer or to condone copyright infringement. Besides, the cases 
that were enforced were victories for the GPL and made many an infringer think 
twice or back off.

That is not to say that there aren't many uncaught infringers. There are, and we 
should all discourage infringement wherever and however we can.


The idea of taking drastic action over something that may not even
constitute a violation seems disproportionate.
That is unsettled case law.

However, the action is not just taken over copyrights. The action would also be 
taken to prevent a commercial entity from exploiting internal insights it 
acquires from our use of the site.

Besides, it sends a strong message that their (github's) behaviour will result 
in users changing their ways.


If we migrate off GitHub, what do we actually gain? We lose CI infrastructure 
that works, we lose contributor familiarity, we lose discoverability for new 
contributors, we lose issue and PR history, and we solve nothing, because the 
code was already scraped, the mailing lists were already indexed,
We gain independence from a corporate entity controlling the infrastructure and 
data we generate in development.

CI is not that difficult, but we'd need to rebuild. IMO a small price for what 
we gain.

Commit history is in git. We can extract issues and PR data. You know, scrape 
it? ;-)

Discoverability, hm... Use a search engine on the Internet: find linuxcnc.org 
-> link to development. How difficult is that? Not that we've been very active 
at promoting ourselves in the past 20 years or so...


and the next platform will face the same reality.
The next platform will not necessarily have that same reality. That is why 
Codeberg is such a good option, they are a non-profit with an outspoken goal to 
support and further FOSS 
(https://docs.codeberg.org/getting-started/what-is-codeberg/).

--
Greetings Bertho

(disclaimers are disclaimed)
Here is where I'd have to disagree, Bertho. The fact, mentioned previously in 
this thread, that GitHub is stripping the GPLv2 license before spitting our 
code out to the rest of the world should be grounds for legal action. That, as 
M$ knows well, would need deep pockets we don't have. Codeberg may be a good 
idea in 3 or 5 years as its development proceeds, but for our purposes NOW, not 
so much if it has no ARM support.

Keep looking for a GitHub work-alike that honors the GPLv2, or sell the farm for 
.1 cents a section.

_______________________________________________
Emc-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/emc-developers
Cheers, Gene Heskett, CET.
--
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author, 1940)
If we desire respect for the law, we must first make the law respectable.
- Louis D. Brandeis
Don't poison our oceans, interdict drugs at the src.






