On 3/27/26 9:27 AM, Luca Toniolo wrote:
Copilot doing statistical analysis on publicly available GPL code is, if anything, less than what the GPL already explicitly permits.

Yes, as long as you abide by the license.

But LLMs do much more than just statistical analysis. LLMs generate output from the training set, and people are encouraged to use that output. The problem is that LLMs are known to reproduce their training data: they reproduce code they were trained on and strip the GPL license from it. That is the real problem.

That we can't prevent these corporations from scraping is a consequence of how the Internet works. However, the fact that they did it does not make it right, nor does it make their use legal.


Mailing list archives have been indexed by Google, crawled by the Wayback Machine, scraped by researchers, and read by recruiters for as long as they've existed. Our commit messages, review comments, and design discussions have been public and searchable for years. That was true before Copilot, and it would remain true if we moved to GitLab, Codeberg, or a self-hosted Gitea instance tomorrow. None of these platforms prevent scraping.

It is not only about what is publicly visible on the site(s). It is about how the site is used and the process by which you do things.

The information that is available *inside* GitHub about you and what you are doing is considerably more extensive than what can be viewed from the public record.

The announcement from GitHub makes, in principle, any and all data subject to input into their LLMs. That I cannot accept, and I will seriously consider my options.


GPL enforcement, even in clear-cut cases of actual license violation, has historically been rare and difficult. The FSF and SFLC have pursued only the most egregious cases, and even those took years. LinuxCNC itself has never enforced the GPL against anyone.

The non-enforcement of copyright violations does _not_ make it alright to become an infringer or to condone copyright infringement. Besides, the cases that were enforced were victories for the GPL and made many an infringer think twice or back off.

That is not to say that there are not many uncaught infringers. There are, and we should all discourage infringement wherever and however we can.


The idea of taking drastic action over something that may not even constitute a violation seems disproportionate.

That is unsettled case law.

However, the action is not just taken over copyrights. The action would also be taken to prevent a commercial entity from exploiting internal insights they acquire from us using the site.

Besides, it sends a strong message that their (github's) behaviour will result in users changing their ways.


If we migrate off GitHub, what do we actually gain? We lose CI infrastructure that works, we lose contributor familiarity, we lose discoverability for new contributors, we lose issue and PR history, and we solve nothing, because the code was already scraped, the mailing lists were already indexed,

We gain independence from a corporate entity controlling the infrastructure and data we generate in development.

CI is not that difficult, but we'd need to rebuild. IMO a small price for what we gain.

Commit history is in git. We can extract issues and PR data. You know, scrape it? ;-)

Discoverability, hm... Use a search engine on the Internet: find linuxcnc.org -> link to development. How difficult is that? Not that we've been very active at promoting ourselves in the past 20 years or so...


and the next platform will face the same reality.

The next platform will not necessarily face that same reality. That is why Codeberg is such a good option: they are a non-profit with an outspoken goal to support and further FOSS (https://docs.codeberg.org/getting-started/what-is-codeberg/).

--
Greetings Bertho

(disclaimers are disclaimed)


