Hi Bertho,

I think it’s important to clarify what "rebuilding" actually looks like in 
practice. Moving to Codeberg isn't just a matter of effort; it would be a 
significant technical downgrade. Their CI currently lacks native ARM support 
and runs on a single global queue, which would leave our infrastructure a 
shadow of what it is on GitHub.

We also have to consider discoverability. GitHub is the de facto home for open 
source, and most new contributors find projects where the code already lives. 
Moving to a niche platform risks cutting off that pipeline.

If this is largely a response to the recent news about GitHub using interaction 
data for Copilot training by default, we could address that with a simple 
automated PR message reminding contributors to opt out in their settings. Is 
this specific policy change the main driver for you, or is there a more 
fundamental issue at play?

Best, Luca



On March 27, 2026 5:29:45 PM GMT+08:00, Bertho Stultiens <[email protected]> 
wrote:
>On 3/27/26 9:27 AM, Luca Toniolo wrote:
>> Copilot doing statistical analysis on publicly available GPL code is, if 
>> anything, less than what the GPL already explicitly permits.
>
>Yes, as long as you abide by the license.
>
>But LLMs do much more than just statistical analysis. LLMs generate output 
>from the training set, and people are encouraged to use that output.
>The problem is that LLMs are known to reproduce their input/training data: 
>they reproduce learned code with the GPL license stripped from it. That is 
>the real problem.
>
>That we can't prevent these corporations from scraping and doing this is 
>simply how the Internet works. However, the fact that they did it does not 
>make it right or their use legal.
>
>
>> Mailing list archives have been indexed by Google, crawled by the Wayback 
>> Machine, scraped by researchers, and read by recruiters for as long as 
>> they've existed. Our commit messages, review comments, and design 
>> discussions have been public and searchable for years. That was true before 
>> Copilot, and it would remain true if we moved to GitLab, Codeberg, or a 
>> self-hosted Gitea instance tomorrow. None of these platforms prevent 
>> scraping.
>
>It is not only about what is publicly visible on the site(s). It is also 
>about how you use them and the processes by which you do things.
>
>The information that is available *inside* GitHub about you and what you are 
>doing is far more extensive than what can be viewed from the public record.
>
>GitHub's announcement makes, in principle, any and all data eligible as input 
>to their LLMs. I cannot accept that, and I will seriously consider my 
>options.
>
>
>> GPL enforcement, even in clear-cut cases of actual license violation, has 
>> historically been rare and difficult. The FSF and SFLC have pursued only the 
>> most egregious cases, and even those took years. LinuxCNC itself has never 
>> enforced the GPL against anyone.
>
>The non-enforcement of copyright violations does _not_ make it all right to 
>become an infringer or to condone copyright infringement. Besides, the cases 
>that were enforced were victories for the GPL and made many an infringer think 
>twice or back off.
>
>That is not to say there aren't many uncaught infringers. There are, and we 
>should all discourage infringement wherever and however we can.
>
>
>> The idea of taking drastic action over something that may not even
>> constitute a violation seems disproportionate.
>
>Whether it does is a matter of unsettled case law.
>
>However, the action would not be taken over copyright alone. It would also 
>prevent a commercial entity from exploiting the internal insights it acquires 
>from our use of the site.
>
>Besides, it sends a strong message that GitHub's behaviour will drive users 
>to change their ways.
>
>
>> If we migrate off GitHub, what do we actually gain? We lose CI 
>> infrastructure that works, we lose contributor familiarity, we lose 
>> discoverability for new contributors, we lose issue and PR history, and we 
>> solve nothing, because the code was already scraped, the mailing lists were 
>> already indexed,
>
>We gain independence from a corporate entity controlling the infrastructure 
>and data we generate in development.
>
>CI is not that difficult, but we would need to rebuild it. IMO, that is a 
>small price for what we gain.
>
>Commit history is in git. We can extract issues and PR data. You know, scrape 
>it? ;-)
>
>Discoverability, hm... Use a search engine on the Internet: find linuxcnc.org 
>-> link to development. How difficult is that? Not that we've been very active 
>in promoting ourselves over the past 20 years or so...
>
>
>> and the next platform will face the same reality.
>
>The next platform will not necessarily face that same reality. That is why 
>Codeberg is such a good option: it is a non-profit with the outspoken goal of 
>supporting and furthering FOSS 
>(https://docs.codeberg.org/getting-started/what-is-codeberg/).
>
>-- 
>Greetings Bertho
>
>(disclaimers are disclaimed)
>
>
>
>_______________________________________________
>Emc-developers mailing list
>[email protected]
>https://lists.sourceforge.net/lists/listinfo/emc-developers