Hi Bertho, I think it’s important to clarify what "rebuilding" actually looks like in practice. Moving to Codeberg isn't just a matter of effort; it would be a significant technical downgrade. Their CI currently lacks native ARM support and runs on a single global queue, which would leave our infrastructure a shadow of what it is on GitHub.
We also have to consider discoverability. GitHub is the de facto home for open source, and most new contributors find projects where the code already lives. Moving to a niche platform risks cutting off that pipeline. If this is largely a response to the recent news about GitHub using interaction data for Copilot training by default, we could address that with a simple automated PR message reminding contributors to opt out in their settings. Is this specific policy change the main driver for you, or is there a more fundamental issue at play?

Best,
Luca

On March 27, 2026 5:29:45 PM GMT+08:00, Bertho Stultiens <[email protected]> wrote:
>On 3/27/26 9:27 AM, Luca Toniolo wrote:
>> Copilot doing statistical analysis on publicly available GPL code is, if
>> anything, less than what the GPL already explicitly permits.
>
>Yes, as long as you abide by the license.
>
>But LLMs do much more than just statistical analysis. LLMs generate output
>from the training set and people are encouraged to use that output.
>The problem is that LLMs are known to reproduce their input/training data. The
>problem is that they reproduce training/learned code and strip the GPL
>license from that code. That is the real problem.
>
>The fact that we can't prevent these corporations from scraping and doing this
>is a fact of how the Internet works. However, the fact that they did it does
>not make it right or their use legal.
>
>
>> Mailing list archives have been indexed by Google, crawled by the Wayback
>> Machine, scraped by researchers, and read by recruiters for as long as
>> they've existed. Our commit messages, review comments, and design
>> discussions have been public and searchable for years. That was true before
>> Copilot, and it would remain true if we moved to GitLab, Codeberg, or a
>> self-hosted Gitea instance tomorrow. None of these platforms prevent
>> scraping.
>
>It is not only about what is publicly visible on the site(s). It is about the
>use and the process of how you do things.
>
>The information that is available *inside* GitHub about you and what you are
>doing is considerably more extensive than what can be viewed from the public
>record.
>
>The announcement from GitHub makes, in principle, any and all data subject to
>input into their LLMs. That I cannot accept, and I will seriously consider my
>options.
>
>
>> GPL enforcement, even in clear-cut cases of actual license violation, has
>> historically been rare and difficult. The FSF and SFLC have pursued only the
>> most egregious cases, and even those took years. LinuxCNC itself has never
>> enforced the GPL against anyone.
>
>The non-enforcement of copyright violations does _not_ make it alright to
>become an infringer or to condone copyright infringement. Besides, the cases
>that were enforced were victories for the GPL and made many an infringer think
>twice or back off.
>
>That is not to say that there are not many uncaught infringers. There are, and
>we should all discourage that wherever and however we can.
>
>
>> The idea of taking drastic action over something that may not even
>> constitute a violation seems disproportionate.
>That is unsettled case law.
>
>However, the action is not just taken over copyrights. The action would also
>be taken to prevent a commercial entity from exploiting internal insights they
>acquire from us using the site.
>
>Besides, it sends a strong message that their (GitHub's) behaviour will result
>in users changing their ways.
>
>
>> If we migrate off GitHub, what do we actually gain? We lose CI
>> infrastructure that works, we lose contributor familiarity, we lose
>> discoverability for new contributors, we lose issue and PR history, and we
>> solve nothing, because the code was already scraped, the mailing lists were
>> already indexed,
>
>We gain independence from a corporate entity controlling the infrastructure
>and data we generate in development.
>
>CI is not that difficult, but we'd need to rebuild. IMO a small price for what
>we gain.
>
>Commit history is in git. We can extract issues and PR data. You know, scrape
>it? ;-)
>
>Discoverability, hm... Use a search engine on the Internet: find linuxcnc.org
>-> link to development. How difficult is that? Not that we've been very active
>at promoting ourselves in the past 20 years or so...
>
>
>> and the next platform will face the same reality.
>The next platform will not necessarily have that same reality. That is why
>Codeberg is such a good option: they are a non-profit with an outspoken goal
>to support and further FOSS
>(https://docs.codeberg.org/getting-started/what-is-codeberg/).
>
>--
>Greetings Bertho
>
>(disclaimers are disclaimed)
>
>
>
>_______________________________________________
>Emc-developers mailing list
>[email protected]
>https://lists.sourceforge.net/lists/listinfo/emc-developers
