On Tuesday, February 27th, 2024 at 3:45 PM, Michał Górny <mgo...@gentoo.org> 
wrote:

> Hello,
> 
> Given the recent spread of the "AI" bubble, I think we really need to
> look into formally addressing the related concerns. In my opinion,
> at this point the only reasonable course of action would be to safely
> ban "AI"-backed contribution entirely. In other words, explicitly
> forbid people from using ChatGPT, Bard, GitHub Copilot, and so on, to
> create ebuilds, code, documentation, messages, bug reports and so on for
> use in Gentoo.
> 
> Just to be clear, I'm talking about our "original" content. We can't do
> much about upstream projects using it.
> 
> 
> Rationale:
> 
> 1. Copyright concerns. At this point, the copyright situation around
> generated content is still unclear. What's pretty clear is that pretty
> much all LLMs are trained on huge corpora of copyrighted material, and
> all fancy "AI" companies don't give shit about copyright violations.
> In particular, there's a good risk that these tools would yield stuff we
> can't legally use.
> 
> 2. Quality concerns. LLMs are really great at generating plausibly
> looking bullshit. I suppose they can provide good assistance if you are
> careful enough, but we can't really rely on all our contributors being
> aware of the risks.
> 
> 3. Ethical concerns. As pointed out above, the "AI" corporations don't
> give shit about copyright, and don't give shit about people. The AI
> bubble is causing huge energy waste. It is giving a great excuse for
> layoffs and increasing exploitation of IT workers. It is driving
> enshittification of the Internet, it is empowering all kinds of spam
> and scam.
> 
> 
> Gentoo has always stood out as something different, something that
> worked for people for whom mainstream distros were lacking. I think
> adding "made by real people" to the list of our advantages would be
> a good thing — but we need to have policies in place, to make sure shit
> doesn't flow in.
> 
> Compare with the shitstorm at:
> https://github.com/pkgxdev/pantry/issues/5358
> 
> --
> Best regards,
> Michał Górny

While I understand the concerns that may have triggered the feeling that a rule 
like this is needed, as someone working in the field of machine learning (AI), 
I feel I need to add my brief opinion.

The pkgxdev incident is very artificial, and if there is a real threat to 
quality/integrity, it will not manifest itself as obviously, which brings me to:

A rule like this is just not enforceable.

The contributor, as signed, is responsible for the quality of the 
contribution, whether it was written in a plain editor, in a dev environment 
with smart plugins (LSP), or by their dog.

Other organizations have been dealing with automated contributions, which can 
sometimes go wrong for *all different* kinds of reasons, for much longer, and 
their approach may be an inspiration:
[0] OpenStreetMap: automated edits - 
https://wiki.openstreetmap.org/wiki/Automated_Edits_code_of_conduct
[1] Wikipedia: bot policy - https://en.wikipedia.org/wiki/Wikipedia:Bot_policy
The AI we are dealing with right now is just another means of automation, 
after all.

As a machine learning engineer, I have been contemplating creating an instance 
of a generative model for my own use, trained on my own data, in which case the 
copyright and ethical points would not apply at all.
There are also language model projects that are fine both ethically and 
copyright-wise, such as Project Bergamot [2], vetted by universities and the 
EU, and used by Mozilla [3] (one of the prominent proponents of ethical AI).

Banning all tools, just because some might not be up to moral standards, puts 
the ones that are at a disadvantage, in our world as a whole.

[2] Project Bergamot - https://browser.mt/
[3] Mozilla blog: training translation models - 
https://hacks.mozilla.org/2022/06/training-efficient-neural-network-models-for-firefox-translations/

- Martin
