Hello,

Recently we've seen considerable growth in LLM usage in upstream
packages.  The degree differs, apparently ranging from using LLMs as
glorified autocomplete to asking them to rewrite the whole project or
being convinced that a chatbot is sentient.  The results are raising
some concerns.  I originally wrote to the distributions mailing list
[1] to solicit feedback on how other distributions are handling it,
but the major players didn't reply.

Our "AI policy" [2] covers only direct contributions to Gentoo.  At the
time, we did not consider it appropriate to start restricting what
packages are added to Gentoo.  Still, general rules apply and some of
the recent packages are starting to raise concerns there.  Hence I'm
looking for your feedback.

Two recent cases that impacted the Python team were autobahn and
chardet.

In the case of autobahn, the author started using two LLMs to write
major portions of code, as well as to reply to bugs.  This caused some
"weird" bugs [3,4,5], which eventually led me to ask upstream to
reconsider; that request met with quite a backlash [6].  While I
haven't checked the code,
judging by the kind of slop that (almost) got committed as a result of
my bug reports, we've decided that it's not safe to package the new
versions and autobahn was deprecated in Gentoo.  Still, it has reverse
dependencies and I don't really have the energy to go around convincing
people to move away from it.

In the case of chardet, one of the current maintainers suddenly decided
to ask an LLM to rewrite the whole codebase under a different license,
thereby obliterating all past commits.  This triggered a huge
shitstorm [7], in which the maintainer pretty much admitted he did this
deliberately and was pretty much a complete asshole about it (basically,
"I've been co-maintaining this for N years, nobody was doing anything,
so I've used a plagiarism machine to rewrite the whole thing and claim
it as my own work").  I haven't read most of it, given that the same
things keep repeating: mostly GPL haters cosplaying lawyers.  We've also
decided not to package the new version, given the ethical and licensing
concerns.

However, these aren't the only concerning packages.  As a result of the
chardet rewrite, binaryornot has replaced it with its own code, in
which LLMs were quite likely involved [8].  The Linux kernel apparently
uses them too [9].  So does KeePassXC, triggering security concerns
[10].  So does vim [11].

The key problem is, how do we decide whether to package something or
not?  We definitely don't have the capability of inspecting whatever
crap upstream may be committing.  Of course, that was always a risk, but
with LLMs around, things are just crazy.  And we definitely can't stick
with old versions forever.

The other side of this is that I have very little motivation to put my
human effort into dealing with random slop people are pushing to
production these days, and reporting issues that are going to be met
with incomprehensible slop replies.


[1] https://lore.kernel.org/distributions/[email protected]/
[2] https://wiki.gentoo.org/wiki/Project:Council/AI_policy
[3] https://github.com/crossbario/autobahn-python/issues/1716
[4] https://github.com/crossbario/autobahn-python/issues/1735
[5] https://github.com/crossbario/autobahn-python/issues/1782
[6] https://github.com/crossbario/autobahn-python/discussions/1818
[7] https://github.com/chardet/chardet/issues/327
[8] https://github.com/binaryornot/binaryornot/releases/tag/v0.5.0
[9] https://lwn.net/Articles/1026558/
[10] https://github.com/keepassxreboot/keepassxc/issues/12635
[11] https://hachyderm.io/@AndrewRadev/116175986749599825

-- 
Best regards,
Michał Górny
