[gentoo-dev] AI policy approved

2024-04-15 Thread Michał Górny
Hello,

On 2024-04-14, the Gentoo Council has unanimously approved the new AI
policy.  The original wording from the mailing list thread was approved:

"""
It is expressly forbidden to contribute to Gentoo any content that has
been created with the assistance of Natural Language Processing
artificial intelligence tools.  This motion can be revisited, should
a case been made over such a tool that does not pose copyright, ethical
and quality concerns.
"""

I have started drafting a Wiki page detailing this at [1].  We will also
look into how best to provide this new information to our contributors.

[1] https://wiki.gentoo.org/wiki/Project:Council/AI_policy

-- 
Best regards,
Michał Górny





Re: [gentoo-dev] RFC: banning "AI"-backed (LLM/GPT/whatever) contributions to Gentoo

2024-04-15 Thread Jérôme Carretero
Hi,


It's a good thing that
 https://wiki.gentoo.org/wiki/Project:Council/AI_policy
has been voted in, and that it mentions:

> This motion can be revisited, should a case been made over such a
> tool that does not pose copyright, ethical and quality concerns.


I wanted to provide some material for discussing improvements to the
specific phrasing "created with the assistance of Natural Language
Processing artificial intelligence tools", which may not be optimal.


First, I think we should not limit this to LLMs / NLP tools: it should
cover all algorithmically/automatically generated content, since any of
it could cause a flood of time-wasting, low-quality information.


Second, I think we should define what the acceptable use cases of
algorithmically generated content would be; as a starting point, I'd
suggest the combination of:

- The algorithm generating such content is proper F/LOSS

- In the case of a machine learning algorithm, the dataset used to
train it is itself proper F/LOSS (with traceability of all of its
bits)

- The algorithm generating such content is reproducible (training
produces the exact same bits)

- The algorithm did not publish the content automatically: all of the
content was reviewed and approved by a human, who bears responsibility
for the contribution, and the content has been flagged as having been
generated using $tool.


Third, I think a "developer certificate of origin" policy could be
augmented with the "a bot did not publish this content automatically"
bits, and should also be mandated for bug reporting, so as to have a
"human gate" for issues discovered by automation / tinderboxes.
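
To make that concrete, here is one hypothetical shape such a flagging
scheme could take as commit-message trailers; "Signed-off-by" is the
existing DCO convention, while "Generated-by" and "Human-reviewed" are
made-up trailer names for the sake of this discussion, not an existing
standard:

    fix: correct dependency atom in example ebuild

    Signed-off-by: Jane Developer 
    Generated-by: $tool
    Human-reviewed: yes

The point of the extra trailers is that the human sign-off explicitly
covers having reviewed the machine-generated content, not merely having
forwarded it.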


Best regards,

-- 
Jérôme

