Re: [gentoo-dev] Current unavoidable use of xz utils in Gentoo

2024-03-31 Thread Matt Jolly

Hi Eddie,

On 31/3/24 21:13, Eddie Chapman wrote:

> At the moment there is far too much of
> a cavalier attitude about the whole thing being shown by too many,
> including here I'm sad to see.


It's obvious that this is something that you are very worried about, but 
I think that you need to take a deep breath and relax a little. I have 
not seen a cavalier attitude towards this issue.


What I see instead are developers who have made an initial assessment 
and disclosure, have taken some actions to mitigate the severity of the 
issue, and are carefully continuing their investigations (over a major 
holiday in large parts of the world) so that they can issue sane 
responses to actual threats, and not a knee-jerk response like 'Gentoo 
should re-encode all xzs in other formats' which, as was discussed 
above, adds significant complexity for no real benefit.


I've seen from your previous emails to the list that you know what 
paragraphs are and how to use them to break up your content into 
digestible chunks. Please continue using them - it makes it 
significantly easier to respond to your ideas and gives off an aura of 
professionalism that you will need if you want your concerns to be taken 
seriously and addressed directly.


Cheers,

Matt




[gentoo-dev] Last rites: media-video/obs-v4l2sink

2024-03-11 Thread Matt Jolly

# Matt Jolly  (2024-03-11)
# Obsolete, does not compile, archived upstream.
# Removal: 2024-04-10.  Bugs #876977, #907705.
media-video/obs-v4l2sink



Re: [gentoo-dev] RFC: banning "AI"-backed (LLM/GPT/whatever) contributions to Gentoo

2024-02-28 Thread Matt Jolly


> But where do we draw the line? Are translation tools like DeepL
> allowed? I don't see much of a copyright issue for these.


I'd also like to jump in and play devil's advocate. There's a fair
chance that this is because I just got back from a
supercomputing/research conf where LLMs were the hot topic in every keynote.

As mentioned by Sam, this RFC is performative. Any users that are going
to abuse LLMs are going to do it _anyway_, regardless of the rules. We
already rely on common sense to filter these out; we're always going to
have BS/Spam PRs and bugs - I don't think that content generated by an
LLM is really any worse.

This doesn't mean that I think we should blanket allow poor quality LLM
contributions. It's especially important that we take into account the
potential for bias, factual errors, and outright plagiarism when these
tools are used incorrectly. We already have methods for weeding out low
quality contributions and bad faith contributors - let's trust in these
and see what we can do to strengthen these tools and processes.

A bit closer to home for me, what about using LLMs as an assistive
technology / to reduce boilerplate? I'm recovering from RSI - I don't
know when (if...) I'll be able to type like I used to again. If a model
is able to infer some mostly salvageable boilerplate from its context
window, I'm going to use it and spend the effort I would have put into
writing it on fixing something else; an outright ban on LLM use will
reduce my _ability_ to contribute to the project.

What about using an LLM for code documentation? Some models can do a
passable job of writing decent quality function documentation and, in
production, I _have_ caught real issues in my logic this way. Why should
I type that out (and write what I think the code does rather than what
it actually does) if an LLM can get 'close enough' and I only need to do
light editing?

In line with the above, if the concern is about code quality / potential
for plagiarised code, what about indirect use of LLMs? Imagine a
hypothetical situation where a contributor asks an LLM to summarise a
topic and uses that knowledge to implement a feature. Is this now
tainted / forbidden knowledge according to the Gentoo project?

As a final not-so-hypothetical, what about an LLM trained on Gentoo docs
and repos, or more likely trained on exclusively open-source
contributions and fine-tuned on Gentoo specifics? I'm in the process of
spinning up several models at work to get a handle on the tech / turn
more electricity into heat - this is a real possibility (if I can ever
find the time).

The cat is out of the bag when it comes to LLMs. In my real-world job I
talk to scientists and engineers using these things (for their
strengths) to quickly iterate on designs, to summarise experimental
results, and even to generate testable hypotheses. We're only going to
see increasing use of this technology going forward.

TL;DR: I think this is a bad idea. We already have effective mechanisms
for dealing with spam and bad faith contributions. Banning LLM use by
Gentoo contributors at this point is just throwing the baby out with the
bathwater.

As an alternative I'd be very happy to see some guidelines for the use
of LLMs and other assistive technologies like "Don't use LLM code
snippets unless you understand them", "Don't blindly copy and paste LLM
output", or, my personal favourite, "Don't be a jerk to our poor bug
wranglers".

A blanket "No completely AI/LLM generated works" might be fine, too.

Let's see how the legal issues shake out before we start pre-emptively
banning useful tools. There's a lot of ongoing action in this space - at
the very least I'd like to see some thorough discussion of the legal
issues separately if we're making a case for banning an entire class of
technology.

A Gentoo LLM project formed of experts who could actually provide good
advice / some actual guidelines for LLM use within the project (and
engaging some real-world legal advice) might be a good starting point.
Are there any volunteers in the audience?

Thanks for listening to my TED talk,

Matt




Re: [gentoo-dev] special small-files USE flag without effect on dependencies: [ unicode ]

2024-02-09 Thread Matt Jolly

On 10/2/24 08:56, stefan1 wrote:
> Both removals definitely not still being contested and debated.

You've conveniently ignored the context immediately below the line that
you chose to quote and somehow decided to try and shoehorn in discussion
of a completely different (and settled) issue. Congratulations, it's not
often that I get to open my emails in the morning and untangle several
different flavours of wrong in only a single line!

> Maybe support as much choice as possible, and not act like you know what
> users want better that the users themselves.

FYI I don't appreciate your tone. Tone it down a little next time,
please?

Knowing what's best for users, especially when the users in question
appear to be willfully misinformed, is actually part of a developer's
job description. We even have a fancy section of the dev manual that's
all about not adding superfluous USE flags because we've learnt through
experience that providing a toggle for every single little thing in a
package is not, in fact, beneficial to users overall.

https://devmanual.gentoo.org/general-concepts/use-flags/index.html#when-not-to-use-use-flags

Users _rely_ on developers to make sane decisions on their behalf,
particularly around which USE options should exist and be togglable and
which options it makes sense to just hard enable or disable in the
configure step. In general we don't expect users to have an in-depth
knowledge of an upstream build system; they don't need to be across the
implementation details of every single configure option, they just need
to trust the options enabled by ebuild developers (and by extension the
decisions made by the wider project, including [and especially] areas
like QA).

In addition to this there are mechanisms through which a user can choose
to override the decisions made within an ebuild, including but not
limited to the already-mentioned EXTRA_ECONF, MYMESONARGS, and
MYCMAKEARGS, set via package.env.
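
For illustration, a minimal sketch of what such an override might look
like. The package name (dev-foo/example), the file name, and every
option value below are hypothetical placeholders, not options of any
real package; only the variable names and the package.env mechanism
come from the text above. A given package uses one build system, so in
practice only the matching variable has any effect.

```shell
# /etc/portage/env/example-overrides.conf  (hypothetical file name)
# Extra arguments appended to the upstream configure step.
# All option names here are made up for illustration.
EXTRA_ECONF="--disable-example-feature"    # autotools-based ebuilds (econf)
MYCMAKEARGS="-DEXAMPLE_FEATURE=OFF"        # cmake-based ebuilds
MYMESONARGS="-Dexample-feature=disabled"   # meson-based ebuilds

# /etc/portage/package.env then maps a package to the file above with a
# line of the form "<category>/<package> <file name>", for example:
#   dev-foo/example example-overrides.conf
```

Overrides like this bypass the choices made in the ebuild, so any
breakage that results is the user's to debug.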

TL;DR: Methinks thou dost make too great a mountain of that which is but
a molehill.

On to the main subject: we already default USE=unicode to on in the
ebuild, and a browse over the sbcl docs suggests that it's also the
upstream default. It's unlikely to break anything given that it's been
that way since version 0.8.17 and we're well into major version 2 by
now. Just drop the USE flag and leave unicode enabled - QA is right in
this case: in the absence of a compelling reason to keep the USE flag it
should be dropped.

Cheers,

Matt