Re: [gentoo-dev] RFC: banning "AI"-backed (LLM/GPT/whatever) contributions to Gentoo

2024-02-29 Thread Sam James
Matt Jolly  writes:

>> But where do we draw the line? Are translation tools like DeepL
>> allowed? I don't see much of a copyright issue for these.
>
> I'd also like to jump in and play devil's advocate. There's a fair
> chance that this is because I just got back from a
> supercomputing/research conf where LLMs were the hot topic in every keynote.
>
> As mentioned by Sam, this RFC is performative. Any users that are going
> to abuse LLMs are going to do it _anyway_, regardless of the rules. We
> already rely on common sense to filter these out; we're always going to
> have BS/Spam PRs and bugs - I don't really think that the content being
> generated by LLM is really any worse.
>
> This doesn't mean that I think we should blanket allow poor quality LLM
> contributions. It's especially important that we take into account the
> potential for bias, factual errors, and outright plagiarism when these
> tools are used incorrectly.  We already have methods for weeding out low
> quality contributions and bad faith contributors - let's trust in these
> and see what we can do to strengthen these tools and processes.
>
> A bit closer to home for me, what about using LLMs as an assistive
> technology / to reduce boilerplate? I'm recovering from RSI - I don't
> know when (if...) I'll be able to type like I used to again. If a model
> is able to infer some mostly salvageable boilerplate from its context
> window I'm going to use it and spend the effort I would writing that to
> fix something else; an outright ban on LLM use will reduce my _ability_
> to contribute to the project.

Another person approached me after this RFC and asked whether tooling
restricted to the current repo would be okay. For me, that'd be mostly
acceptable, given it won't make suggestions based on copyrighted code.

I also don't have a problem with LLMs being used to help refine commit
messages as long as someone is being sensible about it (e.g. if, as in
your situation, you know what you want to say but you can't type much).

I don't know how to phrase a policy off the top of my head which allows
those two things but not the rest.

>
> What about using a LLM for code documentation? Some models can do a
> passable job of writing decent quality function documentation and, in
> production, I _have_ caught real issues in my logic this way. Why should
> I type that out (and write what I think the code does rather than what
> it actually does) if an LLM can get 'close enough' and I only need to do
> light editing?

I suppose in that sense, it's the same as blindly listening to any
linting tool or warning without understanding what it's flagging and if
it's correct.

> [...]
> As a final not-so-hypothetical, what about a LLM trained on Gentoo docs
> and repos, or more likely trained on exclusively open-source
> contributions and fine-tuned on Gentoo specifics? I'm in the process of
> spinning up several models at work to get a handle on the tech / turn
> more electricity into heat - this is a real possibility (if I can ever
> find the time).

I think that'd be interesting. It also does a good job as a rhetorical
point wrt the policy being a bit too blanket here.

See https://www.softwareheritage.org/2023/10/19/swh-statement-on-llm-for-code/
too.

>
> The cat is out of the bag when it comes to LLMs. In my real-world job I
> talk to scientists and engineers using these things (for their
> strengths) to quickly iterate on designs, to summarise experimental
> results, and even to generate testable hypotheses. We're only going to
> see increasing use of this technology going forward.
>
> TL;DR: I think this is a bad idea. We already have effective mechanisms
> for dealing with spam and bad faith contributions. Banning LLM use by
> Gentoo contributors at this point is just throwing the baby out with the
> bathwater.

The problem is that in FOSS, a lot of people are getting flooded with AI
spam and therefore have little regard for any possibly-good parts of it.

I count myself as part of that group - it's very much sludge and I feel
tired just seeing it talked about at the moment.

Is that super rational? No, but we're also volunteers and it's not
unreasonable for said volunteers to then say "well I don't want any more
of that".

I think this colours a lot of the responses here, and it doesn't
invalidate them, but it also explains why nobody is really interested
in being open to this for now. Who can blame them (me included)?

>
> As an alternative I'd be very happy to see some guidelines for the use of LLMs
> and other assistive technologies like "Don't use LLM code snippets
> unless you understand them", "Don't blindly copy and paste LLM output",
> or, my personal favourite, "Don't be a jerk to our poor bug wranglers".
>
> A blanket "No completely AI/LLM generated works" might be fine, too.
>
> Let's see how the legal issues shake out before we start pre-emptively
> banning useful tools. There's a lot of ongoing action in this space - at
> the very least I'd 

Re: [gentoo-dev] RFC: banning "AI"-backed (LLM/GPT/whatever) contributions to Gentoo

2024-02-29 Thread Zoltan Puskas
Hi,

> Compare with the shitstorm at:
> https://github.com/pkgxdev/pantry/issues/5358

Thank you for this, it made my day.

Though I'm just a proxy maintainer for now, I also support this initiative;
there should be some guard rails set up around LLM usage.

> 1. Copyright concerns.  At this point, the copyright situation around
> generated content is still unclear.  What's pretty clear is that pretty
> much all LLMs are trained on huge corpora of copyrighted material, and
> all fancy "AI" companies don't give shit about copyright violations.
> In particular, there's a good risk that these tools would yield stuff we
> can't legally use.

IANAL, but IMHO if we stop respecting copyright law, even if indirectly via
LLMs, why should we expect others to respect our licenses? It could be prudent
to wait and see where this will land.

> 2. Quality concerns.  LLMs are really great at generating plausibly
> looking bullshit.  I suppose they can provide good assistance if you are
> careful enough, but we can't really rely on all our contributors being
> aware of the risks.

From my personal experience of using GitHub Copilot fine-tuned on a large
private code base, it functions mostly okay as a smarter autocomplete on a
single line of code, but when it comes to multiple lines of code, even when it
comes to filling out boilerplate code, it's at best a 'meh'. The problem is
that while the output looks okay-ish, often it will have subtle mistakes or will
hallucinate some random additional stuff not relevant to the source file in
question, so one ends up having to read and analyze the entire output of the LLM
to fix problems with the code. I found that the mental and time overhead rarely
makes it worth it, especially when a template can do a better job (e.g. this
would be the case for ebuilds).

Since during reviews we are supposed to be reading the entire contribution, I'm
not sure how much difference this makes, but I can see that a developer trusting
an LLM too much might end up outsourcing the checking of the code to the
reviewers, which means we need to be extra vigilant and could lead to reduced
trust of contributions.

> 3. Ethical concerns.  As pointed out above, the "AI" corporations don't
> give shit about copyright, and don't give shit about people.  The AI
> bubble is causing huge energy waste.  It is giving a great excuse for
> layoffs and increasing exploitation of IT workers.  It is driving
> enshittification of the Internet, it is empowering all kinds of spam
> and scam.

I agree. I'm already tired of AI generated blog spam and so forth, such a waste
of time and quite annoying. I'd rather not have that on our wiki pages too. The
purpose of documenting things is to explain an area to someone new to it or
to write down unique quirks of a setup or a system. Since LLMs cannot write new,
original things and only rehash information they have seen, I'm not sure how
they could be helpful for this at all, to be honest.

Overall my time is too valuable to sift through AI generated BS when I'm trying
to solve a problem. I'd prefer we keep well-curated, high-quality documentation
where possible.

Zoltan



[gentoo-dev] Last rites: sci-libs/mpir

2024-02-29 Thread Sam James
# Eli Schwartz  (2024-02-29)
# Ancient fork of gmp from 2017. Various build issues, fails tests. All
# reverse dependencies turned out to be incorrect or preferred gmp
# anyways. No path forward to keeping it buildable, no use case for
# keeping it around.  Bug #812950, #874537, #925308
# Removal on 2024-03-31.
sci-libs/mpir



Re: [gentoo-dev] [PATCH 2/3] texlive-common.eclass: check exit status of texmf-update

2024-02-29 Thread Ulrich Mueller
> On Thu, 29 Feb 2024, Florian Schmaus wrote:

> @@ -178,6 +178,10 @@ etexmf-update() {
>   if has_version 'app-text/texlive-core' ; then
>   if [[ -z ${ROOT} && -x "${EPREFIX}"/usr/sbin/texmf-update ]] ; then
>   "${EPREFIX}"/usr/sbin/texmf-update
> + local res="${?}"
> + if [[ "${?}" -ne 0 ]] && ver_test -ge 2023; then

This condition will always be false.
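
A minimal sketch of why the check can never fire, assuming bash: `local` is itself a command, so once it succeeds `$?` is 0 again and only `res` still holds the original status. The functions `fails`, `buggy` and `fixed` are hypothetical stand-ins for illustration, not part of the eclass.

```shell
#!/bin/bash
# 'fails' is a hypothetical stand-in for a texmf-update run that exits with 3.
fails() { return 3; }

buggy() {
    fails
    local res="${?}"              # res=3, but the successful 'local' resets $? to 0
    if [[ "${?}" -ne 0 ]]; then   # always compares 0 against 0
        echo "buggy: caught ${res}"
    else
        echo "buggy: missed ${res}"
    fi
}

fixed() {
    fails
    local res="${?}"
    if [[ "${res}" -ne 0 ]]; then # test the saved value instead of $?
        echo "fixed: caught ${res}"
    else
        echo "fixed: missed ${res}"
    fi
}

buggy   # prints "buggy: missed 3"
fixed   # prints "fixed: caught 3"
```

Testing `${res}` rather than `${?}` in the condition would be one way to get the intended behaviour.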

Ulrich



Re: [gentoo-dev] [PATCH 3/3] texlive-common.eclass: Use nonfatal-respecting die for fmtutil-sys

2024-02-29 Thread Ulrich Mueller
> On Thu, 29 Feb 2024, Michał Górny wrote:

>> +"${EPREFIX}"/usr/bin/fmtutil-sys --all &> /dev/null \
>> +|| die -n "fmtutil-sys returned non-zero exit status ${res}"

> Put '||' at end of the line, then you won't need the redundant
> backslash.

I don't think we have such a policy, so || at end of line or beginning
of next line is really up to personal preference.

Independent of this, looks like ${res} is not defined?
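
A small sketch of the difference, assuming bash: an unset `res` simply expands to an empty string, whereas `$?` on the right-hand side of `||` still holds the status of the command that just failed. Here `fails` stands in for fmtutil-sys and `die` is a hypothetical stub that only echoes its message; the real die behaves differently.

```shell
#!/bin/bash
# Hypothetical stand-ins: 'fails' replaces fmtutil-sys, and this 'die' only
# echoes its message instead of aborting like the package manager's die.
fails() { return 3; }
die() { [[ $1 == -n ]] && shift; echo "die: $*"; }

unset res   # make sure nothing in the environment defines it

report_unset()  { fails &> /dev/null || die -n "status '${res}'"; }
report_status() { fails &> /dev/null || die -n "status '${?}'"; }

report_unset    # prints "die: status ''"
report_status   # prints "die: status '3'"
```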

Ulrich



Re: [gentoo-dev] [PATCH 3/3] texlive-common.eclass: Use nonfatal-respecting die for fmtutil-sys

2024-02-29 Thread Michał Górny
On Thu, 2024-02-29 at 14:38 +0100, Florian Schmaus wrote:
> Signed-off-by: Florian Schmaus 
> ---
>  eclass/texlive-common.eclass | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/eclass/texlive-common.eclass b/eclass/texlive-common.eclass
> index 96e962cb8027..85cdb8ff204e 100644
> --- a/eclass/texlive-common.eclass
> +++ b/eclass/texlive-common.eclass
> @@ -199,7 +199,8 @@ efmtutil-sys() {
>   if has_version 'app-text/texlive-core' ; then
>   if [[ -z ${ROOT} && -x "${EPREFIX}"/usr/bin/fmtutil-sys ]] ; then
>   einfo "Rebuilding formats"
> - "${EPREFIX}"/usr/bin/fmtutil-sys --all &> /dev/null || die
> + "${EPREFIX}"/usr/bin/fmtutil-sys --all &> /dev/null \
> + || die -n "fmtutil-sys returned non-zero exit status ${res}"

Put '||' at end of the line, then you won't need the redundant
backslash.

>   else
>   ewarn "Cannot run fmtutil-sys for some reason."
>   ewarn "Your formats might be inconsistent with your installed ${PN} version"

-- 
Best regards,
Michał Górny




Re: [gentoo-dev] Re: [PATCH 1/3] texlive-module.eclass: implicitly set TL_PV if not explicitly set

2024-02-29 Thread Michał Górny
On Thu, 2024-02-29 at 15:21 +0100, Florian Schmaus wrote:
> On 29/02/2024 15.08, Michael Orlitzky wrote:
> > On Thu, 2024-02-29 at 14:47 +0100, Florian Schmaus wrote:
> > > >
> > > > +if [[ -z ${TL_PV} ]] \
> > > > +  && [[ ${EAPI} -ge 8 ]] \
> > > 
> > > I am skeptical of this construct, as in the past we had non-numeric
> > > EAPIs. So I may have to go with EAPI == 8 for now. Input appreciated.
> > > 
> > 
> > 
> > The eclass only supports EAPIs {7,8,...} so it should suffice to
> > blacklist EAPI=7.
> 
> Fair point, but that would mean to remember to adjust this line once the 
> eclass gets support for EAPI 9.
> 
> It appears that bash does the right thing:
> 
> $ if [[ "eapi-future" -gt 8 ]]; then echo "is greater than 8"; else echo "is NOT greater than 8"; fi
> is NOT greater than 8
> 
> even considering
> 
> $ if [[ "9-eapi-future" -gt 8 ]]; then echo "is greater than 8"; else echo "is NOT greater than 8"; fi
> is greater than 8
> 
> which would be fine.
> 
> Although I prefer the current approach, it is not a hill to die on for me.
> 

It is invalid to treat EAPI as an integer.

The standard practice is to explicitly list old EAPIs, so that no
changes are needed to preserve the new behavior for new EAPIs.
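
A short sketch of what bash actually does with the strings tested above, assuming bash's `[[ ]]`: the operands of `-gt` are evaluated as arithmetic expressions, so "9-eapi-future" means 9 minus `$eapi` minus `$future`, with unset names counting as 0. The helper `is_gt_8` and the variable value are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: reports whether its argument compares greater than 8.
is_gt_8() { if [[ "$1" -gt 8 ]]; then echo yes; else echo no; fi; }

unset eapi future
is_gt_8 "eapi-future"     # 0 - 0 = 0      -> no
is_gt_8 "9-eapi-future"   # 9 - 0 - 0 = 9  -> yes

eapi=5                    # an unrelated variable that happens to be set
is_gt_8 "9-eapi-future"   # 9 - 5 - 0 = 4  -> no
```

The result depends on whatever variables happen to exist in the environment, which is one reason a plain string comparison is safer here.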

-- 
Best regards,
Michał Górny




Re: [gentoo-dev] Re: [PATCH 1/3] texlive-module.eclass: implicitly set TL_PV if not explicitly set

2024-02-29 Thread Florian Schmaus

On 29/02/2024 15.34, Michael Orlitzky wrote:
> On Thu, 2024-02-29 at 15:21 +0100, Florian Schmaus wrote:
> > > The eclass only supports EAPIs {7,8,...} so it should suffice to
> > > blacklist EAPI=7.
> >
> > Fair point, but that would mean to remember to adjust this line once the
> > eclass gets support for EAPI 9.
>
> It should do the right thing automatically when EAPI=9 is added, no? If
> the goal of "-gt 8" is to match EAPI=8 and newer, then rejecting (only)
> the one EAPI that's older than 8 should be equivalent without involving
> a numerical comparison.


Of course, you are right. Not sure how I could miss that.

I'll go with [[ ${EAPI} != 7 ]] then.
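
A sketch of how the conditional from the patch might look with the string comparison, wrapped in a hypothetical helper `tl_pv_for` so that it can be run outside an ebuild. The `ver_cut` below is a simplified stand-in that only handles `ver_cut 1`; it is not the real implementation, and the version numbers are made up.

```shell
#!/bin/bash
# Simplified stand-in for ver_cut: returns the first version component of PV.
ver_cut() { echo "${PV%%.*}"; }

# Hypothetical helper: arguments are EAPI, CATEGORY, PV and a preset TL_PV.
tl_pv_for() {
    local EAPI="$1" CATEGORY="$2" PV="$3" TL_PV="$4"
    if [[ -z ${TL_PV} ]] \
        && [[ ${EAPI} != 7 ]] \
        && [[ ${CATEGORY} == dev-texlive ]]; then
        TL_PV=$(ver_cut 1)
    fi
    echo "${TL_PV:-unset}"
}

tl_pv_for 8 dev-texlive 2023.1 ""          # prints "2023"
tl_pv_for 7 dev-texlive 2023.1 ""          # prints "unset"
tl_pv_for 9-future dev-texlive 2023.1 ""   # prints "2023"
```

Because only EAPI 7 is excluded, a later EAPI (numeric or not) takes the new code path without further changes.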

Thanks

- Flow




Re: [gentoo-dev] Re: [PATCH 1/3] texlive-module.eclass: implicitly set TL_PV if not explicitly set

2024-02-29 Thread Michael Orlitzky
On Thu, 2024-02-29 at 15:21 +0100, Florian Schmaus wrote:
> > 
> > The eclass only supports EAPIs {7,8,...} so it should suffice to
> > blacklist EAPI=7.
> 
> Fair point, but that would mean to remember to adjust this line once the 
> eclass gets support for EAPI 9.
> 

It should do the right thing automatically when EAPI=9 is added, no? If
the goal of "-gt 8" is to match EAPI=8 and newer, then rejecting (only)
the one EAPI that's older than 8 should be equivalent without involving
a numerical comparison.



Re: [gentoo-dev] Re: [PATCH 1/3] texlive-module.eclass: implicitly set TL_PV if not explicitly set

2024-02-29 Thread Florian Schmaus

On 29/02/2024 15.08, Michael Orlitzky wrote:
> On Thu, 2024-02-29 at 14:47 +0100, Florian Schmaus wrote:
> > > +if [[ -z ${TL_PV} ]] \
> > > +  && [[ ${EAPI} -ge 8 ]] \
> >
> > I am skeptical of this construct, as in the past we had non-numeric
> > EAPIs. So I may have to go with EAPI == 8 for now. Input appreciated.
>
> The eclass only supports EAPIs {7,8,...} so it should suffice to
> blacklist EAPI=7.

Fair point, but that would mean to remember to adjust this line once the
eclass gets support for EAPI 9.

It appears that bash does the right thing:

$ if [[ "eapi-future" -gt 8 ]]; then echo "is greater than 8"; else echo "is NOT greater than 8"; fi
is NOT greater than 8

even considering

$ if [[ "9-eapi-future" -gt 8 ]]; then echo "is greater than 8"; else echo "is NOT greater than 8"; fi
is greater than 8

which would be fine.

Although I prefer the current approach, it is not a hill to die on for me.

- Flow



Re: [gentoo-dev] Re: [PATCH 1/3] texlive-module.eclass: implicitly set TL_PV if not explicitly set

2024-02-29 Thread Michael Orlitzky
On Thu, 2024-02-29 at 14:47 +0100, Florian Schmaus wrote:
> >   
> > +if [[ -z ${TL_PV} ]] \
> > +  && [[ ${EAPI} -ge 8 ]] \
> 
> I am skeptical of this construct, as in the past we had non-numeric 
> EAPIs. So I may have to go with EAPI == 8 for now. Input appreciated.
> 


The eclass only supports EAPIs {7,8,...} so it should suffice to
blacklist EAPI=7.




[gentoo-dev] Re: [PATCH 1/3] texlive-module.eclass: implicitly set TL_PV if not explicitly set

2024-02-29 Thread Florian Schmaus

On 29/02/2024 14.38, Florian Schmaus wrote:
> Signed-off-by: Florian Schmaus
> ---
>   eclass/texlive-module.eclass | 6 ++
>   1 file changed, 6 insertions(+)
>
> diff --git a/eclass/texlive-module.eclass b/eclass/texlive-module.eclass
> index afcd4532975a..d1bf0f86185b 100644
> --- a/eclass/texlive-module.eclass
> +++ b/eclass/texlive-module.eclass
> @@ -85,6 +85,12 @@ HOMEPAGE="https://www.tug.org/texlive/"
>
>   IUSE="doc source"
>
> +if [[ -z ${TL_PV} ]] \
> +  && [[ ${EAPI} -ge 8 ]] \

I am skeptical of this construct, as in the past we had non-numeric
EAPIs. So I may have to go with EAPI == 8 for now. Input appreciated.


- Flow



[gentoo-dev] [PATCH 3/3] texlive-common.eclass: Use nonfatal-respecting die for fmtutil-sys

2024-02-29 Thread Florian Schmaus
Signed-off-by: Florian Schmaus 
---
 eclass/texlive-common.eclass | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/eclass/texlive-common.eclass b/eclass/texlive-common.eclass
index 96e962cb8027..85cdb8ff204e 100644
--- a/eclass/texlive-common.eclass
+++ b/eclass/texlive-common.eclass
@@ -199,7 +199,8 @@ efmtutil-sys() {
if has_version 'app-text/texlive-core' ; then
if [[ -z ${ROOT} && -x "${EPREFIX}"/usr/bin/fmtutil-sys ]] ; then
einfo "Rebuilding formats"
-   "${EPREFIX}"/usr/bin/fmtutil-sys --all &> /dev/null || die
+   "${EPREFIX}"/usr/bin/fmtutil-sys --all &> /dev/null \
+   || die -n "fmtutil-sys returned non-zero exit status ${res}"
else
ewarn "Cannot run fmtutil-sys for some reason."
ewarn "Your formats might be inconsistent with your installed ${PN} version"
-- 
2.43.0




[gentoo-dev] [PATCH 2/3] texlive-common.eclass: check exit status of texmf-update

2024-02-29 Thread Florian Schmaus
The texlive eclasses were traditionally lenient when it comes to the
exit status of texmf-update and fmtutil-sys, as they would return a
non-zero exit status in certain situations, especially when bootstrapping
the texlive installation, i.e., when texlive-core is installed.

With the upcoming Texlive 2023 bump we can make this more strict,
having texlive-core use nonfatal.

Signed-off-by: Florian Schmaus 
---
 eclass/texlive-common.eclass | 4 
 1 file changed, 4 insertions(+)

diff --git a/eclass/texlive-common.eclass b/eclass/texlive-common.eclass
index fab6ff66ecd5..96e962cb8027 100644
--- a/eclass/texlive-common.eclass
+++ b/eclass/texlive-common.eclass
@@ -178,6 +178,10 @@ etexmf-update() {
if has_version 'app-text/texlive-core' ; then
if [[ -z ${ROOT} && -x "${EPREFIX}"/usr/sbin/texmf-update ]] ; then
"${EPREFIX}"/usr/sbin/texmf-update
+   local res="${?}"
+   if [[ "${?}" -ne 0 ]] && ver_test -ge 2023; then
+   die -n "texmf-update returned non-zero exit status ${res}"
+   fi
else
ewarn "Cannot run texmf-update for some reason."
ewarn "Your texmf tree might be inconsistent with your configuration"
-- 
2.43.0




[gentoo-dev] [PATCH 1/3] texlive-module.eclass: implicitly set TL_PV if not explicitly set

2024-02-29 Thread Florian Schmaus
Signed-off-by: Florian Schmaus 
---
 eclass/texlive-module.eclass | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/eclass/texlive-module.eclass b/eclass/texlive-module.eclass
index afcd4532975a..d1bf0f86185b 100644
--- a/eclass/texlive-module.eclass
+++ b/eclass/texlive-module.eclass
@@ -85,6 +85,12 @@ HOMEPAGE="https://www.tug.org/texlive/"
 
 IUSE="doc source"
 
+if [[ -z ${TL_PV} ]] \
+  && [[ ${EAPI} -ge 8 ]] \
+  && [[ ${CATEGORY} == dev-texlive ]]; then
+   TL_PV=$(ver_cut 1)
+fi
+
 RDEPEND=">=app-text/texlive-core-${TL_PV:-${PV}}"
 # We do not need anything from SYSROOT:
 #   Everything is built from the texlive install in /
-- 
2.43.0




[gentoo-dev] [PATCH 0/3] *** Three minor changes to texlive-(common|module).eclass ***

2024-02-29 Thread Florian Schmaus
Following are three minor changes to texlive-(common|module).eclass,
which I expect to be the last changes to the eclasses before I start
moving texlive 2023 from ::tex-overlay into ::gentoo (initially
pmasked).

Florian Schmaus (3):
  texlive-module.eclass: implicitly set TL_PV if not explicitly set
  texlive-common.eclass: check exit status of texmf-update
  texlive-common.eclass: Use nonfatal-respecting die for fmtutil-sys

 eclass/texlive-common.eclass | 7 ++-
 eclass/texlive-module.eclass | 6 ++
 2 files changed, 12 insertions(+), 1 deletion(-)

-- 
2.43.0