Re: Concerns/questions around Software Heritage Archive

2024-05-09 Thread Maxim Cournoyer
Hi Ian, Ludovic.

Ludovic Courtès  writes:

> Hi Ian,
>
> Ian Eure writes:
>
>> Summarizing the situation:
>>
>> - SHF has an opaque, difficult, and undocumented process for
>>   handling name changes.  I’d like to stress again that this is
>>   *not* strictly a transgender issue (though it likely affects them
>>   more, or in worse/different ways) -- it is a human respect issue.
>>   Many, many more cisgender people change their name than
>>   transgender people.
>
> It is also not strictly an SWH issue: how does Internet Archive handle
> name changes?  What about append-only storage in general?  We’ve
> discussed this already.

>> - SHF gave their archive to HuggingFace, an "AI" company which is
>>   generating derived works with no attribution or provenance, in
>>   ways which violate both the licenses of the projects used to train
>>   their model, and the SHF principles for LLMs.
>
> [...]
>
>> - Has Guix reached out to SHF to express these concerns / get a
>>   response?
>
> I’ve seen and participated in informal discussions, but that’s all I
> know.  Maintainers?

We haven't.  Given some improvements were apparently already made by SWH
in response to concerns raised, it seems the dialogue should continue.

>> - Whether a public or private response, what would Guix consider to
>>   be an acceptable response?  An unacceptable response?
>> - How long is Guix willing to wait for a response?
>
> Free software people, myself included, have expressed disappointment
> regarding the use of code harvested by SWH for HuggingFace’s training.
> Stefano Zacchiroli of SWH responded to these concerns on Mastodon back
> in March, as you probably saw.
>
> One important point is that copyleft code is excluded from the training
> dataset; I was able to anecdotally check that for GPL code such as Guix
> using their interface (there was a thread on Mastodon but I can’t find
> it): .  That
> addresses my main concern.
>
> Remaining concerns include the weak wording of the principles put
> forward by SWH in its statement on LLMs:
> .
> I think this is something worth discussing further with them (it’s
> already been brought up notably on Mastodon).  It’s not clear to me
> whether this is a task for Guix as a project.

I don't think it is a task for Guix specifically, but rather for all
users of SWH or interested parties.

-- 
Thanks,
Maxim



Re: Concerns/questions around Software Heritage Archive

2024-05-02 Thread Ludovic Courtès
Hi Ian,

Ian Eure writes:

> Summarizing the situation:
>
> - SHF has an opaque, difficult, and undocumented process for
>   handling name changes.  I’d like to stress again that this is
>   *not* strictly a transgender issue (though it likely affects them
>   more, or in worse/different ways) -- it is a human respect issue.
>   Many, many more cisgender people change their name than
>   transgender people.

It is also not strictly an SWH issue: how does Internet Archive handle
name changes?  What about append-only storage in general?  We’ve
discussed this already.

> - SHF gave their archive to HuggingFace, an "AI" company which is
>   generating derived works with no attribution or provenance, in
>   ways which violate both the licenses of the projects used to train
>   their model, and the SHF principles for LLMs.

[...]

> - Has Guix reached out to SHF to express these concerns / get a
>   response?

I’ve seen and participated in informal discussions, but that’s all I
know.  Maintainers?

> - Whether a public or private response, what would Guix consider to
>   be an acceptable response?  An unacceptable response?
> - How long is Guix willing to wait for a response?

Free software people, myself included, have expressed disappointment
regarding the use of code harvested by SWH for HuggingFace’s training.
Stefano Zacchiroli of SWH responded to these concerns on Mastodon back
in March, as you probably saw.

One important point is that copyleft code is excluded from the training
dataset; I was able to anecdotally check that for GPL code such as Guix
using their interface (there was a thread on Mastodon but I can’t find
it): .  That
addresses my main concern.

Remaining concerns include the weak wording of the principles put
forward by SWH in its statement on LLMs:
.
I think this is something worth discussing further with them (it’s
already been brought up notably on Mastodon).  It’s not clear to me
whether this is a task for Guix as a project.

(I do not forget that, in the meantime, Microsoft ingests everything
that’s on GitHub, including copyleft code, and including clones of repos
that were not initially hosted there.)

I’m not sure this is the kind of answer you expected, but I hope it
makes sense!

Ludo’.



Re: Concerns/questions around Software Heritage Archive

2024-05-01 Thread Tomas Volf
On 2024-05-01 08:29:29 -0700, Ian Eure wrote:
> If Guix is going to continue to facilitate license violations, I will have no
> choice but to remove my software from it to defend them.

Purely hypothetically, if it came to this, how would you go about it?
Assuming the software is under a free license (a requirement for inclusion in
Guix), I am unsure on what basis the removal would be demanded.  Do you have
some specific approach in mind?

Have a nice day,
Tomas Volf

--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.




Re: Concerns/questions around Software Heritage Archive

2024-05-01 Thread Ian Eure

Hello Guixers,

It’s been another week with no response or movement on this.  I’m 
disappointed that this situation seems to be getting treated so 
lightly.  Adhering to the terms of software licenses is 
fundamental to the operation of the free software ecosystem; there 
is no software freedom without it.  It’s surprising that a pretty 
clear-cut situation of creating derivative works of free software 
in violation of their licenses would be shrugged off so easily.


Whatever the Guix organization’s position is, I’m reaching my 
personal limit, and need to see some kind of positive movement on 
this[1].  If Guix is going to continue to facilitate license 
violations, I will have no choice but to remove my software from 
it to defend them.


 — Ian

[1]: Personally, I would be satisfied with a per-package setting
which disables scheduling source for archiving by SWH.  Seeing
this, or a commitment to build this within a reasonable
timeframe, would allay my concerns.





Re: Concerns/questions around Software Heritage Archive

2024-04-20 Thread Ian Eure

Hello,

I’m following up on this discussion since it’s been a month
and I haven’t heard any updates.


Summarizing the situation:

- SHF has an opaque, difficult, and undocumented process for
 handling name changes.  I’d like to stress again that this is
 *not* strictly a transgender issue (though it likely affects
 them more, or in worse/different ways) -- it is a human respect
 issue.  Many, many more cisgender people change their name than
 transgender people.


- SHF gave their archive to HuggingFace, an "AI" company which is
 generating derived works with no attribution or provenance, in
 ways which violate both the licenses of the projects used to
 train their model, and the SHF principles for LLMs.


- HuggingFace wasn’t respecting requests to opt-out of their 
 model.



On the first point, it sounds like SHF has made concrete progress 
to improve[1], which is very good to hear.  If SHF continues on 
this course, I think the concern is resolved.


On the third point, HuggingFace has begun honoring opt-out 
requests, but is still very far behind.  Also, they don’t remove 
code from the older versions of their model -- it remains there 
forever.  This is progress, but still, not great.


On the second point, I have not seen any public statements 
indicating that either SHF or HuggingFace even acknowledges the 
problem.  SHF’s most recent newsletter[2], published in April 2024 
(after these concerns came to light), continues to tout that 
StarCoder2 is "the first AI model aligned with our principles," 
which appears to be false.  StarCoder2 includes both licensed and 
unlicensed code, and HuggingFace’s own StarChat2 playground 
produces works derivative of this code, with no attribution or 
licensing information.  There is also no statement or position on 
the SHF news blog.  Nor has HuggingFace either fixed their tools
or made a statement.  This is still very much a live concern.


I have a few questions:

- Has Guix reached out to SHF to express these concerns / get a 
 response?
- Whether a public or private response, what would Guix consider
 to be an acceptable response?  An unacceptable response?

- How long is Guix willing to wait for a response?

Thanks,

 — Ian

[1]: 
https://cohost.org/arborelia/post/5273879-they-are-fixing-some
[2]: 
https://www.softwareheritage.org/wp-content/uploads/2024/04/Software-Heritage-2024-Vision-Milestones-Newsletter.pdf


Ian Eure  writes:


Hi Guixy people,

I’d never heard of SWH before I started hacking on Guix last fall, and
it struck me as rather a good idea.  However, I’ve seen some things
lately which have soured me on them.

They appear to be using the archive to build LLMs:
https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/

I was also distressed to see how poorly they treated a developer who
wished to update their name:
https://cohost.org/arborelia/post/4968198-the-software-heritag
https://cohost.org/arborelia/post/5052044-the-software-heritag

GPL’d software I’ve created has been packaged for Guix, which I assume
means it’s been included in SWH.  While I’m dealing with their (IMO:
unethical) opt-out process, I likely also need to stop new copies from
being uploaded again in the future.

Is there a way to indicate, in a Guix package, that it should *never*
be included in SWH?

Is there a way to tell Guix to never download source from SWH?

I want absolutely nothing to do with them.

Thanks,

 — Ian





Re: rewriting history; Was: Concerns/questions around Software Heritage Archive

2024-03-22 Thread indieterminacy

On 2024-03-18 15:14, Andreas Enge wrote:
> On Mon, Mar 18, 2024 at 04:33:49PM +0200, MSavoritias wrote:
>> Actually gitlab already is facing something like that and they are
>> doing what was proposed elsewhere: mapping of UUIDs to display names
>> https://gitlab.com/gitlab-org/gitlab/-/issues/20960
>
> Interesting, thanks! It is something that maybe could be implemented by
> Savannah, but it would probably require a bit of thought. And yet again,
> somehow the mapping uuid<->"real" names would have to be public (people
> would "git clone" commits with uuids, and would need to locally replace
> them by "real" names); so people can always keep copies of the mapping
> over time.
>
> I am also not quite sure about the signing process for committers;
> in principle keys are enough, but in GPG they are tied to email
> addresses, and I do not know whether we use this in Guix.
>
> In the end, my impression is this will not achieve much more than what
> we already have with the .mailmap approach. In a sense, everyone would
> use a pseudonym (their uuid), and then we would keep a mapping between
> these pseudonyms and, well, "real" names or other pseudonyms chosen by
> the contributors...
>
> Hm, this could indeed be implemented exactly with .mailmap, no?
> We would need to enforce that authors use a uuid of a specific format,
> and potentially an empty or dummy email address, or another uuid.
> Then we could keep a .mailmap file. The history of "real" identities
> would still be visible in the git history, but as said above, anyway
> we could not prevent people from storing the association information
> over time.
>
>> Right fair. As I have said before SWH does break Guix CoC effectively
>> right now.
>> So what Guix does from this point on will effectively dictate if the
>> CoC is valid or not.
>
> Well, the CoC is valid on our communication channels; so what SWH does
> with our software is outside its scope (that is governed by the license).
>
> Andreas
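
To make the .mailmap idea quoted above a bit more concrete, here is a
minimal Python sketch.  It assumes a hypothetical convention that neither
Guix nor Savannah implements today: commits carry a UUID as the author
name and a dummy address of the form <uuid>@uuid.invalid, and a
mailmap-style file maps that address to the currently preferred display
name.  The parser only handles the simple "Proper Name <commit@email>"
form of .mailmap lines, and every name and address below is made up.

```python
import re
import uuid

# One mailmap-style line of the simple form: "Proper Name <commit@email>".
MAILMAP_LINE = re.compile(r"^(?P<name>[^<#]+?)\s*<(?P<email>[^>]+)>\s*$")


def parse_mailmap(text):
    """Return {commit email (lowercased): display name}, skipping comments and blanks."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = MAILMAP_LINE.match(line)
        if match:
            mapping[match.group("email").lower()] = match.group("name")
    return mapping


def display_name(author_name, author_email, mapping):
    """Resolve a commit author to the mapped name, falling back to the recorded one."""
    return mapping.get(author_email.lower(), author_name)


# Hypothetical contributor identity: a UUID plus a dummy address.
contributor = str(uuid.UUID("123e4567-e89b-12d3-a456-426614174000"))
email = f"{contributor}@uuid.invalid"

# Two versions of the mapping file, before and after a name change.
old_map = parse_mailmap(f"Old Name <{email}>\n")
new_map = parse_mailmap(f"# updated mapping\nNew Name <{email}>\n")

print(display_name(contributor, email, old_map))
print(display_name(contributor, email, new_map))
```

As the quoted message points out, earlier versions of the mapping file
would still be visible in the history, so a scheme like this changes what
is displayed, not what was once published.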


I have happened to stumble across a new initiative concerning UUIDs for 
academic researchers.


Here is their description:
```
ORCID, which stands for Open Researcher and Contributor ID, is a free, 
unique, persistent identifier (PID) for individuals to use as they 
engage in research, scholarship, and innovation activities. We provide 
ORCID to researchers free of charge so that we may realize our vision of 
connecting all who participate in research, scholarship, and innovation 
are uniquely identified and connected to their contributions across 
disciplines, borders, and time.

```

Here are its guiding principles:
```
Our Founding Principles

- ORCID will work to support the creation of a permanent, clear, and
  unambiguous record of research and scholarly communication by enabling
  reliable attribution of authors and contributors.
- ORCID will transcend discipline, geographic, national, and
  institutional boundaries.
- Participation in ORCID is open to any organization that has an
  interest in research and scholarly communications.
- Access to ORCID services will be based on transparent and
  non-discriminatory terms posted on the ORCID website.
- Researchers will be able to create, edit, and maintain an ORCID
  identifier and record free of charge.
- Researchers will control the defined privacy settings of their own
  ORCID record data.
- All data contributed to ORCID by researchers or claimed by them will
  be available in standard formats for free download (subject to the
  researchers’ own privacy settings) that are updated once a year and
  released under a CC0 waiver.
- All software developed by ORCID will be publicly released under an
  Open Source Software license approved by the Open Source Initiative. For
  the software it adopts, ORCID will prefer Open Source.
- ORCID identifiers and record data (subject to privacy settings) will
  be made available via a combination of no-charge and for-a-fee APIs and
  services. Any fees will be set to ensure the sustainability of ORCID as
  a not-for-profit, charitable organization focused on the long-term
  persistence of the ORCID system.
- ORCID will be governed by representatives from a broad cross-section
  of stakeholders, the majority of whom are not-for-profit, and will
  strive for maximal transparency by publicly posting summaries of all
  Board meetings and annual financial reports.

```

While I do not have the focus to make a further evaluation,
I should point out that ORCID is a component of the nascent Open Science 
Network

https://openscience.network/

FWIW, recognising an academic in OSN and being aware of the quality of 
the tooling Bonfire Networks make me wonder whether ORCID has some good 
design principles

https://bonfirenetworks.org/

In any case, it may provide a practical point for comparison given the 
thicket of governance issues this thread has discovered.



Warmest regards,


fsnjfkjljcjcjcdnmddfnfdfnlzxvcllnjnrejvns  v fjfdsjhsv



Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive)

2024-03-21 Thread Development of GNU Guix and the GNU System distribution.
> IMHO This is a quiet egocentric point of view.
> What are you implying with the "loud" minority here?

Hi,

"Quiet" is a funny typo here.

Also, "peace on Earth and goodwill toward [all]." [1]

Please

[1] https://www.youtube.com/watch?v=74ocbvwam7c



Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive)

2024-03-21 Thread Philip McGrath


On Thu, Mar 21, 2024, at 11:11 AM, MSavoritias wrote:
> On 3/21/24 17:08, Giovanni Biscuolo wrote:
>> […]
>> I don't understand how using petnames, uuids or even a re:claimID
>> identity (see below) could solve the problem with "rewriting history" in
>> case a person wishes to change his or her previous _published_ name
>> (petname, uuid...) in an archived content-addressable storage system.
>
> It doesn't solve the problem of rewriting history. It solves the bug of
> having names part of the git history.
>
> see also https://gitlab.com/gitlab-org/gitlab/-/issues/20960 for Gitlab 
> doing the same thing.
>

Unless I’m missing something, the linked Gitlab issue seems to be a proposal by 
someone in February 2018 that Gitlab adopt some system of using UUIDs instead 
of author information. There was fairly limited discussion, with the last 
comment in May 2018. There does not seem to have been a consensus supporting 
the proposal, and I’m not seeing any indication that Gitlab plans to implement 
the proposal.

Furthermore, the author and committer metadata are not the only places where 
people’s names appear in Guix. For example, I know some font packages that 
mention the name of the font designer in the package’s description. More 
broadly, Guix also refers to package sources by their content hashes: most 
sources probably contain some people’s names, and any of these could face the 
same problems as names directly included in the Guix Git repository.
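
A small Python sketch of why names inside sources run into the same
problem.  The file content and names are made up, and a SHA-256 hex digest
stands in for whatever hash encoding a package definition actually
records; the point is only that changing a name inside the bytes changes
the hash those bytes are looked up by.

```python
import hashlib


def content_hash(data: bytes) -> str:
    """SHA-256 hex digest of a source blob."""
    return hashlib.sha256(data).hexdigest()


# Made-up source file that mentions its author by name.
original = b"/* Copyright (C) 2020 Old Name */\nint main(void) { return 0; }\n"
renamed = original.replace(b"Old Name", b"New Name")

# The hash a package definition would pin for this source.
recorded = content_hash(original)

print(recorded == content_hash(original))  # the archived bytes still verify
print(recorded == content_hash(renamed))   # the edited bytes no longer match
```

So a rewritten source would need a new hash in every package definition
that refers to it, while older revisions would keep pointing at the
original bytes.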

I strongly believe in the importance of protecting trans people from 
harassment. I don’t know how to solve the tension with long-term bit-for-bit 
reproducibility. 

Philip



Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive)

2024-03-21 Thread pinoaffe


Hartmut Goebel  writes:

> On 21.03.24 at 07:12, MSavoritias wrote:
>> Specifically the social rules that we support trans people and we
>> want to include them. Any person really that want to change their
>> name at some point for some reason. 
>
> Interestingly you are asking the right to get the old name rewritten
> for trans people only.

This discussion arose because of the experiences of someone who's trans,
and is relevant to many trans folks, so of course this will remain a
major focus of the discussion.

> To be frank: IMHO This is a quiet egocentric point of view.
You're wrong and it ain't

> In many cultures all over the world women are required to change their
> name when they marry. And you are not asking for women's right. But
> only for right for the small but loud minority of trans people.
I am not aware of any women who want/have wanted to retroactively change
historic occurrences of their maiden name, so your mail reeks of concern
trolling to me.

There are (of course) instances where people may want to replace
historic use of a name with another name for reasons other than
transitioning, but that should make you rejoice in the fact that
protecting trans people's rights also protects cis people's rights.
This should not at all be surprising, as trans rights are human rights.

Kind regards,
pinoaffe



Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive)

2024-03-21 Thread pinoaffe


Giovanni Biscuolo  writes:
> [...]
> pinoaffe  writes:
>> - should examine possible workarounds going forward,
>> - should move towards something like UUIDs and petnames in the long run.
>>
>> (see https://spritelyproject.org/news/petname-systems.html).
>
> I don't understand how using petnames, uuids or even a re:claimID
> identity (see below) could solve the problem with "rewriting history" in
> case a person wishes to change his or her previous _published_ name
> (petname, uuid...) in an archived content-addressable storage system.
It would decouple "name" from "identity as represented in the git merkle
tree", thus allowing name changes to occur without affecting hashes and
the like.  I see no possible reason for UUID changes, as UUIDs (by
themselves) are not personally identifying.  This of course would not
allow retroactive splitting/merging of identities, but I feel like
permitting that is incompatible with the idea of identities anyhow.
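
As a rough illustration of that decoupling, here is a Python sketch.  It
is a toy hash chain, not git's real object format, and the UUID, names and
messages are all hypothetical.  Each commit id covers the parent id, an
author field and a message; when the author field is a UUID and the name
lives in a separate table, a rename leaves every id untouched, whereas
with the name embedded the same rename changes the whole chain.

```python
import hashlib
import json


def commit_id(parent, author, message):
    """Hash a commit-like record; the parent id is included, so ids chain together."""
    record = json.dumps(
        {"parent": parent, "author": author, "message": message},
        sort_keys=True,
    )
    return hashlib.sha256(record.encode("utf-8")).hexdigest()


def history_tip(authors_and_messages):
    """Return the id of the last commit in a linear history."""
    tip = None
    for author, message in authors_and_messages:
        tip = commit_id(tip, author, message)
    return tip


AUTHOR = "123e4567-e89b-12d3-a456-426614174000"  # hypothetical contributor UUID
log = [(AUTHOR, "Add package"), (AUTHOR, "Fix typo")]

# The name lives outside the hashed data, in a separate mutable table.
names = {AUTHOR: "Old Name"}
tip_before = history_tip(log)
names[AUTHOR] = "New Name"  # a rename touches only the table
tip_after = history_tip(log)
print(tip_before == tip_after)  # ids are unaffected by the rename

# With the name embedded in each commit, the same rename changes every id.
embedded_old = history_tip([("Old Name", message) for _, message in log])
embedded_new = history_tip([("New Name", message) for _, message in log])
print(embedded_old == embedded_new)
```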

> As a side note, other than the "petname system" please also consider
> re:claimID from GNUnet:
> https://www.gnunet.org/en/reclaim/index.html
> https://www.gnunet.org/en/reclaim/motivation.html

Sure, I'll take a look

kind regards,
pinoaffe



Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive)

2024-03-21 Thread Efraim Flashner
On Thu, Mar 21, 2024 at 04:23:01PM +0100, Hartmut Goebel wrote:
> On 21.03.24 at 07:12, MSavoritias wrote:
> > Specifically the social rules that we support trans people and we want
> > to include them. Any person really that want to change their name at
> > some point for some reason.
> 
> Interestingly you are asking the right to get the old name rewritten for
> trans people only.
> 
> To be frank: IMHO This is a quiet egocentric point of view.

I took it in as though we were discussing the recent activity, not that
it was ONLY this instance that we care about.  I have a number of
friends who have more than 1 set of names and specifically wish to go by
one set over the other.  The point is that there is a vocal portion of
people in the world who insist on deadnaming people, and that is not
okay.

> In many cultures all over the world women are required to change their name
> when they marry. And you are not asking for women's right. But only for
> right for the small but loud minority of trans people.

As a project, we support people by addressing them by their preferred
name, and honoring their wishes as to name, gender, honorifics, etc. For
all people. If a person chooses to go by their "maiden name" or their
"married name" or a pseudonym, that's their prerogative.

> 
> -- 
> Regards
> Hartmut Goebel
> 
> | Hartmut Goebel  | h.goe...@crazy-compilers.com   |
> | www.crazy-compilers.com | compilers which you thought are impossible |
> 
> 

-- 
Efraim Flashner  אפרים פלשנר
GPG key = A28B F40C 3E55 1372 662D  14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted




Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive)

2024-03-21 Thread Ekaitz Zarraga

Hi,


> What are you implying with the "loud" minority here?
>
> MSavoritias


He's probably talking about the same thing that made you continue being 
heated after the fact you were told to calm down and you are not wasting 
any single opportunity to continue answering every single email in this 
thread and all the subthreads that continue to appear.


I don't want to look insensitive but I think we are revolving around the 
same issue over and over again and honestly it's bothering me.


Not the discussion itself, which has a profound meaning and it's a deep 
issue, but the way it is taking place and where it is taking place.


It's also extremely sad to me to see many unanswered questions in the 
help-guix mailing list, which might or might not include questions from 
trans people that are willing to use the fantastic software we all 
collectively maintain and which would help them have a better life, and 
yet we are talking about the detail of the detail here for no real 
reason: this conversation does not have any practical purpose.


Also there are hundreds of issues open in guix, which don't happen to 
deserve the attention this discussion has.


I don't think this conversation is going to reach anywhere, and I would 
like to encourage people to spend their energy somewhere else until we 
really start having a different mindset on the issue. As we were 
suggested to do.


I don't think this is a topic for `guix-devel` mailing list. If it is, 
please let me know and change my expectations accordingly.


My suggestion is: if this is an actual problem with guix's software, we
should open an issue for this, for those who are interested in actually
trying to improve the situation. If it's not a problem with guix, then
this conversation is just an exercise of ethical and intellectual
bragging that is just uninteresting to me and more appropriate for
social media.


Best,
Ekaitz




Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive)

2024-03-21 Thread MSavoritias

On 3/21/24 17:23, Hartmut Goebel wrote:
> On 21.03.24 at 07:12, MSavoritias wrote:
>> Specifically the social rules that we support trans people and we
>> want to include them. Any person really that want to change their
>> name at some point for some reason.
>
> Interestingly you are asking the right to get the old name rewritten
> for trans people only.
>
> To be frank: IMHO This is a quiet egocentric point of view.
>
> In many cultures all over the world women are required to change their
> name when they marry. And you are not asking for women's right. But
> only for right for the small but loud minority of trans people.



What are you implying with the "loud" minority here?


MSavoritias




Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive)

2024-03-21 Thread Hartmut Goebel

On 21.03.24 at 07:12, MSavoritias wrote:
> Specifically the social rules that we support trans people and we want
> to include them. Any person really that want to change their name at
> some point for some reason.

Interestingly you are asking the right to get the old name rewritten for
trans people only.

To be frank: IMHO This is a quiet egocentric point of view.

In many cultures all over the world women are required to change their
name when they marry. And you are not asking for women's right. But only
for right for the small but loud minority of trans people.


--
Regards
Hartmut Goebel

| Hartmut Goebel  | h.goe...@crazy-compilers.com   |
| www.crazy-compilers.com | compilers which you thought are impossible |




Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive)

2024-03-21 Thread MSavoritias



On 3/21/24 17:08, Giovanni Biscuolo wrote:
> Hello pinoaffe,
>
> pinoaffe writes:
>
> [...]
>
>> I think we, as Guix,
>> - should examine if/how it is currently feasible to rewrite our git
>> history,
>
> it's not, see also:
> https://guix.gnu.org/en/blog/2020/securing-updates/
>
>> - should examine possible workarounds going forward,
>> - should move towards something like UUIDs and petnames in the long run.
>>
>> (see https://spritelyproject.org/news/petname-systems.html).
>
> I don't understand how using petnames, uuids or even a re:claimID
> identity (see below) could solve the problem with "rewriting history" in
> case a person wishes to change his or her previous _published_ name
> (petname, uuid...) in an archived content-addressable storage system.

It doesn't solve the problem of rewriting history. It solves the bug of
having names part of the git history.

see also https://gitlab.com/gitlab-org/gitlab/-/issues/20960 for Gitlab
doing the same thing.

MSavoritias

> As a side note, other than the "petname system" please also consider
> re:claimID from GNUnet:
> https://www.gnunet.org/en/reclaim/index.html
> https://www.gnunet.org/en/reclaim/motivation.html
>
> [...]
>
> Regards, Giovanni.
>
>
> [1] https://guix.gnu.org/en/blog/2020/securing-updates/






Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive)

2024-03-21 Thread Giovanni Biscuolo
Hello pinoaffe,

pinoaffe  writes:

[...]

> I think we, as Guix,
> - should examine if/how it is currently feasible to rewrite our git
> history,

it's not, see also:
https://guix.gnu.org/en/blog/2020/securing-updates/

> - should examine possible workarounds going forward,
> - should move towards something like UUIDs and petnames in the long run.
>
> (see https://spritelyproject.org/news/petname-systems.html).

I don't understand how using petnames, uuids or even a re:claimID
identity (see below) could solve the problem with "rewriting history" in
case a person wishes to change his or her previous _published_ name
(petname, uuid...) in an archived content-addressable storage system.

As a side note, other than the "petname system" please also consider
re:claimID from GNUnet:
https://www.gnunet.org/en/reclaim/index.html
https://www.gnunet.org/en/reclaim/motivation.html

[...]

Regards, Giovanni.


[1] https://guix.gnu.org/en/blog/2020/securing-updates/


-- 
Giovanni Biscuolo

Xelera IT Infrastructures




Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive)

2024-03-21 Thread pinoaffe
Hi!

MSavoritias  writes:

> On 3/20/24 19:22, Giovanni Biscuolo wrote:
>> Disclaimer: I've still not read all the relevant threads [3] [4], so
>> please forgive me if I repeat some information already provided.
>>
>> What rights are we talking about?
>
> You are making the same misconception as some other people in the
> thread here.
>
> We are talking about social rules that we have here in the Guix
> community not legal/state rules.

Arborelia is clearly talking about legal/state rules in part of her
blogposts.  You can argue that the state rules aren't relevant here
(IMO, Giovanni's observations support this argument), but it's not a
"misconception" to think that the current discussion is at least
partially about the legal aspects.

> Specifically the social rules that we support trans people and we want
> to include them. Any person really that want to change their name at
> some point for some reason.
>
> To that end we listen to their concerns/wishes and we accommodate
> them.

I agree that we should listen to people's concerns/wishes and accommodate
them out of basic respect, but we can only accommodate people's wishes
when those wishes fall within what is technologically feasible and reasonable.

When a person publishes books under a certain identity, it is not
feasible for *every* mention in every copy to retroactively be updated
to reflect a new name.  In a similar manner, it is (currently) not
always feasible to rewrite git history to change historic names.

I think we, as Guix,
- should examine if/how it is currently feasible to rewrite our git history,
- should examine possible workarounds going forward,
- should move towards something like UUIDs and petnames in the long run.

(see https://spritelyproject.org/news/petname-systems.html).

>> As a *free software* user do I have the right to redistribute /old/
>> copies of the source code and documentation I got in the past from the
>> copyright holder, in any form (e.g. print)?... or to use old sources or
>> documentation to develop derived work, with _attribution_, without
>> asking for consent from the original authors and/or contact the original
>> authors to ask them what is their current name?
>
> Copyright is not consent. When we are talking about consent we are
> talking about it in social rules.
>
> See also
> https://www.consentfultech.io/wp-content/uploads/2019/10/Building-Consentful-Tech.pdf
> as a nice paper for consent in tech.
>
>> If yes, I would like to exercise all my rights without being harassed.
>
> Again this has nothing to do with rights granted by states. This is
> about including people and making them feel safe and respected.

I fully agree with you here: rights such as the right to free speech and
copyleft don't mean that any action that falls within those rights
should be free of consequences, especially when such an action excludes
others, disrespects them or makes them feel unsafe.

>> [...]

kind regards,
pinoaffe



Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive)

2024-03-21 Thread pelzflorian (Florian Pelz)
Hello all.  I object to this argument:

MSavoritias  writes:
> We are talking about social rules that we have here in the Guix
> community not legal/state rules.

No, legal rules come from deliberation of social arguments.

CoC-wise, it seems to me that SWH was unfriendly and this is important
to Guix.

But SWH’s legal arguments are also social arguments and cannot be
dismissed.  I do not know if SWH really is an archive in the sense of
the law, but certainly we are facing a trade-off.

It would be nice if Guix could handle harmless deletion or
rectifications.  Whether that is possible shapes laws.  I believe it is
possible, but “show me how” is a valid response.

Regards,
Florian



Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive)

2024-03-21 Thread Attila Lendvai
> We are talking about social rules that we have here in the Guix
> community not legal/state rules.


ethics, i.e. the discussion of rights, is a branch of philosophy.

ideally, it should inform the people who are writing and enforcing state laws, 
but these days -- sadly -- it has precious little to do with state laws. and i 
think you're the one here who conflates the two.


> Specifically the social rules that we support trans people and we want
> to include them. Any person really that want to change their name at
> some point for some reason.
>
> To that end we listen to their concerns/wishes and we accommodate them.


i've asked you this before, and i'll keep asking it: sure, accommodate, but to 
what extent? what is a reasonable cost i can incur on others? (see the 
discussion of negative vs. positive rights in this context)

what if i declare that i only feel accommodated here if everyone attaches the 
local weather forecast to each mail they send to guix-devel?

the limit of your demands begins where it starts to constrain the freedom of 
others. considering this is an essential part of respectful behavior towards 
others.

-- 
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“I am not what happened to me, I am what I choose to become.”
— Carl Jung (1875–1961)




Re: the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive)

2024-03-21 Thread MSavoritias

On 3/20/24 19:22, Giovanni Biscuolo wrote:


> Hello Ludovic and Guix devel community!
>
> Disclaimer: I've still not read all the relevant threads [3] [4], so
> please forgive me if I repeat some information already provided.
>
> What rights are we talking about?


You are making the same misconception as some other people in the thread 
here.


We are talking about social rules that we have here in the Guix 
community not legal/state rules.



Specifically the social rules that we support trans people and we want 
to include them. Any person really that want to change their name at 
some point for some reason.


To that end we listen to their concerns/wishes and we accommodate them.



> As a *free software* user do I have the right to redistribute /old/
> copies of the source code and documentation I got in the past from the
> copyright holder, in any form (e.g. print)?... or to use old sources or
> documentation to develop derived work, with _attribution_, without
> asking for consent from the original authors and/or contact the original
> authors to ask them what is their current name?


Copyright is not consent. When we are talking about consent we are 
talking about it in social rules.


See also 
https://www.consentfultech.io/wp-content/uploads/2019/10/Building-Consentful-Tech.pdf 
as a nice paper for consent in tech.



> If yes, I would like to exercise all my rights without being harassed.


Again this has nothing to do with rights granted by states. This is 
about including people and making them feel safe and respected.



MSavoritias



> Also, SWH and other organizations (re)distributing free software have
> their rights and should exercise them without being harassed.
>
> Ludovic Courtès  writes:
>
> [...]





the right to rewrite history to rectify the past (was Re: Concerns/questions around Software Heritage Archive)

2024-03-20 Thread Giovanni Biscuolo
Hello Ludovic and Guix devel community!

Disclaimer: I've still not read all the relevant threads [3] [4], so
please forgive me if I repeat some information already provided.

What rights are we talking about?

As a *free software* user do I have the right to redistribute /old/
copies of the source code and documentation I got in the past from the
copyright holder, in any form (e.g. print)?... or to use old sources or
documentation to develop derived work, with _attribution_, without
asking for consent from the original authors and/or contact the original
authors to ask them what is their current name?

If yes, I would like to exercise all my rights without being harassed.

Also, SWH and other organizations (re)distributing free software have
their rights and should exercise them without being harassed.

Ludovic Courtès  writes:

[...]

>> I was also distressed to see how poorly they treated a developer who
>> wished to update their name:

[1] https://cohost.org/arborelia/post/4968198-the-software-heritag

[2] https://cohost.org/arborelia/post/5052044-the-software-heritag

> That’s another concern, with append-only storage in general, starting
> with Git.  We should look for solutions that work for both contributors
> who change names and for users.  This has happened several times in Guix
> and what people did was search/replace their name and adjust
> ‘.mailmap’.

This is a good solution but unfortunately this is not what the author of
the blog posts above [1] [2] and some people in this and other threads
[3] [4] are asking SWH - and Guix and potentially all other people
distributing copies of copyrighted works (e.g. documentation) - to do.

They are asking to "rewrite history" [1] (of git... why not of other
archives?):

--8<---cut here---start->8---

I already fixed my name in my code. I updated the README and the
copyright notice, and I ran git-filter-repo to rewrite the git history
so it had always said my correct name, including in commits. This is a
thing you can do.

--8<---cut here---end--->8---

The author explicitly invokes the "right to rectification" (of the
GDPR) [2]:

--8<---cut here---start->8---

I give zero shits about the integrity of their data structures. I had
already sent them a second email invoking the Right to Rectification,
which it seemed like they ignored again, so it was time to get more
formal.

[...] Pursuant to Article 21.1 of the General Data Protection
Regulation (GDPR), I object to the processing of my personal data by
your organization, the Software Heritage archive.

[...] Accordingly, I ask that you:

* delete my data from your files and notify my request to the
 organizations to which you may have communicated it (Articles 17.1.c
 and 19 of the GDPR);

* if you are legally obliged to do so, tell me how long my data will be
 retained in your archive databases;

* inform me of these elements as soon as possible and at the latest
 within one month of receipt of this letter (Article 12.3 of the GDPR).

--8<---cut here---end--->8---

People asking to rectify information /they/ _published_ on their own are
obviously misinterpreting the relevant section of the GDPR (more on this
later)... and in fact, the SWH DPO reply is [2]:

--8<---cut here---start->8---

Unfortunately, the deletion or modification of the software repositories
you requested cannot be performed, for several reasons:

* On the one hand, these developments involve several authors and are
 made available under open source licenses, which explicitly allow
 copying and redistribution

* On the other hand, the mission of Software Heritage archive is to
 guarantee the availability of all versions of all publicly available
 source codes, and to ensure the integrity of these codes

We understand the concern about the display of outdated identities, and
for this reason a mechanism has been put in place to display a preferred
identity across all the Software Heritage archive.

--8<---cut here---end--->8---

But the author is still not satisfied with the solution proposed by SWH
(and used by Guix for its contributors):

--8<---cut here---start->8---

* I was not asking them to develop such a mechanism. I don't just want
 them to cosmetically change what they display, I want them to change
 the data. I can't trust the organization that contains the transphobe
 who had written their previous content policy to hold on to a
 substitution rule involving my deadname forever.

--8<---cut here---end--->8---

«I want them to change the data», that is: rewrite history (of /all/ the
copies of the repository archived by SWH, **fork** included?)

The CNIL (the French data regulator) 

contributor uuid (was Re: Concerns/questions around Software Heritage Archive)

2024-03-20 Thread bae66428a8ad58eafaa98cb0ab2e512f045974ecf4bf947e32096fae574d99c6
paul  writes:

[...]

> If we'd really need to identify contributors, and obviously Guix 
> doesn't, we could use an UUID/machine readable identifier which can then 
> be mapped to a displayed name. I believe git can already be configured 
> to do so.

every contributor wishing to do so can already choose to use the
preferred uuid/email metadata they wish and ask some person with commit
access to add a uuid/display-name mapping via git .mailmap
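
As a rough illustration of the mapping described above, here is a
minimal sketch of how a .mailmap-style table can turn a stable commit
identity into a preferred display name. It assumes only the two entry
forms "Proper Name <proper@email> Commit Name <commit@email>" and
"Proper Name <proper@email> <commit@email>", and it lowercases emails as
a simplification; the function names and sample identities are
hypothetical, and this is not how git itself implements the lookup.

```python
import re

# One entry: optional proper name, proper email, optional commit name, commit email.
_ENTRY = re.compile(
    r"^(?P<proper_name>[^<]*)<(?P<proper_email>[^>]*)>\s*"
    r"(?P<commit_name>[^<]*)<(?P<commit_email>[^>]*)>\s*$"
)


def parse_mailmap(text: str) -> dict:
    """Return {(commit_name or None, commit_email): (proper_name or None, proper_email)}."""
    mapping = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = _ENTRY.match(line)
        if match is None:
            continue  # entry form not covered by this sketch
        commit_name = match["commit_name"].strip() or None
        proper_name = match["proper_name"].strip() or None
        key = (commit_name, match["commit_email"].strip().lower())
        mapping[key] = (proper_name, match["proper_email"].strip())
    return mapping


def resolve(mapping: dict, name: str, email: str) -> tuple:
    """Map a commit identity to its preferred identity, if an entry exists."""
    key_email = email.lower()
    hit = mapping.get((name, key_email)) or mapping.get((None, key_email))
    if hit is None:
        return (name, email)  # no entry: identity is displayed unchanged
    proper_name, proper_email = hit
    return (proper_name or name, proper_email)


# Made-up sample data: one rename, one opaque id mapped to a display name.
SAMPLE = """\
# preferred identity first, identity found in commits second
New Name <new@example.org> Old Name <old@example.org>
Display Name <id@example.org> <3f2a@example.org>
"""

table = parse_mailmap(SAMPLE)
print(resolve(table, "Old Name", "old@example.org"))
print(resolve(table, "3f2a", "3f2a@example.org"))
```

Note that the table only changes what is displayed; the identity stored
in each commit object stays as it was.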

unfortunately this does not resolve the problem with rewriting history
with git, because Guix artifacts also contain source code that usually
contains information about the author, including names that potentially
could become "deadnames" in the future

happy hacking!

--
bae66428a8ad58eafaa98cb0ab2e512f045974ecf4bf947e32096fae574d99c6




Re: Concerns/questions around Software Heritage Archive

2024-03-19 Thread Ian Eure



Simon Tournier  writes:


> Hi,
>
> On lun., 18 mars 2024 at 12:38, Ian Eure  wrote:
>
>> They appear to be violating free software licenses on large scale.
>> They are in violation of SWH’s own positions.
>
> [...]
>
>> [1]: https://arxiv.org/html/2402.19173v1
>> [2]: https://huggingface.co/spaces/HuggingFaceH4/starchat2-playground
>> [3]: https://huggingface.co/datasets/bigcode/the-stack-v2
>> [4]: https://github.com/bigcode-project/opt-out-v2/issues
>
> Please note that Software Heritage folks are not co-author of all that;
> or I misread.  Do not take me wrong, this is not an attempt to escape
> but a query for waiting the feedback of SWH.



Shit rolls downhill.  It’s the least surprising thing in the world 
to find that an "AI" company is violating licenses, because the 
entire technology is based on infringement at a massive scale. 
SWH’s partnership with, and promotion of, both the company and its 
license-violating model, in violation of their *own stated 
principles*, raises very legitimate questions.


There are multiple overlapping concerns here: personal, 
organizational, legal, ethical, and technical.


From a personal, legal standpoint, HuggingFace is almost certainly 
in violation of my code’s licenses.  I will, therefore, work to 
remove my code from their models.  From a personal, ethical 
standpoint, I believe that SWH has proven themselves untrustworthy 
by enabling *and promoting* this infringement in violation of 
their own stated policies, and will work to remove my code from 
their archive.  Personally, I cannot extend them the benefit of 
the doubt on this.  They blew it.


From an organizational ethical standpoint, Guix is IMO on the 
right track by waiting on SWH (and perhaps pressuring them to fix 
things).  From an organizational, technical perspective, I would 
like to see concrete measures to support my (and hundreds of 
others’) personal, ethical desires to exclude software from SWH, 
and by extension, HuggingFace’s models.



> As Ludo said, SWH folks are, by the way, also long time Free Software
> activists.



In my view, this is not to their credit.  I’d expect people 
familiar with Free Software to be *more* sensitive to licensing 
concerns, thus less likely to partner with a company likely to 
violate them.



> PS: Thanks for the detailed explanations.  I will provide my reading
> later, after some concerns will be separated, eventually.


You’re very welcome.

Thanks,

 — Ian



Re: Concerns/questions around Software Heritage Archive

2024-03-19 Thread Simon Tournier
Hi,

On lun., 18 mars 2024 at 12:38, Ian Eure  wrote:

> They appear to be violating free software licenses on large scale. 
> They are in violation of SWH’s own positions.

[...]

> [1]: https://arxiv.org/html/2402.19173v1
> [2]: 
> https://huggingface.co/spaces/HuggingFaceH4/starchat2-playground
> [3]: https://huggingface.co/datasets/bigcode/the-stack-v2
> [4]: https://github.com/bigcode-project/opt-out-v2/issues

Please note that Software Heritage folks are not co-author of all that;
or I misread.  Do not take me wrong, this is not an attempt to escape
but a query for waiting the feedback of SWH.

As Ludo said, SWH folks are, by the way, also long time Free Software
activists.  For the record, the quality of 10 Years of Guix [1] videos
is the result of tireless work (for free!) by a Debian video team member
(also working for SWH) and one of SWH co-founder had been Debian project
leader.  Let the benefit of the doubt while waiting.

1: https://10years.guix.gnu.org

Cheers,
simon

PS: Thanks for the detailed explanations.  I will provide my reading
later, after some concerns will be separated, eventually.



Re: rewriting history; Was: Concerns/questions around Software Heritage Archive

2024-03-19 Thread Attila Lendvai
> not an expert in guix internals) the only reason we care about
> identity is that it's part of git commits.


identities are deeply intertwined with trust (our best predictor of future 
behavior is past behavior). and how trust is facilitated by the tools and 
processes (including the social "technology") can make or break any group 
effort.

-- 
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“The direct use of physical force is so poor a solution to the problem of 
limited resources that it is commonly employed only by small children and great 
nations.”
— David D. Friedman (1945–), 'The Machinery of Freedom' (1973)




Re: Concerns/questions around Software Heritage Archive

2024-03-18 Thread Ludovic Courtès
Hello,

Ian Eure  skribis:

> HuggingFace and the StarCoder2 model is in violation of principle 2.
> By their own admission, they are including code without clear
> licensing[1]:

[...]

> HuggingFace is also in violation of the third principle, because they
> haven’t established a functioning opt-out model[3].  Opting out
> requires using non-free software; requests have been sitting for
> nearly a year with no action or response; and out of every request
> submitted, only a single one has *ever* been honored.
>
> They appear to be violating free software licenses on large
> scale. They are in violation of SWH’s own positions.

You may be right, but again, I think we should all wait for SWH folks to
weigh in.

Many people working there are long-time free software activists; I think
we can trust them to take our concerns into consideration, but they may
also need more time to reply thoughtfully.

Besides, we should probably focus the discussion on what it means for Guix.

Ludo’.



Re: rewriting history; Was: Concerns/questions around Software Heritage Archive

2024-03-18 Thread Tomas Volf
On 2024-03-18 12:08:48 +, Daniel Littlewood wrote:
> Hi everyone,
>
> I think the discussion so far splits into "should something be done"
> and "what can be done". The "should something be done" is easier to
> address, I think, so I'll deal with it first. I particularly have
> Attila's reply in mind.
>
> > let's put aside the trans aspect of this question for a moment,
> > is it reasonable for me to demand from somebody else to change their memory 
> > of my past actions?
> > if so, then where is the line? what's the principle here? and what are its 
> > implications?
> > i sure see some actors out there who can hardly wait to start erasing 
> > certain records at the barrel of the law
>
> I do not doubt that there are bad actors who might misuse the ability
> to rewrite history generally. However, this only allows us to dismiss
> the technical challenge if there is *no* legitimate use case for
> rewriting history, ever, in any circumstance. So rather than removing
> the trans aspect of the question to consider every possible use case
> (good or bad) of rewriting history, it seems like we only need to come
> up with a single case that's sufficient to justify altering someone's
> identity, for it to be worth considering if the technical restriction
> could be avoided. But then the answer is obvious: Someone might just
> sign their commits wrong for whatever reason. Is it valuable for a
> user or for guix generally to preserve metadata in the case where a
> commit is signed incorrectly? Obviously not. So whether you are
> sympathetic to the deadnaming issue or not (personally I am) it seems
> like we can dismiss the question "should we do something about it".

I do not think the situation is as black and white as you put it here.  I
believe the question of "should something be done" needs to be further split
into two sub-branches.  "should something be doable effective from some point in
time" and "should something be doable retro-actively".

For the former, I think most people here would agree that yes, and there already
is a mechanism for that (.mailmap).

For the latter, I do not think you can just "dismiss" it.  While I agree with
you there is little value in the act of Guix preserving wrong metadata by
itself, any history-modifying operation would have quite a large impact on the
ecosystem, so that needs to be taken into account as well.  And in that light I
would say yes, preserving wrong metadata (when viewed from this angle) does have
a value.

And I say this as a contributor perfectly matching your example of "signed their
commits wrong", which is why you will find me in the .mailmap.

Have a nice day,
Tomas Volf

--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.




Re: Concerns/questions around Software Heritage Archive

2024-03-18 Thread Olivier Dion
On Mon, 18 Mar 2024, Kaelyn  wrote:
> On Monday, March 18th, 2024 at 2:28 AM, Simon Tournier 
>  wrote:

[...]

>> That’s the double sword of “free software”. :-)
>
> Hi,
>
> I want to stress that I am not a lawyer, but my (possibly outdated)
> understanding of what machine learning models can and cannot do with
> regards to their training data, and a reading of parts of the GPL 2
> and 3, suggest that at best the SWH's LLM is in a legal grey area and
> at worst directly violates the license of GPL code that it ingests for
> training. As such, I don't think it is accurate to say "you cannot
> prevent people to use “your” free software for any purposes you
> dislike" in response to concerns about automatic inclusion of free
> software into LLM training sets. Specifically, my understanding (as of
> a few years ago) is that LLMs have difficulty tracing and attributing
> various aspects of their training to specific inputs, which seems to be
> in violation of e.g. Sections 5 and 6 of the GPL. Specific quotes
> from those sections https://www.gnu.org/licenses/gpl-3.0.html:

I think that the larger point here is that you do not get to choose who
uses your software and for what purpose.  That is the double-edged sword
of free software.

Putting aside LLMs for a moment, what if some package in Guix is used for
military purposes?  Will this software be removed from Guix because one
of its users uses it in some unethical way, even though it is also used
in an ethical way by others?  Will we penalize users for the sake of
moral high ground?

This raises the question: what is considered ethical, and when do ethics
become political dogma?

[...]

-- 
Olivier Dion
oldiob.ca



Re: Concerns/questions around Software Heritage Archive

2024-03-18 Thread Ian Eure



Simon Tournier  writes:


> Hi,
>
> On sam., 16 mars 2024 at 08:52, Ian Eure  wrote:
>
>> They appear to be using the archive to build LLMs:
>> https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
>
> About LLM, Software Heritage made a clear statement:
>
> https://www.softwareheritage.org/2023/10/19/swh-statement-on-llm-for-code
>
> Quoting:
>
>     We feel that the question is no longer whether LLMs for code
>     should be built. They are already being built, independently of
>     what we do, and there is no turning back.  The real question is
>     how they should be built and whom they should benefit.
>
> Principles:
>
>     1. Knowledge derived from the Software Heritage archive must be
>     given back to humanity, rather than monopolized for private
>     gain. The resulting machine learning models must be made available
>     under a suitable open license, together with the documentation and
>     toolings needed to use them.
>
>     2. The initial training data extracted from the Software Heritage
>     archive must be fully and precisely identified by, for example,
>     publishing the corresponding SWHID identifiers (note that, in the
>     context of Software Heritage, public availability of the initial
>     training data is a given: anyone can obtain it from the
>     archive). This will enable use cases such as: studying biases
>     (fairness), verifying if a code of interest was present in the
>     training data (transparency), and providing appropriate attribution
>     when generated code bears resemblance to training data (credit),
>     among others.
>
>     3. Mechanisms should be established, where possible, for authors to
>     exclude their archived code from the training inputs before model
>     training begins.
>
> I hope it clarifies your concerns to some extent.



It doesn’t clarify them, but it does illustrate them.

HuggingFace and the StarCoder2 model is in violation of principle 
2.  By their own admission, they are including code without clear 
licensing[1]:


   The main difference between the Stack v2 and the Stack v1 is that we
   include both permissively licensed and unlicensed files.

HuggingFace’s StarChat2 Playground[2] also violates this 
principle, as it outputs code without any license or provenance 
information; I know, because I tried it.  While their own terms of 
use for StarCoder2 state:


   Any use of all or part of the code gathered in The Stack v2 must abide by
   the terms of the original licenses...

...their own playground makes this impossible.

HuggingFace is also in violation of the third principle, because 
they haven’t established a functioning opt-out model[3].  Opting 
out requires using non-free software; requests have been sitting 
for nearly a year with no action or response; and out of every 
request submitted, only a single one has *ever* been honored.


They appear to be violating free software licenses on large scale. 
They are in violation of SWH’s own positions.



> Moreover, you wrote: « I want absolutely nothing to do with them. »
>
> Maybe there is a misunderstanding on your side about what “free
> software” and GPL means because once “free software”, you cannot prevent
> people to use “your” free software for any purposes you dislike.
>
> If you want to bound the use cases of the software you create, you need
> to explicitly specify that in the license.  And if you do, your software
> will not be considered as “free software”.
>
> That’s the double sword of “free software”. :-)



I am crystal clear on the meaning of free software.  I wish to 
remove it from these models *in order to* keep it free.


Thanks,

 — Ian

[1]: https://arxiv.org/html/2402.19173v1
[2]: https://huggingface.co/spaces/HuggingFaceH4/starchat2-playground
[3]: https://huggingface.co/datasets/bigcode/the-stack-v2
[4]: https://github.com/bigcode-project/opt-out-v2/issues



Re: Concerns/questions around Software Heritage Archive

2024-03-18 Thread Daniel Littlewood
Hi Kaelyn,

The legal question is unsettled, and there is ongoing litigation by
(at least) Matthew Butterick in the US, since at least 2022. The
reasonable positions I'm aware of are:

1. An LLM (or, more precisely, the set of weights that define it) is
not a derivative work of its training data, for the purposes of
copyright, and thus the license is irrelevant.
2. Producing an LLM from training data is a transformative fair use,
and thus the license is irrelevant.
3. Neither 1 nor 2 holds, and LLMs constitute copyright infringement
on a profound scale (of both copyrighted and copylefted works).

The FSF and CC have both commissioned white papers on the impact of
such considerations for Free works. I don't recall seeing anything
particularly insightful in them. Probably a waste of time to discuss
it here.

Best wishes,
Dan



Re: rewriting history; Was: Concerns/questions around Software Heritage Archive

2024-03-18 Thread Daniel Littlewood
Hi everyone,

I think the discussion so far splits into "should something be done"
and "what can be done". The "should something be done" is easier to
address, I think, so I'll deal with it first. I particularly have
Attila's reply in mind.

> let's put aside the trans aspect of this question for a moment,
> is it reasonable for me to demand from somebody else to change their memory 
> of my past actions?
> if so, then where is the line? what's the principle here? and what are its 
> implications?
> i sure see some actors out there who can hardly wait to start erasing certain 
> records at the barrel of the law

I do not doubt that there are bad actors who might misuse the ability
to rewrite history generally. However, this only allows us to dismiss
the technical challenge if there is *no* legitimate use case for
rewriting history, ever, in any circumstance. So rather than removing
the trans aspect of the question to consider every possible use case
(good or bad) of rewriting history, it seems like we only need to come
up with a single case that's sufficient to justify altering someone's
identity, for it to be worth considering if the technical restriction
could be avoided. But then the answer is obvious: Someone might just
sign their commits wrong for whatever reason. Is it valuable for a
user or for guix generally to preserve metadata in the case where a
commit is signed incorrectly? Obviously not. So whether you are
sympathetic to the deadnaming issue or not (personally I am) it seems
like we can dismiss the question "should we do something about it".

As for what could be done, if I understand the discussion so far (I'm
not an expert in guix internals) the only reason we care about
identity is that it's part of git commits. If that's really all it is,
then I wonder if the following scheme would resolve the issue?

* Start with git repo A, signed with an identity now considered
incorrect for some reason.
* Rewrite history to replace the old signer with the new signer. Make
no other changes to the content of any commit. This produces
repository A'.
* Repository A and A' should have identical numbers of commits, and
identical content of the code at each commit. Therefore we can set up
a one-to-one mapping from the commits of A to the commits of A'.
* Store this mapping of "deprecated commits" (pairs of commit hashes,
pointing from the deprecated commit to its corrected version) in a
database somewhere, and discard repository A.
* Whenever we attempt to look up a commit, if the lookup fails, try to
look in the deprecated commits database. Perhaps emit a warning that
the commit hash is deprecated and should be updated.
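
The scheme in the list above can be sketched roughly as follows. This is
only a toy model under stated assumptions: commits are represented as a
plain dictionary from id to content, the "database" of deprecated
commits is an in-memory dictionary, and the names CommitStore and
build_deprecation_map are hypothetical. It says nothing about how Guix
or git would actually store or authenticate such a mapping.

```python
import warnings
from dataclasses import dataclass, field


def build_deprecation_map(old_ids: list[str], new_ids: list[str]) -> dict[str, str]:
    """Pair each commit of A with the commit of A' at the same position."""
    if len(old_ids) != len(new_ids):
        raise ValueError("A and A' must have the same number of commits")
    return dict(zip(old_ids, new_ids))


@dataclass
class CommitStore:
    commits: dict[str, str]  # commit id -> content (stand-in for a repository)
    deprecated: dict[str, str] = field(default_factory=dict)  # old id -> new id

    def lookup(self, commit_id: str) -> str:
        """Look up a commit, falling back to the deprecated-commit table."""
        if commit_id in self.commits:
            return self.commits[commit_id]
        replacement = self.deprecated.get(commit_id)
        if replacement is None:
            raise KeyError(commit_id)
        warnings.warn(
            f"commit {commit_id} is deprecated; use {replacement}",
            DeprecationWarning,
        )
        return self.commits[replacement]


# Made-up ids: repository A' after the rewrite, plus the mapping from A.
store = CommitStore(
    commits={"new1": "initial import", "new2": "add feature"},
    deprecated=build_deprecation_map(["old1", "old2"], ["new1", "new2"]),
)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    content = store.lookup("old2")  # resolved through the mapping, with a warning

print(content, len(caught))
```

One open design question this sketch leaves aside is who is trusted to
publish the mapping, since a lookup through it no longer verifies the
hash the user originally asked for.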

Note that point 3 (that the content is identical in each commit) could
be violated. e.g. perhaps there is a "CONTRIBUTORS" file which also
needs to be scrubbed. This would present an algorithmic difficulty (if
we actually tried to verify the code is unchanged) but if a trusted
maintainer of the project is authorising the deprecation, then we
don't actually need to know that the code is unchanged.

Note also that this deprecation mechanism would fix the problem for
simple forks too. e.g. in the case referenced, if someone packaged a
fork of the deadnamed repo, then looking up a commit that was created
pre-fork and included the old identity could notify the user that the
repo should be updated.

Does this sound at all sane?

Best wishes,
Dan

On Mon, Mar 18, 2024 at 11:26 AM Simon Tournier
 wrote:
>
> Hi,
>
> On lun., 18 mars 2024 at 12:10, MSavoritias  wrote:
>
> > The right of a trans person to ask a project to not advertise their
> > deadname was never in question.
> >
> > Guix is a place that supports trans people and anybody else that wants
> > to change their name.
>
> There is a difference between “advertise” and “part of the history”.
>
> Do not get me wrong.  The right to be forgotten is one topic.  However,
> as many people are saying: it is not an easy question.  There are legal
> questions, technical questions, social questions, etc.
>
> For what it is worth, Guix is built around the concept of immutability.
> This is a core concept and deep in Guix internals.
>
> Therefore, it would be more constructive if you come with a
> proof-of-concept allowing “history rewrite” and strong “software
> identification” property [1].  Else, the discussion is leading nowhere,
> IMHO.
>
> 1: https://guix.gnu.org/en/blog/2024/identifying-software/
>
> Cheers,
> simon
>



Re: Concerns/questions around Software Heritage Archive

2024-03-18 Thread Kaelyn
On Monday, March 18th, 2024 at 2:28 AM, Simon Tournier 
 wrote:

> 
> Hi,
> 
> On sam., 16 mars 2024 at 08:52, Ian Eure i...@retrospec.tv wrote:
> 
> > They appear to be using the archive to build LLMs:
> > https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
> 
> 
> About LLM, Software Heritage made a clear statement:
> 
> https://www.softwareheritage.org/2023/10/19/swh-statement-on-llm-for-code
> 
> Quoting:
> 
> We feel that the question is no longer whether LLMs for code
> should be built. They are already being built, independently of
> what we do, and there is no turning back. The real question is
> how they should be built and whom they should benefit.
> 
> Principles:
> 
> 1. Knowledge derived from the Software Heritage archive must be
> given back to humanity, rather than monopolized for private
> gain. The resulting machine learning models must be made available
> under a suitable open license, together with the documentation and
> toolings needed to use them.
> 
> 2. The initial training data extracted from the Software Heritage
> archive must be fully and precisely identified by, for example,
> publishing the corresponding SWHID identifiers (note that, in the
> context of Software Heritage, public availability of the initial
> training data is a given: anyone can obtain it from the
> archive). This will enable use cases such as: studying biases
> (fairness), verifying if a code of interest was present in the
> training data (transparency), and providing appropriate attribution
> when generated code bears resemblance to training data (credit),
> among others.
> 
> 3. Mechanisms should be established, where possible, for authors to
> exclude their archived code from the training inputs before model
> training begins.
> 
> I hope it clarifies your concerns to some extent.
> 
> 
> Moreover, you wrote: « I want absolutely nothing to do with them. »
> 
> Maybe there is a misunderstanding on your side about what “free
> software” and GPL means because once “free software”, you cannot prevent
> people to use “your” free software for any purposes you dislike.
> 
> If you want to bound the use cases of the software you create, you need
> to explicitly specify that in the license. And if you do, your software
> will not be considered as “free software”.
> 
> That’s the double-edged sword of “free software”. :-)

Hi,

I want to stress that I am not a lawyer, but my (possibly outdated)
understanding of what machine learning models can and cannot do with regard to
their training data, and a reading of parts of the GPL 2 and 3, suggest that at
best the SWH's LLM is in a legal grey area and at worst directly violates the
license of GPL code that it ingests for training. As such, I don't think it is
accurate to say "you cannot prevent people to use “your” free software for any
purposes you dislike" in response to concerns about automatic inclusion of free
software into LLM training sets. Specifically, my understanding (as of a few
years ago) is that LLMs have difficulty tracing and attributing various
aspects of their training to specific inputs, which seems to be in violation of
e.g. Sections 5 and 6 of the GPL. Specific quotes from those sections
(https://www.gnu.org/licenses/gpl-3.0.html):

From Section 5:
> You may convey a work based on the Program, or the modifications to produce 
> it from the Program, in the form of source code under the terms of section 4, 
> provided that you also meet all of these conditions:
> 
> a) The work must carry prominent notices stating that you modified it, 
> and giving a relevant date.
> b) The work must carry prominent notices stating that it is released 
> under this License and any conditions added under section 7. This requirement 
> modifies the requirement in section 4 to “keep intact all notices”.
> c) You must license the entire work, as a whole, under this License to 
> anyone who comes into possession of a copy. This License will therefore 
> apply, along with any applicable section 7 additional terms, to the whole of 
> the work, and all its parts, regardless of how they are packaged. This 
> License gives no permission to license the work in any other way, but it does 
> not invalidate such permission if you have separately received it.
> d) If the work has interactive user interfaces, each must display 
> Appropriate Legal Notices; however, if the Program has interactive interfaces 
> that do not display Appropriate Legal Notices, your work need not make them 
> do so.

and from Section 6:
> You may convey a covered work in object code form under the terms of sections 
> 4 and 5, provided that you also convey the machine-readable Corresponding 
> Source under the terms of this License, in one of these ways:
> 
> a) Convey the object code in, or embodied in, a physical product 
> (including a physical distribution medium), accompanied by the Corresponding 
> Source fixed on a durable physical medium 

Re: rewriting history; Was: Concerns/questions around Software Heritage Archive

2024-03-18 Thread MSavoritias

On 3/18/24 17:14, Andreas Enge wrote:


Am Mon, Mar 18, 2024 at 04:33:49PM +0200 schrieb MSavoritias:

Actually gitlab already is facing something like that and they are doing
what was proposed elsewhere: mapping of UUIDs to display names
https://gitlab.com/gitlab-org/gitlab/-/issues/20960

Interesting, thanks! It is something that maybe could be implemented by
Savannah, but it would probably require a bit of thought. And yet again,
somehow the mapping uuid<->"real" names would have to be public (people
would "git clone" commits with uuids, and would need to locally replace
them by "real" names); so people can always keep copies of the mapping
over time.

Sure. But we can only speak for Guix, not everything else.

I am also not quite sure about the signing process for committers;
in principle keys are enough, but in GPG they are tied to email addresses,
and I do not know whether we use this in Guix.

I hope we don't, because I use SSH to sign commits personally :D


In the end, my impression is this will not achieve much more than what we
already have with the .mailmap approach. In a sense, everyone would use
a pseudonym (their uuid), and then we would keep a mapping between these
pseudonyms and, well, "real" names or other pseudonyms chosen by the
contributors...

Hm, this could indeed be implemented exactly with .mailmap, no?
We would need to enforce that authors use a uuid of a specific format,
and potentially an empty or dummy email address, or another uuid.
Then we could keep a .mailmap file. The history of "real" identities
would still be visible in the git history, but as said above, anyway
we could not prevent people from storing the association information
over time.


Nicknames may change, though. UUIDs are not in any way meaningful to humans,
so I doubt we would need to change them.


I have changed nicknames once for example.


Right fair. As I have said before SWH does break Guix CoC effectively right
now.
So what Guix does from this point on will effectively dictate if the CoC is
valid or not.

Well, the CoC is valid on our communication channels; so what SWH does with
our software is outside its scope (that is governed by the license).

Andreas



My question was more like:


In the next Guix Days or any Guix conference, do we allow SWH to 
participate if this matter is still unresolved?


Because we would be basically inviting people that don't respect the CoC.


MSavoritias




Re: rewriting history; Was: Concerns/questions around Software Heritage Archive

2024-03-18 Thread Andreas Enge
Am Mon, Mar 18, 2024 at 04:33:49PM +0200 schrieb MSavoritias:
> Actually gitlab already is facing something like that and they are doing
> what was proposed elsewhere: mapping of UUIDs to display names
> https://gitlab.com/gitlab-org/gitlab/-/issues/20960

Interesting, thanks! It is something that maybe could be implemented by
Savannah, but it would probably require a bit of thought. And yet again,
somehow the mapping uuid<->"real" names would have to be public (people
would "git clone" commits with uuids, and would need to locally replace
them by "real" names); so people can always keep copies of the mapping
over time.

I am also not quite sure about the signing process for committers;
in principle keys are enough, but in GPG they are tied to email addresses,
and I do not know whether we use this in Guix.

In the end, my impression is this will not achieve much more than what we
already have with the .mailmap approach. In a sense, everyone would use
a pseudonym (their uuid), and then we would keep a mapping between these
pseudonyms and, well, "real" names or other pseudonyms chosen by the
contributors...

Hm, this could indeed be implemented exactly with .mailmap, no?
We would need to enforce that authors use a uuid of a specific format,
and potentially an empty or dummy email address, or another uuid.
Then we could keep a .mailmap file. The history of "real" identities
would still be visible in the git history, but as said above, anyway
we could not prevent people from storing the association information
over time.
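
To make the idea a bit more concrete, here is a rough sketch in Python. It is not an existing Guix, Savannah or git feature: the name_map table and the register, rename and show_author helpers are hypothetical, and a dict stands in for whatever public file (a .mailmap, for instance) would actually hold the association.

```python
import uuid

# Hypothetical public table: stable contributor id -> current display name.
# A .mailmap file could play this role; a dict keeps the sketch simple.
name_map: dict[str, str] = {}


def register(display_name: str) -> str:
    """Create a new opaque contributor id and record its display name."""
    contributor_id = str(uuid.uuid4())
    name_map[contributor_id] = display_name
    return contributor_id


def rename(contributor_id: str, new_display_name: str) -> None:
    """Change the display name; commits recorded under the id are untouched."""
    name_map[contributor_id] = new_display_name


def show_author(contributor_id: str) -> str:
    """Resolve an id found in a commit, falling back to the raw id."""
    return name_map.get(contributor_id, contributor_id)


# Commits only ever store the id, never the name.
author_id = register("X")
commits = [{"author": author_id, "message": "Add package foo"}]

# Person X asks to be known as Y: only the table changes, not the commits.
rename(author_id, "Y")
```

The commit data never changes on a rename, so commit hashes stay stable. The limitation mentioned above still applies: anyone who kept an older copy of the table still knows the earlier association.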

> Right fair. As I have said before SWH does break Guix CoC effectively right
> now.
> So what Guix does from this point on will effectively dictate if the CoC is
> valid or not.

Well, the CoC is valid on our communication channels; so what SWH does with
our software is outside its scope (that is governed by the license).

Andreas




Re: Concerns/questions around Software Heritage Archive

2024-03-18 Thread pinoaffe


Lars-Dominik Braun  writes:
>> I have heard folks in the Guix maintenance sphere claim that we
>> never rewrite git history in Guix, as a matter of policy. I believe we
>> should revisit that policy (is it actually written anywhere?) with an
>> eye towards possible exceptions, and develop a mechanism for securely
>> maintaining continuity of Guix installations after history has been
>> rewritten so that we maintain this as a technical possibility in the
>> future, even if we should choose to use it sparingly.
>
> the fallout of rewriting Guix’ git history would be devastating. It
> would break every single Guix installation, because
>
> a) `guix pull` authenticates commits and we might lose our trust anchor
> if we rewrite history earlier than the introduction of this feature,
> b) `guix pull` outright rejects changes to the commit history to prevent
> downgrade attacks.
>
> Additionally it would break every single existing usage of the
> time machine and thereby completely defeat the goal of providing
> reproducible software environments since the commit hash is used to
> identify the point in time to jump to.
>
> I doubt developing “mechanisms” – whatever they look like – would
> be worth the effort. Our contributors matter, but so do our users. Never
> ever rewriting our git history is a tradeoff we should make for our users.

There may come a time when we don't really have another option but to
rewrite (part of) history (e.g., if someone vandalizes the repository
using incriminating/illegal files). I hope that such vandalism would be
caught quickly so that most Guix installations would not be infected,
but it may be a good idea to plan what to do in the unfortunate event that
it is necessary to rewrite Guix history.




Re: rewriting history; Was: Concerns/questions around Software Heritage Archive

2024-03-18 Thread MSavoritias

On 3/18/24 16:19, Andreas Enge wrote:


Am Mon, Mar 18, 2024 at 04:03:20PM +0200 schrieb MSavoritias:

Rewriting history is the wrong question imo. I don't think a request to
change all of the history of Guix will be accepted anyway.
A much easier thing to do is to change the approach in the future. And leave
all the past history untouched.

I was thinking about the future history as well as the past one...
Everything we do now becomes immutable history in the future; so the
question how we can rewrite an a priori immutable history remains the same,
regardless of the date when person X wants to be known as person Y: Also in
the future, someone may wish to travel to a time before the change.
And the fundamental problem of history rewriting remains; I do not see
how we could simplify it. So I do not think that it is "a much easier
thing to do". Please feel free to prove me wrong by making a concrete
suggestion!


Actually gitlab already is facing something like that and they are doing 
what was proposed elsewhere: mapping of UUIDs to display names


https://gitlab.com/gitlab-org/gitlab/-/issues/20960


So no reason we couldn't do something like this.



Am Mon, Mar 18, 2024 at 04:00:38PM +0200 schrieb MSavoritias:

On 3/18/24 15:12, Simon Tournier wrote:

Again, this is an incorrect frame, IMHO.  Software Heritage (SWH) do the
things you granted them to do.  SWH respects the “ethical” definition of
“free software”.

You are bringing the legal argument again. The argument that you can do what
you want with Free Software is based around a licence which is a legal
construct of states.

I think there is a misunderstanding here, rooted in the use of "you" in
"you can do what you want". We need to be clear about whom we are speaking.
There is SWH, and what they can do is a result of the free license. The
other question is what we as the Guix community want to do (and can do);
I would suggest to concentrate in our discussion on the latter, which is
where we have agency.

Andreas


Right fair. As I have said before SWH does break Guix CoC effectively 
right now.


So what Guix does from this point on will effectively dictate if the CoC 
is valid or not.



MSavoritias







Re: Concerns/questions around Software Heritage Archive

2024-03-18 Thread Simon Tournier
Hi MSavoritias,

On lun., 18 mars 2024 at 16:00, MSavoritias  wrote:

> I think you have misunderstood that here we are talking about

What if? Maybe it’s you.  Maybe you, “you have misunderstood that here
we are talking about […]”.

For what my opinion is worth here, I would prefer that you do not make
assumptions about what I might have understood.  Similarly, I am not assuming
anything about your understanding of the various topics at hand.

That’s my last message in this thread.

Cheers,
simon



Re: rewriting history; Was: Concerns/questions around Software Heritage Archive

2024-03-18 Thread Andreas Enge
Am Mon, Mar 18, 2024 at 04:03:20PM +0200 schrieb MSavoritias:
> Rewriting history is the wrong question imo. I don't think a request to
> change all of the history of Guix will be accepted anyway.
> A much easier thing to do is to change the approach in the future. And leave
> all the past history untouched.

I was thinking about the future history as well as the past one...
Everything we do now becomes immutable history in the future; so the
question how we can rewrite an a priori immutable history remains the same,
regardless of the date when person X wants to be known as person Y: Also in
the future, someone may wish to travel to a time before the change.
And the fundamental problem of history rewriting remains; I do not see
how we could simplify it. So I do not think that it is "a much easier
thing to do". Please feel free to prove me wrong by making a concrete
suggestion!

Am Mon, Mar 18, 2024 at 04:00:38PM +0200 schrieb MSavoritias:
> On 3/18/24 15:12, Simon Tournier wrote:
> > Again, this is an incorrect frame, IMHO.  Software Heritage (SWH) do the
> > things you granted them to do.  SWH respects the “ethical” definition of
> > “free software”.
> You are bringing the legal argument again. The argument that you can do what
> you want with Free Software is based around a licence which is a legal
> construct of states.

I think there is a misunderstanding here, rooted in the use of "you" in
"you can do what you want". We need to be clear about whom we are speaking.
There is SWH, and what they can do is a result of the free license. The
other question is what we as the Guix community want to do (and can do);
I would suggest to concentrate in our discussion on the latter, which is
where we have agency.

Andreas




Re: rewriting history; Was: Concerns/questions around Software Heritage Archive

2024-03-18 Thread MSavoritias

On 3/18/24 15:35, Andreas Enge wrote:


Hello all,

Am Mon, Mar 18, 2024 at 12:26:18PM +0100 schrieb Simon Tournier:

Therefore, it would be more constructive if you come with a
proof-of-concept allowing “history rewrite” and strong “software
identification” property

the one thing I can think of, and which would allow time travel to coexist
with history rewriting, is an additional layer of metainformation.

First of all, when rewriting history, all commits from the bifurcation
to an alternate universe must be signed again by the person doing the
"time split"; so there is a loss of information there.

Second, we need to create a table that associates every old, lost commit
hash to the corresponding new commit hash; this should also be signed by
the person rewriting history.

Of course this will have to be continued to the future: If Guix has n
commits and m history rewrites, then the m-th rewrite may have to create
a table of n entries that link commit hashes of the m-th rewrite to those
of the (m-1)-th rewrite. Total memory would become m*n entries.

When doing time travel to a commit hash, one would need to check whether
it is available in the current, m-th history rewrite; if not, one would
need to look for it in the (m-1)-th rewrite and map it to a commit hash
in the m-th rewrite; if not, one would have to look for it in the (m-2)-th
rewrite and map it to a hash in the (m-1)-th rewrite, and then check
whether or not it has been overwritten in the m-th rewrite. The total
time complexity would be m look-ups in tables of size n each.


It is a lot of effort; and probably for little gain, since we cannot
eradicate each and every fork of the Guix git repo. The old data will
still be available at SWH, and probably at random forks on lots of random
forges all over the world. Like Simon, I think that history, fundamentally,
cannot be rewritten: What has happened in the past, has happened in the
past. If you have done some public activity as the person known as X, and
then change your name to Y, you cannot prevent your past activity from being
known under identity X. Also, the time split would have to be publicly
documented somehow; if we add as rationale for a history rewrite "person X
is now known as Y", not much is gained compared to just keeping the old
commits. Not documenting the rationales for history rewrites would not help
to instill trust in the codebase, and probably not solve the problem either,
since it is quite likely that the request by person X to now be addressed
as Y will have been made on the mailing list or some other public forum.

So my impression is that the .mailmap approach in the Guix project is a
good compromise between acknowledging the wish of people to be known under
identity Y, and what can reasonably be achieved to hide identity X.

Well, there are things people can do individually:
1) Use a pseudonym P from the start instead of X (which is accepted in
the Guix community, just look at a few of the names: there are
pseudonyms with clearly made-up first and last names, there are very obvious
one-word pseudonyms, and maybe some of the names that look like real
names are not from the persons' passports, who would know).
2) This does not help, of course, if you are already known as X and want
to be known as Y. Then either you can somehow make the change publicly,
and transfer your reputation and also the information that you used
to be known as X, or disappear as X and reappear as a new person Y
and lose X's reputation. Doing both is impossible, I would say.

Andreas

Rewriting history is the wrong question imo. I don't think a request to
change all of the history of Guix will be accepted anyway.


A much easier thing to do is to change the approach in the future. And
leave all the past history untouched.



MSavoritias




Re: Concerns/questions around Software Heritage Archive

2024-03-18 Thread MSavoritias



On 3/18/24 15:12, Simon Tournier wrote:

Hi MSavoritias,

On lun., 18 mars 2024 at 13:47, MSavoritias  wrote:



As advice for the future when somebody says a concern or wish they have,
your first statement shouldn't be "but its legal" because that
completely dismisses any constructive discussion that could be done.

Again, I am not arguing about “legal” something.  Instead, I am pointing
out that this wish does not match the principles of “free software”.

If you accept that the software you create is “free software” then you
cannot complain if this “free software” is used in some contexts that
you consider unethical.

That’s the double-edged sword of “free software”.

Do I consider LLMs as something unethical?  I think yes: most AI appears
to me unethical but that’s another story (rooting my arguments in
arguments about energy [2,3,4]).

2: https://social.sciences.re/@zimoun/112082437445032973
3: https://social.sciences.re/@zimoun/112039562095800532
4: https://social.sciences.re/@zimoun/112038609631116527

Yes you are. The argument that you can do what you want with Free 
Software is based around a licence which is a legal construct of states.


I think you have misunderstood that here we are talking about the social 
rules of being a decent group of human beings and respect somebody 
else's wishes.



What is in question here is whether Software Heritage respects people
enough to do the right thing and respect their wishes without getting
lawyers/legal involved.

Again, this is an incorrect frame, IMHO.  Software Heritage (SWH) do the
things you granted them to do.  SWH respects the “ethical” definition of
“free software”.


You are bringing the legal argument again. The argument that you can do 
what you want with Free Software is based around a licence which is a 
legal construct of states.


I think you have misunderstood that here we are talking about the social 
rules of being a decent group of human beings and respect somebody 
else's wishes.


In this case somebody asks for something so if SWH is a good member of
our community they should do that. Otherwise they are not a good member 
of our community.





Besides with the way you are framing Free Software as not respecting any
social rules then that makes Free Software not attractive which is the
opposite of what we are trying to do here :)

I do not know what are the “social rules” of “free software”.  At best,
I understand the social rules of a community working on free software.

And this community is far from being a homogeneous whole with clear social
rules.  These social rules vary and the only shared denominator is the
“free software” principles defined by four freedoms.


Guix has a CoC that's the common thing we have here. For social things 
that is. Plus some cultural things of course.



MSavoritias





Re: rewriting history; Was: Concerns/questions around Software Heritage Archive

2024-03-18 Thread Andreas Enge
Hello all,

Am Mon, Mar 18, 2024 at 12:26:18PM +0100 schrieb Simon Tournier:
> Therefore, it would be more constructive if you come with a
> proof-of-concept allowing “history rewrite” and strong “software
> identification” property

the one thing I can think of, and which would allow time travel to coexist
with history rewriting, is an additional layer of metainformation.

First of all, when rewriting history, all commits from the bifurcation
to an alternate universe must be signed again by the person doing the
"time split"; so there is a loss of information there.

Second, we need to create a table that associates every old, lost commit
hash to the corresponding new commit hash; this should also be signed by
the person rewriting history.

Of course this will have to be continued to the future: If Guix has n
commits and m history rewrites, then the m-th rewrite may have to create
a table of n entries that link commit hashes of the m-th rewrite to those
of the (m-1)-th rewrite. Total memory would become m*n entries.

When doing time travel to a commit hash, one would need to check whether
it is available in the current, m-th history rewrite; if not, one would
need to look for it in the (m-1)-th rewrite and map it to a commit hash
in the m-th rewrite; if not, one would have to look for it in the (m-2)-th
rewrite and map it to a hash in the (m-1)-th rewrite, and then check
whether or not it has been overwritten in the m-th rewrite. The total
time complexity would be m look-ups in tables of size n each.
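
As a rough illustration of that lookup, here is a small Python sketch. It is not an existing Guix mechanism: the resolve function and the sample hashes are made up, and for convenience the tables are stored oldest first and map each old hash to its replacement in the next rewrite, which is the direction the lookup needs.

```python
from typing import Optional


def resolve(commit_hash: str,
            current: set[str],
            tables: list[dict[str, str]]) -> Optional[str]:
    """Map a possibly outdated commit hash to its hash in the latest rewrite.

    `current` holds the hashes of the latest history; `tables[i]` maps
    hashes from before the (i+1)-th rewrite to hashes after it.
    """
    if commit_hash in current:
        return commit_hash
    h = commit_hash
    # One dictionary lookup per rewrite, i.e. m look-ups in total.
    for table in tables:
        # A hash a given rewrite did not touch is carried over unchanged.
        h = table.get(h, h)
    return h if h in current else None


# Hypothetical history with n = 2 commits and m = 2 rewrites.
tables = [
    {"a0": "a1", "b0": "b1"},  # first rewrite
    {"a1": "a2", "b1": "b2"},  # second rewrite
]
current = {"a2", "b2"}
```

With dictionaries each of the m look-ups is cheap, but the storage still grows as m*n entries, as estimated above.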


It is a lot of effort; and probably for little gain, since we cannot
eradicate each and every fork of the Guix git repo. The old data will
still be available at SWH, and probably at random forks on lots of random
forges all over the world. Like Simon, I think that history, fundamentally,
cannot be rewritten: What has happened in the past, has happened in the
past. If you have done some public activity as the person known as X, and
then change your name to Y, you cannot prevent your past activity from being
known under identity X. Also, the time split would have to be publicly
documented somehow; if we add as rationale for a history rewrite "person X
is now known as Y", not much is gained compared to just keeping the old
commits. Not documenting the rationales for history rewrites would not help
to instill trust in the codebase, and probably not solve the problem either,
since it is quite likely that the request by person X to now be addressed
as Y will have been made on the mailing list or some other public forum.

So my impression is that the .mailmap approach in the Guix project is a
good compromise between acknowledging the wish of people to be known under
identity Y, and what can reasonably be achieved to hide identity X.

Well, there are things people can do individually:
1) Use a pseudonym P from the start instead of X (which is accepted in
   the Guix community, just look at a few of the names: there are
   pseudonyms with clearly made-up first and last names, there are very obvious
   one-word pseudonyms, and maybe some of the names that look like real
   names are not from the persons' passports, who would know).
2) This does not help, of course, if you are already known as X and want
   to be known as Y. Then either you can somehow make the change publicly,
   and transfer your reputation and also the information that you used
   to be known as X, or disappear as X and reappear as a new person Y
   and lose X's reputation. Doing both is impossible, I would say.

Andreas




Re: Concerns/questions around Software Heritage Archive

2024-03-18 Thread Simon Tournier
Hi MSavoritias,

On lun., 18 mars 2024 at 13:47, MSavoritias  wrote:

> 1.
>
> You seem to be misunderstanding the statement here that was said.
>
> What you can do legally and what you can do socially are not always the 
> same thing.

I do not see where I wrote something like that, but anyway.

A program is free software if the program's users have the four
essential freedoms: [1]

  0. The freedom to run the program as you wish, for any purpose.
  1. The freedom to study how the program works, and change it so it does
     your computing as you wish. Access to the source code is a precondition
     for this.
  2. The freedom to redistribute copies so you can help others.
  3. The freedom to distribute copies of your modified versions to
     others. By doing this you can give the whole community a chance to
     benefit from your changes. Access to the source code is a precondition
     for this.

This is all about the philosophy of “free software”.

1: https://www.gnu.org/philosophy/free-sw.en.html


> As advice for the future when somebody says a concern or wish they have, 
> your first statement shouldn't be "but its legal" because that 
> completely dismisses any constructive discussion that could be done.

Again, I am not arguing about “legal” something.  Instead, I am pointing
out that this wish does not match the principles of “free software”.

If you accept that the software you create is “free software” then you
cannot complain if this “free software” is used in some contexts that
you consider unethical.

That’s the double-edged sword of “free software”.

Do I consider LLMs as something unethical?  I think yes: most AI appears
to me unethical but that’s another story (rooting my arguments in
arguments about energy [2,3,4]).

2: https://social.sciences.re/@zimoun/112082437445032973
3: https://social.sciences.re/@zimoun/112039562095800532
4: https://social.sciences.re/@zimoun/112038609631116527


> What is in question here is whether Software Heritage respects people 
> enough to do the right thing and respect their wishes without getting 
> lawyers/legal involved.

Again, this is an incorrect frame, IMHO.  Software Heritage (SWH) do the
things you granted them to do.  SWH respects the “ethical” definition of
“free software”.

Again, do I think that feeding LLM after publishing a statement for LLM
code is a good move?  I do not know…  Does it break my ethical values?
Maybe…  Can I complain about my contributions to “free software” reused
in a way that I might consider unethical?  No.

5: https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
6: https://www.softwareheritage.org/2023/10/19/swh-statement-on-llm-for-code/


> Besides with the way you are framing Free Software as not respecting any 
> social rules then that makes Free Software not attractive which is the 
> opposite of what we are trying to do here :)

I do not know what are the “social rules” of “free software”.  At best,
I understand the social rules of a community working on free software.

And this community is far from being a homogeneous whole with clear social
rules.  These social rules vary and the only shared denominator is the
“free software” principles defined by four freedoms.

The only question might be: by allowing ingested source code to be used
to train LLM, is Software Heritage aligned with the values that the Guix
community promote?

To be honest, I cannot answer that question in a hurry.


> 2.
>
>  > Somehow, a Content-Addressed system is designed around immutable 
> > content. And if one know how to implement a Content-Addressed system 
> > relying on mutable content, I would be very interested to know more 
> > about it.
>
> Please refrain from doing such remarks. Nobody here suggested anything 
> that you mention here and you effectively devalue the discussion by 
> arguing like this and frame other people as stupid.

I will not refrain from saying: Talk is cheap!

Positions about the situation with “rewrite history” cannot be a
discussion about opinions; they need to be rooted in how it
technically works and what a content-addressed system means.
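
To illustrate the technical point, here is a minimal Python sketch of how git derives a commit identifier from the commit's content, author line included. It is a simplification under stated assumptions: it uses the SHA-1 object format and ignores extra headers such as signatures, and the git_commit_id and two_commit_history helpers, the email address and the timestamp are made up for the example.

```python
import hashlib


def git_commit_id(tree: str, parents: list[str], author: str,
                  committer: str, message: str) -> str:
    """Hash a simplified commit object the way git does (SHA-1 format)."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {author}")
    lines.append(f"committer {committer}")
    body = ("\n".join(lines) + "\n\n" + message).encode("utf-8")
    # git prefixes the object with its type and byte length, then a NUL.
    header = b"commit " + str(len(body)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"  # git's empty tree
STAMP = "1710000000 +0000"                         # arbitrary timestamp


def two_commit_history(name: str) -> tuple[str, str]:
    """Build two chained commits that differ only in the recorded name."""
    ident = f"{name} <contributor@example.org> {STAMP}"
    first = git_commit_id(TREE, [], ident, ident, "first\n")
    second = git_commit_id(TREE, [first], ident, ident, "second\n")
    return first, second


old_first, old_second = two_commit_history("X")
new_first, new_second = two_commit_history("Y")
```

Changing only the name changes the first identifier, and because each commit records its parent's identifier, every later commit changes too. That is why a name change in history cannot be done without invalidating existing commit hashes.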


> 3.
>
> You may disagree with this sure, but shutting down the discussion 
> because nobody wrote the code for you is very elitist of you.

Which discussion are we speaking about?  I am lost.  About LLM or
about “rewrite history”?

About LLM, see point #1.

About “rewrite history”, see point #2


> 4.
>
>  > This language is not acceptable on Guix channel of communication.
>
> Calling out transphobia it is very much accepted here actually :)

No it is not.  Because it is a bold conclusion.

I am asking that the Guix project rewrite right now its history:
changing my identity ’zimoun’ to my identity ’Simon Tournier’.  Since
the Guix project will take the time to check, then I will claim: the
Guix project is French-phobic!

I ask you again to stop such language.  I respect your opinion but name
calling is not welcome on Guix channels of communication.

Re: Concerns/questions around Software Heritage Archive

2024-03-18 Thread MSavoritias

On 3/18/24 11:28, Simon Tournier wrote:


Hi,

On sam., 16 mars 2024 at 08:52, Ian Eure  wrote:


They appear to be using the archive to build LLMs:
https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/

About LLM, Software Heritage made a clear statement:

 https://www.softwareheritage.org/2023/10/19/swh-statement-on-llm-for-code

Quoting:

 We feel that the question is no longer whether LLMs for code
 should be built. They are already being built, independently of
 what we do, and there is no turning back.  The real question is
 how they should be built and whom they should benefit.

Principles:

 1. Knowledge derived from the Software Heritage archive must be
 given back to humanity, rather than monopolized for private
 gain. The resulting machine learning models must be made available
 under a suitable open license, together with the documentation and
 toolings needed to use them.

 2. The initial training data extracted from the Software Heritage
 archive must be fully and precisely identified by, for example,
 publishing the corresponding SWHID identifiers (note that, in the
 context of Software Heritage, public availability of the initial
 training data is a given: anyone can obtain it from the
 archive). This will enable use cases such as: studying biases
 (fairness), verifying if a code of interest was present in the
 training data (transparency), and providing appropriate attribution
 when generated code bears resemblance to training data (credit),
 among others.

 3. Mechanisms should be established, where possible, for authors to
 exclude their archived code from the training inputs before model
 training begins.

I hope it clarifies your concerns to some extent.


Moreover, you wrote: « I want absolutely nothing to do with them. »

Maybe there is a misunderstanding on your side about what “free
software” and GPL mean, because once it is “free software”, you cannot
prevent people from using “your” free software for any purpose you dislike.

If you want to bound the use cases of the software you create, you need
to explicitly specify that in the license.  And if you do, your software
will not be considered “free software”.

That’s the double-edged sword of “free software”. :-)


Simon,


1.

You seem to be misunderstanding the statement that was made here.

What you can do legally and what you can do socially are not always the 
same thing.


As advice for the future: when somebody raises a concern or a wish they
have, your first statement shouldn't be "but it's legal", because that
completely dismisses any constructive discussion that could be had.


And you seem to be talking about legality a lot here, so that's not a good look.


Yes, legally Ian probably can't get lawyers on you. But nobody is 
talking about legally here.


What is in question here is whether Software Heritage respects people 
enough to do the right thing and respect their wishes without getting 
lawyers/legal involved.



Besides with the way you are framing Free Software as not respecting any 
social rules then that makes Free Software not attractive which is the 
opposite of what we are trying to do here :)



2.

> Somehow, a Content-Addressed system is designed around immutable
> content. And if one know how to implement a Content-Addressed system
> relying on mutable content, I would be very interested to know more
> about it.



Please refrain from doing such remarks. Nobody here suggested anything 
that you mention here and you effectively devalue the discussion by 
arguing like this and frame other people as stupid.



3.

It's not on people who are not included to write the code. If Guix is to
be an inclusive project, then Guix should do the work so that people
feel included.


You may disagree with this sure, but shutting down the discussion 
because nobody wrote the code for you is very elitist of you.



4.

> This language is not acceptable on Guix channel of communication.

Calling out transphobia it is very much accepted here actually :)

It's transphobic speech that is not accepted.


I would welcome an announcement or some kind of official communication
from Software Heritage stating their stance.


Although I still wouldn't use them, due to the LLM and AI stuff that
they are doing. I hope at some point they realize their mistake.



MSavoritias




Re: rewriting history; Was: Concerns/questions around Software Heritage Archive

2024-03-18 Thread Simon Tournier
Hi,

On lun., 18 mars 2024 at 12:10, MSavoritias  wrote:

> The right of a trans person to ask a project to not advertise their 
> deadname was never in question.
>
> Guix is a place that supports trans people and anybody else that wants 
> to change their name.

There is a difference between “advertise” and “part of the history”.

Do not get me wrong.  The right to be forgotten is one topic.  However,
as many people are saying: it is not an easy question.  There are legal
questions, technical questions, social questions, etc.

For what it is worth, Guix is built around the concept of immutability.
This is a core concept and deep in Guix internals.

Therefore, it would be more constructive if you came with a
proof-of-concept allowing “history rewrite” while keeping a strong
“software identification” property [1].  Otherwise, the discussion is
leading nowhere, IMHO.

1: https://guix.gnu.org/en/blog/2024/identifying-software/

Cheers,
simon



Re: Concerns/questions around Software Heritage Archive

2024-03-18 Thread Simon Tournier
Hi,

On sam., 16 mars 2024 at 08:52, Ian Eure  wrote:

> They appear to be using the archive to build LLMs: 
> https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/

About LLM, Software Heritage made a clear statement:

https://www.softwareheritage.org/2023/10/19/swh-statement-on-llm-for-code

Quoting:

We feel that the question is no longer whether LLMs for code
should be built. They are already being built, independently of
what we do, and there is no turning back.  The real question is
how they should be built and whom they should benefit.

Principles:

1. Knowledge derived from the Software Heritage archive must be
given back to humanity, rather than monopolized for private
gain. The resulting machine learning models must be made available
under a suitable open license, together with the documentation and
toolings needed to use them.

2. The initial training data extracted from the Software Heritage
archive must be fully and precisely identified by, for example,
publishing the corresponding SWHID identifiers (note that, in the
context of Software Heritage, public availability of the initial
training data is a given: anyone can obtain it from the
archive). This will enable use cases such as: studying biases
(fairness), verifying if a code of interest was present in the
training data (transparency), and providing appropriate attribution
when generated code bears resemblance to training data (credit),
among others.

3. Mechanisms should be established, where possible, for authors to
exclude their archived code from the training inputs before model
training begins.

I hope it clarifies your concerns to some extent.


Moreover, you wrote: « I want absolutely nothing to do with them. »

Maybe there is a misunderstanding on your side about what “free
software” and GPL mean, because once it is “free software”, you cannot
prevent people from using “your” free software for any purpose you dislike.

If you want to bound the use cases of the software you create, you need
to explicitly specify that in the license.  And if you do, your software
will not be considered “free software”.

That’s the double-edged sword of “free software”. :-)

Cheers,
simon



Re: rewriting history; Was: Concerns/questions around Software Heritage Archive

2024-03-18 Thread pelzflorian (Florian Pelz)
The guix-daemon does the hashing, so guix-daemon would have to be fixed
to override integrity checks (and it would have to be patched
retroactively in every time-travel).  No one likes touching guix-daemon
(until it is rewritten in Guile), so I can imagine it would be
frustrating.

Now ftfy is not in Guix, but if Software Heritage deleted the data,
their customers might be equally frustrated.  It is SWH’s obligation to
delete the data, but looking at how rare GDPR action is and how
frequently Icecat’s TPRB add-on lays bare GDPR violations, if they do
not follow the obligation, there likely will not be legal action or the
legal action can be dismissed somehow.

The problem is, such inaction by SWH may be relevant for Guix’ Code of
Conduct, as pointed out by MSavoritias.

Regards,
Florian



Re: rewriting history; Was: Concerns/questions around Software Heritage Archive

2024-03-18 Thread MSavoritias

On 3/18/24 02:10, Attila Lendvai wrote:


I was also distressed to see how poorly they treated a developer
who wished to update their name:
https://cohost.org/arborelia/post/4968198-the-software-heritag
https://cohost.org/arborelia/post/5052044-the-software-heritag


let's put aside the trans aspect of this question for a moment, because this 
question has broad implications, much broader than the regrettable struggles of 
trans people.

the question here is whether person A has the right to demand that others 
change their memory of A's past actions (i.e. rewrite history, or else become a 
felon... or maybe just unwelcome in polite society?).

so, let's just assume that i have decided to prefer being called a new name 
(without disclosing my reasons).

is it reasonable for me to demand from somebody else to change their memory of 
my past actions? e.g. to demand that they rewrite their memory/instances of my 
books that i have published under my previous name in the past? or that they 
forget my old name, and when the change happened? or that they do not link the 
two names to the same individual?

if so, then where is the line? what's the principle here? and what are its 
implications?

do i have the right to demand the replacement of a page in each copy that 
exists out there? i.e. should it be criminal (or just a sin?) to own old 
copies? do i have the right to demand that certain libraries must sell/burn 
their copies of my books and never own them again?

what if i committed a fraud? e.g. i pushed a backdoor somewhere... do i have 
the right to memory-hole my old identity?

and who will enforce such a right? the government? i.e. those people who already keep an 
(extralegal) record of whenever i farted in the past decade? where can i even file my 
GDPR request for that? would that really be a "right to be forgotten", or 
merely a tool of even tighter monopolization of The Central Database?

what if i'm a joker and i demand a new change every week for the rest of my 
life? do i have the right to the resources of every library out there? to keep 
their staff and computers busy for the next couple of decades?

but let's put the technical aspects aside; wherever we draw the line... what 
are the implications of that for broader society? because i sure see some 
actors out there who can hardly wait to start erasing certain records at the 
barrel of the law, including rewriting books of significance... (and while we 
are at it, i suggest to start preserving your offline/local copies, because 
we're in for a wild ride!)

humanity has reached an enormous challenge with the complete marginalization of 
the costs of storing and transmitting information. it's a completely 
new/different playing field, and how we proceed from here has grave 
implications. this question is nowhere near as obvious/trivial as presented in 
the cited blog post.

The right of a trans person to ask a project to not advertise their 
deadname was never in question.


Guix is a place that supports trans people and anybody else that wants 
to change their name.


We don't need "enforcers" here or put the "burden of proof" on people.


MSavoritias




rewriting history; Was: Concerns/questions around Software Heritage Archive

2024-03-17 Thread Attila Lendvai
> I was also distressed to see how poorly they treated a developer
> who wished to update their name:
> https://cohost.org/arborelia/post/4968198-the-software-heritag
> https://cohost.org/arborelia/post/5052044-the-software-heritag


let's put aside the trans aspect of this question for a moment, because this 
question has broad implications, much broader than the regrettable struggles of 
trans people.

the question here is whether person A has the right to demand that others 
change their memory of A's past actions (i.e. rewrite history, or else become a 
felon... or maybe just unwelcome in polite society?).

so, let's just assume that i have decided to prefer being called a new name 
(without disclosing my reasons).

is it reasonable for me to demand from somebody else to change their memory of 
my past actions? e.g. to demand that they rewrite their memory/instances of my 
books that i have published under my previous name in the past? or that they 
forget my old name, and when the change happened? or that they do not link the 
two names to the same individual?

if so, then where is the line? what's the principle here? and what are its 
implications?

do i have the right to demand the replacement of a page in each copy that 
exists out there? i.e. should it be criminal (or just a sin?) to own old 
copies? do i have the right to demand that certain libraries must sell/burn 
their copies of my books and never own them again?

what if i committed a fraud? e.g. i pushed a backdoor somewhere... do i have 
the right to memory-hole my old identity?

and who will enforce such a right? the government? i.e. those people who 
already keep an (extralegal) record of whenever i farted in the past decade? 
where can i even file my GDPR request for that? would that really be a "right 
to be forgotten", or merely a tool of even tighter monopolization of The 
Central Database?

what if i'm a joker and i demand a new change every week for the rest of my 
life? do i have the right to the resources of every library out there? to keep 
their staff and computers busy for the next couple of decades?

but let's put the technical aspects aside; wherever we draw the line... what 
are the implications of that for broader society? because i sure see some 
actors out there who can hardly wait to start erasing certain records at the 
barrel of the law, including rewriting books of significance... (and while we 
are at it, i suggest to start preserving your offline/local copies, because 
we're in for a wild ride!)

humanity has reached an enormous challenge with the complete marginalization of 
the costs of storing and transmitting information. it's a completely 
new/different playing field, and how we proceed from here has grave 
implications. this question is nowhere near as obvious/trivial as presented in 
the cited blog post.

-- 
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“It is only when compassion is present that people allow themselves to see the 
truth. […] Compassion is a kind of healing agent that helps us tolerate the 
hurt of seeing the truth.”
— A.H. Almaas (1944–), 'Elements of the Real in Man (Diamond Heart, 
Book 1)'




Re: Concerns/questions around Software Heritage Archive

2024-03-17 Thread Attila Lendvai
> only a 35 yrs old white cis boy

you're judging a group of individuals, namely those who were handed the cis 
white male mix at the genetic lottery, as a uniform blob. and maybe even 
somewhat deplorable, if i'm reading you right.

does it make sense to judge an individual based on some coincidental 
properties? or really, based on anything else than their actions? does it make 
sense to discuss the actions/morality of a group of individuals that is formed 
based on some coincidental properties? e.g. what can we say about the morality 
of all the blond people?

and ultimately, is that an effective way of speaking up for human rights and 
welcoming environments -- of all things?

maybe it's time to take a thorough look at the book that you're preaching from?

if i may, let me attempt to inspire you:

“The world is changed by your example, not by your opinion.”
— Paulo Coelho (1947–)
%
“Yesterday I was clever, so I wanted to change the world. Today I am wise, so I 
am changing myself.”
— Rumi (1207–1273)
%
“If there is to be peace in the world,
There must be peace in the nations.
If there is to be peace in the nations,
There must be peace in the cities.
If there is to be peace in the cities,
There must be peace between neighbors.
If there is to be peace between neighbors,
There must be peace in the home.
If there is to be peace in the home,
There must be peace in the heart.”
— Lao Tzu (sixth century BC)
%
“A man of humanity is one who, in seeking to establish himself, finds a 
foothold for others and who, in desiring attaining himself, helps others to 
attain.”
— Confucius (551–479 BC)
%
“To put the world in order, we must first put the nation in order; to put the 
nation in order, we must first put the family in order; to put the family in 
order; we must first cultivate our personal life; we must first set our hearts 
right.”
— Confucius (551–479 BC)
%
“Until we have met the monsters in ourselves, we keep trying to slay them in 
the outer world. And we find that we cannot. For all darkness in the world 
stems from darkness in the heart. And it is there that we must do our work.”
— Marianne Williamson (1952–), 'Everyday Grace: Having Hope, Finding 
Forgiveness And Making Miracles' (2004)
%
“If things go wrong in the world, this is because something is wrong with the 
individual, because something is wrong with me. Therefore, if I am sensible, I 
shall put myself right first”
— Carl Jung (1875–1961), 'The Meaning of Psychology for Modern Man'

-- 
• attila lendvai
• PGP: 963F 5D5F 45C7 DFCD 0A39
--
“If liberty means anything at all, it means the right to tell people what they 
do not want to hear.”
— George Orwell (1903–1950)




Re: Concerns/questions around Software Heritage Archive

2024-03-17 Thread Richard Sent
Regarding Guix development, if the decision is made to not change
existing policy or implement another authorship mechanism, I think some
text could be added to the manual explaining such.

Contributing to Guix is an intentional thing, unlike being archived by
SWH. Updating the manual means contributors will, at least, be making an
informed decision to contribute, knowing that names cannot be changed in
the Guix repo's history due to X, Y, and Z consequences in Guix's
functionality.

I'm not suggesting that this solution is "the end-all-be-all" or
invalidates alternative avenues, but I feel it is an improvement over
the status quo with no negative tradeoffs. I would not support a
solution that obsoletes time-machine or requires regular manual
intervention during upgrades.

Personally as a new contributor I find it gratifying to see my name in
the commit history.

-- 
Take it easy,
Richard Sent
Making my computer weirder one commit at a time.



Re: Concerns/questions around Software Heritage Archive

2024-03-17 Thread Ludovic Courtès
Hi,

Ian Eure  skribis:

> They appear to be using the archive to build LLMs:
> https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/

To me, if the end result is that copyleft licenses are ignored, as is
the case with Microsoft’s Copilot, then we have a problem.

That’s no excuse, but the problem goes beyond SWH: people upload copies
of repositories to GitHub without one’s consent (nothing to blame them
for, it’s free software), and then code ends up being used as training
data for Copilot.

As you may have seen, this is being discussed on the Fediverse.  I’d
like to leave the SWH people time to reply to concerns that have been
raised.

> I was also distressed to see how poorly they treated a developer who
> wished to update their name:
> https://cohost.org/arborelia/post/4968198-the-software-heritag
> https://cohost.org/arborelia/post/5052044-the-software-heritag

That’s another concern, with append-only storage in general, starting
with Git.  We should look for solutions that work for both contributors
who change names and for users.  This has happened several times in Guix
and what people did was search/replace their name and adjust ‘.mailmap’.
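
As a rough illustration of the ‘.mailmap’ approach: history stays untouched and the mapping is applied only when names are displayed. The sketch below is a simplified stand-in written for this note, not Git's own implementation. It handles only the `New Name <new@mail> Old Name <old@mail>` form of entry, and the names and addresses are made up.

```python
import re

# Only one simplified entry form is handled here:
#   New Name <new@mail> Old Name <old@mail>
ENTRY = re.compile(r"^(.*?)\s*<([^>]+)>\s+(.*?)\s*<([^>]+)>$")


def parse_mailmap(text):
    """Return {(old_name, old_mail): (new_name, new_mail)} for matching lines."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = ENTRY.match(line)
        if match:
            new_name, new_mail, old_name, old_mail = match.groups()
            mapping[(old_name, old_mail.lower())] = (new_name, new_mail)
    return mapping


def display_identity(name, mail, mapping):
    """Identity to show for a commit; recorded history is left as it is."""
    return mapping.get((name, mail.lower()), (name, mail))


mailmap = parse_mailmap(
    "# hypothetical entry\n"
    "New Name <new@example.org> Old Name <old@example.org>\n"
)
print(display_identity("Old Name", "old@example.org", mailmap))
```

The old name still exists in the stored commits and in the mapping file itself, which is why this helps with display but does not by itself address the concern raised in the linked posts.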

Thanks,
Ludo’.



Re: Concerns/questions around Software Heritage Archive

2024-03-17 Thread MSavoritias

On 3/17/24 18:20, Ian Eure wrote:



MSavoritias  writes:


On 3/17/24 11:39, Lars-Dominik Braun wrote:

Hey,


I have heard folks in the Guix maintenance sphere claim that we
never rewrite git history in Guix, as a matter of policy. I believe
we should revisit that policy (is it actually written anywhere?)
with an eye towards possible exceptions, and develop a mechanism for
securely maintaining continuity of Guix installations after history
has been rewritten so that we maintain this as a technical
possibility in the future, even if we should choose to use it
sparingly.

the fallout of rewriting Guix’ git history would be devastating. It
would break every single Guix installation, because

a) `guix pull` authenticates commits and we might lose our trust anchor
if we rewrite history earlier than the introduction of this feature,
b) `guix pull` outright rejects changes to the commit history to prevent
downgrade attacks.

Additionally it would break every single existing usage of the
time machine and thereby completely defeat the goal of providing
reproducible software environments since the commit hash is used to
identify the point in time to jump to.

I doubt developing “mechanisms” – whatever they look like – would
be worth the effort. Our contributors matter, but so do our users. Never
ever rewriting our git history is a tradeoff we should make for our users.


Lars



Thats a good point. in the sense that its a tradeoff here and I
absolutely agree.


But let me add some food for thought here:

1. Were the social aspects considered when the system came into place?

2. Is it more important for the system to stay as is than to welcome
new contributors?

3. You mention "its a tradeoff we should make for our users". How many
trans people were involved in that decision and how much did their
opinion matter in this?


I am saying this because giving power to people (what is called users)
is not only handing them code or making sure everything is free
software.

It's also the hard part of making sure the voices of people that can
not code are heard, that they participate, and that they are taken into
account.



Just want to say that I appreciate and agree with your thoughtful words.

I’d also note that name changes aren’t a concern limited to trans 
people, and framing this as "we have to upend everything Because 
Transgender" is both wrong and feels pretty bad to me.  Anyone can 
change their name at any time for any reason, or no reason at all, and 
may wish to update historical references to their previous names.  
Having a mechanism to support this is, in my view, a matter of basic 
decency and respect for all humans.


Thanks,

 — Ian


You are right. I failed to see how it could be desirable for other 
people too.


I agree it should be done for everybody.


MSavoritias




Re: Concerns/questions around Software Heritage Archive

2024-03-17 Thread Ian Eure



MSavoritias  writes:


On 3/17/24 13:53, paul wrote:

Hi all ,

thank you MSavoritias for bringing up points that many of us
share. It's clearly a tradeoff what to do about the past. For the
future, as Christopher already stated, we need a serious solution
that we can uphold as a free software project that does not alienate
users or contributors.

My opinion is that names are just wrong to be included, not only
because of deadnames, but in general having a database with a column
first_name and a column second_name is something only a 35 yrs old
white cis boy could have thought was a good idea to model the
spectrum of names humans use all over the world:

https://web.archive.org/web/20240317114846/https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/

If we'd really need to identify contributors, and obviously Guix
doesn't, we could use an UUID/machine readable identifier which can
then be mapped to a displayed name. I believe git can already be
configured to do so.


giacomo



The uuid sounds like a very interesting solution indeed.

I wonder how easy it could be to add it to git.



This also seems like interesting territory to explore.  The 
concerns raised around rewriting history have valid points; I 
think it’s impractical to rewrite history any time a change needs 
to happen, as that would be an ongoing source of disruption.  But 
rewriting history *once*, to switch to a more general mechanism, 
seems like a reasonable trade to me.  This also presents an 
opportunity: we could combine this with a default branch switch 
from master to main.  A news entry left as the final commit in 
master could inform people of whatever steps may be needed to 
update (if that can’t be automated), and the main branch would 
contain the rewritten history.


It’s certainly not a perfect solution, but it seems pragmatic.

 — Ian



Re: Concerns/questions around Software Heritage Archive

2024-03-17 Thread Ian Eure



MSavoritias  writes:


On 3/17/24 11:39, Lars-Dominik Braun wrote:

Hey,

I have heard folks in the Guix maintenance sphere claim that we
never rewrite git history in Guix, as a matter of policy. I believe
we should revisit that policy (is it actually written anywhere?)
with an eye towards possible exceptions, and develop a mechanism for
securely maintaining continuity of Guix installations after history
has been rewritten so that we maintain this as a technical
possibility in the future, even if we should choose to use it
sparingly.

the fallout of rewriting Guix’ git history would be devastating. It
would break every single Guix installation, because

a) `guix pull` authenticates commits and we might lose our trust anchor
if we rewrite history earlier than the introduction of this feature,
b) `guix pull` outright rejects changes to the commit history to prevent
downgrade attacks.

Additionally it would break every single existing usage of the
time machine and thereby completely defeat the goal of providing
reproducible software environments since the commit hash is used to
identify the point in time to jump to.

I doubt developing “mechanisms” – whatever they look like – would
be worth the effort. Our contributors matter, but so do our users. Never
ever rewriting our git history is a tradeoff we should make for our users.


Lars



Thats a good point. in the sense that its a tradeoff here and I
absolutely agree.


But let me add some food for thought here:

1. Were the social aspects considered when the system came into place?

2. Is it more important for the system to stay as is than to welcome
new contributors?

3. You mention "its a tradeoff we should make for our users". How many
trans people were involved in that decision and how much did their
opinion matter in this?


I am saying this because giving power to people (what is called users)
is not only handing them code or making sure everything is free
software.

It's also the hard part of making sure the voices of people that can
not code are heard, that they participate, and that they are taken into
account.



Just want to say that I appreciate and agree with your thoughtful 
words.


I’d also note that name changes aren’t a concern limited to trans 
people, and framing this as "we have to upend everything Because 
Transgender" is both wrong and feels pretty bad to me.  Anyone can 
change their name at any time for any reason, or no reason at all, 
and may wish to update historical references to their previous 
names.  Having a mechanism to support this is, in my view, a 
matter of basic decency and respect for all humans.


Thanks,

 — Ian



Re: Concerns/questions around Software Heritage Archive

2024-03-17 Thread Olivier Dion
On Sat, 16 Mar 2024, Ian Eure  wrote:

[...]

> GPL’d software I’ve created has been packaged for Guix, which I assume
> means it’s been included in SWH.  While I’m dealing with their (IMO:
> unethical) opt-out process, I likely also need to stop new copies from
> being uploaded again in the future.

Even without Guix, SWH could upload your projects into their "database".
In fact, I believe anyone can ask to archive your project to SWH.  So
even if you ask Guix to not do the archiving, anyone contributing might
change that in the future.

I believe that preventing Guix from archiving your software is a
symbolic standpoint -- which I respect --, but would put more burden on
the Guix developers.  On the other hand, if enough people refuse to
archive to SWH, this might shift Guix in a new direction for long-term
source archiving.

I'm not a lawyer, but perhaps a first solution -- for the AI stuff --
would be to add an exception to the GPL that prevents AI from training
on it.  Alas, as usual, our legislators are late on that matter, so that
might not even work.

[...]

-- 
Olivier Dion
oldiob.ca



Re: Concerns/questions around Software Heritage Archive

2024-03-17 Thread Tomas Volf
On 2024-03-17 12:53:54 +0100, paul wrote:
> only a 35 yrs old white cis boy

Could you stop labeling people like this?  It makes me feel uncomfortable and
not welcomed...

T.

--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.




Re: Concerns/questions around Software Heritage Archive

2024-03-17 Thread MSavoritias



On 3/17/24 13:53, paul wrote:

Hi all ,

thank you MSavoritias for bringing up points that many of us share. 
It's clearly a tradeoff what to do about the past. For the future, as 
Christopher already stated, we need a serious solution that we can 
uphold as a free software project that does not alienate users or 
contributors.


My opinion is that names are just wrong to be included, not only 
because of deadnames, but in general having a database with a column 
first_name and a column second_name is something only a 35 yrs old 
white cis boy could have thought was a good idea to model the spectrum 
of names humans use all over the world:


https://web.archive.org/web/20240317114846/https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/ 



If we'd really need to identify contributors, and obviously Guix 
doesn't, we could use an UUID/machine readable identifier which can 
then be mapped to a displayed name. I believe git can already be 
configured to do so.



giacomo



The uuid sounds like a very interesting solution indeed.

I wonder how easy it could be to add it to git.


I agree that making some rules about names that are going to be wrong at 
some point or in some place is the wrong solution long term for sure.



MSavoritias




Re: Concerns/questions around Software Heritage Archive

2024-03-17 Thread paul

Hi all ,

thank you MSavoritias for bringing up points that many of us share. It's 
clearly a tradeoff what to do about the past. For the future, as 
Christopher already stated, we need a serious solution that we can uphold 
as a free software project that does not alienate users or contributors.


My opinion is that names are just wrong to be included, not only because 
of deadnames, but in general having a database with a column first_name 
and a column second_name is something only a 35 yrs old white cis boy 
could have thought was a good idea to model the spectrum of names humans 
use all over the world:


https://web.archive.org/web/20240317114846/https://www.kalzumeus.com/2010/06/17/falsehoods-programmers-believe-about-names/

If we'd really need to identify contributors, and obviously Guix 
doesn't, we could use a UUID/machine-readable identifier which can then 
be mapped to a displayed name. I believe git can already be configured 
to do so.
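The identifier idea above can be sketched independently of git. This is a minimal illustration only: the registry, the function names and the fallback label are all hypothetical, and nothing here reflects how git or Guix store authorship today (git commits record a name and email string directly).

```python
import uuid

# Hypothetical registry: stable opaque identifier -> current display name.
# History would record only the identifier; the name is looked up at display time.
display_names = {}


def register_contributor(display_name):
    """Create a new opaque identifier and bind it to a display name."""
    contributor_id = uuid.uuid4()
    display_names[contributor_id] = display_name
    return contributor_id


def rename_contributor(contributor_id, new_name):
    """Change the displayed name without touching anything that stores the id."""
    if contributor_id not in display_names:
        raise KeyError(contributor_id)
    display_names[contributor_id] = new_name


def render_author(contributor_id):
    """Resolve an identifier to whatever name is current right now."""
    return display_names.get(contributor_id, "(unknown contributor)")
```

With such a scheme a name change is a single update to the mapping, and records that only hold the identifier never need rewriting.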



giacomo




Re: Concerns/questions around Software Heritage Archive

2024-03-17 Thread MSavoritias



On 3/17/24 11:39, Lars-Dominik Braun wrote:

> Hey,
>
> > I have heard folks in the Guix maintenance sphere claim that we never rewrite
> > git history in Guix, as a matter of policy. I believe we should revisit that
> > policy (is it actually written anywhere?) with an eye towards possible
> > exceptions, and develop a mechanism for securely maintaining continuity of
> > Guix installations after history has been rewritten so that we maintain this
> > as a technical possibility in the future, even if we should choose to use it
> > sparingly.
>
> the fallout of rewriting Guix’ git history would be devastating. It
> would break every single Guix installation, because
>
> a) `guix pull` authenticates commits and we might lose our trust anchor
> if we rewrite history earlier than the introduction of this feature,
> b) `guix pull` outright rejects changes to the commit history to prevent
> downgrade attacks.
>
> Additionally it would break every single existing usage of the
> time machine and thereby completely defeat the goal of providing
> reproducible software environments since the commit hash is used to
> identify the point in time to jump to.
>
> I doubt developing “mechanisms” – whatever they look like – would
> be worth the effort. Our contributors matter, but so do our users. Never
> ever rewriting our git history is a tradeoff we should make for our users.
>
> Lars


That's a good point, in the sense that it's a tradeoff here, and I 
absolutely agree.

But let me add some food for thought here:

1. Were the social aspects considered when the system came into place?

2. Is it more important for the system to stay as is than to welcome new 
contributors?

3. You mention "it's a tradeoff we should make for our users". How many 
trans people were involved in that decision, and how much did their 
opinion matter in this?

I am saying this because giving power to people (what are called users) 
is not only handing them code or making sure everything is free software.

It's also the hard part of making sure the voices of people who cannot 
code are heard, that they participate, and that they are taken into account.

I am not trying to say what we should do about commit history rewriting 
here. Personally, the tradeoffs are probably worth it.

But I am trying to say what Guix should do as a culture about including 
or excluding people, in the case of Software Heritage.



MSavoritias




Re: Concerns/questions around Software Heritage Archive

2024-03-17 Thread Lars-Dominik Braun
Hey,

> I have heard folks in the Guix maintenance sphere claim that we never rewrite 
> git history in Guix, as a matter of policy. I believe we should revisit that 
> policy (is it actually written anywhere?) with an eye towards possible 
> exceptions, and develop a mechanism for securely maintaining continuity of 
> Guix installations after history has been rewritten so that we maintain this 
> as a technical possibility in the future, even if we should choose to use it 
> sparingly.

the fallout of rewriting Guix’ git history would be devastating. It
would break every single Guix installation, because

a) `guix pull` authenticates commits and we might lose our trust anchor
if we rewrite history earlier than the introduction of this feature,
b) `guix pull` outright rejects changes to the commit history to prevent
downgrade attacks.
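Point b) can be illustrated with a toy model. The sketch below is not Guix's implementation; it only assumes that "rejecting changes to the commit history" means accepting a new commit solely when the currently installed commit is among its ancestors. The commit names and the `parents` mapping are made up for illustration.

```python
def ancestors(commit, parents):
    """Return the set containing `commit` and every commit reachable from it."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents.get(current, ()))
    return seen


def is_fast_forward(installed, candidate, parents):
    """True if `candidate` descends from (or equals) the installed commit."""
    return installed in ancestors(candidate, parents)


# Toy history: a <- b <- c is the published line; b2 <- c2 is a rewrite of b and c.
parents = {"a": [], "b": ["a"], "c": ["b"], "b2": ["a"], "c2": ["b2"]}
```

In this model a client sitting at `c` would accept nothing on the rewritten line, because `c` is not an ancestor of `c2`; that is the breakage described above.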

Additionally it would break every single existing usage of the
time machine and thereby completely defeat the goal of providing
reproducible software environments since the commit hash is used to
identify the point in time to jump to.

I doubt developing “mechanisms” – whatever they look like – would
be worth the effort. Our contributors matter, but so do our users. Never
ever rewriting our git history is a tradeoff we should make for our users.

Lars




Re: Concerns/questions around Software Heritage Archive

2024-03-17 Thread MSavoritias



On 3/16/24 21:45, Tomas Volf wrote:

> On 2024-03-16 20:24:50 +0200, MSavoritias wrote:
> > > > I was also distressed to see how poorly they treated a developer who
> > > > wished to update their name:
> > > > https://cohost.org/arborelia/post/4968198-the-software-heritag
> > > > https://cohost.org/arborelia/post/5052044-the-software-heritag
> > > This is probably worth thinking about as Guix is in a similar situation
> > > regarding publishing source code, and people potentially wanting to
> > > change historical source code both in things Guix packages and Guix
> > > itself.
> > >
> > > Like Software Heritage, there's cryptographical implications for
> > > rewriting the Git history and modifying source tarballs or nars that
> > > contain source code.
> > >
> > > We have 17TiB of compressed source code and built software stored for
> > > bordeaux.guix.gnu.org now and we should probably work out how to handle
> > > people asking for things to be removed or changed (for any and all
> > > reasons).
> > >
> > > It's probably worth working out our position on this in advance of
> > > someone asking.
> >
> > I would go a step further actually. Software Heritage is effectively
> > breaking CoC of Guix now.
> >
> > I'm not proposing removing all code or something obviously that connects to
> > Software Heritage, but there should be some social action we can take.
> >
> > For example until the matter is resolved and Software Heritage implements a
> > process that respects trans rights Software Heritage should not be welcome
> > in Guix Spaces.
>
> I did skim the articles and I did not see any details on what the technical
> solution should be.  SWH, among other things, archives the repositories and
> allows fetching them by commit hash.  At least as far as I know.  Since that
> commit hash does contain the author field, what is the proposed solution here to
> change the author name without changing the commit hash?
>
> While I am not a huge fan of the ability to map the "fake" author name over the
> real one in the UI, what other solutions do you or the article author envision?
> I am genuinely curious what you think can be done here.

I think you are arguing about something other than what I wrote? I didn't 
say anything about technical solutions; that's up to Software Heritage to 
figure out.

I did say that there should be social consequences, since Software 
Heritage is breaking the CoC here.

And by breaking the CoC I mean that Software Heritage seems to have a 
complete lack of empathy towards trans people.

Regarding what Guix could do, personally the answer is clear: People are 
more important than machines and code.

So we should find a way for trans people to feel safe in Guix.

MSavoritias

> Have a nice day,
> Tomas Volf




Fw: Re: Concerns/questions around Software Heritage Archive

2024-03-16 Thread Ryan Prior
[I intended to CC the following to guix-devel but forgot:]

--- Forwarded Message ---
From: Ryan Prior 
Date: On Saturday, March 16th, 2024 at 6:36 PM
Subject: Re: Concerns/questions around Software Heritage Archive
To: Vivien Kraus 


> 
> 
> On Saturday, March 16th, 2024 at 6:13 PM, Vivien Kraus 
> viv...@planete-kraus.eu wrote:
> 
> > 2. is more difficult, because Guix contributors sometimes change their
> > names too, and a commit reading “update my name” is not the best
> > solution. If I understand correctly, rewriting the history would be
> > understood as a “downgrade attack”, contrary to the ftfy case where the
> > developer could rewrite the history without such consequences. Is my
> > understanding correct?
> 
> 
> It's only a problem IMO because we make the decision to treat Guix as an 
> append-only series of commits and treat any other outcome as a potential 
> attack. One alternate solution would be to allow provision of an 
> authenticated alternate-history data structure, which indicates a set of (old 
> commit hash, new commit hash) tuples going back to the first rewritten commit 
> in the history, and the whole thing would be signed by a Guix committer. That 
> way, the updating Guix client can rewind history, apply the new commit(s), 
> verify that the old chain and new chain match what's provided in the 
> alternate-history structure & that its signature is valid. Thus verified, the 
> Guix installation could continue without needing to allow a downgrade 
> exception.
> 
> Perhaps there are much better ways of handling this, but I propose it in 
> hopes of clarifying that there are technical solutions which preserve 
> integrity while permitting history rewrites in situations where it is 
> desirable.
> 
> I have requested previously that some commits I've provided be rewritten to 
> update my name. In my case, it's because I've sometimes misconfigured my 
> email software such that some commits by me are signed just "ryan" or "Ryan 
> Prior via Protonmail" or similar, rather than my preference which is "Ryan 
> Prior".
> 
> In my case this causes me no harm and is simply an annoyance, so when I 
> encountered resistance to rewriting the offending commits, I dropped the 
> matter, and I still consider it dropped and settled. Even if we developed the 
> capability to securely present a rewritten history, I wouldn't demand that 
> such be used to address small concerns like mine.
> 
> However, I know we have at least two trans Guix contributors. Do they have 
> any commits with their deadnames on them? Not that this is an invitation to 
> go look; they can tell us if this is a concern worth raising. I include the 
> detail to clarify that this is not a distant concern. Perhaps they have been 
> silent thus far for the same reason that I have, because the policy against 
> rewrites presents too high a barrier? (Or it may not bother them, or maybe 
> they used their initials which are the same etc?) In any case I think it 
> would be courteous to develop a procedure by which we could remove deadnames 
> from old commits, or otherwise remove harmful information from Guix's 
> development history, should this become a necessity.
> 
> Ryan
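The alternate-history structure proposed in the forwarded message can be sketched roughly as follows. This is only an illustration of the verification steps under loud assumptions: an HMAC with a shared key stands in for a committer's signature (a real design would presumably use the OpenPGP signatures Guix already relies on), and all function names and the JSON serialization are hypothetical.

```python
import hashlib
import hmac
import json


def _canonical(rewrite_map):
    """Serialize the (old hash, new hash) pairs deterministically for signing."""
    pairs = [[old, new] for old, new in rewrite_map]
    return json.dumps(pairs, separators=(",", ":")).encode("utf-8")


def sign_rewrite_map(rewrite_map, key):
    """Stand-in for a committer signature over the whole mapping (HMAC, not OpenPGP)."""
    return hmac.new(key, _canonical(rewrite_map), hashlib.sha256).hexdigest()


def verify_rewrite(old_chain, new_chain, rewrite_map, signature, key):
    """Check the signature first, then check that the mapping pairs up
    exactly the old commits being rewound with the new commits applied."""
    expected = sign_rewrite_map(rewrite_map, key)
    if not hmac.compare_digest(expected, signature):
        return False
    claimed = [(old, new) for old, new in rewrite_map]
    return len(old_chain) == len(new_chain) and claimed == list(zip(old_chain, new_chain))
```

A client would rewind to the first rewritten commit, apply the new commits, and continue only if `verify_rewrite` succeeds; any mismatch would still be treated as a potential downgrade attack.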



Re: Concerns/questions around Software Heritage Archive

2024-03-16 Thread Tomas Volf
On 2024-03-17 00:16:26 +0100, Vivien Kraus wrote:
> Hello!
>
> On Saturday, 16 March 2024 at 17:50 +, Christopher Baines wrote:
> > This is probably worth thinking about as Guix is in a similar
> > situation
> > regarding publishing source code, and people potentially wanting to
> > change historical source code both in things Guix packages and Guix
> > itself.
>
> I see two problems:
>
> 1. providing packages;
> 2. developing Guix itself.
>
> I am sure that 1. is not a real problem, we could just ask the
> developer to release a new version incrementing the patch number,
> upgrade it on our side, and forget the old version. Garbage collection
> would ultimately get rid of the old tarballs.

How would that approach interact with `guix time-machine'?  If the developer
takes the approach of the package mentioned here (rewrite the history), would
that not cause the previous version to be no longer buildable, since the commit
would no longer exist?

I am not sure what the developer would do for old tarballs in this situation.
Re-release them from the re-written history or just drop them?  Either would be
a problem.  Or would they not care about dead name in the tarballs?

Currently SWH protects against the first (git commit); I am not sure whether
there is any protection against the second (does SWH ingest tarballs as well?).

Either I am missing something, or this would actually be a problem for the
time-machine use case.

> 2. is more difficult, because Guix contributors sometimes change their
> names too, and a commit reading “update my name” is not the best
> solution. If I understand correctly, rewriting the history would be
> understood as a “downgrade attack”, contrary to the ftfy case where the
> developer could rewrite the history without such consequences. Is my
> understanding correct?

For my use case using .mailmap was enough, but that was not a dead name
situation.  However it is a solution that works today, and changes the name
visible in most git operations (afaict) without modifying the history.  So
something to consider.
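To make the .mailmap idea concrete, here is a simplified sketch of the lookup it performs. It handles only the four-field entry form ("Proper Name <proper@email> Commit Name <commit@email>"); real .mailmap files allow other forms that this parser ignores, and the function names and example addresses are invented for illustration.

```python
import re

# Matches only: Proper Name <proper@email> Commit Name <commit@email>
_ENTRY = re.compile(
    r"^(?P<new_name>[^<]+?)\s*<(?P<new_email>[^>]+)>\s+"
    r"(?P<old_name>[^<]+?)\s*<(?P<old_email>[^>]+)>$"
)


def parse_mailmap(text):
    """Build a lookup from (commit name, commit email) to (proper name, proper email)."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        match = _ENTRY.match(line)
        if match:
            key = (match["old_name"], match["old_email"].lower())
            mapping[key] = (match["new_name"], match["new_email"])
    return mapping


def display_identity(name, email, mapping):
    """Return the identity to show for a commit; the commit itself is untouched."""
    return mapping.get((name, email.lower()), (name, email))
```

Note the limitation relevant to this thread: the commits are unchanged, and the mapping file itself records the old name or address being mapped.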


Tomas

--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.




Re: Concerns/questions around Software Heritage Archive

2024-03-16 Thread Vivien Kraus
Hello!

On Saturday, 16 March 2024 at 17:50 +, Christopher Baines wrote:
> This is probably worth thinking about as Guix is in a similar
> situation
> regarding publishing source code, and people potentially wanting to
> change historical source code both in things Guix packages and Guix
> itself.

I see two problems:

1. providing packages;
2. developing Guix itself.

I am sure that 1. is not a real problem, we could just ask the
developer to release a new version incrementing the patch number,
upgrade it on our side, and forget the old version. Garbage collection
would ultimately get rid of the old tarballs.

2. is more difficult, because Guix contributors sometimes change their
names too, and a commit reading “update my name” is not the best
solution. If I understand correctly, rewriting the history would be
understood as a “downgrade attack”, contrary to the ftfy case where the
developer could rewrite the history without such consequences. Is my
understanding correct?



Re: Concerns/questions around Software Heritage Archive

2024-03-16 Thread Ryan Prior
On Saturday, March 16th, 2024 at 10:52 AM, Ian Eure  wrote:

> 
> 
> Hi Guixy people,
> [...]
> I was also distressed to see how poorly they treated a developer
> who wished to update their name:
> https://cohost.org/arborelia/post/4968198-the-software-heritag
> https://cohost.org/arborelia/post/5052044-the-software-heritag

I read these posts with interest. It is worth noting that the complained-about 
organization, Inria, supports Guix as well & has close historical ties to the 
project (although it does not have decision-making power here, AFAIK). It is 
a shame that Inria have treated this matter with such apparent disregard.

I have heard folks in the Guix maintenance sphere claim that we never rewrite 
git history in Guix, as a matter of policy. I believe we should revisit that 
policy (is it actually written anywhere?) with an eye towards possible 
exceptions, and develop a mechanism for securely maintaining continuity of Guix 
installations after history has been rewritten so that we maintain this as a 
technical possibility in the future, even if we should choose to use it 
sparingly.

Ryan



Re: Concerns/questions around Software Heritage Archive

2024-03-16 Thread Tomas Volf
On 2024-03-16 12:06:27 -0700, Ian Eure wrote:
>
> Christopher Baines  writes:
>
> > Ian Eure  writes:
> >
> > > Hi Guixy people,
> > >
> > > I’d never heard of SWH before I started hacking on Guix last fall,
> > > and
> > > it struck me as rather a good idea.  However, I’ve seen some things
> > > lately which have soured me on them.
> > >
> > > They appear to be using the archive to build LLMs:
> > > https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
> > >
> > > I was also distressed to see how poorly they treated a developer who
> > > wished to update their name:
> > > https://cohost.org/arborelia/post/4968198-the-software-heritag
> > > https://cohost.org/arborelia/post/5052044-the-software-heritag
> > >
> > > GPL’d software I’ve created has been packaged for Guix, which I
> > > assume
> > > means it’s been included in SWH.  While I’m dealing with their (IMO:
> > > unethical) opt-out process, I likely also need to stop new copies
> > > from
> > > being uploaded again in the future.
> > >
> > > Is there a way to indicate, in a Guix package, that it should
> > > *never*
> > > be included in SWH?
> >
> > Not currently, and I don't really see the point in such a mechanism. If
> > you really never want them to store your code, then you need to license
> > it accordingly (and not make it free software).
> >
>
> I don’t want my code in SWH *because* it’s free.  A primary use of LLMs is
> laundering freely licensed software into proprietary, commercial projects
> through "AI" code completion and generation. Any Free software in an LLM
> training set can and will be used in violation of its license, without a
> clear path for the author to seek recourse.  I deleted my code off Github
> and abandoned it completely for this exact reason, and am deeply irked to be
> going through this nonsense again.
>
> A more salient question may be: Is there a process within Guix (either the
> program or the organization) which uploads source to SWH?  Or does it rely
> on SWH independently?

`guix lint PKG-NAME' schedules SWH archival if possible.  No code is directly
uploaded (at least currently), so assuming you have an IP list for SWH, it should
be possible to block it.  At least AFAIK.

If you have the list, or know how to get it, could you share it?  I would be
interested in blocking it as well from my git hosting.
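Assuming such a list existed, the check itself is simple. The sketch below uses placeholder networks from the reserved documentation ranges (192.0.2.0/24 and 2001:db8::/32); they are not SWH's addresses, which are not given anywhere in this thread, and the function name is hypothetical.

```python
import ipaddress

# Placeholder networks from the documentation ranges, NOT real SWH addresses.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("2001:db8::/32"),
]


def is_blocked(client_ip, networks=BLOCKED_NETWORKS):
    """Return True if the client address falls inside any blocked network."""
    address = ipaddress.ip_address(client_ip)
    return any(
        address.version == network.version and address in network
        for network in networks
    )
```

In practice the same list would more likely be fed to a firewall or web server rule than checked in application code.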

>
> If the latter, my problem is likely solved by blocking SWH at my network
> edge and opting out of their archive (or trying to) and the downstream
> training models they’ve already put it in.  If the former, the only control
> I currently have to protect my license is removing packages from Guix which
> contain it.  I don’t want that outcome.
>
> Noting also that the path here seems to be SWH->huggingface->bigcode
> training set, and the opt-out process for the training set appears to be a
> complete sham.  To opt-out, you must create a Github Issue; only one opt-out
> has *ever* been processed, and there are 200+ sitting there, many with no
> response for nearly a year[1].  I want no part of any of this.
>
>
> > > Is there a way to tell Guix to never download source from SWH?
> >
> > Also no, and it's probably best to do this at the network level on your
> > systems/network if you want this to be the case.
> >
>
> I’ll investigate this, though I’d prefer if there was a way to configure
> source mirrors in the Guix daemon.
>
>
> > Skipping back to this though:
> >
> > > I was also distressed to see how poorly they treated a developer who
> > > wished to update their name:
> > > https://cohost.org/arborelia/post/4968198-the-software-heritag
> > > https://cohost.org/arborelia/post/5052044-the-software-heritag
> >
> > This is probably worth thinking about as Guix is in a similar situation
> > regarding publishing source code, and people potentially wanting to
> > change historical source code both in things Guix packages and Guix
> > itself.
> >
> > Like Software Heritage, there's cryptographical implications for
> > rewriting the Git history and modifying source tarballs or nars that
> > contain source code.
> >
> > We have 17TiB of compressed source code and built software stored for
> > bordeaux.guix.gnu.org now and we should probably work out how to handle
> > people asking for things to be removed or changed (for any and all
> > reasons).
> >
> > It's probably worth working out our position on this in advance of
> > someone asking.
> >
>
> Yes, I agree that Guix needs a better solution for this.
>
> Thanks,
>
>  — Ian
>
> [1]: https://github.com/bigcode-project/opt-out-v2/issues
>

T.

--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.




Re: Concerns/questions around Software Heritage Archive

2024-03-16 Thread Tomas Volf
On 2024-03-16 20:24:50 +0200, MSavoritias wrote:
> > > I was also distressed to see how poorly they treated a developer who
> > > wished to update their name:
> > > https://cohost.org/arborelia/post/4968198-the-software-heritag
> > > https://cohost.org/arborelia/post/5052044-the-software-heritag
> > This is probably worth thinking about as Guix is in a similar situation
> > regarding publishing source code, and people potentially wanting to
> > change historical source code both in things Guix packages and Guix
> > itself.
> >
> > Like Software Heritage, there's cryptographical implications for
> > rewriting the Git history and modifying source tarballs or nars that
> > contain source code.
> >
> > We have 17TiB of compressed source code and built software stored for
> > bordeaux.guix.gnu.org now and we should probably work out how to handle
> > people asking for things to be removed or changed (for any and all
> > reasons).
> >
> > It's probably worth working out our position on this in advance of
> > someone asking.
>
> I would go a step further actually. Software Heritage is effectively
> breaking CoC of Guix now.
>
> I'm not proposing removing all code or something obviously that connects to
> Software Heritage, but there should be some social action we can take.
>
>
> For example until the matter is resolved and Software Heritage implements a
> process that respects trans rights Software Heritage should not be welcome
> in Guix Spaces.

I did skim the articles and I did not see any details on what the technical
solution should be.  SWH, among other things, archives the repositories and
allows fetching them by commit hash.  At least as far as I know.  Since that
commit hash does contain the author field, what is the proposed solution here to
change the author name without changing the commit hash?
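The point about the hash can be shown directly. A git commit id is the SHA-1 of a small header plus the commit body, and the body includes the author and committer lines. The sketch below recomputes such an id by hand; the names, addresses, timestamp and message are made up, and the tree is git's well-known empty tree.

```python
import hashlib

# Id of git's empty tree object; used so the example needs no real files.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def commit_id(author, tree=EMPTY_TREE, message="Example commit\n",
              timestamp="1710547200 +0000"):
    """Compute the id git would assign to a parentless commit with these fields."""
    body = (
        f"tree {tree}\n"
        f"author {author} {timestamp}\n"
        f"committer {author} {timestamp}\n"
        f"\n"
        f"{message}"
    ).encode("utf-8")
    header = f"commit {len(body)}\0".encode("utf-8")
    return hashlib.sha1(header + body).hexdigest()


old_id = commit_id("Old Name <dev@example.org>")
new_id = commit_id("New Name <dev@example.org>")
```

The two ids differ although only the name changed. Since every child commit records its parent's id, all descendants of a rewritten commit get new ids as well.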

While I am not a huge fan of the ability to map the "fake" author name over the
real one in the UI, what other solutions do you or the article author envision?
I am genuinely curious what you think can be done here.

Have a nice day,
Tomas Volf

--
There are only two hard things in Computer Science:
cache invalidation, naming things and off-by-one errors.




Re: Concerns/questions around Software Heritage Archive

2024-03-16 Thread Ian Eure



Christopher Baines  writes:

> Ian Eure  writes:
>
> > Hi Guixy people,
> >
> > I’d never heard of SWH before I started hacking on Guix last fall, and
> > it struck me as rather a good idea.  However, I’ve seen some things
> > lately which have soured me on them.
> >
> > They appear to be using the archive to build LLMs:
> > https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
> >
> > I was also distressed to see how poorly they treated a developer who
> > wished to update their name:
> > https://cohost.org/arborelia/post/4968198-the-software-heritag
> > https://cohost.org/arborelia/post/5052044-the-software-heritag
> >
> > GPL’d software I’ve created has been packaged for Guix, which I assume
> > means it’s been included in SWH.  While I’m dealing with their (IMO:
> > unethical) opt-out process, I likely also need to stop new copies from
> > being uploaded again in the future.
> >
> > Is there a way to indicate, in a Guix package, that it should *never*
> > be included in SWH?
>
> Not currently, and I don't really see the point in such a mechanism. If
> you really never want them to store your code, then you need to license
> it accordingly (and not make it free software).

I don’t want my code in SWH *because* it’s free.  A primary use of 
LLMs is laundering freely licensed software into proprietary, 
commercial projects through "AI" code completion and generation. 
Any Free software in an LLM training set can and will be used in 
violation of its license, without a clear path for the author to 
seek recourse.  I deleted my code off Github and abandoned it 
completely for this exact reason, and am deeply irked to be going 
through this nonsense again.

A more salient question may be: Is there a process within Guix 
(either the program or the organization) which uploads source to 
SWH?  Or does it rely on SWH independently?

If the latter, my problem is likely solved by blocking SWH at my 
network edge and opting out of their archive (or trying to) and 
the downstream training models they’ve already put it in.  If the 
former, the only control I currently have to protect my license is 
removing packages from Guix which contain it.  I don’t want that 
outcome.

Noting also that the path here seems to be 
SWH->huggingface->bigcode training set, and the opt-out process 
for the training set appears to be a complete sham.  To opt-out, 
you must create a Github Issue; only one opt-out has *ever* been 
processed, and there are 200+ sitting there, many with no response 
for nearly a year[1].  I want no part of any of this.

> > Is there a way to tell Guix to never download source from SWH?
>
> Also no, and it's probably best to do this at the network level on your
> systems/network if you want this to be the case.

I’ll investigate this, though I’d prefer if there was a way to 
configure source mirrors in the Guix daemon.

> Skipping back to this though:
>
> > I was also distressed to see how poorly they treated a developer who
> > wished to update their name:
> > https://cohost.org/arborelia/post/4968198-the-software-heritag
> > https://cohost.org/arborelia/post/5052044-the-software-heritag
>
> This is probably worth thinking about as Guix is in a similar situation
> regarding publishing source code, and people potentially wanting to
> change historical source code both in things Guix packages and Guix
> itself.
>
> Like Software Heritage, there's cryptographical implications for
> rewriting the Git history and modifying source tarballs or nars that
> contain source code.
>
> We have 17TiB of compressed source code and built software stored for
> bordeaux.guix.gnu.org now and we should probably work out how to handle
> people asking for things to be removed or changed (for any and all
> reasons).
>
> It's probably worth working out our position on this in advance of
> someone asking.

Yes, I agree that Guix needs a better solution for this.

Thanks,

 — Ian

[1]: https://github.com/bigcode-project/opt-out-v2/issues



Re: Concerns/questions around Software Heritage Archive

2024-03-16 Thread Christopher Baines

MSavoritias  writes:

> On 3/16/24 19:50, Christopher Baines wrote:
>> Ian Eure  writes:
>>
>>> Hi Guixy people,
>>>
>>> I’d never heard of SWH before I started hacking on Guix last fall, and
>>> it struck me as rather a good idea.  However, I’ve seen some things
>>> lately which have soured me on them.
>>>
>>> They appear to be using the archive to build LLMs:
>>> https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
>>>
>>> I was also distressed to see how poorly they treated a developer who
>>> wished to update their name:
>>> https://cohost.org/arborelia/post/4968198-the-software-heritag
>>> https://cohost.org/arborelia/post/5052044-the-software-heritag
>>>
>>> GPL’d software I’ve created has been packaged for Guix, which I assume
>>> means it’s been included in SWH.  While I’m dealing with their (IMO:
>>> unethical) opt-out process, I likely also need to stop new copies from
>>> being uploaded again in the future.
>>>
>>> Is there a way to indicate, in a Guix package, that it should *never*
>>> be included in SWH?
>> Not currently, and I don't really see the point in such a mechanism. If
>> you really never want them to store your code, then you need to license
>> it accordingly (and not make it free software).
>
> You are talking about legal tho. Yes legally they can copy the code.
>
> But what can Guix do socially to give people the choice? For reasons
> of consent that is.

...

>>> I was also distressed to see how poorly they treated a developer who
>>> wished to update their name:
>>> https://cohost.org/arborelia/post/4968198-the-software-heritag
>>> https://cohost.org/arborelia/post/5052044-the-software-heritag
>> This is probably worth thinking about as Guix is in a similar situation
>> regarding publishing source code, and people potentially wanting to
>> change historical source code both in things Guix packages and Guix
>> itself.
>>
>> Like Software Heritage, there's cryptographical implications for
>> rewriting the Git history and modifying source tarballs or nars that
>> contain source code.
>>
>> We have 17TiB of compressed source code and built software stored for
>> bordeaux.guix.gnu.org now and we should probably work out how to handle
>> people asking for things to be removed or changed (for any and all
>> reasons).
>>
>> It's probably worth working out our position on this in advance of
>> someone asking.
>
> I would go a step further actually. Software Heritage is effectively
> breaking CoC of Guix now.
>
> I'm not proposing removing all code or something obviously that
> connects to Software Heritage, but there should be some social action
> we can take.
>
>
> For example until the matter is resolved and Software Heritage
> implements a process that respects trans rights Software Heritage
> should not be welcome in Guix Spaces.

As I say, Guix is in a very similar situation as a project to Software
Heritage, we publish artefacts containing people's personal details and
there are technical implications in changing the personal details in
those artefacts.

The only difference as far as I'm aware is that no one is currently
asking Guix as a project to update their personal details in the
artefacts we store and publish.

As a project, we should sort out our stuff before jumping to judge
others.




Re: Concerns/questions around Software Heritage Archive

2024-03-16 Thread MSavoritias



On 3/16/24 19:50, Christopher Baines wrote:
> Ian Eure  writes:
>
>> Hi Guixy people,
>>
>> I’d never heard of SWH before I started hacking on Guix last fall, and
>> it struck me as rather a good idea.  However, I’ve seen some things
>> lately which have soured me on them.
>>
>> They appear to be using the archive to build LLMs:
>> https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
>>
>> I was also distressed to see how poorly they treated a developer who
>> wished to update their name:
>> https://cohost.org/arborelia/post/4968198-the-software-heritag
>> https://cohost.org/arborelia/post/5052044-the-software-heritag
>>
>> GPL’d software I’ve created has been packaged for Guix, which I assume
>> means it’s been included in SWH.  While I’m dealing with their (IMO:
>> unethical) opt-out process, I likely also need to stop new copies from
>> being uploaded again in the future.
>>
>> Is there a way to indicate, in a Guix package, that it should *never*
>> be included in SWH?
> Not currently, and I don't really see the point in such a mechanism. If
> you really never want them to store your code, then you need to license
> it accordingly (and not make it free software).

You are talking about legal tho. Yes legally they can copy the code.

But what can Guix do socially to give people the choice? For reasons of 
consent that is.

>> I was also distressed to see how poorly they treated a developer who
>> wished to update their name:
>> https://cohost.org/arborelia/post/4968198-the-software-heritag
>> https://cohost.org/arborelia/post/5052044-the-software-heritag
> This is probably worth thinking about as Guix is in a similar situation
> regarding publishing source code, and people potentially wanting to
> change historical source code both in things Guix packages and Guix
> itself.
>
> Like Software Heritage, there's cryptographical implications for
> rewriting the Git history and modifying source tarballs or nars that
> contain source code.
>
> We have 17TiB of compressed source code and built software stored for
> bordeaux.guix.gnu.org now and we should probably work out how to handle
> people asking for things to be removed or changed (for any and all
> reasons).
>
> It's probably worth working out our position on this in advance of
> someone asking.

I would go a step further actually. Software Heritage is effectively 
breaking CoC of Guix now.

I'm not proposing removing all code or something obviously that connects 
to Software Heritage, but there should be some social action we can take.

For example until the matter is resolved and Software Heritage 
implements a process that respects trans rights Software Heritage should 
not be welcome in Guix Spaces.



MSavoritias




Re: Concerns/questions around Software Heritage Archive

2024-03-16 Thread Christopher Baines

Ian Eure  writes:

> Hi Guixy people,
>
> I’d never heard of SWH before I started hacking on Guix last fall, and
> it struck me as rather a good idea.  However, I’ve seen some things
> lately which have soured me on them.
>
> They appear to be using the archive to build LLMs:
> https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
>
> I was also distressed to see how poorly they treated a developer who
> wished to update their name:
> https://cohost.org/arborelia/post/4968198-the-software-heritag
> https://cohost.org/arborelia/post/5052044-the-software-heritag
>
> GPL’d software I’ve created has been packaged for Guix, which I assume
> means it’s been included in SWH.  While I’m dealing with their (IMO:
> unethical) opt-out process, I likely also need to stop new copies from
> being uploaded again in the future.
>
> Is there a way to indicate, in a Guix package, that it should *never*
> be included in SWH?

Not currently, and I don't really see the point in such a mechanism. If
you really never want them to store your code, then you need to license
it accordingly (and not make it free software).

> Is there a way to tell Guix to never download source from SWH?

Also no; if you want this to be the case, it's probably best to block it
at the network level on your own systems.

Skipping back to this though:

> I was also distressed to see how poorly they treated a developer who
> wished to update their name:
> https://cohost.org/arborelia/post/4968198-the-software-heritag
> https://cohost.org/arborelia/post/5052044-the-software-heritag

This is probably worth thinking about as Guix is in a similar situation
regarding publishing source code, and people potentially wanting to
change historical source code both in things Guix packages and Guix
itself.

As with Software Heritage, there are cryptographic implications to
rewriting the Git history and modifying source tarballs or nars that
contain source code.

We have 17 TiB of compressed source code and built software stored for
bordeaux.guix.gnu.org now and we should probably work out how to handle
people asking for things to be removed or changed (for any and all
reasons).

It's probably worth working out our position on this in advance of
someone asking.




Re: Concerns/questions around Software Heritage Archive

2024-03-16 Thread MSavoritias

On 3/16/24 17:52, Ian Eure wrote:


> Hi Guixy people,
>
> I’d never heard of SWH before I started hacking on Guix last fall, and
> it struck me as rather a good idea.  However, I’ve seen some things
> lately which have soured me on them.
>
> They appear to be using the archive to build LLMs:
> https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/
>
> I was also distressed to see how poorly they treated a developer who
> wished to update their name:
> https://cohost.org/arborelia/post/4968198-the-software-heritag
> https://cohost.org/arborelia/post/5052044-the-software-heritag
>
> GPL’d software I’ve created has been packaged for Guix, which I assume
> means it’s been included in SWH.  While I’m dealing with their (IMO:
> unethical) opt-out process, I likely also need to stop new copies from
> being uploaded again in the future.
>
> Is there a way to indicate, in a Guix package, that it should *never*
> be included in SWH?
>
> Is there a way to tell Guix to never download source from SWH?
>
> I want absolutely nothing to do with them.
>
> Thanks,
>
>  — Ian



Oh no.

Apparently they are into A.I. and blockchain, besides also being transphobic.

Thanks for the heads up. That's all I needed to know to never touch 
whatever they are doing.



MSavoritias




Concerns/questions around Software Heritage Archive

2024-03-16 Thread Ian Eure

Hi Guixy people,

I’d never heard of SWH before I started hacking on Guix last fall, 
and it struck me as rather a good idea.  However, I’ve seen some 
things lately which have soured me on them.


They appear to be using the archive to build LLMs: 
https://www.softwareheritage.org/2024/02/28/responsible-ai-with-starcoder2/


I was also distressed to see how poorly they treated a developer 
who wished to update their name: 
https://cohost.org/arborelia/post/4968198-the-software-heritag 
https://cohost.org/arborelia/post/5052044-the-software-heritag


GPL’d software I’ve created has been packaged for Guix, which I 
assume means it’s been included in SWH.  While I’m dealing with 
their (IMO: unethical) opt-out process, I likely also need to stop 
new copies from being uploaded again in the future.


Is there a way to indicate, in a Guix package, that it should 
*never* be included in SWH?


Is there a way to tell Guix to never download source from SWH?

I want absolutely nothing to do with them.

Thanks,

 — Ian