Re: Brief update about software freedom and artificial intelligence

2023-02-27 Thread Paul Wise
On Mon, 2023-02-27 at 01:45 +0100, Roberto A. Foglietta wrote:

> Because the principle of the copyright existence is about protecting
> the authors' exclusive of that {business, commercial, marketing}
> rights.

The purpose of copyright is allegedly (in the USA) "To promote the
Progress of Science and useful Arts, by securing for limited Times to
Authors and Inventors the exclusive Right to their respective Writings
and Discoveries.". The author's rights are *secondary*, which is why
fair use exists. Of course these days copyright isn't very time-limited
and in the age of DMCA/DRM, video mashups, fanfic and supporter-funded
creators, copyright just ends up limiting progress in many fields,
another reason why fair use is important. Making fair use only
available for non-commercial uses would almost destroy it.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise




Re: Brief update about software freedom and artificial intelligence

2023-02-27 Thread Roberto A. Foglietta
On Mon, 27 Feb 2023 at 19:08, Russ Allbery  wrote:

>
> No.  It's entirely possible that using databases as training sets for an
> AI/ML engine is fair use under existing United States law and precedent as
> long as that use is sufficiently transformative (the first factor of the
> test, and I suspect the most important one here).

Considering what you reported in the previous e-mail about US national
law (17 U.S.C. § 107, 1976), it is not possible to use an entire
database, or a significant portion of it, for {business, commercial,
marketing} purposes without the copyright holder's consent.

Whoever says the contrary forgets that fair use was introduced to
allow non-profit activities that have a social value, plus a few
for-profit activities with a social role (like journalism), which may
use only a very limited portion of a copyrighted work. A very simple
and straightforward example is a newspaper article that cites a
couple of paragraphs from a book or some statistical data from a
private database. There is no chance that incorporating an entire
database (or a significant part of it) for {business, commercial,
marketing} purposes would qualify as fair use; otherwise the
principle of copyright would be gone.

I strongly feel that this discussion cannot continue, because
presenting a mass of legal material without understanding the
underlying legal principles leads nowhere, except to a show like some
US trials. Principles cannot be bent by misinterpretation,
misjudgement and ill-written law such as 17 U.S.C. § 107 (1976), whose
points (1) to (4) are written in such a way that anyone not well versed
in the principles could misread them to the point of absurdity.

Point (1) does not mean that non-profit and for-profit activities are
equally entitled to fair use:

(1) the purpose and character of the use, including whether such use
is of a commercial nature or is for nonprofit educational purposes;

It means the opposite: the two kinds of activity may fair-use very
different amounts of the copyrighted work, because of point (3):

(3) the amount and substantiality of the portion used in relation to
the copyrighted work as a whole

In particular, (3) also means that if I write an article of only a few
words, quoting two paragraphs of a book is not fair use.

One more thing: it does not matter that two parties went through N
trials; what matters, as a principle, is the agreement they reached at
the end. A significant judgement is a definitive one; otherwise it was
not significant enough even to close that specific case.

> The obvious example is
> a search engine, which performs a similar transformation of clearly
> copyrighted works into a new service with a different purpose, without the
> explicit permission of the copyright holders.

This is a completely different story, for two reasons:

1. indexing by keywords: the website manager tagged those keywords, so
the content itself is not accessed;
2. web crawling is an automatic process that identifies keywords and
associates them with the URL.

This process has nothing to do with the content, unless you would
claim that the word "cataclysm" cannot be used because it belongs to
a certain copyrighted book. Moreover, the process is completely
automated and involves no human creativity. Indexing and web crawling
are also totally different processes, leading to totally different
results and aims than AI training. Forget any analogy between AI
training and Google's business, because they are completely different
things.

>
> This is the reason why people have focused so much on GitHub Copilot's
> willingness to insert large blocks of code from other projects verbatim.
> Reproducing code from other projects is less transformative and looks more
> like simple copying, and therefore opens GitHub to a legal argument that
> their AI model is not sufficiently transformative to be fair use.

Transformativeness is not the key, and neither is incorporating large
pieces of code. That is only the tip of the iceberg, the part that made
people realise their code had been used. The iceberg to deal with is
what comes before the learning process: the input collection. Here we
are: the input collection of an AI/ML training system is what we want
to keep free. Why do we want to keep the input collection free?
Because, as with compilation, we then also have the entire model in
freedom, in exchange for the right to use our code as input data.

I am pretty sure that those complaining about GitHub Copilot are not
upset because the AI is not transformative enough to disguise their
code!

Best regards, R-



Re: Brief update about software freedom and artificial intelligence

2023-02-27 Thread Russ Allbery
"Roberto A. Foglietta"  writes:

> - then I decided to protect my projects repositories as database
> (collection) in addition to the standard way to protect the code with
> a well-known license

> - because of the copyright law about databases, if someone creates a
> larger database that contains my database or a part of it, then they
> have to comply with the license that I choose to protect my project as
> a database.

In the United States, this is only true if (a) the collection is
copyrightable (let's presume that's true in this case), and (b) their use
of your collection is not fair use.  If their use of your collection is
fair use, then they do not have to comply with your license.

In other countries, I have no idea.  Presumably there is a similar set of
rules under the same or different terms to allow such things as parodies,
but the boundaries may be different and I know very little about how those
rules have been applied to software outside of the US.  My understanding
is the Berne Convention doesn't standardize the rules around fair use
(under whatever name), so this can differ a lot by jurisdiction.

> You see, it is a very simple and straightforward concept. The only two
> ways to get off this are 1. make unlawful the database copyright law,
> 2. make a law for which the training input collection is not coverable
> by the copyright law. In both cases every employer can bring to their
> home a copy of a database or a copy of AI training inputs and share it
> with all the rest of the world. Moreover, the 1. includes the 2 while
> the 2. would seriously undermine the database copyright law because
> every database could be a training set for an AI/ML engine.

> Russ, do you agree? :-)

No.  It's entirely possible that using databases as training sets for an
AI/ML engine is fair use under existing United States law and precedent as
long as that use is sufficiently transformative (the first factor of the
test, and I suspect the most important one here).  The obvious example is
a search engine, which performs a similar transformation of clearly
copyrighted works into a new service with a different purpose, without the
explicit permission of the copyright holders.

This is the reason why people have focused so much on GitHub Copilot's
willingness to insert large blocks of code from other projects verbatim.
Reproducing code from other projects is less transformative and looks more
like simple copying, and therefore opens GitHub to a legal argument that
their AI model is not sufficiently transformative to be fair use.

-- 
Russ Allbery (r...@debian.org)  



Re: Brief update about software freedom and artificial intelligence

2023-02-27 Thread Russ Allbery
"Roberto A. Foglietta"  writes:
> On Mon, 27 Feb 2023 at 07:16, Russ Allbery  wrote:

>> This is definitely not true in the United States; there is a Supreme
>> Court decision saying the exact opposite.  The ruling in Google
>> v. Oracle said Google's commercial and business use of Oracle's
>> copyrighted APIs met the test for fair use.

> It is true despite a single US case judgment.

It's not a single US court judgment.  The standard for fair use in the
United States was created by a series of Supreme Court judgments starting
with Folsom v. Marsh in 1841 and enshrined in US national law in 17
U.S.C. § 107 in 1976:

Notwithstanding the provisions of sections 106 and 106A, the fair use
of a copyrighted work, including such use by reproduction in copies or
phonorecords or by any other means specified by that section, for
purposes such as criticism, comment, news reporting, teaching
(including multiple copies for classroom use), scholarship, or
research, is not an infringement of copyright. In determining whether
the use made of a work in any particular case is a fair use the
factors to be considered shall include—

(1) the purpose and character of the use, including whether such use
is of a commercial nature or is for nonprofit educational purposes;

(2) the nature of the copyrighted work;

(3) the amount and substantiality of the portion used in relation to
the copyrighted work as a whole; and

(4) the effect of the use upon the potential market for or value of
the copyrighted work.

The fact that a work is unpublished shall not itself bar a finding of
fair use if such finding is made upon consideration of all the above
factors.

You can find this history numerous places on-line, for example:

https://law.marquette.edu/facultyblog/2022/10/the-surprisingly-confused-history-of-fair-use-is-it-a-limit-or-a-defense-or-both/

Many fair use cases in US history have been about commercial use.
Probably most, since companies with commercial uses are more likely to go
through the trouble of lawsuits.  Commercial fair use is routine within
the classic examples of fair use, such as parody and quoting for
commentary.

This is the law in the United States.  The law in other countries of
course may be quite different.  But given that many of the actors who are
relevant to a discussion of large AI models at present have a significant
locus in the United States, US law is going to play a large role.

> No court ruling was ever emitted in favour of Google vs Oracle
> leveraging fair use but it was an agreement between the two parties
> supported by Microsoft.

This is not a correct summary of the outcome of Google v. Oracle, nor is it
what the Ars Technica article you linked said.  There was no agreement
between the parties in the question before the Supreme Court.  The case
went to judgment and the Supreme Court ruled in favor of Google on fair
use grounds, mooting (and not ruling on) the question of copyrightability
of the API definitions.

Appeals like this in the US are generally over a specific question of law
and do not settle the *entire case*, so the Supreme Court then remanded
the case to trial court to dispose of the rest of the lawsuit.  I didn't
follow it after that because the details following the Supreme Court
decision are generally uninteresting since they're probably forced by the
decision.  It's quite possible that the parties mutually agreed to dismiss
the case after that decision because the decision meant Google was certain
to win.  But the Supreme Court decision was not an agreement between
parties.

This is important because in US law if the parties had reached an
agreement before the decision, the case would generally be dismissed and
thus not receive a court judgment and therefore not create precedent.
Google v. Oracle did not settle; it was decided by the Supreme Court and
therefore did create binding precedent for further district court
decisions on similar cases.

> I can reconstruct the interpretation of a law from basic principles
> otherwise it would not be a law but something that appeared from
> nothing: no any law roots, no any law authority.

If this is your approach to legal analysis, I think I will stop here,
since any further discussion along these lines is going to be pointless.

> Moreover, it does not matter how fair use is defined in many different
> legislations around the world. By copyright principle, it cannot allow
> doing activities like {business, commercial, marketing} without the
> consent of the author or of the license.

This is simply not true, and it is very good for free software that this
is not true.  One is still allowed to do reverse engineering and API
replacement under fair use even if one is doing it for business and
commercial purposes, and lots of free software development is done for
business and commercial purposes.

-- 
Russ Allbery (r...@debian.org)  

Re: From kali to debian

2023-02-27 Thread Marc Haber
On Mon, Feb 27, 2023 at 02:46:07PM +0700, Arnaud Rebillout wrote:
> In any case, answer is No. Kali Linux and Debian are different Linux
> distributions. If you want to install Debian, you need to download the
> Debian installer and install Debian on your disk, therefore erasing your
> previous Kali Linux installation. You can't jump from one distro to the
> other.

Kali is somewhat based on Debian, so changing from Kali to a Debian
distribution that is NEWER than the Debian that this Kali version is
based on should be technically possible.

This however needs vast expertise in Debian and might result in a
package combination that nobody tested, making it difficult to support
that installation. Additionally, this crossgrade path might end up on
Debian unstable, making it necessary to wait at least one Debian release
cycle to eventually end up on Debian stable.

In a nutshell, if you have to ask whether it's possible, chances are high
that your technical skills are not sufficient to end up in a usable and
supportable Debian installation. Hence, my advice to the original poster
is "don't do this".

Of course it is possible to move over the home directory's contents to
the new installation. Best restore from a backup (hint! hint!) and don't
restore the dotfiles.
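The advice to restore the home directory from a backup while leaving out the dotfiles can be sketched as below. This is a minimal Python sketch, assuming the backup is a plain directory tree reachable at some path; the function name `restore_home_without_dotfiles` is hypothetical. It skips only the hidden entries directly inside the home directory (per-user configuration such as `~/.bashrc` or `~/.config` written for the old system) and copies everything else, including hidden files nested deeper, such as a project's `.git` directory.

```python
import shutil
from pathlib import Path


def restore_home_without_dotfiles(backup_home: Path, new_home: Path) -> list[str]:
    """Copy top-level entries of backup_home into new_home, skipping dotfiles.

    Hypothetical helper: only hidden entries directly in the home directory
    are skipped; hidden entries further down the tree are copied as-is.
    Returns the names of the top-level entries that were restored.
    """
    restored = []
    new_home.mkdir(parents=True, exist_ok=True)
    for entry in sorted(backup_home.iterdir()):
        if entry.name.startswith("."):
            # Leave ~/.bashrc, ~/.config, etc. to the fresh installation.
            continue
        target = new_home / entry.name
        if entry.is_dir() and not entry.is_symlink():
            # Recursive copy; symlinks inside the tree are copied as symlinks.
            shutil.copytree(entry, target, symlinks=True, dirs_exist_ok=True)
        else:
            # Regular files keep their timestamps; top-level symlinks stay symlinks.
            shutil.copy2(entry, target, follow_symlinks=False)
        restored.append(entry.name)
    return restored
```

A call might look like `restore_home_without_dotfiles(Path("/mnt/backup/home/alice"), Path("/home/alice"))`, where both paths are purely illustrative.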

Greetings
Marc

-- 
-
Marc Haber | "I don't trust Computers. They | Mailadresse im Header
Leimen, Germany|  lose things."Winona Ryder | Fon: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax: *49 6224 1600421



Welcome new Debian Developers: sahil, jru

2023-02-27 Thread Jonathan Carter

Greetings!

Congratulations and welcome to the following new Debian
Developers who have completed the NM process, and are now full project 
members:


* Sahil Dhiman 
* Jakub Ružička 

Thank you for your contributions to Debian!

-Jonathan, Debian Project Leader




Re: Brief update about software freedom and artificial intelligence

2023-02-27 Thread Roberto A. Foglietta
On Mon, 27 Feb 2023 at 08:50, Russ Allbery  wrote:
>
> "Roberto A. Foglietta"  writes:
>
> > A totally automatic procedure like web crawling and web indexing
> > re-enter in your example, perfectly. However, the input collection that
> > a ML/AI training system needs is a protectable work because the data
> > should be structured, selected and properly labeled even if these
> > activities are done with rules like it happens using SQL for
> > databases.
>
> Yes, I agree, I think that a trained AI model is a protectable work.
> However, it is not protectable *by you* unless you're the one who wrote
> the model and chose its training.
>
> Therefore, putting a clause in your copyright license saying that if your
> work is incorporated into an AI model, that AI model as a collection is
> covered by some particular license is not really a thing you can do.  The
> best you can do is the standard GPL thing of saying that you don't have to
> license your collection under any particular license, but if you don't,
> you don't have any right to include this specific work.  Maybe that's what
> you were getting at, and I just didn't understand.
>

Dear Russ, I was completely wrong about your ability to contribute to
this discussion, because the chance you gave me to refute your thesis
is the best opportunity to pave the way for the lawyers who will one
day enforce the A/L/GPLv4 in court. So, let me explain it in a very
simple and straightforward way:

- A/L/GPLv3 applies to source code and scripts that should be compiled
or run by an interpreter

- AI/ML training engines use source code and scripts as data; this
might or might not be fair use, but it is certainly a novelty that is
not covered by the A/L/GPLv3

- then I decided to protect my projects' repositories as databases
(collections), in addition to the standard way of protecting the code
with a well-known license

- because of database copyright law, if someone creates a larger
database that contains my database or a part of it, then they have to
comply with the license that I chose to protect my project as a
database.

You see, it is a very simple and straightforward concept. The only two
ways around this are: 1. make database copyright law unlawful, or
2. pass a law under which a training input collection cannot be
covered by copyright law. In both cases every employee could take home
a copy of a database, or a copy of AI training inputs, and share it
with the rest of the world. Moreover, option 1 includes option 2,
while option 2 would seriously undermine database copyright law,
because every database could be a training set for an AI/ML engine.

Russ, do you agree? :-)

Best regards, R-