Re: legal questions regarding machine learning models

2009-05-27 Thread Ben Finney
Mathieu Blondel  writes:

> * The model alone can be distributed under a free license.
> - As a consequence of this, neither the original data nor the program
> to build the model need to be free.

Going by the FSF definition of a free work, specifically freedoms 1 and 3
<http://www.gnu.org/philosophy/free-sw.html>, a necessary
precondition for a work to be free is for its recipients to have free
access to the source form of the work.

What does “the source form of the work” mean for these models?
Whatever the answer is, it describes something that needs to be
freely available to every recipient in order to consider the work free.

> * The DFSG is more restrictive and requires the source of any software
> in Debian.

The DFSG has different restrictions from the FSF definition, true. I
don't think it differs on this point though: free access to the source
form of the work is part of the definition of free software.

-- 
 \ “I got some new underwear the other day. Well, new to me.” —Emo |
  `\   Philips |
_o__)  |
Ben Finney


-- 
To UNSUBSCRIBE, email to debian-legal-requ...@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact listmas...@lists.debian.org



Re: legal questions regarding machine learning models

2009-05-27 Thread Mathieu Blondel
On Thu, May 28, 2009 at 5:51 AM, Francesco Poli  wrote:

>> After all, a model is just a big set of numbers.
>
> Machine code is just a long sequence of 0s and 1s...

I knew someone would come up with this :-)

Let me summarize and please correct me if I'm wrong.

* The model alone can be distributed under a free license.
- As a consequence of this, neither the original data nor the program
to build the model need to be free.

* The DFSG is more restrictive and requires the source of any software
in Debian.
- If you consider that the model is the source, as was accepted
for a picture which is a 2D rendering of a 3D model, then you can
package the model directly.

- Otherwise, it is necessary that the data are included in the source
package and the tools to build the model are in Debian main.
-> To cope with models which take too long to compute, it should be
possible to ship a pre-built architecture-independent model together
with the data. However this doesn't solve the problem that the data
may be too large to be hosted in the archive.
-> If data size becomes a problem, then one could resort to using the
non-free archive in order to ship the model only.

Thank you,
Mathieu Blondel





Appropriate use of debian-legal (was: legal questions regarding machine learning models)

2009-05-27 Thread Ben Finney
Steve Langasek  writes:

> [specific person]'s posts are an inappropriate use of this mailing
> list and not productive, and [they should] stop posting.

On what are you basing your judgement of “appropriate use of this
mailing list”? Can you give specific examples of posts you think are
inappropriate for this mailing list, and why those specific posts are
inappropriate, so that we can understand your position?

-- 
 \“To me, boxing is like a ballet, except there's no music, no |
  `\   choreography, and the dancers hit each other.” —Jack Handey |
_o__)  |
Ben Finney





Re: legal questions regarding machine learning models

2009-05-27 Thread Francesco Poli
On Wed, 27 May 2009 11:37:56 +0200 Steve Langasek wrote:

> On Wed, May 27, 2009 at 10:33:52AM +0200, Josselin Mouette wrote:
> > > Disclaimers, of course: IANADD, TINASOTODP (and IANAL, TINLA).
> 
> > If you really feel the urge to add meaningless acronyms to all your
> > emails, please do so in your signature.
> 
> Better yet:  he should recognize that the reason he needs to add all these
> acronyms is because his posts are an inappropriate use of this mailing list
> and not productive, and stop posting.

You're not new to such impolite replies, and I don't think your
reputation benefits from them.

Anyway, if disagreeing with FTP masters and expressing one's own
opinion (while *explicitly* clarifying that what is expressed is just
one's own opinion, and not necessarily the official Debian position) is
an "inappropriate use of this mailing list", then I suggest that the
list is shut down as soon as possible and that debian-le...@l.d.o is
turned into a forwarder to ftpmas...@d.o ...
That way you have the guarantee that *no* reply from debian-le...@l.d.o
can possibly include heretical and sacrilegious opinions that dare to
disagree with the FTP masters!
I am not sure that the FTP masters would be overly happy to have to
deal with all the questions that are directed to debian-le...@l.d.o,
but one does not have to care about little details like these...


I used to think that the Debian Project cared about Free Software and
maybe even about free speech, but something apparently went
wrong...  :-(

-- 
 New location for my website! Update your bookmarks!
 http://www.inventati.org/frx
. Francesco Poli .
 GnuPG key fpr == C979 F34B 27CE 5CD8 DC12  31B5 78F4 279B DD6D FCF4




Re: legal questions regarding machine learning models

2009-05-27 Thread Francesco Poli
On Wed, 27 May 2009 10:33:52 +0200 Josselin Mouette wrote:

> On Wednesday 27 May 2009 at 00:36 +0200, Francesco Poli wrote:
[...]
> > I instead think that FTP masters should change their minds about 2D
> > images rendered from 3D models.
> 
> I suggest you start your own distribution, in which you won’t ship:
>   * xfonts-* (bitmap renderings of non-free vector fonts)

Are you saying that xfonts-* are derived from non-free fonts?
How can they be DFSG-free, then?

>   * all icons shipped without SVG source

When an icon is actually created in SVG format, what's so strange about
insisting that its real source (i.e.: SVG) is shipped in the Debian
(main) source package?

>   * all pictures shipped without XCF/PSD source (oh yeah, that makes
> a lot)

Again, for pictures that are created in XCF format, the preferred form
for making modifications is the .xcf file, in most cases.
Why are you insisting that source-less works should be accepted in
Debian main?

>   * actually, all pictures that are initially photographs of an
> object (the preferred form of modification is the original
> object; if you want to see it at another angle, you need to take
> another photograph)

For photographs, the physical object is *not* the preferred form for
making modifications to the work, it's the preferred form for
*recreating* the work from scratch.
I think we have already had this discussion.  See
http://lists.debian.org/debian-legal/2008/12/msg00085.html

You may argue that the same reasoning applies to 3D models, but I think
the key difference lies in the word "preferred".  Since you cannot
transfer physical objects through a network, or copy & modify them, and
so forth, they are not preferred for making modifications to
photographs.  3D models are instead digital information that may well be
the preferred form for making modifications to a work.

Of course, in some cases, the huge size of a 3D model could well move
the preference to some other form.  As I said, it's always a
case-by-case decision, but not one that should be taken lightly, IMHO.

>   * all sound files shipped without the full genetic code of the
> speaker

As for photographs, I don't think that this is the actual source.

> 
> You could call it something like gNewSense, and you could discuss for
> hours with RMS how much better it is this way.

Naah, I disagree with RMS on a number of matters, so I don't think that
my "own distro" would be more similar to gNewSense than to Debian...

> 
> > Disclaimers, of course: IANADD, TINASOTODP (and IANAL, TINLA).
> 
> If you really feel the urge to add meaningless acronyms to all your
> emails, please do so in your signature.

Not all my messages require the same set of disclaimers, if at all.


-- 
 New location for my website! Update your bookmarks!
 http://www.inventati.org/frx
. Francesco Poli .
 GnuPG key fpr == C979 F34B 27CE 5CD8 DC12  31B5 78F4 279B DD6D FCF4




Re: legal questions regarding machine learning models

2009-05-27 Thread Francesco Poli
On Wed, 27 May 2009 11:36:55 +0200 Mark Weyer wrote:

[...]
> Extremes: I do not agree with this classification of my view.
> I value a free game for the fact that I can fool around with the source
> to make it "better". Adding features, levels, characters. If this means
> that I have to add long ears to some sprite (which is obviously generated
> from some 3D model), then I want to have access to that model and to the
> toolchain used to turn the model into the sprite. Because that is much
> more simple and robust, and creates a much more consistent set of sprite
> animation parts, than doing it with gimp on each part of each animation
> sequence individually. Free data is important for the very same reason
> that free programs are!

Exactly so.
I agree that this is the key aspect to take into account when talking
about this issue.

Unfortunately some people seem to think that getting more games (or
images, or music, or ...) is worth sacrificing the important
freedoms...  :-(

> 
> What to do: As always it is a tradeoff between quantity and quality, in
> this case of packages. Maintaining a high freeness standard has an impact
> on the resources needed, so it limits the number of costly packages that
> you can support for any given amount of available resources.
> I value Debian because (and as long as) it puts the emphasis on freeness.

100 % agreement here.
I also think that Debian *should* value Freeness standards over the mere
quantity of packages in main.

> 
> > PS:I'm CC'ing to the Debian Games Team mailing list.
> 
> Done as well, but I am not subscribed to that list.

Same here: I am subscribed to debian-legal, but not to
debian-devel-games.


-- 
 New location for my website! Update your bookmarks!
 http://www.inventati.org/frx
. Francesco Poli .
 GnuPG key fpr == C979 F34B 27CE 5CD8 DC12  31B5 78F4 279B DD6D FCF4




Re: legal questions regarding machine learning models

2009-05-27 Thread Francesco Poli
On Wed, 27 May 2009 11:25:09 +0900 Mathieu Blondel wrote:

> On Wed, May 27, 2009 at 7:36 AM, Francesco Poli wrote:
> 
> > I think that in the case of machine learning models, source form is
> > even more clearly distinct from compiled object.
> > We can consider an artificial neural network, for instance (Mathieu,
> > correct me if it's a wrong example).
> > I am under the impression that basically nobody would change connection
> > weights by hand, in order to modify a neural network.
> 
> Yes the connection weights of an artificial neural network are a good
> example of the parameters I was talking about. In practice, nobody
> would change a connection weight by hand because it's impossible to
> predict the effect of this particular weight on the overall
> performance of the model. Training algorithms are mostly clever ways
> to find a good model without trying the infinity of parameter
> combinations.

Good, this confirms my supposition.
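
The point can be illustrated with a minimal sketch, assuming a toy
perceptron rather than any real model discussed in this thread. The names
`train_perceptron`, `predict`, `AND_DATA` and `OR_DATA` are hypothetical.
The weights are never edited by hand: to get a different model, one changes
the data and reruns the training program.

```python
def train_perceptron(samples, epochs=20, lr=1.0):
    """Derive weights from labelled data; the returned list is the 'model'."""
    n_inputs = len(samples[0][0])
    weights = [0.0] * (n_inputs + 1)  # last entry is the bias term
    for _ in range(epochs):
        for inputs, label in samples:
            activation = weights[-1] + sum(w * x for w, x in zip(weights, inputs))
            predicted = 1 if activation > 0 else 0
            error = label - predicted
            # Classic perceptron update: nudge each weight towards the label.
            for i, x in enumerate(inputs):
                weights[i] += lr * error * x
            weights[-1] += lr * error
    return weights


def predict(weights, inputs):
    """Apply a trained model (a plain list of numbers) to one input."""
    activation = weights[-1] + sum(w * x for w, x in zip(weights, inputs))
    return 1 if activation > 0 else 0


# Toy training sets (assumptions for illustration only).
AND_DATA = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
OR_DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

# Same program, different data, different "big set of numbers".
and_model = train_perceptron(AND_DATA)
or_model = train_perceptron(OR_DATA)
```

Turning `and_model` into `or_model` by guessing numbers is possible in
principle for three weights, but it stops being practical as soon as the
model grows, which is why the data looks like the preferred form for
modification here.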

> So in practice yes, a model would be barely useful for
> further work on the model without the original data. In that regard,
> the original data AND the program used to train the model (this
> includes the implementations and the options passed to the algorithm)
> can be seen as the only real source.

The program used to train the model is not necessarily part of the
source, IMHO.

The GNU GPL v3 states (in Section 1):

| However, it [the "Corresponding Source" for a work] does not include
| the work's System Libraries, or general-purpose tools or generally
| available free programs which are used unmodified in performing
| those activities [generate, install, and run the object, and modify
| the work] but which are not part of the work.

> 
> But yet again, I could pretend that I just happened to find the model
> parameters by hand.

Free Software is not about pretending you are a sort of oracle who can
guess magic numbers!
Otherwise, any source availability requirement would be moot: I could
always pretend I wrote the machine code by hand, but that won't be
true, in most cases.

> After all, a model is just a big set of numbers.

Machine code is just a long sequence of 0s and 1s...

[...]
> However, this is not good on the long
> term since that makes the model dependent on the person who holds the
> data.

Definitely.

[...]
> Is it forbidden for
> someone to release an image made with Photoshop as free?

You *can* create a DFSG-free image with Adobe Photoshop.

If the source form may be read and modified with DFSG-free tools (e.g.:
The Gimp), then everything is OK and the image may be included in
Debian main.

If, on the other hand, the source form of the image may *only* be
manipulated with Photoshop and other non-free tools, then I think that
the image may still be DFSG-free, but belongs in the Debian contrib
archive, at best.

At least, this is how I understand it.

> 
> Regarding Debian packaging, I think it's a wise decision to rebuild
> the model whenever the data and the training program are free, the
> data is not too large and the computation not too long. Should
> objective criteria for what is too large and what is too long be
> decided or should that be left to the DD? Then a remaining question is
> what to do with models for which we don't have the original data or
> the original training program?

My personal take on the matter is that, in order for a package to be
included in Debian main:

 * the package must comply with the DFSG

 * source must be distributed in the source package

 * tools needed to generate (or to use) the object must be DFSG-free
   and included in Debian main

This is how I interpret Policy 2.2.1:
http://www.debian.org/doc/debian-policy/ch-archive.html#s-main

However, it is my understanding that, in some cases (e.g. long
rebuilding times), it is acceptable to also ship pre-built
(architecture-independent) objects in the source package, *along with*
the corresponding source.  One should however be extremely careful in
doing this, since it makes it harder to check and be sure that Policy
2.2.1 requirements are satisfied.
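
One way such a check could look is sketched below, assuming the build is
deterministic. Everything here is hypothetical: `build_model` is a
stand-in for whatever training tool a package would really use, and the
JSON serialization is just a convenient byte format for the example. The
idea is only that a pre-built object shipped next to its source can be
compared against a fresh rebuild.

```python
import hashlib
import json


def build_model(training_data):
    """Hypothetical stand-in for a training tool: per-label mean of the samples."""
    sums, counts = {}, {}
    for value, label in training_data:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def serialize(model):
    # sort_keys keeps the byte output independent of dict insertion order.
    return json.dumps(model, sort_keys=True).encode("utf-8")


def digest(blob):
    return hashlib.sha256(blob).hexdigest()


def prebuilt_matches_source(prebuilt_blob, training_data):
    """True if rebuilding from the shipped data reproduces the shipped object."""
    rebuilt_blob = serialize(build_model(training_data))
    return digest(prebuilt_blob) == digest(rebuilt_blob)
```

A maintainer could run a comparison like this once, on a fast machine,
instead of on every buildd. It only helps when the training step is
reproducible; a tool that uses random initialization would need a fixed
seed first.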


I hope I clarified my opinions.
As stated before, I should stress again that what I expressed above are
my own opinions.
Usual disclaimers: IANAL, TINLA, IANADD, TINASOTODP.


-- 
 New location for my website! Update your bookmarks!
 http://www.inventati.org/frx
. Francesco Poli .
 GnuPG key fpr == C979 F34B 27CE 5CD8 DC12  31B5 78F4 279B DD6D FCF4




Re: legal questions regarding machine learning models

2009-05-27 Thread Mark Weyer

I know I should not reply to polemic posts because it is just one step
short of troll-feeding, but anyway:

> I suggest you start your own distribution, in which you won’t ship:
>   * xfonts-* (bitmap renderings of non-free vector fonts)

I agree that these do not belong in a free distribution. There should be
plenty of free alternatives, ness pah?

>   * all icons shipped without SVG source 
>   * all pictures shipped without XCF/PSD source (oh yeah, that makes
> a lot)

I would handle these on a case-by-case basis. For a 64x64 icon which has
no connection to other icons (apart from what can easily be done by copy
and paste), I would say the icon itself is just as good as its source.

For SVG: Yes, the ability to scale the icon to a new resolution is very
important.

I assume that your next move will be something like "But then, we cannot
ship GNOME or KDE!". I have seen such arguments before (don't know if it
was from you, though). This is just blackmail. In the same way you could
argue for the inclusion of .
And, personally, I do not care whether GNOME or KDE are in Debian.

>   * actually, all pictures that are initially photographs of an
> object (the preferred form of modification is the original
> object; if you want to see it at another angle, you need to take
> another photograph)
>   * all sound files shipped without the full genetic code of the
> speaker

You are being ridiculous on purpose. Source, as I understand it, is
always something digital.


> You could call it something like gNewSense, and you could discuss for
> hours with RMS how much better it is this way.

Just because GNU and RMS have similar views, that does not immediately
make the view invalid. This has to be judged on a case-by-case basis.


Best regards,

  Mark Weyer





Re: legal questions regarding machine learning models

2009-05-27 Thread Steve Langasek
On Wed, May 27, 2009 at 10:33:52AM +0200, Josselin Mouette wrote:
> > Disclaimers, of course: IANADD, TINASOTODP (and IANAL, TINLA).

> If you really feel the urge to add meaningless acronyms to all your
> emails, please do so in your signature.

Better yet:  he should recognize that the reason he needs to add all these
acronyms is because his posts are an inappropriate use of this mailing list
and not productive, and stop posting.

-- 
Steve Langasek   Give me a lever long enough and a Free OS
Debian Developer   to set it on, and I can move the world.
Ubuntu Developer   http://www.debian.org/
slanga...@ubuntu.com vor...@debian.org





Re: legal questions regarding machine learning models

2009-05-27 Thread Mark Weyer
> > I agree with you. In particular, in many cases a single 3D model is used
> > to create many 2D images. If you don't have the model, you need to do
> > the modification many times.
> > And then there is the case of increasing the resolution...

> I don't know if it would be technically possible to go to those
> extremes. Having the source code of all the music and video intros for
> all the games, of all the sounds, could probably be 100 times bigger
> than the current archives. Well, you get the idea. I don't think it's
> a single package that we're talking about. I remember there was a
> thread some time ago on what would happen if we took the "having a
> whole free source and toolchain" when applied to music, and how it
> would be absolutely impossible to achieve, at least right now. Any
> idea on what to do in those situations?

That's a mixture of questions. I'll add my 2e-2 Euro to each separately.

Archive size: The case that I had in mind is that the data is purely
synthetic. In those cases the source form is negligibly small when compared
to the binary form. Especially in the cases you mention: Game intros
rendered from some 3D scene. Game music created from some music score.
Sounds which are programmed.
I assume that you have non-synthetic data in mind: Music which is actually
recorded, videos which are shot with real actors, sounds recorded from the
real world. And that what is shipped is a severely compressed form of the
original. In that case I guess one can argue that the source requirement
is void: I always understand source to be the preferred form for modifications
among the digital forms of the software. The kind of modifications I see
for e.g. music (replace the violin player by someone who actually can play
the instrument; correct a discord which is due to a typo in the score) is
impossible to achieve without rerecording, so a big digital version of the
music is just as useless as a small one.

Building time: Coming back to purely synthetic data, building time can be
a real pain. Waiting 24 hours (on fast machines) for a build is fine for
me as upstream, but not something I would want to cause to your buildd when
my software is just one out of thousands of packages. There, I do see a
practical problem.
With my upstream hat on, I will continue to ship my data under licenses
that do require source, but I will not care whether you redo the building
or whether you just copy the precompiled data which I give you. Provided
of course, that you also ship the source.

Extremes: I do not agree with this classification of my view.
I value a free game for the fact that I can fool around with the source
to make it "better". Adding features, levels, characters. If this means
that I have to add long ears to some sprite (which is obviously generated
from some 3D model), then I want to have access to that model and to the
toolchain used to turn the model into the sprite. Because that is much
more simple and robust, and creates a much more consistent set of sprite
animation parts, than doing it with gimp on each part of each animation
sequence individually. Free data is important for the very same reason
that free programs are!

What to do: As always it is a tradeoff between quantity and quality, in
this case of packages. Maintaining a high freeness standard has an impact
on the resources needed, so it limits the number of costly packages that
you can support for any given amount of available resources.
I value Debian because (and as long as) it puts the emphasis on freeness.

> PS:I'm CC'ing to the Debian Games Team mailing list.

Done as well, but I am not subscribed to that list.





Re: legal questions regarding machine learning models

2009-05-27 Thread Miriam Ruiz
2009/5/27 Mark Weyer :
>> > This looks very similar to distributing a picture which is a 2D
>> > rendering of a 3D model without distributing the original model. This is
>> > already accepted in the archive, and the reason is that a 2D picture is
>> > its own source, and can serve as a base for modified versions this way.
>>
>> I disagree with this decision by the FTP masters.
>> I personally think that, in most cases, the 2D rendering is not the
>> actual source, since many modifications would be best made by changing
>> the 3D model and re-rendering the 2D image.
>
> I agree with you. In particular, in many cases a single 3D model is used
> to create many 2D images. If you don't have the model, you need to do
> the modification many times.
> And then there is the case of increasing the resolution...

I don't know if it would be technically possible to go to those
extremes. Having the source code of all the music and video intros for
all the games, of all the sounds, could probably be 100 times bigger
than the current archives. Well, you get the idea. I don't think it's
a single package that we're talking about. I remember there was a
thread some time ago on what would happen if we took the "having a
whole free source and toolchain" when applied to music, and how it
would be absolutely impossible to achieve, at least right now. Any
idea on what to do in those situations?

Greetings,
Miry

PS:I'm CC'ing to the Debian Games Team mailing list.





Re: legal questions regarding machine learning models

2009-05-27 Thread Josselin Mouette
On Wednesday 27 May 2009 at 00:36 +0200, Francesco Poli wrote:
> > Of course, the decision is up to the FTP masters, but I think this
> > should be accepted for the sake of consistency with things we already
> > cannot decently exclude from the archive.
> 
> I instead think that FTP masters should change their minds about 2D
> images rendered from 3D models.

I suggest you start your own distribution, in which you won’t ship:
  * xfonts-* (bitmap renderings of non-free vector fonts)
  * all icons shipped without SVG source 
  * all pictures shipped without XCF/PSD source (oh yeah, that makes
a lot)
  * actually, all pictures that are initially photographs of an
object (the preferred form of modification is the original
object; if you want to see it at another angle, you need to take
another photograph)
  * all sound files shipped without the full genetic code of the
speaker

You could call it something like gNewSense, and you could discuss for
hours with RMS how much better it is this way.

> Disclaimers, of course: IANADD, TINASOTODP (and IANAL, TINLA).

If you really feel the urge to add meaningless acronyms to all your
emails, please do so in your signature.

-- 
 .''`.  Josselin Mouette
: :' :
`. `'   “I recommend you to learn English in hope that you in
  `- future understand things”  -- Jörg Schilling




Re: legal questions regarding machine learning models

2009-05-27 Thread Mark Weyer

> I mentioned Voxforge in my previous email. Their goal is to use their
> free speech data to train models with HTK and use the models with
> Julius. You can get the source code of HTK after registration on their
> website but the license has severe restrictions so HTK is not free
> software. Julius is a free software speech recognition engine that can
> use models trained with HTK. Note that HTK is pretty much THE speech
> recognition framework in the speech recognition community. If you
> consider that the ultimate source of a model is not only the data but
> also the software used to train it, then Voxforge models built with
> HTK can't be free, even though the data were free. Is it forbidden for
> someone to release an image made with Photoshop as free?

As I understand it, this depends on what you mean by "free". It is quite
possible to distribute these models under a free license, even under one
which requires distribution of source. The source code would then be the
Voxforge data plus the parameters given to HTK. It would not include the
source code of HTK, as HTK acts in this process like a compiler.
However, a corresponding Debian package would be in contrib at best (and
that only if HTK can be shipped in non-free), because the package would
have a build-dependency on HTK.
I guess, in the long run your community needs a free replacement of HTK.

Again, this is only how I understand things.
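
The reasoning above can be written down as a small decision function. This
is a sketch of the reading given in this message, not an official rule:
the function name `archive_area` and its three boolean parameters are
hypothetical, and the fallback to "non-free" for works without shipped
source is a simplifying assumption taken from the summary earlier in the
thread.

```python
def archive_area(license_is_dfsg_free, source_is_shipped, build_tools_in_main):
    """Guess the archive area for a packaged model, per the reading above."""
    if not (license_is_dfsg_free and source_is_shipped):
        # Assumption: without a free license and shipped source, main and
        # contrib are both ruled out.
        return "non-free"
    if not build_tools_in_main:
        # Free work, but it build-depends on something outside main
        # (the HTK case described above).
        return "contrib"
    return "main"


# Voxforge data plus HTK parameters as source, built with non-free HTK.
voxforge_with_htk = archive_area(
    license_is_dfsg_free=True,
    source_is_shipped=True,
    build_tools_in_main=False,
)
```

With a free replacement for HTK in main, the third argument would flip
and the same model could move to main.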

Best regards,

  Mark Weyer





Re: legal questions regarding machine learning models

2009-05-27 Thread Mark Weyer
> > This looks very similar to distributing a picture which is a 2D
> > rendering of a 3D model without distributing the original model. This is
> > already accepted in the archive, and the reason is that a 2D picture is
> > its own source, and can serve as a base for modified versions this way.
> 
> I disagree with this decision by the FTP masters.
> I personally think that, in most cases, the 2D rendering is not the
> actual source, since many modifications would be best made by changing
> the 3D model and re-rendering the 2D image.

I agree with you. In particular, in many cases a single 3D model is used
to create many 2D images. If you don't have the model, you need to do
the modification many times.
And then there is the case of increasing the resolution...

> Disclaimers, of course: IANADD, TINASOTODP (and IANAL, TINLA).

Same here.

Best regards,

  Mark Weyer

