Re: [OSM-legal-talk] OSM for training ML machines

althio Wed, 10 Apr 2019 06:03:54 -0700

Christoph,

You may have skipped parts of my message, so excuse me if I repeat a few lines.
You quoted only two sentences, and I wonder whether you read the whole
message.
If I misread your critique, please help me by quoting the exact
part where you disagree, not only the introductory summary.



> > A typical "learned" model, based on a ML algorithm and a substantial
> > extract of OSM data:
> > That seems like a Produced Work to me.
>
> Maybe i have not been clear enough with my comment - approaching this
> matter based on gut feeling and wishful thinking (seems like...)
> without considering the practical effects is a very bad idea.

I stand by my initial assessment:
In a **typical use case** for applying ML algorithms (that is, not just
replicating the training data in bulk), I consider Produced Work the
best fit.


> You can design 'learning' algorithms to essentially replicate the
> training data so to just sweepingly declare any output of algorithms as
> having no copyright connection to training data is a recipe for disaster [...]

If you replicate a substantial amount of the original OSM data, the
output no longer qualifies as a Produced Work.
I pointed this out in my first message:
"licence for the results (outputs): **provided they are an
insubstantial extract or contain no OSM data** [...]"
and
"If the results (outputs) are used to create a new database that
contains the whole or a substantial part of the contents of the OSM
database, this new database would be considered a Derivative Database
and would trigger share-alike obligations under section 4.4.b of the
ODbL. [shameless plug of Geocoding guideline]"

I don't think my original message can be read as "sweepingly declare
any output of algorithms as having no copyright connection".
I don't think we can have a fruitful discussion if you read messages
selectively or omit important parts.


> is a recipe for disaster (if you subscribe to the spirit of the ODbL) or a
> recipe for success (if your goal is to abolish share-alike and attribution
> through the back door - which of course many corporate OSM data users would
> find highly desirable).

No other comment on this section.


> as also said concentrating exclusively on the produced work vs. derivative
> database is not really helpful,
> [snip]
> that does not mean that the output of this algorithm, [...], is not a
> derivative database.

I consider your interpretation very similar to mine.
I fail to see what you are criticizing.


> If you need an example:  Take a translator for geographic names [...].
> These names - in sufficient volume - evidently form a derivative database IMO

I agree with you.
Yet I don't see why you provide this example, or where you disagree with me.

For the record: I find the geocoding example more interesting since it
already has practical applications, it provides a parallel for the
data process of ML and it comes with a Community-LegalWG guideline.


> When considering this subject, maybe think of it less as a question of
> copying data, think of it more as a process of mimicry.

My final two cents:
Take the Geocoding guideline, replace "Geocoding" with "Machine
Learning", and you have, in my humble opinion, an acceptable first draft
for discussion.
Is this draft suitable, or are there any parts that do not hold up against
reality or practical effects?
Is there a need to take into account the type of input and output
data, and whether the output data is suitable for inclusion in a
geographical database such as OSM?
See also parts of the Horizontal Layers guideline, such as "If you
improve data used in the OpenStreetMap layer, such as additions or
factual corrections, then you need to share those improvements." Would
they apply? How could they be extended to non-map products?

-- althio
