Christoph,

You may have skipped parts of my message, so excuse me if I repeat a few points. You quoted only two sentences, and I wonder whether you read the whole message. If I have misread your critique, please help me by quoting the exact parts where you disagree, not just the introductory summary.
> > A typical "learned" model, based on a ML algorithm and a substantial
> > extract of OSM data:
> > That seems like a Produced Work to me.
>
> Maybe i have not been clear enough with my comment - approaching this
> matter based on gut feeling and wishful thinking (seems like...)
> without considering the practical effects is a very bad idea.

I stand by my initial assessment: in a **typical use case** for applying ML algorithms (one that does not just replicate the training data in bulk), I consider Produced Work the best fit.

> You can design 'learning' algorithms to essentially replicate the
> training data so to just sweepingly declare any output of algorithms as
> having no copyright connection to training data is a recipe for desaster [...]

If you replicate the original OSM data in substantial amounts, the output no longer qualifies as a Produced Work. I made this point in my first message: "licence for the results (outputs): **provided there are an insubstantial extract or contain no OSM data** [...]" and "If the results (outputs) are used to create a new database that contains the whole or a substantial part of the contents of the OSM database, this new database would be considered a Derivative Database and would trigger share-alike obligations under section 4.4.b of the ODbL. [shameless plug of Geocoding guideline]"

I don't think my original message can be read as "sweepingly declare any output of algorithms as having no copyright connection". I don't think we can have a fruitful discussion if messages are read selectively or important parts are left out.

> is a recipe for desaster (if you subscribe to the spirit of the OdbL) or a
> recipe for
> success (if your goal is to abolish share-alike and attribution through
> the back door - which of course many corporate OSM data users would
> find highly desirable).

No further comment on this section.

> as also said concentrating exclusively on the produced work vs. derivative
> database is not really helpful,
> [snip]
> that does not mean that the output of this algorithm, [...], is not a
> derivative database.

Your interpretation seems very similar to mine, so I fail to see what you are criticizing.

> If you need an example: Take a translator for geographic names [...].
> These names - in sufficient volume - evidently form a derivative database IMO

I agree with you. Yet I don't see why you provide this example, or where you disagree with me. For the record, I find the geocoding example more interesting: it already has practical applications, it offers a close parallel to how ML processes data, and it comes with a Community-LegalWG guideline.

> When considering this subject, maybe think of it less as a question of
> copying data, think of it more as a process of mimicry.

My final two cents: take the Geocoding guideline, replace "Geocoding" with "Machine Learning", and the result is, in my humble opinion, an acceptable first draft for discussion. Is this draft suitable, or are there parts that do not hold up against reality or practical effects? Do we need to take into account the type of input and output data, and whether the output data is suitable for inclusion in a geographical database such as OSM? See also parts of the Horizontal Layers guideline, such as "If you improve data used in the OpenStreetMap layer, such as additions or factual corrections, then you need to share those improvements." Would they apply? How could it be extended to non-map products?

-- 
althio

_______________________________________________
legal-talk mailing list
legal-talk@openstreetmap.org
https://lists.openstreetmap.org/listinfo/legal-talk