Ok, I'll proceed with the review process then.

-- Richard

> On 09.02.2016, at 16:32, Marshall Schor wrote:
> 
> I agree with this analysis; I think this is minimal risk.
> 
> -Marshall
> 
> On 2/9/2016 4:24 AM, Peter Klügl wrote:
>> He crawled it from this site [1] and then he modified the result by
>> removing entries or single letters.
>> 
>> I do not see any license notice. Is this a good or bad sign for us?
>> 
>> IANAL (and actually do not know much about it), but I would assume that
>> it is not problematic. There is no specific source file, and the owner
>> probably cannot claim copyright on individual first names.
>> 
>> Best,
>> 
>> Peter
>> 
>> [1] http://www.vornamen-liste.de/
>> 
>> On 09.02.2016 at 10:17, Peter Klügl wrote:
>>> I additionally sent an email to the last address I know.
>>> 
>>> On 08.02.2016 at 22:26, Richard Eckart de Castilho wrote:
>>>> The problem I see is that we currently do not know where the file comes from
>>>> (provenance). I find it hard to believe that the file was an original creation
>>>> of Stefan's. I believe that it could take quite some time to compile such a
>>>> list of names. More likely, in my opinion, the file was obtained from
>>>> some third-party source.
>>>> 
>>>> If we knew that third-party source, we might easily be able to clear IP.
>>>> 
>>>> Since we do not know it, we currently have to resort to speculation about the
>>>> lawfulness of compiling specialized unigram lists.
>>>> 
>>>> It looks like we can agree this is not a blocker for the present release, as
>>>> the risk involved is apparently very low. Still, we should try to clear this.
>>>> 
>>>> I've placed a comment on UIMA-3926 asking Stefan to shed some light on the
>>>> provenance of the file. Let's see what comes of it.
>>>> 
>>>> Thanks for digging up the issue number, Marshall!
>>>> 
>>>> Cheers,
>>>> 
>>>> -- Richard
>>>> 
>>>>> On 08.02.2016, at 21:56, Marshall Schor wrote:
>>>>> 
>>>>> So, first I'd like to summarize, in case I don't fully understand the issue.
>>>>> 
>>>>> Ruta contains some examples; the example data include a 90K file,
>>>>> FirstNames.txt, in example-projects/GermanNovels/resources.
>>>>> 
>>>>> From what I can see, there are no actual German novels included in
>>>>> example-projects/GermanNovels.
>>>>> 
>>>>> From the discussion, it seems the word lists were not originally part of the
>>>>> contribution; but in a comment on UIMA-3926 Peter asks whether the word lists
>>>>> could be contributed without the novels, and Stefan then contributed them.
>>>>> 
>>>>> I am not a lawyer, so this is not a legal opinion, but I did a quick internet
>>>>> search and believe that compiling a list of words used in a novel does not
>>>>> infringe the copyright in that novel, because this data is entirely independent
>>>>> of the expressive value of any of the underlying sources that might have been
>>>>> used to compile the list; and the list has lost any similarity to the underlying
>>>>> sources in terms of things like plot, theme, etc.
>>>>> 
>>>>> So I think the risk is low.  We could probably reduce the risk by asking Stefan
>>>>> where these lists came from, and if he is aware of any IP issues with them.
>>>>> 
>>>>> To the extent that we collect information and form opinions on issues like this,
>>>>> I recommend adding a file to the SVN, not necessarily included in the build,
>>>>> called something like license-notice-research.txt, just to record these things
>>>>> in one place, so we can find it quickly if a question comes up later and we want
>>>>> to remember what we did and why.
>>>>> 
>>>>> -Marshall
>>>>> 
>>>>> 
>>>>> On 2/8/2016 5:21 AM, Richard Eckart de Castilho wrote:
>>>>>> On 08.02.2016, at 11:11, Peter Klügl wrote:
>>>>>>> On 08.02.2016, at 10:44, Richard Eckart de Castilho wrote:
>>>>>>>> On 08.02.2016, at 10:11, Peter Klügl wrote:
>>>>>>>>> Hi,
>>>>>>>>> 
>>>>>>>>> On 07.02.2016 at 19:52, Richard Eckart de Castilho wrote:
>>>>>>>>>> Checks:
>>>>>>>>>> - compared POMs in 2.3.0 svn tag against 2.4.0 tag: no new dependencies - OK
>>>>>>>>>> - the FirstNames.txt file in GermanNovels is quite large (90k), but no
>>>>>>>>>>   source info/license for this file is given anywhere: doesn't seem OK
>>>>>>>>>> - stopping checks at this point for the moment
>>>>>>>>> What kind of source info/license would you expect? The file, together
>>>>>>>>> with the other files, was contributed as part of UIMA-3926 with an ICLA
>>>>>>>>> present. I do not remember if I knew the source of the file by then, but
>>>>>>>>> I remember that I had some conversations with the contributor about the
>>>>>>>>> files needing to be OK for a contribution. That's the reason why the
>>>>>>>>> test/dev data was not contributed: it had some CC license that was
>>>>>>>>> problematic.
>>>>>>>> The other dev/test data doesn't seem problematic at all, but the 90k names
>>>>>>>> file seems non-trivial. If it were CC, the license would need to be mentioned
>>>>>>>> in a LICENSE.txt file. My suggestion would be to simply strip the file down
>>>>>>>> to the names needed for the example.
>>>>>>> If I have to guess I'd say that the names have been crawled and that
>>>>>>> there is no original source file with a specific license.
>>>>>>> 
>>>>>>> The novels had the CC license last time I checked. I do not remember
>>>>>>> all, but when I looked it up in Apache's third party pages, it indicated
>>>>>>> that it was not possible to include them. However, I could have been wrong.
>>>>>>> 
>>>>>>> Hmm... it depends on what is needed for the example. The initial example
>>>>>>> consisted of 10-20 novels. I could strip it down to the first names of one
>>>>>>> novel that I remember being part of the dev set, but is that really necessary?
>>>>>> Let's see what Marshall thinks about it.
>>>>>> 
>>>>>> -- Richard
>> 
> 
