Ok, I'll proceed with the review process then. -- Richard
> On 09.02.2016, at 16:32, Marshall Schor <[email protected]> wrote:
>
> I agree with this analysis; I think this is minimal risk.
>
> -Marshall
>
> On 2/9/2016 4:24 AM, Peter Klügl wrote:
>> He crawled it from this site [1] and then he modified the result by
>> removing entries or single letters.
>>
>> I do not see any license notice. Is this a good or bad sign for us?
>>
>> IANAL (and actually do not know much about it) but I would assume that
>> it is not problematic. There is no specific source file and the owner
>> probably cannot claim copyright for single first names.
>>
>> Best,
>>
>> Peter
>>
>> [1] http://www.vornamen-liste.de/
>>
>> On 09.02.2016 at 10:17, Peter Klügl wrote:
>>> I additionally sent an email to the last address I know.
>>>
>>> On 08.02.2016 at 22:26, Richard Eckart de Castilho wrote:
>>>> The problem I see is that we currently do not know where the file
>>>> comes from (provenance). I find it hard to believe that the file was
>>>> an original creation from Stefan. I believe that it could take quite
>>>> some time to compile such a list of names. More likely, in my opinion,
>>>> the file was obtained from some third-party source.
>>>>
>>>> If we knew that third-party source, we might easily be able to clear IP.
>>>>
>>>> Since we do not know it, we currently have to resort to speculation
>>>> about the lawfulness of compiling specialized unigram lists.
>>>>
>>>> It looks like we can agree this is not a blocker for the present
>>>> release as the involved risk is apparently very low. Still, we should
>>>> try to clear this.
>>>>
>>>> I've placed a comment on UIMA-3926 asking Stefan to shed some light on
>>>> the provenance of the file. Let's see what comes of it.
>>>>
>>>> Thanks for digging up the issue number Marshall!
>>>>
>>>> Cheers,
>>>>
>>>> -- Richard
>>>>
>>>>> On 08.02.2016, at 21:56, Marshall Schor <[email protected]> wrote:
>>>>>
>>>>> So, first I'd like to summarize, in case I don't fully understand the
>>>>> issue.
>>>>>
>>>>> Ruta contains some examples; the example data include a 90K file,
>>>>> FirstNames.txt, in example-projects/GermanNovels/reosources.
>>>>>
>>>>> From what I can see, there are no actual German Novels included in the
>>>>> example-project/GermanNovels.
>>>>>
>>>>> From the discussion, it seems the word lists were not originally part
>>>>> of the contribution; but in a comment in UIMA-3926 Peter asks if the
>>>>> word list could be contributed, but not the novels, and Stefan then
>>>>> contributed them.
>>>>>
>>>>> I am not a lawyer, so this is not a legal opinion, but I did a quick
>>>>> internet search and believe that compiling a list of words used in a
>>>>> novel does not infringe the copyright in that novel, because this data
>>>>> is entirely independent of the expressive value of any of the
>>>>> underlying sources that might have been used to compile the list; and
>>>>> the list has lost any similarity to the underlying sources in terms of
>>>>> things like plot, theme, etc.
>>>>>
>>>>> So I think the risk is low. We could probably reduce the risk by
>>>>> asking Stefan where these lists came from, and if he is aware of any
>>>>> IP issues with them.
>>>>>
>>>>> To the extent that we collect information and form opinions on issues
>>>>> like this, I recommend adding a file to the SVN, not necessarily
>>>>> included in the build, called something like
>>>>> license-notice-research.txt, just to record these things in one place,
>>>>> so we can find it quickly if a question comes up later and we want to
>>>>> remember what we did and why.
>>>>>
>>>>> -Marshall
>>>>>
>>>>>
>>>>> On 2/8/2016 5:21 AM, Richard Eckart de Castilho wrote:
>>>>>> On 08.02.2016, at 11:11, Peter Klügl <[email protected]> wrote:
>>>>>>> On 08.02.2016 at 10:44, Richard Eckart de Castilho wrote:
>>>>>>>> On 08.02.2016, at 10:11, Peter Klügl <[email protected]> wrote:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> On 07.02.2016 at 19:52, Richard Eckart de Castilho wrote:
>>>>>>>>>> Checks:
>>>>>>>>>> - compared POMs in 2.3.0 svn tag against 2.4.0 tag: no new
>>>>>>>>>>   dependencies - OK
>>>>>>>>>> - the FirstNames.txt file in GermanNovels is quite large (90k),
>>>>>>>>>>   but no source info/license for this file is given anywhere:
>>>>>>>>>>   doesn't seem OK
>>>>>>>>>> - stopping checks at this point for the moment
>>>>>>>>> What kind of source info/license would you expect? The file
>>>>>>>>> together with the other files was contributed as part of UIMA-3926
>>>>>>>>> with an ICLA present. I do not remember if I knew the source of
>>>>>>>>> the file by then, but I remember that I had some conversations
>>>>>>>>> with the contributor that the files need to be OK for a
>>>>>>>>> contribution. That's the reason why the test/dev data was not
>>>>>>>>> contributed since it had some CC license that was problematic.
>>>>>>>> The other dev/test data doesn't seem problematic at all, but the
>>>>>>>> 90k names file seems non-trivial. If it were CC, the license would
>>>>>>>> need to be mentioned in a LICENSE.txt file. My suggestion would be
>>>>>>>> to simply strip the file down to the names needed for the example.
>>>>>>> If I have to guess I'd say that the names have been crawled and that
>>>>>>> there is no original source file with a specific license.
>>>>>>>
>>>>>>> The novels had the CC license last time I checked. I do not remember
>>>>>>> all, but when I looked it up in Apache's third party pages, it
>>>>>>> indicated that it was not possible to include them. However, I could
>>>>>>> have been wrong.
>>>>>>>
>>>>>>> Hmm... it depends on what is needed for the example. The initial
>>>>>>> examples were 10-20 novels. I could strip it down to the first names
>>>>>>> of one novel I remember to be part of the dev set, but is that
>>>>>>> really necessary?
>>>>>> Let's see what Marshall thinks about it.
>>>>>>
>>>>>> -- Richard
>>
>
