Hi,

On 08.02.2016 at 10:44, Richard Eckart de Castilho wrote:
> On 08.02.2016, at 10:11, Peter Klügl <[email protected]> wrote:
>> Hi,
>>
>> On 07.02.2016 at 19:52, Richard Eckart de Castilho wrote:
>>> Checks:
>>> - compared POMs in 2.3.0 svn tag against 2.4.0 tag: no new dependencies - OK
>>> - the FirstNames.txt file in GermanNovels is quite large 90k, but no source
>>> info/license for this file is given anywhere: doesn't seem OK
>>> - stopping checks at this point for the moment
>> What kind of source info/license would you expect? The file together
>> with the other files was contributed as part of UIMA-3926 with an ICLA
>> present. I do not remember if I knew the source of the file by then, but
>> I remember that I had some conversations with the contributor that the
>> files need to be OK for a contribution. That's the reason why the
>> test/dev data was not contributed since it had some CC license that was
>> problematic.
> The other dev/test data doesn't seem problematic at all, but the 90k names
> file seems non-trivial. If it were CC, the license would need to be mentioned
> in a LICENSE.txt file. My suggestion would be to simply strip the file down
> to the names needed for the example.

If I had to guess, I'd say that the names were crawled and that there is
no original source file with a specific license. The novels had the CC
license last time I checked. I do not remember all the details, but when I
looked it up in Apache's third party pages, it indicated that it was not
possible to include them. However, I could have been wrong.

Hmm... it depends on what is needed for the example. The initial example
consisted of 10-20 novels. I could strip the file down to the first names
of one novel I remember to be part of the dev set, but is that really
necessary?

>>> Questions:
>>> - several files in the GermanNovels resources have the first word
>>> duplicated, they also start with a BOM - necessary?
>> The BOM is the reason why the wordlists contain duplicate entries in the
>> beginning. This is an open issue [UIMA-3778]. The BOM is not necessary,
>> but was simply not removed.
> Ok, so this is on the radar.
>
>>> - is it necessary that the GermanNovels example contains
>>> GeneratedDKPRoCoreTypes.xml - can these not be obtained through Maven? If
>>> it is necessary, provenance information would be good.
>> It was necessary when the rules were contributed, but it would be
>> possible now with some new features. I do not have the time to upgrade
>> the project (its priority is too low and it would require to change the
>> tutorial). I could add provenance information. I assume that it should
>> be in a README file but not in the NOTICE file, or is there an issue
>> concerning the DKPro type systems?
> The DKPro Core type systems are covered by ASL (although for some reason
> there are no ASL headers in the original DKPro Core XML files...). So
> in principle there is no problem, and because the original files don't have
> ASL headers, they also were not stripped by the aggregation process - again
> no problem.
>
> My understanding is that staying strict to the Apache rules, the contents
> of the NOTICE of the DKPro Core artifacts from which the types were
> obtained would need to be copied into a NOTICE within the examples
> project. If Ruta could obtain the types directly from the Maven artifacts,
> the types file and NOTICE inclusion would not be necessary.
>
> If we all agree that this should be fixed for the next release after 2.4.0,
> it'd be ok for me. I am making these comments definitely with the Apache
> hat on, not with the DKPro Core hat on.

I created an issue for it: https://issues.apache.org/jira/browse/UIMA-4789

Best,

Peter

>>> Comments:
>>> - tutorial-GermanNovels.tex is written in German, not English.
>> That was the target group of the tutorial and there was no volunteer to
>> translate it.
> Ok.
>
>>> So much so far.
>>>
>>> Cheers,
>>>
>>> -- Richard
>>>
>>>> On 29.01.2016, at 11:25, Peter Klügl <[email protected]> wrote:
>>>>
>>>> Hi,
>>>>
>>>> the third release candidate of Apache UIMA Ruta v2.4.0 is ready for voting.
>>>>
>>>> Changes rc2 -> rc3:
>>>> - UIMA-4758 - Ruta: reluctant qualifier right to left lookahead to
>>>> literal string expression matcher
>>>> - UIMA-4768 - Ruta: generic argument for aliases type interpreted as
>>>> generic feature expression
>>>>
>>>> Changes rc1 -> rc2:
>>>> - UIMA-4760 - Ruta: duplicate verbalization of type in type matcher
>>>>
>>>> General information:
>>>> This release contains many nice and useful new features and additionally
>>>> fixed many annoying bugs.
>>>> Here's a short overview of the main changes:
>>>> - Explicit referencing of annotations with variables, labels and addresses
>>>> - Helper methods for applying rules directly in Java code
>>>> - Macros for conditions and actions (prototypical)
>>>> - Limited support of UIMA arrays (prototypical)
>>>> - New action for splitting annotations
>>>> - New block for resetting match context
>>>> - Import of uimaFIT analysis engines with mandatory parameters
>>>> - Many, many bug fixes and other improvements
>>>>
>>>> Staging repository:
>>>> https://repository.apache.org/content/repositories/orgapacheuima-1083/
>>>>
>>>> SVN tag:
>>>> https://svn.apache.org/repos/asf/uima/ruta/tags/ruta-2.4.0
>>>>
>>>> Update site:
>>>> https://dist.apache.org/repos/dist/dev/uima/ruta-2.4.0-rc3/eclipse-update-site/ruta/
>>>>
>>>> Archive with all sources:
>>>> https://dist.apache.org/repos/dist/dev/uima/ruta-2.4.0-rc3/source-release/ruta-2.4.0-source-release.zip
>>>>
>>>> Overall 52 issues have been fixed for this release (one of them with
>>>> "Cannot Reproduce").
>>>> They can be found in the RELEASE_NOTES.html.
>>>>
>>>> ... and here:
>>>>
>>>> https://issues.apache.org/jira/issues/?filter=12333870&jql=project%20%3D%20UIMA%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened%2C%20Resolved%2C%20Closed)%20AND%20fixVersion%20%3D%202.4.0ruta%20AND%20component%20%3D%20ruta%20ORDER%20BY%20priority%20DESC%2C%20updated%20ASC%2C%20created%20DESC
>>>>
>>>> Please vote on release:
>>>>
>>>> [ ] +1 OK to release
>>>> [ ] 0 Don't care
>>>> [ ] -1 Not OK to release, because ...
>>>>
>>>> Thanks.
>>>>
>>>> Peter
