The first pass at separating the UMLS resources from ASF is ready. Basically, developers can now pick and choose the cTAKES resources by artifactId.
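As a rough illustration of the unpack step described in item 4 of the details, a module's pom.xml might carry a plugin entry along these lines. This is only a sketch, not the configuration actually committed: the execution id, the phase, and the output directory are assumptions made for the example. Only the plugin goal and the groupId come from this message.

```xml
<!-- Sketch only: id, phase and outputDirectory are assumed values -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-dependency-plugin</artifactId>
  <executions>
    <execution>
      <!-- hypothetical execution id -->
      <id>unpack-umls-resources</id>
      <!-- assumed phase; any phase before the resources are read would do -->
      <phase>generate-resources</phase>
      <goals>
        <goal>unpack-dependencies</goal>
      </goals>
      <configuration>
        <!-- restrict unpacking to the externally hosted resource artifacts -->
        <includeGroupIds>net.sourceforge.ctakesresources</includeGroupIds>
        <!-- assumed location under target so the files end up on the classpath -->
        <outputDirectory>${project.build.directory}/classes</outputDirectory>
      </configuration>
    </execution>
  </executions>
</plugin>
```

With something like this in place, a normal build would leave the Lucene indexes and other UMLS files as plain files under target, with no manual unzip needed.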
Details:
The below steps had to be done:

1) UMLS resource project(s) are left behind on SourceForge under new projects:
http://svn.code.sf.net/p/ctakesresources/code/trunk/
[New account and space created for net.sourceforge.ctakesresources]

2) New modules deployed to OSS Sonatype and Maven Central:
https://oss.sonatype.org/index.html#nexus-search;quick~ctakesresources
[New account and space created for net.sourceforge.ctakesresources]

3) The appropriate cTAKES module POMs (e.g. ctakes-dictionary-lookup/pom.xml) now just need to include:
<dependency>
  <groupId>net.sourceforge.ctakesresources</groupId>
  <artifactId>ctakes-resources-umls2011ab</artifactId>
  <version>3.0.0</version>
</dependency>

4) Finally, to make it transparent for developers, added maven-dependency-plugin:unpack-dependencies to unzip them into target. This is because things like Lucene need them to be unpacked files rather than within a jar.

4a) End users could just download the resources zip file from https://sourceforge.net/projects/ctakesresources/files/, add it to their resources folder, and provide their UMLS username/pw during execution.

Note: Only the UMLS resources have been separated for now, due to the ASF licensing incompatibilities, but other projects should be able to do the same using this mechanism.

--Pei

> -----Original Message-----
> From: Jörn Kottmann [mailto:[email protected]]
> Sent: Monday, November 05, 2012 7:42 AM
> To: [email protected]
> Subject: Re: [DISCUSS] What should we do with cTAKES resources?
>
> In my opinion we should release what we can from here at Apache and only
> the resources which have an incompatible license need to be handled
> differently, e.g. external site.
>
> Models which are trained on private clinical data can be released as long as
> the original creator decides to license them under AL 2.0. If that is done by
> a committer it should be fine to just check them in or put them on the website.
>
> The Wikipedia license is compatible and an index of it as well, but we
> probably need to have attribution for it in a NOTICE file, and maybe include
> the license in the LICENSE file.
>
> Jörn
>
> On 11/02/2012 10:46 PM, Chen, Pei wrote:
> > I think we postponed this topic previously and since the ASF code seems to
> > be in decent shape now, I think it's time to revisit this discussion for the
> > longer term.
> > Currently, we have the below resources bundled with our source code
> > and distribution
> >
> > - UMLS dictionaries (hsqldb format and in lucene indexes)
> >
> > - Models (which were okay to be released as open source) that have
> >   been trained from various clinical data
> >
> > - Wikipedia index
> >
> > What are our options as ASF source code, binaries, models,
> > dependencies all need to be compliant with ASL 2.0
> > (http://www.apache.org/legal/3party.html)
> >
> > 1) Leave things as they are, but we need to confirm with the sources
> >    and also will probably need to seek approval from Apache Legal for
> >    each of the resources
> >
> > 2) Host the resources externally such as SourceForge similar to OpenNLP
> >    models (http://opennlp.sourceforge.net/models-1.5/)
> >
> >    a. Single zip per release for users to download?
> >
> > Option 2 seems the least painful in terms of compliance.
> > Since 3.0.0-incubating, each resource has a fully qualified name/path and is
> > read from the classpath so it should be fairly easy if we decided to pull it
> > in from external sources.
> >
> > --Pei
