Hi Rodrigo,

I took your email address from the contact section of the GitHub page. Feel free to forward this to the whole team if you like.

First I want to thank the whole IXA pipes team for providing all those extensions, and especially the high-quality NER models, for OpenNLP. As a German speaker I especially enjoy using the German NER model.

With this message I want to let you know about the integration of the IXA pipe nerc module with Apache Stanbol (see STANBOL-1422 [1]). The main contribution of this integration is code that allows your models for, and extensions to, OpenNLP to be used in applications running within an OSGI environment (such as Apache Stanbol).

The remainder of this (long) mail provides details on how to make IXA pipes nerc work within an OSGI environment.

- - -

OpenNLP as such does support OSGI. However, for extensions to be usable in OSGI they need to support OSGI as well. As the OpenNLP documentation on how to achieve this is very sparse, I will provide all the necessary information in this mail, based on the example of integrating the `eus.ixa:ixa-pipe-nerc` module with Apache Stanbol.

Typically OpenNLP loads those extensions via the classpath, based on information provided by the `manifest.properties` contained in the `{model}.bin` file. Here is an example taken from the `it-clusters-evalita09.bin` file of the 1.5.0 release:

    Training-Eventhash=d41d8cd98f00b204e9800998ecf8427e
    Manifest-Version=1.0
    Language=it
    serializer-class-itwikiprecleantokc1000p1txt=eus.ixa.ixa.pipe.nerc.dict.BrownCluster$BrownClusterSerializer

The last line tells OpenNLP to use the class `eus.ixa.ixa.pipe.nerc.dict.BrownCluster$BrownClusterSerializer`, which implements the `ArtifactSerializer` interface, to load the artifact with the name `itwikiprecleantokc1000p1txt` contained in the model.

When running within OSGI, OpenNLP cannot load such extensions via the classpath, because in OSGI modules only have access to the packages they explicitly import. In OSGI, extensions are typically handled by registering them as services with the OSGI ServiceRegistry.
This is also how OpenNLP searches for extensions when it is used within an OSGI framework.

To use the IXA NERC models within an OSGI environment one needs to do two things:

1. provide an OSGI bundle
2. register the extensions as OSGI services

(1) is done by the pom.xml file of the o.a.stanbol.commons.ixa-pipe-nerc module [2]. The configuration of the `maven-bundle-plugin` does all the work.

The `Import-Package` section defines ignored, optional and required dependencies. I used this to ignore `net.sourceforge.argparse4j` and to make the dependencies on `org.jdom2` and the `opennlp.tools.cmdline` package optional, as those seem to be required only for training and not at runtime.

The `Export-Package` section defines the functionality this bundle provides to other modules. Currently all `eus.ixa.ixa.pipe.nerc` packages are exported. Note also that the `maven-bundle-plugin` copies all classes of exported packages into the bundle. This is how the code provided by `eus.ixa:ixa-pipe-nerc` gets into the bundle.

The `Private-Package` section defines the functionality that stays private to this bundle. This includes the BundleActivator I implemented to register your extensions as services (see below).

(2) The registration of the IXA pipes nerc extensions as OSGI services is done by a BundleActivator. The class implementing the BundleActivator is configured by the `Bundle-Activator` directive in the `maven-bundle-plugin` configuration.

The BundleActivator itself [3] is simple. It has two lifecycle methods, `start(BundleContext context)` and `stop(BundleContext context)`. The start method registers all the extensions as OSGI services. The stop method unregisters them.

In OSGI, services are registered under 1..n names (typically the class names of the service interfaces) together with a dictionary of parameters. Service consumers can use these parameters to query for services. OpenNLP requires the class name of the service implementation to be set as the value of the `opennlp` parameter.
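To make the connection between the model manifest and the `opennlp` service parameter concrete, here is a minimal, self-contained sketch. It is a toy model only: `ExtensionLookupSketch` and `ToyRegistry` are hypothetical names that stand in for the OSGI ServiceRegistry, the serializer instance is a placeholder object, and the lookup logic is an assumption about the general idea, not OpenNLP's or OSGI's actual code. It reads the `serializer-class-*` entries from a `manifest.properties` text and checks whether a service with a matching `opennlp` value has been registered.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Dictionary;
import java.util.Hashtable;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;

/** Toy model only: none of these types are part of OSGI or OpenNLP. */
public class ExtensionLookupSketch {

    /** Hypothetical stand-in for the OSGI ServiceRegistry. */
    public static final class ToyRegistry {
        private final List<String> names = new ArrayList<>();
        private final List<Object> services = new ArrayList<>();
        private final List<Dictionary<String, Object>> properties = new ArrayList<>();

        /** Registers a service under one interface name with a dictionary of parameters. */
        public void register(String interfaceName, Object service, Dictionary<String, Object> props) {
            names.add(interfaceName);
            services.add(service);
            properties.add(props);
        }

        /** Returns the first service under interfaceName whose `opennlp` parameter equals id, or null. */
        public Object find(String interfaceName, String id) {
            for (int i = 0; i < names.size(); i++) {
                if (names.get(i).equals(interfaceName) && id.equals(properties.get(i).get("opennlp"))) {
                    return services.get(i);
                }
            }
            return null;
        }
    }

    /** Collects the class names of all `serializer-class-*` entries, ordered by key. */
    public static List<String> serializerClasses(String manifestText) {
        Properties manifest = new Properties();
        try {
            manifest.load(new StringReader(manifestText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> classes = new ArrayList<>();
        for (String key : new TreeSet<>(manifest.stringPropertyNames())) {
            if (key.startsWith("serializer-class-")) {
                classes.add(manifest.getProperty(key));
            }
        }
        return classes;
    }

    public static void main(String[] args) {
        String serializer = "eus.ixa.ixa.pipe.nerc.dict.BrownCluster$BrownClusterSerializer";
        String manifestText = "Manifest-Version=1.0\n"
                + "Language=it\n"
                + "serializer-class-itwikiprecleantokc1000p1txt=" + serializer + "\n";
        String serviceName = "opennlp.tools.util.model.ArtifactSerializer";

        // What a bundle would do on start: register under the interface name,
        // with the implementation class name as the `opennlp` parameter.
        ToyRegistry registry = new ToyRegistry();
        Dictionary<String, Object> props = new Hashtable<>();
        props.put("opennlp", serializer);
        registry.register(serviceName, new Object(), props); // placeholder instance

        // What a consumer would do: resolve every class named in the manifest.
        for (String className : serializerClasses(manifestText)) {
            boolean found = registry.find(serviceName, className) != null;
            System.out.println(className + " -> " + (found ? "resolved" : "missing"));
        }
    }
}
```

The point of the sketch is that the value stored under `opennlp` has to match the class name written into the model's manifest exactly, including the `$` of a nested class; otherwise the lookup comes back empty.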
So a typical ServiceRegistration looks like:

    Dictionary<String,Object> prop = new Hashtable<String,Object>();
    // OpenNLP looks the extension up by its implementation class name
    prop.put("opennlp", Word2VecClusterSerializer.class.getName());
    registeredServices.add(context.registerService(
        ArtifactSerializer.class.getName(),
        new Word2VecClusterSerializer(), prop));

Here `registeredServices` is a set holding references to all registered services, which are needed to unregister them in the `stop(BundleContext context)` method.

With this in place it is now possible to successfully load all the IXA pipe nerc models of the 1.5.0 release with OpenNLP when running in an OSGI environment.

If you would like to directly support OSGI in one of your next releases, feel free to ask. I am sure I can find some time to help out with bringing this feature directly to the IXA pipe nerc (and possibly the other) modules.

best
Rupert

[1] https://issues.apache.org/jira/browse/STANBOL-1422
[2] http://svn.apache.org/repos/asf/stanbol/trunk/commons/ixa-pipe-nerc/pom.xml
[3] http://svn.apache.org/repos/asf/stanbol/trunk/commons/ixa-pipe-nerc/src/main/java/org/apache/stanbol/commons/ixa/pipe/nerc/Activator.java

--
| Rupert Westenthaler    rupert.westentha...@gmail.com
| Bodenlehenstraße 11    ++43-699-11108907
| A-5500 Bischofshofen