Re: Retrieving annotator back from analysis engine
Thanks Marshall,

What I have is each annotator wrapped as a separate analysis engine ("pipeline"), and then I'm manually running each of those in turn because I want to be able to control the order. In fact, what I'm really trying to achieve is controlling the order that the annotators are run in, based on information I get back from them.

Surely the analysis engine/resource specifier must have some kind of reference back to the original class, otherwise how does it know what code to run? Perhaps there's not a method at the moment to get back to the original annotator, but is it stored somewhere I could get to via reflection (accepting all the risks and bad practices that entails!)?

James

On 30 March 2017 at 15:07, Marshall Schor wrote:

> Hi James,
>
> The UIMA terminology discusses two kinds of entities:
>
> a) Annotators - take a CAS in, operate on it, update it, etc. These are the
> building blocks of pipelines.
>
> b) UIMA Applications (e.g., "pipelines") made up of some collection of
> Annotators.
>
> In most UIMA applications, there might be 1 pipeline, each having a number of
> Annotators. Is this what you have? Or are you running multiple (perhaps
> different) collections of annotators, each having its own pipeline?
>
> The produceAnalysisEngine call takes an object which is a ResourceSpecifier.
> That object is a description of the entire pipeline - what annotators are in it,
> configuration parameters, etc. The output of that is an AnalysisEngine object
> that represents the whole pipeline.
>
> There's no reference from that AnalysisEngine object back to the
> ResourceSpecifier that was used to direct the construction of the pipeline.
>
> So, I don't think what you want to do can be done.
>
> That being said, perhaps the high level design can be adjusted. I'm wondering
> if two things got a bit conflated in the design - the idea of analysis engine
> "components" (e.g. Annotators) and the idea of analysis engines themselves (the
> pipelines that contain the annotators, configuration data, etc.)?
>
> -Marshall
>
> On 3/29/2017 1:11 PM, James Baker wrote:
> > In my UIMA application, I have a number of AnalysisEngines (as you might
> > expect). These were created using UIMAFramework.produceAnalysisEngine(...)
> > on my annotators, which all extend MyAnnotator (which in turn extends
> > JCasAnnotator_ImplBase).
> >
> > I want to get from the AnalysisEngine back to the original class (cast to
> > MyAnnotator) so that I can access some of the additional functions I've
> > added to the class. However, I can't seem to work out how to do that. Could
> > someone give some pointers?
> >
> > For clarity, I've included below some code of what I'm trying to achieve
> > (I'm aware that the code below doesn't work as I've tried it!)
> >
> > AnalysisEngine ae = getAnalysisEngine(); //Get the analysis engine from wherever it is, this bit's not important
> > MyAnnotator ma = (MyAnnotator) ae; //Throws ClassCastException
> > ma.callMyFunction(); //This is what I'm really trying to get to
> >
> > Thanks,
> > James
Retrieving annotator back from analysis engine
In my UIMA application, I have a number of AnalysisEngines (as you might expect). These were created using UIMAFramework.produceAnalysisEngine(...) on my annotators, which all extend MyAnnotator (which in turn extends JCasAnnotator_ImplBase).

I want to get from the AnalysisEngine back to the original class (cast to MyAnnotator) so that I can access some of the additional functions I've added to the class. However, I can't seem to work out how to do that. Could someone give some pointers?

For clarity, I've included below some code of what I'm trying to achieve (I'm aware that the code below doesn't work, as I've tried it!):

    AnalysisEngine ae = getAnalysisEngine(); // Get the analysis engine from wherever it is; this bit's not important
    MyAnnotator ma = (MyAnnotator) ae;       // Throws ClassCastException
    ma.callMyFunction();                     // This is what I'm really trying to get to

Thanks,
James
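
Since the cast fails, one possible workaround is for each annotator to register itself under a key when it is initialised, so the application can look the instance up later. The following is only a plain-Java sketch of that idea: AnnotatorRegistry, MyAnnotatorStandIn, PersonGazetteer and the key "person-gazetteer" are all hypothetical stand-ins, none of them is UIMA API, and it assumes the annotators run in the same JVM and class loader as the calling code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class RegistryDemo {
    public static void main(String[] args) {
        // Stands in for the framework creating and initialising the annotator
        new PersonGazetteer().initialize("person-gazetteer");

        // Application side: look the instance up instead of casting the engine
        MyAnnotatorStandIn ma = AnnotatorRegistry.lookup("person-gazetteer");
        System.out.println(ma.callMyFunction());
    }
}

// Hypothetical registry shared by the annotators and the application
final class AnnotatorRegistry {
    private static final Map<String, MyAnnotatorStandIn> INSTANCES = new ConcurrentHashMap<>();

    private AnnotatorRegistry() {}

    static void register(String key, MyAnnotatorStandIn annotator) {
        INSTANCES.put(key, annotator);
    }

    // Returns null when nothing was registered under the key
    static MyAnnotatorStandIn lookup(String key) {
        return INSTANCES.get(key);
    }
}

// Hypothetical stand-in for the MyAnnotator base class mentioned in the post
abstract class MyAnnotatorStandIn {
    // A real annotator would make this call from its own initialisation code
    void initialize(String key) {
        AnnotatorRegistry.register(key, this);
    }

    abstract String callMyFunction();
}

class PersonGazetteer extends MyAnnotatorStandIn {
    @Override
    String callMyFunction() {
        return "produces Person";
    }
}
```

Each instance needs its own key, so two differently configured copies of the same annotator class would have to register under different names.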
Sharing Information Between Analysis Engines and Flow Controllers
As part of my UIMA pipeline, I have a number of analysis engines that produce different annotations based on their configuration. For instance, you might configure a gazetteer annotator to annotate people in one instance, and locations in another (potentially both within the same pipeline).

I also have a flow controller that looks at the inputs and outputs of each analysis engine and works out the optimum order to run them in. However, I'm struggling to find a way for the analysis engine to pass information about the types it consumes/produces to the flow controller.

I looked at doing it with capabilities, but that doesn't work because I am unable to modify those capabilities (as far as I can tell) at run time to match the configuration. I also looked at trying to use the Session object, but it would appear that this is not shared between the FlowContext and the UimaContext.

The only thing I've found that is even remotely plausible is that I can access the ConfigurationParameters of the analysis engine from the flow controller, but using that will involve the flow controller having an understanding of how each analysis engine can be configured, so I'd rather not use that if I can help it.

How can I pass information from the analysis engine to the flow controller? Alternatively, is there a way to get from the analysis engine key to the instance itself so that I can directly ask the analysis engine how it is configured?

Thanks,
James
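
For reference, the ordering step itself (given each engine's consumed and produced types, however they end up being obtained) can be sketched in plain Java. This is a minimal illustration under assumptions, not the flow controller from the post: Component and FlowOrdering are hypothetical names, type names are plain strings, and "optimum" is reduced to "every producer of a type runs before its consumers".

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class OrderingDemo {
    public static void main(String[] args) {
        List<Component> components = List.of(
            new Component("relations", Set.of("Person", "Location"), Set.of("Relation")),
            new Component("people", Set.of("Token"), Set.of("Person")),
            new Component("tokens", Set.of(), Set.of("Token")),
            new Component("locations", Set.of("Token"), Set.of("Location")));
        System.out.println(FlowOrdering.order(components));
    }
}

// Hypothetical description of one analysis engine: its key plus the types it reads and writes
final class Component {
    final String key;
    final Set<String> consumes;
    final Set<String> produces;

    Component(String key, Set<String> consumes, Set<String> produces) {
        this.key = key;
        this.consumes = consumes;
        this.produces = produces;
    }
}

final class FlowOrdering {
    private FlowOrdering() {}

    // Returns the component keys in an order where all producers precede their consumers
    static List<String> order(List<Component> components) {
        // For each type, how many not-yet-scheduled components still produce it
        Map<String, Integer> pending = new HashMap<>();
        for (Component c : components) {
            for (String type : c.produces) {
                pending.merge(type, 1, Integer::sum);
            }
        }

        List<Component> remaining = new ArrayList<>(components);
        List<String> ordered = new ArrayList<>();
        while (!remaining.isEmpty()) {
            Component next = null;
            for (Component c : remaining) {
                if (isReady(c, pending)) {
                    next = c;
                    break;
                }
            }
            if (next == null) {
                throw new IllegalStateException("Cyclic type dependency among the remaining components");
            }
            remaining.remove(next);
            ordered.add(next.key);
            for (String type : next.produces) {
                pending.merge(type, -1, Integer::sum);
            }
        }
        return ordered;
    }

    // Ready when no other unscheduled component still produces a type this one consumes
    private static boolean isReady(Component c, Map<String, Integer> pending) {
        for (String type : c.consumes) {
            int others = pending.getOrDefault(type, 0) - (c.produces.contains(type) ? 1 : 0);
            if (others > 0) {
                return false;
            }
        }
        return true;
    }
}
```

Types that no component produces are treated as already present in the input, and components that are ready at the same time keep their listed order.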
Reordering Analysis Engines
Is it possible to reorder the analysis engines in a CPE once it is created? I have a CPE consisting of a large number of analysis engines, and I'd like to automatically optimise the order. However, some of the parameters needed to perform the optimisation are only calculated once the analysis engines are initialised. So I will need to initialise the CPE and then reorder the analysis engines. Is this possible? Thanks, James
Re: Reordering Analysis Engines
Thanks Richard, switching over to SimplePipeline did the trick. I'll update the GitHub repository with a working solution for reference.

Is there any information available on the advantages/disadvantages of SimplePipeline over using the CPE? The application I'm using already uses CPE, so I'd like to understand what the impact of moving away from that might be.

James

On 27 February 2017 at 15:00, Richard Eckart de Castilho wrote:

> I don't think that CPE supports flow controllers. I would recommend you
> first try this with SimplePipeline.
>
> The CpeBuilder / CpePipeline takes an aggregate and disassembles it into
> its components and then passes each component separately to the CPE. At
> this point, the FlowController is lost. You *could* wrap your
> flow-controlled aggregate into yet another aggregate and pass that to the
> CPE, but then the CPE would try to scale it out en bloc.
>
> Cheers,
>
> -- Richard
>
> > On 27.02.2017, at 14:46, james.d.ba...@gmail.com wrote:
> >
> > I realised after sending this that actually I could do what I want with
> > a FlowController. However, I’ve been struggling to get a FlowController up
> > and running as part of my pipeline. I’ve created a simple project which
> > should run the annotators in reverse order… but it’s still running them in
> > the listed order and in fact isn’t even initialising the FlowController.
> > There are very few examples of using a FlowController with UimaFIT online,
> > so is there anyone who could cast an eye over what I’ve done and help me
> > find the issue?
> >
> > https://github.com/jamesdbaker/uima-ordering
> >
> > Thanks,
> > James
> >
> >> On 27 Feb 2017, at 11:48, james.d.ba...@gmail.com wrote:
> >>
> >> Is it possible to reorder the analysis engines in a CPE once it is
> >> created?
> >>
> >> I have a CPE consisting of a large number of analysis engines, and I'd
> >> like to automatically optimise the order. However, some of the parameters
> >> needed to perform the optimisation are only calculated once the analysis
> >> engines are initialised. So I will need to initialise the CPE and then
> >> reorder the analysis engines. Is this possible?
> >>
> >> Thanks,
> >> James
Re: UIMA DUCC - Excessive Initialize Failure
We've looked at all the logs we can find, but we'll have another look and see if there are any we missed.

Yes, there's a shared file system set up and both the data and the job files are located on it.

On 3 November 2014 14:23, Lou DeGenaro wrote:

> Log files may be informative. There are system and user log files. System
> log files are located in ducc_runtime/logs. User log files (which may be
> more helpful here) are located in --log_directory that was specified
> on the submit command (and can also be seen by employing the WebServer).
>
> Do you have a shared file system amongst your machines and is your user
> data located there?
>
> Lou.
>
> On Mon, Nov 3, 2014 at 8:34 AM, James Baker wrote:
>
> > I'm trying, without much success, to run DUCC in multi-user-multi-machine
> > mode. We've had it running successfully in single-user-single-machine mode,
> > and have followed the installation guide to move to a
> > multi-user-multi-machine configuration, but we keep getting an Excessive
> > Initialize Failure error from JobDriverTerminateException.java.
> >
> > This has happened with both the example job and our own job. We've looked
> > through the logs and can't see anything that might suggest what is causing
> > the issue.
> >
> > Any one got any idea how to fix the issue?
> >
> > Thanks,
> > James
UIMA DUCC - Excessive Initialize Failure
I'm trying, without much success, to run DUCC in multi-user-multi-machine mode. We've had it running successfully in single-user-single-machine mode, and have followed the installation guide to move to a multi-user-multi-machine configuration, but we keep getting an Excessive Initialize Failure error from JobDriverTerminateException.java.

This has happened with both the example job and our own job. We've looked through the logs and can't see anything that might suggest what is causing the issue.

Does anyone have any idea how to fix the issue?

Thanks,
James
UIMA DUCC - Multi-machine Installation
I've been working through the installation of UIMA DUCC, and have successfully got it set up and running on a single machine. I'd now like to move to running it on a cluster of machines, but it isn't clear to me from the installation guide as to whether I need to install DUCC on each node, or whether ducc_ling is the only thing that needs installing on the non-head nodes. Could anyone shed some light on the process please? Thanks, James
Re: Share database connections between annotators
My annotators aren't encompassed within an aggregate AE, but are chained together in a CPE descriptor. Is it possible to define the external resource in the CPE descriptor for the annotators to access, rather than having to bundle them into an aggregate in order to share the resource?

On 17 September 2014 12:16, Richard Eckart de Castilho wrote:

> When you have created a description in uimaFIT, call toXML on it to get the
> configuration file. Mind that the uimaFIT-generated description may contain
> many redundancies, in particular the types might be re-defined multiple times.
> uimaFIT doesn't take care of minimizing the description since that is not
> important for its operation.
>
> Cheers,
>
> -- Richard
>
> On 17.09.2014, at 13:00, Johannes Darms wrote:
>
> > Hey James,
> >
> >> How do I ensure the resource is shared across multiple
> >> annotators? Do I give it the same key, name and/or URI? Does it need to be
> >> declared separately and then referenced somehow?
> > I'm not sure how to do it with configuration Files. I create and inject
> > them using UIMAfit [1].
> >
> > Regards,
> >
> > Johannes
> >
> > [1] https://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#d5e387
> >
> > - Original Message -
> > From: "James Baker"
> > To: user@uima.apache.org
> > Sent: Wednesday, September 17, 2014 12:20:54 PM
> > Subject: Re: Share database connections between annotators
> >
> > Thanks Johannes,
> >
> > I've not used External Resources before and the documentation seems to be
> > fairly limited. How do I ensure the resource is shared across multiple
> > annotators? Do I give it the same key, name and/or URI? Does it need to be
> > declared separately and then referenced somehow?
> >
> > James
> >
> > On 17 September 2014 10:57, Johannes Darms <johannes.da...@scai.fraunhofer.de> wrote:
> >
> >> Hey James,
> >>
> >> I don't know if it's the best way. I would implement a connection pool as
> >> an External Resource and pass this Resource to the different Annotators
> >> in your AE.
> >>
> >> Regards,
> >>
> >> Johannes
> >>
> >> [1] https://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#ugr.tools.uimafit.externalresources
> >>
> >> - Original Message -
> >> From: "James Baker"
> >> To: user@uima.apache.org
> >> Sent: Wednesday, September 17, 2014 11:33:58 AM
> >> Subject: Share database connections between annotators
> >>
> >> What is the best way of sharing a database connection (in my case, to
> >> MongoDB) between annotators?
> >>
> >> Currently, I instantiate a new database connection for each annotator, but
> >> as my UIMA pipeline now has over 50 annotators (not all of which connect to
> >> the database), I am ending up with a large number of database connections.
> >> Ideally, I'd like to move to a single connection pool that can be shared by
> >> the annotators. The Mongo driver supports connection pooling, but I'm
> >> unsure how best to implement in UIMA.
> >>
> >> Any advice would be appreciated.
> >>
> >> Thanks,
> >> James
Re: Share database connections between annotators
Thanks Johannes,

I've not used External Resources before and the documentation seems to be fairly limited. How do I ensure the resource is shared across multiple annotators? Do I give it the same key, name and/or URI? Does it need to be declared separately and then referenced somehow?

James

On 17 September 2014 10:57, Johannes Darms <johannes.da...@scai.fraunhofer.de> wrote:

> Hey James,
>
> I don't know if it's the best way. I would implement a connection pool as
> an External Resource and pass this Resource to the different Annotators
> in your AE.
>
> Regards,
>
> Johannes
>
> [1] https://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#ugr.tools.uimafit.externalresources
>
> - Original Message -
> From: "James Baker"
> To: user@uima.apache.org
> Sent: Wednesday, September 17, 2014 11:33:58 AM
> Subject: Share database connections between annotators
>
> What is the best way of sharing a database connection (in my case, to
> MongoDB) between annotators?
>
> Currently, I instantiate a new database connection for each annotator, but
> as my UIMA pipeline now has over 50 annotators (not all of which connect to
> the database), I am ending up with a large number of database connections.
> Ideally, I'd like to move to a single connection pool that can be shared by
> the annotators. The Mongo driver supports connection pooling, but I'm
> unsure how best to implement in UIMA.
>
> Any advice would be appreciated.
>
> Thanks,
> James
Share database connections between annotators
What is the best way of sharing a database connection (in my case, to MongoDB) between annotators?

Currently, I instantiate a new database connection for each annotator, but as my UIMA pipeline now has over 50 annotators (not all of which connect to the database), I am ending up with a large number of database connections. Ideally, I'd like to move to a single connection pool that can be shared by the annotators. The Mongo driver supports connection pooling, but I'm unsure how best to implement this in UIMA.

Any advice would be appreciated.

Thanks,
James
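
The "single pool shared by all annotators" idea can be illustrated without any UIMA machinery. The sketch below is plain Java under assumptions: FakePool is a hypothetical stand-in for a driver-level pool, SharedPool and DbAnnotator are made-up names, and the URI is a placeholder. It shows one lazily created instance being handed to every annotator in the same JVM; it is not the External Resource mechanism discussed in this thread.

```java
import java.util.concurrent.atomic.AtomicInteger;

class SharedPoolDemo {
    public static void main(String[] args) {
        DbAnnotator first = new DbAnnotator();
        DbAnnotator second = new DbAnnotator();
        // Both annotators should see the very same pool object
        System.out.println(first.pool() == second.pool());
        System.out.println(FakePool.created());
    }
}

// Hypothetical stand-in for a real connection pool; it only counts how often it is constructed
final class FakePool {
    private static final AtomicInteger CREATED = new AtomicInteger();
    private final String uri;

    FakePool(String uri) {
        this.uri = uri;
        CREATED.incrementAndGet();
    }

    String uri() {
        return uri;
    }

    static int created() {
        return CREATED.get();
    }
}

// Holder idiom: the JVM initialises Holder once, on first access, in a thread-safe way
final class SharedPool {
    private SharedPool() {}

    private static final class Holder {
        // Placeholder URI; a real setup would read this from configuration
        static final FakePool INSTANCE = new FakePool("mongodb://localhost:27017");
    }

    static FakePool get() {
        return Holder.INSTANCE;
    }
}

// Hypothetical annotator that asks for the shared pool instead of opening its own connection
class DbAnnotator {
    FakePool pool() {
        return SharedPool.get();
    }
}
```

A static holder like this is tied to one class loader and one configuration, which is the main thing a framework-managed shared resource would improve on.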
Passing additional parameters through to CPE components
Is it possible to provide additional configuration parameters in a CPE descriptor XML file that aren't specified in the annotator/collection reader descriptor XML file?

I have a collection reader that accepts the classname of a class to use to do the content extraction as a parameter. This works fine, but I'd like to be able to pass additional parameters to the content extractor via the XML. The parameters will be dependent on the content extractor though, so I can't specify them in the collection reader descriptor. For example, ContentExtractor1 might need a parameter 'encoding', and ContentExtractor2 might need a parameter 'baseUrl'.

I have been able to achieve this with UimaFIT by creating the collection reader without the XML and injecting the parameters, but when I try to do it from the XML file the parameters don't make it through to my content extractor (I pass the UimaContext object through to the content extractor). I suspect they might be being ignored by UIMA because they aren't in the descriptor.

How can I work around this?

Thanks,
James
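
If undeclared parameters really are dropped, as suspected above, one possible workaround is to declare a single multi-valued string parameter on the collection reader and pack the extractor-specific settings into it as key=value entries. The sketch below only shows the parsing side in plain Java; ExtractorParams is a hypothetical helper, and the assumption is that the reader passes the resulting map to the content extractor.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class ExtractorParamsDemo {
    public static void main(String[] args) {
        // What the values of one multi-valued string parameter might look like
        String[] raw = {"encoding=UTF-8", "baseUrl=http://example.org/docs?a=b"};
        System.out.println(ExtractorParams.parse(raw));
    }
}

final class ExtractorParams {
    private ExtractorParams() {}

    // Splits each entry on its first '=' so that values may themselves contain '='
    static Map<String, String> parse(String[] entries) {
        Map<String, String> parsed = new LinkedHashMap<>();
        if (entries == null) {
            return parsed; // parameter not set: nothing to pass on
        }
        for (String entry : entries) {
            int eq = entry.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value but got: " + entry);
            }
            parsed.put(entry.substring(0, eq).trim(), entry.substring(eq + 1).trim());
        }
        return parsed;
    }
}
```

The trade-off is that the descriptor no longer documents which keys a given extractor understands, so each extractor has to validate its own settings.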
Re: Using Apache UIMA Ruta from my own annotator
That solved it, thanks.

On Tuesday, March 18, 2014, Peter Klügl wrote:

> On 18.03.2014 13:56, Peter Klügl wrote:
> > Hi,
> >
> > my first guess is that you need to add the Ruta type system
> > (BasicTypeSystem, which imports InternalTypeSystem) to your analysis
> > engine or to your type system, depending how you run your pipeline or
> > create the CAS to be processed.
> >
> > More comments below...
> >
> > On 18.03.2014 13:37, James Baker wrote:
> >> Hi,
> >>
> >> I have a series of UIMA Ruta rules that I wish to run from within my own
> >> UIMA annotator. This is described here, but I can't get it to work:
> >> http://uima.apache.org/d/ruta-current/tools.ruta.book.html#ugr.tools.ruta.integration
> > I will add a mention about the type system to the documentation.
>
> Ah... there is already a mention in the documentation:
>
> "We also take care that the Ruta basic typesystem is loaded when our
> annotator is initialized. The Ruta typesystem descriptors are available
> from ruta-core/src/main/resources/org/apache/uima/ruta/engine/"
>
> Missed that...
>
> Peter
Using Apache UIMA Ruta from my own annotator
Hi,

I have a series of UIMA Ruta rules that I wish to run from within my own UIMA annotator. This is described here, but I can't get it to work:
http://uima.apache.org/d/ruta-current/tools.ruta.book.html#ugr.tools.ruta.integration

When I try to run the annotator (from within a JUnit test, which I have used with other UIMA annotators successfully in the past), I get an error telling me that one of the Ruta basic annotation types (org.apache.uima.ruta.type.TokenSeed) is used in the Java code but isn't defined in the XML.

I've added the absolute path to the Ruta type system (BasicTypeSystem.xml and InternalTypeSystem.xml) to the descriptorPaths parameter (as detailed here: http://uima.apache.org/d/ruta-current/tools.ruta.book.html#ugr.tools.ruta.ae.basic.parameter.descriptorPaths), but that doesn't seem to make a difference. I've had a look through the Ruta source code and couldn't figure out where I was going wrong.

Has anyone successfully got a Ruta script to run from within a UIMA annotator? How did you manage to get it working?

Thanks,
James

P.S. This question has also been posted here: http://stackoverflow.com/questions/22479510/using-apache-uima-ruta-from-my-own-annotator
Using UIMA within an existing application
I'm writing an extension to an existing application. The idea is that the user will select a CPE descriptor and do some configuration, then the application will run the CPE through UIMA and load the results into the application.

I've got a lot of the parts of the code working individually, but I'm having a few issues:

1) The CPE may include annotators that aren't on the classpath. I can find the bin directory that contains the class files (based on the location of each annotator's descriptor file), and create a URLClassLoader that references that directory (and hence finds the annotators) - but I can't figure out how to get UIMA to use that classloader. Any ideas how I'd do this, or is there a better way of loading in annotators that aren't on the classpath? (And no, it isn't feasible to add the directories to the classpath beforehand.)

2) In order to do the configuration (primarily mapping the outputs from UIMA to my application), I need to know the type system the CPE will produce so the user can do the mapping. I can't find a way of getting the type system before running the pipeline though. Is this achievable through UIMA, or am I going to have to parse the XML files myself to build the type system?

3) A much simpler question (I hope). Once I've run the process() method on the CPE, how do I get the output? Do I have to have a Consumer in the pipeline that will somehow pass the CAS objects to my application, or can I get at them directly?

Thanks,
James
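
For reference, the class loader described in issue 1) can be built with nothing but the JDK. This is a minimal sketch of that step only: AnnotatorClassLoaders is a hypothetical helper name, the temporary directory stands in for a real bin directory, and handing the loader to UIMA (the actual open question) is not shown. UIMA's ResourceManager documentation on extension class paths may be worth checking for that part.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class ClassLoaderDemo {
    public static void main(String[] args) throws Exception {
        // Stand-in for a bin directory found next to an annotator's descriptor file
        Path bin = Files.createTempDirectory("annotator-bin");
        try (URLClassLoader loader = AnnotatorClassLoaders.forDirectories(
                List.of(bin), ClassLoaderDemo.class.getClassLoader())) {
            System.out.println(loader.getURLs().length);
            // Classes known to the parent loader still resolve through the new one
            System.out.println(loader.loadClass("java.lang.String") == String.class);
        }
    }
}

final class AnnotatorClassLoaders {
    private AnnotatorClassLoaders() {}

    // Builds a loader that searches the given class directories after delegating to the parent
    static URLClassLoader forDirectories(List<Path> directories, ClassLoader parent) {
        URL[] urls = new URL[directories.size()];
        for (int i = 0; i < urls.length; i++) {
            try {
                // For an existing directory, toUri() ends with '/', which is how
                // URLClassLoader tells a class directory apart from a JAR file
                urls[i] = directories.get(i).toUri().toURL();
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException("Not a usable location: " + directories.get(i), e);
            }
        }
        return new URLClassLoader(urls, parent);
    }
}
```

Passing the application's own class loader as the parent keeps the framework classes shared between the application and the dynamically loaded annotators.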