Re: Retrieving annotator back from analysis engine

2017-03-30 Thread James Baker
Thanks Marshall,

What I have is each annotator wrapped as a separate analysis engine
("pipeline"), and then I'm manually running each of those in turn because I
want to be able to control the order. In fact, what I'm really trying to
achieve is controlling the order in which the annotators are run, based on
information I get back from them.

Surely the analysis engine/resource specifier must have some kind of
reference back to the original class, otherwise how does it know what code
to run? Perhaps there's not a method at the moment to get back to the
original annotator, but is it stored somewhere I could get to via
reflection (accepting all the risks and bad practices that entails)?

James

On 30 March 2017 at 15:07, Marshall Schor  wrote:

> Hi James,
>
> The UIMA terminology discusses two kinds of entities:
>
>   a) Annotators - take a CAS in, operate on it, update it, etc.  These are
> the building blocks of pipelines.
>
>   b) UIMA Applications (e.g., "pipelines") made up of some collection of
> Annotators.
>
> In most UIMA applications, there might be 1 pipeline, each having a number
> of Annotators. Is this what you have?  Or are you running multiple (perhaps
> different) collections of annotators, each having its own pipeline?
>
> The produceAnalysisEngine call takes an object which is a
> ResourceSpecifier. That object is a description of the entire pipeline -
> what annotators are in it, configuration parameters, etc.  The output of
> that is an AnalysisEngine object that represents the whole pipeline.
>
> There's no reference from that AnalysisEngine object back to the
> ResourceSpecifier that was used to direct the construction of the pipeline.
>
> So, I don't think what you want to do can be done.
>
> 
>
> That being said, perhaps the high level design can be adjusted.  I'm
> wondering if two things got a bit conflated in the design - the idea of
> analysis engine "components" (e.g. Annotators) and the idea of analysis
> engines themselves (the pipelines that contain the annotators,
> configuration data, etc.)?
>
> -Marshall
>
>
> On 3/29/2017 1:11 PM, James Baker wrote:
> > In my UIMA application, I have a number of AnalysisEngines (as you might
> > expect). These were created using
> > UIMAFramework.produceAnalysisEngine(...) on my annotators, which all
> > extend MyAnnotator (which in turn extends JCasAnnotator_ImplBase).
> >
> > I want to get from the AnalysisEngine back to the original class (cast to
> > MyAnnotator) so that I can access some of the additional functions I've
> > added to the class. However, I can't seem to work out how to do that.
> > Could someone give some pointers?
> >
> > For clarity, I've included below some code of what I'm trying to achieve
> > (I'm aware that the code below doesn't work as I've tried it!)
> >
> > 
> >
> > // Get the analysis engine from wherever it is; this bit's not important
> > AnalysisEngine ae = getAnalysisEngine();
> >
> > MyAnnotator ma = (MyAnnotator) ae; // Throws ClassCastException
> > ma.callMyFunction(); // This is what I'm really trying to get to
> >
> > 
> >
> > Thanks,
> > James
> >
>
>
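
On the reflection idea raised above, here is a minimal sketch, assuming the engine implementation keeps the annotator instance in one of its own private fields. Which field that is depends on the UIMA version and is not confirmed anywhere in this thread, so the helper simply scans an object's instance fields (including inherited ones) for a value of the wanted type. FieldFinder, Wrapper and MyComponent are hypothetical stand-ins, and no UIMA classes are used.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

public class FieldFinder {

    /** Hypothetical stand-in for an annotator class with extra methods. */
    public static class MyComponent {
        public String callMyFunction() { return "called"; }
    }

    /** Hypothetical stand-in for an engine hiding its component in a private field. */
    public static class Wrapper {
        private final Object component;
        public Wrapper(Object component) { this.component = component; }
    }

    /**
     * Returns the first instance field value of {@code holder} (searching the
     * whole class hierarchy) that is an instance of {@code wanted}, or null.
     */
    public static <T> T findFieldOfType(Object holder, Class<T> wanted) {
        for (Class<?> c = holder.getClass(); c != null; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (Modifier.isStatic(f.getModifiers())) continue;
                try {
                    f.setAccessible(true);
                    Object value = f.get(holder);
                    if (wanted.isInstance(value)) return wanted.cast(value);
                } catch (IllegalAccessException | RuntimeException e) {
                    // Field not reachable (e.g. blocked by the module system): skip it.
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Wrapper engine = new Wrapper(new MyComponent());
        MyComponent found = findFieldOfType(engine, MyComponent.class);
        System.out.println(found.callMyFunction());
    }
}
```

Against a real engine the call would look like FieldFinder.findFieldOfType(ae, MyAnnotator.class). The search is only one level deep, so it returns null if the annotator sits behind another object, for example inside an aggregate or an adapter.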


Retrieving annotator back from analysis engine

2017-03-29 Thread James Baker
In my UIMA application, I have a number of AnalysisEngines (as you might
expect). These were created using UIMAFramework.produceAnalysisEngine(...)
on my annotators, which all extend MyAnnotator (which in turn extends
JCasAnnotator_ImplBase).

I want to get from the AnalysisEngine back to the original class (cast to
MyAnnotator) so that I can access some of the additional functions I've
added to the class. However, I can't seem to work out how to do that. Could
someone give some pointers?

For clarity, I've included below some code of what I'm trying to achieve
(I'm aware that the code below doesn't work as I've tried it!)



// Get the analysis engine from wherever it is; this bit's not important
AnalysisEngine ae = getAnalysisEngine();

MyAnnotator ma = (MyAnnotator) ae; // Throws ClassCastException
ma.callMyFunction(); // This is what I'm really trying to get to



Thanks,
James


Sharing Information Between Analysis Engines and Flow Controllers

2017-03-02 Thread James Baker
As part of my UIMA pipeline, I have a number of analysis engines that
produce different annotations based on their configuration. For instance,
you might configure a gazetteer annotator to annotate people in one
instance, and locations in another (potentially both within the same
pipeline). I also have a flow controller that looks at the inputs and
outputs of each analysis engine and works out the optimum order to run them
in.

However, I'm struggling to find a way for the analysis engine to pass
information about the types it consumes/produces to the flow controller. I
looked at doing it with capabilities, but that doesn't work because I am
unable to modify those capabilities (as far as I can tell) at run time to
match the configuration. I also looked at trying to use the Session object,
but it would appear that this is not shared between the FlowContext and the
UimaContext. The only thing I've found that is even remotely plausible is
that I can access the ConfigurationParameters of the analysis engine from
the flow controller, but using that will involve the flow controller having
an understanding of how each analysis engine can be configured so I'd
rather not use that if I can help it.

How can I pass information from the analysis engine to the flow controller?
Alternatively, is there a way to get from the analysis engine key to the
instance itself so that I can directly ask the analysis engine how it is
configured?

Thanks,
James
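
One framework-independent workaround for the question above is a small static registry that each analysis engine writes to during initialisation and the flow controller reads from later. This is only a sketch under two assumptions that the thread does not confirm: the engines and the flow controller run in the same JVM and class loader, and each engine knows the key it is registered under (for example because it is passed in as a configuration parameter). TypeRegistry and its methods are hypothetical names.

```java
import java.util.Collection;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Process-wide record of the types each engine consumes and produces. */
public final class TypeRegistry {
    private static final Map<String, Set<String>> INPUTS = new ConcurrentHashMap<>();
    private static final Map<String, Set<String>> OUTPUTS = new ConcurrentHashMap<>();

    private TypeRegistry() {}

    /** Called by an engine once its configuration is known. */
    public static void publish(String engineKey,
                               Collection<String> inputs,
                               Collection<String> outputs) {
        INPUTS.put(engineKey, Collections.unmodifiableSet(new LinkedHashSet<>(inputs)));
        OUTPUTS.put(engineKey, Collections.unmodifiableSet(new LinkedHashSet<>(outputs)));
    }

    /** Types the engine consumes; empty if the engine never published. */
    public static Set<String> inputsOf(String engineKey) {
        return INPUTS.getOrDefault(engineKey, Collections.emptySet());
    }

    /** Types the engine produces; empty if the engine never published. */
    public static Set<String> outputsOf(String engineKey) {
        return OUTPUTS.getOrDefault(engineKey, Collections.emptySet());
    }
}
```

A gazetteer configured for people might call TypeRegistry.publish("people-gazetteer", ...) at the end of its initialisation, and the flow controller would then look up the same key. Static state like this does not survive being split across processes, which is the main cost of the approach.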


Reordering Analysis Engines

2017-02-27 Thread James Baker
Is it possible to reorder the analysis engines in a CPE once it is created?

I have a CPE consisting of a large number of analysis engines, and I'd like
to automatically optimise the order. However, some of the parameters needed
to perform the optimisation are only calculated once the analysis engines
are initialised. So I will need to initialise the CPE and then reorder the
analysis engines. Is this possible?

Thanks,
James


Re: Reordering Analysis Engines

2017-02-27 Thread James Baker
Thanks Richard, switching over to SimplePipeline did the trick. I'll update
the GitHub repository with a working solution for reference.

Is there any information available on the advantages/disadvantages of
SimplePipeline over using the CPE? The application I'm using already uses
CPE, so I'd like to understand what the impact of moving away from that
might be.

James

On 27 February 2017 at 15:00, Richard Eckart de Castilho 
wrote:

> I don't think that CPE supports flow controllers. I would recommend you
> first try this with SimplePipeline.
>
> The CpeBuilder / CpePipeline takes an aggregate and disassembles it into
> its components and then passes each component separately to the CPE. At
> this point, the FlowController is lost. You *could* wrap your
> flow-controlled aggregate into yet another aggregate and pass that to the
> CPE, but then the CPE would try to scale it out en bloc.
>
> Cheers,
>
> -- Richard
>
> > On 27.02.2017, at 14:46, james.d.ba...@gmail.com wrote:
> >
> > I realised after sending this that actually I could do what I want with
> > a FlowController. However, I’ve been struggling to get a FlowController
> > up and running as part of my pipeline. I’ve created a simple project
> > which should run the annotators in reverse order… but it’s still running
> > them in the listed order and in fact isn’t even initialising the
> > FlowController. There are very few examples of using a FlowController
> > with UimaFIT online, so is there anyone who could cast an eye over what
> > I’ve done and help me find the issue?
> >
> > https://github.com/jamesdbaker/uima-ordering
> >
> > Thanks,
> > James
> >
> >> On 27 Feb 2017, at 11:48, james.d.ba...@gmail.com wrote:
> >>
> >> Is it possible to reorder the analysis engines in a CPE once it is
> >> created?
> >>
> >> I have a CPE consisting of a large number of analysis engines, and I'd
> >> like to automatically optimise the order. However, some of the
> >> parameters needed to perform the optimisation are only calculated once
> >> the analysis engines are initialised. So I will need to initialise the
> >> CPE and then reorder the analysis engines. Is this possible?
> >>
> >> Thanks,
> >> James
> >
>
>
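
For the ordering step itself, independent of whether it ends up inside a FlowController or in code that builds a SimplePipeline, a dependency sort over each engine's inputs and outputs is one plausible approach. The sketch below is plain Java with hypothetical names (EngineOrdering, Spec); it assumes engine keys are unique and that "optimum order" means every producer of a type runs before any consumer of that type, which may be simpler than what the thread's application needs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public final class EngineOrdering {

    /** What one engine consumes and produces (type names as strings). */
    public static final class Spec {
        final String key;
        final List<String> inputs;
        final List<String> outputs;
        public Spec(String key, List<String> inputs, List<String> outputs) {
            this.key = key; this.inputs = inputs; this.outputs = outputs;
        }
    }

    /** Kahn's algorithm: producers of a type come before its consumers. */
    public static List<String> order(List<Spec> specs) {
        Map<String, List<String>> producers = new HashMap<>();
        Map<String, Set<String>> successors = new LinkedHashMap<>();
        Map<String, Integer> indegree = new LinkedHashMap<>();
        for (Spec s : specs) {
            successors.put(s.key, new LinkedHashSet<>());
            indegree.put(s.key, 0);
            for (String t : s.outputs) {
                producers.computeIfAbsent(t, k -> new ArrayList<>()).add(s.key);
            }
        }
        // Add an edge producer -> consumer for every consumed type.
        for (Spec s : specs) {
            for (String t : s.inputs) {
                for (String p : producers.getOrDefault(t, Collections.emptyList())) {
                    if (!p.equals(s.key) && successors.get(p).add(s.key)) {
                        indegree.merge(s.key, 1, Integer::sum);
                    }
                }
            }
        }
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : indegree.entrySet()) {
            if (e.getValue() == 0) ready.add(e.getKey());
        }
        List<String> result = new ArrayList<>();
        while (!ready.isEmpty()) {
            String k = ready.poll();
            result.add(k);
            for (String n : successors.get(k)) {
                if (indegree.merge(n, -1, Integer::sum) == 0) ready.add(n);
            }
        }
        if (result.size() != specs.size()) {
            throw new IllegalStateException("cyclic dependency between engines");
        }
        return result;
    }
}
```

Engines whose inputs nobody produces are treated as having no dependencies, and a cycle is reported rather than silently broken.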


Re: UIMA DUCC - Excessive Initialize Failure

2014-11-03 Thread James Baker
We've looked at all the logs we can find, but we'll have another look and
see if there are any we missed.

Yes, there's a shared file system set up and both the data and the job
files are located on it.

On 3 November 2014 14:23, Lou DeGenaro  wrote:

> Log files may be informative.  There are system and user log files.  System
> log files are located in ducc_runtime/logs.  User log files (which may be
> more helpful here) are located in --log_directory  that was specified
> on the submit command (and can also be seen by employing the WebServer).
>
> Do you have a shared file system amongst your machines and is your user
> data located there?
>
> Lou.
>
> On Mon, Nov 3, 2014 at 8:34 AM, James Baker 
> wrote:
>
> > I'm trying, without much success, to run DUCC in multi-user-multi-machine
> > mode. We've had it running successfully in single-user-single-machine
> > mode, and have followed the installation guide to move to a
> > multi-user-multi-machine configuration, but we keep getting an Excessive
> > Initialize Failure error from JobDriverTerminateException.java.
> >
> > This has happened with both the example job and our own job. We've looked
> > through the logs and can't see anything that might suggest what is
> > causing the issue.
> >
> > Anyone got any idea how to fix the issue?
> >
> > Thanks,
> > James
> >
>


UIMA DUCC - Excessive Initialize Failure

2014-11-03 Thread James Baker
I'm trying, without much success, to run DUCC in multi-user-multi-machine
mode. We've had it running successfully in single-user-single-machine mode,
and have followed the installation guide to move to a
multi-user-multi-machine configuration, but we keep getting an Excessive
Initialize Failure error from JobDriverTerminateException.java.

This has happened with both the example job and our own job. We've looked
through the logs and can't see anything that might suggest what is causing
the issue.

Anyone got any idea how to fix the issue?

Thanks,
James


UIMA DUCC - Multi-machine Installation

2014-10-30 Thread James Baker
I've been working through the installation of UIMA DUCC, and have
successfully got it set up and running on a single machine. I'd now like to
move to running it on a cluster of machines, but it isn't clear to me from
the installation guide whether I need to install DUCC on each node, or
whether ducc_ling is the only thing that needs installing on the non-head
nodes.

Could anyone shed some light on the process please?

Thanks,
James


Re: Share database connections between annotators

2014-09-17 Thread James Baker
My annotators aren't encompassed within an aggregate AE, but are chained
together in a CPE descriptor. Is it possible to define the external
resource in the CPE descriptor for the annotators to access, rather than
having to bundle them into an aggregate in order to share the resource?

On 17 September 2014 12:16, Richard Eckart de Castilho 
wrote:

> When you have created a description in uimaFIT, call toXML on it to get the
> configuration file. Mind that the uimaFIT-generated description may contain
> many redundancies, in particular the types might be re-defined multiple
> times.
> uimaFIT doesn't take care of minimizing the description since that is not
> important for its operation.
>
> Cheers,
>
> -- Richard
>
> On 17.09.2014, at 13:00, Johannes Darms 
> wrote:
>
> > Hey James,
> >
> >> How do I ensure the resource is shared across multiple
> >> annotators? Do I give it the same key, name and/or URI? Does it need to
> >> be declared separately and then referenced somehow?
> > I'm not sure how to do it with configuration files. I create and inject
> > them using uimaFIT [1].
> >
> >
> > Regards,
> >
> > Johannes
> >
> > [1]
> https://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#d5e387
> > - Original Message -
> > From: "James Baker" 
> > To: user@uima.apache.org
> > Sent: Wednesday, September 17, 2014 12:20:54 PM
> > Subject: Re: Share database connections between annotators
> >
> > Thanks Johannes,
> >
> > I've not used External Resources before and the documentation seems to be
> > fairly limited. How do I ensure the resource is shared across multiple
> > annotators? Do I give it the same key, name and/or URI? Does it need to
> > be declared separately and then referenced somehow?
> >
> > James
> >
> > On 17 September 2014 10:57, Johannes Darms <
> > johannes.da...@scai.fraunhofer.de> wrote:
> >
> >> Hey James,
> >>
> >> I don't know if it's the best way. I would implement a connection pool
> >> as an External Resource and pass this resource to the different
> >> Annotators in your AE.
> >>
> >> Regards,
> >>
> >> Johannes
> >>
> >> [1]
> >>
> https://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#ugr.tools.uimafit.externalresources
> >>
> >> - Original Message -
> >> From: "James Baker" 
> >> To: user@uima.apache.org
> >> Sent: Wednesday, September 17, 2014 11:33:58 AM
> >> Subject: Share database connections between annotators
> >>
> >> What is the best way of sharing a database connection (in my case, to
> >> MongoDB) between annotators?
> >>
> >> Currently, I instantiate a new database connection for each annotator,
> >> but as my UIMA pipeline now has over 50 annotators (not all of which
> >> connect to the database), I am ending up with a large number of database
> >> connections. Ideally, I'd like to move to a single connection pool that
> >> can be shared by the annotators. The Mongo driver supports connection
> >> pooling, but I'm unsure how best to implement it in UIMA.
> >>
> >> Any advice would be appreciated.
> >>
> >> Thanks,
> >> James
> >>
>
>
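
Whether an external resource declared at CPE level is shared the way the thread hopes is not settled here, so as a framework-independent fallback, here is a sketch of a reference-counted holder that several annotators in the same JVM could share. SharedHandle is a hypothetical name; in the thread's case T would be the MongoDB driver's client object, which the original post says already pools connections. Nothing below uses UIMA or MongoDB classes.

```java
import java.util.function.Supplier;

/** Creates one shared instance on first use and closes it after the last user. */
public final class SharedHandle<T extends AutoCloseable> {
    private final Supplier<T> factory;
    private T instance;   // created on first acquire, closed on last release
    private int users;    // number of acquire() calls not yet released

    public SharedHandle(Supplier<T> factory) {
        this.factory = factory;
    }

    /** Returns the shared instance, creating it if nobody holds it yet. */
    public synchronized T acquire() {
        if (instance == null) {
            instance = factory.get();
        }
        users++;
        return instance;
    }

    /** Gives the instance back; the last caller triggers close(). */
    public synchronized void release() {
        if (users == 0) {
            throw new IllegalStateException("release() without acquire()");
        }
        if (--users == 0) {
            try {
                instance.close();
            } catch (Exception e) {
                throw new IllegalStateException("closing shared resource failed", e);
            } finally {
                instance = null;
            }
        }
    }
}
```

Assuming the handle lives in a static field, each annotator would call acquire() when it is initialised and release() when it is destroyed, so 50 annotators end up with one client rather than 50.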


Re: Share database connections between annotators

2014-09-17 Thread James Baker
Thanks Johannes,

I've not used External Resources before and the documentation seems to be
fairly limited. How do I ensure the resource is shared across multiple
annotators? Do I give it the same key, name and/or URI? Does it need to be
declared separately and then referenced somehow?

James

On 17 September 2014 10:57, Johannes Darms <
johannes.da...@scai.fraunhofer.de> wrote:

> Hey James,
>
> I don't know if it's the best way. I would implement a connection pool as
> an External Resource and pass this resource to the different Annotators
> in your AE.
>
> Regards,
>
> Johannes
>
> [1]
> https://uima.apache.org/d/uimafit-current/tools.uimafit.book.html#ugr.tools.uimafit.externalresources
>
> - Original Message -
> From: "James Baker" 
> To: user@uima.apache.org
> Sent: Wednesday, September 17, 2014 11:33:58 AM
> Subject: Share database connections between annotators
>
> What is the best way of sharing a database connection (in my case, to
> MongoDB) between annotators?
>
> Currently, I instantiate a new database connection for each annotator, but
> as my UIMA pipeline now has over 50 annotators (not all of which connect to
> the database), I am ending up with a large number of database connections.
> Ideally, I'd like to move to a single connection pool that can be shared by
> the annotators. The Mongo driver supports connection pooling, but I'm
> unsure how best to implement it in UIMA.
>
> Any advice would be appreciated.
>
> Thanks,
> James
>


Share database connections between annotators

2014-09-17 Thread James Baker
What is the best way of sharing a database connection (in my case, to
MongoDB) between annotators?

Currently, I instantiate a new database connection for each annotator, but
as my UIMA pipeline now has over 50 annotators (not all of which connect to
the database), I am ending up with a large number of database connections.
Ideally, I'd like to move to a single connection pool that can be shared by
the annotators. The Mongo driver supports connection pooling, but I'm
unsure how best to implement it in UIMA.

Any advice would be appreciated.

Thanks,
James


Passing additional parameters through to CPE components

2014-07-25 Thread James Baker
Is it possible to provide additional configuration parameters in a CPE
descriptor XML file that aren't specified in the annotator/collection
reader descriptor XML file?

I have a collection reader that accepts the classname of a class to use to
do the content extraction as a parameter. This works fine, but I'd like to
be able to pass additional parameters to the content extractor via the XML.
The parameters will be dependent on the content extractor though, so I
can't specify them in the collection reader descriptor. For example,
ContentExtractor1 might need a parameter 'encoding', and ContentExtractor2
might need a parameter 'baseUrl'.

I have been able to achieve this with UimaFIT by creating the collection
reader without the XML and injecting the parameters, but when I try and do
it from the XML file the parameters don't make it through to my content
extractor (I pass the UimaContext object through to the content extractor).
I suspect they might be being ignored by UIMA because they aren't in the
descriptor. How can I work around this?

Thanks,
James


Passing additional parameters through to CPE components

2014-07-24 Thread James Baker
Is it possible to provide additional configuration parameters in a CPE
descriptor XML file that aren't specified in the annotator/collection
reader descriptor XML file?

I have a collection reader that accepts the classname of a class to use to
do the content extraction as a parameter. This works fine, but I'd like to
be able to pass additional parameters to the content extractor via the XML.
The parameters will be dependent on the content extractor though, so I
can't specify them in the collection reader descriptor. For example,
ContentExtractor1 might need a parameter 'encoding', and ContentExtractor2
might need a parameter 'baseUrl'.

I have been able to achieve this with UimaFIT by creating the collection
reader without the XML and injecting the parameters, but when I try and do
it from the XML file the parameters don't make it through to my content
extractor (I pass the UimaContext object through to the content extractor).
I suspect they might be being ignored by UIMA because they aren't in the
descriptor. How can I work around this?

Thanks,
James


Re: Using Apache UIMA Ruta from my own annotator

2014-03-18 Thread James Baker
That solved it, thanks.

On Tuesday, March 18, 2014, Peter Klügl  wrote:

> Am 18.03.2014 13:56, schrieb Peter Klügl:
> > Hi,
> >
> > my first guess is that you need to add the Ruta type system
> > (BasicTypeSystem, which imports InternalTypeSystem) to your analysis
> > engine or to your type system, depending how you run your pipeline or
> > create the CAS to be processed.
> >
> > More comments below...
> >
> > Am 18.03.2014 13:37, schrieb James Baker:
> >> Hi,
> >>
> >> I have a series of UIMA Ruta rules that I wish to run from within my own
> >> UIMA annotator. This is described here, but I can't get it to work:
> >>
> http://uima.apache.org/d/ruta-current/tools.ruta.book.html#ugr.tools.ruta.integration
> > I will add a mention about the type system to the documentation.
> >
>
> Ah... there is already a mention in the documentation:
>
> "We  also take care that the Ruta basic typesystem is loaded when our
> annotator is initialized. The Ruta typesystem descriptors are available
> from ruta-core/src/main/resources/org/apache/uima/ruta/engine/"
>
> Missed that...
>
>
> Peter
>
>


Using Apache UIMA Ruta from my own annotator

2014-03-18 Thread James Baker
Hi,

I have a series of UIMA Ruta rules that I wish to run from within my own
UIMA annotator. This is described here, but I can't get it to work:
http://uima.apache.org/d/ruta-current/tools.ruta.book.html#ugr.tools.ruta.integration

When I try to run the annotator (from within a JUnit test, which I have
used with other UIMA annotators successfully in the past), I get an error
telling me that one of the Ruta basic annotation types
(org.apache.uima.ruta.type.TokenSeed) is used in the Java code but isn't
defined in the XML.

I've added the absolute path to the Ruta type system (BasicTypeSystem.xml
and InternalTypeSystem.xml) to the descriptorPaths parameter (as detailed
here:
http://uima.apache.org/d/ruta-current/tools.ruta.book.html#ugr.tools.ruta.ae.basic.parameter.descriptorPaths),
but that doesn't seem to make a difference.

I've had a look through the Ruta source code and couldn't figure out where
I was going wrong.
Has anyone successfully got a Ruta script to run from within a UIMA
annotator? How did you manage to get it working?

Thanks,
James

P.S. This question has also been posted here:
http://stackoverflow.com/questions/22479510/using-apache-uima-ruta-from-my-own-annotator


Using UIMA within an existing application

2013-03-25 Thread James Baker
I'm writing an extension to an existing application. The idea is that the
user will select a CPE descriptor and do some configuration, then the
application will run the CPE through UIMA and load the results into the
application.

I've got a lot of the parts of the code working individually, but I'm
having a few issues:

1) The CPE may include annotators that aren't on the classpath. I can find
the bin directory that contains the class files (based on the location of
each annotator's descriptor file), and create a URLClassLoader that
references that directory (and hence finds the annotators) - but I can't
figure out how to get UIMA to use that classloader.

Any ideas how I'd do this, or is there a better way of loading in
annotators that aren't on the classpath? (And no, it isn't feasible to add
the directories to the classpath beforehand.)

2) In order to do the configuration (primarily mapping the outputs from
UIMA to my application), I need to know the type system the CPE will
produce so the user can do the mapping. I can't find a way of getting the
type system before running the pipeline though.

Is this achievable through UIMA, or am I going to have to parse the XML
files myself to build the type system?

3) A much simpler question (I hope). Once I've run the process() method on
the CPE, how do I get the output? Do I have to have a Consumer in the
pipeline that will somehow pass the CAS objects to my application, or can I
get at them directly?

Thanks,
James
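
For point 1, the class loader half can be sketched in plain Java as below; AnnotatorClassLoaders is a hypothetical helper name. How the loader is then handed to UIMA is not answered in this thread. UIMA's ResourceManager has a setExtensionClassPath method that may be the more direct hook, but treat that as a pointer to check against the Javadoc rather than a tested recipe.

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.List;

public final class AnnotatorClassLoaders {

    private AnnotatorClassLoaders() {}

    /**
     * Builds a class loader over the given class-file directories, delegating
     * to {@code parent} first so framework classes are not loaded twice.
     */
    public static URLClassLoader forDirectories(List<File> binDirs, ClassLoader parent) {
        URL[] urls = new URL[binDirs.size()];
        for (int i = 0; i < urls.length; i++) {
            File dir = binDirs.get(i);
            if (!dir.isDirectory()) {
                // URLClassLoader treats URLs without a trailing '/' as JAR files,
                // and File.toURI() only appends the '/' for existing directories.
                throw new IllegalArgumentException("not a directory: " + dir);
            }
            try {
                urls[i] = dir.toURI().toURL();
            } catch (MalformedURLException e) {
                throw new IllegalArgumentException("cannot convert to URL: " + dir, e);
            }
        }
        return new URLClassLoader(urls, parent);
    }
}
```

Using the application's own class loader as the parent keeps a single copy of the UIMA classes, which avoids the ClassCastExceptions that appear when the same class is loaded by two loaders.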