Re: Retrieving annotator back from analysis engine

2017-03-31 Thread james . d . baker
I think the issue with both of those approaches is that the information I need 
is only available after the annotator has been initialised, so I need access to 
the actual initialised instance of the class.

In the case of the Flow Controller, I think I end up in the same situation 
where I don’t have direct access to the annotator; and in the case of the Map, 
as the initialisation is done within produceAnalysisEngine I’m still not sure 
if I have a route back to the initialised instance of class (unless I can get 
that from the ResourceSpecifier)?

I guess I could perhaps add some code in the initialise method that stores the 
required information at an application level which I could then associate by 
name (is there a way to get the analysis engine name from within the 
annotator?) with the analysis engine, if there’s no native way within UIMA to 
do it.
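
A minimal sketch of that application-level store, for reference. It assumes each annotator is given a unique name by the application (for example through a configuration parameter); whether UIMA exposes the engine name inside the annotator is the open question above. `AnnotatorRegistry` and `DemoAnnotator` are hypothetical names, and nothing here uses UIMA classes.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical application-level registry; none of these names come from UIMA.
final class AnnotatorRegistry {
    private static final Map<String, Object> BY_NAME = new ConcurrentHashMap<>();

    private AnnotatorRegistry() {}

    // Called by the annotator at the end of its own initialise step.
    static void register(String engineName, Object annotator) {
        BY_NAME.put(engineName, annotator);
    }

    // Look up the initialised instance and cast it to the expected type.
    static <T> Optional<T> lookup(String engineName, Class<T> type) {
        Object found = BY_NAME.get(engineName);
        return type.isInstance(found) ? Optional.of(type.cast(found)) : Optional.empty();
    }

    // Stand-in for MyAnnotator: the value only exists after initialisation.
    static final class DemoAnnotator {
        private int cost = -1;

        void initialise(String engineName) {
            cost = engineName.length(); // placeholder for information computed at start-up
            AnnotatorRegistry.register(engineName, this);
        }

        int cost() {
            return cost;
        }
    }

    public static void main(String[] args) {
        new DemoAnnotator().initialise("tokeniser");
        int cost = lookup("tokeniser", DemoAnnotator.class)
                .map(DemoAnnotator::cost)
                .orElse(-1);
        System.out.println("cost = " + cost);
    }
}
```

The application would then call `lookup` with the same name it used when building the analysis engine.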

Out of interest, how big would the modification to UIMA be to allow what I’m 
trying to do? Are we talking a significant change, or is it small enough that 
if I were to put it in as a feature request someone might implement it?

James

> On 30 Mar 2017, at 22:26, Marshall Schor  wrote:
> 
> Hi James,
> 
> Although you might be able to find some chain of references to eventually get
> what you want, it would end up being very brittle, subject to change from
> release to release without warning etc.
> 
> A more general approach, that's immune to these would be to just create a user
> map in your application code which is calling produceAnalysisEngine, so at the
> next line it adds to this map a key (which is the analysis engine) and a value
> (which is the resourceSpecifier used to create the Analysis Engine) (which in
> your special case, is also the subclass of MyAnnotator, as I understand it).
> 
> Then whenever you need to go from the Analysis Engine to the MyAnnotator
> instance, you just look it up in this map.
> 
> -Marshall
> 
> 
> On 3/30/2017 10:24 AM, James Baker wrote:
>> Thanks Marshall,
>> 
>> What I have is each annotator wrapped as a separate analysis engine
>> ("pipeline"), and then I'm manually running each of those in turn because I
>> want to be able to control the order. In fact, what I'm really trying to
>> achieve is controlling the order that the annotators are run in, based on
>> information I get back from them.
>> 
>> Surely the analysis engine/resource specifier must have some kind of
>> reference back to the original class, otherwise how does it know what code
>> to run? Perhaps there's not a method at the moment to get back to the
>> original annotator, but is it stored somewhere I could get to via
>> reflection (accepting all the risks and bad practices that entails!)
>> 
>> James
>> 
>> On 30 March 2017 at 15:07, Marshall Schor  wrote:
>> 
>>> Hi James,
>>> 
>>> The UIMA terminology discusses two kinds of entities:
>>> 
>>>  a) Annotators - take a CAS in, operate on it, update it, etc.  These are the
>>> building blocks of pipelines.
>>> 
>>>  b) UIMA Applications (e.g., "pipelines") made up of some collection of
>>> Annotators.
>>> 
>>> In most UIMA applications, there is typically one pipeline, made up of a
>>> number of Annotators. Is this what you have?  Or are you running multiple
>>> (perhaps different) collections of annotators, each having its own pipeline?
>>> 
>>> The produceAnalysisEngine call takes an object which is a ResourceSpecifier.
>>> That object is a description of the entire pipeline - what annotators are in it,
>>> configuration parameters, etc.  The output of that is an AnalysisEngine object
>>> that represents the whole pipeline.
>>> 
>>> There's no reference from that AnalysisEngine object back to the
>>> ResourceSpecifier that was used to direct the construction of the pipeline.
>>> 
>>> So, I don't think what you want to do can be done.
>>> 
>>> 
>>> 
>>> That being said, perhaps the high level design can be adjusted.  I'm wondering
>>> if two things got a bit conflated in the design - the idea of analysis engine
>>> "components" (e.g. Annotators) and the idea of analysis engines themselves (the
>>> pipelines that contain the annotators, configuration data, etc.)?
>>> 
>>> -Marshall
>>> 
>>> 
>>> On 3/29/2017 1:11 PM, James Baker wrote:
>>>> In my UIMA application, I have a number of AnalysisEngines (as you might
>>>> expect). These were created using UIMAFramework.produceAnalysisEngine(...)
>>>> on my annotators, which all extend MyAnnotator (which in turn extends
>>>> JCasAnnotator_ImplBase).
>>>> 
>>>> I want to get from the AnalysisEngine back to the original class (cast to
>>>> MyAnnotator) so that I can access some of the additional functions I've
>>>> added to the class. However, I can't seem to work out how to do that. Could
>>>> someone give some pointers?
>>>> 
>>>> For clarity, I've included below some code of what I'm trying to achieve
>>>> (I'm aware that the code below doesn't work as I've tried it!)
>>>> 
>>>> -
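
Marshall's user-map suggestion from earlier in this thread could be sketched roughly as follows. This is an illustration only: `EngineOrigins` is a hypothetical helper, `E` stands in for the AnalysisEngine type and `S` for whatever the application passed to produceAnalysisEngine. As noted at the top of the thread, it records what was known at creation time, not the initialised annotator instance.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Map;

// Hypothetical helper: E stands in for AnalysisEngine, S for whatever was used to build it.
final class EngineOrigins<E, S> {
    // Identity semantics: two engines share a key only if they are the same object.
    private final Map<E, S> origins = Collections.synchronizedMap(new IdentityHashMap<>());

    // Call on the line right after the engine is produced.
    E remember(E engine, S source) {
        origins.put(engine, source);
        return engine;
    }

    // Returns null if the engine was never recorded.
    S originOf(E engine) {
        return origins.get(engine);
    }

    public static void main(String[] args) {
        EngineOrigins<Object, String> origins = new EngineOrigins<>();
        Object engine = origins.remember(new Object(), "specifier for a MyAnnotator subclass");
        System.out.println(origins.originOf(engine));
    }
}
```

An identity map is used so that the lookup does not depend on how the engine class implements equals and hashCode.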

Re: Reordering Analysis Engines

2017-02-27 Thread james . d . baker
Thanks Richard,

I don’t think we’re using any multi-threading bits at the moment, so it sounds 
like the SimplePipeline might be ok for the time being. If we do decide to move 
to a multi-threaded system in the future, presumably implementing our own is 
the best way forwards, or is there an alternative to CPE/SimplePipeline that 
I’ve missed?

James

> On 27 Feb 2017, at 16:35, Richard Eckart de Castilho  wrote:
> 
> On 27.02.2017, at 16:39, James Baker  wrote:
>> 
>> Thanks Richard, switching over to SimplePipeline did the trick. I'll update
>> the GitHub repository with a working solution for reference.
>> 
>> Is there any information available on the advantages/disadvantages of
>> SimplePipeline over using the CPE? The application I'm using already uses
>> CPE, so I'd like to understand what the impact of moving away from that
>> might be.
> 
> SimplePipeline is just a basic single-threaded thing. What it does internally
> is basically create an aggregate from all the engines that it receives,
> create a CAS, and use that CAS to loop over the collection reader and
> all the engines. No extra threads are created and nothing is parallelized.
> 
> CpePipeline and CpeBuilder make use of the rather deprecated UIMA CPE [1].
> They are very simple wrappers around CPE mimicking the API of SimplePipeline.
> CpePipeline configures CPE to scale up the analysis engines creating one
> parallel instance per CPU core (reserving one core for your other work).
> With CpeBuilder, you have a bit more control over the settings, e.g. you 
> can change the number of threads to use and you can post-process the
> CpeDescription if you want that.
> 
> Cheers,
> 
> -- Richard
> 
> [1] 
> https://uima.apache.org/d/uimaj-2.9.0/references.html#ugr.ref.xml.cpe_descriptor
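
The single-threaded loop Richard describes can be pictured with the stand-in sketch below. It is not uimaFIT code: a StringBuilder plays the role of the CAS, an Iterator<String> plays the collection reader, and each Consumer<StringBuilder> plays an engine. All names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Consumer;

// Stand-in sketch only; no UIMA or uimaFIT types are used.
final class SequentialPipelineSketch {
    static List<String> run(Iterator<String> reader, List<Consumer<StringBuilder>> engines) {
        List<String> results = new ArrayList<>();
        StringBuilder cas = new StringBuilder(); // one "CAS", reused for every document
        while (reader.hasNext()) {
            cas.setLength(0);                    // reset before loading the next document
            cas.append(reader.next());
            for (Consumer<StringBuilder> engine : engines) {
                engine.accept(cas);              // engines run in list order on the calling thread
            }
            results.add(cas.toString());
        }
        return results;
    }

    public static void main(String[] args) {
        List<Consumer<StringBuilder>> engines = Arrays.asList(
                cas -> cas.append(" [tokenised]"),
                cas -> cas.append(" [tagged]"));
        run(Arrays.asList("doc one", "doc two").iterator(), engines)
                .forEach(System.out::println);
    }
}
```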



Re: Reordering Analysis Engines

2017-02-27 Thread james . d . baker
I realised after sending this that actually I could do what I want with a 
FlowController. However, I’ve been struggling to get a FlowController up and 
running as part of my pipeline. I’ve created a simple project which should run 
the annotators in reverse order… but it’s still running them in the listed 
order and in fact isn’t even initialising the FlowController. There are very 
few examples of using a FlowController with UimaFIT online, so is there anyone 
who could cast an eye over what I’ve done and help me find the issue?
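
To make the intended behaviour concrete, here is a stand-in sketch of the ordering decision only. It deliberately uses no UIMA types, so it says nothing about why the FlowController is not being initialised; `ReverseFlowSketch` is a hypothetical class that hands out delegate keys in the reverse of their declared order, one per call.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;
import java.util.Optional;

// Stand-in for a per-document flow object; not a UIMA class.
final class ReverseFlowSketch {
    private final Deque<String> remaining = new ArrayDeque<>();

    // declaredOrder: delegate keys in the order the aggregate lists them.
    ReverseFlowSketch(List<String> declaredOrder) {
        for (String key : declaredOrder) {
            remaining.push(key); // push adds at the head, so the last declared key is served first
        }
    }

    // Key of the next delegate to run, or empty when the document is finished.
    Optional<String> next() {
        return Optional.ofNullable(remaining.poll());
    }

    public static void main(String[] args) {
        ReverseFlowSketch flow = new ReverseFlowSketch(Arrays.asList("first", "second", "third"));
        for (Optional<String> step = flow.next(); step.isPresent(); step = flow.next()) {
            System.out.println("run " + step.get());
        }
    }
}
```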

https://github.com/jamesdbaker/uima-ordering

Thanks,
James

> On 27 Feb 2017, at 11:48, james.d.ba...@gmail.com wrote:
> 
> Is it possible to reorder the analysis engines in a CPE once it is created?
> 
> I have a CPE consisting of a large number of analysis engines, and I'd like 
> to automatically optimise the order. However, some of the parameters needed 
> to perform the optimisation are only calculated once the analysis engines are 
> initialised. So I will need to initialise the CPE and then reorder the 
> analysis engines. Is this possible?
> 
> Thanks,
> James



Reordering Analysis Engines

2017-02-27 Thread james . d . baker
Is it possible to reorder the analysis engines in a CPE once it is created?

I have a CPE consisting of a large number of analysis engines, and I'd like to 
automatically optimise the order. However, some of the parameters needed to 
perform the optimisation are only calculated once the analysis engines are 
initialised. So I will need to initialise the CPE and then reorder the analysis 
engines. Is this possible?
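
In plain Java terms, what I am after looks roughly like the sketch below. The `Engine` class is a hypothetical stand-in (not a UIMA type) whose cost estimate is only available after initialisation, and "cheapest first" is just an assumed optimisation rule for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class ReorderAfterInitSketch {
    // Hypothetical engine wrapper: estimatedCost() is only meaningful after initialise().
    static final class Engine {
        final String name;
        private final int costWhenReady;
        private boolean initialised;

        Engine(String name, int costWhenReady) {
            this.name = name;
            this.costWhenReady = costWhenReady;
        }

        void initialise() {
            initialised = true;
        }

        int estimatedCost() {
            if (!initialised) {
                throw new IllegalStateException(name + " is not initialised");
            }
            return costWhenReady;
        }
    }

    // Initialise everything first, then return a new list ordered cheapest-first.
    static List<Engine> initialiseAndOrder(List<Engine> engines) {
        engines.forEach(Engine::initialise);
        List<Engine> ordered = new ArrayList<>(engines);
        ordered.sort(Comparator.comparingInt(Engine::estimatedCost)); // stable: ties keep listed order
        return ordered;
    }

    public static void main(String[] args) {
        List<Engine> engines = Arrays.asList(
                new Engine("parser", 30), new Engine("tokeniser", 5), new Engine("tagger", 12));
        for (Engine e : initialiseAndOrder(engines)) {
            System.out.println(e.name);
        }
    }
}
```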

Thanks,
James

Re: UIMA DUCC - Excessive Initialize Failure

2014-11-03 Thread james . d . baker
There doesn’t seem to be anything of relevance in the user logs. In fact, there 
doesn’t seem to be much in them at all!

> On 3 Nov 2014, at 14:43, Lou DeGenaro  wrote:
> 
> If you navigate to http://uima-ducc-demo.apache.org:42133/jobs.jsp (that's
> for the live demo - you'd substitute your own host:port accordingly, of
> course) then click on the id of the failing job that should give you access
> to the user's log files (under the Processes and/or Files tabs).
> 
> Lou.
> 
> On Mon, Nov 3, 2014 at 9:30 AM, James Baker  wrote:
> 
>> We've looked at all the logs we can find, but we'll have another look and
>> see if there are any we missed.
>> 
>> Yes, there's a shared file system set up and both the data and the job
>> files are located on it.
>> 
>> On 3 November 2014 14:23, Lou DeGenaro  wrote:
>> 
>>> Log files may be informative.  There are system and user log files.  System
>>> log files are located in ducc_runtime/logs.  User log files (which may be
>>> more helpful here) are located in --log_directory  that was specified
>>> on the submit command (and can also be seen by employing the WebServer).
>>> 
>>> Do you have a shared file system amongst your machines and is your user
>>> data located there?
>>> 
>>> Lou.
>>> 
>>> On Mon, Nov 3, 2014 at 8:34 AM, James Baker 
>>> wrote:
>>> 
>>>> I'm trying, without much success, to run DUCC in multi-user-multi-machine
>>>> mode. We've had it running successfully in single-user-single-machine mode,
>>>> and have followed the installation guide to move to a
>>>> multi-user-multi-machine configuration, but we keep getting an Excessive
>>>> Initialize Failure error from JobDriverTerminateException.java.
>>>> 
>>>> This has happened with both the example job and our own job. We've looked
>>>> through the logs and can't see anything that might suggest what is causing
>>>> the issue.
>>>> 
>>>> Anyone got any idea how to fix the issue?
>>>> 
>>>> Thanks,
>>>> James
>>>> 
>>> 
>> 



Re: Passing additional parameters through to CPE components

2014-07-24 Thread james . d . baker
Thanks,

That’s a possibility, although I was hoping I wouldn’t have to resort to that - 
it feels like a bit of a hack.
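
For reference, the receiving side of the generic "parameters" approach quoted below might look something like this. It is only a sketch: it assumes the extra settings arrive as "key=value" strings (one per entry, for instance from a multi-valued String parameter), and `ExtraParameters` is a hypothetical helper rather than anything UIMA provides.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for turning "key=value" entries into a map for the content extractor.
final class ExtraParameters {
    private ExtraParameters() {}

    // Only the first '=' splits an entry, so values such as URLs may themselves contain '='.
    static Map<String, String> parse(String... entries) {
        Map<String, String> result = new LinkedHashMap<>();
        for (String entry : entries) {
            int eq = entry.indexOf('=');
            if (eq <= 0) {
                throw new IllegalArgumentException("Expected key=value but got: " + entry);
            }
            result.put(entry.substring(0, eq).trim(), entry.substring(eq + 1).trim());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, String> extra = parse("baseUrl=http://www.example.com", "encoding=UTF-8");
        extra.forEach((key, value) -> System.out.println(key + " -> " + value));
    }
}
```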

James

On 24 Jul 2014, at 19:05, Burn Lewis  wrote:

> Perhaps you could add a generic "parameters" parameter to
> MyCollectionReader and then override it in your CPE with a value
> appropriate to the specified classname, e.g.
> 
>     folder - /opt/test
>     classname - test.MyContentExtractor
>     parameters - baseUrl=http://www.example.com
> 
> 
> On Thu, Jul 24, 2014 at 1:55 PM, Eddie Epstein  wrote:
> 
>> Right, the only way for "encompassing" descriptors (like aggregates or
>> CPE) to affect configuration parameters is via overrides.
>> 
>> Eddie
>> 
>> 
>> On Thu, Jul 24, 2014 at 11:31 AM,  wrote:
>> 
>>> I think you’ve misunderstood my question - I’m not asking whether I can
>>> override defined parameters, I’m asking if I can provide additional
>>> configuration parameters that aren’t defined in a descriptor file. Let me
>>> give an example:
>>> 
>>> MyCollectionReader.xml defines the following properties:
>>>     folder [String] - The folder to process files from
>>>     classname [String] - The qualified class name of a class
>>>         implementing my ContentExtractor interface
>>> 
>>> MyCpe.xml uses MyCollectionReader.xml and provides the following
>>> properties, including some that MyContentExtractor uses but aren’t defined
>>> above:
>>>     folder - /opt/test
>>>     classname - test.MyContentExtractor
>>>     baseUrl - http://www.example.com
>>> 
>>> The parameter baseUrl, although it is specified in the MyCpe.xml file,
>>> isn’t defined in MyCollectionReader.xml because it is specific to the
>>> MyContentExtractor class and not necessarily known at design time. However,
>>> UIMA isn’t passing it through to UimaContext presumably because it isn’t
>>> defined in the MyCollectionReader.xml.
>>> 
>>> Hope that helps clear it up.
>>> 
>>> 
>>> On 24 Jul 2014, at 14:51, Eddie Epstein  wrote:
>>> 
>>>> A CPE descriptor can override configuration parameters defined in any
>>>> integrated components.
>>>> Documentation a little bit below
>>>> http://uima.apache.org/d/uimaj-2.6.0/references.html#ugr.ref.xml.cpe_descriptor.descriptor.cas_processors.individual
>>>> 3.6.1.2.  Element
>>>> 
>>>> This element provides a way to override the contained Analysis Engine's
>>>> parameters settings. Any entry specified here must already be defined;
>>>> values specified replace the corresponding values for each parameter. For
>>>> Cas Processors, this mechanism is only available when they are deployed in
>>>> “integrated” mode. For Collection Readers and Initializers, it always is
>>>> available.
>>>> 
>>>> 
>>>> 
>>>> On Thu, Jul 24, 2014 at 8:19 AM, James Baker wrote:
>>>> 
>>>>> Is it possible to provide additional configuration parameters in a CPE
>>>>> descriptor XML file that aren't specified in the annotator/collection
>>>>> reader descriptor XML file?
>>>>> 
>>>>> I have a collection reader that accepts the classname of a class to use to
>>>>> do the content extraction as a parameter. This works fine, but I'd like to
>>>>> be able to pass additional parameters to the content extractor via the XML.
>>>>> The parameters will be dependent on the content extractor though, so I
>>>>> can't specify them in the collection reader descriptor. For example,
>>>>> ContentExtractor1 might need a parameter 'encoding', and ContentExtractor2
>>>>> might need a parameter 'baseUrl'.
>>>>> 
>>>>> I have been able to achieve this with UimaFIT by creating the collection
>>>>> reader without the XML and injecting the parameters, but when I try and do
>>>>> it from the XML file the parameters don't make it through to my content
>>>>> extractor (I pass the UimaContext object through to the content extractor).
>>>>> I suspect they might be being ignored by UIMA because they aren't in the
>>>>> descriptor. How can I work around this?
>>>>> 
>>>>> Thanks,
>>>>> James
>>>>> 
>>> 
>>> 
>> 



Re: Passing additional parameters through to CPE components

2014-07-24 Thread james . d . baker
But you can only override something you’ve predefined. I want to specify 
something that I can’t predefine...


On 24 Jul 2014, at 18:55, Eddie Epstein  wrote:

> Right, the only way for "encompassing" descriptors (like aggregates or
> CPE) to affect configuration parameters is via overrides.
> 
> Eddie
> 
> 
> On Thu, Jul 24, 2014 at 11:31 AM,  wrote:
> 
>> I think you’ve misunderstood my question - I’m not asking whether I can
>> override defined parameters, I’m asking if I can provide additional
>> configuration parameters that aren’t defined in a descriptor file. Let me
>> give an example:
>> 
>> MyCollectionReader.xml defines the following properties:
>>     folder [String] - The folder to process files from
>>     classname [String] - The qualified class name of a class
>>         implementing my ContentExtractor interface
>> 
>> MyCpe.xml uses MyCollectionReader.xml and provides the following
>> properties, including some that MyContentExtractor uses but aren’t defined
>> above:
>>     folder - /opt/test
>>     classname - test.MyContentExtractor
>>     baseUrl - http://www.example.com
>> 
>> The parameter baseUrl, although it is specified in the MyCpe.xml file,
>> isn’t defined in MyCollectionReader.xml because it is specific to the
>> MyContentExtractor class and not necessarily known at design time. However,
>> UIMA isn’t passing it through to UimaContext presumably because it isn’t
>> defined in the MyCollectionReader.xml.
>> 
>> Hope that helps clear it up.
>> 
>> 
>> On 24 Jul 2014, at 14:51, Eddie Epstein  wrote:
>> 
>>> A CPE descriptor can override configuration parameters defined in any
>>> integrated components.
>>> Documentation a little bit below
>>> http://uima.apache.org/d/uimaj-2.6.0/references.html#ugr.ref.xml.cpe_descriptor.descriptor.cas_processors.individual
>>> 3.6.1.2.  Element
>>> 
>>> This element provides a way to override the contained Analysis Engine's
>>> parameters settings. Any entry specified here must already be defined;
>>> values specified replace the corresponding values for each parameter. For
>>> Cas Processors, this mechanism is only available when they are deployed in
>>> “integrated” mode. For Collection Readers and Initializers, it always is
>>> available.
>>> 
>>> 
>>> 
>>> On Thu, Jul 24, 2014 at 8:19 AM, James Baker 
>>> wrote:
>>> 
>>>> Is it possible to provide additional configuration parameters in a CPE
>>>> descriptor XML file that aren't specified in the annotator/collection
>>>> reader descriptor XML file?
>>>> 
>>>> I have a collection reader that accepts the classname of a class to use to
>>>> do the content extraction as a parameter. This works fine, but I'd like to
>>>> be able to pass additional parameters to the content extractor via the XML.
>>>> The parameters will be dependent on the content extractor though, so I
>>>> can't specify them in the collection reader descriptor. For example,
>>>> ContentExtractor1 might need a parameter 'encoding', and ContentExtractor2
>>>> might need a parameter 'baseUrl'.
>>>> 
>>>> I have been able to achieve this with UimaFIT by creating the collection
>>>> reader without the XML and injecting the parameters, but when I try and do
>>>> it from the XML file the parameters don't make it through to my content
>>>> extractor (I pass the UimaContext object through to the content extractor).
>>>> I suspect they might be being ignored by UIMA because they aren't in the
>>>> descriptor. How can I work around this?
>>>> 
>>>> Thanks,
>>>> James
>>>> 
>> 
>> 



Re: Passing additional parameters through to CPE components

2014-07-24 Thread james . d . baker
I think you’ve misunderstood my question - I’m not asking whether I can 
override defined parameters, I’m asking if I can provide additional 
configuration parameters that aren’t defined in a descriptor file. Let me give 
an example:

MyCollectionReader.xml defines the following properties:
    folder [String] - The folder to process files from
    classname [String] - The qualified class name of a class implementing 
        my ContentExtractor interface

MyCpe.xml uses MyCollectionReader.xml and provides the following properties, 
including some that MyContentExtractor uses but aren’t defined above:
    folder - /opt/test
    classname - test.MyContentExtractor
    baseUrl - http://www.example.com

The parameter baseUrl, although it is specified in the MyCpe.xml file, isn’t 
defined in MyCollectionReader.xml because it is specific to the 
MyContentExtractor class and not necessarily known at design time. However, 
UIMA isn’t passing it through to UimaContext presumably because it isn’t 
defined in the MyCollectionReader.xml.
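
As a toy model of what I think is happening (an assumption on my part, based on the "must already be defined" wording in the documentation quoted below, and not UIMA's actual code), the overrides behave as if they were filtered against the names the component descriptor declares. `OverrideRuleSketch` is a made-up class for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Toy model of "any entry specified here must already be defined"; not UIMA code.
final class OverrideRuleSketch {
    private OverrideRuleSketch() {}

    // Returns only the overrides whose names the component descriptor declares.
    static Map<String, String> applicable(Set<String> declared, Map<String, String> overrides) {
        Map<String, String> kept = new LinkedHashMap<>();
        overrides.forEach((name, value) -> {
            if (declared.contains(name)) {
                kept.put(name, value);
            }
        });
        return kept;
    }

    public static void main(String[] args) {
        // Names declared in MyCollectionReader.xml in the example above.
        Set<String> declared = new HashSet<>(Arrays.asList("folder", "classname"));

        // Settings supplied in MyCpe.xml in the example above.
        Map<String, String> overrides = new LinkedHashMap<>();
        overrides.put("folder", "/opt/test");
        overrides.put("classname", "test.MyContentExtractor");
        overrides.put("baseUrl", "http://www.example.com");

        System.out.println(applicable(declared, overrides).keySet());
    }
}
```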

Hope that helps clear it up.


On 24 Jul 2014, at 14:51, Eddie Epstein  wrote:

> A CPE descriptor can override configuration parameters defined in any
> integrated components.
> Documentation a little bit below
> http://uima.apache.org/d/uimaj-2.6.0/references.html#ugr.ref.xml.cpe_descriptor.descriptor.cas_processors.individual
> 3.6.1.2.  Element
> 
> This element provides a way to override the contained Analysis Engine's
> parameters settings. Any entry specified here must already be defined;
> values specified replace the corresponding values for each parameter. For
> Cas Processors, this mechanism is only available when they are deployed in
> “integrated” mode. For Collection Readers and Initializers, it always is
> available.
> 
> 
> 
> On Thu, Jul 24, 2014 at 8:19 AM, James Baker 
> wrote:
> 
>> Is it possible to provide additional configuration parameters in a CPE
>> descriptor XML file that aren't specified in the annotator/collection
>> reader descriptor XML file?
>> 
>> I have a collection reader that accepts the classname of a class to use to
>> do the content extraction as a parameter. This works fine, but I'd like to
>> be able to pass additional parameters to the content extractor via the XML.
>> The parameters will be dependent on the content extractor though, so I
>> can't specify them in the collection reader descriptor. For example,
>> ContentExtractor1 might need a parameter 'encoding', and ContentExtractor2
>> might need a parameter 'baseUrl'.
>> 
>> I have been able to achieve this with UimaFIT by creating the collection
>> reader without the XML and injecting the parameters, but when I try and do
>> it from the XML file the parameters don't make it through to my content
>> extractor (I pass the UimaContext object through to the content extractor).
>> I suspect they might be being ignored by UIMA because they aren't in the
>> descriptor. How can I work around this?
>> 
>> Thanks,
>> James
>>