Re: Restricting an aggregate engine to a substring or mention

2014-06-25 Thread Richard Eckart de Castilho
Hi Armin,

how would you do the last step: telling the nested AE to process only the 
mentions of the segment type? 

As far as I can see, it again boils down to the point that the 
SegmentProcessingAE would internally create one or more new CASes or views, pass 
those to the nested AE, and then merge the results produced by 
the nested AE back into the original CAS.
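
A minimal sketch of that create/process/merge cycle, in plain Java rather than 
against the UIMA API. All names here (SegmentMergeSketch, Span, 
processSegments, tokenize) are hypothetical and only illustrate the offset 
bookkeeping: the nested step sees just the covered text of a segment, and its 
results are shifted by the segment's begin offset before being merged back.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch: run a nested processor per segment and merge results back. */
final class SegmentMergeSketch {

    /** A span given as begin/end character offsets (end exclusive). */
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + begin + "," + end + ")";
        }
    }

    /**
     * Runs the nested processor on the covered text of each segment and
     * shifts the spans it returns (relative to the segment text) back into
     * the coordinates of the original document.
     */
    static List<Span> processSegments(String document, List<Span> segments,
            Function<String, List<Span>> nested) {
        List<Span> merged = new ArrayList<>();
        for (Span segment : segments) {
            String covered = document.substring(segment.begin, segment.end);
            for (Span local : nested.apply(covered)) {
                merged.add(new Span(segment.begin + local.begin,
                        segment.begin + local.end));
            }
        }
        return merged;
    }

    /** Stand-in for the nested AE (an assumption): marks whitespace-separated tokens. */
    static List<Span> tokenize(String text) {
        List<Span> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            while (i < text.length() && Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            if (i > start) {
                tokens.add(new Span(start, i));
            }
        }
        return tokens;
    }
}
```

In a real pipeline the nested step would be the aggregate engine running on a 
temporary CAS or view, and the merge would also have to copy feature values, 
not only begin and end offsets.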

Cheers,

-- Richard

On 23.06.2014, at 08:53, armin.weg...@bka.bund.de wrote:

 Hello!
 
 I've got another, maybe not so good, idea. Why not pass an aggregate analysis 
 engine as a parameter? First, build an aggregate analysis engine the usual 
 way. Second, serialize it to an XML string. Third, pass that string to the 
 SegmentProcessingAE as a String parameter, together with another parameter 
 denoting the segment types. Fourth, deserialize the aggregate engine. Last, 
 iterate over all mentions of the segment type and process each segment with 
 the aggregate engine. Would this work?
 
 What do you think?
 
 Armin



RE: Restricting an aggregate engine to a substring or mention

2014-06-20 Thread Oliver Christ
Hi Armin, 

I'm not aware of a generic mechanism to restrict an AE's scope of processing, 
but I'm very new to UIMA. 

It seems that Petr's approach does address the general case though: if some AE 
doesn't support zones, create a new view containing just the content you want 
to have processed, and run the AE on that view.

Cheers, Oli

-Original Message-
From: armin.weg...@bka.bund.de [mailto:armin.weg...@bka.bund.de] 
Sent: Friday, June 20, 2014 4:12 AM
To: user@uima.apache.org
Subject: RE: Restricting an aggregate engine to a substring or mention

Hi Oli!

If I get it right, the ability to restrict processing to mentions of given 
types is inherited from a base class. So every analysis engine that should do 
this must inherit from that base class. Sure, that's one way of doing it. But 
then it's part of the analysis engine itself.

Thanks,
Armin

-Original Message-
From: Oliver Christ [mailto:ochr...@ebsco.com]
Sent: Tuesday, June 17, 2014 20:48
To: user@uima.apache.org
Subject: RE: Restricting an aggregate engine to a substring or mention

dkpro-core's BreakIteratorSegmenter (rather: its base class) takes the same 
approach. It allows you to specify that segmentation should occur within 
zones, defined by some other annotation type.

https://code.google.com/p/dkpro-core-asl/source/browse/de.tudarmstadt.ukp.dkpro.core-asl/trunk/de.tudarmstadt.ukp.dkpro.core.api.segmentation-asl/src/main/java/de/tudarmstadt/ukp/dkpro/core/api/segmentation/SegmenterBase.java


Usage:

// Restrict segmentation to the spans covered by MyZoneAnnotation.
pipeline.add(createEngineDescription(BreakIteratorSegmenter.class,
        BreakIteratorSegmenter.PARAM_ZONE_TYPES,
        new String[] { MyZoneAnnotation.class.getName() }));

Cheers, Oli

-Original Message-
From: Thomas Ginter [mailto:thomas.gin...@utah.edu]
Sent: Tuesday, June 17, 2014 2:20 PM
To: user@uima.apache.org
Subject: Re: Restricting an aggregate engine to a substring or mention

We do this by having a parameter for some of our standard annotators, like our 
RegexAnnotator, that allows the user to specify an annotation type.  If a type 
is specified then the operations of the annotator are restricted to the covered 
text of the annotation type instances specified.  If no annotation type is 
provided then the entire document is assumed.  In that way we can have 
annotators that perform some logic to find the regions of interest and then the 
subsequent annotators only operate on those regions.
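
A rough sketch of that behaviour using only java.util.regex, not the actual 
RegexAnnotator (whose code is not shown in this thread). The class and method 
names are hypothetical, and zones are modelled as {begin, end} offset pairs 
instead of UIMA annotations. With no zones given, the whole document is 
searched; otherwise matching is limited to each zone.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of a regex matcher that can be restricted to zones. */
final class ZoneRegexSketch {

    /**
     * Returns {begin, end} offsets of all matches. If zones is empty, the
     * whole document is searched; otherwise only the text covered by each
     * zone. Offsets always refer to the full document.
     */
    static List<int[]> findMatches(String document, Pattern pattern, List<int[]> zones) {
        List<int[]> effectiveZones = zones;
        if (zones.isEmpty()) {
            effectiveZones = Collections.singletonList(new int[] { 0, document.length() });
        }
        List<int[]> matches = new ArrayList<>();
        Matcher matcher = pattern.matcher(document);
        for (int[] zone : effectiveZones) {
            // region() limits matching to the zone but keeps document offsets.
            matcher.region(zone[0], zone[1]);
            while (matcher.find()) {
                matches.add(new int[] { matcher.start(), matcher.end() });
            }
        }
        return matches;
    }
}
```

Because Matcher.region keeps reporting offsets relative to the full input, the 
matches can be added to the original document without any offset translation.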

Thanks,

Thomas Ginter
801-448-7676
thomas.gin...@utah.edu




On Jun 12, 2014, at 4:00 AM, Dr. Armin Wegner arminweg...@googlemail.com 
wrote:

 Hello!
 
 Is there a UIMA component which restricts an aggregate analysis 
 engine to a substring of the document text or to mentions of a given 
 annotation type? That is, is there a UIMA equivalent to GATE's Segment 
 Processing PR?
 
 Thanks,
 Armin



Re: Restricting an aggregate engine to a substring or mention

2014-06-17 Thread Marshall Schor
One other thought (probably not well-formed...):

You could use 1 CAS, but multiple views.

Each view can have its own subject-of-analysis.  This might not work for you,
though, as you might want the original subject-of-analysis in order to preserve
the offset values for annotations' begin and end features.

Each view can have its own set of indexes.  This would enable you to index just
the annotations that were in scope.  But this might not work for you if the
Feature Structures themselves had references beyond the scope: those references
would remain valid, whereas you might want them to be invalid or null.

With multiple views, it is possible to have cross-view references as values in
the CAS, in case that was of interest.

Perhaps this might be something to consider.

-Marshall


On 6/16/2014 12:23 PM, Richard Eckart de Castilho wrote:
 The CasMultiplier is not a scoping operator per se.

 I understood that you want to scope your AEs to specific sections of a CAS.
 Since there is no generic scoping operator in UIMA (that I am aware of),
 the next best thing one can do (I think) is to slice the CAS into multiple CASes
 that each represent the scope you want to work on and merge them in the end.

 Cheers,

 -- Richard

 On 16.06.2014, at 11:47, Dr. Armin Wegner arminweg...@googlemail.com wrote:

 Hello Richard!

 As far as I know, CasMultipliers split the CAS into two or more new
 CASes that are processed independently and must be merged back into a
 final CAS. That's not what I want to do. I have only one CAS and
 want to add annotations to this CAS. Can this be achieved with
 CasMultipliers?

 Cheers,
 Armin

 On 6/12/14, Richard Eckart de Castilho r...@apache.org wrote:
 Hi Armin,

 the only generic approach that I am aware of would be a CasMultiplier.

 Different component collections may offer alternative solutions
 in general or in specific components.

 I believe Ruta has the concept of limiting rules to certain context
 annotation types, but I do not know if that also works when external
 AEs are invoked.

 Cheers,

 -- Richard

 On 12.06.2014, at 12:00, Dr. Armin Wegner arminweg...@googlemail.com
 wrote:

 Hello!

 Is there a UIMA component which restricts an aggregate analysis
 engine to a substring of the document text or to mentions of a given
 annotation type? That is, is there a UIMA equivalent to GATE's Segment
 Processing PR?

 Thanks,
 Armin





Re: Restricting an aggregate engine to a substring or mention

2014-06-17 Thread Petr Baudis
On Tue, Jun 17, 2014 at 06:48:15PM +, Oliver Christ wrote:
 dkpro-core's BreakIteratorSegmenter (rather: its base class) takes the same 
 approach. It allows you to specify that segmentation should occur within 
 zones, defined by some other annotation type.

  And for most of dkpro-core's other annotators, which add further linguistic
features, it is thankfully typically fine to just prune the Sentence
annotations to the areas you want annotated.

  That's the approach I'm using when I first pre-filter a document for
interesting sentences, then copy just these over to another view and
run the taggers and parsers on just these.
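
  A sketch of the bookkeeping such a copied view needs if the results should
later be related to the original text. This is plain Java, not the UIMA view
API, and every name in it (CondensedViewSketch, copySpan, toOriginal) is
hypothetical. It concatenates the selected spans and remembers where each one
came from, so that an offset in the condensed text can be mapped back.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: a condensed copy of selected spans with an offset map. */
final class CondensedViewSketch {
    private final StringBuilder text = new StringBuilder();
    // Parallel lists: start of each copied span in the view and in the original.
    private final List<Integer> viewStarts = new ArrayList<>();
    private final List<Integer> originalStarts = new ArrayList<>();

    /** Copies document[begin, end) into the view; spans are separated by a newline. */
    void copySpan(String document, int begin, int end) {
        if (text.length() > 0) {
            text.append('\n');
        }
        viewStarts.add(text.length());
        originalStarts.add(begin);
        text.append(document, begin, end);
    }

    /** Returns the text of the condensed view built so far. */
    String viewText() {
        return text.toString();
    }

    /** Maps an offset in the view text back to the original document. */
    int toOriginal(int viewOffset) {
        // Find the last copied span that starts at or before viewOffset.
        for (int i = viewStarts.size() - 1; i >= 0; i--) {
            if (viewStarts.get(i) <= viewOffset) {
                return originalStarts.get(i) + (viewOffset - viewStarts.get(i));
            }
        }
        throw new IllegalArgumentException("offset outside view: " + viewOffset);
    }
}
```

  Annotations created on the condensed text can then be translated by mapping
their begin and end offsets through toOriginal before they are added to the
original view.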

Petr Pasky Baudis


Re: Restricting an aggregate engine to a substring or mention

2014-06-12 Thread Richard Eckart de Castilho
Hi Armin,

the only generic approach that I am aware of would be a CasMultiplier.

Different component collections may offer alternative solutions
in general or in specific components.

I believe Ruta has the concept of limiting rules to certain context
annotation types, but I do not know if that also works when external
AEs are invoked.

Cheers,

-- Richard

On 12.06.2014, at 12:00, Dr. Armin Wegner arminweg...@googlemail.com wrote:

 Hello!
 
 Is there a UIMA component which restricts an aggregate analysis
 engine to a substring of the document text or to mentions of a given
 annotation type? That is, is there a UIMA equivalent to GATE's Segment
 Processing PR?
 
 Thanks,
 Armin



Re: Restricting an aggregate engine to a substring or mention

2014-06-12 Thread Peter Klügl
Hi,

Am 12.06.2014 17:39, schrieb Richard Eckart de Castilho:
 Hi Armin,

 the only generic approach that I am aware of would be a CasMultiplier.

 Different component collections may offer alternative solutions
 in general or in specific components.

 I believe Ruta has the concept of limiting rules to certain context
 annotation types, but I do not know if that also works when external
 AEs are invoked.

Yes, there is something like that: the CALL action. However, I haven't
used it in a long time (years), and you need to know what you are doing. The
action creates a CAS depending on the filtering settings. So, normally,
with the default settings, the AE ends up with a document without
whitespace...

I would not recommend using this functionality if you are not using Ruta
anyway.

Best,

Peter





 Cheers,

 -- Richard

 On 12.06.2014, at 12:00, Dr. Armin Wegner arminweg...@googlemail.com wrote:

 Hello!

 Is there a UIMA component which restricts an aggregate analysis
 engine to a substring of the document text or to mentions of a given
 annotation type? That is, is there a UIMA equivalent to GATE's Segment
 Processing PR?

 Thanks,
 Armin