Re: Restricting an aggregate engine to a substring or mention
Hi Armin,

how would you do the last step: telling the nested AE to process only the mentions of the segment type? As far as I can see, it again boils down to the point that the SegmentProcessingAE would internally create one or more new CASes or views, pass those to the nested AE, and then would have to merge the results produced by the nested AE back into the original CAS.

Cheers,

-- Richard

On 23.06.2014, at 08:53, armin.weg...@bka.bund.de wrote:

> Hello!
>
> I've got another, maybe not so good, idea. Why not pass an aggregate analysis engine as a parameter?
>
> First, build an aggregate analysis engine the usual way.
> Second, serialize it to an XML string.
> Third, pass that string to the SegmentProcessingAE as a String parameter, together with another parameter denoting the segment types.
> Fourth, deserialize the aggregate engine.
> Last, iterate over all mentions of the segment type and process each segment with the aggregate engine.
>
> Does this work? What do you think?
>
> Armin
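The "process each segment, then merge back" step discussed above can be sketched without any UIMA dependency. The following is only an illustration under stated assumptions: `Span`, `processSegments`, and `tokenize` are hypothetical plain-Java stand-ins (not UIMA or uimaFIT classes), the nested engine is modelled as a function from text to spans, and merging is reduced to shifting offsets by the segment start.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Plain-Java stand-ins for illustration; none of these are UIMA classes. */
final class SegmentProcessingSketch {

    /** A typed span with offsets relative to some text. */
    static final class Span {
        final String type;
        final int begin;
        final int end;

        Span(String type, int begin, int end) {
            this.type = type;
            this.begin = begin;
            this.end = end;
        }

        @Override
        public String toString() {
            return type + "[" + begin + "," + end + "]";
        }
    }

    /**
     * Runs the nested engine on the covered text of each segment and shifts
     * the spans it returns back into the coordinates of the full document.
     */
    static List<Span> processSegments(String document, List<Span> segments,
            Function<String, List<Span>> nested) {
        List<Span> merged = new ArrayList<>();
        for (Span segment : segments) {
            // Stand-in for creating a temporary CAS or view holding only the segment.
            String covered = document.substring(segment.begin, segment.end);
            for (Span local : nested.apply(covered)) {
                // Merge step: local offsets start at 0, so add the segment start.
                merged.add(new Span(local.type,
                        local.begin + segment.begin,
                        local.end + segment.begin));
            }
        }
        return merged;
    }

    /** Toy nested engine: marks every whitespace-separated token. */
    static List<Span> tokenize(String text) {
        List<Span> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            if (Character.isWhitespace(text.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            while (i < text.length() && !Character.isWhitespace(text.charAt(i))) {
                i++;
            }
            tokens.add(new Span("Token", start, i));
        }
        return tokens;
    }
}
```

A real implementation would also have to copy feature values and resolve references between feature structures when merging, which this sketch leaves out.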
RE: Restricting an aggregate engine to a substring or mention
Hi Armin,

I'm not aware of a generic mechanism to restrict an AE's scope of processing, but I'm very new to UIMA.

It seems that Petr's approach does address the general case, though: if some AE doesn't support zones, create a new view containing just the content you want to have processed, and run the AE on that view.

Cheers, Oli

-----Original Message-----
From: armin.weg...@bka.bund.de [mailto:armin.weg...@bka.bund.de]
Sent: Friday, June 20, 2014 4:12 AM
To: user@uima.apache.org
Subject: AW: Restricting an aggregate engine to a substring or mention

Hi Oli!

If I get it right, the ability to restrict processing to mentions of given types is inherited from a base class. So every analysis engine that should do this must inherit from that base class. Sure, that's one way of doing it. But it's part of the analysis engine.

Thanks, Armin

-----Original Message-----
From: Oliver Christ [mailto:ochr...@ebsco.com]
Sent: Tuesday, June 17, 2014 8:48 PM
To: user@uima.apache.org
Subject: RE: Restricting an aggregate engine to a substring or mention

dkpro-core's BreakIteratorSegmenter (rather: its base class) takes the same approach. It allows you to specify that segmentation should occur within zones, defined by some other annotation type.

https://code.google.com/p/dkpro-core-asl/source/browse/de.tudarmstadt.ukp.dkpro.core-asl/trunk/de.tudarmstadt.ukp.dkpro.core.api.segmentation-asl/src/main/java/de/tudarmstadt/ukp/dkpro/core/api/segmentation/SegmenterBase.java

Usage:

    pipeline.add(createEngineDescription(BreakIteratorSegmenter.class,
        BreakIteratorSegmenter.PARAM_ZONE_TYPES,
        new String[] { MyZoneAnnotation.class.getName() }));

Cheers, Oli

-----Original Message-----
From: Thomas Ginter [mailto:thomas.gin...@utah.edu]
Sent: Tuesday, June 17, 2014 2:20 PM
To: user@uima.apache.org
Subject: Re: Restricting an aggregate engine to a substring or mention

We do this by having a parameter for some of our standard annotators, like our RegexAnnotator, that allows the user to specify an annotation type. If a type is specified, then the operations of the annotator are restricted to the covered text of the instances of that annotation type. If no annotation type is provided, then the entire document is assumed. In that way we can have annotators that perform some logic to find the regions of interest, and then the subsequent annotators only operate on those regions.

Thanks,

Thomas Ginter
801-448-7676
thomas.gin...@utah.edu

On Jun 12, 2014, at 4:00 AM, Dr. Armin Wegner arminweg...@googlemail.com wrote:

> Hello!
>
> Is there a UIMA component which restricts an aggregated analysis engine to a substring of the document text or to mentions of a given annotation type? That is, is there a UIMA equivalent to GATE's Segment Processing PR?
>
> Thanks,
> Armin
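The parameter-driven restriction Thomas describes (zones if a type is configured, otherwise the whole document) can be illustrated roughly as follows. This is a minimal sketch, not the actual RegexAnnotator: `ZoneRestrictedRegexSketch` and `annotate` are hypothetical names, zones are passed as plain `{begin, end}` offset pairs, and `null` stands for "no zone type configured".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical stand-in for a regex annotator with an optional zone restriction. */
final class ZoneRestrictedRegexSketch {

    /**
     * Returns {begin, end} pairs of all matches. If zones is null, the whole
     * document is searched; otherwise only the given zones are searched.
     */
    static List<int[]> annotate(String document, Pattern pattern, List<int[]> zones) {
        List<int[]> scopes = new ArrayList<>();
        if (zones == null) {
            // No zone type configured: the entire document is assumed.
            scopes.add(new int[] {0, document.length()});
        } else {
            scopes.addAll(zones);
        }
        List<int[]> matches = new ArrayList<>();
        for (int[] zone : scopes) {
            // region() limits matching to the zone, while start() and end()
            // still report offsets relative to the full document.
            Matcher m = pattern.matcher(document).region(zone[0], zone[1]);
            while (m.find()) {
                matches.add(new int[] {m.start(), m.end()});
            }
        }
        return matches;
    }
}
```

Because the matcher works on the original text, no offset translation is needed afterwards. That is the main convenience of building the restriction into the annotator itself, at the cost Armin points out: every annotator has to implement it.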
Re: Restricting an aggregate engine to a substring or mention
One other thought (probably not well-formed...):

You could use one CAS, but multiple views.

Each view can have its own subject-of-analysis. This might not work for you, though, as you might want the original subject-of-analysis in order to preserve the offset values for the annotations' begin and end features.

Each view can have its own set of indexes. This would enable you to index just the annotations that were in scope. But this might not work for you if the Feature Structures themselves had references beyond the scope - they would be valid and you might want them to be invalid or null or ??

With multiple views, it is possible to have cross-view references as values in the CAS, in case that was of interest.

Perhaps this might be something to consider.

-Marshall

On 6/16/2014 12:23 PM, Richard Eckart de Castilho wrote:

> The CasMultiplier is not a scoping operator per se. I understood that you want to scope your AEs to specific sections of a CAS. Since there is no generic scoping operator in UIMA (that I am aware of), the next best thing one can do (I think) is to slice the CAS into multiple CASes that each represent the scope you want to work on and merge them in the end.
>
> Cheers,
>
> -- Richard
>
> On 16.06.2014, at 11:47, Dr. Armin Wegner arminweg...@googlemail.com wrote:
>
> > Hello Richard!
> >
> > As far as I know, CasMultipliers split the CAS into two or more new CASes that are processed independently and must be put together into a final CAS again. That's not what I want to do. I have only one CAS and want to add annotations to this CAS. Can this be achieved with CasMultipliers?
> >
> > Cheers,
> > Armin
> >
> > On 6/12/14, Richard Eckart de Castilho r...@apache.org wrote:
> >
> > > Hi Armin,
> > >
> > > the only generic approach that I am aware of would be a CasMultiplier.
> > >
> > > Different component collections may offer alternative solutions in general or in specific components. I believe Ruta has the concept of limiting rules to certain context annotation types, but I do not know if that also works when external AEs are invoked.
> > >
> > > Cheers,
> > >
> > > -- Richard
> > >
> > > On 12.06.2014, at 12:00, Dr. Armin Wegner arminweg...@googlemail.com wrote:
> > >
> > > > Hello!
> > > >
> > > > Is there a UIMA component which restricts an aggregated analysis engine to a substring of the document text or to mentions of a given annotation type? That is, is there a UIMA equivalent to GATE's Segment Processing PR?
> > > >
> > > > Thanks,
> > > > Armin
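Marshall's caveat about per-view indexes can be made concrete with a small model. The sketch below is an assumption-laden illustration, not UIMA code: `Ann` is a hypothetical stand-in for a feature structure with one reference feature, and `scopedIndex` plays the role of an index that holds only the in-scope annotations.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a scoped index; not a UIMA index implementation. */
final class ScopedIndexSketch {

    /** Stand-in for a feature structure with offsets and one reference feature. */
    static final class Ann {
        final String label;
        final int begin;
        final int end;
        Ann ref; // may point at an annotation outside the scope

        Ann(String label, int begin, int end) {
            this.label = label;
            this.begin = begin;
            this.end = end;
        }
    }

    /** Keeps only annotations lying fully inside [scopeBegin, scopeEnd]. */
    static List<Ann> scopedIndex(List<Ann> all, int scopeBegin, int scopeEnd) {
        List<Ann> scoped = new ArrayList<>();
        for (Ann a : all) {
            if (a.begin >= scopeBegin && a.end <= scopeEnd) {
                scoped.add(a);
            }
        }
        return scoped;
    }
}
```

If an in-scope annotation refers to one outside the scope, the outside annotation is absent from the scoped index but still reachable through the reference. That is the situation described above, where one has to decide whether such references should stay valid.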
Re: Restricting an aggregate engine to a substring or mention
On Tue, Jun 17, 2014 at 06:48:15PM +, Oliver Christ wrote:

> dkpro-core's BreakIteratorSegmenter (rather: its base class) takes the same approach. It allows you to specify that segmentation should occur within zones, defined by some other annotation type.

And for most of dkpro-core's other annotators, which add further linguistic features, it is thankfully typically fine to just prune the Sentence annotations to the areas you want annotated.

That's the approach I'm using when I first pre-filter a document for interesting sentences, then copy just these over to another view and run the taggers and parsers on just these.

Petr Pasky Baudis
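Copying selected sentences into a separate view changes their offsets, so some bookkeeping is needed if results are to be related back to the original text. The following is a rough sketch of that bookkeeping under stated assumptions: `FilteredViewSketch` is a hypothetical plain-Java class (not a UIMA view), sentences are given as `{begin, end}` offset pairs, and a newline is inserted as separator between copied sentences.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a derived view holding only selected sentences. */
final class FilteredViewSketch {

    /** Text of the derived view: the selected sentences, one per line. */
    final String text;

    /** One entry per copied sentence: {viewBegin, viewEnd, sourceBegin}. */
    private final List<int[]> map = new ArrayList<>();

    FilteredViewSketch(String source, List<int[]> sentences) {
        StringBuilder sb = new StringBuilder();
        for (int[] s : sentences) {
            int viewBegin = sb.length();
            sb.append(source, s[0], s[1]);
            map.add(new int[] {viewBegin, sb.length(), s[0]});
            sb.append('\n'); // separator so sentences do not run together
        }
        text = sb.toString();
    }

    /**
     * Translates an offset in the view back to the source text.
     * Returns -1 if the offset lies outside every copied sentence.
     */
    int toSource(int viewOffset) {
        for (int[] m : map) {
            if (viewOffset >= m[0] && viewOffset <= m[1]) {
                return m[2] + (viewOffset - m[0]);
            }
        }
        return -1;
    }
}
```

Taggers and parsers would then run on `text`, and the begin and end offsets of their annotations could be passed through `toSource` before being added to the original document.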
Re: Restricting an aggregate engine to a substring or mention
Hi Armin,

the only generic approach that I am aware of would be a CasMultiplier.

Different component collections may offer alternative solutions in general or in specific components. I believe Ruta has the concept of limiting rules to certain context annotation types, but I do not know if that also works when external AEs are invoked.

Cheers,

-- Richard

On 12.06.2014, at 12:00, Dr. Armin Wegner arminweg...@googlemail.com wrote:

> Hello!
>
> Is there a UIMA component which restricts an aggregated analysis engine to a substring of the document text or to mentions of a given annotation type? That is, is there a UIMA equivalent to GATE's Segment Processing PR?
>
> Thanks,
> Armin
Re: Restricting an aggregate engine to a substring or mention
Hi,

On 12.06.2014 17:39, Richard Eckart de Castilho wrote:

> Hi Armin,
>
> the only generic approach that I am aware of would be a CasMultiplier.
>
> Different component collections may offer alternative solutions in general or in specific components. I believe Ruta has the concept of limiting rules to certain context annotation types, but I do not know if that also works when external AEs are invoked.

Yes, there is something like that: the CALL action. However, I haven't used it in a long time (years), and you need to know what you are doing. The action creates a CAS dependent on the filtering setting. So, normally, with the default settings, the AE ends up with a document without whitespace...

I would not recommend using this functionality if you are not using Ruta anyway.

Best,

Peter

> Cheers,
>
> -- Richard
>
> On 12.06.2014, at 12:00, Dr. Armin Wegner arminweg...@googlemail.com wrote:
>
> > Hello!
> >
> > Is there a UIMA component which restricts an aggregated analysis engine to a substring of the document text or to mentions of a given annotation type? That is, is there a UIMA equivalent to GATE's Segment Processing PR?
> >
> > Thanks,
> > Armin