Re: XML files as input to UIMA?

2019-02-22 Thread José Tomás Atria
I implemented a fairly general XML collection reader using a SAX parser
that takes a handler resource that can implement the necessary logic for
dealing with the idiosyncrasies of different encoding schemes.

It was originally based on DKPro's XML readers, which are also very easy to
adapt to different formats.

Mine is available here:
https://github.com/jtatria/lector/tree/master/src/main/java/edu/columbia/incite/uima/io

It uses two components to implement a given format's logic: A "TextFilter"
resource to normalize SOFA text from XML character data and a
"MappingProvider" that implements the logic needed to process XML elements
(typically by mapping them to UIMA annotations).
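
To make the two roles concrete, here is a minimal sketch using only the JDK's SAX API. It is not the lector code: the class name SpanCollectingHandler, the Span type and the use of a UnaryOperator as the "text filter" are all hypothetical stand-ins. The filter normalizes character data as it is appended to the document text, and each closed element is recorded as a character range over that text, which is roughly what a mapping to UIMA annotations would need.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.function.UnaryOperator;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration; none of these names come from lector or DKPro.
class SpanCollectingHandler extends DefaultHandler {

    /** An XML element mapped to a character range of the collected text. */
    static final class Span {
        final String name;
        final int begin;
        final int end;

        Span(String name, int begin, int end) {
            this.name = name;
            this.begin = begin;
            this.end = end;
        }
    }

    private final UnaryOperator<String> textFilter;           // plays the "text filter" role
    private final StringBuilder text = new StringBuilder();   // would become the document text
    private final Deque<Integer> openOffsets = new ArrayDeque<>();
    private final List<Span> spans = new ArrayList<>();       // would become annotations

    SpanCollectingHandler(UnaryOperator<String> textFilter) {
        this.textFilter = textFilter;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // Remember where this element starts in the text collected so far.
        openOffsets.push(text.length());
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // The parser may deliver text in several chunks; the filter sees each chunk separately.
        text.append(textFilter.apply(new String(ch, start, length)));
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        // Elements are recorded in the order in which they are closed.
        spans.add(new Span(qName, openOffsets.pop(), text.length()));
    }

    String text() {
        return text.toString();
    }

    List<Span> spans() {
        return spans;
    }

    /** Parses an XML string and returns the handler holding the text and spans. */
    static SpanCollectingHandler parse(String xml, UnaryOperator<String> textFilter) {
        SpanCollectingHandler handler = new SpanCollectingHandler(textFilter);
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
        return handler;
    }
}
```

In a real reader the collected text would be set as the document text of the CAS and each span turned into an annotation of whatever type the format calls for.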

If you already have coded all the logic for dealing with your source
material, it should not be too hard to adapt it for use with these
components.

Hope it is of some use. I'd be happy to answer any questions you may have.

best,
jta

On Fri, Feb 22, 2019 at 9:17 AM Bonnie MacKellar 
wrote:

> Thanks so much!
>
> Bonnie MacKellar
>
> On Fri, Feb 22, 2019 at 7:03 AM Erik Fäßler 
> wrote:
>
> > Hey,
> >
> > just wanted to say that I haven't gotten around to making the component
> > available yet; I will do so first thing next week!
> >
> > Best,
> >
> > Erik
> >
> > > On 20. Feb 2019, at 19:47, Bonnie MacKellar 
> > wrote:
> > >
> > > Hi,
> > > Yes, we are using that format. I have a parser that I wrote, but it isn't
> > > integrated into UIMA. It runs separately and loads the full clinical trial
> > > data into a triplestore (Stardog). I would be interested in your system
> > > since I am not really familiar with how to write file readers in the UIMA
> > > framework. Perhaps I can merge my parser into it and end up with just the
> > > right thing. If you can make it available, I would definitely be
> > > interested.  I will take a look at the other links as well.  Thanks!!
> > >
> > > Bonnie MacKellar
> > >
> > > On Wed, Feb 20, 2019 at 3:54 AM Erik Fäßler  >
> > > wrote:
> > >
> > >> Dear Bonnie,
> > >>
> > >> are you talking about the clinical trial XML format used by
> > >> ClinicalTrials.gov by any chance?
> > >> If so, I did create a UIMA reader for these data. It's not perfect but
> > >> perhaps enough for your purposes, and you might want to enhance it.
> > >> Please let me know if you would be interested in that. I have not gotten
> > >> around to making it publicly available yet but could do so quickly.
> > >>
> > >> To answer the general question to the best of my knowledge:
> > >> There is no such thing as a general XML reader built into the UIMA
> > >> framework. For all non-trivial formats, a specific reader is necessary.
> > >> This also holds true with regard to the employed type system.
> > >> That being said, there are UIMA readers that try to serve as a general XML
> > >> reading facility, e.g. the “XML Reader” from our lab (JULIELab,
> > >> https://github.com/JULIELab/jcore-base/tree/master/jcore-xml-reader).
> > >> However, in my experience XML inputs come in many different forms,
> > >> which are often not suited to a generic approach. That is why I
> > >> wrote quite a few UIMA readers for specific XML formats in the past.
> > >>
> > >> Hope that helps,
> > >>
> > >> Erik
> > >>
> > >>> On 20. Feb 2019, at 01:13, Bonnie MacKellar 
> > >> wrote:
> > >>>
> > >>> This is probably a very naive question, but I can't seem to find anything
> > >>> about this. I currently have a lot of XML files (clinical trial
> > >>> descriptions). My current workflow is to run a preprocessor that parses the
> > >>> XML and generates text files in a simple format. I then run these files in
> > >>> a UIMA pipeline, using FileCollectionReader to load the text files, RUTA to
> > >>> parse the simple format, the Metamap annotator to do some UMLS annotations,
> > >>> and finally I have a writer that generates RDF triples from the UIMA
> > >>> annotations and loads the triples into a database. This has worked but is
> > >>> clunky, especially the preprocessing. I feel like there has to be a better
> > >>> way. Is there any support for reading XML files, or do I need to write my
> > >>> own CollectionReader? Are there any other tools within UIMA for handling
> > >>> XML text?
> > >>>
> > >>> thanks,
> > >>> Bonnie MacKellar
> > >>
> > >>
> >
> >
>


-- 
entia non sunt multiplicanda praeter necessitatem


@ConfParam initialization of values from symbol name of static instance?

2017-04-26 Thread José Tomás Atria
Hello all!

I know that UIMA-FIT is capable of initializing enum-typed configuration
parameters from a string equal to the name of one of the values in an enum
class. i.e. this works as documented:

public static enum SomeEnum {
A_VALUE;
}

@ConfigurationParameter( name = "example", mandatory = false,
defaultValue="A_VALUE" )
private SomeEnum enumValue;

However, I recently had to refactor one of my enums into a normal class with
static members, and I found that UIMA-FIT is equally capable of
initializing these parameters from a string equal to the name of a class
instance set up as a static member, i.e. this also works:

public static class SomeClass {
    public static final SomeClass A_VALUE = new SomeClass();
}

@ConfigurationParameter( name = "example", mandatory = false,
defaultValue="A_VALUE" )
private SomeClass classValue;

Is this known behaviour? I tried stepping through the conf param
initialization logic, but I got lost in the depths of Spring, and I get the
feeling that this is basically a side-effect of spring's implementation of
enum initialization details, which seems a little unreliable... Then again,
enum types are basically immutable collections of class instances as static
members, no?
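
One plausible explanation, which this thread does not confirm, is that the string is resolved by looking up a public static field of that name on the target type. Enum constants are compiled to exactly such fields, so a lookup of this kind would treat both cases alike. The sketch below shows that mechanism with plain reflection; StaticFieldLookup, Color and Shade are hypothetical names, and this is not uimaFIT or Spring code.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

// Hypothetical illustration of name-based lookup of static members.
class StaticFieldLookup {

    enum Color { RED }

    static final class Shade {
        public static final Shade DARK = new Shade();

        private Shade() {
        }
    }

    /** Resolves a name to the value of a public static field of the given type. */
    static <T> T resolve(Class<T> type, String name) {
        try {
            Field field = type.getField(name); // public fields only
            if (!Modifier.isStatic(field.getModifiers())) {
                throw new IllegalArgumentException(name + " is not static");
            }
            // cast() rejects fields whose value is not an instance of the requested type.
            return type.cast(field.get(null));
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException(
                    "No static member " + name + " on " + type.getName(), e);
        }
    }
}
```

With this lookup, resolve(Color.class, "RED") and resolve(Shade.class, "DARK") both succeed, because from the point of view of reflection an enum constant and a hand-written static final instance look the same.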

Is it reasonable to use this "feature"? if so, should it be mentioned in
the documentation?

I have attached a working example of what I mean if my explanation above
doesn't make sense.

Any comments would be welcome, I'm very curious to know why this works and
whether it is reliable enough to be used in production...

Thanks!
jta

ps: reposting from the abandoned uima-fit mailing list. Sorry for
crossposting!
-- 

sent from a phone. please excuse terseness and tpyos.

enviado desde un teléfono. por favor disculpe la parquedad y los erroers.
package demo;

import java.io.IOException;

import org.apache.uima.analysis_engine.AnalysisEngineDescription;
import org.apache.uima.analysis_engine.AnalysisEngineProcessException;
import org.apache.uima.cas.CAS;
import org.apache.uima.collection.CollectionException;
import org.apache.uima.collection.CollectionProcessingEngine;
import org.apache.uima.collection.CollectionReaderDescription;
import org.apache.uima.collection.EntityProcessStatus;
import org.apache.uima.collection.StatusCallbackListener;
import org.apache.uima.collection.metadata.CpeDescriptorException;
import org.apache.uima.fit.component.JCasAnnotator_ImplBase;
import org.apache.uima.fit.component.JCasCollectionReader_ImplBase;
import org.apache.uima.fit.cpe.CpeBuilder;
import org.apache.uima.fit.descriptor.ConfigurationParameter;
import org.apache.uima.fit.factory.AnalysisEngineFactory;
import org.apache.uima.fit.factory.CollectionReaderFactory;
import org.apache.uima.jcas.JCas;
import org.apache.uima.resource.ResourceInitializationException;
import org.apache.uima.util.InvalidXMLException;
import org.apache.uima.util.Progress;
import org.xml.sax.SAXException;

public class StaticInit extends JCasAnnotator_ImplBase {

public static final String PARAM_EX1 = "ex1";
@ConfigurationParameter( name = PARAM_EX1, mandatory = false,
defaultValue = "SOME_VALUE"
)
private Example1 ex1;

public static final String PARAM_EX2 = "ex2";
@ConfigurationParameter( name = PARAM_EX2, mandatory = false,
defaultValue = "SOME_VALUE"
)
private Example2 ex2;

@Override
public void process( JCas aJCas ) throws AnalysisEngineProcessException {
System.out.println( ex1.foo );
System.out.println( ex2.foo );
}

public static enum Example1 {
SOME_VALUE( "This is an enum", "" )
;

private final String foo;
private final String bar;

private Example1( String foo, String bar ) {
this.foo = foo;
this.bar = bar;
}
}

public static class Example2 {
public static final Example2 SOME_VALUE = new Example2( "This is a static member", "" );

private final String foo;
private final String bar;

private Example2( String foo, String bar ) {
this.foo = foo;
this.bar = bar;
}
}

public static void main( String[] args ) throws ResourceInitializationException, IOException, SAXException, CpeDescriptorException, InvalidXMLException {
CollectionReaderDescription crd = CollectionReaderFactory.createReaderDescription( SomeReader.class );

AnalysisEngineDescription ae = AnalysisEngineFactory.createEngineDescription(StaticInit.class );

CpeBuilder cpb = new CpeBuilder();
cpb.setReader( crd );
cpb.setAnalysisEngine( ae );
CollectionProcessingEngine cpe = cpb.createCpe( new SomeCallcakListener() );
cpe.process();
}

public static class SomeReader extends JCasCollectionReader_ImplBase {
private boolean next = true;
public SomeReader() {}
@Override public void getNext( JCas jCas ) throws IOException, CollectionException {}
@Override 

Re: UIMA Database

2017-01-25 Thread José Tomás Atria
Hi Wahed,

I spent some time looking for something like what you mention, and in terms of
relational DBMSs, I found only this: https://github.com/renaud/uima_sql but
after working on it for a while and talking with the original author, we
concluded that, following Richard's suggestion, this was generally a Bad
Idea.

You're much better off using some other type of DBMS. Personally, I would
look into a document database that can query the XMI representation of the
CASes directly along with the type system specification if you need to
retain some kind of type-checking.

If you decide to go that route, I would suggest BaseX: http://basex.org/,
as it's what I've used locally for a 3 GB corpus. Just point BaseX to a dir
with the XMI files in it and you should be up and running in seconds. Then
you can write all the plumbing code you need in XQuery, which is much much
nicer than SQL.

Cheers,
jta

On Wed, Jan 25, 2017 at 11:55 AM  wrote:

> Hi,
> is there any way to store processed CAS in a database? Does anyone
> have experience with that? Which database would you recommend (SQL,
> NoSQL, Graphdb,..) considering that the typesystem could be changed.
> How would one map the TypeSystem to a DB? It also should be able to
> manage big amounts of data (TB or more).
>
> Thanks in advance
>
> -Wahed
>
> --
> A. Wahed Hemati
> Text-Technology Lab
> Fakultät für Informatik und Mathematik
> Johann Wolfgang Goethe-Universität Frankfurt am Main
> Senckenberganlage 31, Raum 401c
> 60325 Frankfurt am Main
> Postfach: 154
> Tel: +49 69-798-28925 <+49%2069%2079828925>
> Email: hem...@em.uni-frankfurt.de
> Web: http://www.hucompute.org/
>
> --

sent from a phone. please excuse terseness and tpyos.

enviado desde un teléfono. por favor disculpe la parquedad y los erroers.


Re: Question

2016-03-24 Thread José Tomás Atria
Hi Pedro,

I speak Spanish (and I am from Chile), but I don't understand what you mean by
"implementation cases in Spanish". What is it that you need?

Regards,
jta

2016-03-24 14:05 GMT-04:00 Pedro Contreras Flores :

> Hello,
>
> Is there any UIMA user who speaks Spanish?
>
> I need to see implementation cases in Spanish.
>
>
>
> Regards,
>
> PEDRO CONTRERAS
>
> Chile
>
>
>
>
>
>


-- 
entia non sunt multiplicanda praeter necessitatem


ExternalResourceFactoryTest hangs when building uimafit-core 2.2.0-SNAPSHOT from github master

2016-03-21 Thread José Tomás Atria
Hello all,

Two things:

First, building uimafit-core hangs while running ExternalResourceFactoryTest,
on every line involving UIMAFramework.produceAnalysisEngine( desc ). If I comment
out those lines (basically bypassing the tests), the build completes with
no issue.

Is this some quirk of my local build environment? Is there anything I
should be doing in order to get uimafit-core to build correctly?

Second, I remember there being a public snapshots repository, maybe
co-hosted with dkpro's snapshot repos. Am I correct on this? If so, where
can I find this repo? I'm getting a little nervous about rebuilding
uimafit from source myself if I have to start disabling tests to get it to
build.

thanks!
jta

-- 
entia non sunt multiplicanda praeter necessitatem


Re: JCasGen: Import typesystem from Maven dependency jar

2016-02-05 Thread José Tomás Atria
Ok, I'll produce a stack trace and file the bug report.

By deploying locally, I meant running JCasGen from local sources so I can
step through it in a debugger. The NPE occurs when using
jcasgen-maven-plugin, and I am completely ignorant about the execution
model of maven plugins. But I'll see if I can find out more on my own,
don't worry. Oh, and I'm using netbeans.

Best,
jta


On Thu, Feb 4, 2016 at 6:37 PM, Richard Eckart de Castilho <r...@apache.org>
wrote:

>
> > On 05.02.2016, at 00:27, José Tomás Atria <jtat...@gmail.com> wrote:
> >
> > Thanks for the tip Richard! I can't believe I did not see that paragraph in
> > the reference guide, I read it immediately before sending my email...
> >
> > By the way, should that NullPointerException be looked into? I would love
> > to give more information, but I have no idea about how to deploy maven
> > plugins locally, so I wouldn't know how to produce a more useful bug
> > report... Would be happy to help if given a few pointers and if it's useful.
>
> A stack trace would be helpful. You can report this in the issue tracker at
>
>   https://issues.apache.org/jira/
>
> Use "Create", choose "UIMA" as the product and "Core Java Framework" as component,
> enter a description of how to reproduce the problem and the stack trace.
>
> I don't think it is a huge problem that should be investigated immediately, but
> it is a good thing to at least collect and document known problems.
>
> Not sure what you mean by "how to deploy maven plugins locally". By "plugin",
> do you mean a pre-built JAR, a locally checked out and built Maven project,
> or an actual Maven plugin? Btw. what IDE are you using?
>
> Cheers,
>
> -- Richard




-- 
entia non sunt multiplicanda praeter necessitatem


Re: JCasGen: Import typesystem from Maven dependency jar

2016-02-04 Thread José Tomás Atria
Thanks for the tip Richard! I can't believe I did not see that paragraph in
the reference guide, I read it immediately before sending my email...

By the way, should that NullPointerException be looked into? I would love
to give more information, but I have no idea about how to deploy maven
plugins locally, so I wouldn't know how to produce a more useful bug
report... Would be happy to help if given a few pointers and if it's useful.

Best,
jta

On Thu, Feb 4, 2016 at 5:26 PM, Richard Eckart de Castilho <r...@apache.org>
wrote:

> Hi,
>
> use an import-by-name that looks up the resource in the classpath, e.g.
>
>   <imports>
>     <import name="desc.type.POS"/>
>     <import name="desc.type.Morpheme"/>
>   </imports>
>
> Looks for the "desc/type/POS.xml" and "desc/type/Morpheme.xml" in the
> classpath (i.e. within your Maven dependencies).
>
> See also
> https://uima.apache.org/d/uimaj-current/references.html#ugr.ref.xml.component_descriptor.imports
>
> Cheers,
>
> -- Richard
>
> > On 04.02.2016, at 21:25, José Tomás Atria <jtat...@gmail.com> wrote:
> >
> > Hello All,
> >
> > I'm using JCasGen to generate the type system for a specific research
> > project. This type system is an extension of a generic type system that we
> > use internally in our lab.
> >
> > Until now, I have been doing this by copying the XML descriptor for the
> > generic type system to a file in my development machine, and then including
> > a hard coded reference to its path in the specified type system, but I
> > would like to remove that hard-coded reference and reference instead the
> > copy of the type system description that is included in the resources of
> > our lab's API library, distributed over maven.
> >
> > e.g. right now, my type system descriptor includes the line:
> >
> > 
> >
> > I want to remove that reference and instead pull "LabTypeSystem.xml" from
> > the resources contained in the jar of a maven dependency.
> >
> > I tried doing this instead:
> >
> > 
> >
> > (i.e. the URL I got from
> > doing getClass().getResource("desc/type/LabTypeSystem.xml").toString() )
> >
> > but this resulted in a NullPointerException (which is surely a bug in
> > JCasGen...), but in any case, that would just replace one hardcoded
> > reference by another (i.e. to the local maven repo).
> >
> > Is there any way to make JCasGen resolve resources in mavenized jars?
> >
> > Hope I made myself clear...
> >
> > Thanks!
> > jta.
>
>


-- 
entia non sunt multiplicanda praeter necessitatem


JCasGen: Import typesystem from Maven dependency jar

2016-02-04 Thread José Tomás Atria
Hello All,

I'm using JCasGen to generate the type system for a specific research
project. This type system is an extension of a generic type system that we
use internally in our lab.

Until now, I have been doing this by copying the XML descriptor for the
generic type system to a file in my development machine, and then including
a hard coded reference to its path in the specified type system, but I
would like to remove that hard-coded reference and reference instead the
copy of the type system description that is included in the resources of
our lab's API library, distributed over maven.

e.g. right now, my type system descriptor includes the line:



I want to remove that reference and instead pull "LabTypeSystem.xml" from
the resources contained in the jar of a maven dependency.

I tried doing this instead:



(i.e. the URL I got from
doing getClass().getResource("desc/type/LabTypeSystem.xml").toString() )

but this resulted in a NullPointerException (which is surely a bug in
JCasGen...), but in any case, that would just replace one hardcoded
reference by another (i.e. to the local maven repo).

Is there any way to make JCasGen resolve resources in mavenized jars?

Hope I made myself clear...

Thanks!
jta.

-- 
entia non sunt multiplicanda praeter necessitatem


Re: Selecting all connected annotations by type.

2015-10-26 Thread José Tomás Atria
Hi Jens,

I did indeed use those methods for a while and they were working fine, but
I was mostly using them to perform sanity checks on some arbitrary span
annotations, and once I made sure those were being created OK I reverted
back to stock uima-fit.

They should be ok, though; the patch basically adds very little variation
on Richard's methods for indexing covered/covering annotations, and while I
was using them, they worked.

Funny coincidence: Just a few days ago I was thinking that it would be
possible to provide most of the indexing features of CASUtil via interval
trees. No idea about how expensive this would be, though. This may be a
good direction to look into if you happen to feel inclined to rewrite those
methods :)

Best,
jta



On Mon, Oct 26, 2015 at 11:58 AM, Jens Grivolla  wrote:

> Ok Richard, I'll look into it, but I don't promise anything at this point
> (tons of project deliverables coming up)...
>
> -- Jens
>
> On Fri, Oct 23, 2015 at 2:03 PM, Richard Eckart de Castilho <
> r...@apache.org>
> wrote:
>
> > Hi Jens,
> >
> > :) don't you want to test and apply it? My next projected time slot for
> > uimaFIT is in December.
> >
> > Best,
> >
> > -- Richard
> >
> > > On 23.10.2015, at 11:09, Jens Grivolla  wrote:
> > >
> > > I'd really like to have that functionality also (we'll need to do
> > something
> > > like that quite soon), so I just voted on the issue...
> > >
> > > I haven't tested the patch yet. José, have you been using this over the
> > > last few months?
> > >
> > > -- Jens
> >
> >
>



-- 
entia non sunt multiplicanda praeter necessitatem


Re: Inconsistent API for engine and resource creation?

2015-10-21 Thread José Tomás Atria
Hi Richard,

Thanks for your response. I'll take a look into it and see if there is a
not-too-hackish way of working around that limitation in the uimafit
factory that does not involve changes to the underlying uima code... I'll
report if I come up with something :)

best,
jta

On Tue, Oct 20, 2015 at 2:22 PM, Richard Eckart de Castilho <r...@apache.org>
wrote:

> Hi,
>
> UIMA supports different types of resource specifiers that can be used
> for external resources. Some of them support the same types of
> parameters as regular UIMA components, other support only String
> parameters.
>
> If you look a bit up from line 177, you'll see another branch of an if
> statement which does not do the cast - which is for resources created through
> a ConfigurableDataResourceSpecifier.
>
> At the time I wrote this, I didn't find a way to convince UIMA to accept
> non-String parameters on other kinds of resources... unless, I guess, I
> had made changes to the factoryConfig.xml file and actually
> implemented a new kind of specifier.
>
> See also https://issues.apache.org/jira/browse/UIMA-2978
>
> Maybe you have an idea how to solve this ;)
>
> Best,
>
> -- Richard
>
> > On 20.10.2015, at 19:01, José Tomás Atria <jtat...@gmail.com> wrote:
> >
> > I had posted the message below to the old uimafit-users list and didn't
> > notice it was no longer being used. See message below.
> >
> > =
> >
> > Hello,
> >
> > I just noticed that the method for creation of analysis engines and
> > external resources is different.
> >
> > For AE's, this works:
> >
> > AnalysisEngineFactory.createEngineDescription(
> >   SomeEngine.class,SomeEngine.PARAM_BOOLEAN, true
> > )
> >
> > But for external resources, the same syntax fails with a ClassCastException
> >
> > ExternalResourceFactory.createExternalResourceDescription(
> >SomeResource.class, SomeResource.PARAM_BOOLEAN, true
> > )
> >
> > Looking at the code, I see that
> > ExternalResourceFactory.createExternalResourceDescription(String, Class<? extends Resource>, Object...),
> > which is called by the method above, actually
> > casts parameter values to String on line 177.
> >
> > Why is this so? Wouldn't it be preferable to have a consistent interface
> > for all component types?
> >
> > Thanks!
> > jta
>
>


-- 
entia non sunt multiplicanda praeter necessitatem


Inconsistent API for engine and resource creation?

2015-10-20 Thread José Tomás Atria
I had posted the message below to the old uimafit-users list and didn't
notice it was no longer being used. See message below.

=

Hello,

I just noticed that the method for creation of analysis engines and
external resources is different.

For AE's, this works:

AnalysisEngineFactory.createEngineDescription(
   SomeEngine.class,SomeEngine.PARAM_BOOLEAN, true
)

But for external resources, the same syntax fails with a ClassCastException

ExternalResourceFactory.createExternalResourceDescription(
SomeResource.class, SomeResource.PARAM_BOOLEAN, true
)

Looking at the code, I see that
ExternalResourceFactory.createExternalResourceDescription(String, Class<? extends Resource>, Object...),
which is called by the method above, actually
casts parameter values to String on line 177.
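
The failure can be reproduced without UIMA at all. The sketch below is a stand-in, with the hypothetical names ParamCast, castValues and stringify, for a factory method that blindly casts each name/value argument to String: a Boolean value triggers the ClassCastException, while converting the values to strings first avoids it. Whether uimaFIT then converts such a string back into a boolean field on the resource is an assumption that is not confirmed here.

```java
// Hypothetical stand-in for a factory that only accepts String parameter values.
class ParamCast {

    /** Mimics the unchecked cast: every name and value must already be a String. */
    static String[] castValues(Object... nameValuePairs) {
        String[] out = new String[nameValuePairs.length];
        for (int i = 0; i < nameValuePairs.length; i++) {
            out[i] = (String) nameValuePairs[i]; // throws ClassCastException for a Boolean
        }
        return out;
    }

    /** Possible caller-side workaround: turn every value into its string form first. */
    static Object[] stringify(Object... nameValuePairs) {
        Object[] out = new Object[nameValuePairs.length];
        for (int i = 0; i < nameValuePairs.length; i++) {
            out[i] = String.valueOf(nameValuePairs[i]);
        }
        return out;
    }
}
```

Calling castValues("flag", true) fails in the same way as the resource factory call above, whereas castValues(stringify("flag", true)) goes through with the value "true".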

Why is this so? Wouldn't it be preferable to have a consistent interface
for all component types?

Thanks!
jta


-- 
entia non sunt multiplicanda praeter necessitatem


How to correctly implement delta serialization in locally deployed CPE pipeline?

2015-09-29 Thread José Tomás Atria
Hello all,

I've been trying to wrap my head around this for a while, and I can't seem
to get it to work. Could someone please explain what is the most
straightforward way of implementing delta serialization in a local,
multithreaded CPE pipeline?

So far, I've tried using a collection reader that uses a
SharedSerializationData object stored in the current UIMA session and
creates a CAS marker that is also stored in the current UIMA session, in a
map keyed by a CAS identifier. I then use this SharedSerializationData
object and the marker retrieved from the session via the CAS identifier to
serialize the delta to disk, but this procedure causes an OutOfMemory
exception if I try to process all of my data (not that much in my opinion,
~2000 CASes).

I assume that I'm missing some basic aspect of the API, but after trying to
deal with it for a while I just gave up...

A more specific version, as far as I could understand: Delta serialization
requires a SharedSerializationData object and a CAS marker. What is the
correct way to create, store and retrieve these in a simple,
multi-threaded, locally deployed CPE processing pipeline? (i.e. No need to
support AS or DUCC facilities, etc).

Any help would be greatly appreciated.
Thanks!
jta

-- 
entia non sunt multiplicanda praeter necessitatem


Re: Selecting all connected annotations by type.

2015-01-31 Thread José Tomás Atria
Issue created, patch submitted.

https://issues.apache.org/jira/browse/UIMA-4212

On Sat Jan 31 2015 at 3:12:33 AM Richard Eckart de Castilho r...@apache.org
wrote:

 Dear José,

 could you please re-submit the patch via the Apache UIMA issue tracker:

 Thanks!

 -- Richard

 https://issues.apache.org/jira/browse/UIMA

 On 31.01.2015, at 05:38, José Tomás Atria jtat...@gmail.com wrote:

  Please disregard the previous patch, apparently I managed to corrupt it
 while creating it over ssh.
 
  The version in this email should be correct, I hope.
 
  Best,
  jta




Re: Selecting all connected annotations by type.

2015-01-30 Thread José Tomás Atria
Dear Richard:

I am attaching a patch with a series of selectIntersects and
indexIntersects methods. There are more signatures than for the corresponding
selectCovered/Covering methods, as intersects could include covering
annotations or not. If they are excluded, the methods use the same approach
you used in selectCovered, advancing the iterator. Otherwise, they defer to
the int interval method used in selectCovering.
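
For readers without the patch at hand, the core interval test can be sketched independently of UIMA. The code below is not the patch: IntersectSketch and Span are hypothetical names, and it assumes the spans arrive sorted by begin offset, which is what makes the early exit valid. A span intersects the half-open range [begin, end) when it starts before the range ends and ends after the range starts.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of selecting intersecting spans with an early exit.
class IntersectSketch {

    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    /**
     * Returns the spans sharing at least one character with [begin, end).
     * The input list must be sorted by ascending begin offset.
     */
    static List<Span> selectIntersecting(List<Span> sortedByBegin, int begin, int end) {
        List<Span> hits = new ArrayList<>();
        for (Span span : sortedByBegin) {
            if (span.begin >= end) {
                break; // every later span starts even further right, so none can intersect
            }
            if (span.end > begin) {
                hits.add(span); // starts before the range ends and ends after it starts
            }
        }
        return hits;
    }
}
```

Spans that fully cover the range pass this test as well, so excluding them, as some of the patch signatures do, would need one extra condition on the begin and end offsets.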

Maybe this is useful to someone else besides myself?

Also, I have no experience with unit testing, so I didn't even try
to add tests for the new methods. I did some naive testing by hand, and it
seems to work... but I'm particularly bad with interval operations, so I
wouldn't be surprised if I made some egregious error. My apologies in
advance.

Best,
jta


On Mon Jan 26 2015 at 3:19:37 PM José Tomás Atria jtat...@gmail.com wrote:

 Cool, I'll look into it and let you know if I manage to make something
 useful. Thanks for the tips.

 On Sun Jan 25 2015 at 12:47:52 PM Richard Eckart de Castilho 
 r...@apache.org wrote:

 Hi José,

 we had no need for such a method so far ;) The easiest way would probably
 be to copy the
 selectCovering method from uimaFIT and adjust it to catch all
 intersecting annotations.
 You can probably add an optimization to a selectIntersecting method which
 breaks the loop as soon as the begin offset of an annotation is larger than
 the end offset of your intersection range.

 Cheers,

 -- Richard

 On 24.01.2015, at 22:25, José Tomás Atria jtat...@gmail.com wrote:

  Hello all,
 
  I am looking for the best approach to select all annotations of a given
  type that intersect an annotation of a different type.
 
  I am aware of selectCovered and selectCovering, which, as far as I
  understand, will select all annotations (of a given type) that cover
 ranges
  of text which are, respectively, subsets or supersets of another
  annotation. Is there a similar method for annotations that cover ranges
  which merely _intersect_ with the range covered by a given annotation?
 
  What would be the recommended way of achieving this?
 
  Any help would be appreciated. Thanks!
  jta.
 
  --
  entia non sunt multiplicanda praeter necessitatem


Index: src/main/java/org/apache/uima/fit/util/CasUtil.java
===
--- src/main/java/org/apache/uima/fit/util/CasUtil.java	(revision 1656160)
+++ src/main/java/org/apache/uima/fit/util/CasUtil.java	(working copy)
@@ -616,6 +616,237 @@
   }
 
   /**
+   * Get a list of annotations of the given type that intersect a certain annotation.
+   * Iterates over all annotations of the given type to find intersecting annotations. Does not use
+   * subiterators and does not respect type priorities. Was adapted from {@link Subiterator}. Uses
+   * the same approach except that type priorities are ignored.
+   * <p>
+   * The intersecting annotation is never returned itself, even if it is of the queried-for type or
+   * a subtype of that type.
+   *
+   * Covering annotations are excluded. Use {@link selectIntersects(Type, AnnotationFS, boolean)} if
+   * you want to include covering annotations, but this is significantly slower.
+   *
+   * @param type
+   *the UIMA type of annotations to select
+   * @param intersect
+   *the annotation to select intersects for
+   * @return
+   *a list of annotations of the given type that intersect the given annotation.
+   * @see Subiterator
+   * @see <a href="package-summary.html#SortOrder">Order of selected feature structures</a>
+   */
+  public static List<AnnotationFS> selectIntersects(Type type, AnnotationFS intersect) {
+return selectIntersects(intersect.getView(), type, intersect, false);
+  }
+
+  /**
+   * Get a list of annotations of the given type that intersect a certain annotation.
+   * Iterates over all annotations of the given type to find intersecting annotations. Does not use
+   * subiterators and does not respect type priorities. Was adapted from {@link Subiterator}. Uses
+   * the same approach except that type priorities are ignored.
+   * <p>
+   * The intersecting annotation is never returned itself, even if it is of the queried-for type or
+   * a subtype of that type.
+   *
+   * @param type
+   *the UIMA type of annotations to select
+   * @param intersect
+   *the annotation to select intersects for
+   * @param covering
+   *if true, covering annotations are included, but this will be slower.
+   * @return
+   *a list of annotations of the given type that intersect the given annotation.
+   * @see Subiterator
+   * @see <a href="package-summary.html#SortOrder">Order of selected feature structures</a>
+   */
+  public static List<AnnotationFS> selectIntersects(Type type, AnnotationFS intersect, boolean covering) {
+return selectIntersects(intersect.getView(), type, intersect, covering);
+  }
+
+  /**
+   * Get a list of annotations of the given type that intersect a certain

Re: Selecting all connected annotations by type.

2015-01-30 Thread José Tomás Atria
Please disregard the previous patch, apparently I managed to corrupt it
while creating it over ssh.

The version in this email should be correct, I hope.

Best,
jta

On Fri Jan 30 2015 at 11:09:10 PM José Tomás Atria jtat...@gmail.com
wrote:

 Dear Richard:

 I am attaching a patch with a series of selectIntersects and
 indexIntersects methods. There are more signatures than for the corresponding
 selectCovered/Covering methods, as intersects could include covering
 annotations or not. If they are excluded, the methods use the same approach
 you used in selectCovered, advancing the iterator. Otherwise, they defer to
 the int interval method used in selectCovering.

 Maybe this is useful to someone else besides myself?

 Also, I have no experience with unit testing, so I didn't even try
 to add tests for the new methods. I did some naive testing by hand, and it
 seems to work... but I'm particularly bad with interval operations, so I
 wouldn't be surprised if I made some egregious error. My apologies in
 advance.

 Best,
 jta


 On Mon Jan 26 2015 at 3:19:37 PM José Tomás Atria jtat...@gmail.com
 wrote:

 Cool, I'll look into it and let you know if I manage to make something
 useful. Thanks for the tips.

 On Sun Jan 25 2015 at 12:47:52 PM Richard Eckart de Castilho 
 r...@apache.org wrote:

 Hi José,

 we had no need for such a method so far ;) The easiest way would probably
 be to copy the selectCovering method from uimaFIT and adjust it to catch
 all intersecting annotations.
 You can probably add an optimization to a selectIntersecting method
 which breaks the loop as soon as the begin offset of an annotation is
 larger than the end offset of your intersection range.

 Cheers,

 -- Richard

 On 24.01.2015, at 22:25, José Tomás Atria jtat...@gmail.com wrote:

  Hello all,
 
  I am looking for the best approach to select all annotations of a given
  type that intersect an annotation of a different type.
 
  I am aware of selectCovered and selectCovering, which, as far as I
  understand, will select all annotations (of a given type) that cover
  ranges of text which are, respectively, subsets or supersets of another
  annotation. Is there a similar method for annotations that cover ranges
  which merely _intersect_ with the range covered by a given annotation?
 
  What would be the recommended way of achieving this?
 
  Any help would be appreciated. Thanks!
  jta.
 
  --
  entia non sunt multiplicanda praeter necessitatem


Index: src/main/java/org/apache/uima/fit/util/CasUtil.java
===
--- src/main/java/org/apache/uima/fit/util/CasUtil.java	(revision 1656160)
+++ src/main/java/org/apache/uima/fit/util/CasUtil.java	(working copy)
@@ -616,6 +616,239 @@
   }
 
   /**
+   * Get a list of annotations of the given type that intersect a certain annotation.
+   * Iterates over all annotations of the given type to find intersecting annotations. Does not use
+   * subiterators and does not respect type priorities. Was adapted from {@link Subiterator}. Uses
+   * the same approach except that type priorities are ignored.
+   * <p>
+   * The intersecting annotation is never returned itself, even if it is of the queried-for type or
+   * a subtype of that type.
+   *
+   * Covering annotations are excluded. Use {@link #selectIntersects(Type, AnnotationFS, boolean)} if
+   * you want to include covering annotations, though this is significantly slower.
+   *
+   * @param type
+   *          the UIMA type of annotations to select
+   * @param intersect
+   *          the annotation to select intersects for
+   * @return
+   *          a list of annotations of the given type that intersect the given annotation.
+   * @see Subiterator
+   * @see <a href="package-summary.html#SortOrder">Order of selected feature structures</a>
+   */
+  public static List<AnnotationFS> selectIntersects(Type type, AnnotationFS intersect) {
+    return selectIntersects(intersect.getView(), type, intersect, false);
+  }
+
+  /**
+   * Get a list of annotations of the given type that intersect a certain annotation.
+   * Iterates over all annotations of the given type to find intersecting annotations. Does not use
+   * subiterators and does not respect type priorities. Was adapted from {@link Subiterator}. Uses
+   * the same approach except that type priorities are ignored.
+   * <p>
+   * The intersecting annotation is never returned itself, even if it is of the queried-for type or
+   * a subtype of that type.
+   *
+   * @param type
+   *          the UIMA type of annotations to select
+   * @param intersect
+   *          the annotation to select intersects for
+   * @param covering
+   *          if true, covering annotations are included, but this will be slower.
+   * @return
+   *          a list of annotations of the given type that intersect the given annotation.
+   * @see Subiterator
+   * @see <a href="package-summary.html#SortOrder">Order of selected feature structures</a>
+   */
+  public static

Re: Selecting all connected annotations by type.

2015-01-26 Thread José Tomás Atria
Cool, I'll look into it and let you know if I manage to make something
useful. Thanks for the tips.

On Sun Jan 25 2015 at 12:47:52 PM Richard Eckart de Castilho r...@apache.org
wrote:

 Hi José,

 we had no need for such a method so far ;) The easiest way would probably
 be to copy the selectCovering method from uimaFIT and adjust it to catch
 all intersecting annotations.
 You can probably add an optimization to a selectIntersecting method which
 breaks the loop as soon as the begin offset of an annotation is larger than
 the end offset of your intersection range.
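
 The early-exit idea can be sketched roughly as follows. This is only an
 illustration over a plain list, not the uimaFIT implementation: the
 IntersectScan and Span classes are hypothetical, the input is assumed to be
 sorted by begin offset, and offsets are assumed to be half-open
 [begin, end) ranges.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a scan over an annotation index; not uimaFIT code.
final class IntersectScan {
    private IntersectScan() {}

    /** Minimal span with half-open offsets [begin, end). */
    static final class Span {
        final int begin;
        final int end;

        Span(int begin, int end) {
            this.begin = begin;
            this.end = end;
        }
    }

    /**
     * Collects every span that shares at least one offset with the range.
     * Assumes sortedByBegin is ordered by ascending begin offset, so the
     * loop can stop once a span begins after the end of the range.
     */
    static List<Span> selectIntersecting(List<Span> sortedByBegin, Span range) {
        List<Span> result = new ArrayList<>();
        for (Span s : sortedByBegin) {
            if (s.begin > range.end) {
                break; // all later spans begin even further right
            }
            if (s.begin < range.end && range.begin < s.end) {
                result.add(s);
            }
        }
        return result;
    }
}
```

 The break relies entirely on the sort order; without it, a later span could
 still begin inside the range and would be missed.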

 Cheers,

 -- Richard

 On 24.01.2015, at 22:25, José Tomás Atria jtat...@gmail.com wrote:

  Hello all,
 
  I am looking for the best approach to select all annotations of a given
  type that intersect an annotation of a different type.
 
  I am aware of selectCovered and selectCovering, which, as far as I
  understand, will select all annotations (of a given type) that cover
  ranges of text which are, respectively, subsets or supersets of another
  annotation. Is there a similar method for annotations that cover ranges
  which merely _intersect_ with the range covered by a given annotation?
 
  What would be the recommended way of achieving this?
 
  Any help would be appreciated. Thanks!
  jta.
 
  --
  entia non sunt multiplicanda praeter necessitatem