Re: Adding methods to UIMA annotation types defined in XML

2019-11-18 Thread Olga Patterson
If you just want an easier way to manipulate annotations, you can have a 
utility class that keeps all such methods separate from the annotation types.
Example code from
http://decipher.chpc.utah.edu/nexus/index.html#nexus-search;classname~Annotationlibrarian

<dependency>
  <groupId>gov.va.vinci</groupId>
  <artifactId>leo-annotation-librarian</artifactId>
  <version>2018.01.0</version>
</dependency>
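The utility-class pattern Olga describes can be sketched as follows. This is a hypothetical example, not the AnnotationLibrarian's actual API; plain begin/end offsets stand in for the values a UIMA Annotation exposes via getBegin()/getEnd():

```java
// Hypothetical sketch of the utility-class pattern: all convenience
// logic lives in a static helper, separate from the annotation types.
public class AnnotationHelperSketch {

    // True if the spans [aBegin, aEnd) and [bBegin, bEnd) overlap.
    public static boolean overlaps(int aBegin, int aEnd, int bBegin, int bEnd) {
        return aBegin < bEnd && bBegin < aEnd;
    }

    // True if the outer span completely contains the inner span.
    public static boolean covers(int outerBegin, int outerEnd,
                                 int innerBegin, int innerEnd) {
        return outerBegin <= innerBegin && innerEnd <= outerEnd;
    }
}
```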




On Nov 15, 2019, at 10:58 AM, Marshall Schor  wrote:

Even though the JCas classes can be generated from the XML file, you are 
allowed
to add additional things to those source files, including

   - additional fields

   - additional methods

See
http://uima.apache.org/d/uimaj-2.10.4/references.html#ugr.ref.jcas.augmenting_generated_code
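The pattern from that section looks roughly like this (class and method names are illustrative; the generated boilerplate is abbreviated):

```java
// RelationAnnotation.java - initially generated by JCasGen from the
// XML type system description, then augmented by hand. Note that
// rerunning JCasGen normally overwrites the file; the Eclipse-based
// JCasGen has merge support that can preserve hand edits.
public class RelationAnnotation extends Annotation {

    // ... generated constructors plus getters/setters for the
    // features declared in the XML descriptor ...

    // Hand-added method, as permitted by the referenced docs.
    public String describe() {
        return "Relation over: " + getCoveredText();
    }
}
```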

-Marshall

On 11/15/2019 9:50 AM, Alain Désilets wrote:
On Thu, Nov 14, 2019 at 4:51 PM Richard Eckart de Castilho 
wrote:

Sure. You generate the JCas classes once and then you add the methods you
want
to them. Cf. e.g.


https://github.com/dkpro/dkpro-core/blob/8043e10bf10a61fe47e21946ea609bda9f2278a0/dkpro-core-api-metadata-asl/src/main/java/de/tudarmstadt/ukp/dkpro/core/api/metadata/type/DocumentMetaData.java#L290-L447

I know how to create a subclass of Annotation, RelationAnnotation in my
case. The problem is that if I try to use this subclass in an Annotator,
UIMA complains that RelationAnnotation is not in the UIMA type system, and
it lists the available types. This list is essentially the list of types
defined in some UIMA XML file. This tells me that only those annotation
classes defined in the xml file can be used in an Annotator. Or at least,
that I am missing a step for registering my RelationAnnotation class with
the UIMA type system.

On the other hand, if I define the RelationAnnotation in the XML file, I
can use it in an Annotator, but then I can't figure out how to add methods
to it, since the Java source for that class is generated automatically (by
some UIMA Maven plugin, I presume).
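For reference, defining such a type in the descriptor looks roughly like this (the package and feature names are made up for illustration):

```xml
<typeDescription>
  <name>org.example.RelationAnnotation</name>
  <description>A relation between annotations.</description>
  <supertypeName>uima.tcas.Annotation</supertypeName>
  <features>
    <featureDescription>
      <name>label</name>
      <rangeTypeName>uima.cas.String</rangeTypeName>
    </featureDescription>
  </features>
</typeDescription>
```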

But the question is: why do you want to add new methods? (and is it really
a good idea?)

Essentially, I want to add methods for "derived attributes", i.e.
attributes whose values are computed from primitive attributes defined in
the XML file.

I guess I could make those attributes primitive (i.e. defined in the XML
file), but then any annotator that creates a RelationAnnotation would have
to make sure to set those other attributes correctly. I would much rather
have the RelationAnnotation class compute those derived attributes itself,
as that guarantees they will always be computed the same way.
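The derived-attribute idea can be sketched with plain Java; here begin and end stand in for primitive features generated from the XML descriptor, and length() is a hypothetical hand-added derived attribute:

```java
// Sketch of the derived-attribute pattern: because length() is computed
// on demand rather than stored, every annotator that creates an instance
// gets a consistent value without having to set it.
public class RelationAnnotationSketch {
    private final int begin;
    private final int end;

    public RelationAnnotationSketch(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }

    public int getBegin() { return begin; }
    public int getEnd() { return end; }

    // Derived attribute: computed, never stored, so it cannot go stale.
    public int length() { return end - begin; }
}
```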

Alain



Re: Deploy Async Service without XML

2018-05-15 Thread Olga Patterson
We have developed a framework to deal with UIMA AS programmatically. No dealing 
with XML ever. 

Feel free to check it out:
http://department-of-veterans-affairs.github.io/Leo/

Olga


-Original Message-
From:  on behalf of Jaroslaw Cwiklik 
Reply-To: "user@uima.apache.org" 
Date: Tuesday, May 15, 2018 at 2:58 AM
To: "user@uima.apache.org" 
Subject: Re: Deploy Async Service without XML

You can generate UIMA-AS deployment descriptor programmatically. You still
need AE descriptor though.

Please check UIMA-AS documentation:

https://uima.apache.org/d/uima-as-2.10.3/uima_async_scaleout.html#ugr.ref.async.api.usage
section 4.10 Generating Deployment Descriptor Programmatically
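Section 4.10 shows an API along these lines (a sketch adapted from that section; check the linked docs for the exact signatures):

```java
import org.apache.uima.resourceSpecifier.factory.DeploymentDescriptorFactory;
import org.apache.uima.resourceSpecifier.factory.ServiceContext;
import org.apache.uima.resourceSpecifier.factory.UimaASPrimitiveDeploymentDescriptor;
import org.apache.uima.resourceSpecifier.factory.impl.ServiceContextImpl;

public class GenerateDd {
    public static void main(String[] args) throws Exception {
        // Describe the service: name, description, AE descriptor, queue name.
        // Paths and names here are placeholders.
        ServiceContext context = new ServiceContextImpl(
                "PersonTitle",
                "PersonTitle Annotator Service",
                "resources/PersonTitleAnnotator.xml",
                "PersonTitleQueue");

        // Build a primitive deployment descriptor and serialize it to XML.
        UimaASPrimitiveDeploymentDescriptor dd =
                DeploymentDescriptorFactory
                        .createPrimitiveDeploymentDescriptor(context);
        System.out.println(dd.toXML());
    }
}
```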

Jerry

On Mon, May 14, 2018 at 9:47 PM, Osborne, John D  wrote:

> Is it possible to deploy an UIMA-AS service without an XML descriptor
> similar to how UIMA-FIT works? I currently deploy services using
> deployAsyncService.sh
>
> I have multiple long running services that need to work in different
> (production, testing, dev) environments and would prefer to avoid having 
an
> XML file for each service. I realize that with some refactoring (like
> removing environment specific parameters) this number of XML files could 
be
> reduced, but I've become spoiled with UIMA-FIT. :)
>
> I'm looking at the toXML()  function so I can potentially generate the
> aggregate analysis engine with UIMA-FIT.
>
> -John
>
>
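With uimaFIT, the toXML() approach John describes can be sketched like this (MyAnnotator and MyOtherAnnotator are placeholders for real annotator classes):

```java
import java.io.FileOutputStream;
import org.apache.uima.analysis_engine.AnalysisEngineDescription;
import org.apache.uima.fit.factory.AnalysisEngineFactory;

public class WriteDescriptor {
    public static void main(String[] args) throws Exception {
        // Build the aggregate description in code instead of XML.
        AnalysisEngineDescription desc =
                AnalysisEngineFactory.createEngineDescription(
                        AnalysisEngineFactory.createEngineDescription(MyAnnotator.class),
                        AnalysisEngineFactory.createEngineDescription(MyOtherAnnotator.class));

        // Serialize the XML descriptor that deployAsyncService.sh expects.
        try (FileOutputStream os = new FileOutputStream("aggregate.xml")) {
            desc.toXML(os);
        }
    }
}
```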




Re: Synchronizing Batches AE and StatusCallbackListener

2017-04-21 Thread Olga Patterson
Erik,

My team at the VA has developed an easy way of implementing UIMA AS 
pipelines and scaling them to a large number of nodes, using the Leo 
framework, which extends UIMA AS 2.8.1. We have run pipelines over 200M 
documents, scaled across multiple nodes with dozens of service instances, 
and it performs great.

Here is some info:
http://department-of-veterans-affairs.github.io/Leo/

The documentation for Leo reflects an earlier version. We have not yet 
released the latest version on the VA GitHub, but if you are interested in 
using it with Java 8 and UIMA 2.8.1, we can share it with you so that you 
can test it out and possibly provide your comments back to us.

Leo has simple-to-use functionality for flexible batch read and write and it 
can work with any UIMA AEs and existing descriptor files and type system 
descriptions, so if you already have a pipeline, wrapping it with Leo would 
take just a few lines of code.

Let me know if you are interested and I can help you to get started.

Olga Patterson 







-Original Message-
From: Jaroslaw Cwiklik 
Reply-To: "user@uima.apache.org" 
Date: Friday, April 21, 2017 at 8:08 AM
To: "user@uima.apache.org" 
Subject: Re: Synchronizing Batches AE and StatusCallbackListener

Erik, thanks. Now it is clearer what you are trying to accomplish. First,
there are no plans that I know of to retire the CPE; it is still supported.
The only issue is ongoing development: my efforts are focused on extending
and improving UIMA-AS.

I don't have an answer yet for how to handle the CPE crash scenario with
respect to batching and a subsequent restart from the last known good batch.
It seems some coordination would be needed to avoid redoing the whole
collection after a crash. It's been a while since I've looked at the CPE. I
will take a look and see what is possible, if anything.

There is another Apache UIMA project called DUCC which stands for
Distributed Uima Cluster Computing. From your email it looks like you have
a cluster of machines available. Here is a quick description of DUCC:

DUCC is a Linux cluster controller designed to scale out any UIMA pipeline
for high throughput collection processing jobs as well as for low latency
real-time applications. Building on UIMA-AS, DUCC is particularly well
suited to run large memory Java analytics in multiple threads in order to
fully utilize multicore machines. DUCC manages the life cycle of all
processes deployed across the cluster, including non-UIMA processes such as
tomcat servers or VNC sessions.

 You can find more info on this here:
https://uima.apache.org/doc-uimaducc-whatitam.html

In UIMA-AS batching is an application concern. I am a bit fuzzy on
implementation so perhaps someone else can comment how to implement
batching and how to handle errors. You can use a CasMultiplier and a custom
FlowController to manage CASes and react to errors. The UIMA-AS service can
take an input CAS representing your batch, pass it on to the CasMultiplier,
generate CASes for each piece of work, and deliver results to the
CasConsumer with a FlowController in the middle orchestrating the flow. I
defer to application deployment experts to provide you with more detail.
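A minimal custom flow controller of the kind Jerry mentions might look like this (a simplified sketch; the component keys are placeholders for the keys in your aggregate descriptor, and error handling is elided):

```java
import org.apache.uima.flow.FinalStep;
import org.apache.uima.flow.Flow;
import org.apache.uima.flow.JCasFlowController_ImplBase;
import org.apache.uima.flow.JCasFlow_ImplBase;
import org.apache.uima.flow.SimpleStep;
import org.apache.uima.flow.Step;
import org.apache.uima.jcas.JCas;

// Routes each CAS through the annotator and then the consumer.
public class BatchFlowController extends JCasFlowController_ImplBase {

    @Override
    public Flow computeFlow(JCas jcas) {
        return new BatchFlow();
    }

    static class BatchFlow extends JCasFlow_ImplBase {
        private int step = 0;

        @Override
        public Step next() {
            switch (step++) {
                case 0:  return new SimpleStep("Annotator");
                case 1:  return new SimpleStep("Consumer");
                default: return new FinalStep();
            }
        }
    }
}
```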

Jerry







On Fri, Apr 21, 2017 at 2:21 AM, Erik Fäßler 
wrote:

> Hi Jerry,
>
> thanks a lot for your answer! I’m sorry that I didn’t make myself clearer.
> I will try again! :-)
> Here comes a lot of text, sorry for that. The post actually has two parts:
> The first explaining my issue, the second responding to the pointer to
> UIMA-AS.
>
> First: Yes, I use a CPE. I process text documents. Tens of millions of
> them.
> So, I have the following components to my issue, running with the CPE.
>
> 1. A CAS-Consumer (just an AnalysisEngine internally, of course). This
> consumer is responsible for serialising the document CAS into XMI and
> sending the XMI to a database. It is an XMI-to-database consumer. For
> performance reasons, the XMI of multiple CASes is buffered and then sent
> as a batch, let's say 50 CAS XMIs at a time.
> 2. A CPE StatusCallbackListener which also writes to the same database,
> but in another table. It logs into the database which documents have been
> successfully processed by the CPE. It also works on a batch basis.
>
> The goal: The CallbackListener should only mark those documents as
> successfully processed (i.e. as “finished”) where the CAS-Consumer 
actually
> has sent the XMI data to the database.
>
> Reason: I don’t want documents marked as “finished” where the XMI data is
> not in the database but still in