Re: UIMAj3 ideas

2015-07-16 Thread Richard Eckart de Castilho

On 16.07.2015, at 18:52, Petr Baudis pa...@ucw.cz wrote:

 On Fri, Jul 10, 2015 at 01:37:27PM -0400, Marshall Schor wrote:
 
  * UIMAfit is not part of core UIMA and UIMA-AS is not part of core
UIMA.  It seems to me that UIMA-AS is doing things a bit differently
than what the original UIMA idea of doing scaleout was.  The two
things don't play well together.  I'd love a way to easily take
my plain UIMA pipeline and scale it out, ideally without any code
changes, *and* avoid the terrible XML config files.
 Any specifics of what to change here would be helpful.  UIMA-AS was designed to
 enable scale-out without changing the core UIMA pipeline or its XML
 descriptor.  The additional information for UIMA-AS scaleout was put into a
 separate XML descriptor which embeds the original plain UIMA one.
 
  I'm sure Richard would be able to explain this better, but I think one
 of the core issues is that UIMA-AS embeds the XML descriptor instead of
 the AnalysisEngineDescription.  So when I want to use it together with
 AnalysisEngineDescription built with UIMAfit instead, it's time to
 start making crazy workarounds like

Afaik, there is no API in UIMA-AS that allows injecting an AnalysisEngineDescription
into a UIMA-AS descriptor. UIMA-AS forces one to use an import, so the AED
needs to be serialized and then imported again by UIMA-AS... or I just never
found the right method call or missed when it was added. In fact, I didn't
even find an API to programmatically create a UIMA-AS descriptor and at the
time saw myself forced to implement an AsDeploymentDescription.java myself.

See: 
https://code.google.com/p/dkpro-lab/source/browse/de.tudarmstadt.ukp.dkpro.lab/de.tudarmstadt.ukp.dkpro.lab.uima.engine.uimaas/src/main/java/de/tudarmstadt/ukp/dkpro/lab/uima/engine/uimaas/

  * Connected with the above - I'd love .addToIndexes() to just
disappear.  Right now, the paradigm is that you build an annotation
in an annotator, and the moment it gets saved in a CAS, it becomes
basically read-only.  
 You certainly can modify any of an Annotation's features subsequently.
 I'm guessing you're referring to another idea - adding additional features 
 that were
 not initially defined in the UIMA type system.
 
  Sorry for the confusion, but that's not quite what I had in mind.
 I literally believe that right now, in order to modify the value of
 a feature, you need to first remove it from an index, change the
 value, then re-add it back.  Is that a misconception?

Well, yes and no. Yes, it was required for the case where the value that
you changed was on a feature that was part of some index. No, it should
no longer be required as measures have been implemented to handle this
automatically.

See: The curious case of the zombie annotation aka UIMA-4049

https://issues.apache.org/jira/browse/UIMA-4049
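For context, the remove/modify/re-add protocol being discussed looks roughly like this (a sketch only; `token` and its `setPos()` setter belong to a hypothetical JCas type, and per UIMA-4049 this dance should no longer be necessary):

```java
// Classic protocol for changing a feature that is an index key:
// remove the FS from the indexes, change the feature, re-add it.
// `token` is an instance of a hypothetical JCas type `Token`.
token.removeFromIndexes();  // take it out so index ordering is not corrupted
token.setPos("NN");         // safe to modify the keyed feature now
token.addToIndexes();       // re-index with the new value
```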

  I think that's a bug for the UIMA Tutorial, which mentions FSArray but
 not FSList.  :-)

Then I should tell you also about the uimaFIT FSCollectionFactory which
contains all kinds of helpers to manage FSArray and FSList ;)

Btw. there is also ArrayFS which is the CAS version of FSArray :P
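For illustration, FSCollectionFactory usage looks roughly like this (a sketch, assuming uimaFIT on the classpath, an existing `jcas`, and a hypothetical `Token` JCas type with a hypothetical `getTokens()` helper):

```java
// Convert between Java collections and UIMA's FSArray/FSList using
// uimaFIT's FSCollectionFactory (sketch; names as noted above).
List<Token> tokens = getTokens();  // hypothetical: previously created annotations
FSArray array = FSCollectionFactory.createFSArray(jcas, tokens);
FSList list = FSCollectionFactory.createFSList(jcas, tokens);
// ...and back again, with the element type carried along:
Collection<Token> back = FSCollectionFactory.create(list, Token.class);
```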

  (Another pain point here - I always ache when I need to work with
 FSArray or I guess FSList, since it does not carry the type information
 that is in the typesystem - I need to manually typecast all the time
 and hope I don't make a mistake.)

Did you know that uimaFIT JCasUtil.select() can also be applied to
FSList and FSArray to avoid casting?

for (Token t : JCasUtil.select(sentence.getTokens(), Token.class)) {
  ...
}

CasUtil.select() can work also on ArrayFS

Cheerio,

-- Richard

Travel funding for ApacheCon EU Budapest - need to act today!

2015-07-16 Thread Marshall Schor
From the Apache Travel assistance committee:

Hi All,

This is a reminder that currently applications are open for Travel Assistance to
go to ApacheCon EU Budapest 
this coming September/October.

Applications close tomorrow night so if you have not applied yet and intend to
do so, please act now!

For those that have submitted talks for this event and have not yet heard back
as to whether they will be accepted, and who intend to apply for assistance
based on getting their talks accepted: please DO apply for assistance now
anyway; should your talk not be accepted, your assistance application can be
cancelled.

See http://apache.org/travel for more info. 
See https://cwiki.apache.org/confluence/display/TAC/Application+Criteria for
more about the process.

Thanks and hope to see you all in Budapest!

Gav… (On behalf of the Travel Assistance Committee)


Re: UIMAj3 ideas

2015-07-16 Thread Petr Baudis
On Fri, Jul 10, 2015 at 01:37:27PM -0400, Marshall Schor wrote:
 On 7/9/2015 6:52 PM, Petr Baudis wrote:
 snip...
 
 https://cwiki.apache.org/confluence/display/UIMA/Ideas+for+UIMAJ+v3
 
I didn't figure out how to edit that wiki page, 
 Due to spammers, we had to turn off public editing.  However, I can add you to a
 list (to do this, you have to register for a user id on the wiki, and then
 send me offline what that id is), but even without being on the list, there's a
 comment button which (I think) lets you add comments at the bottom.
  but a mental summary
  of the things I find currently irritating about UIMA and would love to
  see changed formed in my mind, so I thought I could contribute it for
  discussion.
 Great!
 
* UIMAfit is not part of core UIMA and UIMA-AS is not part of core
  UIMA.  It seems to me that UIMA-AS is doing things a bit differently
  than what the original UIMA idea of doing scaleout was.  The two
  things don't play well together.  I'd love a way to easily take
  my plain UIMA pipeline and scale it out, ideally without any code
  changes, *and* avoid the terrible XML config files.
 Any specifics of what to change here would be helpful.  UIMA-AS was designed to
 enable scale-out without changing the core UIMA pipeline or its XML
 descriptor.  The additional information for UIMA-AS scaleout was put into a
 separate XML descriptor which embeds the original plain UIMA one.

  I'm sure Richard would be able to explain this better, but I think one
of the core issues is that UIMA-AS embeds the XML descriptor instead of
the AnalysisEngineDescription.  So when I want to use it together with
AnalysisEngineDescription built with UIMAfit instead, it's time to
start making crazy workarounds like


https://code.google.com/p/dkpro-lab/source/browse/de.tudarmstadt.ukp.dkpro.lab/de.tudarmstadt.ukp.dkpro.lab.uima.engine.uimaas/src/main/java/de/tudarmstadt/ukp/dkpro/lab/uima/engine/uimaas/component/SimpleService.java?name=14aeba50c8c1r=14aeba50c8c18ea4d14c0d099f43c049f806d9db

* Connected with the above - I'd love .addToIndexes() to just
  disappear.  Right now, the paradigm is that you build an annotation
  in an annotator, and the moment it gets saved in a CAS, it becomes
  basically read-only.  
 You certainly can modify any of an Annotation's features subsequently.
 I'm guessing you're referring to another idea - adding additional features 
 that were
 not initially defined in the UIMA type system.

  Sorry for the confusion, but that's not quite what I had in mind.
I literally believe that right now, in order to modify the value of
a feature, you need to first remove it from an index, change the
value, then re-add it back.  Is that a misconception?

 UIMA sets up the types and
 features once at the start of the pipeline run (from a merge of all the
 components' type systems), and locks down the type system.  Other frameworks
 sometimes allow an unlocked type system, where you could add (after a Feature
 Structure is created) additional features.  This is usually done by keeping a
 list of feature-name / feature-value pairs (such as your code snippet does,
 below).  We're thinking of including this capability in version 3, with a
 bit of a twist - the intent would be to keep the compilable aspect of
 locked-down types/features (for high performance), while adding (for those use
 cases that want it) the other style of dynamically added additional features (at
 some cost in performance).  

  Still, this would be awesome and I'd totally make use of it!

  (The code in my original email I guess conflates demonstrations of two
issues - the addToIndexes and the lack of variable-sized lists, i.e. the Java
collection support issue.  Even if you decide generic collection / map
support would be too tricky, at least supporting variable-sized lists
would help a lot...)

* I wondered about storing (arbitrary) graphs in the CAS, but the
  issues above make this really impractical.  If you also think about
  integrating microformats, you need to think about how to do this.
 We have had users store arbitrary graphs in the CAS, but, yes, it is not so
 efficient.  The main elements UIMA has for collections of references (to
 FeatureStructures) are the FSArray and FSList.  As you point out, the FSArray is
 fixed length.  The FSList supports dynamic adding/removing etc. using the
 standard linked-list technique.  However, because UIMA data in the CAS
 (currently) is not garbage collected, you have to be careful when using this
 technique.
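The linked-list technique mentioned above can be sketched with the plain UIMA JCas cover classes (a sketch; `jcas` and the `Token` annotations in `tokens` are assumed to exist already, and `Token` is a hypothetical type):

```java
// Build a variable-length FSList by cons-ing nodes onto an empty tail
// (standard singly linked list; order ends up reversed w.r.t. `tokens`).
FSList head = new EmptyFSList(jcas);
for (Token t : tokens) {
  NonEmptyFSList node = new NonEmptyFSList(jcas);
  node.setHead(t);    // the payload FeatureStructure
  node.setTail(head); // link to the rest of the list
  head = node;
}
```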

  ...oh, never mind.  After using UIMA heavily for well over a year,
I managed not to learn that FSList exists at all!  Thanks for this
pointer.

  I think that's a bug for the UIMA Tutorial, which mentions FSArray but
not FSList.  :-)

  (Another pain point here - I always ache when I need to work with
FSArray or I guess FSList, since it does not carry the type information
that is in the typesystem - I 

Re: UIMAj3 ideas

2015-07-16 Thread Petr Baudis
  Hi!

On Fri, Jul 10, 2015 at 10:28:08AM -0400, Eddie Epstein wrote:
 Good comments which will likely generate lots of responses.
 For now please see comments on scaleout below.
 
 On Thu, Jul 9, 2015 at 6:52 PM, Petr Baudis pa...@ucw.cz wrote:
 
* UIMAfit is not part of core UIMA and UIMA-AS is not part of core
  UIMA.  It seems to me that UIMA-AS is doing things a bit differently
  than what the original UIMA idea of doing scaleout was.  The two
  things don't play well together.  I'd love a way to easily take
  my plain UIMA pipeline and scale it out, ideally without any code
  changes, *and* avoid the terrible XML config files.
 
 
 Not clear what you are referring to as the original UIMA idea of doing
 scaleout - the CPE? Core UIMA is a single-threaded, embeddable framework.
 UIMA-AS is also an embeddable framework that offers flexible vertical
 (multi-threading) and horizontal (multi-process) options for deploying an
 arbitrary pipeline. Admittedly, scaleout with UIMA-AS is complicated and the
 minimal support for process management makes it difficult to do scaleout
 simply. In what ways do you think UIMA-AS is inconsistent with UIMA or UIMA
 scaleout?

  Well, my impression after delving into some UIMA internals was that
the original idea was to use the Analysis Structure Broker to control
the pipeline flow and it would seem natural that when doing scale-out,
one would simply provide a different ASB.  Its javadoc even reads

 The Analysis Structure Broker (ASB) is the component
 responsible for the details of communicating with Analysis Engines
 that may potentially be distributed across different physical
 machines.

Of course, maybe I got it wrong.

 DUCC is a full cluster management application that will scaleout a plain UIMA
 pipeline with no code changes, assuming that the application code is
 threadsafe. But a typical pipeline with a single collection reader creating
 input CASes and a single CAS consumer will limit scaleout performance pretty
 quickly. DUCC makes it easy to eliminate the input data bottleneck. DUCC
 sample apps show one approach to eliminating the output bottleneck. Have you
 looked at DUCC?

  I use a UIMA pipeline for question answering, where each question
currently takes ~30s (single-threaded) to process (much of it spent
waiting on databases), so I don't think I'd hit such a bottleneck.
I did spend a few tens of minutes looking at DUCC, but I got the
impression that it's not really trivial to set up.

  One of my goals is to minimize setup hassles for anyone who wants to
run my software - ideally, they should be able to just compile and run.
If I started to use DUCC, I'm not sure to what degree I could preserve
this, but at least it's another element in the already steep learning
curve for anyone who wants to tinker with the system.

  (Then there's this whole issue of UIMA-AS vs. UIMAfit and in-memory
resource sharing - though from one of your previous emails, I got the
impression that I could run multiple AEs in threads of a single Java
process; but I guess at that point I had already decided that I wanted
to try something less complex.)

-- 
Petr Baudis
If you have good ideas, good data and fast computers,
you can do almost anything. -- Geoffrey Hinton


Re: UIMAj3 ideas

2015-07-16 Thread Thomas Ginter
Richard,

There is an API in UIMA for generating Analysis Engine Descriptors as well as 
Aggregates and Type System descriptions.  I use that API to generate the XML 
descriptor at runtime after the configuration has been completed.  I wrote my 
own logic to track the delegates of an Aggregate descriptor in order to 
propagate updates to/from delegates, allowing the user to dynamically specify 
Analysis Engine parameters.  I also merged the scale-out parameters for UIMA-AS 
into the Analysis Engine object for ease of configuration.  

In addition I wrote my own code to generate the deployment descriptor from the 
programmatic parameters provided.  The resulting XML is what the framework uses 
to generate the Spring Bean file you mentioned.
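The core UIMA descriptor API referred to above can be used roughly like this (a sketch, assuming uima-core on the classpath; "my.Annotator" is a hypothetical annotator class name):

```java
// Programmatically create a primitive AE descriptor with the plain
// UIMA API and serialize it to the standard XML descriptor format.
AnalysisEngineDescription desc = UIMAFramework.getResourceSpecifierFactory()
    .createAnalysisEngineDescription();
desc.setPrimitive(true);
desc.setAnnotatorImplementationName("my.Annotator");  // hypothetical class
desc.getMetaData().setName("My Annotator");
try (OutputStream os = new FileOutputStream("MyAnnotator.xml")) {
  desc.toXML(os);  // write the standard UIMA XML descriptor
}
```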

That being said, the existing API definitely has a learning curve, which was 
part of the motivation for creating Leo.

Thanks,

Thomas Ginter
801-448-7676
thomas.gin...@utah.edu




 On Jul 16, 2015, at 1:51 PM, Richard Eckart de Castilho r...@apache.org 
 wrote:
 
 Hi Thomas,
 
 On 16.07.2015, at 21:42, Thomas Ginter thomas.gin...@utah.edu wrote:
 
 Have you looked into using Leo?  It allows you to programmatically create 
 Analysis Engines, Aggregates, the type system, and launch everything in 
 UIMA-AS without having to manage any XML descriptors at all.  Furthermore it 
 is available via Maven so your code can compile and run.  
 
 Did you find an API in UIMA AS to handle the programmatic generation of 
 descriptors, or did you implement that yourself in Leo (as I had tried to in 
 DKPro Lab)? 
 
 If I remember correctly, UIMA AS loaded plain XML descriptor files, 
 transformed them to a Spring Bean file using XSLT, and then used Spring to 
 instantiate them. But I may have missed something.
 
 Cheers,
 
 -- Richard



Re: UIMAj3 ideas

2015-07-16 Thread Petr Baudis
  Hi!

On Thu, Jul 16, 2015 at 07:42:58PM +, Thomas Ginter wrote:
 Have you looked into using Leo?  It allows you to programmatically create 
 Analysis Engines, Aggregates, the type system, and launch everything in 
 UIMA-AS without having to manage any XML descriptors at all.  Furthermore it 
 is available via Maven so your code can compile and run.  
 
 http://department-of-veterans-affairs.github.io/Leo/userguide.html

  I had a look, but got the impression that I'd have to rewrite most
of my pipeline generation code, and it's not small code.  Also, it's
not clear to me from Leo's docs whether and/or how it supports CAS
multipliers and mergers, there seem to be no references to that.

  This impression might have been wrong, but overall I'd just welcome
it if I could stick with stock UIMA for scaleout, at least in the form
of multi-threading without cluster scaleout (which I think many UIMA
users would welcome, while a much smaller percentage wants to deploy to
a cluster) - that's what I was trying to say originally.

-- 
Petr Baudis
If you have good ideas, good data and fast computers,
you can do almost anything. -- Geoffrey Hinton


Re: UIMAj3 ideas

2015-07-16 Thread Petr Baudis
On Thu, Jul 16, 2015 at 08:00:35PM +0200, Richard Eckart de Castilho wrote:
 On 16.07.2015, at 18:52, Petr Baudis pa...@ucw.cz wrote:
   Sorry for the confusion, but that's not quite what I had in mind.
  I literally believe that right now, in order to modify the value of
  a feature, you need to first remove it from an index, change the
  value, then re-add it back.  Is that a misconception?
 
 Well, yes and no. Yes, it was required for the case where the value that
 you changed was on a feature that was part of some index. No, it should
 no longer be required as measures have been implemented to handle this
 automatically.
 
 See: The curious case of the zombie annotation aka UIMA-4049
 
 https://issues.apache.org/jira/browse/UIMA-4049

  That's great to hear!  However, when reading the bug report and
looking closely at that part of the release notes, I think it should no
longer be required isn't quite precise: changing indexed features
might cause an exception to be thrown by an iterator that goes through
them at the same time.  (The fix for that is to use a snapshot
iterator, which sounds reasonable, more so once JCasUtil gets support
for them - sorry if it already did and I missed it; I'm still stuck on
UIMA 2.6 for now anyway, until the next release with the fixed CasCopier.)

   I think that's a bug for the UIMA Tutorial, which mentions FSArray but
  not FSList.  :-)
 
 Then I should tell you also about the uimaFIT FSCollectionFactory which
 contains all kinds of helpers to manage FSArray and FSList ;)
 
 Btw. there is also ArrayFS which is the CAS version of FSArray :P
..
 Did you know that uimaFIT JCasUtil.select() can also be applied to
 FSList and FSArray to avoid casting?
 
 for (Token t : JCasUtil.select(sentence.getTokens(), Token.class)) {
   ...
 }
 
 CasUtil.select() can work also on ArrayFS

  So much great news! Thanks so much for these.  We'll certainly start
using them in new code. :-)

-- 
Petr Baudis
If you have good ideas, good data and fast computers,
you can do almost anything. -- Geoffrey Hinton


Re: UIMAj3 ideas

2015-07-16 Thread Thomas Ginter
Hi Petr,

Have you looked into using Leo?  It allows you to programmatically create 
Analysis Engines, Aggregates, the type system, and launch everything in UIMA-AS 
without having to manage any XML descriptors at all.  Furthermore it is 
available via Maven so your code can compile and run.  

http://department-of-veterans-affairs.github.io/Leo/userguide.html

The only catch to running UIMA-AS is making sure the broker is running - a 
manual step that we have not yet automated.  Other than that it can scale most 
pipelines, with the notable exception of pipelines that have really large 
resources.

As for ideas for UIMA 3, I would love to see a much simpler CAS system that 
didn’t require a pre-definition of types before execution - for example, a very 
simple abstract base class that defines an “annotation” and is extended in 
order to create/use a new type.  It seems like the basic location-based indexes 
could still be provided that way, as well as the option of extending it to 
provide custom indexes.  If the CAS was implemented as a base set of very 
simple Java objects we would also have more serialization options, possibly 
even making it possible for the user to plug in a different serializer if 
required, such as protobuf.  Just a thought.
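A minimal sketch of the plain-Java idea above (purely hypothetical; this is not an existing UIMA API, just an illustration of the suggestion):

```java
// Hypothetical: annotations as plain Java objects extending a very
// simple abstract base class, instead of pre-declared CAS types.
abstract class SimpleAnnotation {
    final int begin;
    final int end;

    SimpleAnnotation(int begin, int end) {
        this.begin = begin;
        this.end = end;
    }

    int length() {
        return end - begin;
    }
}

// A new "type" is created just by extending the base class.
class PersonName extends SimpleAnnotation {
    PersonName(int begin, int end) {
        super(begin, end);
    }
}
```

Such plain objects would be trivially serializable with any pluggable mechanism (JSON, protobuf, Java serialization), which is part of the appeal described above.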

Thanks,

Thomas Ginter
801-448-7676
thomas.gin...@utah.edu




 On Jul 16, 2015, at 10:25 AM, Petr Baudis pa...@ucw.cz wrote:
 
  Hi!
 
 On Fri, Jul 10, 2015 at 10:28:08AM -0400, Eddie Epstein wrote:
 Good comments which will likely generate lots of responses.
 For now please see comments on scaleout below.
 
 On Thu, Jul 9, 2015 at 6:52 PM, Petr Baudis pa...@ucw.cz wrote:
 
  * UIMAfit is not part of core UIMA and UIMA-AS is not part of core
UIMA.  It seems to me that UIMA-AS is doing things a bit differently
than what the original UIMA idea of doing scaleout was.  The two
things don't play well together.  I'd love a way to easily take
my plain UIMA pipeline and scale it out, ideally without any code
changes, *and* avoid the terrible XML config files.
 
 
 Not clear what you are referring to as the original UIMA idea of doing
 scaleout - the CPE? Core UIMA is a single-threaded, embeddable framework.
 UIMA-AS is also an embeddable framework that offers flexible vertical
 (multi-threading) and horizontal (multi-process) options for deploying an
 arbitrary pipeline. Admittedly, scaleout with UIMA-AS is complicated and the
 minimal support for process management makes it difficult to do scaleout
 simply. In what ways do you think UIMA-AS is inconsistent with UIMA or UIMA
 scaleout?
 
  Well, my impression after delving into some UIMA internals was that
 the original idea was to use the Analysis Structure Broker to control
 the pipeline flow and it would seem natural that when doing scale-out,
 one would simply provide a different ASB.  Its javadoc even reads
 
 The Analysis Structure Broker (ASB) is the component
 responsible for the details of communicating with Analysis Engines
 that may potentially be distributed across different physical
 machines.
 
 Of course, maybe I got it wrong.
 
 DUCC is a full cluster management application that will scaleout a plain UIMA
 pipeline with no code changes, assuming that the application code is
 threadsafe. But a typical pipeline with a single collection reader creating
 input CASes and a single CAS consumer will limit scaleout performance pretty
 quickly. DUCC makes it easy to eliminate the input data bottleneck. DUCC
 sample apps show one approach to eliminating the output bottleneck. Have you
 looked at DUCC?
 
  I use a UIMA pipeline for question answering, where each question
 currently takes ~30s (single-threaded) to process (much of it spent
 waiting on databases), so I don't think I'd hit such a bottleneck.
 I did spend a few tens of minutes looking at DUCC, but I got the
 impression that it's not really trivial to set up.
 
  One of my goals is to minimize setup hassles for anyone who wants to
 run my software - ideally, they should be able to just compile and run.
 If I started to use DUCC, I'm not sure to what degree I could preserve
 this, but at least it's another element in the already steep learning
 curve for anyone who wants to tinker with the system.
 
  (Then there's this whole issue of UIMA-AS vs. UIMAfit and in-memory
 resource sharing - though from one of your previous emails, I got the
 impression that I could run multiple AEs in threads of a single Java
 process; but I guess at that point I had already decided that I wanted
 to try something less complex.)
 
 -- 
   Petr Baudis
   If you have good ideas, good data and fast computers,
   you can do almost anything. -- Geoffrey Hinton



Re: UIMAj3 ideas

2015-07-16 Thread Richard Eckart de Castilho
Hi Thomas,

On 16.07.2015, at 21:42, Thomas Ginter thomas.gin...@utah.edu wrote:

 Have you looked into using Leo?  It allows you to programmatically create 
 Analysis Engines, Aggregates, the type system, and launch everything in 
 UIMA-AS without having to manage any XML descriptors at all.  Furthermore it 
 is available via Maven so your code can compile and run.  

Did you find an API in UIMA AS to handle the programmatic generation of 
descriptors, or did you implement that yourself in Leo (as I had tried to in 
DKPro Lab)? 

If I remember correctly, UIMA AS loaded plain XML descriptor files, 
transformed them to a Spring Bean file using XSLT, and then used Spring to 
instantiate them. But I may have missed something.

Cheers,

-- Richard 

Re: UIMAj3 ideas

2015-07-16 Thread Richard Eckart de Castilho
On 16.07.2015, at 23:10, Jaroslaw Cwiklik uim...@gmail.com wrote:

 The UIMA-AS *does* have an API to generate deployment descriptors although
 it's not documented. It's an internal API for now and most likely will be
 documented in the next release of UIMA-AS. The API is implemented by
 DeploymentDescriptorFactory.java in the uimaj-as-core project.

Cool :) *thumbs up*

-- Richard


Re: UIMAj3 ideas

2015-07-16 Thread Richard Eckart de Castilho
Thomas,

On 16.07.2015, at 22:56, Thomas Ginter thomas.gin...@utah.edu wrote:

 There is an API in UIMA for generating Analysis Engine Descriptors as well as 
 Aggregates and Type System descriptions.  I use that API to generate the xml 
 descriptor at runtime after the configuration has been completed.  I wrote my 
 own logic to track the delegates of an Aggregate descriptor in order to 
 propagate updates to/from delegates to allow the user to dynamically specify 
 Analysis Engine parameters.  I also merged the scale out parameters for 
 UIMA-AS into the Analysis Engine object for ease of configuration.  

we're using the plain UIMA APIs for AED and friends in uimaFIT too - those APIs 
not being too user-friendly, and XML being a pain, was the major motivation to 
come up with uimaFIT. However, uimaFIT doesn't aspire to drive UIMA AS, just to 
make the core UIMA descriptors easier to handle.

 In addition I wrote my own code to generate the deployment descriptor from 
 the programmatic parameters provided.  The resulting XML is what the 
 framework uses to generate the Spring Bean file you mentioned.


So what you say confirms my findings. I never found a corresponding API for 
UIMA deployment descriptors in UIMA AS. It would have been great if UIMA AS had 
provided at least some basic API for deployment descriptors parallel to what 
UIMA offers for engines and aggregates.

 That being said, the existing API definitely has a learning curve, which was 
 part of the motivation for creating Leo.

Same for uimaFIT ;) 

Cheers,

-- Richard

Re: UIMAj3 ideas

2015-07-16 Thread Jaroslaw Cwiklik
The UIMA-AS *does* have an API to generate deployment descriptors although
it's not documented. It's an internal API for now and most likely will be
documented in the next release of UIMA-AS. The API is implemented by
DeploymentDescriptorFactory.java in the uimaj-as-core project.

Jerry

On Thu, Jul 16, 2015 at 4:56 PM, Thomas Ginter thomas.gin...@utah.edu
wrote:

 Richard,

 There is an API in UIMA for generating Analysis Engine Descriptors as well
 as Aggregates and Type System descriptions.  I use that API to generate the
 xml descriptor at runtime after the configuration has been completed.  I
 wrote my own logic to track the delegates of an Aggregate descriptor in
 order to propagate updates to/from delegates to allow the user to
 dynamically specify Analysis Engine parameters.  I also merged the scale
 out parameters for UIMA-AS into the Analysis Engine object for ease of
 configuration.

 In addition I wrote my own code to generate the deployment descriptor from
 the programmatic parameters provided.  The resulting XML is what the
 framework uses to generate the Spring Bean file you mentioned.

 That being said, the existing API definitely has a learning curve, which was
 part of the motivation for creating Leo.

 Thanks,

 Thomas Ginter
 801-448-7676
 thomas.gin...@utah.edu




  On Jul 16, 2015, at 1:51 PM, Richard Eckart de Castilho r...@apache.org
 wrote:
 
  Hi Thomas,
 
  On 16.07.2015, at 21:42, Thomas Ginter thomas.gin...@utah.edu wrote:
 
  Have you looked into using Leo?  It allows you to programmatically
 create Analysis Engines, Aggregates, the type system, and launch everything
 in UIMA-AS without having to manage any XML descriptors at all.
 Furthermore it is available via Maven so your code can compile and run.
 
  Did you find an API in UIMA AS to handle the programmatic generation of
 descriptors, or did you implement that yourself in Leo (as I had tried to
 in DKPro Lab)?
 
  If I remember correctly, then UIMA AS loaded plain XML descriptor files,
 transforms them to a Spring Bean file using XSLT and then used Spring to
 instantiate it. But I may have missed something.
 
  Cheers,
 
  -- Richard




looking for more informative exception messages when parsing invalid Ruta script

2015-07-16 Thread Renaud Richardet
Hello,

When using Ruta in a non-Workbench setup (in my case, Maven), I don't
manage to catch Ruta script errors in a meaningful way. Here is an example:

aaa\. -> MyAnnotation; // fails because of the escaped dot


The thrown error is quite uninformative:


java.lang.ArrayIndexOutOfBoundsException: -1

at org.apache.uima.ruta.parser.RutaParser.emitErrorMessage(
RutaParser.java:365)

at org.apache.uima.ruta.parser.RutaParser.reportError(RutaParser.java:386)

at org.antlr.runtime.BaseRecognizer.recoverFromMismatchedToken(
BaseRecognizer.java:603)

at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)

at org.apache.uima.ruta.parser.RutaParser.file_input(RutaParser.java:680)

at org.apache.uima.ruta.engine.RutaEngine.loadScript(RutaEngine.java:1058)

at org.apache.uima.ruta.engine.RutaEngine.initializeScript(
RutaEngine.java:743)

...

Here is the code to reproduce:
https://github.com/renaud/annotate_ruta_example/tree/ruta_error_message

However, if I paste that script line in the Ruta Workbench, it nicely
underlines it in red at the exact location, and even says Mismatched
input. How do I achieve the same programmatically (from Java)?

Thanks, Renaud