Re: UIMAj3 ideas

Thomas Ginter Thu, 16 Jul 2015 12:45:13 -0700

Hi Petr,

Have you looked into using Leo?  It allows you to programmatically create 
Analysis Engines, Aggregates, the type system, and launch everything in UIMA-AS 
without having to manage any XML descriptors at all.  Furthermore, it is 
available via Maven, so users of your code can simply compile and run it.


http://department-of-veterans-affairs.github.io/Leo/userguide.html

The only catch to running UIMA-AS is making sure the broker is running, a 
manual step that we have not yet automated.  Other than that, it can scale most 
pipelines, with the notable exception of pipelines that have really large 
resources.

As for ideas for UIMA 3, I would love to see a much simpler CAS system that 
does not require types to be defined before execution: for example, a very 
simple abstract base class that defines an “annotation” and is then extended 
in order to create or use a new type.  It seems like the basic location-based 
indexes could still be provided that way, along with the option of extending 
them to provide custom indexes.  If the CAS were implemented as a base set of 
very simple Java objects, we would also have more serialization options, 
possibly even letting the user plug in a different serializer, such as 
protobuf, if required.  Just a thought.
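
To make the idea concrete, here is a minimal sketch in plain Java.  It is not 
UIMA code and not a proposal for the actual UIMA 3 API: every name in it 
(Annotation, Token, SimpleCas, CasSerializer, LineSerializer, SimpleCasDemo) 
is hypothetical and exists only for illustration.  It assumes annotations are 
simple begin/end spans, that a new type is nothing more than a subclass, and 
that the "index" is a sorted view over one in-memory list.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Small usage demo; all classes below are hypothetical, not part of UIMA.
class SimpleCasDemo {
    public static void main(String[] args) {
        SimpleCas cas = new SimpleCas("Hello world");
        cas.add(new Token(6, 11, "NN"));
        cas.add(new Token(0, 5, "UH"));
        // Tokens come back in document order regardless of insertion order.
        for (Token t : cas.select(Token.class)) {
            System.out.println(cas.getText().substring(t.getBegin(), t.getEnd())
                    + "/" + t.getPartOfSpeech());
        }
    }
}

// Hypothetical base class: an annotation is just a span of the document text.
abstract class Annotation {
    private final int begin;
    private final int end;

    protected Annotation(int begin, int end) {
        if (begin < 0 || end < begin) {
            throw new IllegalArgumentException("invalid span: " + begin + ".." + end);
        }
        this.begin = begin;
        this.end = end;
    }

    public int getBegin() { return begin; }
    public int getEnd() { return end; }
}

// A new type is created by extending the base class; no descriptor is involved.
class Token extends Annotation {
    private final String partOfSpeech;

    Token(int begin, int end, String partOfSpeech) {
        super(begin, end);
        this.partOfSpeech = partOfSpeech;
    }

    String getPartOfSpeech() { return partOfSpeech; }
}

// Hypothetical plug-in point so a user could swap the serialization format.
interface CasSerializer {
    byte[] serialize(SimpleCas cas);
}

// Hypothetical CAS stand-in: the document text plus a list of plain Java objects.
class SimpleCas {
    // Location-based ordering: by begin offset, then longer spans first.
    private static final Comparator<Annotation> BY_POSITION = (a, b) ->
            a.getBegin() != b.getBegin()
                    ? Integer.compare(a.getBegin(), b.getBegin())
                    : Integer.compare(b.getEnd(), a.getEnd());

    private final String text;
    private final List<Annotation> annotations = new ArrayList<>();

    SimpleCas(String text) { this.text = text; }

    String getText() { return text; }

    void add(Annotation annotation) {
        if (annotation.getEnd() > text.length()) {
            throw new IllegalArgumentException("span exceeds document length");
        }
        annotations.add(annotation);
    }

    // Basic index lookup: all annotations of the given type, in document order.
    <T extends Annotation> List<T> select(Class<T> type) {
        return annotations.stream()
                .filter(type::isInstance)
                .map(type::cast)
                .sorted(BY_POSITION)
                .collect(Collectors.toList());
    }
}

// Toy serializer: one tab-separated line (type, begin, end) per annotation.
class LineSerializer implements CasSerializer {
    @Override
    public byte[] serialize(SimpleCas cas) {
        StringBuilder out = new StringBuilder();
        for (Annotation a : cas.select(Annotation.class)) {
            out.append(a.getClass().getSimpleName()).append('\t')
               .append(a.getBegin()).append('\t')
               .append(a.getEnd()).append('\n');
        }
        return out.toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

Because the serializer only sees plain objects through the CasSerializer 
interface, a protobuf-backed implementation could in principle be dropped in 
the same way as the toy LineSerializer.  A custom index would be one more 
method or subclass on top of the same list.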

Thanks,

Thomas Ginter
801-448-7676
thomas.gin...@utah.edu




> On Jul 16, 2015, at 10:25 AM, Petr Baudis <pa...@ucw.cz> wrote:
> 
>  Hi!
> 
> On Fri, Jul 10, 2015 at 10:28:08AM -0400, Eddie Epstein wrote:
>> Good comments which will likely generate lots of responses.
>> For now please see comments on scaleout below.
>> 
>> On Thu, Jul 9, 2015 at 6:52 PM, Petr Baudis <pa...@ucw.cz> wrote:
>> 
>>>  * UIMAfit is not part of core UIMA and UIMA-AS is not part of core
>>>    UIMA.  It seems to me that UIMA-AS is doing things a bit differently
>>>    than what the original UIMA idea of doing scaleout was.  The two
>>>    things don't play well together.  I'd love a way to easily take
>>>    my plain UIMA pipeline and scale it out, ideally without any code
>>>    changes, *and* avoid the terrible XML config files.
>>> 
>>> 
>> Not clear what you are referring to as the "original UIMA idea of doing
>> scaleout", the CPE? Core UIMA is a single threaded, embeddable framework.
>> UIMA-AS is also an embeddable framework that offers flexible vertical
>> (multi-threading) and horizontal (multi-process) options for deploying an
>> arbitrary pipeline. Admittedly scaleout with UIMA-AS is complicated and
>> the minimal support for process management makes it difficult to do
>> scaleout simply. In what ways do you think UIMA-AS is inconsistent with
>> UIMA or UIMA scaleout?
> 
>  Well, my impression after delving into some UIMA internals was that
> the original idea was to use the Analysis Structure Broker to control
> the pipeline flow and it would seem natural that when doing scale-out,
> one would simply provide a different ASB.  Its javadoc even reads
> 
>> The Analysis Structure Broker (<code>ASB</code>) is the component
>> responsible for the details of communicating with Analysis Engines
>> that may potentially be distributed across different physical
>> machines.
> 
> Of course, maybe I got it wrong.
> 
>> DUCC is a full cluster management application that will scale out a plain
>> UIMA pipeline with no code changes, assuming that the application code is
>> threadsafe. But a typical pipeline with a single collection reader
>> creating input CASes and a single cas consumer will limit scaleout
>> performance pretty quickly. DUCC makes it easy to eliminate the input data
>> bottleneck. DUCC sample apps show one approach to eliminating the output
>> bottleneck. Have you looked at DUCC?
> 
>  I use a UIMA pipeline for question answering, where each question
> currently takes ~30s (single-threaded) to process (a lot of it spent
> waiting on databases), so I don't think I'd hit such a bottleneck.
> I did spend a few tens of minutes looking at DUCC, but I got the
> impression that it's not really trivial to set up.
> 
>  One of my goals is to minimize setup hassles for anyone who wants to
> run my software - ideally, they should be able to just compile and run.
> If I started to use DUCC, I'm not sure to what degree I could preserve
> this, but at least it's another element in the already steep learning
> curve for anyone who wants to tinker with the system.
> 
>  (Then there's this whole issue of UIMA-AS vs. UIMAfit and in-memory
> resource sharing - though from one of your previous emails, I got the
> impression that I could run multiple AEs in threads of a single java
> process; but I guess at that point I had already decided that I want
> to try something less complex.)
> 
> -- 
>                               Petr Baudis
>       If you have good ideas, good data and fast computers,
>       you can do almost anything. -- Geoffrey Hinton
