Re: Maintaining UIMAj indexing and references while using stable FSIDs in a CAS Store

Richard Eckart de Castilho Wed, 27 Feb 2013 10:14:20 -0800

Hello Neal,

I'm not sure if these comments are helpful, but well, here you go…


On 27.02.2013 at 18:38, Neal R Lewis <[email protected]> wrote:

> I posted this a couple weeks ago, and didn't get any traction, so I thought
> I would try one more time for responses :)
> 
> It was brought up recently in a meeting that we have to consider the effect
> of a Feature Structure ID in a CAS / CAS Store on deserialization of a CAS
> into UIMAj and the annotation indexing.
> 
> E.g., how would adding a stable identifier affect indexing and references
> within jCAS Objects?
> 
> I'd like to throw out a couple scenarios to the community and see if these
> cover all of the possible use cases, and discuss how I currently implement
> it, and hopefully get some comments :)
> 
> First, I'd like to confirm that I'm thinking of a CAS Store operating in
> between different PEARs or full UIMA Applications, not running between an
> Aggregate analytic (although that is definitely something to consider).

I don't understand what you are differentiating here. A PEAR can be a component
in a larger pipeline. I suppose an application would rather stand alone
and not interact with other applications in any way. It would probably embed
some analysis pipelines, PEARs, aggregates, whatever.

> Furthermore, I am assuming that the CAS Store interface retrieves a CAS
> object that agrees to the OASIS spec, and that the CAS store is responsible
> for creating FSIDs.

I suppose you imply here that the FSIDs are not available once the CAS has
been loaded into memory because the OASIS spec does not include FSIDs? It
may also be problematic if the FSIDs only become available after saving the
CAS to the store and not immediately after adding the FS to the CAS.

> I can think of four scenarios when deserializing a CAS xmi (I'm not sure
> about deserializing from binary) to a jCAS object, as it comes from the
> CAS Store.
> 
> 1:  A minimal CAS that contains only a sofa and view. This is the simplest
> input to pull from a CAS Store, and doesn't require any modifications in the
> UIMAj deserialization.

If a CAS contains more than one Sofa/View, then I suppose a modification is
necessary because the XmiCasDeserializer restores all sofas/views from XMI,
not only select ones. Furthermore, annotations can be indexed in one view but
refer to annotations in another view. It could be problematic if this other
view is not available.
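
To make the dangling-reference concern concrete, here is a minimal Python
sketch. It is not UIMA code: feature structures are modelled as plain dicts
keyed by an ID, and the type names, feature names and the list of reference
features are all assumptions made up for illustration. It only shows how a
store could detect references that point outside the part of the CAS it is
about to return.

```python
def dangling_references(feature_structures, ref_features):
    """Return {fs_id: [missing target ids]} for references leaving the fragment.

    feature_structures: dict mapping an FS id to a dict of its features.
    ref_features: names of features whose values are ids of other FSes
                  (assumed to be known from the type system).
    """
    present = set(feature_structures)
    missing = {}
    for fs_id, features in feature_structures.items():
        for name in ref_features:
            target = features.get(name)
            if target is not None and target not in present:
                missing.setdefault(fs_id, []).append(target)
    return missing


# Hypothetical fragment: the Dependency refers to FS "20", which lives in a
# view that was not retrieved.
fragment = {
    "8": {"type": "Token", "begin": 0, "end": 5},
    "12": {"type": "Dependency", "governor": "8", "dependent": "20"},
}
result = dangling_references(fragment, ["governor", "dependent"])
```

A store could run such a check before handing out a partial CAS and then
either pull in the missing targets or refuse the request.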

> 2:  A full CAS with a SOFA and associated annotations in multiple views

That's probably the one where no modifications are necessary.

> 3:  A CAS Fragment (or projection) of a single CAS xmi from the store, that
> contains only the information necessary for this particular Analytic
> Pipeline (there might or might not be a SOFA and view associated with it).

I think it would be problematic to access feature structures unless they are
indexed in a view. Note that I expect the result of a retrieval operation to
always be a UIMA CAS and not some other data structure or simply a list of FSes.

> 4:  A CAS created from one or more analytics on different artifacts (zero
> or more cas:Sofa elements, and zero or more View elements)

I didn't understand that point. Do you mean you synthesize a CAS from multiple
CASes? It sounds like combining 1, 2 or 3 with a CAS merger.

> Currently, if I use the FSID element, I have to set the deserialization to
> LENIENT, or preprocess them out of the CAS before deserialization. This
> simply removes the unknown attributes.

This sounds like you just patch additional attributes into the XMI and then
discard them during deserialization. I think this is problematic. Imagine I
want to retrieve a set of FSes identified as A, B and C. I get back a CAS
containing A, B and C, but I have no idea which one is which. In my opinion,
the FSID must be accessible through the CAS API.
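
As an illustration of the preprocessing variant, here is a minimal Python
sketch that strips the extra attribute from the XMI but keeps a side table
instead of throwing the IDs away. The attribute name `fsid`, the `ex:Token`
type and the sample document are assumptions; the thread does not say how the
FSID is actually written into the XMI.

```python
import xml.etree.ElementTree as ET

# Hypothetical attribute name for the store-assigned ID.
FSID_ATTR = "fsid"
# xmi:id as ElementTree sees it after namespace expansion.
XMI_ID = "{http://www.omg.org/XMI}id"


def strip_fsids(xmi_text):
    """Remove FSID attributes; return (clean_xmi, {xmi:id -> fsid})."""
    root = ET.fromstring(xmi_text)
    id_to_fsid = {}
    for elem in root.iter():
        fsid = elem.attrib.pop(FSID_ATTR, None)
        if fsid is not None:
            id_to_fsid[elem.get(XMI_ID)] = fsid
    return ET.tostring(root, encoding="unicode"), id_to_fsid


SAMPLE = """<xmi:XMI xmlns:xmi="http://www.omg.org/XMI"
    xmlns:ex="http:///example.ecore" xmi:version="2.0">
  <ex:Token xmi:id="8" begin="0" end="5" fsid="A"/>
  <ex:Token xmi:id="12" begin="6" end="11" fsid="B"/>
</xmi:XMI>"""

clean_xmi, id_to_fsid = strip_fsids(SAMPLE)
```

The cleaned XMI no longer needs lenient deserialization, but the caller still
has to map each xmi:id to the in-memory feature structure, which this sketch
does not cover. That gap is exactly why I would prefer the FSID to be exposed
by the CAS API itself.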

> For scenario 1, other than lenient serialization, nothing needs to be
> completed.
> 
> For scenario 2 and 3, the associated Type System of the CAS must be
> registered for serialization.

A CAS always requires a type system. Unless one assumes that the type system is
always provided by the context (e.g. the application embedding the analysis, the
runtime environment, or automatic discovery mechanisms as used in uimaFIT), I
would expect that it must be possible to ask the store for the type system of
any CAS stored in it. So when a CAS is written to the store, some type system
must be associated with it.
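
A rough Python sketch of what I mean, assuming a purely hypothetical
in-memory store (the class and method names are invented here and are not
part of UIMA or of any proposed CAS Store API): writing a CAS without a type
system is rejected, and the type system can be asked for by CAS ID.

```python
import itertools


class InMemoryCasStore:
    """Toy store: every CAS is saved together with its type system."""

    def __init__(self):
        self._entries = {}
        self._ids = itertools.count(1)

    def write(self, cas_xmi, type_system_xml):
        # Refuse to store a CAS that cannot be interpreted later.
        if not type_system_xml:
            raise ValueError("a type system must accompany every stored CAS")
        cas_id = next(self._ids)
        self._entries[cas_id] = (cas_xmi, type_system_xml)
        return cas_id

    def read_cas(self, cas_id):
        return self._entries[cas_id][0]

    def read_type_system(self, cas_id):
        return self._entries[cas_id][1]


store = InMemoryCasStore()
cas_id = store.write("<xmi:XMI/>", "<typeSystemDescription/>")
```

With this shape a client that has no type system in its own context can call
`read_type_system(cas_id)` first and then deserialize the CAS against it.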

> For scenario 4, I haven't implemented it yet in UIMAj, but will be working on
> something for this soon.

There has been a post by Marshal (http://markmail.org/thread/6q7demw2h3nzliyb)
pointing out several important issues that seem to apply in particular to
scenario 4. The post ends with a question about how the scenario is envisioned,
but so far no answer has been given.

Cheers,

-- Richard

-- 
------------------------------------------------------------------- 
Richard Eckart de Castilho
Technical Lead
Ubiquitous Knowledge Processing Lab (UKP-TUD) 
FB 20 Computer Science Department      
Technische Universität Darmstadt 
Hochschulstr. 10, D-64289 Darmstadt, Germany 
phone [+49] (0)6151 16-7477, fax -5455, room S2/02/B117
[email protected] 
www.ukp.tu-darmstadt.de 
Web Research at TU Darmstadt (WeRC) www.werc.tu-darmstadt.de
-------------------------------------------------------------------
