It was brought up recently in a meeting that we have to consider the effect of 
a Feature Structure ID in a CAS / CAS Store on deserialization of a CAS into 
UIMAj and the annotation indexing.  

E.g., how would adding a stable identifier affect indexing and references 
within jCAS objects?  

I'd like to throw out a couple of scenarios to the community, see whether they 
cover all of the possible use cases, describe how I currently handle them, 
and hopefully get some comments :)

First, I'd like to confirm that I'm thinking of a CAS Store operating 
between different PEARs or full UIMA applications, not within an 
Aggregate analytic (although that is definitely something to consider).  
Furthermore, I am assuming that the CAS Store interface returns a CAS object 
that conforms to the OASIS spec, and that the CAS Store is responsible for 
creating FSIDs.

I can think of four scenarios when deserializing a CAS XMI (I'm not sure about 
deserializing from binary) to a jCAS object, as it comes from the CAS Store.
 
1:  A minimal CAS that contains only a sofa and view. This is the simplest 
input to pull from a CAS Store, and doesn't require any modifications to the 
UIMAj deserialization.

2:  A full CAS with a SOFA and associated annotations in multiple views

3:  A CAS Fragment (or projection) of a single CAS xmi from the store, that 
contains only the information necessary for this particular Analytic Pipeline 
(there might or might not be a SOFA and view associated with it).

4:  A CAS created from one or more analytics on different artifacts (zero or 
more cas:Sofa elements, and zero or more View elements)

Currently, if I use FSIDs, I have to either set the deserialization to 
LENIENT or strip them out of the CAS before deserialization. Either way, the 
unknown attributes are simply dropped. 
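
For the preprocessing route, here is a minimal sketch of what I mean. It 
assumes the FSID arrives as a plain attribute on each feature structure 
element; the attribute name "fsid" and the class name FsidStripper are made up 
for illustration, and the code uses only the JDK's XML APIs, not UIMAj itself.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: removes one attribute from every element of an XMI
// document so that a strict deserializer never sees it.
final class FsidStripper {
    private FsidStripper() {}

    /** Returns the XMI text with attrName removed from all elements. */
    static String stripAttribute(String xmi, String attrName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // XMI relies on xmi:/cas: prefixes
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xmi)));

            // "*" matches every element, including the document root.
            NodeList all = doc.getElementsByTagName("*");
            for (int i = 0; i < all.getLength(); i++) {
                Element element = (Element) all.item(i);
                if (element.hasAttribute(attrName)) {
                    element.removeAttribute(attrName);
                }
            }

            // Serialize the modified tree back to text.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not preprocess XMI", e);
        }
    }
}
```

The cleaned text can then go through the normal, strict XMI deserialization. 
The cost is an extra parse of every CAS, and the IDs are gone once the CAS is 
loaded, so they would have to be tracked elsewhere if they need to survive a 
round trip.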

For scenario 1, other than lenient deserialization, nothing else needs to be done.

For scenarios 2 and 3, the associated Type System of the CAS must be registered 
for deserialization. 

For scenario 4, I haven't implemented anything yet in UIMAj, but will be 
working on something for this soon. 

Now, I haven't dug into the Serialization code yet to see how else this can be 
accomplished, but will be looking into it soon.  I would just like to begin a 
discussion on this topic to make sure that we're covering all our bases :) 

Thanks! 

Neal  

