I posted this a couple of weeks ago and it didn't get any traction, so I
thought I would try one more time for responses :)



It was brought up recently in a meeting that we have to consider the effect
of a Feature Structure ID (FSID) in a CAS / CAS Store on deserialization of
a CAS into UIMAj and on annotation indexing.

E.g., how would adding a stable identifier affect indexing and references
within JCas objects?

I'd like to throw out a couple scenarios to the community and see if these
cover all of the possible use cases, and discuss how I currently implement
it, and hopefully get some comments :)

First, I'd like to confirm that I'm thinking of a CAS Store operating in
between different PEARs or full UIMA Applications, not running within an
Aggregate analytic (although that is definitely something to consider).
Furthermore, I am assuming that the CAS Store interface retrieves a CAS
object that conforms to the OASIS spec, and that the CAS Store is
responsible for creating FSIDs.

I can think of four scenarios when deserializing a CAS XMI (I'm not sure
about deserializing from binary) to a JCas object, as it comes from the
CAS Store.

1:  A minimal CAS that contains only a sofa and view. This is the simplest
input to pull from a CAS Store, and doesn't require any modifications in the
UIMAj deserialization.

2:  A full CAS with a SOFA and associated annotations in multiple views

3:  A CAS Fragment (or projection) of a single CAS XMI from the store that
contains only the information necessary for this particular Analytic
Pipeline (there might or might not be a SOFA and view associated with it).

4:  A CAS created from one or more analytics on different artifacts (zero
or more cas:Sofa elements, and zero or more View elements)
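
To make the four shapes above a bit more concrete, here is a rough sketch
that counts the top-level cas:Sofa and cas:View elements in an XMI string,
plus everything outside the cas namespace. It uses only the JDK's DOM
parser, not UIMAj. The class name XmiShape and the looksMinimal() check are
hypothetical helpers for illustration, and I am assuming the usual
"http:///uima/cas.ecore" namespace that UIMAj writes for cas:* elements.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/** Hypothetical helper: summarizes what a CAS XMI document contains. */
final class XmiShape {
    // Assumption: namespace URI used for cas:Sofa / cas:View / cas:NULL in XMI.
    static final String CAS_NS = "http:///uima/cas.ecore";

    final int sofas;
    final int views;
    final int otherFeatureStructures;

    XmiShape(int sofas, int views, int otherFeatureStructures) {
        this.sofas = sofas;
        this.views = views;
        this.otherFeatureStructures = otherFeatureStructures;
    }

    /** True for scenario 1: exactly one sofa, one view, and nothing else. */
    boolean looksMinimal() {
        return sofas == 1 && views == 1 && otherFeatureStructures == 0;
    }

    /** Parses the XMI and counts the direct children of the root element. */
    static XmiShape of(String xmi) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xmi.getBytes(StandardCharsets.UTF_8)));

            int sofas = 0;
            int views = 0;
            int other = 0;
            NodeList children = doc.getDocumentElement().getChildNodes();
            for (int i = 0; i < children.getLength(); i++) {
                Node node = children.item(i);
                if (node.getNodeType() != Node.ELEMENT_NODE) {
                    continue; // skip whitespace text nodes and comments
                }
                boolean inCasNamespace = CAS_NS.equals(node.getNamespaceURI());
                if (inCasNamespace && "Sofa".equals(node.getLocalName())) {
                    sofas++;
                } else if (inCasNamespace && "View".equals(node.getLocalName())) {
                    views++;
                } else if (!inCasNamespace) {
                    other++; // annotations and other typed feature structures
                }
                // Remaining cas:* elements (e.g. cas:NULL) are not counted.
            }
            return new XmiShape(sofas, views, other);
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XMI", e);
        }
    }
}
```

Counts alone cannot tell scenario 3 from scenario 4 (both allow zero
sofas and views), so this is only meant as a quick sanity check on what
the store handed back.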

Currently, if I use the FSID element, I have to either set the
deserialization to LENIENT or preprocess the FSIDs out of the CAS before
deserialization. Either way, the unknown attributes are simply dropped.
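
For the lenient route, UIMAj already offers
XmiCasDeserializer.deserialize(InputStream, CAS, boolean) with the last
argument set to true. For the preprocessing route, here is a minimal
sketch using only the JDK's DOM and Transformer classes. It assumes the
FSID is written as a plain, unprefixed attribute named "fsid" on each
feature structure element; that name and the FsidStripper class are
hypothetical, so adjust them to whatever the CAS Store actually emits.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: removes store-assigned FSID attributes from XMI. */
final class FsidStripper {
    // Assumption: the attribute name the CAS Store uses for its identifier.
    static final String FSID_ATTRIBUTE = "fsid";

    /** Returns a copy of the XMI with every FSID attribute removed. */
    static String stripFsid(String xmi) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(
                    new ByteArrayInputStream(xmi.getBytes(StandardCharsets.UTF_8)));

            // "*" matches every element in the document, at any depth.
            NodeList elements = doc.getElementsByTagName("*");
            for (int i = 0; i < elements.getLength(); i++) {
                Element element = (Element) elements.item(i);
                if (element.hasAttribute(FSID_ATTRIBUTE)) {
                    element.removeAttribute(FSID_ATTRIBUTE);
                }
            }

            // Serialize the modified tree back to a string.
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not preprocess XMI", e);
        }
    }
}
```

The returned string can then go through the normal, strict XMI
deserialization. The obvious downside is that the identifier is gone once
the CAS is in memory, so this only helps if the caller keeps its own
mapping from xmi:id to FSID.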

For scenario 1, other than lenient deserialization, nothing else needs to
be done.

For scenarios 2 and 3, the associated Type System of the CAS must be
registered for deserialization.

I haven't implemented scenario 4 in UIMAj yet, but will be working on
something for this soon.

Now, I haven't dug into the Serialization code yet to see how else this can
be accomplished, but will be looking into it soon.  I would just like to
begin a discussion on this topic to make sure that we're covering all our
bases :)

Thanks!

Neal
