Interesting thread in UIMA core about JSON serialization of CASs and descriptors.

Begin forwarded message:

> From: Marshall Schor <[email protected]>
> Subject: Re: [jira] [Created] (UIMA-3969) Add JSON Serialization for CASs and UIMA Descriptors
> Date: August 25, 2014 at 8:33:54 PM PDT
> To: [email protected]
> Reply-To: [email protected]
>
> On 8/25/2014 6:54 PM, Jens Grivolla wrote:
>> Is the JSON serialization documented somewhere?
>
> Yes, there's a chapter in the reference book. You can build that
> (uima-docbook-references), until it's released.
>
> There are also lots of Javadocs in the main implementing class:
> XmiCasSerializer. (It's in this class because it shares a lot of the
> machinery with XMI serialization.)
>
>> I saw that there appear to be quite a few alternative serializations. It
>> seems to include something like a typesystem definition, but only with a
>> list of feature names, not their types, if I understood the format
>> correctly (@featureRefs has a list of the features that are not of
>> primitive types, it seems).
>
> The @featureRefs contains only those features which are "references" to
> other feature structures.
>
> You're correct in noticing that the feature "range" types are not present.
> This is because the serialization is to JSON, which supports a native
> representation of things that are collections (JSON arrays), which could be
> UIMA Arrays or Lists, and ranges that are boolean are representable by JSON
> true and false values. There is no distinction that a number is a
> byte/short/int/long, because those are all represented as a JSON "number".
> And so forth...
>
> The JSON serialization for a CAS can optionally include parts of the type
> system: it can include what the supertypes are for serialized types (to
> enable iterating over a type and all of its subtypes, like CAS iterators
> normally do); it can also identify which slots that appear to have number
> values are actually to be interpreted as references to other feature
> structures. Otherwise, the serialized form might have a slot "foo" : 111
> which is a number value, and a slot "bar" : 112 which is a reference to
> another feature structure whose ID is 112. This extra information (in
> @featureRefs) gives the user of the JSON serialized form a way to
> distinguish these two cases.
>
>> It would be very useful if the serialization allowed one to easily pull out
>> a partial CAS with just a subset of the views (by only including some
>> subtrees of the JSON structure), and merge views into it.
>
> Another optional part of the serialization is a list of views, together
> with an array of numbers, each one of which represents a serialized Feature
> Structure that is indexed in that view.
>
>> This might be
>> complicated, as I understand that the views define annotation indices, but
>> the same annotation can be indexed in several views, right?
>
> Feature Structures can be classified into "Annotations" and other types
> (not a subtype of Annotation).
>
> Annotations are special - they have an implied reference to a particular
> subject of analysis. So they are restricted to being indexed in the view
> that is associated with that subject-of-analysis.
>
> Other types (not subtypes of Annotation (or more precisely,
> AnnotationBase)) do not have this restriction, and can be indexed in
> multiple views.
>
> See
> http://uima.apache.org/d/uimaj-2.6.0/tutorials_and_users_guides.html#ugr.tug.aas.annotations_associated_sofa.
>
> Let me know where the documentation might be improved :-)
>
> -Marshall
>
>> -- Jens
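
To make the "foo" : 111 versus "bar" : 112 point concrete, here is a minimal sketch in Python of how a consumer might use @featureRefs to follow references and pull out a partial CAS for a subset of views. The JSON layout below (the "@context", "featureStructures" and "views" keys, the "Example" type, and the view names) is an assumption made up for illustration; it is not the actual output of the UIMA serializer, whose real format is described in the reference book chapter mentioned above. Only the slot names "foo" and "bar", the IDs 111 and 112, and the @featureRefs key come from the thread.

```python
import json

# Hypothetical layout for illustration only; NOT the real UIMA JSON format.
SAMPLE = """
{
  "@context": {"Example": {"@featureRefs": ["bar"]}},
  "featureStructures": {
    "111": {"@type": "Example", "foo": 111, "bar": 112},
    "112": {"@type": "Example", "foo": 7},
    "113": {"@type": "Example", "foo": 112}
  },
  "views": {"viewA": [111], "viewB": [113]}
}
"""


def reference_features(doc, type_name):
    """Names of the features of type_name whose numbers are FS references."""
    return set(doc["@context"].get(type_name, {}).get("@featureRefs", []))


def referenced_ids(doc, fs_id):
    """IDs that the feature structure fs_id points to through reference slots."""
    fs = doc["featureStructures"][str(fs_id)]
    refs = reference_features(doc, fs["@type"])
    return [value for name, value in fs.items()
            if name in refs and value is not None]


def extract_views(doc, view_names):
    """Keep the chosen views plus every FS reachable from what they index."""
    views = {name: doc["views"][name] for name in view_names}
    pending = [fs_id for ids in views.values() for fs_id in ids]
    kept = {}
    while pending:
        fs_id = pending.pop()
        key = str(fs_id)
        if key in kept:
            continue  # already copied; also guards against reference cycles
        kept[key] = doc["featureStructures"][key]
        pending.extend(referenced_ids(doc, fs_id))
    return {"@context": doc["@context"],
            "featureStructures": kept,
            "views": views}


doc = json.loads(SAMPLE)
partial = extract_views(doc, ["viewA"])
print(sorted(partial["featureStructures"]))  # ['111', '112']
```

Extracting "viewB" instead keeps only 113: its "foo" : 112 is a plain number, because "foo" is not listed in @featureRefs, so nothing is followed. The sketch ignores reference-valued arrays and lists, which a real consumer would also have to walk.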
