Apologies for the long email.  Short version: it appears that the support for
arrays of specific Feature Structure types (e.g. myFoo[]) has some holes; below
are some possible ways forward.

-----------------

UIMA has some support for arrays and lists of FeatureStructures (FSs) whose
elements are restricted to a particular FS type.  This is supported in the type
system descriptors, where you can specify an "elementType" in the
"featureDescription".

One possible use is with indexing: you could get an index over all instances of
arrays of some specific type.

The implementation has further support.  Using the TypeSystemManager API
getArrayType(component_type), you can create a type which is an FS array with a
given component type.  This creates (or just retrieves, if already created) a
type whose name is the name of the component_type suffixed with "[]", for
example "uima.tcas.Annotation[]".

You can also specify these types in the XML type descriptor, but only
indirectly: in the "feature" description of another type, where that feature
references the array type.
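To make the naming rule concrete, here's a small stand-alone Java model of that
get-or-create behavior.  It's only a sketch under simplifying assumptions:
ArrayTypeRegistry is a hypothetical class (not UIMA code), and it represents
types by their names rather than by Type objects.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of the get-or-create behavior of
// getArrayType(component_type).  Not UIMA code: types are plain names here.
final class ArrayTypeRegistry {
    // Component type name -> array type name, filled in lazily.
    private final Map<String, String> arrayTypes = new HashMap<>();

    // Returns the array type name for the component type, creating it on
    // first use and simply retrieving it afterwards.
    String getArrayType(String componentTypeName) {
        return arrayTypes.computeIfAbsent(componentTypeName, name -> name + "[]");
    }

    // Number of array types created so far.
    int size() {
        return arrayTypes.size();
    }
}
```

Calling getArrayType("uima.tcas.Annotation") twice yields the same
"uima.tcas.Annotation[]" entry, and the registry still holds just one type.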

Actually creating instances of these types seems not to be fully implemented.
To create an array, the API needs to take the array length.  Among the non-JCas
APIs, the CAS interface has these methods for creating arrays:

createBooleanArray(length)
createStringArray(length)
  etc.
createArrayFS(length)

but there's no

createArray(type, length)

The LowLevelCAS interface has this though:

ll_createArray(type, length)

I couldn't find any tests that actually create one of these objects using this
API.
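For comparison, here's a sketch of what a generic createArray(type, length)
might look like if it were layered over a low-level, handle-returning create
call.  Everything here (MiniCas, ArrayFs, llCreateArray, createArray) is
hypothetical and only models the shape of the API; it is not the UIMA
implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a typed FS array: a type name plus a fixed length.
final class ArrayFs {
    final String typeName;   // e.g. "uima.tcas.Annotation[]"
    final Object[] elements; // length is fixed when the array is created

    ArrayFs(String typeName, int length) {
        this.typeName = typeName;
        this.elements = new Object[length];
    }
}

// Hypothetical mini CAS; not the UIMA CAS or LowLevelCAS interfaces.
final class MiniCas {
    private final List<ArrayFs> heap = new ArrayList<>();

    // Low-level style: creates the array and returns an int handle,
    // in the spirit of ll_createArray(type, length).
    int llCreateArray(String arrayTypeName, int length) {
        heap.add(new ArrayFs(arrayTypeName, length));
        return heap.size() - 1;
    }

    // Sketch of the missing high-level createArray(type, length):
    // validate the arguments, then delegate to the low-level call.
    ArrayFs createArray(String arrayTypeName, int length) {
        if (!arrayTypeName.endsWith("[]")) {
            throw new IllegalArgumentException("not an FS array type: " + arrayTypeName);
        }
        if (length < 0) {
            throw new IllegalArgumentException("negative length: " + length);
        }
        return heap.get(llCreateArray(arrayTypeName, length));
    }
}
```

The point of the wrapper is just that the caller passes the array type and the
length together, the same way ll_createArray does, but gets back an object
instead of an int handle.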

I modified a test case to create one of these and then serialized it with both
XMI and XCAS serialization.  The result was invalid XML whenever the array was
in fact serialized as a separate object.  That happens in XCAS, and in XMI when
the array is referenced from a feature whose feature description is marked
"multipleReferencesAllowed".
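Restated as code, the condition looks like this.  It's a hypothetical model of
what I observed in that test, not UIMA code; ArraySerialization and
SerialFormat are made-up names.

```java
// Hypothetical names; these are not UIMA classes.
enum SerialFormat { XCAS, XMI }

final class ArraySerialization {
    // Models the observation above: XCAS writes the array as a separate
    // element; XMI does so only when the referencing feature's description
    // has multipleReferencesAllowed set.
    static boolean writtenAsSeparateElement(SerialFormat format,
                                            boolean multipleReferencesAllowed) {
        return format == SerialFormat.XCAS || multipleReferencesAllowed;
    }
}
```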

In these cases, the convention is to serialize a FeatureStructure using the name
of its type as the XML element name.  For example, an instance of type "Foo" is
serialized as <Foo ... />.  But the names of these types end in "[]", e.g.
Annotation[], and the characters "[" and "]" are not legal in an XML element
name.
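A simplified check shows why.  This sketch implements only the ASCII subset of
the XML 1.0 Name production (the full production also allows many non-ASCII
characters), which is enough to show that "[" and "]" are rejected.  XmlNames
is a hypothetical helper, not part of UIMA.

```java
// Simplified check of the XML 1.0 "Name" production, ASCII subset only.
final class XmlNames {
    // ASCII part of NameStartChar: ':' | '_' | [A-Z] | [a-z]
    private static boolean isNameStartChar(char c) {
        return c == ':' || c == '_' || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
    }

    // ASCII part of NameChar: NameStartChar | '-' | '.' | [0-9]
    private static boolean isNameChar(char c) {
        return isNameStartChar(c) || c == '-' || c == '.' || (c >= '0' && c <= '9');
    }

    static boolean isValidElementName(String name) {
        if (name.isEmpty() || !isNameStartChar(name.charAt(0))) {
            return false;
        }
        for (int i = 1; i < name.length(); i++) {
            if (!isNameChar(name.charAt(i))) {
                return false; // e.g. '[' or ']'
            }
        }
        return true;
    }
}
```

Both "Foo" and "uima.tcas.Annotation" pass, while "Annotation[]" and
"uima.tcas.Annotation[]" fail.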

There is some code that in some (but not all) cases serializes these using the
element name "FSArray" instead.  But the deserialization code then produces
FSArray instances instead of instances of the more specific type.  When the
deserialized object is referenced from another type via a feature having an
"elementType" specification (in the receiving type system), that information
could be used to fix up the type of the deserialized array instance, so that its
component type is the one given by that specification.
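Here's a minimal sketch of the FSArray fallback and the proposed fix-up,
assuming types are plain names and that the receiving type system's
elementType declarations are available as a map from feature name to element
type.  TypedArrayFixup and its methods are hypothetical, and the sketch assumes
only arrays with an FS component type carry the "[]" suffix.

```java
import java.util.Map;

// Hypothetical model of the FSArray fallback and the deserialization fix-up.
final class TypedArrayFixup {
    static final String FS_ARRAY = "uima.cas.FSArray";

    // Serialization side: a typed FS array falls back to the generic
    // FSArray type, since its own name is not a legal XML element name.
    static String serializedTypeFor(String typeName) {
        return typeName.endsWith("[]") ? FS_ARRAY : typeName;
    }

    // Deserialization side: if a deserialized FSArray is referenced by a
    // feature that declares an elementType, recover the specific array type;
    // otherwise leave it as a plain FSArray.
    // featureElementTypes: feature name -> declared elementType.
    static String fixUpType(String deserializedType, String referencingFeature,
                            Map<String, String> featureElementTypes) {
        if (!FS_ARRAY.equals(deserializedType)) {
            return deserializedType; // only generic FSArrays are candidates
        }
        String elementType = featureElementTypes.get(referencingFeature);
        return elementType == null ? deserializedType : elementType + "[]";
    }
}
```

The fix-up only helps when the referencing feature declares an elementType;
otherwise the array stays a plain FSArray and its component type is lost.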

It also appears that the CasCopier doesn't support creating these kinds of
objects.

I've probably missed some things in my analysis of this.  I'm thinking we ought
to fix the CasCopier to handle these objects, and fix XMI and XCAS serialization
to work when serializing them (by serializing them as FSArray, although that
loses the component type info).  When deserializing XMI and XCAS, these FSArray
objects could be updated to include the element-type information when and if it
is available, for instance, if there is a reference to them from some typed
feature having an element type.

This isn't perfect; to be 100% accurate, we would need to be able to record the
element type in the serialized stream for these instances.

I haven't (yet) thought much about JCas for this issue, or about support for
FSLists.

Other thoughts?

-Marshall
