Apologies for the long email. Short version: support for arrays of specific Feature Structure types (e.g. myFoo[]) appears to have some holes; some possible ways forward are below.
-----------------

UIMA has some support for arrays and lists of FeatureStructures (FSs) with the elements restricted to a particular FS type. This is supported in the type system descriptors, where you can specify in the "featureDescription" an "elementType". One use could be to use these types with indexing; you can get an index over all instances of arrays of some specific type.

In the implementation, I see further support. It is possible to create a type which is a FS array with a component type, using the TypeSystemManager API: getArrayType(component_type). This creates (or just retrieves, if already created) a type whose name is the name of the component_type, suffixed with "[]". Example: "uima.tcas.Annotation[]".

You can also specify these types in the XML type descriptor, but not directly; you can only specify them in the "feature" description for another type, where that feature is referencing it.

To actually create instances of these types seems not quite implemented. To create an array, the API needs to include the array length. Looking at the non-JCas APIs, we have in the CAS Interface methods for creating arrays:

    createBooleanArray(length)
    createStringArray(length)
    etc.
    createArrayFS(length)

but there's no

    createArray(type, length)

The LowLevelCAS interface has this though:

    ll_createArray(type, length)

I couldn't find any tests that actually create one of these objects, using this API.

Modifying a test case to create one of these, and then attempting to serialize it with both XMI and XCAS serialization produced invalid XML if the array was in fact serialized as a separate object. This is the case in XCAS and in XMI when the array is referenced from a feature description, and that feature description is marked as "multipleReferencesAllowed".

In these cases, the convention to serialize a FeatureStructure is to serialize it using the name of the type as the XML element name. For example, the type "Foo" gets serialized as <Foo ... />.
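
To make the "type name as XML element name" convention concrete, here is a minimal sketch that asks the JDK's DOM implementation whether a given type name would be accepted as an element name. It uses only the JDK's DOM API, not UIMA's XMI/XCAS serializers, and the class and method names (ElementNameCheck, isLegalElementName) are made up for this illustration.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.DOMException;
import org.w3c.dom.Document;

// Hypothetical helper, not part of UIMA: checks whether a type name
// could be used directly as an XML element name.
class ElementNameCheck {

    /** Returns true if the JDK DOM implementation accepts typeName as an element name. */
    static boolean isLegalElementName(String typeName) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException("No usable XML parser configuration", e);
        }
        try {
            // createElement validates the name and throws DOMException on illegal characters.
            doc.createElement(typeName);
            return true;
        } catch (DOMException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] typeNames = {"Foo", "FSArray", "uima.tcas.Annotation", "uima.tcas.Annotation[]"};
        for (String name : typeNames) {
            System.out.println(name + " -> " + isLegalElementName(name));
        }
    }
}
```

Running main prints one line per type name; the interesting case is the array type name with the "[]" suffix, which the DOM implementation should reject.
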
But the name of these types ends in "[]", e.g. Annotation[], and the characters "[]" are not legal as part of an XML element name. There is some code that in some (but not all) cases serializes this using the element name "FSArray" instead. But the deserialization code then produces plain FSArray instances instead of instances of the more specific type.

When the deserialized object is referenced from another type via a feature having an "elementType" specification (in the receiving type system), that information could be used to fix up the type of the deserialized array instance to match that spec's component type.

It also appears that the CasCopier doesn't support creating these kinds of objects.

I've probably missed some things in my analysis of this.

I'm thinking we ought to fix the CasCopier and the XMI and XCAS serialization to work when serializing these objects (by serializing them as FSArray, although that loses the component type info). When deserializing XMI and XCAS, these FSArray objects could be updated to include the element-type information when and if that was available (for instance, if there was a reference from some typed feature having an element type). This isn't perfect; to be 100% accurate, we would need to be able to record the element type in the serialized stream for these instances.

I haven't (yet) thought much about JCas for this issue, or support for fslists.

Other thoughts?

-Marshall
