The UIMA v3 implementation includes, in many places, extra code (costing time and space) whose goal is to make things look closer to version 2. Some of this is for interoperability with version 2 artifacts, such as serialized forms.
An example: in v2, many serialization forms include "references" to other Feature Structures (FSs), and those references are encoded as the "address" of the FS in the heap. In v3 there is no heap, but each FS has an "id", which is (at the moment) an int that increments by 1. These ids do not match the v2 "addresses", so many parts of the serialization code build a map at serialization time from the v3 ids to v2 "addresses", and use the latter in the serialized form.

Currently, this is done for various binary serializations, so that they can be read back in by v2 code. It is not done for JSON or XMI (and maybe XCAS; I haven't checked). So the serialized forms for these differ between v2 and v3, in that the numbers used to represent references to other FSs are different.

The deserialization code for XMI and JSON doesn't depend on these numbers being anything other than unique per FS, so there's no issue in deserializing. But the UIMA community may have built other things that depend on these identifiers not changing.

What's your opinion: should the XMI, JSON, etc. serialization in v3 be changed to reproduce (approximately) the same reference numbers as v2? I say approximately because other factors might affect these numbers, such as differences between v2 and v3 in the ordering of things not in "ordered" indexes.

-Marshall
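
For illustration, the id-to-address map mentioned above could be sketched roughly as follows. This is not the actual UIMA code: the class name IdToAddrSketch, the method buildIdToAddr, and the per-FS heap cell counts are all hypothetical, and the sketch assumes that v2 addresses start at 1 with 0 standing for the null reference.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not taken from the UIMA code base.
final class IdToAddrSketch {

    /**
     * Walks the FSs in serialization order and assigns each one a synthetic
     * v2-style "address": the running total of heap cells that the preceding
     * FSs would have occupied in a v2 heap.
     *
     * @param idsInSerializationOrder v3 ids, in the order the FSs are written
     * @param heapCells               number of v2 heap cells each FS would occupy
     *                                (made-up values in the example below)
     */
    static Map<Integer, Integer> buildIdToAddr(int[] idsInSerializationOrder, int[] heapCells) {
        if (idsInSerializationOrder.length != heapCells.length) {
            throw new IllegalArgumentException("ids and heapCells must have the same length");
        }
        Map<Integer, Integer> idToAddr = new LinkedHashMap<>();
        int nextAddr = 1; // assumption: address 0 is reserved for the null reference
        for (int i = 0; i < idsInSerializationOrder.length; i++) {
            idToAddr.put(idsInSerializationOrder[i], nextAddr);
            nextAddr += heapCells[i]; // the next FS starts where this one ends
        }
        return idToAddr;
    }

    public static void main(String[] args) {
        int[] ids = {1, 2, 3};   // v3 ids, incrementing by 1
        int[] cells = {3, 5, 2}; // illustrative sizes only
        System.out.println(buildIdToAddr(ids, cells));
    }
}
```

With these made-up sizes, v3 ids 1, 2, 3 map to addresses 1, 4, 9. A reference to the FS with id 3 would then be written as 9 rather than 3, which is the kind of number a v2 reader (or a downstream tool built against v2 output) would expect.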