The UIMA v3 implementation includes, in many places, extra code (costing time
and space) whose goal is to make things look closer to version 2.  Some of this
is for interoperability with version 2 artifacts, such as serialized forms.

An example: in v2, many serialization forms include "references" to other
Feature Structures (FSs), and each such reference is encoded as the "address"
of the referenced FS in the heap.

In v3, there is no heap, but the FSs have "ids", which are (at the moment) ints
that increment by 1.  These don't match the "addresses" in v2, so many parts of
the serialization code build a map at serialization time from the v3 ids to
v2 "addresses", and use the latter in the serialized form.
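
To make the idea concrete, here is a minimal sketch of such a map.  It is not
the actual UIMA code: the class V2AddressSketch, the Fs stand-in, and the method
buildIdToAddressMap are hypothetical names, and the layout rule (address 0
reserved for null, each FS taking one slot for its type code plus one slot per
feature) is an assumption about how a v2-style heap address could be simulated.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class V2AddressSketch {

    /** Hypothetical stand-in for a v3 FS: just an id and a feature count. */
    static final class Fs {
        final int id;
        final int featureCount;

        Fs(int id, int featureCount) {
            this.id = id;
            this.featureCount = featureCount;
        }
    }

    /**
     * Walks the FSs in serialization order and assigns each one a simulated
     * v2 heap address, returning a map from v3 id to that address.
     */
    static Map<Integer, Integer> buildIdToAddressMap(List<Fs> fssInSerializationOrder) {
        Map<Integer, Integer> idToAddr = new LinkedHashMap<>();
        int nextAddr = 1; // assumption: address 0 is reserved to mean "null"
        for (Fs fs : fssInSerializationOrder) {
            idToAddr.put(fs.id, nextAddr);
            // assumption: one slot for the type code plus one slot per feature
            nextAddr += 1 + fs.featureCount;
        }
        return idToAddr;
    }

    public static void main(String[] args) {
        List<Fs> fss = Arrays.asList(new Fs(1, 2), new Fs(2, 0), new Fs(3, 4));
        System.out.println(buildIdToAddressMap(fss)); // prints {1=1, 2=4, 3=5}
    }
}
```

A serializer along these lines would write the mapped value wherever it would
otherwise write the v3 id of a referenced FS.  The cost is the extra pass and
the map itself, which is the time / space overhead mentioned above.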

Currently, this is done for various binary serializations, so that these can be
read back in by v2 code.

Currently, it's not done for JSON or XMI (and maybe not for XCAS; I haven't
checked).  So the serialized forms for these differ between v2 and v3, in that
the numbers used to represent references to other FSs are different.

The deserialization code for XMI and JSON doesn't depend on these numbers being
anything other than unique per FS, so there's no issue in deserializing.  But
the UIMA community may have built other things that depend on these identifiers
not changing. 

What's your opinion: should the XMI, JSON, etc. serialization in v3 be changed
to reproduce (approximately) the same reference numbers as v2?  I say
approximately, because other factors might affect these numbers, such as
differences between v2 and v3 in the ordering of things not in "ordered"
indexes.

-Marshall
