FS IDs are IMHO a very useful thing. Providing out-of-band (i.e. out-of-type-system) unique identifiers for feature structures facilitates handling them, e.g. in editors. We use that quite a bit in WebAnno.

In WebAnno, we do not rely on any heap arithmetic - an ID is just expected to be a unique identifier. However, I could imagine cases where people might rely on the ID incrementing monotonically for new FSes.

Most binary formats do not preserve the ID across a save/load cycle. However, SERIALIZED and SERIALIZED_TSI *do* preserve the ID, and WebAnno makes use of that. It allows us to keep references to FSes without having to keep the CAS in memory all the time. There should continue to be a V3 serialization format which preserves IDs across a load/save cycle.

I do not presently see a case where a strong similarity between V2 and V3 IDs would be important. It would be nice if deserializing a V2 SERIALIZED or SERIALIZED_TSI into V3 would restore the V2 IDs - I expect it to be an easy thing to do.

Cheers,

-- Richard

> On 01.09.2016, at 16:09, Marshall Schor <[email protected]> wrote:
>
> UIMA V3 implementation includes in many places extra code (takes time / space) whose goal is to make things look closer to version 2. Some of this is for interoperability with version 2 artifacts, like serialized forms.
>
> An example: in v2, many serialization forms include "references" to other Feature Structures (FSs), and for those, the encoding is the "address" in the heap of the FS.
>
> In v3, there is no heap, but the FSs have "ids", which are (at the moment) an int which increments by 1. This mis-matches the "address" in v2, so many parts of the serialization code builds a map at serialization time from the v3 id's to v2 "addresses", and uses the latter in the serialization form.
>
> Currently, this is done for various binary serializations, so that these can be read back in by v2 code.
>
> Currently, it's not done for JSON or XMI (and maybe XCAS - haven't checked). So the serialized forms for these differ between v2 and v3, in that the numbers used to represent references to other FSs are different.
>
> The deserialization code for XMI and JSON doesn't depend on these numbers being anything other than unique per FS, so there's no issue in deserializing. But the UIMA community may have built other things that depend on these identifiers not changing.
>
> What's your opinion: should the XMI and JSON etc serialization in V3 be changed to reproduce (approximately) the same reference numbers as v2? I say approximately, because other factors might affect these, such as the ordering for things not in "ordered" indexes, etc. between v2 and v3.
>
> -Marshall
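
For illustration only, the id-to-address mapping described in the quoted message could be sketched roughly as below. This is not UIMA code: the class name IdToAddressSketch, the method mapIdsToAddresses, and the heap layout (one type-code cell plus one cell per feature for each FS, with address 0 reserved for the null reference) are simplifying assumptions made for this example. The real V3 serialization code may well differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch, not part of UIMA. */
class IdToAddressSketch {

    /**
     * Maps sequential v3-style ids (1, 2, 3, ...) to v2-style heap "addresses".
     *
     * featureCounts[i] is the number of feature slots of the FS with id (i + 1).
     * ASSUMPTION: each FS occupies one type-code cell plus one cell per feature,
     * and address 0 is reserved for the null reference.
     */
    static Map<Integer, Integer> mapIdsToAddresses(int[] featureCounts) {
        Map<Integer, Integer> idToAddress = new LinkedHashMap<>();
        int nextAddress = 1; // address 0 is assumed to mean "null"
        for (int i = 0; i < featureCounts.length; i++) {
            int id = i + 1; // ids are assumed to increment by 1
            idToAddress.put(id, nextAddress);
            // advance past the type-code cell and the feature cells of this FS
            nextAddress += 1 + featureCounts[i];
        }
        return idToAddress;
    }

    public static void main(String[] args) {
        // three FSes with 2, 0 and 3 features respectively
        Map<Integer, Integer> map = mapIdsToAddresses(new int[] {2, 0, 3});
        System.out.println(map); // prints {1=1, 2=4, 3=5}
    }
}
```

The example shows why the two numbering schemes diverge: ids grow by 1 per FS, while the assumed addresses grow by the size of each FS, so any serialized reference number differs unless such a map is applied.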
