Could the ID assigned in V3 be the same as the V2 address, i.e. computed as if it were an offset in a heap? It would still be unique and monotonically increasing.
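To make the question concrete, here is a rough sketch of what "an ID that looks like a heap offset" could mean. Everything in it is an assumption for illustration only: the class `HeapStyleIds`, the method `toHeapOffsets`, and the layout rule (offset 0 reserved, each FS taking one cell for its type plus one cell per feature) are hypothetical and are not UIMA API or the actual V2 heap layout.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch, not UIMA code: derive heap-style offsets from creation order. */
final class HeapStyleIds {

    /**
     * featureCounts[i] is the number of feature slots of the i-th FS, in creation order.
     * Returns a map from a sequential id (1, 2, 3, ...) to a heap-style offset.
     */
    static Map<Integer, Integer> toHeapOffsets(int[] featureCounts) {
        Map<Integer, Integer> idToOffset = new LinkedHashMap<>();
        int next = 1; // assumption: offset 0 is reserved (e.g. for a null reference)
        for (int i = 0; i < featureCounts.length; i++) {
            idToOffset.put(i + 1, next);
            // assumption: one cell for the type code plus one cell per feature
            next += 1 + featureCounts[i];
        }
        return idToOffset;
    }

    public static void main(String[] args) {
        // Three FSs with 2, 0 and 3 features respectively.
        System.out.println(toHeapOffsets(new int[] {2, 0, 3})); // prints {1=1, 2=4, 3=5}
    }
}
```

The offsets stay unique and strictly increasing because every FS advances the counter by at least one cell, which is the only property the question relies on.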
Burn

On Fri, Sep 2, 2016 at 5:36 AM, Peter Klügl <[email protected]> wrote:

> Same here.
>
> It looks like we are now also starting to use the address, and I am
> also thinking of using it more in Ruta (internal indexing).
>
> Btw, I did some simple experiments lately concerning the stability of
> the addresses when using CasIOUtils. Can it happen that the addresses
> change if you just deserialize the same CAS twice without serializing
> it in between?
>
> Best,
>
> Peter
>
> On 01.09.2016 at 19:29, Richard Eckart de Castilho wrote:
> > FS IDs are IMHO a very useful thing. Providing out-of-band (i.e.
> > out-of-type-system) unique identifiers for feature structures
> > facilitates handling them, e.g. in editors. We use that quite a bit
> > in WebAnno.
> >
> > In WebAnno, we do not rely on any heap arithmetic - an ID is just
> > expected to be a unique identifier. However, I could imagine cases
> > where people might rely on the ID to increment monotonically for
> > new FSes.
> >
> > Most binary formats do not preserve the ID across a save/load cycle.
> > However, SERIALIZED and SERIALIZED_TSI *do* preserve the ID, and
> > WebAnno makes use of that. It allows keeping references to FSes
> > without having to keep the CAS in memory all the time.
> >
> > There should continue to be a V3 serialization format which
> > preserves IDs across a load/save cycle.
> >
> > I presently do not see a case where a strong similarity between V2
> > and V3 IDs would be important. It would be nice if deserializing a
> > V2 SERIALIZED or SERIALIZED_TSI into V3 would restore the V2 IDs - I
> > expect it to be an easy thing to do.
> >
> > Cheers,
> >
> > -- Richard
> >
> >> On 01.09.2016, at 16:09, Marshall Schor <[email protected]> wrote:
> >>
> >> UIMA V3 implementation includes in many places extra code (takes
> >> time / space) whose goal is to make things look closer to version 2.
> >> Some of this is for interoperability with version 2 artifacts, like
> >> serialized forms.
> >>
> >> An example: in v2, many serialization forms include "references" to
> >> other Feature Structures (FSs), and for those, the encoding is the
> >> "address" in the heap of the FS.
> >>
> >> In v3, there is no heap, but the FSs have "ids", which are (at the
> >> moment) an int which increments by 1. This mis-matches the "address"
> >> in v2, so many parts of the serialization code build a map at
> >> serialization time from the v3 ids to v2 "addresses", and use the
> >> latter in the serialization form.
> >>
> >> Currently, this is done for various binary serializations, so that
> >> these can be read back in by v2 code.
> >>
> >> Currently, it's not done for JSON or XMI (and maybe XCAS - haven't
> >> checked). So the serialized forms for these differ between v2 and
> >> v3, in that the numbers used to represent references to other FSs
> >> are different.
> >>
> >> The deserialization code for XMI and JSON doesn't depend on these
> >> numbers being anything other than unique per FS, so there's no issue
> >> in deserializing. But the UIMA community may have built other things
> >> that depend on these identifiers not changing.
> >>
> >> What's your opinion: should the XMI and JSON etc. serialization in
> >> V3 be changed to reproduce (approximately) the same reference
> >> numbers as v2? I say approximately, because other factors might
> >> affect these, such as the ordering for things not in "ordered"
> >> indexes, etc. between v2 and v3.
> >>
> >> -Marshall
