Re: opinion on degree of backwards compatibility for Uima V3 experiment

Marshall Schor Fri, 02 Sep 2016 08:56:45 -0700

whew! -M


On 9/2/2016 9:27 AM, Peter Klügl wrote:
> Tested all formats; it did not happen for a reasonably complex CAS.
>
>
> On 02.09.2016 at 15:26, Marshall Schor wrote:
>> Re: deserializing the same CAS twice shouldn't change the addresses;  if you
>> have a case where it's doing that, I'll investigate (need a small test 
>> case...).
>>
>> -Marshall
>>
>> On 9/2/2016 5:36 AM, Peter Klügl wrote:
>>> Same here.
>>>
>>>
>>> It looks like we are now also starting to use the address, and I am
>>> also thinking of using it more in Ruta (internal indexing).
>>>
>>>
>>> Btw, I did some simple experiments lately concerning the stability of
>>> the addresses when using CasIOUtils. Can it happen that the addresses
>>> change if you just deserialize the same CAS twice without serializing it
>>> in between?
>>>
>>>
>>> Best,
>>>
>>>
>>> Peter
>>>
>>>
>>>
>>> On 01.09.2016 at 19:29, Richard Eckart de Castilho wrote:
>>>> FS IDs are IMHO a very useful thing. Providing out-of-band (i.e.
>>>> out-of-type-system) unique identifiers for feature structures facilitates
>>>> handling them, e.g. in editors. We use that quite a bit in WebAnno.
>>>>
>>>> In WebAnno, we do not rely on any heap arithmetic - an ID is just
>>>> expected to be a unique identifier. However, I could imagine cases where 
>>>> people might rely on the ID to increment monotonically for new FSes.
>>>>
>>>> Most binary formats do not preserve the ID across a save/load cycle.
>>>> However, SERIALIZED and SERIALIZED_TSI *do* preserve the ID, and WebAnno
>>>> makes use of that. It allows us to keep references to FSes without having to
>>>> keep the CAS in memory all the time.
>>>>
>>>> There should continue to be a V3 serialization format which preserves IDs 
>>>> across a load/save cycle. 
>>>>
>>>> I do not presently see a case where a strong similarity between V2 and V3
>>>> IDs would be important. It would be nice if deserializing a V2 SERIALIZED
>>>> or SERIALIZED_TSI into V3 would restore the V2 IDs - I expect it to be an
>>>> easy thing to do.
>>>>
>>>> Cheers,
>>>>
>>>> -- Richard
>>>>
>>>>> On 01.09.2016, at 16:09, Marshall Schor <[email protected]> wrote:
>>>>>
>>>>> UIMA V3 implementation includes in many places extra code (takes time /
>>>>> space) whose goal is to make things look closer to version 2.  Some of this
>>>>> is for interoperability with version 2 artifacts, like serialized forms.
>>>>>
>>>>> An example: in v2, many serialization forms include "references" to other
>>>>> Feature Structures (FSs), and for those, the encoding is the "address" in
>>>>> the heap of the FS.
>>>>>
>>>>> In v3, there is no heap, but the FSs have "ids", which are (at the moment)
>>>>> an int which increments by 1.  This mismatches the "address" in v2, so many
>>>>> parts of the serialization code build a map at serialization time from the
>>>>> v3 ids to v2 "addresses", and use the latter in the serialization form.
>>>>>
>>>>> Currently, this is done for various binary serializations, so that these
>>>>> can be read back in by v2 code.
>>>>>
>>>>> Currently, it's not done for JSON or XMI (and maybe XCAS - haven't
>>>>> checked).  So the serialized forms for these differ between v2 and v3, in
>>>>> that the numbers used to represent references to other FSs are different.
>>>>>
>>>>> The deserialization code for XMI and JSON doesn't depend on these numbers
>>>>> being anything other than unique per FS, so there's no issue in
>>>>> deserializing.  But the UIMA community may have built other things that
>>>>> depend on these identifiers not changing.
>>>>>
>>>>> What's your opinion: should the XMI and JSON etc. serialization in V3 be
>>>>> changed to reproduce (approximately) the same reference numbers as v2?  I
>>>>> say approximately, because other factors might affect these, such as the
>>>>> ordering for things not in "ordered" indexes, etc. between v2 and v3.
>>>>>
>>>>> -Marshall
>>>>>
>
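
The id-to-address mapping described in the original post (v3 ids that increment by 1, translated at serialization time into v2-style heap "addresses") can be illustrated with a small, self-contained sketch. This is not the UIMA implementation: the class name IdToAddressSketch, the method buildIdToAddressMap, and the per-FS cell counts are hypothetical, and the layout (cell 0 reserved for the null reference, each FS occupying a contiguous run of cells) is an assumption made only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class IdToAddressSketch {

    /**
     * Hypothetical model: fsSizes[i] is the number of heap cells that the FS
     * with v3-style id (i + 1) would occupy in a v2-style heap.
     * Returns a map from v3-style id to v2-style address.
     */
    public static Map<Integer, Integer> buildIdToAddressMap(int[] fsSizes) {
        Map<Integer, Integer> idToAddress = new LinkedHashMap<>();
        int nextAddress = 1; // assumption: cell 0 stands for the null reference
        for (int i = 0; i < fsSizes.length; i++) {
            int id = i + 1;                   // ids increment by 1 per FS
            idToAddress.put(id, nextAddress); // address = start of this FS's cells
            nextAddress += fsSizes[i];        // next FS starts after this one
        }
        return idToAddress;
    }

    public static void main(String[] args) {
        // Three FSs occupying 3, 5 and 2 cells respectively (made-up sizes).
        Map<Integer, Integer> map = buildIdToAddressMap(new int[] {3, 5, 2});
        System.out.println(map); // prints {1=1, 2=4, 3=9}
    }
}
```

Under these assumptions the ids stay dense (1, 2, 3) while the addresses jump by the size of each FS, which is why a reference number written by v3 code can differ from the one v2 would write for the same FS unless such a map is applied.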
