So SERIALIZED and SERIALIZED_TS get no header?

Can you try to deserialize the CAS files created by the unit test with
an older version of UIMA? I cannot get it to work.


Best,


Peter


On 22.07.2016 at 15:18, Marshall Schor wrote:
> Re: The java-serialized formats now have also a binary header
>
> Not sure what you mean by java-serialized formats.  Perhaps this means the
> formats created by using standard Java Object serialization on the special
> objects in UIMA built for this.
>
> If so, then it seems this would break backwards compatibility: a user
> serializing with UIMA 2.9.0, but not using any new features, could not have
> the result "read" by an older version of UIMA.
>
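A minimal, self-contained sketch of why any header in front of a Java-serialized stream could break an older reader, assuming that reader hands the raw stream straight to `java.io.ObjectInputStream` (this is not verified against the UIMA 2.8.x code). Standard Java serialization streams begin with the magic bytes 0xACED, and the `ObjectInputStream` constructor checks them. The 4-byte prefix below is a made-up stand-in, not the header CommonSerDes actually writes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.StreamCorruptedException;

public class HeaderBeforeObjectStream {

  // Hypothetical stand-in for a binary header; NOT the real UIMA header layout.
  static final byte[] HYPOTHETICAL_HEADER = {'U', 'I', 'M', 'A'};

  static byte[] write(Serializable payload, boolean withHeader) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    if (withHeader) {
      bytes.write(HYPOTHETICAL_HEADER);  // extra bytes before the Java stream magic
    }
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(payload);
    }
    return bytes.toByteArray();
  }

  // Models a reader that knows nothing about the header.
  static Object readWithoutHeaderSupport(byte[] data)
      throws IOException, ClassNotFoundException {
    try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
      return in.readObject();
    }
  }

  public static void main(String[] args) throws Exception {
    System.out.println(readWithoutHeaderSupport(write("payload", false)));
    try {
      readWithoutHeaderSupport(write("payload", true));
    } catch (StreamCorruptedException e) {
      // The constructor rejects the stream because it does not start with 0xACED.
      System.out.println("header-unaware reader fails: " + e.getMessage());
    }
  }
}
```

If older versions do read Java-serialized CAS files this way, any prefix in front of the object stream would be enough to make them fail.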
>
> -Marshall
>
> On 7/22/2016 7:43 AM, Peter Klügl wrote:
>> Hi,
>>
>>
>> I changed CasIOUtils to use the Header, and I extended the header with a
>> bit (0x08) indicating an included type system. There is no information
>> about the serialization of the type system yet. The java-serialized formats
>> now also have a binary header, because I did not want to make the header
>> serializable; it should be read and written by the same functionality.
>>
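A minimal sketch of setting and testing the 0x08 "type system included" bit. Only the value 0x08 comes from the message above; the class and method names, and the assumption that the flags live in a single `int`, are hypothetical.

```java
public class HeaderFlags {

  // 0x08 is the bit described above; everything else here is illustrative.
  static final int TYPE_SYSTEM_INCLUDED = 0x08;

  static int setTypeSystemIncluded(int flags, boolean included) {
    return included ? (flags | TYPE_SYSTEM_INCLUDED) : (flags & ~TYPE_SYSTEM_INCLUDED);
  }

  static boolean isTypeSystemIncluded(int flags) {
    return (flags & TYPE_SYSTEM_INCLUDED) != 0;
  }

  public static void main(String[] args) {
    int flags = 0x01;  // pretend some unrelated bit is already set
    flags = setTypeSystemIncluded(flags, true);
    System.out.println(Integer.toHexString(flags) + " " + isTypeSystemIncluded(flags));  // prints "9 true"
  }
}
```

Because the other bits are left untouched, a reader that masks out only the bits it knows would not notice the new one; whether older UIMA readers behave that way is not established in this thread.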
>> I expected old UIMA versions (e.g., 2.8.1) to be able to load the new CAS
>> files, but my tests failed. No idea yet why. Overall, I am not very happy
>> with the current solution, but I could live with it.
>>
>> Maybe someone wants to take a look at it?
>>
>>
>> Best,
>>
>> Peter
>>
>> On 20.07.2016 at 14:30, Peter Klügl wrote:
>>> Hi,
>>>
>>>
>>> I'll try to find the time to do these changes this week, next week latest.
>>>
>>>
>>> By the way, input stream sniffing to distinguish XMI from XCAS is
>>> currently not supported. There could be a lot of text, e.g., license
>>> text, before the relevant element occurs.
>>>
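A minimal sketch of bounded look-ahead sniffing, to illustrate the limitation described above. The 1024-byte window and the marker strings `<xmi:XMI` and `<CAS` are assumptions for illustration, not what CasIOUtils does; the point is that any fixed window can be exhausted by a long leading comment.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

public class XmlFormatSniffer {

  enum Guess { XMI, XCAS, UNKNOWN }

  // Hypothetical look-ahead budget; a long license comment can exceed it.
  static final int LOOKAHEAD = 1024;

  static Guess sniff(BufferedInputStream in) throws IOException {
    in.mark(LOOKAHEAD);
    byte[] buf = new byte[LOOKAHEAD];
    int n = in.readNBytes(buf, 0, LOOKAHEAD);
    in.reset();  // rewind so the real parser still sees the whole document
    String prefix = new String(buf, 0, n, StandardCharsets.UTF_8);
    // Assumed root-element markers: xmi:XMI for XMI, CAS for XCAS.
    if (prefix.contains("<xmi:XMI")) {
      return Guess.XMI;
    }
    if (prefix.contains("<CAS")) {
      return Guess.XCAS;
    }
    return Guess.UNKNOWN;
  }

  static BufferedInputStream stream(String s) {
    return new BufferedInputStream(new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8)));
  }

  public static void main(String[] args) throws IOException {
    String xmi = "<?xml version=\"1.0\"?><xmi:XMI xmlns:xmi=\"http://www.omg.org/XMI\"/>";
    String padded = "<?xml version=\"1.0\"?><!-- " + "license ".repeat(300) + "--><CAS/>";
    System.out.println(sniff(stream(xmi)));     // XMI
    System.out.println(sniff(stream(padded)));  // UNKNOWN: the marker lies beyond the window
  }
}
```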
>>>
>>> Best,
>>>
>>>
>>> Peter
>>>
>>>
>>> On 20.07.2016 at 14:19, Marshall Schor wrote:
>>>> Hi,
>>>>
>>>> We can change the header, but:
>>>>
>>>> The changed header ought to be "readable" by previous versions of UIMA.  
>>>>
>>>> For XMI and XCAS, these do not currently have special headers, and if we
>>>> added them, those formats could not be read by older versions of UIMA.
>>>> Those formats contain sufficiently distinctive initial strings to tell
>>>> them apart, though.
>>>>
>>>> The XMI format is also specified in an OASIS standard, which the UIMA
>>>> project is said to (mostly) follow: http://uima.apache.org/uima-specification.html
>>>>
>>>> For binary serializations, I think there's room in the header for an extra
>>>> bit which, if on, could indicate that a type system was included. I think
>>>> it would be good to have a header extension, when type systems are
>>>> included, to specify the format and version of the type system
>>>> serialization.
>>>>
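A minimal sketch of the header extension proposed above: the extension (type-system serialization format and version) is written and read only when the 0x08 bit is set. The layout (an `int` of flags followed by two `short`s) and all names are hypothetical; they are not what CommonSerDes writes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class TypeSystemHeaderExtension {

  // Hypothetical layout: int flags, then (only if bit 0x08 is set) two shorts
  // giving the type-system serialization format and version.
  static final int TYPE_SYSTEM_INCLUDED = 0x08;

  static final class TsInfo {
    final int format;
    final int version;

    TsInfo(int format, int version) {
      this.format = format;
      this.version = version;
    }
  }

  static byte[] writeHeader(int flags, TsInfo ts) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(bytes);
    if (ts == null) {
      out.writeInt(flags & ~TYPE_SYSTEM_INCLUDED);
    } else {
      out.writeInt(flags | TYPE_SYSTEM_INCLUDED);
      out.writeShort(ts.format);   // extension is written only when the bit is on
      out.writeShort(ts.version);
    }
    out.flush();
    return bytes.toByteArray();
  }

  /** Returns the extension, or null when no type system is included. */
  static TsInfo readHeader(byte[] header) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(header));
    int flags = in.readInt();
    if ((flags & TYPE_SYSTEM_INCLUDED) == 0) {
      return null;
    }
    int format = in.readUnsignedShort();
    int version = in.readUnsignedShort();
    return new TsInfo(format, version);
  }

  public static void main(String[] args) throws IOException {
    byte[] withTs = writeHeader(0, new TsInfo(1, 2));
    TsInfo info = readHeader(withTs);
    System.out.println(withTs.length + " bytes, format " + info.format + ", version " + info.version);
    System.out.println(writeHeader(0, null).length + " bytes, no extension");
  }
}
```

Making the extension conditional on the bit keeps headers for files without a type system the same size as before.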
>>>> Most serializations in core UIMA have not included the type system. The
>>>> one which does is CASCompleteSerializer. This is a "serializable" (using
>>>> standard Java serialization) object containing serializable forms of the
>>>> CAS and Type System.
>>>>
>>>> Regarding making methods in CommonSerDes public:
>>>>
>>>> It is fine to make them public in the sense that they are accessible from
>>>> other packages, not just within a sub-type hierarchy. But I think it is
>>>> best not to put CommonSerDes in a package intended for end users, because
>>>> the end-user UIMA APIs should be (as much as possible) stable over a long
>>>> time period. Details of how we evolve headers, etc., should not disturb
>>>> end users, if possible; keeping these public but in packages with names
>>>> like xxx.impl or xyz.internal.abc is the way this has traditionally been
>>>> done. It allows us to evolve these without affecting end-user APIs.
>>>>
>>>> Just to be clear: I would not consider uimaFIT and Ruta to be "end users",
>>>> as they are developed within the UIMA project, and we are willing to
>>>> evolve them together with UIMA core changes.
>>>>
>>>> We don't have a deadline for the next release, but it's mostly ready to
>>>> go, and it will solve a significant issue for people wanting to upgrade
>>>> their Eclipse to Neon :-).
>>>>
>>>> -Marshall
>>>>
>>>> On 7/20/2016 5:03 AM, Peter Klügl wrote:
>>>>> Ok, after looking at the code I must admit that there is much more to do
>>>>> than I expected. We first need to discuss several things:
>>>>>
>>>>> - can we change the header at all?
>>>>>
>>>>> - do we support type system inclusion in the header?
>>>>>
>>>>> - do we support type system inclusion in the serialized files?
>>>>>
>>>>> - which serial formats are which?
>>>>>
>>>>> - can we make the methods in CommonSerDes public?
>>>>>
>>>>>
>>>>> What is the deadline for the release? I am now quite loaded with work
>>>>> until next Wednesday :-(
>>>>>
>>>>>
>>>>> Best,
>>>>>
>>>>>
>>>>> Peter
>>>>>
>>>>>
>>>>> On 19.07.2016 at 22:39, Marshall Schor wrote:
>>>>>> Great.
>>>>>>
>>>>>> There's now also common code for writing / reading UIMA serialization
>>>>>> headers, in
>>>>>>
>>>>>> CommonSerDes (in org.apache.uima.cas.impl)
>>>>>>
>>>>>> This includes the extensions to support versioning the serializations,
>>>>>> which start to be needed in the next release because a bug fix is
>>>>>> slightly changing the serialized form for **delta binary** CAS.
>>>>>>
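A minimal sketch of how a version field lets one reader handle both the original and the bug-fixed delta binary form. The one-byte version, the version numbers, and all names are hypothetical; the actual encoding is whatever CommonSerDes defines.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class VersionedDeltaHeader {

  // Hypothetical version numbers; the real ones are defined in CommonSerDes.
  static final int V_ORIGINAL = 0;
  static final int V_DELTA_FIX = 1;
  static final int CURRENT = V_DELTA_FIX;

  enum DeltaLayout { ORIGINAL, FIXED }

  // Reads a one-byte version and picks the matching delta-binary layout.
  static DeltaLayout chooseLayout(DataInputStream in) throws IOException {
    int version = in.readUnsignedByte();
    if (version > CURRENT) {
      throw new IOException("unsupported serialization version " + version);
    }
    return version >= V_DELTA_FIX ? DeltaLayout.FIXED : DeltaLayout.ORIGINAL;
  }

  static DataInputStream streamOf(int versionByte) {
    return new DataInputStream(new ByteArrayInputStream(new byte[] {(byte) versionByte}));
  }

  public static void main(String[] args) throws IOException {
    System.out.println(chooseLayout(streamOf(V_ORIGINAL)));   // ORIGINAL
    System.out.println(chooseLayout(streamOf(V_DELTA_FIX)));  // FIXED
  }
}
```

Rejecting versions newer than `CURRENT` gives a clear error instead of silently misreading a form the reader does not know.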
>>>>>> So, it would be good to use that rather than have another separate header
>>>>>> reader/writer to maintain.
>>>>>>
>>>>>> -Marshall
>>>>>>
>>>>>>
>>>>>> On 7/19/2016 4:13 PM, Peter Klügl wrote:
>>>>>>> Ah, I didn't know about that enum. I'll adapt the code and enum.
>>>>>>>
>>>>>>> On 19.07.2016 at 20:09, Marshall Schor wrote:
>>>>>>>> We already have an enum in the core for various serial formats. The
>>>>>>>> class is
>>>>>>>>
>>>>>>>> public enum SerialFormat {
>>>>>>>>    UNKNOWN,
>>>>>>>>    XCAS,         // with reachability filtering
>>>>>>>>    XMI,          // with reachability filtering
>>>>>>>>    BINARY,       // no filtering
>>>>>>>>    COMPRESSED,   // no filtering  (form 4)
>>>>>>>>    COMPRESSED_FILTERED,   // with reachability and type and feature filtering (form 6)
>>>>>>>>    COMPRESSED_PROJECTION, // with subset of views
>>>>>>>> }
>>>>>>>>
>>>>>>>> (I don't think COMPRESSED_PROJECTION is in use...)
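One possible way to extend the enum rather than add a second one is to attach per-format traits to each constant. This is a hypothetical sketch: the nested `Format` enum, its two flags, and their values are assumptions (SERIALIZED and SERIALIZED_TS are taken from the later discussion, and COMPRESSED_PROJECTION is omitted because it is reportedly unused), not decisions made in this thread.

```java
public class SerialFormatTraits {

  // Hypothetical extension of the SerialFormat enum quoted above. The flag
  // values are assumptions, not decisions taken in this thread.
  enum Format {
    UNKNOWN(false, false),
    XCAS(false, false),
    XMI(false, false),
    BINARY(true, false),
    COMPRESSED(true, false),
    COMPRESSED_FILTERED(true, false),
    SERIALIZED(false, false),     // plain Java serialization, no UIMA header
    SERIALIZED_TS(false, true);   // Java serialization including the type system

    private final boolean uimaBinaryHeader;
    private final boolean typeSystemIncluded;

    Format(boolean uimaBinaryHeader, boolean typeSystemIncluded) {
      this.uimaBinaryHeader = uimaBinaryHeader;
      this.typeSystemIncluded = typeSystemIncluded;
    }

    boolean hasUimaBinaryHeader() {
      return uimaBinaryHeader;
    }

    boolean includesTypeSystem() {
      return typeSystemIncluded;
    }
  }

  public static void main(String[] args) {
    for (Format f : Format.values()) {
      System.out.println(f + " header=" + f.hasUimaBinaryHeader() + " ts=" + f.includesTypeSystem());
    }
  }
}
```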
>>>>>>>>
>>>>>>>> This has been around for maybe 3 years. I would be in favor of
>>>>>>>> considering using and/or extending this as needed, rather than having
>>>>>>>> two format enums (that is, the proposed SerializationFormat class).
>>>>>>>>
>>>>>>>> -Marshall
>>>>>>>>
>>>>>>>> On 7/19/2016 2:49 AM, Peter Klügl wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Yes, the class should be officially available to external code. I
>>>>>>>>> already included it in the CAS Editor and in Ruta. I also plan to use
>>>>>>>>> it in our in-house code. I'll change the enforcer rule.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I can write the docs, but any help is welcome, since I do not know how
>>>>>>>>> much spare time I have for this for the rest of the week. I'll take a
>>>>>>>>> look at where the documentation should be added. I haven't looked at it
>>>>>>>>> for some time ;-)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I just chose the name of the class Richard contributed, since I thought
>>>>>>>>> it was really suitable. Then I also noticed the uimaFIT class. This is
>>>>>>>>> not an ideal situation, but I would not change the name because of it.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I would not split the API from the implementation. I do not see any
>>>>>>>>> advantages right now. The class is just a simple utils class with only
>>>>>>>>> static methods, like CasCreationUtils (which is also not separated).
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>>
>>>>>>>>> Peter
>>>>>>>>>
>>>>>>>>> On 18.07.2016 at 22:26, Marshall Schor wrote:
>>>>>>>>>> This is OK with me. I can even volunteer to write the docs (but am
>>>>>>>>>> happy to have others do it :-) ).
>>>>>>>>>>
>>>>>>>>>> I'll wait to hear about the split (if any) between the public API and
>>>>>>>>>> the impl.
>>>>>>>>>>
>>>>>>>>>> And we'll need to change the next version # from 2.8.2 to 2.9.0, due to
>>>>>>>>>> this being that kind of a change.
>>>>>>>>>>
>>>>>>>>>> Is everyone OK with all of this?
>>>>>>>>>>
>>>>>>>>>> -Marshall
>>>>>>>>>>
>>>>>>>>>> On 7/18/2016 2:39 PM, Richard Eckart de Castilho wrote:
>>>>>>>>>>> I believe the intention is that this class becomes part of the public
>>>>>>>>>>> API.
>>>>>>>>>>>
>>>>>>>>>>> Also, my understanding is that it would do a superset of what the
>>>>>>>>>>> uimaFIT class by the same name does. We could then probably deprecate
>>>>>>>>>>> the respective uimaFIT class and suggest using the core class instead.
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>>
>>>>>>>>>>> -- Richard
>>>>>>>>>>>
>>>>>>>>>>>> On 18.07.2016, at 20:30, Marshall Schor <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> This is a new class added to the uimaj-core project, in the
>>>>>>>>>>>> org.apache.uima.util package. This is fine if it is to be part of the
>>>>>>>>>>>> official public APIs supported by UIMA going forward; but if that is
>>>>>>>>>>>> the case, it should probably be documented in the UIMA docs, and we'd
>>>>>>>>>>>> have to change the version number (due to enforcer rules).
>>>>>>>>>>>>
>>>>>>>>>>>> If this is more of an internal-use utility, then it should be in one
>>>>>>>>>>>> of the internal-use packages, such as
>>>>>>>>>>>>
>>>>>>>>>>>>    org.apache.uima.internal.util
>>>>>>>>>>>>
>>>>>>>>>>>> This class is similarly named to a uimaFIT class; are these related?
>>>>>>>>>>>>
>>>>>>>>>>>> If some of the APIs are to be permanent, public, and part of the
>>>>>>>>>>>> official public APIs, but some are internal implementation details,
>>>>>>>>>>>> please consider using an interface and an ".impl" (or equivalent)
>>>>>>>>>>>> approach; packages which support this are:
>>>>>>>>>>>>
>>>>>>>>>>>>    org.apache.uima.util  and
>>>>>>>>>>>>
>>>>>>>>>>>>    org.apache.uima.util.impl
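A minimal sketch of the interface plus ".impl" approach, squeezed into one file for brevity. `HeaderInspector`, `HeaderInspectorImpl`, and `newHeaderInspector` are hypothetical names; in a real layout the interface would sit in a package such as org.apache.uima.util and the implementation in org.apache.uima.util.impl.

```java
public class ApiImplSplit {

  // Stands in for a stable, documented interface (e.g. in org.apache.uima.util).
  public interface HeaderInspector {
    boolean isTypeSystemIncluded(int flags);
  }

  // Stands in for the implementation (e.g. in org.apache.uima.util.impl); it can
  // change between releases without touching the public contract.
  static final class HeaderInspectorImpl implements HeaderInspector {
    @Override
    public boolean isTypeSystemIncluded(int flags) {
      return (flags & 0x08) != 0;
    }
  }

  // Callers only ever see the interface type, never the impl class.
  public static HeaderInspector newHeaderInspector() {
    return new HeaderInspectorImpl();
  }

  public static void main(String[] args) {
    HeaderInspector inspector = newHeaderInspector();
    System.out.println(inspector.isTypeSystemIncluded(0x09));  // prints "true"
  }
}
```

With this split, changes confined to the implementation class leave the public interface, and therefore the end-user API surface, unchanged.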
>>>>>>>>>>>>
>>>>>>>>>>>> --------------
>>>>>>>>>>>>
>>>>>>>>>>>> If this is only an internal kind of change, not intended to affect the
>>>>>>>>>>>> official UIMA APIs, then moving it to the internal.util package will fix
>>>>>>>>>>>> the "enforcer" error the build is currently getting.
>>>>>>>>>>>>
>>>>>>>>>>>> -Marshall
>>>>>>>>>>>>
