So SERIALIZED and SERIALIZED_TS get no header?
Can you try to deserialize the CAS files created by the unit test with an older version of UIMA? I cannot get it to work.

Best,

Peter

On 22.07.2016 at 15:18, Marshall Schor wrote:
> Re: The java-serialized formats now have also a binary header
>
> Not sure what you mean by java-serialized formats. Perhaps this means the
> formats created by using standard Java Object serialization on the special
> objects in UIMA built for this.
>
> If so, then it seems this would break backwards compatibility, in that a user
> serializing with UIMA 2.9.0, but not using any new features, could not have
> that "read" by an older version of UIMA.
>
> -Marshall
>
> On 7/22/2016 7:43 AM, Peter Klügl wrote:
>> Hi,
>>
>> I changed CasIOUtils to use the Header and I extended the header with a
>> bit (0x08) indicating an included type system. No information about the
>> serialization of the type system yet. The java-serialized formats now
>> also have a binary header, as I did not want to make the header
>> serializable; it should be read/written by the same functionality.
>>
>> I had thought that old UIMA versions (e.g., 2.8.1) should be able to
>> load new CAS files, but my tests failed. No idea yet why. I am overall
>> not very happy with the current solution, but I could live with it.
>>
>> Maybe someone wants to take a look at it?
>>
>> Best,
>>
>> Peter
>>
>> On 20.07.2016 at 14:30, Peter Klügl wrote:
>>> Hi,
>>>
>>> I'll try to find the time to do these changes this week, next week at
>>> the latest.
>>>
>>> btw, input stream sniffing in order to distinguish XMI and XCAS is
>>> currently not supported. There could be a lot of text before the
>>> relevant element occurs, e.g., license text.
>>>
>>> Best,
>>>
>>> Peter
>>>
>>> On 20.07.2016 at 14:19, Marshall Schor wrote:
>>>> Hi,
>>>>
>>>> We can change the header, but:
>>>>
>>>> The changed header ought to be "readable" by previous versions of UIMA.
>>>>
>>>> For XMI and XCAS, these do not currently have special headers, and if we
>>>> added these, those formats could not be read by older versions of UIMA.
>>>> Those formats contain sufficient distinguishing initial strings to
>>>> distinguish them, though.
>>>>
>>>> The XMI format is specified, also, in an OASIS standard which the UIMA
>>>> project is said to (mostly) follow:
>>>> http://uima.apache.org/uima-specification.html
>>>>
>>>> For binary serializations, I think there's room in the header for an extra
>>>> bit, which if on, could indicate that a type system was included. I think
>>>> it would be good to have a header extension, when type systems are
>>>> included, to specify the format and version of the type system
>>>> serialization.
>>>>
>>>> Most serializations in core UIMA have not included the type system. The
>>>> one which does is CASCompleteSerializer. This is a "serializable" (using
>>>> standard Java serializations) object containing serializable forms of the
>>>> CAS and Type System.
>>>>
>>>> Regarding making methods in CommonSerDes public:
>>>>
>>>> It is fine to make them public in the sense that they are accessible from
>>>> other packages, not in a sub-type hierarchy. But I think it is best to not
>>>> include CommonSerDes in a package which is intended for end-users, because
>>>> the end user UIMA APIs should be (as much as possible) stable over a long
>>>> time period. Details of how we evolve headers, etc., should not disturb
>>>> end users, if possible; keeping these as public but in packages with names
>>>> like xxx.impl or xyz.internal.abc etc. is the way this has been
>>>> traditionally done. It allows us to evolve these without affecting
>>>> end-user APIs.
>>>>
>>>> Just to be clear: I would not consider uimaFIT and Ruta to be "end-users",
>>>> as they are developed within the UIMA project, and we are willing to
>>>> evolve them together with UIMA core changes.
>>>>
>>>> We don't have a deadline for the next release, but it's mostly ready to
>>>> go, and will solve a significant issue for people wanting to upgrade their
>>>> Eclipse to Neon :-).
>>>>
>>>> -Marshall
>>>>
>>>> On 7/20/2016 5:03 AM, Peter Klügl wrote:
>>>>> Ok, after looking at the code I must admit that there is much more to do
>>>>> than I expected. We first need to discuss several things:
>>>>>
>>>>> - can we change the header at all?
>>>>>
>>>>> - do we support type system inclusion in the header?
>>>>>
>>>>> - do we support type system inclusion in the serialized files?
>>>>>
>>>>> - which serial formats are which?
>>>>>
>>>>> - can we make the methods in CommonSerDes public?
>>>>>
>>>>> What is the deadline for the release? I am now quite loaded with work
>>>>> until next Wednesday :-(
>>>>>
>>>>> Best,
>>>>>
>>>>> Peter
>>>>>
>>>>> On 19.07.2016 at 22:39, Marshall Schor wrote:
>>>>>> Great.
>>>>>>
>>>>>> There's now also common code for writing / reading UIMA serialization
>>>>>> headers, in
>>>>>>
>>>>>> CommonSerDes (in org.apache.uima.cas.impl )
>>>>>>
>>>>>> This includes the extensions to support versioning the serializations,
>>>>>> which start to be needed in the next release because a bug fix is
>>>>>> slightly changing the serialized form for **delta binary** CAS.
>>>>>>
>>>>>> So, it would be good to use that rather than have another separate header
>>>>>> reader/writer to maintain.
>>>>>>
>>>>>> -Marshall
>>>>>>
>>>>>> On 7/19/2016 4:13 PM, Peter Klügl wrote:
>>>>>>> Ah, I didn't know that enum. I'll adapt the code and enum.
>>>>>>>
>>>>>>> On 19.07.2016 at 20:09, Marshall Schor wrote:
>>>>>>>> We already have an enum in the core for various serial formats.
>>>>>>>> The class is
>>>>>>>>
>>>>>>>> public enum SerialFormat {
>>>>>>>>   UNKNOWN,
>>>>>>>>   XCAS,                  // with reachability filtering
>>>>>>>>   XMI,                   // with reachability filtering
>>>>>>>>   BINARY,                // no filtering
>>>>>>>>   COMPRESSED,            // no filtering (form 4)
>>>>>>>>   COMPRESSED_FILTERED,   // with reachability and type and feature filtering (form 6)
>>>>>>>>   COMPRESSED_PROJECTION, // with subset of views
>>>>>>>> }
>>>>>>>>
>>>>>>>> (I don't think COMPRESSED_PROJECTION is in use...)
>>>>>>>>
>>>>>>>> This has been around for maybe 3 years. I would be in favor of
>>>>>>>> considering using and/or extending this as needed, rather than having
>>>>>>>> two formats (that is, the proposed SerializationFormat class).
>>>>>>>>
>>>>>>>> -Marshall
>>>>>>>>
>>>>>>>> On 7/19/2016 2:49 AM, Peter Klügl wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> yes, the class should be officially available to external code. I
>>>>>>>>> already included it in the CAS Editor and in Ruta. I also plan to use
>>>>>>>>> it in our inhouse code. I'll change the enforcer rule.
>>>>>>>>>
>>>>>>>>> I can write the docs but any help is welcome since I do not know how
>>>>>>>>> much spare time I have for the rest of the week for this. I'll take a
>>>>>>>>> look where the documentation should be added. Haven't looked at it for
>>>>>>>>> some time ;-)
>>>>>>>>>
>>>>>>>>> I just chose the name of the class Richard contributed since I thought
>>>>>>>>> it is really suitable. Then, I also noticed the uimaFIT class. This
>>>>>>>>> is not a really good situation, but I would not change the name
>>>>>>>>> because of it.
>>>>>>>>>
>>>>>>>>> I would not split the API from the implementation. I do not see any
>>>>>>>>> advantages right now. The class is just a simple utils class with only
>>>>>>>>> static methods like CasCreationUtils (which is also not separated).
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>>
>>>>>>>>> Peter
>>>>>>>>>
>>>>>>>>> On 18.07.2016 at 22:26, Marshall Schor wrote:
>>>>>>>>>> This is OK with me. I can even volunteer to write the docs (but am
>>>>>>>>>> happy to have others do it :-) ).
>>>>>>>>>>
>>>>>>>>>> I'll wait to hear about the split (if any) between the public API
>>>>>>>>>> and the impl.
>>>>>>>>>>
>>>>>>>>>> And, we'll need to change the next version # to 2.9.0, from 2.8.2,
>>>>>>>>>> due to this being that kind of a change.
>>>>>>>>>>
>>>>>>>>>> Is everyone OK with all of this?
>>>>>>>>>>
>>>>>>>>>> -Marshall
>>>>>>>>>>
>>>>>>>>>> On 7/18/2016 2:39 PM, Richard Eckart de Castilho wrote:
>>>>>>>>>>> I believe the intention is that this class becomes part of the
>>>>>>>>>>> public API.
>>>>>>>>>>>
>>>>>>>>>>> Also, my understanding is that it would do a superset of what the
>>>>>>>>>>> uimaFIT class by the same name does. We could then probably
>>>>>>>>>>> deprecate the respective uimaFIT class and suggest using the core
>>>>>>>>>>> class instead.
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>>
>>>>>>>>>>> -- Richard
>>>>>>>>>>>
>>>>>>>>>>>> On 18.07.2016, at 20:30, Marshall Schor <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>> This is a new class added to uimaj-core project, in
>>>>>>>>>>>> org.apache.uima.util package. This is fine if this is to be part
>>>>>>>>>>>> of the official public APIs supported by UIMA going forward; but
>>>>>>>>>>>> if that is the case, it should probably be documented in the UIMA
>>>>>>>>>>>> docs, and we'd have to change the version number (due to enforcer
>>>>>>>>>>>> rules).
>>>>>>>>>>>>
>>>>>>>>>>>> If this is more of an internal-use utility, then it should be in
>>>>>>>>>>>> one of the internal use packages, such as
>>>>>>>>>>>>
>>>>>>>>>>>> org.apache.uima.internal.util
>>>>>>>>>>>>
>>>>>>>>>>>> This class is similarly named to a uimaFIT class; are these
>>>>>>>>>>>> related?
>>>>>>>>>>>>
>>>>>>>>>>>> If some of the APIs are to be permanent and public and part of the
>>>>>>>>>>>> official public APIs, but some are internal implementation
>>>>>>>>>>>> details, please consider using an interface and an ".impl" (or
>>>>>>>>>>>> equivalent) approach; packages which support these are:
>>>>>>>>>>>>
>>>>>>>>>>>> org.apache.uima.util and
>>>>>>>>>>>>
>>>>>>>>>>>> org.apache.uima.util.impl
>>>>>>>>>>>>
>>>>>>>>>>>> --------------
>>>>>>>>>>>>
>>>>>>>>>>>> If this is only an internal kind of change, not intending to
>>>>>>>>>>>> affect the official UIMA APIs, then moving to the internal.util
>>>>>>>>>>>> package will fix the "enforcer" error the build is currently
>>>>>>>>>>>> getting.
>>>>>>>>>>>>
>>>>>>>>>>>> -Marshall
>>>>>>>>>>>>
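
For anyone following the header discussion above, here is a rough sketch of the "extra bit in the header" idea. It is only an illustration under assumptions: the class name, the method names, the COMPRESSED flag, and the single 4-byte big-endian flags word are all hypothetical and are not the actual CommonSerDes layout. Only the 0x08 value for "type system included" comes from this thread.

```java
import java.nio.ByteBuffer;

// Hypothetical sketch, NOT the real CommonSerDes header format.
class HeaderFlagSketch {
    // Bit mentioned in the thread: set when a type system is included.
    static final int TYPE_SYSTEM_INCLUDED = 0x08;
    // Made-up second flag, present only to show that bits combine independently.
    static final int COMPRESSED = 0x04;

    // Packs the flags into one 4-byte big-endian word (assumed layout).
    static byte[] writeHeader(boolean compressed, boolean typeSystemIncluded) {
        int flags = 0;
        if (compressed) {
            flags |= COMPRESSED;
        }
        if (typeSystemIncluded) {
            flags |= TYPE_SYSTEM_INCLUDED;
        }
        return ByteBuffer.allocate(4).putInt(flags).array();
    }

    // A reader tests only the bit it knows about and ignores the others.
    static boolean hasTypeSystem(byte[] header) {
        int flags = ByteBuffer.wrap(header).getInt();
        return (flags & TYPE_SYSTEM_INCLUDED) != 0;
    }

    public static void main(String[] args) {
        byte[] header = writeHeader(true, true);
        System.out.println("type system included: " + hasTypeSystem(header));
    }
}
```

A reader that masks out only the bits it understands can, in principle, tolerate a newer header that sets an additional bit. Whether older UIMA readers actually behave that way is exactly the open question in this thread.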
