I'll take a look now; thanks for the work!

-Marshall

On 8/2/2016 7:40 AM, Peter Klügl wrote:
> Hi,
>
>
> the errors were on my side. Reading the CASes created by the unit test
> of CasIOUtils with uima 2.8.1 works fine now.
>
>
> Can I do something else for this ticket?
>
>
> Best,
>
>
> Peter
>
>
> On 25.07.2016 at 08:43, Peter Klügl wrote:
>> Yeah, I know java serialization.
>>
>> I think it depends on the perspective and the use case. I added a header
>> to the serialized outputs since I see them as binary formats and I
>> thought that all binary formats should get the same header. Then, I
>> removed it again, then I added it again. I will remove it again now.
>>
>>
>> I don't think that we will get an optimal solution, e.g., the header is
>> read twice, the previous uimaj method should return the format and so
>> on. We should get this up and running for the release without breaking
>> backwards compatibility and then think what it should look like, and if
>> further functionality/refactoring is required.
>>
>>
>> I used uimaj-core 2.8.1. Here are some errors:
>>
>> simpleCas.bins0
>> org.apache.uima.cas.CASRuntimeException: No sofaFS for specified sofaRef
>> found.simpleCas.bins4
>>     at org.apache.uima.cas.impl.CASImpl.getSofa(CASImpl.java:806)
>>     at org.apache.uima.cas.impl.FSIndexRepositoryImpl.ll_addFS_common(FSIndexRepositoryImpl.java:2781)
>>     at org.apache.uima.cas.impl.FSIndexRepositoryImpl.ll_addFS(FSIndexRepositoryImpl.java:2763)
>>     at org.apache.uima.cas.impl.FSIndexRepositoryImpl.addFS(FSIndexRepositoryImpl.java:2068)
>>     at org.apache.uima.cas.impl.CASImpl.reinitIndexedFSs(CASImpl.java:1765)
>>     at org.apache.uima.cas.impl.CASImpl.reinit(CASImpl.java:1488)
>>     at org.apache.uima.cas.impl.CASImpl.reinit(CASImpl.java:1344)
>>     at org.apache.uima.cas.impl.Serialization.deserializeCAS(Serialization.java:171)
>>     at tutorial.entity.LoadCas.main(LoadCas.java:55)
>> org.apache.uima.cas.CASRuntimeException: Error trying to read BLOB data
>> from an input stream and deserialize into a CAS.
>>     at org.apache.uima.cas.impl.CASImpl.reinit(CASImpl.java:1591)
>>     at org.apache.uima.cas.impl.CASImpl.reinit(CASImpl.java:1344)
>>     at org.apache.uima.cas.impl.Serialization.deserializeCAS(Serialization.java:171)
>>     at tutorial.entity.LoadCas.main(LoadCas.java:39)
>>
>> simpleCas.bins6
>> java.io.EOFException
>>     at java.io.DataInputStream.readUnsignedByte(DataInputStream.java:290)
>>     at org.apache.uima.util.impl.DataIO.readVlong(DataIO.java:355)
>>     at org.apache.uima.cas.impl.BinaryCasSerDes6.readVlong(BinaryCasSerDes6.java:2193)
>>     at org.apache.uima.cas.impl.BinaryCasSerDes6.readDiff(BinaryCasSerDes6.java:2102)
>>     at org.apache.uima.cas.impl.BinaryCasSerDes6.readLongOrDouble(BinaryCasSerDes6.java:2128)
>>     at org.apache.uima.cas.impl.BinaryCasSerDes6.readByKind(BinaryCasSerDes6.java:1920)
>>     at org.apache.uima.cas.impl.BinaryCasSerDes6.deserializeAfterVersion(BinaryCasSerDes6.java:1748)
>>     at org.apache.uima.cas.impl.BinaryCasSerDes6.deserialize(BinaryCasSerDes6.java:1596)
>>     at org.apache.uima.cas.impl.Serialization.deserializeCAS(Serialization.java:270)
>>     at tutorial.entity.LoadCas.main(LoadCas.java:47)
>>
>>
>>
>> On 22.07.2016 at 21:17, Marshall Schor wrote:
>>> I think the model for these two formats is more general than what you are
>>> imagining.  These formats follow the Java object serialization
>>> specification; see, for example,
>>> https://docs.oracle.com/javase/7/docs/platform/serialization/spec/serialTOC.html
>>>
>>> The bytes corresponding to the serialized form are expected to (in general)
>>> be written anywhere in a data output stream, perhaps preceded or followed
>>> by (maybe many) other serialized objects; the overall format of that stream
>>> is up to the user designing it, including any headers the user might decide
>>> on.
>>>
>>> In the data output stream, each data object, including one representing the
>>> CAS, for example, has a format dictated by the Java standard for object
>>> serialization.
>>>
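To illustrate the point above about a serialized object sitting anywhere
in a stream, here is a minimal Java sketch. It assumes an in-memory stream
and uses plain String/Integer stand-ins rather than the actual UIMA
serializable CAS objects; the class name MixedStreamSketch and the
"demo-header" value are made up for illustration. A user-chosen header is
written first, followed by two independently serialized objects, and the
reader consumes them in the same order.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical illustration only; not part of UIMA.
final class MixedStreamSketch {

    // Writes a user-defined header followed by two serialized objects.
    static byte[] write(String header, Serializable first, Serializable second) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeUTF(header);     // the stream designer decides on any header
            out.writeObject(first);   // each object uses the standard Java format
            out.writeObject(second);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads the header and the two objects back in the order they were written.
    static Object[] read(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            String header = in.readUTF();
            Object first = in.readObject();
            Object second = in.readObject();
            return new Object[] {header, first, second};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object[] parts = read(write("demo-header", "stand-in for a CAS", Integer.valueOf(42)));
        System.out.println(parts[0] + " / " + parts[1] + " / " + parts[2]);
    }
}
```

The reader has to know the stream layout chosen by the writer; an extra
header written in front of the object is something a reader that does not
expect it will not skip on its own.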
>>> What error do you get when you try to deserialize a CAS object in a data
>>> stream with an older version of UIMA?
>>>
>>> -Marshall
>>>
>>> On 7/22/2016 9:31 AM, Peter Klügl wrote:
>>>> So SERIALIZED and SERIALIZED_TS get no header?
>>>>
>>>>
>>>> Can you try to deserialize the CAS files created by the unit test with
>>>> an older version of uima? I cannot get it to work.
>>>>
>>>>
>>>> Best,
>>>>
>>>>
>>>> Peter
>>>>
>>>>
>>>> On 22.07.2016 at 15:18, Marshall Schor wrote:
>>>>> Re: The java-serialized formats now have also a binary header
>>>>>
>>>>> Not sure what you mean by java-serialized formats.  Perhaps this means the
>>>>> formats created by using standard Java Object serialization on the special
>>>>> objects in UIMA built for this.
>>>>>
>>>>> If so, then it seems this would break backwards compatibility, in that a
>>>>> user serializing with UIMA 2.9.0, but not using any new features, could
>>>>> not have that "read" by an older version of UIMA.
>>>>>
>>>>>
>>>>> -Marshall
>>>>>
>>>>> On 7/22/2016 7:43 AM, Peter Klügl wrote:
>>>>>> Hi,
>>>>>>
>>>>>>
>>>>>> I changed CasIOUtils to use the Header and I extended the header with a
>>>>>> bit (0x08) indicating an included type system. No information about the
>>>>>> serialization of the type system yet. The java-serialized formats now
>>>>>> have also a binary header as I did not want to make the header
>>>>>> serializable as it should be read/written by the same functionality.
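A minimal sketch of the flag idea described above, assuming the header
carries its flags in a single int. Only the value 0x08 comes from this
thread; the class and constant names are hypothetical, and this is not the
actual CommonSerDes layout.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical illustration only; not the UIMA header implementation.
final class HeaderFlagSketch {

    // The bit value is taken from the thread; the name is invented here.
    static final int FLAG_TYPE_SYSTEM_INCLUDED = 0x08;

    // Returns the flags with the "type system included" bit switched on.
    static int withTypeSystem(int flags) {
        return flags | FLAG_TYPE_SYSTEM_INCLUDED;
    }

    // Tests whether the "type system included" bit is set.
    static boolean hasTypeSystem(int flags) {
        return (flags & FLAG_TYPE_SYSTEM_INCLUDED) != 0;
    }

    // Writes one int of flags followed by the payload bytes.
    static byte[] write(int flags, byte[] payload) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(flags);
            out.write(payload);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads back only the leading int of flags.
    static int readFlags(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            return in.readInt();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = write(withTypeSystem(0x01), new byte[] {1, 2, 3});
        System.out.println("type system included: " + hasTypeSystem(readFlags(data)));
    }
}
```

The existing bits are left untouched by the bitwise OR, so a reader that
only tests the bits it knows about sees the same values as before.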
>>>>>>
>>>>>> I had thought that old UIMA versions (e.g., 2.8.1) should be able to
>>>>>> load new CAS files, but my tests failed.  No idea yet why. I am overall
>>>>>> not very happy with the current solution, but I could live with it.
>>>>>>
>>>>>> Maybe someone wants to take a look at it?
>>>>>>
>>>>>>
>>>>>> Best,
>>>>>>
>>>>>> Peter
>>>>>>
>>>>>> On 20.07.2016 at 14:30, Peter Klügl wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>>
>>>>>>> I'll try to find the time to do these changes this week, next week
>>>>>>> at the latest.
>>>>>>>
>>>>>>>
>>>>>>> btw, input stream sniffing in order to distinguish XMI and XCAS is
>>>>>>> currently not supported. There could be a lot of text before the
>>>>>>> relevant element occurs, e.g., license text.
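A rough sketch of what bounded sniffing could look like, and why text
before the root element defeats it. It assumes a UTF-8 (or
ASCII-compatible) stream, an XMI root element starting with "<xmi:XMI" and
an XCAS root element starting with "<CAS"; the class, the enum, and the
lookahead parameter are hypothetical and not part of UIMA.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not part of UIMA.
final class XmlFormatSniffSketch {

    enum Guess { XMI, XCAS, UNKNOWN }

    // Peeks at most 'lookahead' bytes, then rewinds the stream for the real parser.
    static Guess sniff(InputStream in, int lookahead) {
        if (!in.markSupported() || lookahead <= 0) {
            throw new IllegalArgumentException("need a mark/reset stream and a positive lookahead");
        }
        try {
            in.mark(lookahead);
            byte[] buf = new byte[lookahead];
            int n = 0;
            while (n < lookahead) {
                int r = in.read(buf, n, lookahead - n);
                if (r < 0) {
                    break; // end of stream before the lookahead was filled
                }
                n += r;
            }
            in.reset();
            String head = new String(buf, 0, n, StandardCharsets.UTF_8);
            int xmi = head.indexOf("<xmi:XMI");
            int xcas = head.indexOf("<CAS");
            if (xmi < 0 && xcas < 0) {
                return Guess.UNKNOWN; // root element lies beyond the lookahead window
            }
            if (xcas < 0 || (xmi >= 0 && xmi < xcas)) {
                return Guess.XMI;
            }
            return Guess.XCAS;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Convenience overload for trying the sketch on an in-memory document.
    static Guess sniff(String document, int lookahead) {
        byte[] bytes = document.getBytes(StandardCharsets.UTF_8);
        return sniff(new BufferedInputStream(new ByteArrayInputStream(bytes)), lookahead);
    }

    public static void main(String[] args) {
        System.out.println(sniff("<?xml version=\"1.0\"?><CAS version=\"2\">", 128));
    }
}
```

With a long license comment in front of the root element, a small
lookahead window ends before the element and the result is UNKNOWN; a
comment that happens to contain one of the marker strings would also
mislead a plain substring check like this one.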
>>>>>>>
>>>>>>>
>>>>>>> Best,
>>>>>>>
>>>>>>>
>>>>>>> Peter
>>>>>>>
>>>>>>>
>>>>>>> On 20.07.2016 at 14:19, Marshall Schor wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> We can change the header, but:
>>>>>>>>
>>>>>>>> The changed header ought to be "readable" by previous versions of
>>>>>>>> UIMA.
>>>>>>>>
>>>>>>>> For XMI and XCAS, these do not currently have special headers, and if
>>>>>>>> we added these, those formats could not be read by older versions of
>>>>>>>> UIMA.  Those formats contain sufficient distinguishing initial strings
>>>>>>>> to distinguish them, though.
>>>>>>>>
>>>>>>>> The XMI format is specified, also, in an OASIS standard which the UIMA
>>>>>>>> project is said to (mostly) follow:
>>>>>>>> http://uima.apache.org/uima-specification.html
>>>>>>>>
>>>>>>>> For binary serializations, I think there's room in the header for an
>>>>>>>> extra bit, which if on, could indicate that a type system was included.
>>>>>>>> I think it would be good to have a header extension, when type systems
>>>>>>>> are included, to specify the format and version of the type system
>>>>>>>> serialization.
>>>>>>>>
>>>>>>>> Most serializations in core UIMA have not included the type system.
>>>>>>>> The one which does is CASCompleteSerializer.  This is a "serializable"
>>>>>>>> (using standard Java serializations) object containing serializable
>>>>>>>> forms of the CAS and Type System.
>>>>>>>>
>>>>>>>> Regarding making methods in CommonSerDes public:
>>>>>>>>
>>>>>>>> It is fine to make them public in the sense that they are accessible
>>>>>>>> from other packages, not in a sub-type hierarchy.  But I think it is
>>>>>>>> best to not include CommonSerDes in a package which is intended for
>>>>>>>> end-users, because the end user UIMA APIs should be (as much as
>>>>>>>> possible) stable over a long time period.  Details of how we evolve
>>>>>>>> headers, etc., should not disturb end users, if possible; keeping these
>>>>>>>> as public but in packages with names like xxx.impl or xyz.internal.abc
>>>>>>>> etc. is the way this has been traditionally done.  It allows us to
>>>>>>>> evolve these without affecting end-user APIs.
>>>>>>>>
>>>>>>>> Just to be clear: I would not consider uimaFIT and Ruta to be
>>>>>>>> "end-users", as they are developed within the UIMA project, and we are
>>>>>>>> willing to evolve them together with UIMA core changes.
>>>>>>>>
>>>>>>>> We don't have a deadline for the next release, but it's mostly ready to
>>>>>>>> go, and will solve a significant issue for people wanting to upgrade
>>>>>>>> their Eclipse to Neon :-).
>>>>>>>>
>>>>>>>> -Marshall
>>>>>>>>
>>>>>>>> On 7/20/2016 5:03 AM, Peter Klügl wrote:
>>>>>>>>> Ok, after looking at the code I must admit that there is much more to
>>>>>>>>> do than I expected. We first need to discuss several things:
>>>>>>>>>
>>>>>>>>> - can we change the header at all?
>>>>>>>>>
>>>>>>>>> - do we support type system inclusion in the header?
>>>>>>>>>
>>>>>>>>> - do we support type system inclusion in the serialized files?
>>>>>>>>>
>>>>>>>>> - which serial format are which ones?
>>>>>>>>>
>>>>>>>>> - can we make the methods in CommonSerDes public?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> What is the deadline for the release? I am now quite loaded with work
>>>>>>>>> until next Wednesday :-(
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Best,
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Peter
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 19.07.2016 at 22:39, Marshall Schor wrote:
>>>>>>>>>> Great.
>>>>>>>>>>
>>>>>>>>>> There's now also common code for writing / reading UIMA
>>>>>>>>>> serialization headers, in
>>>>>>>>>>
>>>>>>>>>> CommonSerDes (in org.apache.uima.cas.impl )
>>>>>>>>>>
>>>>>>>>>> This includes the extensions to support versioning the
>>>>>>>>>> serializations, which start to be needed in the next release because
>>>>>>>>>> a bug fix is slightly changing the serialized form for
>>>>>>>>>> **delta binary** CAS.
>>>>>>>>>>
>>>>>>>>>> So, it would be good to use that rather than have another separate
>>>>>>>>>> header reader/writer to maintain.
>>>>>>>>>>
>>>>>>>>>> -Marshall
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 7/19/2016 4:13 PM, Peter Klügl wrote:
>>>>>>>>>>> Ah, I didn't know that enum. I'll adapt the code and enum.
>>>>>>>>>>>
>>>>>>>>>>> On 19.07.2016 at 20:09, Marshall Schor wrote:
>>>>>>>>>>>> We already have an enum in the core for various serial formats.
>>>>>>>>>>>> The class is
>>>>>>>>>>>>
>>>>>>>>>>>> public enum SerialFormat {
>>>>>>>>>>>>    UNKNOWN,
>>>>>>>>>>>>    XCAS,         // with reachability filtering
>>>>>>>>>>>>    XMI,          // with reachability filtering
>>>>>>>>>>>>    BINARY,       // no filtering
>>>>>>>>>>>>    COMPRESSED,   // no filtering  (form 4)
>>>>>>>>>>>>    COMPRESSED_FILTERED,   // with reachability and type and feature filtering (form 6)
>>>>>>>>>>>>    COMPRESSED_PROJECTION, // with subset of views
>>>>>>>>>>>> }
>>>>>>>>>>>>
>>>>>>>>>>>> (I don't think COMPRESSED_PROJECTION is in use...)
>>>>>>>>>>>>
>>>>>>>>>>>> This has been around for maybe 3 years.  I would be in favor of
>>>>>>>>>>>> considering using and/or extending this as needed, rather than
>>>>>>>>>>>> having two formats (that is, the proposed SerializationFormat
>>>>>>>>>>>> class).
>>>>>>>>>>>>
>>>>>>>>>>>> -Marshall
>>>>>>>>>>>>
>>>>>>>>>>>> On 7/19/2016 2:49 AM, Peter Klügl wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> yes, the class should be officially available to external code. I
>>>>>>>>>>>>> already included it in the CAS Editor and in Ruta. I also plan to
>>>>>>>>>>>>> use it in our in-house code. I'll change the enforcer rule.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I can write the docs but any help is welcome since I do not know
>>>>>>>>>>>>> how much spare time I have for the rest of the week for this. I'll
>>>>>>>>>>>>> take a look where the documentation should be added. Haven't looked
>>>>>>>>>>>>> at it for some time ;-)
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I just chose the name of the class Richard contributed since I
>>>>>>>>>>>>> thought it is really suitable. Then, I also noticed the uimaFIT
>>>>>>>>>>>>> class. This is not really a good situation, but I would not change
>>>>>>>>>>>>> the name because of it.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I would not split the API from the implementation. I do not see any
>>>>>>>>>>>>> advantages right now. The class is just a simple utils class with
>>>>>>>>>>>>> only static methods like CasCreationUtils (which is also not
>>>>>>>>>>>>> separated).
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 18.07.2016 at 22:26, Marshall Schor wrote:
>>>>>>>>>>>>>> This is OK with me.  I can even volunteer to write the docs (but
>>>>>>>>>>>>>> am happy to let others do it :-) ).
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I'll wait to hear about the split (if any) between the public API
>>>>>>>>>>>>>> and the impl.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> And, we'll need to change the next version # to 2.9.0, from 2.8.2,
>>>>>>>>>>>>>> due to this being that kind of a change.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Is everyone OK with all of this?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> -Marshall
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 7/18/2016 2:39 PM, Richard Eckart de Castilho wrote:
>>>>>>>>>>>>>>> I believe the intention is that this class becomes part of the
>>>>>>>>>>>>>>> public API.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also, my understanding is that it would do a superset of what the
>>>>>>>>>>>>>>> uimaFIT class by the same name does. We could then probably
>>>>>>>>>>>>>>> deprecate the respective uimaFIT class and suggest using the core
>>>>>>>>>>>>>>> class instead.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Best,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> -- Richard
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On 18.07.2016, at 20:30, Marshall Schor <[email protected]> wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This is a new class added to the uimaj-core project, in the
>>>>>>>>>>>>>>>> org.apache.uima.util package.  This is fine if this is to be part
>>>>>>>>>>>>>>>> of the official public APIs supported by UIMA going forward; but
>>>>>>>>>>>>>>>> if that is the case, it should probably be documented in the UIMA
>>>>>>>>>>>>>>>> docs, and we'd have to change the version number (due to enforcer
>>>>>>>>>>>>>>>> rules).
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If this is more of an internal-use utility, then it should be in
>>>>>>>>>>>>>>>> one of the internal-use packages, such as
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    org.apache.uima.internal.util
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> This class is similarly named to a UIMAFit class; are these
>>>>>>>>>>>>>>>> related?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If some of the APIs are to be permanent and public and part of the
>>>>>>>>>>>>>>>> official public APIs, but some are internal implementation
>>>>>>>>>>>>>>>> details, please consider using an interface and an ".impl" (or
>>>>>>>>>>>>>>>> equivalent) approach; packages which support these are:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    org.apache.uima.util  and
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    org.apache.uima.util.impl
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> --------------
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> If this is only an internal kind of change, not intending to
>>>>>>>>>>>>>>>> affect the official UIMA APIs, then moving to the internal.util
>>>>>>>>>>>>>>>> package will fix the "enforcer" error the build is currently
>>>>>>>>>>>>>>>> getting.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> -Marshall
>>>>>>>>>>>>>>>>
>
