On 10.08.2016, at 22:54, Richard Eckart de Castilho <r...@apache.org> wrote:
>
> On 10.08.2016, at 22:37, Richard Eckart de Castilho <r...@apache.org> wrote:
>>
>> On 05.08.2016, at 14:18, Richard Eckart de Castilho <r...@apache.org> wrote:
>>>
>>> Ok, then I think there is agreement that we keep COMPRESSED_FILTERED
>>> (form 6)
>>> and COMPRESSED_FILTERED_TSI (form 6 + TS).
>>
>> Hm, I don't see a COMPRESSED_FILTERED_TSI in the SerialFormat.
>>
>>> But for the time being we only support lenient loading (filter on load) -
>>> i.e. the TS in the serialized form corresponds to the original TS from the
>>> CAS.
>>
>> It looks like the following features are currently not supported by
>> CasIOUtils:
>>
>> - storing TS along with form 6 in a single file: I see no code path where a
>>   TS is stored in a COMPRESSED_FILTERED binary file and none where it is
>>   loaded from a COMPRESSED_FILTERED binary file
>>
>> - lenient loading of COMPRESSED_FILTERED: I do not even see a path where a
>>   separately specified TS input stream is used when reading a
>>   COMPRESSED_FILTERED file. The TS only seems to be used when reading a
>>   SERIALIZED file.
>>
>> Am I missing something? If we still miss the two features listed above, then
>> we kind of lost them in translation. I am pretty sure they were there in the
>> initial code that I provided. Why did we lose them?
>
> Looks like they were removed in #1755237:
>
>   [UIMA-4685] refactoring, moving some common stuff into more core UIMA,
>   augmenting JavaDocs, removing compressed form 6 with type system and
>   definitions, correcting deserialization process when installing type system
>   (and index defs) - by using the core code paths for this. Fixed the test
>   cases to account for some renaming and removal of compressed form 6 with type
>   sys. Made error message using standard UIMA msg things.
>
> Can we undo the removal of COMPRESSED_FILTERED_TS from that commit please?
> It is basically *the* core functionality from my perspective since it enables
> lenient loading of binary CASes.

We are also lacking a method with a signature like this:

  load(InputStream casInputStream, CASMgrSerializer tsi, CAS aCAS, boolean leniently)

Such a method is important when bulk-reading multiple CASes in order to avoid
having to read the TSI over and over again.

-- Richard
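
To illustrate the bulk-reading case mentioned above, here is a minimal sketch of the pattern: parse the TSI once, then pass the parsed object to every load call. Everything in it is an assumption made for illustration. `ParsedTsi`, `FakeCas`, `readTsi`, `load` and `loadAll` are hypothetical stand-ins, not UIMA classes or CasIOUtils methods; only the parameter shape of `load` follows the signature proposed above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch only: every type and method here is a hypothetical stand-in, not UIMA API.
class BulkLoadSketch {

    // Stand-in for the parsed type system info (the role CASMgrSerializer plays in the proposal).
    static final class ParsedTsi {
        final String name;
        ParsedTsi(String name) { this.name = name; }
    }

    // Stand-in for a CAS that receives the deserialized content.
    static final class FakeCas {
        String content;
    }

    // Counts how often the TSI stream is parsed, to make the saving visible.
    static int tsiParseCount = 0;

    // Reads a whole stream as UTF-8 text.
    static String readAll(InputStream in) {
        try {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Parses the TSI stream; in real code this would be the expensive step.
    static ParsedTsi readTsi(InputStream tsiStream) {
        tsiParseCount++;
        return new ParsedTsi(readAll(tsiStream));
    }

    // Same parameter shape as the proposed load(...): the TSI arrives already parsed.
    static void load(InputStream casInputStream, ParsedTsi tsi, FakeCas aCas, boolean leniently) {
        aCas.content = readAll(casInputStream) + " (ts=" + tsi.name + ", lenient=" + leniently + ")";
    }

    // Bulk read: parse the TSI a single time, then reuse it for every CAS stream.
    static List<FakeCas> loadAll(InputStream tsiStream, List<InputStream> casStreams) {
        ParsedTsi tsi = readTsi(tsiStream);
        List<FakeCas> result = new ArrayList<>();
        for (InputStream casStream : casStreams) {
            FakeCas cas = new FakeCas();
            load(casStream, tsi, cas, true);
            result.add(cas);
        }
        return result;
    }

    // Helper that wraps a string in an in-memory stream.
    static InputStream stream(String s) {
        return new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        List<InputStream> streams = List.of(stream("cas-1"), stream("cas-2"), stream("cas-3"));
        List<FakeCas> loaded = loadAll(stream("my-ts"), streams);
        System.out.println(loaded.size() + " CASes loaded, TSI parsed " + tsiParseCount + " time(s)");
    }
}
```

With a stream-based TSI parameter, each of the load calls would have to re-read and re-parse the type system; taking the parsed object moves that cost out of the loop.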