+1 to the idea that users would like to be able to conveniently include the type
system in (some) serialized formats.
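To illustrate the versioned-header idea discussed below, here is a minimal Java sketch of a header that can optionally carry an embedded type system. The magic number, the version numbers, the flag byte, and the class name VersionedHeader are all hypothetical. They do not describe the actual UIMA binary header layout, and the type system is treated as an opaque byte blob.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

/**
 * Hypothetical header layout, NOT the real UIMA binary header.
 * Version 1: magic + version only (type system supplied separately).
 * Version 2: magic + version + flag; if the flag is set, a
 * length-prefixed type system blob follows.
 */
final class VersionedHeader {
    static final int MAGIC = 0x55494D41;  // hypothetical magic ("UIMA" in ASCII)
    static final int VERSION_PLAIN = 1;   // hypothetical: no embedded type system
    static final int VERSION_WITH_TS = 2; // hypothetical: type system may be embedded

    /** Writes a version-2 header; a null typeSystem means "not embedded". */
    static byte[] write(byte[] typeSystem) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        out.writeInt(MAGIC);
        out.writeInt(VERSION_WITH_TS);
        out.writeBoolean(typeSystem != null);
        if (typeSystem != null) {
            out.writeInt(typeSystem.length);
            out.write(typeSystem);
        }
        out.flush();
        return bytes.toByteArray();
    }

    /** Returns the embedded type system bytes, or null if none is present. */
    static byte[] readTypeSystem(byte[] data) throws IOException {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        if (in.readInt() != MAGIC) {
            throw new IOException("not a recognized file");
        }
        int version = in.readInt();
        if (version == VERSION_PLAIN) {
            return null; // old files: the caller must provide the type system
        }
        if (version != VERSION_WITH_TS) {
            throw new IOException("unsupported version " + version);
        }
        if (!in.readBoolean()) {
            return null; // new format, but no type system embedded
        }
        byte[] ts = new byte[in.readInt()];
        in.readFully(ts);
        return ts;
    }
}
```

With this kind of layout, version-1 files still load: the reader simply reports that no type system is embedded, so the caller has to supply one separately, as today.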

-Marshall


On 7/20/2016 5:12 AM, Richard Eckart de Castilho wrote:
> On 20.07.2016, at 11:03, Peter Klügl <[email protected]> wrote:
>> Ok, after looking at the code I must admit that there is much more to do
>> than I expected. We first need to discuss several things:
>>
>> - can we change the header at all?
> AFAIK Marshall added a version field to the header, so it should be possible
> to define a new version of the file format with an extended header.
>
>> - do we support type system inclusion in the header?
> Not sure what you mean by "in the header" vs "in the serialized files".
>
>> - do we support type system inclusion in the serialized files?
> By "serialized", do you mean the "Java serialized files", or any of the
> binary files?
>
> I'm strongly in favor of allowing type system information to be embedded in
> the binary/serialized files. Keeping the type system separate is very useful
> to save space, but highly inconvenient when e.g. sending annotated documents
> around. There are regular posts on the mailing list where people try to
> recover type system information from XMI files because they lost the
> original type system description.
>
>> - which serial formats are which?
> Not sure what your question is, since you already added the new constants,
> and the new ones make sense to me.
>
> I believe the format IDs used in DKPro Core map as follows:
>
> "S"  -> SERIALIZED
> "S+" -> SERIALIZED_TS
> "0"  -> BINARY,                // no filtering
> "4"  -> COMPRESSED,            // no filtering (form 4)
> "6"  -> COMPRESSED_FILTERED,   // with reachability and type and feature filtering (form 6)
> "6+" -> COMPRESSED_FILTERED_TS // ~probably similar, not the same
> n/a  -> COMPRESSED_PROJECTION, // with subset of views
>
> Cheers,
>
> -- Richard
>
>
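The DKPro Core ID mapping quoted above can be sketched as a simple lookup table. This is a minimal Java 9+ sketch under stated assumptions: the enum ProposedSerialFormat only mirrors the constant names from this thread (the final names in UIMA's SerialFormat may differ), and DkproFormatIds is a hypothetical helper, not part of DKPro Core.

```java
import java.util.Map;

/** Hypothetical enum mirroring the constant names discussed in this thread. */
enum ProposedSerialFormat {
    SERIALIZED, SERIALIZED_TS, BINARY, COMPRESSED,
    COMPRESSED_FILTERED, COMPRESSED_FILTERED_TS, COMPRESSED_PROJECTION
}

/** Hypothetical helper: DKPro Core format ID -> proposed serial format. */
final class DkproFormatIds {
    // COMPRESSED_PROJECTION has no DKPro Core ID, so it has no entry here.
    // "6+" is only approximately COMPRESSED_FILTERED_TS ("probably similar, not the same").
    static final Map<String, ProposedSerialFormat> BY_ID = Map.of(
        "S", ProposedSerialFormat.SERIALIZED,
        "S+", ProposedSerialFormat.SERIALIZED_TS,
        "0", ProposedSerialFormat.BINARY,
        "4", ProposedSerialFormat.COMPRESSED,
        "6", ProposedSerialFormat.COMPRESSED_FILTERED,
        "6+", ProposedSerialFormat.COMPRESSED_FILTERED_TS);

    /** Looks up a DKPro Core ID, failing loudly for unknown IDs. */
    static ProposedSerialFormat forId(String id) {
        ProposedSerialFormat format = BY_ID.get(id);
        if (format == null) {
            throw new IllegalArgumentException("unknown format ID: " + id);
        }
        return format;
    }
}
```

Failing on unknown IDs, rather than returning null, keeps a typo in a configuration from silently selecting no format.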
