On 20.07.2016, at 11:03, Peter Klügl <[email protected]> wrote:
> 
> Ok, after looking at the code I must admit that there is much more to do
> than I expected. We first need to discuss several things:
> 
> - can we change the header at all?

Afaik Marshall added a version field to the header, so it should be possible to
define a new version of the file format with an extended header.

> - do we support type system inclusion in the header?

Not sure what you mean by "in the header" vs "in the serialized files".

> - do we support type system inclusion in the serialized files?

With "serialized", do you mean the "Java serialized files" - or any of the 
binary files?

I'm strongly in favor of allowing type system information to be embedded in the
binary/serialized files. Keeping the type system separate is very useful to save
space, but highly inconvenient when e.g. sending annotated documents around.
There are regularly posts on the mailing list where people try to recover type
system information from XMI files because they lost the original type system
description.

> - which serial format are which ones?

Not sure what your question is since you already added the new constants - and 
the new ones make sense to me.

I believe the format IDs used in DKPro Core map as follows:

"S"  -> SERIALIZED
"S+" -> SERIALIZED_TS
"0"  -> BINARY,                // no filtering
"4"  -> COMPRESSED,            // no filtering  (form 4)
"6"  -> COMPRESSED_FILTERED,   // with reachability and type and feature filtering (form 6)
"6+" -> COMPRESSED_FILTERED_TS // ~probably similar, not the same
n/a  -> COMPRESSED_PROJECTION, // with subset of views
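
As a rough sketch of the mapping above: the enum and class names below
(SketchFormat, FormatIds) are hypothetical stand-ins that only mirror the
constant names in this mail. They are not the actual UIMA or DKPro Core code,
and the ID strings are just my reading of what DKPro Core uses.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the format constants discussed in this thread.
enum SketchFormat {
    SERIALIZED,
    SERIALIZED_TS,
    BINARY,
    COMPRESSED,
    COMPRESSED_FILTERED,
    COMPRESSED_FILTERED_TS,
    COMPRESSED_PROJECTION // no DKPro Core format ID maps to this one
}

// Hypothetical lookup from a DKPro Core style format ID to a constant.
final class FormatIds {
    private static final Map<String, SketchFormat> BY_ID = new LinkedHashMap<>();

    static {
        BY_ID.put("S", SketchFormat.SERIALIZED);
        BY_ID.put("S+", SketchFormat.SERIALIZED_TS);
        BY_ID.put("0", SketchFormat.BINARY);
        BY_ID.put("4", SketchFormat.COMPRESSED);
        BY_ID.put("6", SketchFormat.COMPRESSED_FILTERED);
        BY_ID.put("6+", SketchFormat.COMPRESSED_FILTERED_TS);
    }

    private FormatIds() {
        // static helper only
    }

    // Returns the constant for a format ID, or fails loudly on an unknown ID.
    static SketchFormat lookup(String id) {
        SketchFormat format = BY_ID.get(id);
        if (format == null) {
            throw new IllegalArgumentException("Unknown format ID: " + id);
        }
        return format;
    }
}
```

With this, FormatIds.lookup("6+") gives COMPRESSED_FILTERED_TS, and an ID
that is not in the table is rejected instead of silently falling back to a
default.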

Cheers,

-- Richard
