Hi,

On 20.07.2016 at 11:12, Richard Eckart de Castilho wrote:
> On 20.07.2016, at 11:03, Peter Klügl wrote:
>> Ok, after looking at the code I must admit that there is much more to do
>> than I expected. We first need to discuss several things:
>>
>> - can we change the header at all?
> Afaik Marshall added a version field to the header, so it should be possible
> to change/define a new version of the file format with an extended header.
>
>> - do we support type system inclusion in the header?
> Not sure what you mean by "in the header" vs "in the serialized files".

We had a header specifying that there is also a serialized type system.
If I exchange the string header with Header, the information should
still be present, I think.

>> - do we support type system inclusion in the serialized files?
> With "serialized", do you mean the "Java serialized files" - or any of the
> binary files?

Sp and S6p right now. We could also include it optionally in all formats.

> I'm strongly in favor of allowing to have typesystem information embedded in
> the binary/serialized files. Having the type system separate is very useful to
> save space, but highly inconvenient when e.g. sending annotated documents
> around. There are regularly posts on the mailing list where people try to
> recover typesystem information from XMI files because they lost the original
> type system description.
>
>> - which serial format are which ones?
> Not sure what your question is since you already added the new constants -
> and the new ones make sense to me.

I just guessed the mapping by looking at the implementation, but I got a
bit confused right now. If this mapping is correct, then we should maybe
talk about the naming.

Best,

Peter

> I believe the format IDs used in DKPro Core map as follows:
>
>   "S"  -> SERIALIZED
>   "S+" -> SERIALIZED_TS
>   "0"  -> BINARY,                 // no filtering
>   "4"  -> COMPRESSED,             // no filtering (form 4)
>   "6"  -> COMPRESSED_FILTERED,    // with reachability and type and feature
>                                   // filtering (form 6)
>   "6+" -> COMPRESSED_FILTERED_TS  // ~probably similar, not the same
>   n/a  -> COMPRESSED_PROJECTION,  // with subset of views
>
> Cheers,
>
> -- Richard
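
For reference, the mapping quoted above could be written down as a small lookup table. This is only a sketch to make the discussion concrete: the class `FormatIdMapping`, the nested enum `Format` and the method `lookup` are made up for illustration and are not the actual UIMA or DKPro Core code. The enum constants just mirror the names mentioned in this thread, and the "6+" entry is uncertain, as noted in the quote.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class FormatIdMapping {
    // Hypothetical stand-in for the serial format constants named in this thread.
    enum Format {
        SERIALIZED, SERIALIZED_TS, BINARY, COMPRESSED,
        COMPRESSED_FILTERED, COMPRESSED_FILTERED_TS, COMPRESSED_PROJECTION
    }

    // Format ID string -> format constant, in the order given in the mail.
    private static final Map<String, Format> BY_ID = new LinkedHashMap<>();
    static {
        BY_ID.put("S", Format.SERIALIZED);
        BY_ID.put("S+", Format.SERIALIZED_TS);
        BY_ID.put("0", Format.BINARY);                   // no filtering
        BY_ID.put("4", Format.COMPRESSED);               // no filtering (form 4)
        BY_ID.put("6", Format.COMPRESSED_FILTERED);      // filtering (form 6)
        BY_ID.put("6+", Format.COMPRESSED_FILTERED_TS);  // probably similar, not the same
        // COMPRESSED_PROJECTION has no format ID in the quoted list, so it is not mapped.
    }

    // Returns the format for a given ID, or fails loudly for unknown IDs.
    static Format lookup(String id) {
        Format format = BY_ID.get(id);
        if (format == null) {
            throw new IllegalArgumentException("Unknown format ID: " + id);
        }
        return format;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Format> entry : BY_ID.entrySet()) {
            System.out.println("\"" + entry.getKey() + "\" -> " + entry.getValue());
        }
    }
}
```

Written this way, the "+" suffix consistently marks the variants with an embedded type system, which is the part of the naming that would need to be agreed on.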
