I'm wondering how to add type system info to the current XCAS and XMI formats.
One idea: embed the XML elements of the type system descriptor. There may be
some conventions in the XMI format to accommodate. If the convention were to
put the type system first, a prescan could detect whether it is part of the
serialized form.

-Marshall

On 7/20/2016 5:12 AM, Richard Eckart de Castilho wrote:
> On 20.07.2016, at 11:03, Peter Klügl <[email protected]> wrote:
>> Ok, after looking at the code I must admit that there is much more to do
>> than I expected. We first need to discuss several things:
>>
>> - can we change the header at all?
> Afaik Marshall added a version field to the header, so it should be possible
> to change it by defining a new version of the file format with an extended
> header.
>
>> - do we support type system inclusion in the header?
> Not sure what you mean by "in the header" vs "in the serialized files".
>
>> - do we support type system inclusion in the serialized files?
> With "serialized", do you mean the "Java serialized files" - or any of the
> binary files?
>
> I'm strongly in favor of allowing type system information to be embedded in
> the binary/serialized files. Having the type system separate is very useful
> to save space, but highly inconvenient when e.g. sending annotated documents
> around. There are regularly posts on the mailing list where people try to
> recover type system information from XMI files because they lost the
> original type system description.
>
>> - which serial format are which ones?
> Not sure what your question is since you already added the new constants -
> and the new ones make sense to me.
>
> I believe the format IDs used in DKPro Core map as follows:
>
> "S"  -> SERIALIZED
> "S+" -> SERIALIZED_TS
> "0"  -> BINARY,                 // no filtering
> "4"  -> COMPRESSED,             // no filtering (form 4)
> "6"  -> COMPRESSED_FILTERED,    // with reachability and type and feature filtering (form 6)
> "6+" -> COMPRESSED_FILTERED_TS  // ~probably similar, not the same
> n/a  -> COMPRESSED_PROJECTION,  // with subset of views
>
> Cheers,
>
> -- Richard
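
To make the prescan idea at the top of this message concrete, here is a rough sketch in Python (standard library only), not a proposal for the actual UIMA code. It assumes the embedded descriptor would be the first child of the XMI root element and would use the usual descriptor root name, typeSystemDescription. The function name has_embedded_type_system is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Assumption: an embedded descriptor keeps its usual root element name.
TYPE_SYSTEM_TAG = "typeSystemDescription"


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]


def has_embedded_type_system(stream):
    """Prescan: inspect only the first child of the root element.

    Returns True if that child is a type system descriptor, False otherwise
    (including when the root has no children). Hypothetical helper.
    """
    depth = 0
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if event == "start":
            depth += 1
            if depth == 2:
                # First child of the root: decide and stop reading.
                return local_name(elem.tag) == TYPE_SYSTEM_TAG
        else:
            depth -= 1
    return False
```

Because the check returns at the first child element, the cost does not grow with the number of feature structures that follow, which is the point of requiring the type system to come first.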

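
The format ID mapping Richard lists can be written down as a small lookup table. This is only a sketch: the constant names are taken from his message as strings (spelled SERIALIZED here), the table is not tied to any actual enum, and the helper includes_type_system is hypothetical. It assumes the "_TS" suffix marks the variants that carry the type system.

```python
# DKPro Core format IDs -> serial format constant names, as listed in the
# quoted message. COMPRESSED_PROJECTION has no ID there, so it is omitted.
DKPRO_FORMAT_IDS = {
    "S": "SERIALIZED",
    "S+": "SERIALIZED_TS",
    "0": "BINARY",                    # no filtering
    "4": "COMPRESSED",                # no filtering (form 4)
    "6": "COMPRESSED_FILTERED",       # reachability, type and feature filtering (form 6)
    "6+": "COMPRESSED_FILTERED_TS",   # per the message: probably similar, not the same
}


def includes_type_system(format_id):
    """Assumption: names ending in '_TS' embed the type system."""
    return DKPRO_FORMAT_IDS[format_id].endswith("_TS")
```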