To allow for Techievena's weights, a new breaking binary format was needed
for lttoolbox. I've now implemented something that should be fairly
future-proof - if we're going to break it, might as well break it in a way
that won't bite us later.

https://github.com/apertium/lttoolbox/blob/master/lttoolbox/compression.h#L27

Each FST file gets a header whose first 4 bytes are LTTB, followed by a
compressed integer denoting which features are used in this FST. Currently
this value is 0, because there are no FST-global features.
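To make the file header layout concrete, here is a minimal C++ sketch that writes and reads such a header from a byte buffer. It is not lttoolbox's actual code: the names (`Bytes`, `write_file_header`, `read_file_header`, `demo_file_header`) are hypothetical, and the variable-length integer is plain LEB128, used only as a stand-in because lttoolbox's own integer compression (see compression.h above) may encode values differently.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

using Bytes = std::vector<uint8_t>;

// 4-byte magic that starts every FST file.
const char FILE_MAGIC[4] = {'L', 'T', 'T', 'B'};

// LEB128 varint: 7 payload bits per byte, high bit set means "more bytes follow".
// Stand-in for lttoolbox's own integer compression, which may differ.
void write_varint(Bytes &out, uint64_t value) {
  do {
    uint8_t byte = value & 0x7F;
    value >>= 7;
    if (value != 0) byte |= 0x80;
    out.push_back(byte);
  } while (value != 0);
}

uint64_t read_varint(const Bytes &in, std::size_t &pos) {
  uint64_t value = 0;
  for (int shift = 0; shift < 64; shift += 7) {
    if (pos >= in.size()) throw std::runtime_error("truncated varint");
    uint8_t byte = in[pos++];
    value |= uint64_t(byte & 0x7F) << shift;
    if (!(byte & 0x80)) return value;
  }
  throw std::runtime_error("varint too long");
}

// Header = magic + feature bit field for FST-global features.
void write_file_header(Bytes &out, uint64_t features) {
  out.insert(out.end(), FILE_MAGIC, FILE_MAGIC + 4);
  write_varint(out, features);
}

// Returns the feature bits and advances pos past the header.
uint64_t read_file_header(const Bytes &in, std::size_t &pos) {
  if (pos + 4 > in.size() || std::memcmp(&in[pos], FILE_MAGIC, 4) != 0)
    throw std::runtime_error("not an LTTB file");
  pos += 4;
  return read_varint(in, pos);
}

// Usage: round-trip a header with no FST-global features set.
void demo_file_header() {
  Bytes buf;
  write_file_header(buf, 0);
  std::size_t pos = 0;
  uint64_t features = read_file_header(buf, pos);
  assert(features == 0);
  assert(pos == buf.size());  // magic (4 bytes) + one varint byte
}
```

With a varint, today's all-zero feature field costs a single byte, while still leaving room for up to 64 feature bits later.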

Each transducer also gets a header whose first 4 bytes are LTTD, likewise
followed by a feature bit field, which is set to 1 if the transducer has
weights. Transducers without weights thus don't waste any space on them.
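The per-transducer header can be sketched the same way. In this hypothetical C++ example (again not lttoolbox's code), a transducer is reduced to a list of transition targets plus an optional list of weights; bit 0 of the feature field (`FEATURE_WEIGHTS`) says whether a weight follows each target, so an unweighted transducer stores none. The LEB128 varint, the raw 8-byte little-endian double encoding, and all names here are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

using Bytes = std::vector<uint8_t>;

const char TRANSDUCER_MAGIC[4] = {'L', 'T', 'T', 'D'};
const uint64_t FEATURE_WEIGHTS = 1;  // bit 0: transitions carry weights

// Hypothetical, heavily simplified transducer: one target per transition,
// plus one weight per transition if weighted (empty vector otherwise).
struct Transducer {
  std::vector<uint64_t> targets;
  std::vector<double> weights;
};

// LEB128 varint, a stand-in for lttoolbox's own integer compression.
void write_varint(Bytes &out, uint64_t value) {
  do {
    uint8_t byte = value & 0x7F;
    value >>= 7;
    if (value != 0) byte |= 0x80;
    out.push_back(byte);
  } while (value != 0);
}

uint64_t read_varint(const Bytes &in, std::size_t &pos) {
  uint64_t value = 0;
  for (int shift = 0; shift < 64; shift += 7) {
    if (pos >= in.size()) throw std::runtime_error("truncated varint");
    uint8_t byte = in[pos++];
    value |= uint64_t(byte & 0x7F) << shift;
    if (!(byte & 0x80)) return value;
  }
  throw std::runtime_error("varint too long");
}

// Weights are stored as the double's 8 raw bytes, little-endian.
void write_double(Bytes &out, double d) {
  uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  for (int i = 0; i < 8; ++i) out.push_back(uint8_t(bits >> (8 * i)));
}

double read_double(const Bytes &in, std::size_t &pos) {
  if (pos + 8 > in.size()) throw std::runtime_error("truncated weight");
  uint64_t bits = 0;
  for (int i = 0; i < 8; ++i) bits |= uint64_t(in[pos++]) << (8 * i);
  double d;
  std::memcpy(&d, &bits, sizeof d);
  return d;
}

void write_transducer(Bytes &out, const Transducer &t) {
  bool weighted = !t.weights.empty();
  assert(!weighted || t.weights.size() == t.targets.size());
  out.insert(out.end(), TRANSDUCER_MAGIC, TRANSDUCER_MAGIC + 4);
  write_varint(out, weighted ? FEATURE_WEIGHTS : 0);
  write_varint(out, t.targets.size());
  for (std::size_t i = 0; i < t.targets.size(); ++i) {
    write_varint(out, t.targets[i]);
    if (weighted) write_double(out, t.weights[i]);  // omitted when unweighted
  }
}

Transducer read_transducer(const Bytes &in, std::size_t &pos) {
  if (pos + 4 > in.size() || std::memcmp(&in[pos], TRANSDUCER_MAGIC, 4) != 0)
    throw std::runtime_error("missing LTTD header");
  pos += 4;
  bool weighted = (read_varint(in, pos) & FEATURE_WEIGHTS) != 0;
  uint64_t n = read_varint(in, pos);
  Transducer t;
  for (uint64_t i = 0; i < n; ++i) {
    t.targets.push_back(read_varint(in, pos));
    if (weighted) t.weights.push_back(read_double(in, pos));
  }
  return t;
}
```

With three transitions, the unweighted buffer is 9 bytes and the weighted one 33 bytes; the 24-byte difference is exactly the three 8-byte weights that the cleared flag lets the writer skip.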

This means we can add features, fluff, and versioning to the binary format
later with less pain.

Anyone got any bikeshedding for the format break?

-- Tino Didriksen
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
