Hey guys,

As part of the project to eliminate trimming, I had to come up with a way to include the surface form in the lexical unit, which means modifying the Apertium stream format. Doing this would require changing the parsers of every program in the pipeline. Since that has to happen anyway, we discussed on IRC that *it might be a good idea to modify the stream so that a lexical unit can carry an arbitrary amount of information, and each program can use whatever information it needs.*
The current information in the lexical unit would be primary information, and then we would have optional secondary information, which could contain the surface form but also literally anything you can think of (case, sentiment, pragmatic info, etc.). This would open up a lot of possibilities for each program, and it would strengthen the Apertium stream format considerably.

We discussed several possible syntaxes for this new stream format, and the one that seems best is something like this:

^potato<n><pl><case:aa><sf:potatoes><other-prefix:other-value>/patata<n><f><pl><more:other>$

This doesn't mess with the current stream format too much. The number of tags is already arbitrary, so that helps. The secondary tags contain a ":", which distinguishes them from primary tags. Implementing this would still require modifying all the parsers, but the benefits far outweigh the amount of work needed to pull this off.

Since this would be a major, fundamental change to Apertium, I request you all to contribute your views: any pros, cons, or suggestions about the idea, the syntax, anything.

Thanks and Regards,
Tanmai Khanna

--
*Khanna, Tanmai*
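To make the proposed syntax concrete, here is a rough Python sketch of how a program might split a lexical unit into primary and secondary tags. It is only an illustration, not the way any existing Apertium parser works: the names `Reading`, `parse_reading`, `parse_lexical_unit` and `to_legacy` are made up for this example, and it assumes no escaped characters, no multiword markers, and no ":" inside primary tags.

```python
import re
from dataclasses import dataclass, field

# Matches one <...> tag and captures its contents.
TAG_RE = re.compile(r"<([^<>]*)>")


@dataclass
class Reading:
    """One slash-separated part of a lexical unit (hypothetical structure)."""
    lemma: str
    primary: list = field(default_factory=list)     # e.g. ["n", "pl"]
    secondary: dict = field(default_factory=dict)   # e.g. {"sf": "potatoes"}


def parse_reading(text):
    """Split 'lemma<tag>...' into a lemma, primary tags and secondary tags."""
    first_tag = text.find("<")
    lemma = text if first_tag == -1 else text[:first_tag]
    reading = Reading(lemma=lemma)
    for tag in TAG_RE.findall(text):
        if ":" in tag:
            # Secondary tag: "prefix:value". Split on the first ":" only.
            key, value = tag.split(":", 1)
            reading.secondary[key] = value
        else:
            reading.primary.append(tag)
    return reading


def parse_lexical_unit(lu):
    """Parse '^...$' into a list of Reading objects.

    Assumption: '/' never appears escaped inside the unit.
    """
    if not (lu.startswith("^") and lu.endswith("$")):
        raise ValueError("lexical unit must start with '^' and end with '$'")
    return [parse_reading(part) for part in lu[1:-1].split("/")]


def to_legacy(reading):
    """Render a reading with primary tags only, i.e. today's format."""
    return reading.lemma + "".join(f"<{tag}>" for tag in reading.primary)


unit = ("^potato<n><pl><case:aa><sf:potatoes><other-prefix:other-value>"
        "/patata<n><f><pl><more:other>$")
readings = parse_lexical_unit(unit)
for r in readings:
    print(r.lemma, r.primary, r.secondary)
```

A program that only cares about the surface form would read `readings[0].secondary.get("sf")` and ignore the rest, while a program that does not know about secondary tags at all could fall back to something like `to_legacy` to see the unit as it looks today.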
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff