Hi,

I'm not sure whether default trimming or default non-trimming is the right decision (to be safer, default trimming would probably be the better starting point), but I want to bring up a few comments on your list.

* In trimming disadvantage number 1, we're stating that we're OK with having crappy monodixes because we *fix* that later on with trimming. I'm sure that's where we are now, but as a project that focuses a lot on providing free (as in speech) language resources that are later used for many other use cases, I don't feel comfortable with that status. I think we should aim to have dictionaries that are as correct as possible. And if we did that, disadvantage number 1 would be smaller (even if it didn't disappear completely).

* Advantage number 2 is the main reason I would want to skip trimming in some of the language pairs I'm more involved in. Hèctor can weigh in, as he's done a lot of work on these pairs, but I can give por-cat and ita-cat as examples. Basically, when we have a very good monodix (because we have a very good language pair using it), not trimming could greatly improve much less developed language pairs that use it. As an example, the Catalan monodix is one of the best-maintained monodixes we have as a project (being mainly developed for spa-cat and, to a lesser extent, fra-cat and eng-cat). There are a ton of proper and common nouns in apertium-cat that will probably never be in the por-cat or ita-cat bidixes, but just by not trimming them, transfer rules would benefit greatly, and translations would be much better because of that. So even if trimming is on by default, I'd push to be able to turn it off for this type of pair.

My two cents,

--
< Xavi Ivars >
< http://xavi.ivars.me >