Hi,

I'm not sure whether trimming or non-trimming is the right default
(probably, to be safer, default trimming would be the better starting
point), but I want to bring up a few comments on your list.

* In trimming disadvantage number 1, we're stating that we're OK with
having crappy monodixes because we *fix* that later on with trimming. I'm
sure that's where we are now, but as a project that focuses a lot on
providing free (as in speech) language resources that are later used for
many other use cases, I don't feel comfortable with that status. I think we
should aim to have dictionaries that are as correct as possible. And if we
did that, disadvantage number 1 would be smaller (even if it did not
disappear completely).

* Advantage number 2 is the main reason I would want to skip trimming in
some of the language pairs I'm more involved in. Hèctor can weigh in, as
he's done a lot of work on these pairs, but I can give as an example the
pairs por-cat and ita-cat. Basically, when we have a very good
monodix (because we have a very good language pair using it), not trimming
could greatly improve much less developed language pairs that use it. As an
example, the Catalan monodix is one of the best-maintained monodixes we have
as a project (being mainly developed for spa-cat and, to a lesser extent,
fra-cat and eng-cat). There are a ton of proper and common nouns in
apertium-cat that will probably never be in the por-cat or ita-cat bidixes,
but just by not trimming them, transfer rules would benefit greatly, and
translations would be much better because of that. So even if trimming is
on by default, I'd push to have it disabled for this type of pair.
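
To make the second point concrete, here is a toy sketch in Python. It is
not how Apertium actually implements trimming (that works on the compiled
dictionaries); the entries, the tags, and the analyse() helper are all
invented for illustration. It only shows why a word that is in the monodix
but missing from the bidix still reaches transfer with its tags when we
don't trim.

```python
# Toy model only: all entries and names here are made up for illustration.
MONODIX = {
    "casa": "n",        # common noun
    "Barcelona": "np",  # proper noun, present in the shared monodix
    "gat": "n",
}
# Lemmas the (less developed) pair's bidix knows how to translate.
BIDIX = {"casa", "gat"}


def analyse(word, trimmed):
    """Return 'lemma<tag>' if the analyser knows the word, else '*word'."""
    tag = MONODIX.get(word)
    # With trimming, anything missing from the bidix is treated as unknown.
    if tag is None or (trimmed and word not in BIDIX):
        return "*" + word
    return f"{word}<{tag}>"


sentence = ["casa", "Barcelona"]
with_trimming = [analyse(w, trimmed=True) for w in sentence]
without_trimming = [analyse(w, trimmed=False) for w in sentence]
```

With trimming, "Barcelona" comes out as an unknown word; without it, the
proper-noun tag survives, so a transfer rule that matches on tags can still
apply even though the bidix has no entry for it.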

My two cents,

-- 
< Xavi Ivars >
< http://xavi.ivars.me >
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
