On Mon, 19 Oct 2020, 22:10, Jaume Ortolà i Font <jaumeort...@gmail.com>
wrote:

> No. The corpus was not postedited. It has 2 million sentences. I tried to
> get the best Catalan translation possible. What I did was:
> - Try to cover all relevant vocabulary: all non-capitalized words that
> appear at least 4-5 times in the corpus.
> - Fix spelling and grammar errors in the Spanish corpus using LanguageTool
> (for example, missing diacritics or agreement errors). The Spanish text is
> worse than expected.
> - Fix many common errors in spa-cat Apertium translation.
> This work is not complete. To finish it, we'll probably need 3-4 months of
> full-time work or more. Anyway, a neural translator can work even if a
> percentage of the corpus is not perfect.

Sorry, I said "postedited", but I meant "tweaked". Jaume (who did most, if
not all, of the work) has explained it.
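For illustration, the vocabulary-coverage criterion in the quoted message (all non-capitalized words appearing at least 4-5 times) could be computed roughly as below. This is a minimal sketch, not Jaume's actual tooling: it assumes simple regex tokenization and treats "non-capitalized" as "first character is lowercase". The function name `frequent_lowercase_words` and the default threshold are hypothetical.

```python
import re
from collections import Counter
from typing import Iterable

# Hypothetical tokenizer: runs of Unicode word characters (covers accented letters).
WORD_RE = re.compile(r"\w+")


def frequent_lowercase_words(sentences: Iterable[str], min_count: int = 4) -> list[str]:
    """Return, sorted, the lowercase-initial words seen at least min_count times."""
    counts: Counter[str] = Counter()
    for sentence in sentences:
        for token in WORD_RE.findall(sentence):
            # Skip tokens whose first character is not lowercase: capitalized words
            # (proper nouns, but also sentence-initial words) and numbers.
            if token[0].islower():
                counts[token] += 1
    return sorted(word for word, count in counts.items() if count >= min_count)
```

Note that this simple rule also skips common words when they start a sentence, so counts for those words are somewhat underestimated.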
Apertium-stuff mailing list
