On Mon, 19 Oct 2020, 22:10, Jaume Ortolà i Font <jaumeort...@gmail.com>
wrote:

>
> No. The corpus was not postedited. It has 2 million sentences. I tried to
> make the Catalan translation as good as possible. What I did was:
>
> - Try to cover all relevant vocabulary: all non-capitalized words that
> appear at least 4-5 times in the corpus.
> - Fix spelling and grammar errors in the Spanish corpus using LanguageTool
> (for example, missing diacritics or agreement errors). The Spanish text is
> worse than expected.
> - Fix many common errors in the spa-cat Apertium translation.
>
> This work is not complete. To finish it, we'll probably need 3-4 months of
> full-time work or more. Anyway, a neural translator can work even if part
> of the corpus is not perfect.
>


Sorry, I said postedited, but I meant "tweaked". Jaume (who did most, if
not all, of the work) has explained it.
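
For illustration only: the vocabulary-coverage step in the quoted message
(non-capitalized words that appear at least 4-5 times in the corpus) could
be sketched in Python roughly as below. The function name, the
letters-only tokenization, and the default threshold are assumptions made
for this sketch; they are not taken from the scripts actually used on the
corpus.

```python
import re
from collections import Counter

# Hypothetical tokenizer: runs of Unicode letters only (no digits or "_").
# A real pipeline would likely need language-aware tokenization.
WORD_RE = re.compile(r"[^\W\d_]+")


def frequent_lowercase_words(sentences, min_count=4):
    """Count fully lowercase words and keep those seen at least min_count times.

    Capitalized tokens (sentence-initial words, proper nouns) are skipped,
    which mirrors the "non-capitalized words" criterion only approximately.
    """
    counts = Counter()
    for sentence in sentences:
        for token in WORD_RE.findall(sentence):
            if token.islower():
                counts[token] += 1
    return {word: n for word, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    sample = [
        "la casa es grande",
        "La casa es azul",
        "la casa es roja",
        "Madrid es la capital",
        "la casa",
    ]
    print(frequent_lowercase_words(sample, min_count=4))
```

The resulting word list could then be checked against the translator's
dictionary to find relevant vocabulary that is still missing.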
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
