On Mon, 19 Oct 2020, 22:10, Jaume Ortolà i Font <jaumeort...@gmail.com> wrote:
> No. The corpus was not postedited. It has 2 million sentences. I tried to
> get a Catalan translation as good as possible. What I did was:
>
> - Try to cover all relevant vocabulary: all non-capitalized words that
>   appear at least 4-5 times in the corpus.
> - Fix spelling and grammar errors in the Spanish corpus using LanguageTool
>   (for example, missing diacritics or agreement errors). The Spanish text is
>   worse than expected.
> - Fix many common errors in spa-cat Apertium translation.
>
> This work is not complete. To finish it, we'll need probably 3-4 months of
> full-time work or more. Anyway, a neural translator can work even if a
> percentage of the corpus is not perfect.

Sorry, I said "postedited", but I meant "tweaked". Jaume (who did most, if not all, of the work) has explained it.
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff