Message from Hèctor Alòs i Font <hectora...@gmail.com> on Sun, 18 Oct 2020 at 7:50:
> Xavi, I am impressed that at Softcatalà you managed to get enough bilingual
> texts to create an English-Catalan neural translator. Congratulations on
> the results! I am curious to know how big the corpus you collected is,
> as well as which sources you used to ensure the quality of the
> translations.
>

The corpora used can be found here:
https://github.com/Softcatala/en-ca-corpus

One of the corpora is an automatic translation of the English-Spanish Europarl corpus made with Spanish-Catalan Apertium. It has proved good enough to train the neural translator.

The neural translator could be improved with better corpora and more powerful hardware for training. The vocabulary size is limited because of hardware constraints.

> I'd maybe add that it would probably not be possible to collect such a
> corpus for Valencian Catalan, so I guess we face in this neural translator
> a typical problem with lesser-used languages/varieties. If it is ever
> considered necessary to generate Valencian, this will have to be done by
> translating into "reference" Catalan and then automatically adapting it.
> In fact the same happens for the many flavours we currently have in
> Apertium for Catalan, both Valencian and "Catalonian".
>

It is easy to make a Catalan>Valencian adapter (a few lines of code using LanguageTool). The other way around is not so easy, because some Valencian verbal forms are ambiguous.

> By the way, is Softcatalà trying to create a neural translator for the
> Spanish-Catalan pair?
>

Not yet. Neural translators require a lot of hardware resources, both in training and in production. We could not support the current volume of Spanish-Catalan translations with neural translation.

Jaume Ortolà
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff