On Sun, 18 Oct 2020 at 7:50, Hèctor Alòs i Font <hectora...@gmail.com>
wrote:

> Xavi, I am impressed that you managed to gather enough bilingual texts at
> Softcatalà to create an English-Catalan neural translator. Congratulations
> on the results! I am curious to know how big the corpus you collected is,
> and which sources you used to ensure the quality of the translations.
>

The corpora used can be found here:
https://github.com/Softcatala/en-ca-corpus

One of the corpora is an automatic translation of the English-Spanish
Europarl corpus using Spanish-Catalan Apertium. It has proved good enough
to train the neural translator.
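The pivoting approach can be sketched as follows: keep the English side of each English-Spanish pair and replace the Spanish side with its Spanish-to-Catalan Apertium translation. This is a minimal sketch, not the script actually used to build en-ca-corpus. It assumes the `apertium` command-line tool is installed with a `spa-cat` mode, and the helper names `apertium_spa_cat` and `pivot_corpus` are hypothetical.

```python
import subprocess
from typing import Callable, Iterable, Iterator, Optional, Tuple


def apertium_spa_cat(text: str) -> str:
    """Translate Spanish text into Catalan via the Apertium CLI.

    Assumption: `apertium` is on PATH and provides the `spa-cat` mode.
    """
    result = subprocess.run(
        ["apertium", "spa-cat"],
        input=text,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def pivot_corpus(
    pairs: Iterable[Tuple[str, str]],
    translate_es_ca: Callable[[str], Optional[str]],
) -> Iterator[Tuple[str, str]]:
    """Turn (English, Spanish) pairs into (English, Catalan) pairs.

    The English side is kept unchanged; the Spanish side is machine-translated.
    Pairs whose translation is empty or missing are dropped.
    """
    for en, es in pairs:
        ca = translate_es_ca(es)
        if ca:
            yield en, ca
```

Calling Apertium once per sentence is slow. For a corpus the size of Europarl it would be more practical to translate the whole Spanish side in one call, making sure the line alignment with the English side is preserved.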

The neural translator could be improved with better corpora and with more
powerful hardware for training. The vocabulary size is limited by hardware
constraints.
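As a rough back-of-envelope illustration of why vocabulary size is tied to hardware: in a typical encoder-decoder model, the embedding tables, the output projection and the per-batch logits all grow linearly with the vocabulary. The sketch assumes a Transformer-style model with separate source and target embeddings and an untied output layer. The sizes are hypothetical, not Softcatalà's actual configuration, and gradients and optimizer state are ignored.

```python
def vocab_memory_bytes(vocab_size: int, d_model: int, batch_tokens: int,
                       bytes_per_value: int = 4) -> int:
    """Approximate memory used by the vocabulary-dependent tensors.

    Assumes (hypothetically) separate source/target embedding tables,
    an untied output projection and float32 values. Everything else
    in the model is left out.
    """
    embeddings = 2 * vocab_size * d_model     # source + target embedding tables
    output_projection = vocab_size * d_model  # final linear layer onto the vocabulary
    logits = batch_tokens * vocab_size        # one score per token per vocabulary entry
    return (embeddings + output_projection + logits) * bytes_per_value


# Hypothetical sizes: 32,000-entry vocabulary, 512-dim embeddings, 4,096 tokens per batch.
print(vocab_memory_bytes(32_000, 512, 4_096))
```

With these hypothetical values the vocabulary-dependent tensors alone take about 721 MB in float32. Because every term is linear in the vocabulary size, doubling the vocabulary doubles that figure.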


> I would add that it would probably not be possible to collect such a
> corpus for Valencian Catalan, so I guess this neural translator runs into
> a typical problem of lesser-used languages and varieties. If it is ever
> considered necessary to generate Valencian, this will have to be done by
> translating into "reference" Catalan and then automatically adapting the
> output. In fact, the same happens with the many Catalan flavours we
> currently have in Apertium, both Valencian and "Catalonian".
>

It is easy to make a Catalan>Valencian adapter (a few lines of code using
LanguageTool). The other way around is not so easy, because some Valencian
verb forms are ambiguous.
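A toy word-level sketch shows why one direction is easy and the other is not. It is not LanguageTool and not Softcatalà's adapter. The tiny lexicon is hypothetical, and a real adapter would rely on morphological tagging rather than word lookup. Central Catalan distinguishes `canto` (present indicative, 1st person singular) from `canti` (present subjunctive), while Valencian uses `cante` for both. The forward mapping is therefore a function, but the reverse mapping is one-to-many.

```python
import re

# Hypothetical toy lexicon: central Catalan form -> Valencian form.
CA_TO_VAL = {
    "canto": "cante",  # present indicative, 1st person singular
    "canti": "cante",  # present subjunctive, 1st/3rd person singular
    "parlo": "parle",
    "parli": "parle",
    "avui": "hui",
}


def invert(mapping):
    """Build the reverse lexicon; one Valencian form may come from several sources."""
    reverse = {}
    for src, dst in mapping.items():
        reverse.setdefault(dst, set()).add(src)
    return reverse


VAL_TO_CA = invert(CA_TO_VAL)


def adapt_to_valencian(text):
    """Replace each known central Catalan word with its single Valencian form."""
    def repl(match):
        word = match.group(0)
        target = CA_TO_VAL.get(word.lower())
        if target is None:
            return word
        # Keep an initial capital if the source word had one.
        return target.capitalize() if word[0].isupper() else target

    return re.sub(r"\w+", repl, text)


def valencian_candidates(word):
    """Return every central Catalan form that could have produced `word`."""
    return sorted(VAL_TO_CA.get(word.lower(), {word}))
```

Here `adapt_to_valencian("Avui canto")` gives a single answer, whereas `valencian_candidates("cante")` returns both `canti` and `canto`. That is the point where a real reverse adapter would need sentence context to choose.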


> By the way, is Softcatalà trying to create a neural translator for the
> Spanish-Catalan pair?
>

Not yet. Neural translators require a lot of hardware resources, both in
training and in production. We could not support the current volume of
Spanish-Catalan translations with neural translation.

Jaume Ortolà
_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
