[Moses-support] Second call for participation: MT4All Unsupervised MT Shared Task at SIGUL 2022

Gorka Labaka Fri, 01 Apr 2022 02:10:40 -0700


MT4All Unsupervised MT Shared Task
at SIGUL 2022
(24-25 June, Marseille)


SECOND CALL FOR PARTICIPATION

We invite you to participate in the first edition of the MT4All Unsupervised Machine Translation Shared Task, hosted by the ELRA/ISCA Special Interest Group on Under-Resourced Languages Workshop (SIGUL 2022). Papers on the task will be published as part of the workshop proceedings.

Invitation to participate: please register your interest via the Expression of Interest form <https://docs.google.com/forms/d/1tllq0jWhcKwMHgPtRCA4aLkgLDuN8JlZG7Vp4TqcNQ0>.



TASK DESCRIPTION

For this shared task, we will leverage the resources generated by the recently completed CEF project MT4All, with the aim of exploring unsupervised MT techniques based only on monolingual corpora. In the course of the project, the following novel datasets were created: 18 monolingual corpora for specific languages and domains, 12 bilingual dictionaries and translation models, and 10 annotated datasets for evaluation. Most of them will be used in the present shared task.

The task is divided into three separate subtasks, each covering a specific domain and set of languages.


 * Subtask 1: Unsupervised translation from English to Ukrainian,
   Georgian, and Kazakh in the legal domain.

 * Subtask 2: Unsupervised translation from English to Finnish,
   Latvian, and Norwegian Bokmål in the financial domain.

 * Subtask 3: Unsupervised translation from English to German,
   Norwegian Bokmål, and Spanish in the customer support domain.

In this shared task, we are interested in how the in-domain monolingual data that we will provide can be leveraged to create a purely unsupervised machine translation model, either by


 * training an unsupervised model from scratch, or

 * adding value to an existing pre-trained model, on the condition that

     o it has been trained on monolingual datasets,
     o it has not been fine-tuned with any parallel data, and
     o it is publicly accessible from the HuggingFace repository.

Although we exclude fine-tuning the models with any existing parallel data, we allow the use of the bilingual resources created within MT4All using purely unsupervised technologies.

As additional monolingual data, only the monolingual Oscar datasets may be used.


IMPORTANT DATES

 * Training data release: 10.03.2022
 * Test sets release: 25.04.2022
 * Results deadline: 02.05.2022
 * Paper submission deadline: 16.05.2022
 * Acceptance notice: 30.05.2022
 * Camera ready: 13.06.2022
 * Workshop starts: 24.06.2022

Please visit the website for more details: https://sigul-2022.ilc.cnr.it/mt4all-shared-task/

If you have any comments or questions, do not hesitate to contact ksenia.kharitonova at bsc.es.

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
https://mailman.mit.edu/mailman/listinfo/moses-support
