[Corpora-List] CoCo4MT 2023 @ MT Summit 2nd Call for Papers

John Ortega via Corpora Tue, 20 Jun 2023 09:50:17 -0700

The Second Workshop on Corpus Generation and Corpus Augmentation for Machine Translation (CoCo4MT) @MT-SUMMIT XIX

The 19th Machine Translation Summit
Sep 4-8, 2023, Macau SAR, China
https://sites.google.com/view/coco4mt


SCOPE

It is a well-known fact that machine translation systems, especially those that use deep learning, require massive amounts of data. Several resources for languages are not available in their human-created format. Some of the types of resources available are monolingual, multilingual, translation memories, and lexicons. Those types of resources are generally created for formal purposes, such as parliamentary collections, when parallel, and for more informal situations when monolingual. The quality and abundance of resources, including corpora, used for formal purposes are generally higher than those of resources used for informal purposes. Additionally, corpora for low-resource languages (languages with fewer digital resources available) tend to be less abundant and of lower quality.

CoCo4MT is a workshop centered around research that focuses on manual and automatic corpus creation, cleansing, and augmentation techniques specifically for machine translation. We accept work that covers any language (including sign language) but we are specifically interested in those submissions that explicitly report on work with languages with limited existing resources (low-resource languages). Since techniques from high-resource languages are generally statistical in nature and could be used as generic solutions for any language, we welcome submissions on high-resource languages also.

CoCo4MT aims to encourage research on new and undiscovered techniques. We hope that the methods presented at this workshop will lead to the development of high-quality corpora that will in turn lead to high-performing MT systems and new dataset creation for multiple corpora. We hope that submissions will provide high-quality corpora that are available publicly for download and can be used to increase machine translation performance thus encouraging new dataset creation for multiple languages that will, in turn, provide a general workshop to consult for corpora needs in the future. The workshop’s success will be measured by the following key performance indicators:

- Promotes the ongoing increase in quality of machine translation systems when measured by standard measurements,
- Provides a meeting place for collaboration from several research areas to increase the availability of commonly used corpora and new corpora,
- Drives innovation to address the need for higher quality and abundance of low-resource language data.


Topics of interest include:

- Difficulties with using existing corpora (e.g., political considerations or domain limitations) and their effects on final MT systems,
- Strategies for collecting new MT datasets (e.g., via crowdsourcing),
- Data augmentation techniques,
- Data cleansing and denoising techniques,
- Quality control strategies for MT data,
- Exploration of datasets for pretraining or auxiliary tasks for training MT systems.



SHARED TASK

To encourage research on corpus construction for low-resource machine translation, we introduce a shared task focused on identifying high-quality instances that should be translated into a target low-resource language. Participants are provided access to multi-way corpora in the high-resource languages of English, Spanish, German, Korean, and Indonesian and, using these, are required to identify beneficial instances that, when translated into the low-resource languages of Cebuano, Gujarati, and Burmese, lead to high-performing MT systems. More details on data, evaluation and submission can be found on the website (https://sites.google.com/view/coco4mt/shared-task) or by emailing coco4mt-shared-t...@googlegroups.com.


SUBMISSION INFORMATION

CoCo4MT will accept research, review, or position papers. Each paper should be at least four (4) and at most ten (10) pages long, plus unlimited pages for references. Submissions should be formatted according to the official MT Summit 2023 style templates (https://www.overleaf.com/latex/templates/mt-summit-2023-template/knrrcnxhkqxd). Accepted papers will be published in the MT Summit 2023 proceedings, which are included in the ACL Anthology, and will be presented at the conference either orally or as a poster.

Submissions must be anonymized and should be made to the workshop using the Softconf conference management system (https://softconf.com/mtsummit2023/CoCo4MT). Scientific papers that have been or will be submitted to other venues must be declared as such, and must be withdrawn from the other venues if accepted and published at CoCo4MT. The review will be double-blind.

We would like to encourage authors to cite papers written in ANY language that are related to the topics, as long as both original bibliographic items and their corresponding English translations are provided.


Registration will be handled by the main conference. (To be announced)

IMPORTANT DATES

May 18, 2023  - Call for papers released
May 19, 2023  - Shared task release of train, dev and test data
May 25, 2023  - Shared task release of baselines
June 5, 2023  - Second call for papers
June 20, 2023 - Third and final call for papers
July 05, 2023 - Paper submissions due
July 05, 2023 - Shared task deadline to submit results
July 20, 2023 - Notification of acceptance
July 20, 2023 - Shared task system description papers due
July 31, 2023 - Camera-ready due
September 4-5, 2023 - CoCo4MT workshop

CONTACT

CoCo4MT Workshop Organizers:
coco4mt-2023-organiz...@googlegroups.com

CoCo4MT Shared Task Organizers:
coco4mt-shared-t...@googlegroups.com

ORGANIZING COMMITTEE (listed alphabetically)

Ananya Ganesh     University of Colorado Boulder
Constantine Lignos     Brandeis University
John E. Ortega     Northeastern University
Jonne Sälevä     Brandeis University
Katharina Kann     University of Colorado Boulder
Marine Carpuat     University of Maryland
Rodolfo Zevallos     Universitat Pompeu Fabra
Shabnam Tafreshi     University of Maryland
William Chen     Carnegie Mellon University

PROGRAM COMMITTEE (tentative, listed alphabetically)

Abteen Ebrahimi     University of Colorado Boulder
David Adelani     Saarland University
Ananya Ganesh     University of Colorado Boulder
Alberto Poncelas     ADAPT Centre at Dublin City University
Anna Currey     Amazon
Amirhossein Tebbifakhr     University of Trento
Atul Kr. Ojha     National University of Ireland Galway
Ayush Singh     Northeastern University
Barry Haddow     University of Edinburgh
Bharathi Raja Chakravarthi     National University of Ireland Galway
Beatrice Savoldi     University of Trento
Bogdan Babych     Heidelberg University
Constantine Lignos     Brandeis University
Bonaventure Dossou     Mila Quebec AI Institute
Duygu Ataman     New York University
Eleftheria Briakou     University of Maryland
Eleni Metheniti     Université Toulouse - Paul Sabatier
Jasper Kyle Catapang     University of Birmingham
John E. Ortega     Northeastern University
Jonne Sälevä     Brandeis University
Kalika Bali     Microsoft
Katharina Kann     University of Colorado Boulder
Kochiro Watanabe     The University of Tokyo
Koel Dutta Chowdhury     Saarland University
Liangyou Li     Huawei
Manuel Mager     University of Stuttgart
Maria Art Antonette Clariño     University of the Philippines Los Baños
Marine Carpuat     University of Maryland
Mathias Müller     University of Zurich
Nathaniel Oco     De La Salle University
Patrick Simianer     Lilt
Rico Sennrich     University of Zurich
Rodolfo Zevallos     Universitat Pompeu Fabra
Sangjee Dondrub     Qinghai Normal University
Santanu Pal     Saarland University
Sardana Ivanova     University of Helsinki
Shantipriya Parida     Silo AI
Shiran Dudy     Northeastern University
Surafel Melaku Lakew     Amazon
Tommi A Pirinen     University of Tromsø
Valentin Malykh     Moscow Institute of Physics and Technology
Xing Niu     Amazon
Weijia Xu     University of Maryland
_______________________________________________
Corpora mailing list -- corpora@list.elra.info
https://list.elra.info/mailman3/postorius/lists/corpora.list.elra.info/
To unsubscribe send an email to corpora-le...@list.elra.info
