I invite all organisers and participants to filter the parallel datasets
with ModelFront with just a few clicks.

console.modelfront.com

Even just running the evaluation can be revealing.  For example, the
winning submission in the Russian-English biomedical task at WMT2020
identified systemic misalignments and then trained a model to realign.  And
of course, when there's bad data in the eval set, BLEU won't correlate with
human eval.

You can read more at modelfront.com/filter.

You can also find open evaluations in the Explore tab, and filter and
download them.  For example, the TAUS Corona Crisis Corpus for
English-Chinese.  We'll be adding more open corpora with Asian languages
in the coming days.

You can add your datasets there too.  Just ask us.  Every new ModelFront
account has free credit for 1M characters (10K+ lines).  If you are doing
open research we will support you as much as we can.
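
As a rough sanity check on the free credit, here is a minimal sketch in
Python for estimating whether a corpus fits.  The 1M-character figure comes
from the paragraph above; the function names are hypothetical, and counting
one unit per character of each line is an assumption, not a statement of how
usage is actually metered.

```python
FREE_CREDIT_CHARS = 1_000_000  # free credit per new account, as stated above


def chars_needed(lines):
    """Total characters across all lines (assumed unit: one per character)."""
    return sum(len(line) for line in lines)


def fits_free_credit(lines, budget=FREE_CREDIT_CHARS):
    """True if the corpus stays within the character budget."""
    return chars_needed(lines) <= budget


# 1M characters over 10K lines works out to about 100 characters per line.
avg_line_length = FREE_CREDIT_CHARS // 10_000
print(avg_line_length)  # 100

# Synthetic corpus: 10,000 lines of exactly 100 characters each.
sample = ["x" * 100] * 10_000
print(chars_needed(sample), fits_free_credit(sample))  # 1000000 True
```

Lines shorter than about 100 characters on average leave room for more than
10K lines, which is where the "10K+" comes from.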

Greetings from West Asia,
Adam




On Tue, 19 Jan 2021, 09:44 Toshiaki Nakazawa, <
nakaz...@logos.t.u-tokyo.ac.jp> wrote:

> Dear all EAMT members,
>
> I'm Toshiaki Nakazawa from The University of Tokyo, Japan. This is the
> call for participation for the MT shared tasks and research papers to
> the 8th Workshop on Asian Translation (WAT2021), workshop of
> ACL-IJCNLP 2021. Those who are working on machine translation, please
> join us.
>
> IMPORTANT DATES
> ---------------
>
> April 26, 2021 – Translation Task Submission Deadline
> April 26, 2021 – Research Paper Submission Deadline
> May 28, 2021 – Notification of Acceptance for Research Papers
> May 17, 2021 – System Description Paper Submission Deadline
> May 28, 2021 – Review Feedback of System Description Papers
> June 7, 2021 – Camera-ready Deadline (both Research and System
> Description Papers)
> August 5-6, 2021 – Workshop Dates (one of these days)
>
> * All deadlines are calculated at 11:59PM UTC-12
>
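
For anyone converting the cutoff above, here is a minimal sketch in Python
that expresses an 11:59PM UTC-12 deadline in UTC.  The helper name
`deadline_in_utc` is hypothetical; the date used is the translation task
submission deadline from the list above.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset of UTC-12, as stated in the call.
UTC_MINUS_12 = timezone(timedelta(hours=-12), name="UTC-12")


def deadline_in_utc(year: int, month: int, day: int) -> datetime:
    """Return the 11:59 PM UTC-12 cutoff on the given date, expressed in UTC."""
    cutoff = datetime(year, month, day, 23, 59, tzinfo=UTC_MINUS_12)
    return cutoff.astimezone(timezone.utc)


# Translation task submission deadline: April 26, 2021.
submission = deadline_in_utc(2021, 4, 26)
print(submission.isoformat())  # 2021-04-27T11:59:00+00:00
```

In other words, the cutoff falls at 11:59 UTC on the following calendar day;
calling `.astimezone()` with no argument on the result gives the same instant
in the local time zone of the machine.
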
> Best regards,
>
> ---------------------------------------------------------------------------
>                        WAT2021
>        (The 8th Workshop on Asian Translation)
>         in conjunction with ACL-IJCNLP2021
>         http://lotus.kuee.kyoto-u.ac.jp/WAT/
>         August 5-6, 2021, Bangkok, Thailand
>
> Following the success of the previous WAT workshops (WAT2014 --
> WAT2020), WAT2021 will bring together machine translation researchers
> and users to try, evaluate, share and discuss brand-new ideas about
> machine translation. For the 8th WAT, we will include the following
> new translation tasks:
>
> * MultiIndicMT: Bengali, Gujarati, Hindi, Kannada, Malayalam, Marathi,
> Odia, Punjabi, Tamil, Telugu <--> English Multilingual Task
> * Malayalam Visual Genome: English --> Malayalam Multimodal Task
> * Ambiguous MS COCO: English <--> Japanese Multimodal Task
> * ArEnMulti30K: English <--> Arabic Multimodal Task
> * Restricted Translation Task
>
> together with the following continuing tasks:
>
> * English/Chinese <--> Japanese scientific paper task
> * English/Chinese/Korean <--> Japanese patent task
> * English <--> Japanese newswire task
> * Russian <--> Japanese news commentary task
> * Myanmar <--> English mixed-domain task
> * Khmer <--> English mixed-domain task
> * English <--> Japanese (Flickr30kEnt-JP) multimodal translation task
> * English <--> Hindi, Thai, Malay, Indonesian NICT-SAP multilingual
> multi-domain task
> * English --> Hindi multimodal task
>
> In addition to the shared tasks, the workshop will also feature
> scientific papers on topics related to machine translation,
> especially for Asian languages. Topics of interest include, but are
> not limited to:
>
> - analysis of the automatic/human evaluation results in the past WAT
> workshops
> - word-/phrase-/syntax-/semantics-/rule-based, neural and hybrid
> machine translation
> - Asian language processing
> - incorporating linguistic information into machine translation
> - decoding algorithms
> - system combination
> - error analysis
> - manual and automatic machine translation evaluation
> - machine translation applications
> - quality estimation
> - domain adaptation
> - machine translation for low resource languages
> - language resources
>
> ************************* IMPORTANT NOTICE *************************
> Participants of the previous workshop are also required to sign up to
> WAT2021
> ********************************************************************
>
> TRANSLATION TASKS
> -----------------
>
> The task is to improve the text translation quality for scientific
> papers and patent documents. Participants choose any of the subtasks
> in which they would like to participate and translate the test data
> using their machine translation systems. The WAT organizers will
> evaluate the results submitted using automatic evaluation and human
> evaluation. We will also provide a baseline machine translation.
>
> Tasks:
>   * Document-level translation tasks:
>     - ASPEC+ParaNatCom: English --> Japanese Scientific Paper
>     - BSD Corpus: English <--> Japanese Business Scene Dialogue
>     - JIJI Corpus: English <--> Japanese Newswire
>     - NICT-SAP: Hindi/Thai/Malay/Indonesian <--> English
>   * Multimodal translation tasks:
>     - Hindi Visual Genome: English --> Hindi
>     - Malayalam Visual Genome: English --> Malayalam (NEW!!)
>     - Flickr30kEnt-JP: English <--> Japanese
>     - Ambiguous MS COCO: English <--> Japanese (NEW!!)
>     - ArEnMulti30K: English <--> Arabic (NEW!!)
>   * Indic tasks:
>     - MultiIndicMT:
>       Bengali/Gujarati/Hindi/Kannada/Malayalam/Marathi/Odia/Punjabi/Tamil/Telugu
>       <--> English (NEW and expanded training data and evaluation sets!!)
>   * ALT+ tasks:
>     - +UCSY: Myanmar (Burmese) <--> English
>     - +ECCC: Khmer <--> English
>   * Patent task:
>     - JPC2: English/Chinese/Korean <--> Japanese
>   * News Commentary task:
>     - JaRuNC: Japanese <--> Russian
>   * Restricted Translation task
>
> Dataset:
>
> * Scientific paper
>
> WAT uses ASPEC for the dataset including training, development,
> development test and test data. Participants of the scientific papers
> subtask must get a copy of ASPEC by themselves. ASPEC consists of
> approximately 3 million Japanese-English parallel sentences from paper
> abstracts (ASPEC-JE) and approximately 0.7 million Japanese-Chinese
> paper excerpts (ASPEC-JC).
>
> * Patent
>
> WAT uses the JPO Patent Corpus, which is constructed by the Japan Patent
> Office (JPO). This corpus consists of 1 million English-Japanese
> parallel sentences, 1 million Chinese-Japanese parallel sentences, and
> 1 million Korean-Japanese parallel sentences from patent descriptions
> in four categories. Participants in the patent tasks are required to
> obtain it from the WAT2019 site of the JPO Patent Corpus.
>
>  - English/Chinese/Korean <--> Japanese:
>  These tasks evaluate the performance of a translation model in the
>  same way as the other translation tasks. Unlike the previous tasks at
>  WAT2015, WAT2016 and WAT2017, the new test sets for these tasks consist
>  of (a) patent documents published between 2011 and 2013, which were
>  used in past years' WAT, and (b) ones published between 2016 and
>  2017 for each language pair. We will also evaluate performance on
>  section (a) so as to compare with systems submitted in past years'
>  WAT.
>
>  - Chinese -> Japanese expression pattern task:
>  This task evaluates the performance of a translation model for each
>  predefined category of expression patterns, which corresponds to
>  title of invention (TIT), abstract (ABS), scope of claim (CLM) or
>  description (DES). The test set of this task consists of sentences,
>  each of which is annotated with a corresponding category of expression
>  patterns.
>
> * Newswire
>
> WAT uses the JIJI Corpus, which is constructed by Jiji Press Ltd. in
> collaboration with the National Institute of Information and
> Communications Technology (NICT). This corpus consists of a
> Japanese-English news corpus of 200K parallel sentences from Jiji
> Press news in various categories. At WAT2021, the organizers have
> added a new document-level translation test set, which consists of
> manually filtered test and reference sentences and document-level
> context for the test sentences. Participants in the newswire subtask
> are required to obtain it from the WAT2021 site of the JIJI Corpus.
>
> * News Commentary
>
> WAT uses a manually aligned and cleaned Japanese <--> Russian corpus
> from the News Commentary domain to study extremely low resource
> situations for distant language pairs. The parallel corpus contains
> around 12,000 lines.  This year, we invite participants to utilize any
> existing monolingual or parallel corpora from WMT 2020 in addition to
> those listed on the WAT website. In particular, solutions focusing on
> monolingual pretraining and multilingualism are encouraged.
>
> * IT and Wikinews
>
>  - Hindi/Thai/Malay/Indonesian <--> English
>
> In collaboration with SAP and NICT, WAT is organising a pilot
> translation task to/from English to/from Hindi, Thai, Malay and
> Indonesian. The evaluation data belongs to the IT domain (Software
> Documentation) and Wikinews domain (Asian Language Treebank).
> Participants will be expected to train systems and submit translations
> for all language pairs (to and from English) and both domains using
> any existing monolingual or parallel data. Given the growing focus on
> a universal translation model for multiple languages and domains, WAT
> encourages a single multilingual and multi-domain model for all
> language pairs and both domains (IT as well as Wikinews). Additional
> details will be given on the WAT 2021 website.
>
> * Mixed domain
>
>  - Myanmar (Burmese) <--> English
>  WAT uses UCSY Corpus and ALT Corpus. The UCSY corpus and a portion of
>  the ALT corpus are used as training data, which are around 220,000
>  lines of sentences and phrases. The development and test data are
>  from the ALT corpus.
>
>  - Khmer <--> English
>  WAT uses ECCC Corpus and ALT Corpus. The ECCC corpus and a portion of
>  the ALT corpus are used as training data, which are around 120,000
> lines of sentences and phrases. The development and test data are
> from the ALT corpus.
>
> * Indic
>
>
> - Indian language <--> English multilingual translation task. This
> task is a successor to the 2018 and the 2020 tasks with major
> improvements. There has been an increase in the available datasets
> for Indian languages in the past few years, along with major advances
> in multilingual learning. The task will involve training a
> multilingual model for translation from 10 Indian languages to English
> (and vice versa). The goal is to encourage exploration of
> methods which utilize multilingualism and language relatedness to
> improve translation quality for low-resource languages while having a
> single, compact translation model. The evaluation set is 11-way
> parallel, enabling the potential evaluation of non-English-centric
> language pairs.
>
> * Multimodal
> Given the growing interest in multimodal NLP and the warm response
> from the participants for the “WAT 2019 and 2020 Multimodal
> Translation Tasks”, WAT will evaluate the following multimodal tasks:
>
>  - English --> Hindi Multimodal (Visual Genome)
>  WAT will continue organizing the multimodal English --> Hindi
>  translation task, where the input will be text and an image and the
>  output will be a caption (text). The training set contains around
>  30,000 segments. Additional details will be given on the task website.
>
>  - English --> Malayalam Multimodal (Visual Genome)
>  WAT will organize a new multimodal English --> Malayalam translation
>  task, where the input will be text and an image and the output will be
>  a caption (text). The training set contains around 30,000 segments.
>  Additional details will be given on the task website.
>
>  - Japanese <--> English Multimodal (Flickr30kEnt-JP)
>  WAT will continue the Flickr30kEnt-JP task using the corpus of the
>  same name. https://github.com/nlab-mpg/Flickr30kEnt-JP
>
>  - Arabic <--> English Multimodal (ArEnMulti30K)
>  WAT will organize a new multimodal Arabic <--> English translation
>  task, where the input will be text and an image and the output will be
>  a caption (text). The training set contains around 30,000 segments.
>  Additional details will be given on the task website.
>
>  - Japanese <--> English Multimodal (Ambiguous MS COCO)
>  WAT will organize an additional multimodal Japanese <--> English
>  translation task, where the evaluation set, Ambiguous MS COCO, will
>  focus on translation of ambiguous words and sentences. Along with the
>  Flickr30kEnt-JP dataset, the MS COCO English data may also be used.
>  Additional details will be given on the task website.
>
> EVALUATION
> ----------
>
> Automatic evaluation:
> We are providing an automatic evaluation server. It is free for
> everyone, but you need to create an account for evaluation. Just
> showing the list of evaluation results does not require an account.
>
> Sign-up: http://lotus.kuee.kyoto-u.ac.jp/WAT/WAT2021/
> Eval. result: http://lotus.kuee.kyoto-u.ac.jp/WAT/evaluation/
>
> Human evaluation:
> Both crowdsourcing evaluation and JPO adequacy evaluation will be
> carried out for selected subtasks and selected submitted systems (the
> details will be announced later).
>
> ORGANIZERS
> ----------
>
> - Toshiaki Nakazawa, The University of Tokyo, Japan [GENERAL,
> ASPEC+ParaNatCom, BSD]
> - Hideki Nakayama, The University of Tokyo, Japan [Flickr30kEnt-JP]
> - Isao Goto, Japan Broadcasting Corporation (NHK), Japan [GENERAL, JIJI]
> - Hideya Mino, Japan Broadcasting Corporation (NHK), Japan [GENERAL, JIJI]
> - Chenchen Ding, National Institute of Information and Communications
> Technology (NICT), Japan [GENERAL, ALT+UCSY, ALT+ECCC]
> - Raj Dabre, National Institute of Information and Communications
> Technology (NICT), Japan [MultiIndicMT, ALT+SAP, Global voices]
> - Anoop Kunchukuttan, Microsoft AI and Research, India [MultiIndicMT]
> - Shohei Higashiyama, National Institute of Information and
> Communications Technology (NICT), Japan [JPC]
> - Hiroshi Manabe, National Institute of Information and Communications
> Technology (NICT), Japan [GENERAL]
> - Win Pa Pa, University of Computer Studies, Yangon (UCSY), Myanmar
> [ALT+UCSY]
> - Shantipriya Parida, Idiap Research Institute, Martigny, Switzerland
> [Hindi Visual Genome, Malayalam Visual Genome]
> - Ondřej Bojar, Charles University, Prague, Czech Republic [Hindi
> Visual Genome, Malayalam Visual Genome]
> - Chenhui Chu, Kyoto University, Japan [Ambiguous MS COCO]
> - Mahmoud Al-Ayyoub, Jordan University of Science and Technology,
> Jordan [Multi30K]
> - Ali Fadel, Jordan University of Science and Technology, Jordan [Multi30K]
> - Roweida Mohammed, Jordan University of Science and Technology,
> Jordan [Multi30K]
> - Inad Aljarrah, Jordan University of Science and Technology, Jordan
> [Multi30K]
> - Akiko Eriguchi, Microsoft, USA [Restricted Translation]
> - Yusuke Oda, LegalForce, Japan [Restricted Translation]
> - Katsuhito Sudoh, Nara Institute of Science and Technology (NAIST),
> Japan [GENERAL]
> - Sadao Kurohashi, Kyoto University, Japan [GENERAL]
> - Pushpak Bhattacharyya, Indian Institute of Technology Patna (IITP),
> India [GENERAL]
>
> CONTACT
> -------
>
> wat-organi...@googlegroups.com
>
> _______________________________________________
> Mt-list site list
> Mt-list@eamt.org
> http://lists.eamt.org/mailman/listinfo/mt-list
>