Dear all,
I would like to announce the official release of the United Nations 
Parallel Corpus v1.0. The corpus was created as part of the United 
Nations commitment to multilingualism and as a reaction to the growing 
importance of statistical machine translation (SMT) within the 
Department for General Assembly and Conference Management (DGACM) 
translation services and the United Nations. It covers 25 years, from 
1990 to 2014, and contains documents in the six official languages of 
the United Nations: Arabic, Chinese, English, French, Russian, and Spanish.

The purpose of the corpus is to allow access to multilingual language 
resources and facilitate research and progress in various natural 
language processing tasks, including machine translation. For 
convenience, the corpus is also available pre-packaged as bi-texts for 
each language pair.
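
As a rough illustration of what "each language pair" means for six 
languages, the sketch below enumerates the unordered pairs. The 
two-letter codes are ISO 639-1 labels chosen for this example and the 
helper name is hypothetical; neither is taken from the corpus packaging.

```python
from itertools import combinations

# ISO 639-1 codes for the six official UN languages.
# These labels are an assumption of this sketch, not the corpus's own naming.
LANGUAGES = ["ar", "en", "es", "fr", "ru", "zh"]


def language_pairs(languages):
    """Return every unordered pair of distinct languages, sorted by code."""
    return list(combinations(sorted(languages), 2))


pairs = language_pairs(LANGUAGES)
print(len(pairs))
for first, second in pairs:
    print(f"{first}-{second}")
```

Six languages give 15 unordered pairs; counting both translation 
directions separately would double that to 30.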

A subset of the corpus is available as a six-language fully-parallel 
corpus, i.e. all sentences have equivalents in all six languages. Data 
from 2015 has been used to create official development sets and test 
sets, also fully aligned across the six official UN languages. The paper 
linked below reports SMT baselines for all language pairs for this corpus.

The corpus is available at:

http://conferences.unite.un.org/UNCorpus

The corresponding publication is available at:

http://www.lrec-conf.org/proceedings/lrec2016/pdf/1195_Paper.pdf

While registering, please leave a short description of the work for 
which you plan to use the corpus. In the near future we plan to set up a 
section with references to papers that describe research done with the UN 
corpus. Feel free to share links and bibliography items with us (either 
with me or any of the authors of the above paper).

Sorry for cross-posting,
Marcin Junczys-Dowmunt

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
