Dear moses-list,

We are making available on the web the English, Czech, Finnish, German,
Latvian, Romanian, Russian, Turkish, and Chinese datasets that we used to
build ParFDA Moses SMT models for the WMT'16 (http://www.statmt.org/wmt16/)
and WMT'17 (http://www.statmt.org/wmt17/) translation tasks. They can be
downloaded from:

[WMT'17] https://drive.google.com/drive/folders/0B2k8ISN7gmi1SnA1d1gxcTQ5TTg?usp=sharing
[WMT'16] https://drive.google.com/drive/folders/0B2k8ISN7gmi1NHNTSGFrMGhfaVU?usp=sharing

WMT'16 results are in the following paper:

Ergun Bicici. *ParFDA for Instance Selection for Statistical Machine
Translation*. In *Proc. of the First Conference on Statistical Machine
Translation (WMT16)*, Berlin, Germany, August 2016. Association for
Computational Linguistics.

The datasets were selected by ParFDA for the WMT'16 and WMT'17 translation
tasks from the pool of sentences made available by the WMT organization. The
ParFDA Moses SMT results can serve as a benchmark for SMT research. The
language model corpora contain ~15M sentences, and the language models were
built using kenlm (https://kheafield.com/code/kenlm/).

License note: BSD license. The datasets also inherit the terms of the WMT
organization's license, which allows use for research purposes, and we make
them available under those terms.

ParFDA WMT SMT datasets:

   - ParFDA WMT'17 Datasets (https://github.com/bicici/ParFDAWMT17)
   - ParFDA WMT'16 Datasets (https://github.com/bicici/ParFDAWMT16)
   - ParFDA WMT'15 Datasets (https://github.com/bicici/ParFDAWMT15)
   - ParFDA WMT'14 Datasets (https://github.com/bicici/ParFDAWMT14)


Best Regards,
Ergun

TUBITAK BILGEM B3LAB Cloud Computing Laboratory
bicici.github.com