Hi Barry
Good job. For some of the language pairs below 10k, the reported BLEU
scores are quite appealing.
Best Regards
Doren
On Wednesday, January 29, 2020, Barry Haddow wrote:
> Hi All
>
> We have released a new sentence-aligned corpus pairing English with 13
> different languages spoken in India. Up to 56k sentence pairs are
> available for each pair. The languages of India contained in the corpora
> are Assamese, Bengali, Gujarati, Hindi, Kannada, Malayalam, Manipuri,
> Marathi, Odia, Punjabi, Tamil, Telugu and Urdu. We also provide a larger
> version of the corpus, document-aligned only.
>
> The corpus is available here: http://data.statmt.org/pmindia/
>
> There is an accompanying paper which describes the construction of the
> corpus, a comparison of alignment methods, and some initial MT results.
>
> https://arxiv.org/abs/2001.09907
>
>
> Barry Haddow and Faheem Kirefu
>
>
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
> ___
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>