Dear colleagues,

Is there any annotated dataset on the completeness of machine translation output at the sentence level, i.e. a dataset annotated for whether every word of a given source sentence has been translated in the corresponding MT output or whether some words are missing? Anecdotal evidence suggests that an otherwise fluent machine-translated sentence may sometimes omit source elements such as negations or adjectives.

I know that the dataset from task 1 of the TQE 2019 shared task at WMT19 contains some information about words missing in target sentences, but is there any dedicated dataset on this problem?

Recommendations for any language pair are welcome.

Thank you for your comments!

Best wishes,
Michael
--

*Universität Innsbruck*
Institut für Translationswissenschaft

*Dr. Michael Ustaszewski*

Herzog-Siegmund-Ufer 15
A-6020 Innsbruck | Austria

Phone *+43 512 507 42482*
E-Mail *[email protected]*
Web *www.uibk.ac.at/translation*

_______________________________________________
Mt-list site list
[email protected]
http://lists.eamt.org/mailman/listinfo/mt-list
