Dear colleagues,
Is there an annotated dataset on the completeness of machine
translation output at the sentence level, i.e. a dataset annotated for
whether all words of a given source sentence have been translated in
the corresponding MT output or whether any are missing? Anecdotal
evidence suggests that an otherwise fluent machine-translated target
sentence sometimes omits source elements, e.g. negations or adjectives.
I know that the dataset from Task 1 of the quality estimation shared
task at WMT19 contains some information about words missing from target
sentences, but is there a dataset dedicated to this problem?
Recommendations for any language pair are welcome.
Thank you for your comments!
Best wishes,
Michael
--
*Universität Innsbruck*
Institut für Translationswissenschaft
*Dr. Michael Ustaszewski*
Herzog-Siegmund-Ufer 15
A-6020 Innsbruck | Austria
Phone: +43 512 507 42482
E-Mail: [email protected]
Web: www.uibk.ac.at/translation
_______________________________________________
Mt-list site list
[email protected]
http://lists.eamt.org/mailman/listinfo/mt-list