Re: [Mt-list] Meaning-based MT evaluation

Vadim Berman Thu, 28 Oct 2010 02:40:58 -0700

Jan, Terence, Jesús,

Thank you for your feedback.

I'm glad to know that research is being pursued in this direction as well. I
was quite surprised to learn that BLEU is actually recommended only for tuning
SMT systems (by Philipp Koehn himself!).

Best regards,
Vadim
----- Original Message -----
From: Vadim Berman
To: [email protected]
Sent: Wednesday, October 27, 2010 10:51 PM
Subject: [Mt-list] Meaning-based MT evaluation

Hello group,

We all know well that words are not just sets of characters. They stand for
real-world entities. Sadly, it appears that the mainstream metrics have little
regard for semantics. OK, NIST has something like semantic weight, and METEOR
tries to match synonyms. I am intrigued by the lesser-known BADGER, but it
appears to have adopted more or less the same approach.

However, it appears that the metrics themselves are looking at the content as
string data with some semantic characteristics. (I am sure the developers of
the metrics are present in this group, and I will be happy to hear their
comments.)

The "similarity to human translation" is not an exact science: there can be
many different correct (or equally incorrect) human translations. J. L.
Borges's essay "The Translators of the One Thousand and One Nights"
(http://books.google.com.au/books?id=vLC5luAnbSUC&lpg=PR8&dq=%22The%20Translators%20of%20the%20One%20Thousand%20and%20One%20Nights%22%20borges&pg=PA94#v=onepage&q=%22The%20Translators%20of%20the%20One%20Thousand%20and%20One%20Nights%22%20borges&f=false
- sorry, that's the best I could find) is an excellent illustration. A
professional translator is not a mathematical unit.

What's constant then? The main idea, the semantic relations. The agent, the
patient, the circumstance. These should stay put in the source and the target
content.

Back in the 1960s, the ALPAC people noted that there are two variables in play:
intelligibility and fidelity / trustworthiness. For some reason, fidelity
seems to be largely overlooked. That's bizarre, because MT is about gisting,
isn't it?

OK, everybody has heard about "Heath Ledger died" becoming "Tom Cruise died" in
English -> Spanish translation by Google Translate. 100% readability, 0%
fidelity. A pretty dangerous combination. (Worse than the opposite, BTW:
disinformation is worse than no information.) This actually happens quite
often in SMT, and when the agent / patient must be swapped in translation, more
often than not they stay put, resulting in the opposite meaning. Yet the metrics
do not really reflect this well enough. So what if "not" is missing? It's just
one small particle.
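
To make the point about the missing "not" concrete, here is a rough sketch of
clipped n-gram precision, the building block of BLEU-style scoring. It is not
full BLEU (no brevity penalty, no averaging over n-gram orders), and the
function name and the three example sentences are made up for illustration.

```python
from collections import Counter


def ngram_precision(candidate, reference, n=1):
    """Clipped n-gram precision of a candidate against a single reference."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    total = sum(cand_ngrams.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    matched = sum(min(count, ref_ngrams[gram]) for gram, count in cand_ngrams.items())
    return matched / total


# Illustrative sentences (assumptions, not taken from any real MT output).
reference = "the dog did not bite the man"
dropped_negation = "the dog did bite the man"   # opposite meaning
swapped_roles = "the man did not bite the dog"  # agent and patient swapped

for name, sentence in [("dropped negation", dropped_negation),
                       ("swapped roles", swapped_roles)]:
    p1 = ngram_precision(sentence, reference, 1)
    p2 = ngram_precision(sentence, reference, 2)
    print(f"{name}: unigram={p1:.2f} bigram={p2:.2f}")
```

Both distorted sentences keep a unigram precision of 1.0 and lose only one
bigram each, even though neither preserves the meaning of the reference.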

I'm just throwing ideas around. Any critique / comments are welcome:

1. What if there was a metric that would output not a single figure, but a
multidimensional vector of values?
2. The metric would try to parse the source and target content, and convert
it to a set of semantic nodes, which would then be inspected.
3. The values would be:
   a. semantic distance from the agent and patient semantic nodes in the
      source content
   b. semantic distance from the action
   c. same for circumstances (time, location), and indirect objects
   d. attributes like adjectives, adverbs, and some determiners, and the
      correctness of their relative positions
   e. intelligibility: ratio of the "parseability" (?) of the target content
      to the "parseability" of the source content
   f. choice of lexicon: the choice of words in the current domain. For
      instance, the same concept may have a synonym not appropriate in the
      current context (e.g. a racial slur)
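
A minimal sketch of what such a vector-valued result might look like. It
covers only the agent, action, patient and circumstance values. Everything
here is an assumption made for illustration: the `Frame` and `FidelityVector`
types, the concept IDs, and the crude 0/1 distance. It also assumes that both
sides have already been parsed into language-independent concepts, which is
the hard part and is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """Hypothetical parse result: one concept ID per semantic role."""
    agent: str
    action: str
    patient: str
    circumstances: frozenset = frozenset()
    negated: bool = False


@dataclass(frozen=True)
class FidelityVector:
    """One value per dimension instead of a single score (0.0 = identical)."""
    agent_distance: float
    action_distance: float
    patient_distance: float
    circumstance_distance: float


def concept_distance(a, b):
    """Toy stand-in for a semantic distance: 0.0 if equal, 1.0 otherwise."""
    return 0.0 if a == b else 1.0


def set_distance(a, b):
    """Jaccard distance between two sets of concept IDs."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def evaluate(source, target):
    """Compare a source frame with a target frame, role by role."""
    return FidelityVector(
        agent_distance=concept_distance(source.agent, target.agent),
        # Negation is folded into the action so a dropped "not" is visible.
        action_distance=concept_distance((source.action, source.negated),
                                         (target.action, target.negated)),
        patient_distance=concept_distance(source.patient, target.patient),
        circumstance_distance=set_distance(source.circumstances,
                                           target.circumstances),
    )


source = Frame("DOG", "BITE", "MAN", frozenset({"YESTERDAY"}))
swapped = Frame("MAN", "BITE", "DOG", frozenset({"YESTERDAY"}))
negated = Frame("DOG", "BITE", "MAN", frozenset({"YESTERDAY"}), negated=True)

print(evaluate(source, swapped))
print(evaluate(source, negated))
```

With this layout a swapped agent / patient shows up in two separate
dimensions and a flipped negation in a third, rather than being averaged
away. A real version would replace the 0/1 distance with a graded one, for
example a distance over an ontology.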

Does any of this make sense?

Best regards,
Vadim Berman

Digital Sonata Pty Ltd
www.digitalsonata.com
Australian Business Number: 54 122 188 998

Address: PO Box 803, Camberwell, Vic 3124, Australia
Phone: +61 (0)3 98094461
Mobile: +61 (0)432 894 862
Skype: vadimberman

------------------------------------------------------------------------------

_______________________________________________
Mt-list mailing list
