Jan, Terence, Jesús,

Thank you for your feedback.
I'm glad to know that research is being pursued in this direction as well. It's quite surprising (to me, at least) that BLEU is actually recommended only for tuning SMT systems (by Philipp Koehn himself!).

Best regards,
Vadim

----- Original Message -----
From: Vadim Berman
To: [email protected]
Sent: Wednesday, October 27, 2010 10:51 PM
Subject: [Mt-list] Meaning-based MT evaluation

Hello group,

We all know well that words are not just sets of characters. They stand for real-world entities. Sadly, it appears that the mainstream metrics have little regard for semantics. OK, NIST has something like semantic weight, and METEOR tries to match synonyms. I am intrigued by the less-known BADGER, but it appears to have adopted more or less the same approach. Still, the metrics themselves appear to treat the content as string data with some semantic characteristics. (I am sure the developers of the metrics are present in this group, and I will be happy to hear their comments.)

The "similarity to human translation" is not an exact science: there can be many different correct (or equally incorrect) human translations. J. L. Borges's essay "The Translators of the One Thousand and One Nights" (http://books.google.com.au/books?id=vLC5luAnbSUC&lpg=PR8&dq=%22The%20Translators%20of%20the%20One%20Thousand%20and%20One%20Nights%22%20borges&pg=PA94#v=onepage&q=%22The%20Translators%20of%20the%20One%20Thousand%20and%20One%20Nights%22%20borges&f=false - sorry, that's the best I could find) is an excellent illustration. A professional translator is not a mathematical unit.

What's constant then? The main idea, the semantic relations. The agent, the patient, the circumstance. These should stay put in the source and the target content.

Back in the 1960s, the ALPAC people noted that there are two variables in play: intelligibility and fidelity / trustworthiness. For some reason, fidelity seems to be largely overlooked. That's bizarre, because MT is about gisting, isn't it?
OK, everybody has heard about "Heath Ledger died" becoming "Tom Cruise died" in English -> Spanish translation by Google Translate. 100% readability, 0% fidelity. A pretty dangerous combination. (Worse than the opposite, BTW: disinformation is worse than no information.) This actually happens quite often in SMT, and when the agent / patient must be swapped in translation, more often than not they stay put, resulting in the opposite meaning. Yet the metrics do not really reflect this well enough. So what if "not" is missing? It's just one small particle.

I'm just throwing ideas. Any critique / comments are welcome:

1. What if there was a metric that would output not a single figure, but a multidimensional vector of values?
2. The metric would try to parse the source and target content, and convert it to a set of semantic nodes, which would then be inspected.
3. The values would be:
   a. semantic distance from the agent and patient semantic nodes in the source content
   b. semantic distance from the action
   c. same for circumstances (time, location), and indirect objects
   d. attributes like adjectives, adverbs, and some determiners, and the correctness of their relative positions
   e. intelligibility: ratio between the "parseability" (?) of the target content and the "parseability" of the source content
   f. choice of lexicon: the choice of words in the current domain. For instance, the same concept may have a synonym not appropriate in the current context (e.g. a racial slur)

Does any of this make sense?

Best regards,
Vadim Berman
Digital Sonata Pty Ltd
www.digitalsonata.com
Australian Business Number: 54 122 188 998
Address: PO Box 803, Camberwell, Vic 3124, Australia
Phone: +61 (0)3 98094461
Mobile: +61 (0)432 894 862
Skype: vadimberman

------------------------------------------------------------------------------
_______________________________________________
Mt-list mailing list
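
A rough illustration of the vector-valued metric proposed in the original message follows. It is only a sketch under heavy assumptions, not an existing metric: the parser is assumed to exist already, each sentence is reduced to a hypothetical role-to-concept dictionary (`Frame`), the semantic distance is a 0/1 placeholder (`exact_distance`), negation is modelled as an assumed `polarity` role grouped with the action, and the parseability scores are supplied by the caller. The relative position of attributes (item d) and the choice of lexicon (item f) are left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

# Hypothetical representation: a parsed sentence reduced to role -> concept.
Frame = Dict[str, str]


def exact_distance(a: str, b: str) -> float:
    """Placeholder distance: 0.0 for identical concepts, 1.0 otherwise."""
    return 0.0 if a == b else 1.0


@dataclass
class MeaningScore:
    """One value per dimension instead of a single figure (0.0 = no deviation)."""
    agent_patient: float
    action: float
    circumstances: float
    attributes: float
    intelligibility: float  # ratio of target to source parseability


def role_distance(src: Frame, tgt: Frame, roles: Sequence[str],
                  dist: Callable[[str, str], float]) -> float:
    """Average distance over the roles that occur in either frame.

    A role present on one side only counts as the maximum distance (1.0).
    """
    present = [r for r in roles if r in src or r in tgt]
    if not present:
        return 0.0
    total = 0.0
    for r in present:
        if r in src and r in tgt:
            total += dist(src[r], tgt[r])
        else:
            total += 1.0
    return total / len(present)


def score(src: Frame, tgt: Frame,
          src_parseability: float, tgt_parseability: float,
          dist: Callable[[str, str], float] = exact_distance) -> MeaningScore:
    """Compare a source frame with a target frame, dimension by dimension."""
    ratio = tgt_parseability / src_parseability if src_parseability else 0.0
    return MeaningScore(
        agent_patient=role_distance(src, tgt, ("agent", "patient"), dist),
        # "polarity" is an assumed role so that a dropped "not" is visible.
        action=role_distance(src, tgt, ("action", "polarity"), dist),
        circumstances=role_distance(
            src, tgt, ("time", "location", "indirect_object"), dist),
        attributes=role_distance(src, tgt, ("attribute",), dist),
        intelligibility=ratio,
    )


# The "Heath Ledger died" -> "Tom Cruise died" case: fluent but unfaithful.
source = {"patient": "heath_ledger", "action": "die"}
target = {"patient": "tom_cruise", "action": "die"}
result = score(source, target, src_parseability=1.0, tgt_parseability=1.0)
print(result)
```

In this toy case the intelligibility ratio stays at 1.0 while the agent / patient distance is at its maximum of 1.0, which is exactly the combination a single string-overlap figure tends to hide. A real implementation would need an actual semantic parser and a graded distance (for example over an ontology) in place of the placeholders.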
