On Mon, Mar 21, 2011 at 3:19 AM, Alex Fraser <alexfra...@gmail.com> wrote:
>> 2) there seems to be some evidence that some translations in the
>> phrase table are so bad that leaving some words untranslated
>> is "better" than using what's in the phrase table. I can see an
>> argument that says that you should use the phrase table entries no
>> matter what, but my limited experiments suggest that letting the LM
>> make this call at least improves the BLEU score. Interpret that as you
>> will.
>
> With Moses and a single phrase table, this can happen if there is a
> single word that is not covered by the phrase table as a singleton,
> but it can be covered with a longer phrase. In a typical shared task
> dev or test set with grow-diag-final-and GIZA alignments, this is
> restricted to about 5 to 10 words. I think it is possible that for
> these 5 to 10 words pass-through directly competes with translation
> (in Moses); but I haven't carefully checked this. I instead noticed
> that KenLM liked to output things that were missing from my LM, so
> this is similar to the first scenario Chris outlined.
>
> Chris -- with respect to the second scenario (I quoted above) - it
> wasn't clear to me if you have tried allowing pass-through for a
> larger set of words than these 5 to 10 words? How do you build your
> open-class LM? I assume this matters a lot.

I allow pass-through of all words, with a penalty that is also learned by MERT. For the open-class LM, I use the -unk option in SRILM, which reserves a bit of probability mass for OOVs. Exactly what it does is a bit unclear to me (it's more than just replacing singletons with <unk>, but that's probably a reasonable approximation).
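The two ingredients described above (a pass-through penalty tuned alongside the other feature weights, and an open-class LM that gives unseen words nonzero probability) can be illustrated with a toy sketch. It assumes a unigram LM in which training singletons are mapped to <unk>, i.e. only the rough approximation mentioned above and not what SRILM's -unk option actually does, plus a log-linear score in which pass-through is simply one more feature. All function names, weights, the source word "Zug" and the phrase-table probability are hypothetical; a real system would tune the weights with MERT rather than hard-coding them.

```python
import math
from collections import Counter

UNK = "<unk>"


def build_open_vocab_unigram(tokens):
    """Toy open-vocabulary unigram LM: every training singleton is mapped to <unk>.

    This mimics only the rough approximation discussed in the thread; it is
    NOT a re-implementation of SRILM's -unk option.
    """
    counts = Counter(tokens)
    mapped = Counter()
    for word, c in counts.items():
        mapped[UNK if c == 1 else word] += c
    total = sum(mapped.values())
    return {w: c / total for w, c in mapped.items()}


def lm_logprob(lm, words):
    # Any word outside the LM vocabulary (including passed-through source
    # words) is scored as <unk> instead of receiving zero probability.
    return sum(math.log(lm.get(w, lm[UNK])) for w in words)


def score(lm, hyp_words, tm_logprob, n_passthrough, weights):
    # Log-linear model: weighted sum of feature values. The pass-through
    # count is just another feature, so its weight can be tuned like the rest.
    return (weights["lm"] * lm_logprob(lm, hyp_words)
            + weights["tm"] * tm_logprob
            + weights["passthrough"] * n_passthrough)


def best_option(lm, source_word, translation, tm_prob, weights):
    """Choose between a phrase-table translation and copying the source word."""
    translate = score(lm, [translation], math.log(tm_prob), 0, weights)
    # Hypothetical simplification: the copied word contributes no TM score.
    copy = score(lm, [source_word], 0.0, 1, weights)
    return translation if translate >= copy else source_word


if __name__ == "__main__":
    corpus = "the house is small the house is big the cat".split()
    lm = build_open_vocab_unigram(corpus)
    # Hypothetical weights; in practice these would come from MERT.
    mild = {"lm": 1.0, "tm": 1.0, "passthrough": -0.5}
    harsh = {"lm": 1.0, "tm": 1.0, "passthrough": -10.0}
    # A poor phrase-table entry (p = 0.01) for the hypothetical source word "Zug".
    print(best_option(lm, "Zug", "is", 0.01, mild))   # prints Zug
    print(best_option(lm, "Zug", "is", 0.01, harsh))  # prints is
```

With the mild penalty the copied word wins, because the open-class LM scores it like any other <unk> and the poor translation is heavily penalised by its TM score; with a much larger penalty the translation wins instead. Letting the penalty weight be tuned is what settles this trade-off on the dev set.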
hth,
Chris

>
> Cheers, Alex
>
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support