On Mon, Mar 21, 2011 at 3:19 AM, Alex Fraser <alexfra...@gmail.com> wrote:
>> 2) there seems to be some evidence that some translations in the
>> phrase table are so bad that leaving some words untranslated
>> is "better" than using what's in the phrase table. I can see an
>> argument that says that you should use the phrase table entries no
>> matter what, but my limited experiments suggest that letting the LM
>> make this call at least improves the BLEU score. Interpret that as you
>> will.
>
> With Moses and a single phrase table, this can happen if there is a
> single word that is not covered by the phrase table as a singleton,
> but it can be covered with a longer phrase. In a typical shared task
> dev or test set with grow-diag-final-and GIZA alignments, this is
> restricted to about 5 to 10 words. I think it is possible that for
> these 5 to 10 words pass-through directly competes with translation
> (in Moses); but I haven't carefully checked this. I instead noticed
> that KenLM liked to output things that were missing from my LM, so
> this is similar to the first scenario Chris outlined.
>
> Chris -- with respect to the second scenario (I quoted above) - it
> wasn't clear to me if you have tried allowing pass-through for a
> larger set of words than these 5 to 10 words? How do you build your
> open-class LM? I assume this matters a lot.
I allow pass-through of all words, with a penalty whose weight is also
tuned by MERT. For the open-class LM, I use the -unk option in SRILM,
which reserves a bit of probability mass for OOVs. Exactly what it
does is a bit unclear to me (it's more than just replacing singletons
with <unk>, but that's probably a reasonable approximation).
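
To make the "replace singletons with <unk>" approximation concrete, here
is a minimal sketch. It illustrates only that approximation, not what
SRILM does internally. The function name replace_singletons, the
min_count parameter, and the toy corpus are all made up for illustration.

```python
from collections import Counter

UNK = "<unk>"  # unknown-word token, written the way SRILM writes it


def replace_singletons(sentences, min_count=2):
    """Map every token seen fewer than min_count times to <unk>.

    This is only the rough approximation described above, not SRILM's
    actual -unk behaviour. `sentences` is a list of token lists.
    """
    # Count each token over the whole training text.
    counts = Counter(tok for sent in sentences for tok in sent)
    # Keep frequent tokens, replace rare ones with the unknown-word token.
    return [
        [tok if counts[tok] >= min_count else UNK for tok in sent]
        for sent in sentences
    ]


# Hypothetical two-sentence training text.
corpus = [
    "the cat sat".split(),
    "the dog sat".split(),
]
mapped = replace_singletons(corpus)
for sent in mapped:
    print(" ".join(sent))
```

An n-gram LM trained on the mapped text sees <unk> as an ordinary word
with real counts, so an OOV passed through by the decoder gets a
non-trivial LM score instead of being treated as impossible.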

hth,
Chris

>
> Cheers, Alex
>
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
