Re: [Moses-support] Handling unknown words in Moses

Philipp Koehn Mon, 09 Aug 2010 05:52:10 -0700

Hi,

this is not correct - the LM cost is included in the future cost
estimate. For an unknown word this is obviously a rather low
probability, and how low depends on whether the language model
was trained with an open or a closed vocabulary.
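
To make the point concrete, here is a minimal sketch of a future cost
table in which an unknown word's cost is charged up front. This is not
Moses code: the function name, the toy scores and the value of
LM_UNK_LOGPROB are assumptions for illustration, and feature weights
are ignored. Only the -100 penalty comes from this thread.

```python
import math

UNKNOWN_WORD_PENALTY = -100.0  # the -100 score mentioned in this thread
LM_UNK_LOGPROB = -7.0          # assumed LM log-prob for an unknown token


def future_cost_table(source, options):
    """Best (highest) log-score estimate for every source span [start, end).

    `options` maps (start, end) to the best score of any translation
    option for that span, assumed to already include its LM estimate.
    A single word with no option is treated as unknown: it is copied
    verbatim and charged the penalty plus the assumed LM estimate.
    """
    n = len(source)
    cost = [[-math.inf] * (n + 1) for _ in range(n + 1)]
    for length in range(1, n + 1):
        for start in range(0, n - length + 1):
            end = start + length
            best = options.get((start, end), -math.inf)
            if length == 1 and best == -math.inf:
                best = UNKNOWN_WORD_PENALTY + LM_UNK_LOGPROB
            # A span may also be covered by combining two smaller spans.
            for mid in range(start + 1, end):
                best = max(best, cost[start][mid] + cost[mid][end])
            cost[start][end] = best
    return cost


source = ["das", "xyzzy", "haus"]         # "xyzzy" has no translation option
options = {(0, 1): -1.0, (2, 3): -1.5}    # toy scores, assumed
cost = future_cost_table(source, options)
print(cost[1][2])  # -107.0
print(cost[0][3])  # -109.5
```

Since the -107 is already part of the estimate for every hypothesis
that has not yet covered the unknown word, postponing that word does
not make a hypothesis look any cheaper under this sketch.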


Unknown words do often cause some strange reordering, because
an unknown word creates an unknown context for the following
words, and some words may prefer more than others to appear in
such an unknown context.

-phi

On Mon, Aug 9, 2010 at 12:39 PM, Alexander Fraser
<fra...@ims.uni-stuttgart.de> wrote:
> It seems like even if this is correctly implemented, unknown words
> would be delayed until the edge of the window they are in, due to
> trying to avoid paying the high LM cost until the last minute. LM cost
> is not in the future cost, so hypotheses paying this LM cost should
> lose to hypotheses delaying the payment of this LM cost until exactly
> the point where the future cost of translating the unknown word
> becomes infinity, which is at the rightmost point where the unknown
> word is still inside of the reordering limit.
>
> Does this seem right? I think I have seen weird reordering of unknown
> words that might fit this.
>
> Cheers, Alex
>
>
> On Mon, Aug 9, 2010 at 11:46 AM, Hieu Hoang <hieuho...@gmail.com> wrote:
>>  make sure the score of the unknown word (-100) is included in your
>> calculation of future cost.
>>
>> On 09/08/2010 08:02, nghi...@comp.nus.edu.sg wrote:
>>> Hi everyone,
>>>
>>> I would like to ask for some details on how Moses handles unknown
>>> words. As I read in the tutorial, unknown words are copied verbatim
>>> to the output. However, it is not clear how the distortion limit is
>>> dealt with while copying unknown words to the output.
>>>
>>> The situation is that :
>>>
>>> With the default setting of Moses (phrase-based, distance-based reordering
>>> ...), the handling of unknown words will be postponed (as we penalize them
>>> severely : -100) until the very end.
>>>
>>> Therefore, it is likely that some unknown words are dropped and won't
>>> appear in the output (due to the reordering limit constraint, default = 6)!
>>>
>>> ==>  It means that the copying of unknown words is forced to be postponed
>>> until last, and when it finally becomes possible to do so, the reordering
>>> limit constraint intervenes and, as a result, we won't get a complete
>>> translation!
>>>
>>> This is what happened with my re-implementation of Moses and it hampered
>>> the translation quality a lot (1 or 2 BLEU points behind).
>>>
>>> However, it seems that this situation doesn't happen with Moses (i.e., Moses
>>> always finds a complete translation).
>>>
>>> I hope that someone can help me clarify it as I cannot find the relevant
>>> information anywhere.
>>>
>>> Thanks.
>>>
>>> Hoang Trong Nghia
>>> _______________________________________________
>>> Moses-support mailing list
>>> Moses-support@mit.edu
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>>
>>>
>>
>
>

