No comment.
*Best Regards,*
Ergun

Ergun Biçici
DFKI Projektbüro Berlin

On Fri, Jan 15, 2016 at 4:20 PM, Jie Jiang <mail.jie.ji...@gmail.com> wrote:

> Hi Ergun:
>
> I think the -skipoovs option would just drop all the n-gram scores that
> have an OOV in them, rather than using a skip-ngram LM model.
>
> An easy way to test it is to run it with that option to calculate the log
> probability of a sentence with an OOV; it should result in a rather high
> score.
>
> Please correct me if I'm wrong...
>
> 2016-01-15 14:07 GMT+00:00 Ergun Bicici <ergun.bic...@dfki.de>:
>
>> Dear Jie,
>>
>> There may be some option from SRILM:
>> - http://www.speech.sri.com/pipermail/srilm-user/2013q2/001509.html
>> - http://www.speech.sri.com/projects/srilm/manpages/ngram.1.html:
>>
>>   -skipoovs
>>     Instruct the LM to skip over contexts that contain out-of-vocabulary
>>     words, instead of using a backoff strategy in these cases.
>>
>> If it is not there, maybe it is for a reason...
>>
>> Bing appears fast to index this thread:
>> http://comments.gmane.org/gmane.comp.nlp.moses.user/14570
>>
>> *Best Regards,*
>> Ergun
>>
>> Ergun Biçici
>> DFKI Projektbüro Berlin
>>
>> On Fri, Jan 15, 2016 at 2:37 PM, Jie Jiang <mail.jie.ji...@gmail.com>
>> wrote:
>>
>>> Hi Ergun:
>>>
>>> The original request in Quang's post was:
>>>
>>> *For instance, with the n-gram: "the <unk> house <unk> in", I would like
>>> the decoder to assign it the probability of the phrase: "the house in"
>>> (existing in the LM).*
>>>
>>> So each time there is a <unk> when calculating the LM score, you need to
>>> look another word further.
>>>
>>> I believe that it cannot be achieved with current LM tools without
>>> modifying the source code, which has already been clarified by Kenneth.
>>>
>>> 2016-01-15 13:20 GMT+00:00 Ergun Bicici <ergun.bic...@dfki.de>:
>>>
>>>> Dear Kenneth,
>>>>
>>>> In the Moses manual, the -drop-unknown switch is mentioned:
>>>>
>>>>   4.7.2 Handling Unknown Words
>>>>   Unknown words are copied verbatim to the output. They are also scored
>>>>   by the language model, and may be placed out of order. Alternatively,
>>>>   you may want to drop unknown words. To do so, add the switch
>>>>   -drop-unknown.
>>>>
>>>> Alternatively, you can write a script that replaces all OOV tokens with
>>>> some OOV-token identifier such as <unk> before sending the text for
>>>> translation.
>>>>
>>>> *Best Regards,*
>>>> Ergun
>>>>
>>>> Ergun Biçici
>>>> DFKI Projektbüro Berlin
>>>>
>>>> On Fri, Jan 15, 2016 at 12:22 AM, Kenneth Heafield <mo...@kheafield.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I think oov-feature=1 just activates the OOV count feature while
>>>>> leaving the LM score unchanged. So it would still include p(<unk> | in).
>>>>>
>>>>> One might try setting the OOV feature weight to -weight_LM *
>>>>> weird_moses_internal_constant * log p(<unk>) in an attempt to cancel
>>>>> out the log p(<unk>) terms. However, that won't work either because:
>>>>>
>>>>> 1) It will still charge backoff penalties, b(the)b(house) in the
>>>>>    example.
>>>>>
>>>>> 2) The context will be lost each time, so it's p(house), not
>>>>>    p(house | the).
>>>>>
>>>>> If the <unk>s follow a pattern, such as appearing every other word,
>>>>> one could insert them into the ARPA file, though that would waste
>>>>> memory.
>>>>>
>>>>> I don't think there's any way to accomplish exactly what the OP asked
>>>>> for without coding (though it wouldn't be that hard once one
>>>>> understands how the LM infrastructure works).
>>>>>
>>>>> Kenneth
>>>>>
>>>>> On 01/14/2016 11:07 PM, Philipp Koehn wrote:
>>>>> > Hi,
>>>>> >
>>>>> > You may get the behavior you want by adding
>>>>> >   "oov-feature=1"
>>>>> > to your LM specification line in moses.ini, and also adding a second
>>>>> > weight with value "0" to the corresponding LM weight setting.
>>>>> >
>>>>> > This will then only use the scores
>>>>> >   p(the|<s>)
>>>>> >   p(house|<s>,the,<unk>) ---> backoff to p(house)
>>>>> >   p(in|<s>,the,<unk>,house,<unk>) ---> backoff to p(in)
>>>>> >
>>>>> > -phi
>>>>> >
>>>>> > On Thu, Jan 14, 2016 at 8:25 AM, LUONG NGOC Quang
>>>>> > <quangngoclu...@gmail.com> wrote:
>>>>> >
>>>>> >   Dear All,
>>>>> >
>>>>> >   I am currently using an SRILM Language Model (LM) in my Moses
>>>>> >   decoder. Does anyone know how I can ask the decoder, at decoding
>>>>> >   time, to skip all out-of-vocabulary words when computing the LM
>>>>> >   score (instead of doing back-off)?
>>>>> >
>>>>> >   For instance, with the n-gram: "the <unk> house <unk> in", I would
>>>>> >   like the decoder to assign it the probability of the phrase: "the
>>>>> >   house in" (existing in the LM).
>>>>> >
>>>>> >   Do I need more options/declarations in the moses.ini file?
>>>>> >
>>>>> >   Any help is very much appreciated,
>>>>> >
>>>>> >   Best,
>>>>> >   Quang
>>>
>>> --
>>> Best regards!
>>>
>>> Jie Jiang
>
> --
> Best regards!
>
> Jie Jiang
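[Editor's note] Philipp's suggestion amounts to a moses.ini change along these lines. The fragment below is illustrative only: the feature name, path, and first weight are placeholders that depend on the actual model; the relevant parts are `oov-feature=1` on the LM line and the second weight of 0.

```ini
[feature]
SRILM name=LM0 factor=0 path=/path/to/model.arpa order=5 oov-feature=1

[weight]
# second value is the OOV count feature weight, set to 0 as suggested
LM0= 0.5 0
```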
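[Editor's note] Jie's suggested sanity check for SRILM's -skipoovs (score a sentence containing an OOV with and without the flag, and compare the log probabilities) can be sketched as follows. The LM and text file names are placeholders, and this only assembles the `ngram` command line; actually running it requires SRILM's `ngram` binary on the PATH (e.g. via `subprocess.run(cmd)`).

```python
def ngram_cmd(lm="model.arpa", text="oov_sentence.txt", skipoovs=False):
    # Build an SRILM ngram invocation that reports per-sentence log
    # probabilities (-ppl with -debug 2), optionally with -skipoovs.
    cmd = ["ngram", "-lm", lm, "-ppl", text, "-debug", "2"]
    if skipoovs:
        cmd.append("-skipoovs")
    return cmd

print(" ".join(ngram_cmd(skipoovs=False)))
print(" ".join(ngram_cmd(skipoovs=True)))
```

If Jie is right, the -skipoovs run should report a noticeably higher (less negative) log probability on the OOV sentence, because the n-grams containing the OOV are simply dropped from the sum.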
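[Editor's note] Ergun's preprocessing suggestion, replacing every OOV token with `<unk>` before sending text for translation, is straightforward to script. A minimal sketch, assuming the LM vocabulary is available as a set of tokens and the input is whitespace-tokenized:

```python
def replace_oovs(sentence, vocab, unk="<unk>"):
    # Map every token outside the LM vocabulary to the unk token.
    return " ".join(tok if tok in vocab else unk for tok in sentence.split())

vocab = {"the", "house", "in"}  # toy vocabulary, for illustration only
print(replace_oovs("the big house over in", vocab))
# -> the <unk> house <unk> in
```

In practice the vocabulary would be read from the LM (e.g. the 1-gram section of the ARPA file) rather than hard-coded.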
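[Editor's note] Kenneth's two objections (backoff penalties are still charged, and the context is lost across each `<unk>`) can be seen in a toy backoff scorer. All log probabilities and backoff weights below are invented numbers for illustration, not from any real LM:

```python
# Toy ARPA-style backoff model: log p for seen n-grams, backoff weights
# for seen contexts. All numbers are made up for illustration.
logprob = {("the",): -1.0, ("house",): -1.5, ("in",): -1.2,
           ("<unk>",): -5.0, ("the", "house"): -0.5}
backoff = {("the",): -0.3, ("house",): -0.4}

def score(word, context):
    # Longest-match backoff: if the full n-gram is unseen, charge the
    # context's backoff weight and retry with a shortened context.
    ngram = context + (word,)
    if ngram in logprob:
        return logprob[ngram]
    return backoff.get(context, 0.0) + score(word, context[1:])

# With an intervening <unk>, p(house | the <unk>) falls all the way back
# to the unigram p(house); the bigram p(house | the) is never reached.
print(score("house", ("the", "<unk>")))  # -1.5 (unigram, context lost)
print(score("house", ("the",)))          # -0.5 (bigram used directly)
```

This is exactly why cancelling the log p(<unk>) terms via the OOV feature weight would not recover the score of "the house in": the backoff weights and the shortened contexts remain.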
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support