Hi Ergun:

I think the -skipoovs option just drops all the n-gram scores that have an
OOV in them, rather than using a skip-ngram LM.

An easy way to test this is to run it with that option to calculate the log
prob of a sentence containing an OOV; the result should be a rather high
(less negative) score, since the OOV n-grams are simply not counted.
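
For example (a quick check; model.lm and oov_test.txt are placeholder file
names, and -lm, -ppl and -debug are standard ngram options):

  ngram -lm model.lm -ppl oov_test.txt -skipoovs -debug 2

With -debug 2 the per-n-gram scores are printed, so you can see exactly
which n-grams get skipped.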

Please correct me if I'm wrong...

2016-01-15 14:07 GMT+00:00 Ergun Bicici <ergun.bic...@dfki.de>:

>
> Dear Jie,
>
> There may be a relevant option in SRILM:
> - http://www.speech.sri.com/pipermail/srilm-user/2013q2/001509.html
> - http://www.speech.sri.com/projects/srilm/manpages/ngram.1.html:
> *    -skipoovs*
> Instruct the LM to skip over contexts that contain out-of-vocabulary
> words, instead of using a backoff strategy in these cases.
>
> If it is not there, maybe that is for a reason...
>
> Bing appears to have indexed this thread quickly:
> http://comments.gmane.org/gmane.comp.nlp.moses.user/14570
>
>
> *Best Regards,*
> Ergun
>
> Ergun Biçici
> DFKI Projektbüro Berlin
>
>
> On Fri, Jan 15, 2016 at 2:37 PM, Jie Jiang <mail.jie.ji...@gmail.com>
> wrote:
>
>> Hi Ergun:
>>
>> The original request in Quang's post was:
>>
>> *For instance, with the n-gram: "the <unk> house <unk> in", I would like
>> the decoder to assign it the probability of the phrase: "the house in"
>> (existing in the LM).*
>>
>> so each time there is a <unk> when calculating the LM score, you need to
>> look one word further back to fill the context, as in the sketch below.
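>>
>> A minimal sketch of that lookup (plain Python; skip_unk_queries is a
>> hypothetical helper, not part of any LM tool):
>>
>> def skip_unk_queries(tokens, order=3):
>>     # Build (word, context) LM queries while skipping <unk>: an OOV
>>     # word is neither scored nor added to the context, so the context
>>     # simply reaches one word further back.
>>     history = ["<s>"]
>>     queries = []
>>     for t in tokens:
>>         if t == "<unk>":
>>             continue
>>         queries.append((t, tuple(history[-(order - 1):])))
>>         history.append(t)
>>     return queries
>>
>> # skip_unk_queries("the <unk> house <unk> in".split()) yields queries for
>> # p(the | <s>), p(house | <s> the), p(in | the house) -- i.e. the trigram
>> # decomposition of "the house in".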
>>
>> I believe that this cannot be achieved with current LM tools without
>> modifying the source code, which has already been clarified by Kenneth.
>>
>>
>> 2016-01-15 13:20 GMT+00:00 Ergun Bicici <ergun.bic...@dfki.de>:
>>
>>>
>>> Dear Kenneth,
>>>
>>> In the Moses manual, the -drop-unknown switch is mentioned:
>>>
>>> 4.7.2 Handling Unknown Words
>>> Unknown words are copied verbatim to the output. They are also scored by
>>> the language model, and may be placed out of order. Alternatively, you may
>>> want to drop unknown words. To do so, add the switch -drop-unknown.
>>>
>>> Alternatively, you can write a script that replaces all OOV tokens
>>> with some OOV token identifier such as <unk> before sending the input
>>> for translation.
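>>>
>>> A minimal version of such a script (assuming a vocabulary file
>>> vocab.txt with one word per line; both file names are hypothetical):
>>>
>>> import sys
>>>
>>> # Words known to the model; every other token becomes <unk>.
>>> with open("vocab.txt", encoding="utf-8") as f:
>>>     vocab = {line.split()[0] for line in f if line.strip()}
>>>
>>> for line in sys.stdin:
>>>     print(" ".join(t if t in vocab else "<unk>" for t in line.split()))
>>>
>>> Run it as: python replace_oov.py < input.tok > input.unk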
>>>
>>>
>>> *Best Regards,*
>>> Ergun
>>>
>>> Ergun Biçici
>>> DFKI Projektbüro Berlin
>>>
>>>
>>> On Fri, Jan 15, 2016 at 12:22 AM, Kenneth Heafield <mo...@kheafield.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>>         I think oov-feature=1 just activates the OOV count feature while
>>>> leaving the LM score unchanged.  So it would still include p(<unk> | in).
>>>>
>>>>         One might try setting the OOV feature weight to -weight_LM *
>>>> weird_moses_internal_constant * log p(<unk>) in an attempt to cancel out
>>>> the log p(<unk>) terms.  However, that won't work either, because:
>>>>
>>>> 1) It will still charge backoff penalties, b(the)b(house) in the
>>>> example.
>>>>
>>>> 2) The context will be lost each time, so it's p(house), not
>>>> p(house | the).
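>>>>
>>>> To make 1) and 2) concrete: assuming a trigram LM with no n-grams
>>>> containing <unk> beyond the unigram, the backoff expansion of
>>>> "the <unk> house <unk> in" is roughly
>>>>
>>>>   p(<unk> | <s> the)     = b(<s> the) b(the) p(<unk>)
>>>>   p(house | the <unk>)   = p(house)           [context lost]
>>>>   p(<unk> | <unk> house) = b(house) p(<unk>)
>>>>   p(in | house <unk>)    = p(in)              [context lost]
>>>>
>>>> so even with the p(<unk>) terms cancelled, the b(the) and b(house)
>>>> penalties and the context-free p(house) and p(in) remain.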
>>>>
>>>> If the <unk>s follow a pattern, such as appearing every other word, one
>>>> could insert them into the ARPA file, though that would waste memory.
>>>>
>>>> I don't think there's any way to accomplish exactly what the OP asked for
>>>> without coding (though it wouldn't be that hard once one understands how
>>>> the LM infrastructure works).
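>>>>
>>>> For n-best rescoring outside the decoder, the desired score can at least
>>>> be imitated with the kenlm Python module; a rough sketch, with model.arpa
>>>> as a placeholder filename:
>>>>
>>>> import kenlm
>>>>
>>>> model = kenlm.Model("model.arpa")
>>>>
>>>> def score_skipping_unk(sentence):
>>>>     # Carry the LM state across real words only: <unk> is neither
>>>>     # scored nor advances the state, so "the <unk> house <unk> in"
>>>>     # gets the score of "the house in" (without </s>).
>>>>     state, out_state = kenlm.State(), kenlm.State()
>>>>     model.BeginSentenceWrite(state)
>>>>     total = 0.0  # log10 probability
>>>>     for word in sentence.split():
>>>>         if word == "<unk>":
>>>>             continue
>>>>         total += model.BaseScore(state, word, out_state)
>>>>         state, out_state = out_state, state
>>>>     return total
>>>>
>>>> This is not a decoder integration, only an illustration of the scoring
>>>> the OP described.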
>>>>
>>>> Kenneth
>>>>
>>>> On 01/14/2016 11:07 PM, Philipp Koehn wrote:
>>>> > Hi,
>>>> >
>>>> > You may get the behavior you want by adding
>>>> >   "oov-feature=1"
>>>> > to your LM specification line in moses.ini
>>>> > and also adding a second weight with value "0" to the corresponding LM
>>>> > weight setting.
>>>> >
>>>> > This will then only use the scores
>>>> > p(the|<s>)
>>>> > p(house|<s>,the,<unk>) ---> backoff to p(house)
>>>> > p(in|<s>,the,<unk>,house,<unk>) ---> backoff to p(in)
>>>> >
>>>> > -phi
>>>> >
>>>> > On Thu, Jan 14, 2016 at 8:25 AM, LUONG NGOC Quang
>>>> > <quangngoclu...@gmail.com> wrote:
>>>> >
>>>> >     Dear All,
>>>> >
>>>> >     I am currently using an SRILM language model (LM) in my Moses
>>>> >     decoder. Does anyone know how I can ask the decoder, at decoding
>>>> >     time, to skip all out-of-vocabulary words when computing the LM
>>>> >     score (instead of backing off)?
>>>> >
>>>> >     For instance, with the n-gram: "the <unk> house <unk> in", I would
>>>> >     like the decoder to assign it the probability of the phrase: "the
>>>> >     house in" (existing in the LM).
>>>> >
>>>> >     Do I need more options/declarations in the moses.ini file?
>>>> >
>>>> >     Any help is very much appreciated,
>>>> >
>>>> >     Best,
>>>> >     Quang
>>>> >
>>
>>
>> --
>>
>> Best regards!
>>
>> Jie Jiang
>>
>>
>


-- 

Best regards!

Jie Jiang
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
