
Re: [Moses-support] Skip OOV when computing Language Model score

2016-01-18 Thread LUONG NGOC Quang

Dear All,

Thank you all for your contributions.

Actually, I am using an LM trained on data that is not identical to the
target side of the phrase table (it has a much more limited vocabulary, for
my own purposes), so I don't think the -drop-unknown option would help.

As Jie also emphasized, my objective is to jump one word further when
encountering <unk>, and so on. That would match "house <unk> in" to "house
in" in the LM without doing anything else (e.g. backoff). And that is not
exactly what the setting oov-feature=1 does!

I also observed that the -skipoovs option assigns zero probability to all
n-grams containing an OOV and therefore does not count them in the overall
sentence LM score.

So far I am increasingly convinced that modifying the code is the way to
accomplish my goal, although it is not straightforward for me at present.

Best,
Quang
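
A minimal Python sketch of the behaviour requested above, for illustration only. It assumes a toy LM stored as a dictionary with made-up log10 probabilities and no backoff weights; the names TOY_LM, longest_match and score_skipping_oov are hypothetical and are not part of Moses, SRILM or KenLM.

```python
# Toy LM: n-gram tuple -> log10 probability. All numbers are invented.
TOY_LM = {
    ("the",): -1.0,
    ("house",): -2.0,
    ("in",): -1.5,
    ("the", "house"): -0.5,
    ("house", "in"): -0.7,
    ("the", "house", "in"): -0.3,
}
# Every word that appears in some n-gram of the toy LM.
VOCAB = {word for ngram in TOY_LM for word in ngram}


def longest_match(context, word, order=3):
    """Log10 prob of the longest n-gram ending in `word` that TOY_LM contains.

    Simplification: no backoff weights are charged when the context shrinks.
    """
    context = context[-(order - 1):] if order > 1 else []
    for start in range(len(context) + 1):
        key = tuple(context[start:]) + (word,)
        if key in TOY_LM:
            return TOY_LM[key]
    raise KeyError(word)


def score_skipping_oov(tokens, order=3):
    """Score a sentence as if its OOV tokens were not there at all."""
    known = [t for t in tokens if t in VOCAB]  # "<unk>" is not in VOCAB
    return sum(longest_match(known[:i], w, order) for i, w in enumerate(known))


with_oov = score_skipping_oov("the <unk> house <unk> in".split())
without_oov = score_skipping_oov("the house in".split())
print(with_oov, without_oov)
```

Under these assumptions both sentences receive the same score, which is the effect asked for: "the <unk> house <unk> in" is scored as "the house in". A real implementation would have to do this inside the LM state handling of the decoder rather than on whole sentences.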




Re: [Moses-support] Skip OOV when computing Language Model score

2016-01-15 Thread Ergun Bicici

No comment.



*Best Regards,*
Ergun

Ergun Biçici
DFKI Projektbüro Berlin



Re: [Moses-support] Skip OOV when computing Language Model score

2016-01-15 Thread Jie Jiang

Hi Ergun:

I think the -skipoovs option just drops the scores of all n-grams that
contain an OOV, rather than using a skip-n-gram LM.

An easy way to test this is to run it with that option to calculate the log
probability of a sentence containing OOVs; it should produce a rather high
score.

Please correct me if I'm wrong...


Re: [Moses-support] Skip OOV when computing Language Model score

2016-01-15 Thread Ergun Bicici

Dear Jie,

There may be some option from SRILM:
- http://www.speech.sri.com/pipermail/srilm-user/2013q2/001509.html
- http://www.speech.sri.com/projects/srilm/manpages/ngram.1.html:
*-skipoovs*
Instruct the LM to skip over contexts that contain out-of-vocabulary words,
instead of using a backoff strategy in these cases.

If it is not there, maybe that is for a reason...

Bing appears to index this thread quickly:
http://comments.gmane.org/gmane.comp.nlp.moses.user/14570


*Best Regards,*
Ergun

Ergun Biçici
DFKI Projektbüro Berlin


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

Re: [Moses-support] Skip OOV when computing Language Model score

2016-01-15 Thread Kenneth Heafield

It depends on what the OP meant by OOV.  If it's phrase-table OOV then
-drop-unknown will work.  If it's language model OOV then it won't.

However, if the target language model(s) contain the target side of the
phrase table, then language model OOV implies phrase table OOV.

Kenneth
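
The distinction above can be checked with a small Python sketch. It assumes the target sides of the phrase table and the LM vocabulary have already been extracted into plain Python collections; the function name lm_oov_on_target_side and the example data are hypothetical.

```python
def lm_oov_on_target_side(target_phrases, lm_vocab):
    """Return the target-side words of the phrase table that the LM does not know.

    target_phrases: iterable of target phrases as whitespace-separated strings.
    lm_vocab: iterable of words known to the language model.
    """
    target_words = {w for phrase in target_phrases for w in phrase.split()}
    return sorted(target_words - set(lm_vocab))


# Invented example data, only for illustration.
missing = lm_oov_on_target_side(["the house", "in the garden"],
                                {"the", "house", "in"})
print(missing)
```

If the returned list is empty, the LM covers the target side of the phrase table, so any LM OOV must also be a phrase-table OOV and -drop-unknown removes it. If it is not empty, the decoder can produce words that the LM treats as OOV, and -drop-unknown will not remove them.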


Re: [Moses-support] Skip OOV when computing Language Model score

2016-01-15 Thread Jie Jiang

Hi Ergun:

The original request in Quang's post was:

*For instance, with the n-gram: "the <unk> house <unk> in", I would like
the decoder to assign it the probability of the phrase: "the house in"
(existing in the LM).*

so each time there is a <unk> when calculating the LM score, you need to
look one word further.

I believe that this cannot be achieved with current LM tools without
modifying the source code, which has already been clarified by Kenneth.




-- 

Best regards!

Jie Jiang

Re: [Moses-support] Skip OOV when computing Language Model score

2016-01-15 Thread Ergun Bicici

Dear Kenneth,

In the Moses manual, -drop-unknown switch is mentioned:

4.7.2 Handling Unknown Words
Unknown words are copied verbatim to the output. They are also scored by
the language model, and may be placed out of order. Alternatively, you may
want to drop unknown words. To do so add the switch -drop-unknown.

Alternatively, you can write a script that replaces all OOV tokens with
some OOV-token-identifier such as  before sending for translation.


*Best Regards,*
Ergun

Ergun Biçici
DFKI Projektbüro Berlin



Re: [Moses-support] Skip OOV when computing Language Model score

2016-01-14 Thread Kenneth Heafield

Hi,

I think oov-feature=1 just activates the OOV count feature while
leaving LM score unchanged.  So it would still include p(<unk> | in).

One might try setting the OOV feature weight to -weight_LM *
weird_moses_internal_constant * log p(<unk>) in an attempt to cancel out
the log p(<unk>) terms.  However that won't work either because:

1) It will still charge backoff penalties, b(the)b(house) in the example.

2) The context will be lost each time so it's p(house) not p(house | the).

If the <unk>s follow a pattern, such as appearing every other word, one
could insert them into the ARPA file though that would waste memory.

I don't think there's any way to accomplish exactly what OP asked for
without coding (though it wouldn't be that hard once one understands how
the LM infrastructure works).

Kenneth
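
The two objections above can be reproduced with a toy bigram backoff model in Python. This is only a sketch under stated assumptions: the table LM, its log10 probabilities and backoff weights are invented, the helper names are hypothetical, and the Moses-internal scaling constant is ignored.

```python
UNK = "<unk>"

# Toy backoff LM: n-gram -> (log10 prob, log10 backoff weight). Invented numbers.
LM = {
    (UNK,): (-3.0, 0.0),
    ("the",): (-1.0, -0.4),
    ("house",): (-2.0, -0.3),
    ("in",): (-1.5, -0.2),
    ("the", "house"): (-0.5, 0.0),
    ("house", "in"): (-0.7, 0.0),
}


def backoff_logprob(context, word):
    """Standard backoff lookup for one word given its (already truncated) context."""
    ngram = tuple(context) + (word,)
    if ngram in LM:
        return LM[ngram][0]
    if not context:
        raise KeyError(word)
    # Unseen n-gram: charge the backoff weight of the context (0.0 if the
    # context itself is unseen), then retry with a shorter context.
    penalty = LM[tuple(context)][1] if tuple(context) in LM else 0.0
    return penalty + backoff_logprob(context[1:], word)


def sentence_logprob(tokens, order=2):
    """Sum of per-word backoff scores; unknown words are mapped to <unk>."""
    mapped = [t if (t,) in LM else UNK for t in tokens]
    total = 0.0
    for i, word in enumerate(mapped):
        context = mapped[max(0, i - (order - 1)):i]
        total += backoff_logprob(context, word)
    return total


with_oov = sentence_logprob("the <unk> house <unk> in".split())
# Attempt to cancel the two log p(<unk>) terms, as an OOV feature weight might.
cancelled = with_oov - 2 * LM[(UNK,)][0]
wanted = sentence_logprob("the house in".split())
print(cancelled, wanted)
```

In this toy setup the cancelled score still contains the backoff weights b(the) and b(house), and it uses p(house) and p(in) instead of p(house | the) and p(in | house), so it stays lower than the score of "the house in".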


Re: [Moses-support] Skip OOV when computing Language Model score

2016-01-14 Thread Philipp Koehn

Hi,

You may get the behavior you want by adding
  "oov-feature=1"
to your LM specification line in moses.ini
and also add a second weight with value "0" to the corresponding LM weight
setting.

This will then only use the scores
p(the|<s>)
p(house|<s>,the,<unk>) ---> backoff to p(house)
p(in|<s>,the,<unk>,house,<unk>) ---> backoff to p(in)

-phi

On Thu, Jan 14, 2016 at 8:25 AM, LUONG NGOC Quang 
wrote:

> Dear All,
>
> I am currently using an SRILM Language Model (LM) in my Moses decoder. Does
> anyone know how I can ask the decoder, at decoding time, to skip all
> out-of-vocabulary words when computing the LM score (instead of doing
> back-off)?
>
> For instance, with the n-gram: "the <unk> house <unk> in", I would like
> the decoder to assign it the probability of the phrase: "the house in"
> (existing in the LM).
>
> Do I need more options/declarations in the moses.ini file?
>
> Any help is very much appreciated,
>
> Best,
> Quang
