Hi Ergun:

I think the -skipoovs option would just drop every n-gram score that contains an OOV word, rather than using a skip-ngram LM model.

An easy way to test this is to run ngram with that option and compute the log probability of a sentence that contains an OOV word: if the OOV n-grams are simply dropped, the sentence should get a deceptively high score. Please correct me if I'm wrong... Below are rough sketches of that test, of the oov-feature setup Philipp suggested, and of the OOV-replacement preprocessing Ergun mentioned (all three come from the thread quoted underneath).
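Something along these lines should show it (a rough sketch only; the model and test-file names are placeholders, and -debug 2 prints the per-word log probabilities so you can see which n-grams get skipped):

    # Score a sentence containing OOVs, once without and once with -skipoovs;
    # model.arpa and oov-sentence.txt are placeholder names.
    ngram -lm model.arpa -order 5 -ppl oov-sentence.txt -debug 2
    ngram -lm model.arpa -order 5 -ppl oov-sentence.txt -debug 2 -skipoovs

If I'm right, the second run simply omits the OOV n-grams from the sum, so the total log prob comes out higher (less negative) than it should.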
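For reference, my reading of Philipp's oov-feature suggestion (quoted below) would look roughly like this in moses.ini. This is only a sketch with a KenLM-style feature line; the feature name, path, order, and the LM weight 0.5 are placeholders from an imaginary setup:

    [feature]
    # LM feature line with the OOV count feature switched on
    KENLM name=LM0 factor=0 path=/path/to/model.arpa order=5 oov-feature=1

    [weight]
    # two values now: the usual LM weight, plus the new OOV feature
    # weight set to 0 as Philipp suggests
    LM0= 0.5 0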
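And Ergun's preprocessing idea (also quoted below: replace every OOV token with <unk> before sending the input for translation) could be a few lines of Python. A sketch, assuming a plain-text vocabulary file with one in-vocabulary word per line; all file and script names are made up:

    import sys

    # vocab.txt is a hypothetical file listing one in-vocabulary word per line.
    with open("vocab.txt", encoding="utf-8") as f:
        vocab = {line.strip() for line in f}

    # Read tokenized sentences from stdin; replace OOV tokens with <unk>.
    for line in sys.stdin:
        tokens = line.split()
        print(" ".join(t if t in vocab else "<unk>" for t in tokens))

e.g. run it as: python replace_oov.py < input.txt > input.unk.txt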
2016-01-15 14:07 GMT+00:00 Ergun Bicici <ergun.bic...@dfki.de>:

> Dear Jie,
>
> There may be an option for this in SRILM:
> - http://www.speech.sri.com/pipermail/srilm-user/2013q2/001509.html
> - http://www.speech.sri.com/projects/srilm/manpages/ngram.1.html:
>
>   -skipoovs
>     Instruct the LM to skip over contexts that contain out-of-vocabulary
>     words, instead of using a backoff strategy in these cases.
>
> If it is not there, maybe that is for a reason...
>
> Bing appears to be fast at indexing this thread:
> http://comments.gmane.org/gmane.comp.nlp.moses.user/14570
>
> Best Regards,
> Ergun
>
> Ergun Biçici
> DFKI Projektbüro Berlin
>
> On Fri, Jan 15, 2016 at 2:37 PM, Jie Jiang <mail.jie.ji...@gmail.com> wrote:
>
>> Hi Ergun:
>>
>> The original request in Quang's post was:
>>
>>   "For instance, with the n-gram: "the <unk> house <unk> in", I would
>>   like the decoder to assign it the probability of the phrase: "the
>>   house in" (existing in the LM)."
>>
>> So each time there is a <unk> when calculating the LM score, you need
>> to look one word further.
>>
>> I believe this cannot be achieved with the current LM tools without
>> modifying the source code, which has already been clarified by Kenneth.
>>
>> 2016-01-15 13:20 GMT+00:00 Ergun Bicici <ergun.bic...@dfki.de>:
>>
>>> Dear Kenneth,
>>>
>>> In the Moses manual, the -drop-unknown switch is mentioned:
>>>
>>>   4.7.2 Handling Unknown Words
>>>   Unknown words are copied verbatim to the output. They are also
>>>   scored by the language model, and may be placed out of order.
>>>   Alternatively, you may want to drop unknown words. To do so add
>>>   the switch -drop-unknown.
>>>
>>> Alternatively, you can write a script that replaces all OOV tokens
>>> with some OOV-token identifier such as <unk> before sending the input
>>> for translation.
>>>
>>> Best Regards,
>>> Ergun
>>>
>>> Ergun Biçici
>>> DFKI Projektbüro Berlin
>>>
>>> On Fri, Jan 15, 2016 at 12:22 AM, Kenneth Heafield <mo...@kheafield.com>
>>> wrote:
>>>
>>>> Hi,
>>>>
>>>> I think oov-feature=1 just activates the OOV count feature while
>>>> leaving the LM score unchanged. So it would still include p(<unk> | in).
>>>>
>>>> One might try setting the OOV feature weight to -weight_LM *
>>>> weird_moses_internal_constant * log p(<unk>) in an attempt to cancel
>>>> out the log p(<unk>) terms. However, that won't work either, because:
>>>>
>>>> 1) It will still charge backoff penalties, b(the)b(house) in the
>>>> example.
>>>>
>>>> 2) The context will be lost each time, so it's p(house), not
>>>> p(house | the).
>>>>
>>>> If the <unk>s follow a pattern, such as appearing every other word,
>>>> one could insert them into the ARPA file, though that would waste
>>>> memory.
>>>>
>>>> I don't think there's any way to accomplish exactly what the OP asked
>>>> for without coding (though it wouldn't be that hard once one
>>>> understands how the LM infrastructure works).
>>>>
>>>> Kenneth
>>>>
>>>> On 01/14/2016 11:07 PM, Philipp Koehn wrote:
>>>> > Hi,
>>>> >
>>>> > You may get the behavior you want by adding
>>>> >   "oov-feature=1"
>>>> > to your LM specification line in moses.ini, and also adding a second
>>>> > weight with value "0" to the corresponding LM weight setting.
>>>> >
>>>> > This will then only use the scores
>>>> >   p(the|<s>)
>>>> >   p(house|<s>,the,<unk>) ---> backoff to p(house)
>>>> >   p(in|<s>,the,<unk>,house,<unk>) ---> backoff to p(in)
>>>> >
>>>> > -phi
>>>> >
>>>> > On Thu, Jan 14, 2016 at 8:25 AM, LUONG NGOC Quang
>>>> > <quangngoclu...@gmail.com> wrote:
>>>> >
>>>> >     Dear All,
>>>> >
>>>> >     I am currently using a SRILM Language Model (LM) in my Moses
>>>> >     decoder. Does anyone know how I can ask the decoder, at decoding
>>>> >     time, to skip all out-of-vocabulary words when computing the LM
>>>> >     score (instead of doing back-off)?
>>>> >
>>>> >     For instance, with the n-gram: "the <unk> house <unk> in", I
>>>> >     would like the decoder to assign it the probability of the
>>>> >     phrase: "the house in" (existing in the LM).
>>>> >
>>>> >     Do I need more options/declarations in the moses.ini file?
>>>> >
>>>> >     Any help is very much appreciated,
>>>> >
>>>> >     Best,
>>>> >     Quang
>>
>> --
>> Best regards!
>> Jie Jiang

--
Best regards!
Jie Jiang
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support