No comment.
*Best Regards,*
Ergun

Ergun Biçici
DFKI Projektbüro Berlin

On Fri, Jan 15, 2016 at 4:20 PM, Jie Jiang <mail.jie.ji...@gmail.com> wrote:

> Hi Ergun:
>
> I think the -skipoovs option would just drop all the n-gram scores that
> have an OOV in them, rather than using a skip-ngram LM model.
>
> An easy way to test it is to run it with that option to calculate the log
> probability of a sentence with an OOV; it should result in a rather high
> score.
>
> Please correct me if I'm wrong...
>
> 2016-01-15 14:07 GMT+00:00 Ergun Bicici <ergun.bic...@dfki.de>:
>
>> Dear Jie,
>>
>> There may be some option from SRILM:
>> - http://www.speech.sri.com/pipermail/srilm-user/2013q2/001509.html
>> - http://www.speech.sri.com/projects/srilm/manpages/ngram.1.html:
>>
>>   -skipoovs
>>     Instruct the LM to skip over contexts that contain out-of-vocabulary
>>     words, instead of using a backoff strategy in these cases.
>>
>> If it is not there, maybe it is for a reason...
>>
>> Bing appears fast to index this thread:
>> http://comments.gmane.org/gmane.comp.nlp.moses.user/14570
>>
>> *Best Regards,*
>> Ergun
>>
>> Ergun Biçici
>> DFKI Projektbüro Berlin
>>
>> On Fri, Jan 15, 2016 at 2:37 PM, Jie Jiang <mail.jie.ji...@gmail.com>
>> wrote:
>>
>>> Hi Ergun:
>>>
>>> The original request in Quang's post was:
>>>
>>> *For instance, with the n-gram: "the <unk> house <unk> in", I would like
>>> the decoder to assign it the probability of the phrase: "the house in"
>>> (existing in the LM).*
>>>
>>> So each time there is a <unk> when calculating the LM score, you need to
>>> look another word further.
>>>
>>> I believe that it cannot be achieved with current LM tools without
>>> modifying the source code, which has already been clarified by Kenneth.
>>>
>>> 2016-01-15 13:20 GMT+00:00 Ergun Bicici <ergun.bic...@dfki.de>:
>>>
>>>> Dear Kenneth,
>>>>
>>>> In the Moses manual, the -drop-unknown switch is mentioned:
>>>>
>>>>   4.7.2 Handling Unknown Words
>>>>   Unknown words are copied verbatim to the output. They are also scored
>>>>   by the language model, and may be placed out of order. Alternatively,
>>>>   you may want to drop unknown words. To do so, add the switch
>>>>   -drop-unknown.
>>>>
>>>> Alternatively, you can write a script that replaces all OOV tokens with
>>>> some OOV-token identifier such as <unk> before sending the text for
>>>> translation.
>>>>
>>>> *Best Regards,*
>>>> Ergun
>>>>
>>>> Ergun Biçici
>>>> DFKI Projektbüro Berlin
>>>>
>>>> On Fri, Jan 15, 2016 at 12:22 AM, Kenneth Heafield <mo...@kheafield.com>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I think oov-feature=1 just activates the OOV count feature while
>>>>> leaving the LM score unchanged. So it would still include p(<unk> | in).
>>>>>
>>>>> One might try setting the OOV feature weight to -weight_LM *
>>>>> weird_moses_internal_constant * log p(<unk>) in an attempt to cancel
>>>>> out the log p(<unk>) terms. However, that won't work either because:
>>>>>
>>>>> 1) It will still charge backoff penalties, b(the)b(house) in the
>>>>>    example.
>>>>>
>>>>> 2) The context will be lost each time, so it's p(house), not
>>>>>    p(house | the).
>>>>>
>>>>> If the <unk>s follow a pattern, such as appearing every other word,
>>>>> one could insert them into the ARPA file, though that would waste
>>>>> memory.
>>>>>
>>>>> I don't think there's any way to accomplish exactly what the OP asked
>>>>> for without coding (though it wouldn't be that hard once one
>>>>> understands how the LM infrastructure works).
>>>>>
>>>>> Kenneth
>>>>>
>>>>> On 01/14/2016 11:07 PM, Philipp Koehn wrote:
>>>>> > Hi,
>>>>> >
>>>>> > You may get the behavior you want by adding
>>>>> >   "oov-feature=1"
>>>>> > to your LM specification line in moses.ini, and also adding a second
>>>>> > weight with value "0" to the corresponding LM weight setting.
>>>>> >
>>>>> > This will then only use the scores
>>>>> >   p(the|<s>)
>>>>> >   p(house|<s>,the,<unk>) ---> backoff to p(house)
>>>>> >   p(in|<s>,the,<unk>,house,<unk>) ---> backoff to p(in)
>>>>> >
>>>>> > -phi
>>>>> >
>>>>> > On Thu, Jan 14, 2016 at 8:25 AM, LUONG NGOC Quang
>>>>> > <quangngoclu...@gmail.com> wrote:
>>>>> >
>>>>> >   Dear All,
>>>>> >
>>>>> >   I am currently using an SRILM Language Model (LM) in my Moses
>>>>> >   decoder. Does anyone know how I can ask the decoder, at decoding
>>>>> >   time, to skip all out-of-vocabulary words when computing the LM
>>>>> >   score (instead of doing back-off)?
>>>>> >
>>>>> >   For instance, with the n-gram: "the <unk> house <unk> in", I would
>>>>> >   like the decoder to assign it the probability of the phrase: "the
>>>>> >   house in" (existing in the LM).
>>>>> >
>>>>> >   Do I need more options/declarations in the moses.ini file?
>>>>> >
>>>>> >   Any help is very much appreciated,
>>>>> >
>>>>> >   Best,
>>>>> >   Quang
>>>
>>> --
>>> Best regards!
>>>
>>> Jie Jiang
>
> --
> Best regards!
>
> Jie Jiang
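[Editor's note] Philipp's suggestion amounts to a moses.ini change along these lines. The fragment below is illustrative only: the feature name, path, and first weight are placeholders that depend on the actual model; the relevant parts are `oov-feature=1` on the LM line and the second weight of 0.

```ini
[feature]
SRILM name=LM0 factor=0 path=/path/to/model.arpa order=5 oov-feature=1

[weight]
# second value is the OOV count feature weight, set to 0 as suggested
LM0= 0.5 0
```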
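[Editor's note] Jie's suggested sanity check for SRILM's -skipoovs (score a sentence containing an OOV with and without the flag, and compare the log probabilities) can be sketched as follows. The LM and text file names are placeholders, and this only assembles the `ngram` command line; actually running it requires SRILM's `ngram` binary on the PATH (e.g. via `subprocess.run(cmd)`).

```python
def ngram_cmd(lm="model.arpa", text="oov_sentence.txt", skipoovs=False):
    # Build an SRILM ngram invocation that reports per-sentence log
    # probabilities (-ppl with -debug 2), optionally with -skipoovs.
    cmd = ["ngram", "-lm", lm, "-ppl", text, "-debug", "2"]
    if skipoovs:
        cmd.append("-skipoovs")
    return cmd

print(" ".join(ngram_cmd(skipoovs=False)))
print(" ".join(ngram_cmd(skipoovs=True)))
```

If Jie is right, the -skipoovs run should report a noticeably higher (less negative) log probability on the OOV sentence, because the n-grams containing the OOV are simply dropped from the sum.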
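[Editor's note] Ergun's preprocessing suggestion, replacing every OOV token with `<unk>` before sending text for translation, is straightforward to script. A minimal sketch, assuming the LM vocabulary is available as a set of tokens and the input is whitespace-tokenized:

```python
def replace_oovs(sentence, vocab, unk="<unk>"):
    # Map every token outside the LM vocabulary to the unk token.
    return " ".join(tok if tok in vocab else unk for tok in sentence.split())

vocab = {"the", "house", "in"}  # toy vocabulary, for illustration only
print(replace_oovs("the big house over in", vocab))
# -> the <unk> house <unk> in
```

In practice the vocabulary would be read from the LM (e.g. the 1-gram section of the ARPA file) rather than hard-coded.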
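[Editor's note] Kenneth's two objections (backoff penalties are still charged, and the context is lost across each `<unk>`) can be seen in a toy backoff scorer. All log probabilities and backoff weights below are invented numbers for illustration, not from any real LM:

```python
# Toy ARPA-style backoff model: log p for seen n-grams, backoff weights
# for seen contexts. All numbers are made up for illustration.
logprob = {("the",): -1.0, ("house",): -1.5, ("in",): -1.2,
           ("<unk>",): -5.0, ("the", "house"): -0.5}
backoff = {("the",): -0.3, ("house",): -0.4}

def score(word, context):
    # Longest-match backoff: if the full n-gram is unseen, charge the
    # context's backoff weight and retry with a shortened context.
    ngram = context + (word,)
    if ngram in logprob:
        return logprob[ngram]
    return backoff.get(context, 0.0) + score(word, context[1:])

# With an intervening <unk>, p(house | the <unk>) falls all the way back
# to the unigram p(house); the bigram p(house | the) is never reached.
print(score("house", ("the", "<unk>")))  # -1.5 (unigram, context lost)
print(score("house", ("the",)))          # -0.5 (bigram used directly)
```

This is exactly why cancelling the log p(<unk>) terms via the OOV feature weight would not recover the score of "the house in": the backoff weights and the shortened contexts remain.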
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support