Hieu,

thanks a million!

Mikel
2013/8/8, Hieu Hoang <hieu.ho...@ed.ac.uk>:
> I'm not sure exactly what is hinted at in Philipp's book. Fractional
> count is implemented for SCFG extraction in:
>
> https://github.com/moses-smt/mosesdecoder/blob/master/phrase-extract/extract-rules-main.cpp
>
> It's even on by default. It may be a reimplementation of someone else's
> fractional count formula.
>
> It's not implemented in phrase-based extraction.
>
>
> On 7 August 2013 16:17, Mikel L. Forcada <m...@dlsi.ua.es> wrote:
>
>> Hi.
>>
>> I have a question which is not directly about Moses but more generally
>> about phrase extraction in phrase-based statistical machine
>> translation. I hope it is not considered off-topic! I haven't been
>> able to easily locate a satisfactory answer.
>>
>> In state-of-the-art phrase-based machine translation, once the
>> sentence pair has been aligned, all possible phrase pairs are
>> extracted and it is assumed that each of them has been seen exactly
>> once. Counts are collected for all sentence pairs in the training
>> corpus and then used to compute a crude estimate of the translation
>> probability Phi(f|e): see Philipp Koehn's book 'Statistical Machine
>> Translation', p. 136, eq. (5.4).
>>
>> I was thinking about the possibility that Philipp himself hints at
>> after this equation, that is, considering each possible segmentation
>> completely (perfectly) covering *both* the source sentence and the
>> target sentence, counting how many such complete coverings there are
>> for that sentence pair, considering all of them equally likely, and
>> assigning the corresponding "fractional counts" to the phrase pairs
>> used in each covering, and then using the fractional counts to obtain
>> a better estimate of Phi(f|e) (which could be iteratively refined by
>> using it to estimate the likelihood of each covering, in a sort of
>> "poor man's" expectation maximization, cruder than the alignment-less
>> "rich man's" EM phrase extraction by Marcu and Wong (2002) or the
>> alignment-constrained EM phrase extraction by Birch, Callison-Burch
>> and Koehn (2006)).
>>
>> The "fractional counts" idea looks like something that could be easily
>> done, but before I explore the idea further I would appreciate it very
>> much if someone on this list could tell me if it has been done.
>>
>> Thanks a million!
>>
>> Mikel
>>
>> Mikel L. Forcada <m...@dlsi.ua.es>
>> Dept. Llenguatges i Sistemes Informàtics
>> Universitat d'Alacant, E-03071 Alacant (Spain)
>> Tel.: +34 96 590 9776 Fax: +34 96 590 9326
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>
>
> --
> Hieu Hoang
> Research Associate
> University of Edinburgh
> http://www.hoang.co.uk/hieu
>

-- 
Mikel L. Forcada             E-mail: m...@dlsi.ua.es
Departament de Llenguatges   Phone: +34-96-590-9776
i Sistemes Informàtics       also +34-96-590-3772.
UNIVERSITAT D'ALACANT        Fax: +34-96-590-9326, -3464
E-03071 ALACANT, Spain.
URL: http://www.dlsi.ua.es/~mlf
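
For readers of the archive, the "fractional counts" idea in the original question can be sketched for a single sentence pair as below. This is only an illustrative sketch under stated assumptions, not the code in extract-rules-main.cpp and not anything taken from the book: the function names (`coverings`, `fractional_counts`) and the span encoding are hypothetical, phrase pairs are assumed to be already extracted and given as half-open spans (src_start, src_end, tgt_start, tgt_end), and all complete coverings are treated as equally likely. The iterative EM-style refinement is not included.

```python
from collections import Counter
from fractions import Fraction


def coverings(pairs, src_len, tgt_len):
    """Yield every selection of phrase pairs that tiles BOTH sentences exactly.

    `pairs` holds hypothetical (src_start, src_end, tgt_start, tgt_end)
    tuples with half-open, non-empty spans. Source phrases are taken left
    to right; target phrases may appear in any order (reordering allowed).
    """
    full = (1 << tgt_len) - 1  # bitmask with every target word covered
    by_start = {}
    for p in sorted(set(pairs)):  # drop duplicates, fix iteration order
        s0, s1, t0, t1 = p
        if s1 > s0 and t1 > t0:  # ignore empty spans (assumption)
            by_start.setdefault(s0, []).append(p)

    def extend(i, used, chosen):
        if i == src_len:
            if used == full:  # target side must be fully covered too
                yield tuple(chosen)
            return
        for p in by_start.get(i, []):
            _, s1, t0, t1 = p
            mask = ((1 << (t1 - t0)) - 1) << t0
            if used & mask:  # target words already taken by another phrase
                continue
            chosen.append(p)
            yield from extend(s1, used | mask, chosen)
            chosen.pop()

    yield from extend(0, 0, [])


def fractional_counts(pairs, src_len, tgt_len):
    """Fraction of complete coverings in which each phrase pair is used."""
    total = 0
    hits = Counter()
    for cov in coverings(pairs, src_len, tgt_len):
        total += 1
        hits.update(cov)
    if total == 0:  # no complete covering exists for this sentence pair
        return {}
    return {p: Fraction(n, total) for p, n in hits.items()}


# Toy sentence pair: two source words, two target words, monotone alignment.
toy_pairs = [(0, 1, 0, 1), (1, 2, 1, 2), (0, 2, 0, 2)]
toy_counts = fractional_counts(toy_pairs, src_len=2, tgt_len=2)
print(toy_counts)
```

In the toy case there are two complete coverings (two one-word phrase pairs, or the single two-word phrase pair), so every phrase pair gets a fractional count of 1/2 instead of 1. Summing such counts over the corpus and normalising would give the alternative estimate of Phi(f|e). Note that the enumeration is exponential in the worst case; a real implementation would presumably need dynamic programming or pruning.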