Re: [Moses-support] A question about phrase extraction

Mikel L. Forcada Thu, 08 Aug 2013 06:42:10 -0700

Hieu,
thanks a million!

Mikel


2013/8/8, Hieu Hoang <hieu.ho...@ed.ac.uk>:
> I'm not sure exactly what is hinted at in Philipp's book. Fractional
> counting is implemented for SCFG extraction in:
>
> https://github.com/moses-smt/mosesdecoder/blob/master/phrase-extract/extract-rules-main.cpp
> It's even on by default. It may be a reimplementation of someone else's
> fractional count formula.
>
> It's not implemented in phrase-based extraction.
>
>
> On 7 August 2013 16:17, Mikel L. Forcada <m...@dlsi.ua.es> wrote:
>
>> Hi.
>>
>> I have a question which is not directly about Moses but more generally
>> about phrase extraction in phrase-based statistical machine translation.
>> I hope it is not considered off-topic! I haven't been able to easily
>> locate a satisfactory answer.
>>
>> In state-of-the-art phrase-based machine translation, once the sentence
>> pair has been aligned, all possible phrase pairs are extracted and it is
>> assumed that all of them have been seen exactly once. Counts are
>> collected for all sentence pairs in the training corpus and then used to
>> compute a crude estimate of the translation probability Phi(f|e) (see
>> Philipp Koehn's book 'Statistical Machine Translation', p. 136,
>> eq. (5.4)). I was thinking about the possibility that Philipp himself
>> hints at after this equation, that is, considering each possible
>> segmentation completely (perfectly) covering *both* the source sentence
>> and the target sentence, counting how many such complete coverings there
>> are for that sentence pair, considering all of them equally likely,
>> assigning the corresponding "fractional counts" to the phrase pairs used
>> in each covering, and then using the fractional counts to obtain a
>> better estimate of Phi(f|e). This estimate could be iteratively refined
>> by using it to estimate the likelihood of each covering, in a sort of
>> "poor man's" expectation maximization, cruder than the alignment-less
>> "rich man's" EM phrase extraction by Marcu and Wong (2002) or the
>> alignment-constrained EM phrase extraction by Birch, Callison-Burch and
>> Koehn (2006).
>>
>> The "fractional counts" idea looks like something that could be easily
>> done, but before I explore the idea further I would appreciate it very
>> much if someone on this list could tell me if it has been done.
>>
>> Thanks a million!
>>
>> Mikel
>>
>> Mikel L. Forcada <m...@dlsi.ua.es>
>> Dept. Llenguatges i Sistemes Informàtics
>> Universitat d'Alacant, E-03071 Alacant (Spain)
>> Tel.: +34 96 590 9776    Fax: +34 96 590 9326
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>
>
>
> --
> Hieu Hoang
> Research Associate
> University of Edinburgh
> http://www.hoang.co.uk/hieu
>
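For concreteness, the fractional-count idea from the quoted question can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions, not the formula used in extract-rules-main.cpp (which I have not checked against it): the consistent phrase pairs of a sentence pair are assumed to be given already as half-open word spans, unaligned words are ignored, and the names coverings, fractional_counts and relative_frequency are hypothetical helpers made up for this illustration. The last function is the usual relative-frequency estimate, Phi(f|e) = count(e,f) / sum over f' of count(e,f').

```python
from collections import defaultdict
from fractions import Fraction


def coverings(src_len, tgt_len, pairs):
    """Yield every set of phrase pairs that tiles both sentences exactly.

    `pairs` holds ((f_start, f_end), (e_start, e_end)) half-open spans.
    """
    by_start = defaultdict(list)
    for src_span, tgt_span in pairs:
        by_start[src_span[0]].append((src_span, tgt_span))

    def extend(pos, used, chosen):
        if pos == src_len:
            # Source is tiled; accept only if the target is fully covered too.
            if len(used) == tgt_len:
                yield list(chosen)
            return
        for src_span, tgt_span in by_start.get(pos, ()):
            tgt_positions = frozenset(range(*tgt_span))
            if used & tgt_positions:
                continue  # target words already used by another phrase pair
            chosen.append((src_span, tgt_span))
            yield from extend(src_span[1], used | tgt_positions, chosen)
            chosen.pop()

    yield from extend(0, frozenset(), [])


def fractional_counts(src, tgt, pairs):
    """Give each covering weight 1/N and sum the weights per phrase pair."""
    all_coverings = list(coverings(len(src), len(tgt), pairs))
    counts = defaultdict(Fraction)
    if not all_coverings:
        return counts
    weight = Fraction(1, len(all_coverings))
    for covering in all_coverings:
        for (fs, fe), (es, ee) in covering:
            counts[(" ".join(src[fs:fe]), " ".join(tgt[es:ee]))] += weight
    return counts


def relative_frequency(counts):
    """Phi(f|e) = count(e, f) / sum over f' of count(e, f')."""
    totals = defaultdict(Fraction)
    for (f, e), c in counts.items():
        totals[e] += c
    return {(f, e): c / totals[e] for (f, e), c in counts.items()}


# Toy sentence pair with three consistent phrase pairs (made-up example).
src = ["la", "casa"]
tgt = ["the", "house"]
pairs = [((0, 1), (0, 1)), ((1, 2), (1, 2)), ((0, 2), (0, 2))]

counts = fractional_counts(src, tgt, pairs)
phi = relative_frequency(counts)
```

In this toy example there are two complete coverings ("la"/"the" plus "casa"/"house", and "la casa"/"the house" on its own), so each of the three phrase pairs receives a fractional count of 1/2 instead of the usual count of 1. In a real corpus the counts would be summed over all sentence pairs before normalizing, and the resulting Phi(f|e) could be fed back to reweight the coverings as described in the question.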


-- 
Mikel L. Forcada                    E-mail: m...@dlsi.ua.es
Departament de Llenguatges          Phone: +34-96-590-9776
i Sistemes Informàtics                also +34-96-590-3772.
UNIVERSITAT D'ALACANT               Fax:   +34-96-590-9326, -3464
E-03071 ALACANT, Spain.

URL: http://www.dlsi.ua.es/~mlf

