Taylor,

If you don't have a background in NLP or CL (or even if you do), I highly
recommend taking a look at Philipp's book "Statistical Machine Translation".
I hope this doesn't come across as RTFM. That's not what I mean. :)

Cheers,
Lane

On Tue, Sep 20, 2011 at 3:45 PM, Taylor Rose <tr...@languageintelligence.com> wrote:
> What would happen if I just multiplied the Direct Phrase Translation
> probability φ(e|f) by the Direct Lexical weight Lex(e|f)? That seems
> like it would work? Sorry if I'm asking dumb questions. I come from the
> computational side of computational linguistics. I'm learning as fast as
> I can.
> --
> Taylor Rose
> Machine Translation Intern
> Language Intelligence
> IRC: Handle: trose
>      Server: freenode
>
>
> On Tue, 2011-09-20 at 12:11 -0400, Burger, John D. wrote:
>> Taylor Rose wrote:
>>
>>> So what exactly can I infer from the metrics in the phrase table? I want
>>> to be able to compare phrases to each other. From my experience,
>>> multiplying them and sorting by that number has given me more accurate
>>> phrases... Obviously calling that metric "probability" is wrong. My
>>> question is: What is that metric best indicative of?
>>
>> That product has no principled interpretation that I can think of. Phrase
>> pairs with high values on all four features will obviously have high-value
>> products, but that's only interesting because all the features happen to be
>> roughly monotonic in phrase quality. If you wanted a more principled way to
>> rank the phrases, I'd just use the MERT weights for those features, and
>> combine them with a dot product.
>>
>> Pre-filtering the phrase table is something lots of people have looked at,
>> and there are many approaches to this. I like this paper:
>>
>>   Improving Translation Quality by Discarding Most of the Phrasetable
>>   Johnson, John Howard; Martin, Joel; Foster, George; Kuhn, Roland
>>   http://nparc.cisti-icist.nrc-cnrc.gc.ca/npsi/ctrl?action=shwart&index=an&req=5763542
>>
>> - JB
>>
>>> On Tue, 2011-09-20 at 16:14 +0100, Miles Osborne wrote:
>>>> Exactly, the only correct way to get real probabilities out would be
>>>> to compute the normalising constant and renormalise the dot products
>>>> for each phrase pair.
>>>>
>>>> Remember that this is best thought of as a set of scores, weighted
>>>> such that the relative proportions of each model are balanced.
>>>>
>>>> Miles
>>>>
>>>> On 20 September 2011 16:07, Burger, John D. <j...@mitre.org> wrote:
>>>>> Taylor Rose wrote:
>>>>>
>>>>>> I am looking at pruning phrase tables for the experiment I'm working on.
>>>>>> I'm not sure if it would be a good idea to include the 'penalty' metric
>>>>>> when calculating probability. It is my understanding that multiplying 4
>>>>>> or 5 of the metrics from the phrase table would result in a probability
>>>>>> of the phrase being correct. Is this a good understanding or am I
>>>>>> missing something?
>>>>>
>>>>> I don't think this is correct. At runtime all the features from the
>>>>> phrase table and a number of other features, some only available during
>>>>> decoding, are combined in an inner product with a weight vector to score
>>>>> partial translations. I believe it's fair to say that at no point is
>>>>> there an explicit modeling of "a probability of the phrase being
>>>>> correct", at least not in isolation from the partially translated
>>>>> sentence. This is not to say you couldn't model this yourself, of
>>>>> course.
>>>>>
>>>>> - John Burger
>>>>>   MITRE

--
When a place gets crowded enough to require ID's, social collapse is not
far away. It is time to go elsewhere. The best thing about space travel
is that it made it possible to go elsewhere.
        -- R.A. Heinlein, "Time Enough For Love"

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
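For anyone who wants to try this on their own table: below is a minimal
Python sketch of the ranking John describes (a dot product of tuned feature
weights with the log feature values) and of the per-source renormalisation
Miles mentions. The weights, the assumed field layout, and the helper names
are illustrative placeholders, not anything taken from the thread or from the
Moses code; real weights would come from your tuned moses.ini, and the number
and order of scores per line depends on your configuration.

    import math
    from collections import defaultdict

    # Placeholder weights standing in for the tuned (MERT) phrase-table
    # feature weights; in practice, copy the real values from moses.ini.
    WEIGHTS = [0.2, 0.2, 0.2, 0.2]

    def score(features, weights=WEIGHTS):
        # Log-linear score: dot product of weights with log feature values
        # (clamped away from zero to avoid math domain errors).
        return sum(w * math.log(max(f, 1e-12))
                   for w, f in zip(weights, features))

    def read_phrase_table(path):
        # Assumes Moses-style lines: "src ||| tgt ||| s1 s2 s3 s4 ..."
        with open(path, encoding="utf-8") as fh:
            for line in fh:
                fields = line.rstrip("\n").split(" ||| ")
                if len(fields) < 3:
                    continue
                src, tgt, scores = fields[0], fields[1], fields[2]
                yield src, tgt, [float(s) for s in scores.split()[:len(WEIGHTS)]]

    def rank(path):
        # All phrase pairs sorted by log-linear score, best first.
        scored = [(score(feats), src, tgt)
                  for src, tgt, feats in read_phrase_table(path)]
        return sorted(scored, reverse=True)

    def renormalise(path):
        # To read the scores as probabilities, renormalise over the
        # candidate target phrases for each source phrase.
        by_src = defaultdict(list)
        for src, tgt, feats in read_phrase_table(path):
            by_src[src].append((tgt, math.exp(score(feats))))
        probs = {}
        for src, cands in by_src.items():
            z = sum(v for _, v in cands)
            probs[src] = [(tgt, v / z) for tgt, v in cands]
        return probs

For example, rank("phrase-table") returns phrase pairs sorted by the weighted
score. Multiplying the raw probabilities, as asked earlier in the thread, is
the same sum of logs but with every weight fixed at 1, which is why the
product tracks phrase quality roughly without being a probability itself.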