some terminology:  these are feature values, not metrics.

feature values each have a role to play, e.g. P(e | f) indicates the
chance that phrase e is the translation of phrase f.  these values are
designed to be used together, each with its own weight, to produce an
overall score for a translation choice.  this is the basis of a
log-linear model.

if you take them all and multiply them together then that is
equivalent to assuming each is equally weighted, and you end up with
something like the geometric mean of them (a sum of logs, without the
1/n divisor).  you may well be able to use the scores in the way you
suggest, but whether you get `good' or `bad' results will be down to
chance.
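
to see the equivalence, using the same made-up values as above:

    import math

    values = [0.4, 0.2, 0.3, 0.1]               # made-up feature values

    product = math.prod(values)                  # plain multiplication
    log_sum = sum(math.log(v) for v in values)   # log-linear score, all weights = 1
    geo_mean = math.exp(log_sum / len(values))   # add the 1/n divisor -> geometric mean

    # math.exp(log_sum) equals product (up to rounding), so multiplying
    # the feature values is just the uniform-weight log-linear score;
    # the geometric mean only differs by the 1/n divisor.
    print(product, math.exp(log_sum), geo_mean)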

if you want to prune the phrase table then a starting point is here:

http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc16
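
if you just want a quick-and-dirty filter before trying the supported
options described there, a rough sketch (plain Python; it assumes the
usual "src ||| tgt ||| scores ..." layout and that the score you care
about is at a particular position -- check your own table, and the
threshold is made up):

    import sys

    THRESHOLD = 0.01    # invented cutoff; tune on held-out data
    SCORE_COLUMN = 2    # which feature value to threshold on

    for line in sys.stdin:
        fields = line.rstrip("\n").split(" ||| ")
        scores = [float(s) for s in fields[2].split()]
        if scores[SCORE_COLUMN] >= THRESHOLD:
            sys.stdout.write(line)

you would run it as something like
zcat phrase-table.gz | python prune.py | gzip > pruned-table.gz,
but the filtering described at that link is the safer route.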

Miles

On 20 September 2011 16:47, Taylor Rose <tr...@languageintelligence.com> wrote:
> So what exactly can I infer from the metrics in the phrase table? I want
> to be able to compare phrases to each other. From my experience,
> multiplying them and sorting by that number has given me more accurate
> phrases... Obviously calling that metric "probability" is wrong. My
> question is: What is that metric best indicative of?
> --
> Taylor Rose
> Machine Translation Intern
> Language Intelligence
> IRC: Handle: trose
>     Server: freenode
>
>
> On Tue, 2011-09-20 at 16:14 +0100, Miles Osborne wrote:
>> exactly,  the only correct way to get real probabilities out would be
>> to compute the normalising constant and renormalise the dot products
>> for each phrase pair.
>>
>> remember that this is best thought of as a set of scores, weighted
>> such that the relative proportions of each model are balanced
>>
>> Miles
>>
>> On 20 September 2011 16:07, Burger, John D. <j...@mitre.org> wrote:
>> > Taylor Rose wrote:
>> >
>> >> I am looking at pruning phrase tables for the experiment I'm working on.
>> >> I'm not sure if it would be a good idea to include the 'penalty' metric
>> >> when calculating probability. It is my understanding that multiplying 4
>> >> or 5 of the metrics from the phrase table would result in a probability
>> >> of the phrase being correct. Is this a good understanding or am I
>> >> missing something?
>> >
>> > I don't think this is correct.  At runtime all the features from the 
>> > phrase table and a number of other features, some only available during 
>> > decoding, are combined in an inner product with a weight vector to score 
>> > partial translations.  I believe it's fair to say that at no point is 
>> > there an explicit modeling of "a probability of the phrase being correct", 
>> > at least not in isolation from the partially translated sentence.  This is 
>> > not to say you couldn't model this yourself, of course.
>> >
>> > - John Burger
>> >  MITRE
>>
>>
>>
>




_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
