Re: [Moses-support] Phrase probabilities
on a related note, you don't even have to use probabilities as features in the phrase-table. for instance, using counts(e|f) and counts(f|e), instead of p(e|f) and p(f|e), gives ok translation. The features really are just scores.

using probabilities:
  devtest2006:  27.55 BLEU-c ; 28.29 BLEU
  nc-dev2007:   22.26 BLEU-c ; 23.46 BLEU
  avg:          24.91 BLEU-c ; 25.88 BLEU

using counts:
  devtest2006:  27.36 BLEU-c ; 28.11 BLEU
  nc-dev2007:   21.64 BLEU-c ; 22.90 BLEU
  avg:          24.50 BLEU-c ; 25.51 BLEU

On 20/09/2011 22:14, Miles Osborne wrote:
> exactly, the only correct way to get real probabilities out would be
> to compute the normalising constant and renormalise the dot products
> for each phrase pair.
>
> remember that this is best thought of as a set of scores, weighted
> such that the relative proportions of each model are balanced
>
> Miles

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
Re: [Moses-support] Phrase probabilities
The weights are in the Moses config file that is produced by the MERT scripts.

Cheers,
Lane

On Wed, Sep 21, 2011 at 9:45 AM, Taylor Rose wrote:
> Thanks for the information Kevin. Where would I find these feature
> weights? I've found files in Moses that I suspect might be the weights
> but they're not labeled and the file/directory names don't really help
> either.
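A minimal sketch of pulling the phrase-table weights out of such a config file, for anyone who wants to script this. It assumes an INI-like layout in which a `[weight-t]` section lists one weight per line; that section name and layout are assumptions about the config format and should be checked against your own moses.ini. The sample text and its values are made up for illustration, and `read_ini_sections` and `read_weights` are hypothetical helper names.

```python
def read_ini_sections(text):
    """Group the non-empty, non-comment lines of an INI-like file by [section]."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return sections


def read_weights(text, section="weight-t"):
    """Return the numbers listed under one section (empty list if absent).

    The default section name is an assumption, not taken from the thread.
    """
    lines = read_ini_sections(text).get(section, [])
    return [float(value) for line in lines for value in line.split()]


# Hypothetical excerpt; the values are invented.
SAMPLE = """\
# phrase-table weights (made-up values)
[weight-t]
0.2
0.1
0.3
0.1
0.2

[weight-w]
-1.0
"""

weights = read_weights(SAMPLE)
print(weights)
```

In practice you would pass the contents of your own tuned config, e.g. `read_weights(open(path).read())`, where `path` points at the file your MERT run produced.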
Re: [Moses-support] Phrase probabilities
Thanks for the information Kevin. Where would I find these feature weights? I've found files in Moses that I suspect might be the weights but they're not labeled and the file/directory names don't really help either.
--
Taylor Rose
Machine Translation Intern
Language Intelligence
IRC: Handle: trose
     Server: freenode

On Tue, 2011-09-20 at 23:32 -0400, Kevin Gimpel wrote:
> Hey Taylor,
> Sounds like you are trying to come up with a simple heuristic for
> scoring phrase table entries for purposes of pruning. Many choices are
> possible here, so it's good to check the literature as folks mentioned
> above. But as far as I know there's no single optimal answer for this.
> Typically researchers try a few things and use the approach that gives
> the best results on the task at hand. But while there's no single
> correct answer, here are some suggestions:
> If you have trained weights for the features, you should definitely
> use those weights (as Miles suggested). So this would involve
> computing the dot product of the features and weights as follows:
> score(f, e) = \theta_1 * log(p(e | f)) + \theta_2 * log(lex(e | f)) +
> \theta_3 * log(p(f | e)) + \theta_4 * log(lex(f | e))
> where the thetas are the learned weights for each of the phrase table
> features.
> Note that the phrase table typically stores the feature values as
> probabilities, and Moses takes logs internally before computing the
> dot product. So you should take logs yourself before multiplying by
> the feature weights.
> If you don't have feature weights, using uniform weights is
> reasonable.
> And regarding your original question above: since the phrase penalty
> feature has the same value for all phrase pairs, it shouldn't affect
> pruning, right?
> HTH,
> Kevin
Re: [Moses-support] Phrase probabilities
Hey Taylor,
Sounds like you are trying to come up with a simple heuristic for scoring phrase table entries for purposes of pruning. Many choices are possible here, so it's good to check the literature as folks mentioned above. But as far as I know there's no single optimal answer for this. Typically researchers try a few things and use the approach that gives the best results on the task at hand. But while there's no single correct answer, here are some suggestions:

If you have trained weights for the features, you should definitely use those weights (as Miles suggested). So this would involve computing the dot product of the features and weights as follows:

score(f, e) = \theta_1 * log(p(e | f)) + \theta_2 * log(lex(e | f)) +
              \theta_3 * log(p(f | e)) + \theta_4 * log(lex(f | e))

where the thetas are the learned weights for each of the phrase table features.

Note that the phrase table typically stores the feature values as probabilities, and Moses takes logs internally before computing the dot product. So you should take logs yourself before multiplying by the feature weights.

If you don't have feature weights, using uniform weights is reasonable.

And regarding your original question above: since the phrase penalty feature has the same value for all phrase pairs, it shouldn't affect pruning, right?

HTH,
Kevin

On Tue, Sep 20, 2011 at 4:21 PM, Lane Schwartz wrote:
> Taylor,
>
> If you don't have a background in NLP or CL (or even if you do), I
> highly recommend taking a look at Philipp's book "Statistical Machine
> Translation"
>
> I hope this doesn't come across as RTFM. That's not what I mean. :)
>
> Cheers,
> Lane
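The score(f, e) formula in Kevin's message can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Moses code: the four feature values are taken in the order the formula lists them (p(e|f), lex(e|f), p(f|e), lex(f|e)), which may not match the column order of a real phrase table; every value is assumed to be strictly positive so the log is defined; and the entries, the uniform weights, and the names `phrase_score` and `prune` are hypothetical.

```python
import math


def phrase_score(features, weights):
    """Dot product of the weights with the logs of the feature values."""
    if len(features) != len(weights):
        raise ValueError("need exactly one weight per feature")
    return sum(w * math.log(x) for w, x in zip(weights, features))


def prune(entries, weights, keep):
    """Keep the `keep` best-scoring target phrases for each source phrase."""
    by_source = {}
    for source, target, features in entries:
        scored = (phrase_score(features, weights), target)
        by_source.setdefault(source, []).append(scored)
    return {
        source: [target for _, target in sorted(scored, reverse=True)[:keep]]
        for source, scored in by_source.items()
    }


# Made-up entries: (source phrase, target phrase, four feature values).
entries = [
    ("maison", "house", (0.6, 0.5, 0.7, 0.4)),
    ("maison", "home", (0.3, 0.2, 0.2, 0.3)),
    ("maison", "the house", (0.05, 0.1, 0.6, 0.2)),
]

# Uniform weights, the fallback suggested when no tuned weights exist.
uniform = (1.0, 1.0, 1.0, 1.0)

pruned = prune(entries, uniform, keep=2)
print(pruned)
```

With tuned weights you would pass those instead of `uniform`. The phrase penalty is left out on purpose: as noted above, a feature with the same value for every phrase pair shifts all scores equally and cannot change the ranking.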
Re: [Moses-support] Phrase probabilities
Taylor,

If you don't have a background in NLP or CL (or even if you do), I highly recommend taking a look at Philipp's book "Statistical Machine Translation".

I hope this doesn't come across as RTFM. That's not what I mean. :)

Cheers,
Lane

On Tue, Sep 20, 2011 at 3:45 PM, Taylor Rose wrote:
> What would happen if I just multiplied the Direct Phrase Translation
> probability φ(e|f) by the Direct Lexical weight Lex(e|f)? That seems
> like it would work? Sorry if I'm asking dumb questions. I come from the
> computational side of computational linguistics. I'm learning as fast as
> I can.

--
When a place gets crowded enough to require ID's, social collapse is not
far away. It is time to go elsewhere. The best thing about space travel
is that it made it possible to go elsewhere.
-- R.A. Heinlein, "Time Enough For Love"
Re: [Moses-support] Phrase probabilities
What would happen if I just multiplied the Direct Phrase Translation probability φ(e|f) by the Direct Lexical weight Lex(e|f)? That seems like it would work? Sorry if I'm asking dumb questions. I come from the computational side of computational linguistics. I'm learning as fast as I can.
--
Taylor Rose
Machine Translation Intern
Language Intelligence
IRC: Handle: trose
     Server: freenode

On Tue, 2011-09-20 at 12:11 -0400, Burger, John D. wrote:
> That product has no principled interpretation that I can think of. Phrase
> pairs with high values on all four features will obviously have high value
> products, but that's only interesting because all the features happen to be
> roughly monotonic in phrase quality. If you wanted a more principled way to
> rank the phrases, I'd just use the MERT weights for those features, and
> combine them with a dot product.
>
> Pre-filtering the phrase table is something lots of people have looked at,
> and there are many approaches to this. I like this paper:
>
> Improving Translation Quality by Discarding Most of the Phrasetable
> Johnson, John Howard; Martin, Joel; Foster, George; Kuhn, Roland
>
> http://nparc.cisti-icist.nrc-cnrc.gc.ca/npsi/ctrl?action=shwart&index=an&req=5763542
>
> - JB
Re: [Moses-support] Phrase probabilities
some terminology: these are feature values, not metrics. feature values have a number of roles to play, eg P(e | f) indicates the chance that phrase e should be the translation of phrase f.

these values are designed to be used together, and weighted to produce an overall score for a translation choice. this is the basis of a log-linear model.

if you take them all and multiply them together then I guess that is equivalent to assuming each is equally weighted and that you have something like the geometric mean of them (a sum of logs, without the divisor).

you may well be able to use the scores in the way you suggest, but whether you have `good' or `bad' results will be by chance.

if you want to prune the phrase table then a starting point is here:

http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc16

Miles

On 20 September 2011 16:47, Taylor Rose wrote:
> So what exactly can I infer from the metrics in the phrase table? I want
> to be able to compare phrases to each other. From my experience,
> multiplying them and sorting by that number has given me more accurate
> phrases... Obviously calling that metric "probability" is wrong. My
> question is: What is that metric best indicative of?

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
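The point about multiplying the feature values can be checked numerically. The sketch below, using made-up feature values and hypothetical names, ranks three candidates by the plain product, by the sum of logs (a log-linear score with every weight set to 1), and by the geometric mean; all three give the same order because each is a monotonic transform of the others.

```python
import math


def product(values):
    """Plain product of the feature values."""
    return math.prod(values)


def log_sum(values):
    """Log-linear score with every weight equal to 1."""
    return sum(math.log(v) for v in values)


def geometric_mean(values):
    """exp of the log sum divided by the number of features."""
    return math.exp(log_sum(values) / len(values))


# Invented feature values for three candidate phrase pairs.
candidates = {
    "a": (0.6, 0.5, 0.7, 0.4),
    "b": (0.3, 0.2, 0.2, 0.3),
    "c": (0.05, 0.1, 0.6, 0.2),
}


def ranking(score):
    """Candidate names ordered from best to worst under `score`."""
    return sorted(candidates, key=lambda name: score(candidates[name]), reverse=True)


print(ranking(product), ranking(log_sum), ranking(geometric_mean))
```

What the product cannot express is unequal weighting of the features, which is what tuned weights add.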
Re: [Moses-support] Phrase probabilities
Taylor Rose wrote:

> So what exactly can I infer from the metrics in the phrase table? I want
> to be able to compare phrases to each other. From my experience,
> multiplying them and sorting by that number has given me more accurate
> phrases... Obviously calling that metric "probability" is wrong. My
> question is: What is that metric best indicative of?

That product has no principled interpretation that I can think of. Phrase pairs with high values on all four features will obviously have high value products, but that's only interesting because all the features happen to be roughly monotonic in phrase quality. If you wanted a more principled way to rank the phrases, I'd just use the MERT weights for those features, and combine them with a dot product.

Pre-filtering the phrase table is something lots of people have looked at, and there are many approaches to this. I like this paper:

Improving Translation Quality by Discarding Most of the Phrasetable
Johnson, John Howard; Martin, Joel; Foster, George; Kuhn, Roland

http://nparc.cisti-icist.nrc-cnrc.gc.ca/npsi/ctrl?action=shwart&index=an&req=5763542

- JB
Re: [Moses-support] Phrase probabilities
So what exactly can I infer from the metrics in the phrase table? I want to be able to compare phrases to each other. From my experience, multiplying them and sorting by that number has given me more accurate phrases... Obviously calling that metric "probability" is wrong. My question is: What is that metric best indicative of?
--
Taylor Rose
Machine Translation Intern
Language Intelligence
IRC: Handle: trose
     Server: freenode

On Tue, 2011-09-20 at 16:14 +0100, Miles Osborne wrote:
> exactly, the only correct way to get real probabilities out would be
> to compute the normalising constant and renormalise the dot products
> for each phrase pair.
>
> remember that this is best thought of as a set of scores, weighted
> such that the relative proportions of each model are balanced
>
> Miles
Re: [Moses-support] Phrase probabilities
exactly, the only correct way to get real probabilities out would be to compute the normalising constant and renormalise the dot products for each phrase pair.

remember that this is best thought of as a set of scores, weighted such that the relative proportions of each model are balanced

Miles

On 20 September 2011 16:07, Burger, John D. wrote:
> I don't think this is correct. At runtime all the features from the phrase
> table and a number of other features, some only available during decoding,
> are combined in an inner product with a weight vector to score partial
> translations. I believe it's fair to say that at no point is there an
> explicit modeling of "a probability of the phrase being correct", at least
> not in isolation from the partially translated sentence. This is not to say
> you couldn't model this yourself, of course.
>
> - John Burger
> MITRE
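A minimal sketch of what renormalising the dot products could look like. The message does not say which set of phrase pairs the normalising constant is computed over; the sketch assumes it is all candidate targets for a single source phrase. The scores are made-up log-linear scores and `renormalise` is a hypothetical name.

```python
import math


def renormalise(scores):
    """Exponentiate each score and divide by the total so the results sum to 1.

    The largest score is subtracted first for numerical stability; that
    rescales numerator and denominator alike and leaves the ratios unchanged.
    """
    top = max(scores.values())
    exps = {name: math.exp(score - top) for name, score in scores.items()}
    total = sum(exps.values())
    return {name: value / total for name, value in exps.items()}


# Invented weighted log-scores for the candidate targets of one source phrase.
scores = {"house": -2.5, "home": -5.6, "the house": -7.4}

probs = renormalise(scores)
print(probs)
```

The ordering of the candidates is unchanged by this step, so for ranking or pruning alone the raw scores are enough; renormalising only matters if you need values that behave like probabilities.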
Re: [Moses-support] Phrase probabilities
Taylor Rose wrote:

> I am looking at pruning phrase tables for the experiment I'm working on.
> I'm not sure if it would be a good idea to include the 'penalty' metric
> when calculating probability. It is my understanding that multiplying 4
> or 5 of the metrics from the phrase table would result in a probability
> of the phrase being correct. Is this a good understanding or am I
> missing something?

I don't think this is correct. At runtime all the features from the phrase table and a number of other features, some only available during decoding, are combined in an inner product with a weight vector to score partial translations. I believe it's fair to say that at no point is there an explicit modeling of "a probability of the phrase being correct", at least not in isolation from the partially translated sentence. This is not to say you couldn't model this yourself, of course.

- John Burger
  MITRE
[Moses-support] Phrase probabilities
I am looking at pruning phrase tables for the experiment I'm working on. I'm not sure if it would be a good idea to include the 'penalty' metric when calculating probability. It is my understanding that multiplying 4 or 5 of the metrics from the phrase table would result in a probability of the phrase being correct. Is this a good understanding or am I missing something?
--
Taylor Rose
Machine Translation Intern
Language Intelligence
IRC: Handle: trose
     Server: freenode