Thanks for the information, Kevin. Where would I find these feature
weights? I've found files in Moses that I suspect might be the weights,
but they're not labeled and the file/directory names don't really help
either.
-- 
Taylor Rose
Machine Translation Intern
Language Intelligence
IRC: Handle: trose
     Server: freenode


On Tue, 2011-09-20 at 23:32 -0400, Kevin Gimpel wrote:
> Hey Taylor,
> Sounds like you are trying to come up with a simple heuristic for
> scoring phrase table entries for purposes of pruning. Many choices are
> possible here, so it's good to check the literature as folks mentioned
> above. But as far as I know there's no single optimal answer for this.
> Typically researchers try a few things and use the approach that gives
> the best results on the task at hand. But while there's no single
> correct answer, here are some suggestions: 
> If you have trained weights for the features, you should definitely
> use those weights (as Miles suggested). So this would involve
> computing the dot product of the features and weights as follows:
> score(f, e) = \theta_1 * log(p(e | f)) + \theta_2 * log(lex(e | f)) +
> \theta_3 * log(p(f | e)) + \theta_4 * log(lex(f | e))
> where the thetas are the learned weights for each of the phrase table
> features.
> Note that the phrase table typically stores the feature values as
> probabilities, and Moses takes logs internally before computing the
> dot product.  So you should take logs yourself before multiplying by
> the feature weights.
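> A rough sketch of that computation in Python, with made-up feature
> values and placeholder weights (this isn't Moses code, just an
> illustration of the arithmetic):
>
>     import math
>
>     def phrase_score(features, weights):
>         # features holds the four probabilities as stored in the phrase
>         # table; take their logs before the dot product with the weights
>         return sum(w * math.log(p) for w, p in zip(weights, features))
>
>     # p(e|f), lex(e|f), p(f|e), lex(f|e) for one phrase pair
>     features = [0.4, 0.25, 0.3, 0.2]
>     weights = [0.25, 0.25, 0.25, 0.25]  # uniform if nothing has been tuned
>     print(phrase_score(features, weights))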
> If you don't have feature weights, using uniform weights is
> reasonable.
> And regarding your original question above: since the phrase penalty
> feature has the same value for all phrase pairs, it shouldn't affect
> pruning, right?
> HTH,
> Kevin
> 
> On Tue, Sep 20, 2011 at 4:21 PM, Lane Schwartz <dowob...@gmail.com>
> wrote:
>         Taylor,
>         
>          If you don't have a background in NLP or CL (or even if you do),
>          I highly recommend taking a look at Philipp's book "Statistical
>          Machine Translation".
>         
>         I hope this doesn't come across as RTFM. That's not what I
>         mean. :)
>         
>         Cheers,
>         Lane
>         
>         
>         On Tue, Sep 20, 2011 at 3:45 PM, Taylor Rose
>         <tr...@languageintelligence.com> wrote:
>          > What would happen if I just multiplied the Direct Phrase
>          > Translation probability φ(e|f) by the Direct Lexical weight
>          > Lex(e|f)? That seems like it would work? Sorry if I'm asking
>          > dumb questions. I come from the computational side of
>          > computational linguistics. I'm learning as fast as I can.
>         > --
>         > Taylor Rose
>         > Machine Translation Intern
>         > Language Intelligence
>         > IRC: Handle: trose
>         >     Server: freenode
>         >
>         >
>         > On Tue, 2011-09-20 at 12:11 -0400, Burger, John D. wrote:
>         >> Taylor Rose wrote:
>         >>
>          >> > So what exactly can I infer from the metrics in the phrase
>          >> > table? I want to be able to compare phrases to each other.
>          >> > From my experience, multiplying them and sorting by that
>          >> > number has given me more accurate phrases... Obviously
>          >> > calling that metric "probability" is wrong. My question is:
>          >> > What is that metric best indicative of?
>         >>
>          >> That product has no principled interpretation that I can
>          >> think of.  Phrase pairs with high values on all four features
>          >> will obviously have high-value products, but that's only
>          >> interesting because all the features happen to be roughly
>          >> monotonic in phrase quality.  If you wanted a more principled
>          >> way to rank the phrases, I'd just use the MERT weights for
>          >> those features, and combine them with a dot product.
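>          >>
>          >> For what it's worth, a rough sketch of that ranking in Python,
>          >> with made-up weights and candidate entries (not real MERT
>          >> output):
>          >>
>          >>     import math
>          >>
>          >>     weights = [0.2, 0.05, 0.2, 0.05]  # placeholder MERT weights
>          >>
>          >>     # candidate targets for one source phrase, with made-up
>          >>     # values for p(e|f), lex(e|f), p(f|e), lex(f|e)
>          >>     candidates = {"the house": [0.7, 0.5, 0.6, 0.4],
>          >>                   "the home":  [0.2, 0.1, 0.3, 0.1]}
>          >>
>          >>     def score(probs):
>          >>         # dot product of the weights with the log feature values
>          >>         return sum(w * math.log(p) for w, p in zip(weights, probs))
>          >>
>          >>     ranked = sorted(candidates, key=lambda e: score(candidates[e]),
>          >>                     reverse=True)
>          >>     print(ranked)  # keep only the top few entries when pruning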
>         >>
>          >> Pre-filtering the phrase table is something lots of people
>          >> have looked at, and there are many approaches to this.  I like
>          >> this paper:
>          >>
>          >>   Improving Translation Quality by Discarding Most of the Phrasetable
>          >>   Johnson, John Howard; Martin, Joel; Foster, George; Kuhn, Roland
>          >>   http://nparc.cisti-icist.nrc-cnrc.gc.ca/npsi/ctrl?action=shwart&index=an&req=5763542
>         >>
>         >> - JB
>         >>
>         >> > On Tue, 2011-09-20 at 16:14 +0100, Miles Osborne wrote:
>          >> >> Exactly, the only correct way to get real probabilities out
>          >> >> would be to compute the normalising constant and renormalise
>          >> >> the dot products for each phrase pair.
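>          >> >>
>          >> >> (Illustration only: for a single source phrase, that
>          >> >> renormalisation would look something like
>          >> >>
>          >> >>     import math
>          >> >>     # dot-product scores of two candidate targets (made-up numbers)
>          >> >>     scores = {"the house": -1.2, "the home": -2.5}
>          >> >>     Z = sum(math.exp(s) for s in scores.values())  # normalising constant
>          >> >>     probs = {e: math.exp(s) / Z for e, s in scores.items()}
>          >> >>
>          >> >> so that the values in probs sum to one over the candidate
>          >> >> targets for that source phrase.)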
>         >> >>
>          >> >> Remember that this is best thought of as a set of scores,
>          >> >> weighted such that the relative proportions of each model
>          >> >> are balanced.
>         >> >>
>         >> >> Miles
>         >> >>
>          >> >> On 20 September 2011 16:07, Burger, John D. <j...@mitre.org> wrote:
>         >> >>> Taylor Rose wrote:
>         >> >>>
>          >> >>>> I am looking at pruning phrase tables for the experiment
>          >> >>>> I'm working on. I'm not sure if it would be a good idea
>          >> >>>> to include the 'penalty' metric when calculating
>          >> >>>> probability. It is my understanding that multiplying 4 or
>          >> >>>> 5 of the metrics from the phrase table would result in a
>          >> >>>> probability of the phrase being correct. Is this a good
>          >> >>>> understanding or am I missing something?
>         >> >>>
>          >> >>> I don't think this is correct.  At runtime all the
>          >> >>> features from the phrase table and a number of other
>          >> >>> features, some only available during decoding, are
>          >> >>> combined in an inner product with a weight vector to score
>          >> >>> partial translations.  I believe it's fair to say that at
>          >> >>> no point is there an explicit modeling of "a probability
>          >> >>> of the phrase being correct", at least not in isolation
>          >> >>> from the partially translated sentence.  This is not to
>          >> >>> say you couldn't model this yourself, of course.
>         >> >>>
>         >> >>> - John Burger
>         >> >>> MITRE
>         >> >>>
>         >> >>
>         >> >>
>         >> >>
>         >> >
>         >>
>         >
>         >
>         >
>         
>         --
>          When a place gets crowded enough to require ID's, social
>          collapse is not far away.  It is time to go elsewhere.  The best
>          thing about space travel is that it made it possible to go
>          elsewhere.
>                         -- R.A. Heinlein, "Time Enough For Love"
>         
>         
> 


_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
