Re: [Moses-support] Phrase probabilities

2011-09-20 Thread Kevin Gimpel
Hey Taylor,
Sounds like you are trying to come up with a simple heuristic for scoring
phrase table entries for purposes of pruning. Many choices are possible
here, so it's good to check the literature as folks mentioned above. But as
far as I know there's no single optimal answer for this: researchers
typically try a few things and use the approach that gives the best
results on the task at hand. That said, here are some suggestions:
If you have trained weights for the features, you should definitely use
those weights (as Miles suggested). So this would involve computing the dot
product of the features and weights as follows:
score(f, e) = \theta_1 * log(p(e | f)) + \theta_2 * log(lex(e | f))
            + \theta_3 * log(p(f | e)) + \theta_4 * log(lex(f | e))
where the thetas are the learned weights for each of the phrase table
features.
Note that the phrase table typically stores the feature values as
probabilities; Moses takes logs internally before computing the dot
product, so if you compute this score yourself you should likewise take
logs before multiplying by the feature weights.
If you don't have feature weights, using uniform weights is reasonable.
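
For concreteness, here is a minimal standalone C++ sketch of that
computation (this is not Moses code; the feature values and weights below
are made-up placeholders you would replace with your own):

#include <cmath>
#include <cstdio>

// Score one phrase pair from its four phrase-table features.
// The table stores probabilities, so take logs first, then form
// the dot product with the tuned weights.
double ScorePhrasePair(const double features[4], const double weights[4]) {
  double score = 0.0;
  for (int i = 0; i < 4; ++i)
    score += weights[i] * std::log(features[i]);
  return score;
}

int main() {
  // p(e|f), lex(e|f), p(f|e), lex(f|e): illustrative values only.
  const double features[4] = {0.4, 0.25, 0.3, 0.2};
  // Tuned (e.g. MERT) weights; fall back to 0.25 each if you have none.
  const double weights[4] = {0.2, 0.2, 0.3, 0.3};
  std::printf("score = %f\n", ScorePhrasePair(features, weights));
  return 0;
}

Sorting the entries for each source phrase by this score and keeping only
the top few then gives a simple pruning criterion.
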
And regarding your original question above: since the phrase penalty feature
has the same value for all phrase pairs, it shouldn't affect pruning, right?
HTH,
Kevin

On Tue, Sep 20, 2011 at 4:21 PM, Lane Schwartz  wrote:

> Taylor,
>
> If you don't have a background in NLP or CL (or even if you do), I
> highly recommend taking a look at Philipp's book "Statistical Machine
> Translation"
>
> I hope this doesn't come across as RTFM. That's not what I mean. :)
>
> Cheers,
> Lane
>
> On Tue, Sep 20, 2011 at 3:45 PM, Taylor Rose
>  wrote:
> > What would happen if I just multiplied the Direct Phrase Translation
> > probability φ(e|f) by the Direct Lexical weight Lex(e|f)? That seems
> > like it would work? Sorry if I'm asking dumb questions. I come from the
> > computational side of computational linguistics. I'm learning as fast as
> > I can.
> > --
> > Taylor Rose
> > Machine Translation Intern
> > Language Intelligence
> > IRC: Handle: trose
> > Server: freenode
> >
> >
> > On Tue, 2011-09-20 at 12:11 -0400, Burger, John D. wrote:
> >> Taylor Rose wrote:
> >>
> >> > So what exactly can I infer from the metrics in the phrase table? I
> want
> >> > to be able to compare phrases to each other. From my experience,
> >> > multiplying them and sorting by that number has given me more accurate
> >> > phrases... Obviously calling that metric "probability" is wrong. My
> >> > question is: What is that metric best indicative of?
> >>
> >> That product has no principled interpretation that I can think of.
>  Phrase pairs with high values on all four features will obviously have high
> value products, but that's only interesting because all the features happen
> to be roughly monotonic in phrase quality.  If you wanted a more principled
> way to rank the phrases, I'd just use the MERT weights for those features,
> and combine them with a dot product.
> >>
> >> Pre-filtering the phrase table is something lots of people have looked
> at, and there are many approaches to this.  I like this paper:
> >>
> >>   Improving Translation Quality by Discarding Most of the Phrasetable
> >>   Johnson, John Howard; Martin, Joel; Foster, George; Kuhn, Roland
> >>
> http://nparc.cisti-icist.nrc-cnrc.gc.ca/npsi/ctrl?action=shwart&index=an&req=5763542
> >>
> >> - JB
> >>
> >> > On Tue, 2011-09-20 at 16:14 +0100, Miles Osborne wrote:
> >> >> exactly,  the only correct way to get real probabilities out would be
> >> >> to compute the normalising constant and renormalise the dot products
> >> >> for each phrase pair.
> >> >>
> >> >> remember that this is best thought of as a set of scores, weighted
> >> >> such that the relative proportions of each model are balanced
> >> >>
> >> >> Miles
> >> >>
> >> >> On 20 September 2011 16:07, Burger, John D.  wrote:
> >> >>> Taylor Rose wrote:
> >> >>>
> >>  I am looking at pruning phrase tables for the experiment I'm
> working on.
> >>  I'm not sure if it would be a good idea to include the 'penalty'
> metric
> >>  when calculating probability. It is my understanding that
> multiplying 4
> >>  or 5 of the metrics from the phrase table would result in a
> probability
> >>  of the phrase being correct. Is this a good understanding or am I
> >>  missing something?
> >> >>>
> >> >>> I don't think this is correct.  At runtime all the features from the
> phrase table and a number of other features, some only available during
> decoding, are combined in an inner product with a weight vector to score
> partial translations.  I believe it's fair to say that at no point is there
> an explicit modeling of "a probability of the phrase being correct", at
> least not in isolation from the partially translated sentence.  This is not
> to say you couldn't model this yourself, of course.
> >> >>>
> >> >>> - John Burger
> >> >>> MITRE
> >> >>> 

Re: [Moses-support] Phrase probabilities

2011-09-20 Thread Lane Schwartz
Taylor,

If you don't have a background in NLP or CL (or even if you do), I
highly recommend taking a look at Philipp's book "Statistical Machine
Translation"

I hope this doesn't come across as RTFM. That's not what I mean. :)

Cheers,
Lane

On Tue, Sep 20, 2011 at 3:45 PM, Taylor Rose
 wrote:
> What would happen if I just multiplied the Direct Phrase Translation
> probability φ(e|f) by the Direct Lexical weight Lex(e|f)? That seems
> like it would work? Sorry if I'm asking dumb questions. I come from the
> computational side of computational linguistics. I'm learning as fast as
> I can.
> --
> Taylor Rose
> Machine Translation Intern
> Language Intelligence
> IRC: Handle: trose
>     Server: freenode
>
>
> On Tue, 2011-09-20 at 12:11 -0400, Burger, John D. wrote:
>> Taylor Rose wrote:
>>
>> > So what exactly can I infer from the metrics in the phrase table? I want
>> > to be able to compare phrases to each other. From my experience,
>> > multiplying them and sorting by that number has given me more accurate
>> > phrases... Obviously calling that metric "probability" is wrong. My
>> > question is: What is that metric best indicative of?
>>
>> That product has no principled interpretation that I can think of.  Phrase 
>> pairs with high values on all four features will obviously have high value 
>> products, but that's only interesting because all the features happen to be 
>> roughly monotonic in phrase quality.  If you wanted a more principled way to 
>> rank the phrases, I'd just use the MERT weights for those features, and 
>> combine them with a dot product.
>>
>> Pre-filtering the phrase table is something lots of people have looked at, 
>> and there are many approaches to this.  I like this paper:
>>
>>   Improving Translation Quality by Discarding Most of the Phrasetable
>>   Johnson, John Howard; Martin, Joel; Foster, George; Kuhn, Roland
>>   
>> http://nparc.cisti-icist.nrc-cnrc.gc.ca/npsi/ctrl?action=shwart&index=an&req=5763542
>>
>> - JB
>>
>> > On Tue, 2011-09-20 at 16:14 +0100, Miles Osborne wrote:
>> >> exactly,  the only correct way to get real probabilities out would be
>> >> to compute the normalising constant and renormalise the dot products
>> >> for each phrase pair.
>> >>
>> >> remember that this is best thought of as a set of scores, weighted
>> >> such that the relative proportions of each model are balanced
>> >>
>> >> Miles
>> >>
>> >> On 20 September 2011 16:07, Burger, John D.  wrote:
>> >>> Taylor Rose wrote:
>> >>>
>>  I am looking at pruning phrase tables for the experiment I'm working on.
>>  I'm not sure if it would be a good idea to include the 'penalty' metric
>>  when calculating probability. It is my understanding that multiplying 4
>>  or 5 of the metrics from the phrase table would result in a probability
>>  of the phrase being correct. Is this a good understanding or am I
>>  missing something?
>> >>>
>> >>> I don't think this is correct.  At runtime all the features from the 
>> >>> phrase table and a number of other features, some only available during 
>> >>> decoding, are combined in an inner product with a weight vector to score 
>> >>> partial translations.  I believe it's fair to say that at no point is 
>> >>> there an explicit modeling of "a probability of the phrase being 
>> >>> correct", at least not in isolation from the partially translated 
>> >>> sentence.  This is not to say you couldn't model this yourself, of 
>> >>> course.
>> >>>
>> >>> - John Burger
>> >>> MITRE



-- 
When a place gets crowded enough to require ID's, social collapse is not
far away.  It is time to go elsewhere.  The best thing about space travel
is that it made it possible to go elsewhere.
                -- R.A. Heinlein, "Time Enough For Love"

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Phrase probabilities

2011-09-20 Thread Taylor Rose
What would happen if I just multiplied the Direct Phrase Translation
probability φ(e|f) by the Direct Lexical weight Lex(e|f)? That seems
like it would work? Sorry if I'm asking dumb questions. I come from the
computational side of computational linguistics. I'm learning as fast as
I can.
-- 
Taylor Rose
Machine Translation Intern
Language Intelligence
IRC: Handle: trose
 Server: freenode


On Tue, 2011-09-20 at 12:11 -0400, Burger, John D. wrote:
> Taylor Rose wrote:
> 
> > So what exactly can I infer from the metrics in the phrase table? I want
> > to be able to compare phrases to each other. From my experience,
> > multiplying them and sorting by that number has given me more accurate
> > phrases... Obviously calling that metric "probability" is wrong. My
> > question is: What is that metric best indicative of?
> 
> That product has no principled interpretation that I can think of.  Phrase 
> pairs with high values on all four features will obviously have high value 
> products, but that's only interesting because all the features happen to be 
> roughly monotonic in phrase quality.  If you wanted a more principled way to 
> rank the phrases, I'd just use the MERT weights for those features, and 
> combine them with a dot product.
> 
> Pre-filtering the phrase table is something lots of people have looked at, 
> and there are many approaches to this.  I like this paper:
> 
>   Improving Translation Quality by Discarding Most of the Phrasetable
>   Johnson, John Howard; Martin, Joel; Foster, George; Kuhn, Roland
>   
> http://nparc.cisti-icist.nrc-cnrc.gc.ca/npsi/ctrl?action=shwart&index=an&req=5763542
> 
> - JB
> 
> > On Tue, 2011-09-20 at 16:14 +0100, Miles Osborne wrote:
> >> exactly,  the only correct way to get real probabilities out would be
> >> to compute the normalising constant and renormalise the dot products
> >> for each phrase pair.
> >> 
> >> remember that this is best thought of as a set of scores, weighted
> >> such that the relative proportions of each model are balanced
> >> 
> >> Miles
> >> 
> >> On 20 September 2011 16:07, Burger, John D.  wrote:
> >>> Taylor Rose wrote:
> >>> 
>  I am looking at pruning phrase tables for the experiment I'm working on.
>  I'm not sure if it would be a good idea to include the 'penalty' metric
>  when calculating probability. It is my understanding that multiplying 4
>  or 5 of the metrics from the phrase table would result in a probability
>  of the phrase being correct. Is this a good understanding or am I
>  missing something?
> >>> 
> >>> I don't think this is correct.  At runtime all the features from the 
> >>> phrase table and a number of other features, some only available during 
> >>> decoding, are combined in an inner product with a weight vector to score 
> >>> partial translations.  I believe it's fair to say that at no point is 
> >>> there an explicit modeling of "a probability of the phrase being 
> >>> correct", at least not in isolation from the partially translated 
> >>> sentence.  This is not to say you couldn't model this yourself, of course.
> >>> 
> >>> - John Burger
> >>> MITRE


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Compact in-memory phrase-table representation

2011-09-20 Thread Kenneth Heafield
I took a look at the existing FactorCollection code and it made me cry,
so I rewrote it for revision 4242, including a better locking strategy.
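
For readers wondering what "a better locking strategy" can mean for this
kind of string-interning code, here is a generic illustration only (it is
not the actual revision 4242 code, and it uses C++17 std::shared_mutex for
brevity): lookups of factors that already exist take a shared lock, and
the exclusive lock is held only while a genuinely new entry is inserted.

#include <shared_mutex>
#include <string>
#include <unordered_set>

// Illustrative interner: the common case (the string is already known)
// takes only the shared lock; the exclusive lock covers insertion alone.
// Returned pointers stay valid because unordered_set never moves elements.
class Interner {
 public:
  const std::string* Add(const std::string& s) {
    {
      std::shared_lock<std::shared_mutex> read_lock(mutex_);
      auto it = set_.find(s);
      if (it != set_.end()) return &*it;
    }
    std::unique_lock<std::shared_mutex> write_lock(mutex_);
    // insert() is a no-op if another thread added s between the two locks.
    return &*set_.insert(s).first;
  }

 private:
  std::shared_mutex mutex_;
  std::unordered_set<std::string> set_;
};

Since almost every AddFactor-style call after the first few sentences hits
an existing entry, the exclusive lock is rarely taken and contention drops
accordingly.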

On 09/20/11 12:10, Marcin Junczys-Dowmunt wrote:
> Hi Barry,
> very high lock contention. Deadlock is the wrong word. With 48 threads
> 'top' shows me roughly 120% of processor load instead of 4800%. Actual
> translation speed, however, is far below that of a single thread.
>
> Yes, we are running an online system, so filtering is not an option.
> Bye,
> Marcin
>
> 20/9/2011, "Barry Haddow"  napisał/a:
>
>> Hi Marcin
>>
>> That makes sense. I looked at the locking in FactorCollection recently and 
>> realised that it wasn't implemented correctly, although I didn't know that 
>> it 
>> had the potential for deadlock.
>>
>> Do you know if it's an actual deadlock that you're observing, or very high 
>> lock contention?
>>
>> btw - why aren't you filtering the phrase table? Are you running an online 
>> system where the source sentences are not given in advance?
>>
>> cheers - Barry
>>
>> On Tuesday 20 September 2011 11:22:49 Marcin Junczys-Dowmunt wrote:
>>> Hello all,
>>> by the way, I have found the place where the heavy locking is occurring.
>>> It's the lock in
>>>
>>> FactorCollection::AddFactor
>>>
>>> When I simply and naively remove that one, everything works at full
>>> throttle with 48 threads and nothing bad seems to be happening. With
>>> this lock in place the deadlock occurs starting at around 20 threads,
>>> regardless of whether the binary phrase table is used or the in-memory
>>> version.
>>>
>>> The size of the phrase table is also a factor. With a small phrase table
>>> filtered according to the given test set there are no deadlocks. Does that
>>> make any sense?
>>>
>>> Bye,
>>> Marcin
>>>
>>> 19/9/2011, "Barry Haddow"  napisa�/a:
 Hi Marcin

 On Monday 19 September 2011 07:58:48 Marcin Junczys-Dowmunt wrote:
> The binary implementation seems to become unusable with more than 10-12
> threads. Speed drops as more threads are used until it nearly deadlocks
> at around 30 threads. I am using a 48-core server with 512 GB ram. Even
> copying the binary phrase tables to a ramdisk does not solve the
> problem. The behavior stays the same. The in-memory version works fine
> with 48 threads, but uses nearly all our ram.
 There's a shared cache for the on-disk phrase table, which is probably
 where the contention is coming from. I don't think disabling the cache
 would help as in a large phrase table you'll have 10s of 1000s of
 translations of common words and punctuation, which you don't want to
 reload for every sentence. A per-thread cache may improve things.

> Pruning is also not enough, our filtered phrase table still takes around
> 300 GB when loaded into memory, I did not even dare to try and load the
> unfiltered phrase-table into memory :). But I will take a look at the
> implementation from the marathon, thanks.
 I think Hieu was referring to this
 http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc16
 rather than filtering, which may be of some use. It's hard to imagine that
 a 500G phrase table doesn't contain a lot of noise. I'm surprised that
 filtering doesn't remove more though - are you decoding large batches of
 sentences?

> At the moment I am thinking about using a perfect hash function as an
> index and keeping target phrases as packed strings in memory. That
> should use about as much memory as a gzipped phrase table on disk, it
> will be slower though, but probably still faster than the binary
> version.
 Will look forward to seeing how you get on,

 cheers - Barry

 --
 The University of Edinburgh is a charitable body, registered in
 Scotland, with registration number SC005336.
>> --
>> Barry Haddow
>> University of Edinburgh
>> +44 (0) 131 651 3173
>>
>> -- 
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Phrase probabilities

2011-09-20 Thread Miles Osborne
some terminology:  these are feature values, not metrics.

feature values have a number of roles to play, e.g. P(e | f) indicates
the chance that phrase e should be the translation of phrase f.  these
values are designed to be used together, and weighted to produce an
overall score for a translation choice.  this is the basis of a
log-linear model.

if you take them all and multiply them together then I guess that is
equivalent to assuming each is equally weighted, so that you have
something like the geometric mean of them (the exponential of a sum of
logs, without the 1/n divisor).  you may well be able to use the scores
in the way you suggest, but whether you get `good' or `bad' results will
be largely a matter of chance.
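
to spell that out, the unweighted product is

p(e|f) * lex(e|f) * p(f|e) * lex(f|e)
  = exp( log p(e|f) + log lex(e|f) + log p(f|e) + log lex(f|e) )

i.e. the log-linear score with every weight set to 1 (raising it to the
power 1/4 would give the geometric mean proper), so it ranks phrase pairs
exactly as a uniformly weighted model would; what it cannot reflect is the
tuned weights.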

if you want to prune the phrase table then a starting point is here:

http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc16

Miles

On 20 September 2011 16:47, Taylor Rose  wrote:
> So what exactly can I infer from the metrics in the phrase table? I want
> to be able to compare phrases to each other. From my experience,
> multiplying them and sorting by that number has given me more accurate
> phrases... Obviously calling that metric "probability" is wrong. My
> question is: What is that metric best indicative of?
> --
> Taylor Rose
> Machine Translation Intern
> Language Intelligence
> IRC: Handle: trose
>     Server: freenode
>
>
> On Tue, 2011-09-20 at 16:14 +0100, Miles Osborne wrote:
>> exactly,  the only correct way to get real probabilities out would be
>> to compute the normalising constant and renormalise the dot products
>> for each phrase pair.
>>
>> remember that this is best thought of as a set of scores, weighted
>> such that the relative proportions of each model are balanced
>>
>> Miles
>>
>> On 20 September 2011 16:07, Burger, John D.  wrote:
>> > Taylor Rose wrote:
>> >
>> >> I am looking at pruning phrase tables for the experiment I'm working on.
>> >> I'm not sure if it would be a good idea to include the 'penalty' metric
>> >> when calculating probability. It is my understanding that multiplying 4
>> >> or 5 of the metrics from the phrase table would result in a probability
>> >> of the phrase being correct. Is this a good understanding or am I
>> >> missing something?
>> >
>> > I don't think this is correct.  At runtime all the features from the 
>> > phrase table and a number of other features, some only available during 
>> > decoding, are combined in an inner product with a weight vector to score 
>> > partial translations.  I believe it's fair to say that at no point is 
>> > there an explicit modeling of "a probability of the phrase being correct", 
>> > at least not in isolation from the partially translated sentence.  This is 
>> > not to say you couldn't model this yourself, of course.
>> >
>> > - John Burger
>> >  MITRE



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Phrase probabilities

2011-09-20 Thread Burger, John D.
Taylor Rose wrote:

> So what exactly can I infer from the metrics in the phrase table? I want
> to be able to compare phrases to each other. From my experience,
> multiplying them and sorting by that number has given me more accurate
> phrases... Obviously calling that metric "probability" is wrong. My
> question is: What is that metric best indicative of?

That product has no principled interpretation that I can think of.  Phrase 
pairs with high values on all four features will obviously have high value 
products, but that's only interesting because all the features happen to be 
roughly monotonic in phrase quality.  If you wanted a more principled way to 
rank the phrases, I'd just use the MERT weights for those features, and combine 
them with a dot product.

Pre-filtering the phrase table is something lots of people have looked at, and 
there are many approaches to this.  I like this paper:

  Improving Translation Quality by Discarding Most of the Phrasetable
  Johnson, John Howard; Martin, Joel; Foster, George; Kuhn, Roland
  
http://nparc.cisti-icist.nrc-cnrc.gc.ca/npsi/ctrl?action=shwart&index=an&req=5763542

- JB

> On Tue, 2011-09-20 at 16:14 +0100, Miles Osborne wrote:
>> exactly,  the only correct way to get real probabilities out would be
>> to compute the normalising constant and renormalise the dot products
>> for each phrase pair.
>> 
>> remember that this is best thought of as a set of scores, weighted
>> such that the relative proportions of each model are balanced
>> 
>> Miles
>> 
>> On 20 September 2011 16:07, Burger, John D.  wrote:
>>> Taylor Rose wrote:
>>> 
 I am looking at pruning phrase tables for the experiment I'm working on.
 I'm not sure if it would be a good idea to include the 'penalty' metric
 when calculating probability. It is my understanding that multiplying 4
 or 5 of the metrics from the phrase table would result in a probability
 of the phrase being correct. Is this a good understanding or am I
 missing something?
>>> 
>>> I don't think this is correct.  At runtime all the features from the phrase 
>>> table and a number of other features, some only available during decoding, 
>>> are combined in an inner product with a weight vector to score partial 
>>> translations.  I believe it's fair to say that at no point is there an 
>>> explicit modeling of "a probability of the phrase being correct", at least 
>>> not in isolation from the partially translated sentence.  This is not to 
>>> say you couldn't model this yourself, of course.
>>> 
>>> - John Burger
>>> MITRE



___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Phrase probabilities

2011-09-20 Thread Taylor Rose
So what exactly can I infer from the metrics in the phrase table? I want
to be able to compare phrases to each other. From my experience,
multiplying them and sorting by that number has given me more accurate
phrases... Obviously calling that metric "probability" is wrong. My
question is: What is that metric best indicative of?
-- 
Taylor Rose
Machine Translation Intern
Language Intelligence
IRC: Handle: trose
 Server: freenode


On Tue, 2011-09-20 at 16:14 +0100, Miles Osborne wrote:
> exactly,  the only correct way to get real probabilities out would be
> to compute the normalising constant and renormalise the dot products
> for each phrase pair.
> 
> remember that this is best thought of as a set of scores, weighted
> such that the relative proportions of each model are balanced
> 
> Miles
> 
> On 20 September 2011 16:07, Burger, John D.  wrote:
> > Taylor Rose wrote:
> >
> >> I am looking at pruning phrase tables for the experiment I'm working on.
> >> I'm not sure if it would be a good idea to include the 'penalty' metric
> >> when calculating probability. It is my understanding that multiplying 4
> >> or 5 of the metrics from the phrase table would result in a probability
> >> of the phrase being correct. Is this a good understanding or am I
> >> missing something?
> >
> > I don't think this is correct.  At runtime all the features from the phrase 
> > table and a number of other features, some only available during decoding, 
> > are combined in an inner product with a weight vector to score partial 
> > translations.  I believe it's fair to say that at no point is there an 
> > explicit modeling of "a probability of the phrase being correct", at least 
> > not in isolation from the partially translated sentence.  This is not to 
> > say you couldn't model this yourself, of course.
> >
> > - John Burger
> >  MITRE

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Phrase probabilities

2011-09-20 Thread Miles Osborne
exactly,  the only correct way to get real probabilities out would be
to compute the normalising constant and renormalise the dot products
for each phrase pair.

remember that this is best thought of as a set of scores, weighted
such that the relative proportions of each model are balanced
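
concretely, writing score(f, e) for the weighted sum of the log feature
values (as elsewhere in this thread), that renormalisation would be

p(e | f) = exp(score(f, e)) / sum over e' of exp(score(f, e'))

where the sum runs over every target phrase e' that the table stores for
the source phrase f.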

Miles

On 20 September 2011 16:07, Burger, John D.  wrote:
> Taylor Rose wrote:
>
>> I am looking at pruning phrase tables for the experiment I'm working on.
>> I'm not sure if it would be a good idea to include the 'penalty' metric
>> when calculating probability. It is my understanding that multiplying 4
>> or 5 of the metrics from the phrase table would result in a probability
>> of the phrase being correct. Is this a good understanding or am I
>> missing something?
>
> I don't think this is correct.  At runtime all the features from the phrase 
> table and a number of other features, some only available during decoding, 
> are combined in an inner product with a weight vector to score partial 
> translations.  I believe it's fair to say that at no point is there an 
> explicit modeling of "a probability of the phrase being correct", at least 
> not in isolation from the partially translated sentence.  This is not to say 
> you couldn't model this yourself, of course.
>
> - John Burger
>  MITRE



-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Phrase probabilities

2011-09-20 Thread Burger, John D.
Taylor Rose wrote:

> I am looking at pruning phrase tables for the experiment I'm working on.
> I'm not sure if it would be a good idea to include the 'penalty' metric
> when calculating probability. It is my understanding that multiplying 4
> or 5 of the metrics from the phrase table would result in a probability
> of the phrase being correct. Is this a good understanding or am I
> missing something?

I don't think this is correct.  At runtime all the features from the phrase 
table and a number of other features, some only available during decoding, are 
combined in an inner product with a weight vector to score partial 
translations.  I believe it's fair to say that at no point is there an explicit 
modeling of "a probability of the phrase being correct", at least not in 
isolation from the partially translated sentence.  This is not to say you 
couldn't model this yourself, of course.

- John Burger
  MITRE

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] [Corpora-List] I have a problem with multi-bleu.perl

2011-09-20 Thread Taylor Rose
wow I just messed that reply up a lot. Sorry... disregard that last
message.
-- 
Taylor Rose
Machine Translation Intern
Language Intelligence
IRC: Handle: trose
 Server: freenode


On Tue, 2011-09-20 at 10:29 -0400, Taylor Rose wrote:
> I got four copies of it :P

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] Phrase probabilities

2011-09-20 Thread Taylor Rose
I am looking at pruning phrase tables for the experiment I'm working on.
I'm not sure if it would be a good idea to include the 'penalty' metric
when calculating probability. It is my understanding that multiplying 4
or 5 of the metrics from the phrase table would result in a probability
of the phrase being correct. Is this a good understanding or am I
missing something?
-- 
Taylor Rose
Machine Translation Intern
Language Intelligence
IRC: Handle: trose
 Server: freenode



___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Compact in-memory phrase-table representation

2011-09-20 Thread Marcin Junczys-Dowmunt

Hi Barry,
very high lock contention. Deadlock is the wrong word. With 48 threads
'top' shows me roughly 120% of processor load instead of 4800%. Actual
translation speed, however, is far below that of a single thread.

Yes, we are running an online system, so filtering is not an option.
Bye,
Marcin

20/9/2011, "Barry Haddow"  napisał/a:

>Hi Marcin
>
>That makes sense. I looked at the locking in FactorCollection recently and 
>realised that it wasn't implemented correctly, although I didn't know that it 
>had the potential for deadlock.
>
>Do you know if it's an actual deadlock that you're observing, or very high 
>lock contention?
>
>btw - why aren't you filtering the phrase table? Are you running an online 
>system where the source sentences are not given in advance?
>
>cheers - Barry
>
>On Tuesday 20 September 2011 11:22:49 Marcin Junczys-Dowmunt wrote:
>> Hello all,
>> by the way, I have found the place where the heavy locking is occurring.
>> It's the lock in
>>
>> FactorCollection::AddFactor
>>
>> When I simply and naively remove that one, everything works at full
>> throttle with 48 threads and nothing bad seems to be happening. With
>> this lock in place the deadlock occurs starting at around 20 threads,
>> regardless of whether the binary phrase table is used or the in-memory
>> version.
>>
>> The size of the phrase table is also a factor. With a small phrase table
>> filtered according to the given test set there are no deadlocks. Does that
>> make any sense?
>> 
>> Bye,
>> Marcin
>> 
>> 19/9/2011, "Barry Haddow"  napisał/a:
>> >Hi Marcin
>> >
>> >On Monday 19 September 2011 07:58:48 Marcin Junczys-Dowmunt wrote:
>> >> The binary implementation seems to become unusable with more than 10-12
>> >> threads. Speed drops as more threads are used until it nearly deadlocks
>> >> at around 30 threads. I am using a 48-core server with 512 GB ram. Even
>> >> copying the binary phrase tables to a ramdisk does not solve the
>> >> problem. The behavior stays the same. The in-memory version works fine
>> >> with 48 threads, but uses nearly all our ram.
>> >
>> >There's a shared cache for the on-disk phrase table, which is probably
>> > where the contention is coming from. I don't think disabling the cache
>> > would help as in a large phrase table you'll have 10s of 1000s of
>> > translations of common words and punctuation, which you don't want to
>> > reload for every sentence. A per-thread cache may improve things.
>> >
>> >> Pruning is also not enough, our filtered phrase table still takes around
>> >> 300 GB when loaded into memory, I did not even dare to try and load the
>> >> unfiltered phrase-table into memory :). But I will take a look at the
>> >> implementation from the marathon, thanks.
>> >
>> >I think Hieu was referring to this
>> >http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc16
>> >rather than filtering, which may be of some use. It's hard to imagine that
>> > a 500G phrase table doesn't contain a lot of noise. I'm surprised that
>> > filtering doesn't remove more though - are you decoding large batches of
>> > sentences?
>> >
>> >> At the moment I am thinking about using a perfect hash function as an
>> >> index and keeping target phrases as packed strings in memory. That
>> >> should use about as much memory as a gzipped phrase table on disk, it
>> >> will be slower though, but probably still faster than the binary
>> >> version.
>> >
>> >Will look forward to seeing how you get on,
>> >
>> >cheers - Barry
>> >
>> >--
>> >The University of Edinburgh is a charitable body, registered in
>> >Scotland, with registration number SC005336.
>> 
> 
>--
>Barry Haddow
>University of Edinburgh
>+44 (0) 131 651 3173
>
>-- 
>The University of Edinburgh is a charitable body, registered in
>Scotland, with registration number SC005336.
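
The packed-string representation discussed in the quoted thread above
could be sketched roughly like this (a toy illustration, not Moses code;
an ordinary hash map stands in for the perfect hash, and only one target
phrase per source is kept for brevity):

#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Toy compact phrase table: all target strings live back to back in one
// byte buffer; the index maps a source phrase to (offset, length).
class PackedPhraseTable {
 public:
  void Add(const std::string& source, const std::string& target) {
    index_[source] = {static_cast<uint32_t>(buffer_.size()),
                      static_cast<uint32_t>(target.size())};
    buffer_.insert(buffer_.end(), target.begin(), target.end());
  }

  std::string Get(const std::string& source) const {
    const std::pair<uint32_t, uint32_t>& entry = index_.at(source);
    return std::string(buffer_.data() + entry.first, entry.second);
  }

 private:
  std::vector<char> buffer_;  // packed target phrases, no per-string overhead
  std::unordered_map<std::string, std::pair<uint32_t, uint32_t>> index_;
};

A real implementation would key the index by a (perfect) hash of the
source phrase and store several scored target phrases per entry, but the
memory saving comes from the same idea: one contiguous buffer instead of
millions of individually allocated strings.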

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Compact in-memory phrase-table representation

2011-09-20 Thread Barry Haddow
Hi Marcin

That makes sense. I looked at the locking in FactorCollection recently and 
realised that it wasn't implemented correctly, although I didn't know that it 
had the potential for deadlock.

Do you know if it's an actual deadlock that you're observing, or very high 
lock contention?

btw - why aren't you filtering the phrase table? Are you running an online 
system where the source sentences are not given in advance?

cheers - Barry

On Tuesday 20 September 2011 11:22:49 Marcin Junczys-Dowmunt wrote:
> Hello all,
> by the way, I have found the place where the heavy locking is occurring.
> It's the lock in
>
> FactorCollection::AddFactor
>
> When I simply and naively remove that one, everything works at full
> throttle with 48 threads and nothing bad seems to be happening. With
> this lock in place the deadlock occurs starting at around 20 threads,
> regardless of whether the binary phrase table is used or the in-memory
> version.
>
> The size of the phrase table is also a factor. With a small phrase table
> filtered according to the given test set there are no deadlocks. Does that
> make any sense?
> 
> Bye,
> Marcin
> 
> 19/9/2011, "Barry Haddow"  napisał/a:
> >Hi Marcin
> >
> >On Monday 19 September 2011 07:58:48 Marcin Junczys-Dowmunt wrote:
> >> The binary implementation seems to become unusable with more than 10-12
> >> threads. Speed drops as more threads are used until it nearly deadlocks
> >> at around 30 threads. I am using a 48-core server with 512 GB ram. Even
> >> copying the binary phrase tables to a ramdisk does not solve the
> >> problem. The behavior stays the same. The in-memory version works fine
> >> with 48 threads, but uses nearly all our ram.
> >
> >There's a shared cache for the on-disk phrase table, which is probably
> > where the contention is coming from. I don't think disabling the cache
> > would help as in a large phrase table you'll have 10s of 1000s of
> > translations of common words and punctuation, which you don't want to
> > reload for every sentence. A per-thread cache may improve things.
> >
> >> Pruning is also not enough, our filtered phrase table still takes around
> >> 300 GB when loaded into memory, I did not even dare to try and load the
> >> unfiltered phrase-table into memory :). But I will take a look at the
> >> implementation from the marathon, thanks.
> >
> >I think Hieu was referring to this
> >http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc16
> >rather than filtering, which may be of some use. It's hard to imagine that
> > a 500G phrase table doesn't contain a lot of noise. I'm surprised that
> > filtering doesn't remove more though - are you decoding large batches of
> > sentences?
> >
> >> At the moment I am thinking about using a perfect hash function as an
> >> index and keeping target phrases as packed strings in memory. That
> >> should use about as much memory as a gzipped phrase table on disk, it
> >> will be slower though, but probably still faster than the binary
> >> version.
> >
> >Will look forward to seeing how you get on,
> >
> >cheers - Barry
> >
> >--
> >The University of Edinburgh is a charitable body, registered in
> >Scotland, with registration number SC005336.
> 
 
--
Barry Haddow
University of Edinburgh
+44 (0) 131 651 3173

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] Compact in-memory phrase-table representation

2011-09-20 Thread Marcin Junczys-Dowmunt

Hello all,
by the way, I have found the place where the heavy locking is occurring.
It's the lock in

FactorCollection::AddFactor

When I simply and naively remove that one, everything works at full
throttle with 48 threads and nothing bad seems to be happening. With
this lock in place the deadlock occurs starting at around 20 threads,
regardless of whether the binary phrase table is used or the in-memory
version.

The size of the phrase table is also a factor. With a small phrase table
filtered according to the given test set there are no deadlocks. Does that
make any sense?

Bye,
Marcin

19/9/2011, "Barry Haddow"  napisał/a:

>Hi Marcin
>
>On Monday 19 September 2011 07:58:48 Marcin Junczys-Dowmunt wrote:
>> The binary implementation seems to become unusable with more than 10-12
>> threads. Speed drops as more threads are used until it nearly deadlocks
>> at around 30 threads. I am using a 48-core server with 512 GB ram. Even
>> copying the binary phrase tables to a ramdisk does not solve the
>> problem. The behavior stays the same. The in-memory version works fine
>> with 48 threads, but uses nearly all our ram.
>
>There's a shared cache for the on-disk phrase table, which is probably where
>the contention is coming from. I don't think disabling the cache would help as
>in a large phrase table you'll have 10s of 1000s of translations of common
>words and punctuation, which you don't want to reload for every sentence. A
>per-thread cache may improve things.
>
>>
>> Pruning is also not enough, our filtered phrase table still takes around
>> 300 GB when loaded into memory, I did not even dare to try and load the
>> unfiltered phrase-table into memory :). But I will take a look at the
>> implementation from the marathon, thanks.
>
>I think Hieu was referring to this
>http://www.statmt.org/moses/?n=Moses.AdvancedFeatures#ntoc16
>rather than filtering, which may be of some use. It's hard to imagine that a
>500G phrase table doesn't contain a lot of noise. I'm surprised that filtering
>doesn't remove more though - are you decoding large batches of sentences?
>
>>
>> At the moment I am thinking about using a perfect hash function as an
>> index and keeping target phrases as packed strings in memory. That
>> should use about as much memory as a gzipped phrase table on disk, it
>> will be slower though, but probably still faster than the binary version.
>>
>
>Will look forward to seeing how you get on,
>
>cheers - Barry
>
>--
>The University of Edinburgh is a charitable body, registered in
>Scotland, with registration number SC005336.
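
A per-thread cache of the kind Barry suggests above could be sketched
roughly like this (purely illustrative, not existing Moses code;
LookupShared is a hypothetical stand-in for the locked, shared lookup):

#include <string>
#include <unordered_map>

// Hypothetical stand-in for the expensive, mutex-protected lookup into
// the shared phrase table.
std::string LookupShared(const std::string& source) {
  return "translations of " + source;
}

// Per-thread cache: each decoding thread remembers the entries it has
// already fetched, so frequent source phrases and punctuation are not
// re-fetched through the shared, locked path by that thread.
const std::string& LookupCached(const std::string& source) {
  thread_local std::unordered_map<std::string, std::string> cache;
  auto it = cache.find(source);
  if (it == cache.end())
    it = cache.emplace(source, LookupShared(source)).first;
  return it->second;
}

The trade-off is memory: every thread ends up holding its own copies of
the popular entries.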

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] I need to a graph including all hypotheses

2011-09-20 Thread Ondrej Bojar
Hi, Vakil,

run moses with -h to see the list of command-line options. One of them is -osg 
or -output-search-graph, which is probably what you are after.
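
For example, something like
"moses -f moses.ini -output-search-graph searchgraph.txt < input.txt"
should write the search graph to searchgraph.txt (the config and file
names here are just placeholders for your own setup).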

Cheers, O.

"zeinab vakil"  wrote:

>hello,
>
>Moses gives the best hypothesis for one sentence, but I need a graph including
>all possible paths (all hypotheses) after pruning. I know that Moses produces
>such a graph, but I don't know how I can access it. Please guide me.
>
>vakil


-- 
Ondrej Bojar
http://www.cuni.cz/~obo
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] I need to a graph including all hypotheses

2011-09-20 Thread zeinab vakil
hello,

Moses gives the best hypothesis for one sentence, but I need a graph including
all possible paths (all hypotheses) after pruning. I know that Moses produces
such a graph, but I don't know how I can access it. Please guide me.

vakil
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support