[Moses-support] How is the final LM score obtained?

2009-03-05 Thread Carlos Henriquez

Hi all.


I'm running some tests extracting the n-best list from moses (-n-best-list 
option) with all model weights set to 1, and I don't understand how the final 
LM score is obtained. I'm using srilm.

For instance, my best translation from Chinese to English on sentence 9 was

9 ||| after three hours .  ||| d: 0 lm: -17.0614 tm: -7.41812 -0.944461 -4.79107 -2.87243 w: -4 ||| -37.0874

but if I run ngram alone with the same output sentence

echo "after three hours ." | ngram -order 5 -lm ../marie/lm/train.tok.en.lm -ppl -

the result is very different

file -: 1 sentences, 4 words, 0 OOVs
0 zeroprobs, logprob= -7.40966 ppl= 30.3341 ppl1= 71.1892

I tried with some other values from my nbest list and I always found a big 
difference between the two scores.

If my initial weight is 1, why are the scores so different? I suppose I am 
misunderstanding something.

The moses command to obtain the n-best-list was 

moses -f moses.ini -i ../../corpus/dev.zh -d 1 -tm 1 1 1 1 -lm 1 -w 1 -n-best-list devout.moses.nbest 10 -include-alignment-in-n-best true > devout.moses 2> /dev/null

(yep, I'm not using the last tm weight) and the moses.ini file does not have 
any weights.
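
With every weight set to 1, the final score on an n-best line should simply be
the sum of the individual feature scores. The following is only a minimal
sketch for checking that, assuming the four-field " ||| "-separated layout of
the line quoted in this message; the function names parse_nbest_line and
unweighted_total are hypothetical and not part of Moses.

```python
def parse_nbest_line(line):
    """Split one n-best line into (sentence id, hypothesis, feature scores, total).

    Assumes exactly four '|||'-separated fields, as in the example line above.
    """
    sent_id, hyp, feats, total = [field.strip() for field in line.split("|||")]
    scores = {}
    name = None
    for tok in feats.split():
        if tok.endswith(":"):
            # A token such as "lm:" starts a new feature group.
            name = tok[:-1]
            scores[name] = []
        else:
            scores[name].append(float(tok))
    return int(sent_id), hyp, scores, float(total)


def unweighted_total(scores):
    """Sum all feature values, i.e. the model score when every weight is 1."""
    return sum(v for values in scores.values() for v in values)


line = ("9 ||| after three hours .  ||| d: 0 lm: -17.0614 "
        "tm: -7.41812 -0.944461 -4.79107 -2.87243 w: -4 ||| -37.0874")
sent_id, hyp, scores, total = parse_nbest_line(line)
print(sent_id, hyp, scores["lm"], unweighted_total(scores), total)
```

The summed features agree with the reported total up to the rounding of the
printed values, so the total itself is consistent; only the scale of the lm
value is in question.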

--
Carlos A. Henríquez Q.
carlo...@gps.tsc.upc.es


  

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] How is the final LM score obtained?

2009-03-05 Thread Barry Haddow
Hi Carlos

Moses outputs natural logarithms, whereas srilm outputs base-10 logarithms.
Multiplying the srilm logprob by ln(10) gives the Moses score:

ln(10) * -7.40966 = -17.0614
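
The base change can be checked in a few lines. This is only a sketch; the
helper names srilm_to_moses and moses_to_srilm are made up for illustration
and do not exist in either toolkit.

```python
import math


def srilm_to_moses(log10_prob):
    """Convert a base-10 log probability (srilm) to a natural log (Moses)."""
    return log10_prob * math.log(10.0)


def moses_to_srilm(ln_prob):
    """Convert a natural log probability (Moses) to base 10 (srilm)."""
    return ln_prob / math.log(10.0)


# logprob reported by ngram for "after three hours ."
print(srilm_to_moses(-7.40966))
# lm score reported in the n-best list
print(moses_to_srilm(-17.0614))
```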

regards
Barry

On Thursday 05 March 2009 10:05, Carlos Henriquez wrote:
> -7.40966

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



Re: [Moses-support] How is the final LM score obtained?

2009-03-05 Thread Miles Osborne
a couple of points:

--you are asking ngram for perplexity scores, but Moses uses log probs
--Moses will append <s> and </s> pseudo-words to the start and end of
a sentence;  this will change the probabilities
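
For reference, the perplexities that ngram prints can be recomputed from the
logprob on the same output line. The sketch below assumes the definition given
in the SRILM documentation (ppl normalises by words plus sentence-end tokens,
ppl1 by words only, both excluding OOVs and zeroprobs); the function name
srilm_perplexities is hypothetical.

```python
def srilm_perplexities(logprob, words, sentences, oovs=0, zeroprobs=0):
    """Recompute ppl and ppl1 from a base-10 logprob as reported by ngram -ppl."""
    scored_words = words - oovs - zeroprobs
    # ppl counts one end-of-sentence token per sentence; ppl1 does not.
    ppl = 10.0 ** (-logprob / (scored_words + sentences))
    ppl1 = 10.0 ** (-logprob / scored_words)
    return ppl, ppl1


# Values from: "1 sentences, 4 words, 0 OOVs / 0 zeroprobs, logprob= -7.40966"
ppl, ppl1 = srilm_perplexities(-7.40966, words=4, sentences=1)
print(ppl, ppl1)
```

The extra token in the ppl denominator is the sentence-end pseudo-word, which
suggests ngram is already scoring it in this logprob.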

Miles






Re: [Moses-support] Error in running moses with randlm

2009-03-05 Thread Michael Zuckerman
Miles, Chris,

Thank you very much for your support. Could you please let me know once
moses is fixed?

Michael.

On Wed, Mar 4, 2009 at 2:21 PM, Chris Dyer  wrote:

> Yeah, sorry about this- I broke moses, at least for certain compilers.
>  I'll fix it shortly.
> -Chris
>
> On Wed, Mar 4, 2009 at 12:17 PM, Miles Osborne  wrote:
> > ok, it seems that the most recent version of Moses had a bad commit
> > and broke the language model interface.  so, this is not really
> > anything to do with RandLM as such.
> >
> > Miles
> >
> > 2009/2/26 Michael Zuckerman :
> >> Hi,
> >>
> >> As you said, I tried again with europarl used for training the language
> >> model, but I get the same error:
> >> Start loading LanguageModel /home/michez/alfabetic/lm/randlm/test/model.BloomMap : [0.000] seconds
> >> pure virtual method called
> >> terminate called without an active exception
> >> Aborted
> >>
> >> For creating the language model I ran:
> >> $ ../bin/buildlm -struct BloomMap -falsepos 8 -values 8 -output-prefix model -input-path ../../europarl.lower.token.en.gz
> >>
> >> Thank you for your help,
> >>  Michael.
> >>
> >> On Tue, Feb 24, 2009 at 8:36 PM, Miles Osborne wrote:
> >>>
> >>> can you try it again with a large amount of data for training the
> >>> language model?  in the past i've noticed that it doesn't work very
> >>> well with minute numbers of sentences.
> >>>
> >>> try europarl
> >>>
> >>> (i get a different error message, but it might be the same thing)
> >>>
> >>> Miles
> >>>
> >>> 2009/2/24 Michael Zuckerman :
> >>> > Hi,
> >>> >
> >>> >>
> >>> >> I am running moses on a small example containing two German sentences
> >>> >> (in file "in"):
> >>> >> das ist ein kleines haus
> >>> >> das ist ein kleines haus
> >>> >> I am using the attached randlm language model model.BloomMap, and the
> >>> >> attached phrase table and moses.ini files.
> >>> >> My command line is:
> >>> >> $ ../../../../mosesdecoder/moses-cmd/src/moses -f moses.ini < in > out
> >>> >> When loading the language model, moses gives an error:
> >>> >>
> >>> >> Defined parameters (per moses.ini or switch):
> >>> >> config: moses.ini
> >>> >> input-factors: 0
> >>> >> lmodel-file: 5 0 3 /home/michez/alfabetic/lm/randlm/test/model.BloomMap
> >>> >> mapping: T 0
> >>> >> ttable-file: 0 0 1 phrase-table
> >>> >> ttable-limit: 10
> >>> >> weight-d: 1
> >>> >> weight-l: 1
> >>> >> weight-t: 1
> >>> >> weight-w: 0
> >>> >> Added ScoreProducer(0 Distortion) index=0-0
> >>> >> Added ScoreProducer(1 WordPenalty) index=1-1
> >>> >> Added ScoreProducer(2 !UnknownWordPenalty) index=2-2
> >>> >> Loading lexical distortion models...
> >>> >> have 0 models
> >>> >> Start loading LanguageModel /home/michez/alfabetic/lm/randlm/test/model.BloomMap : [0.000] seconds
> >>> >> pure virtual method called
> >>> >> terminate called without an active exception
> >>> >> Aborted
> >>> >>
> >>> >> Do you have a clue how to handle this error ?
> >>> >>
> >>> >> Thanks,
> >>> >> Michael.


Re: [Moses-support] How is the final LM score obtained?

2009-03-05 Thread Carlos Henriquez

Hi Miles.

You said that moses will append <s> and </s> pseudo-words to the start and end of
a sentence and that will change the probabilities, but actually

echo "after three hours ." | ngram -order 5 -lm ../marie/lm/train.tok.en.lm 
-ppl -
echo " after three hours ." | ngram -order 5 -lm 
../marie/lm/train.tok.en.lm -ppl -

return the same value (perplexity and logprob) so I suppose that ngram added 
those pseudo-words as well, didn't it?

It seems more like a base difference, as explained by Barry.

Moses is outputting natural logarithms, srilm is outputting base 10.
log(10) * -7.40966 =  -17.0614

Thank you all.

 --
Carlos A. Henríquez Q.
carlo...@gps.tsc.upc.es




----- Original message -----
From: Miles Osborne
To: Carlos Henriquez
CC: moses-support@mit.edu
Sent: Thursday, 5 March 2009 11:26:29
Subject: Re: [Moses-support] How is the final LM score obtained?

a couple of points:

--you are asking ngram for perplexity scores, but Moses uses log probs
--Moses will append <s> and </s> pseudo-words to the start and end of
a sentence;  this will change the probabilities

Miles






[Moses-support] word alignment quality and symmetrization

2009-03-05 Thread Alexander Fraser
Hi Jorg,

The short answer to your question is yes, the numbers you are
reporting are reasonable. Intersection gets around 6% AER, and Och's
refined gets around 10% AER for the training data set I worked on in
the past, which is the LDC Hansard.

Here is the longer answer to the question you didn't ask :-)

1) AER is broken for Sure and Possible links and can be gamed by
guessing fewer links. If you must use Sure vs. Possible alignments,
use Och and Ney's definition of Precision and Recall, and take 1 - the
geometric mean. (See our CL squib, kindly already cited by Adam, for
more details).

2) The gold standard alignment set is broken (I assume we are talking
about French/English btw, I think there was also German/English which
I am not familiar with). There are 4376 Sure links and 19222 Possible
links. Franz told me that the way this was generated is that two
annotators both annotated the set. Intersection of the annotators was
marked Sure, and union of the annotators was marked Possible. So the
interannotator agreement was really low. This was not done using a
GUI, btw, but instead by typing in offsets.

3) Sure vs. Possible_and_not_Sure is a nebulous distinction (see
above). If you would like the first 220 sentences of the set
reannotated as Sure only (in the spirit of Melamed's Blinker
guidelines), I can make those available. They worked better for
predicting MT performance.

4) The sentences annotated were sampled from the LDC Hansard, not the
ISI Hansard; results using the ISI Hansard are not directly comparable
(the gold standard alignments are also mismatched in time, I don't
know if this is important).

5) There are French/English alignments available for Europarl, perhaps
you should be using these instead? They use Sure vs. Possible
unfortunately. I don't know if they had French or English native
speakers, so YMMV. Not to criticize though, I bet there are errors in
my annotation as well. Many thanks to those guys for releasing their
work!!

https://www.l2f.inesc-id.pt/wiki/index.php/Word_Alignments

6) I would use unbalanced F-Measure rather than balanced F-Measure
(see again the squib, this is the main point of it). For applications
where precision is more important (such as cross-lingual retrieval),
increase alpha to weight precision more.
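
As a rough illustration of points 1) and 6), here is a small sketch of the
metrics involved. It assumes the usual definitions (precision measured against
Possible links, recall against Sure links, AER as in Och and Ney, and an
alpha-weighted F-Measure of the form 1 / (alpha/P + (1-alpha)/R)); none of
these formulas are spelled out in this message, and the function name
alignment_scores and the toy link sets are invented for the example.

```python
def alignment_scores(hyp, sure, possible, alpha=0.5):
    """Return (precision, recall, AER, F_alpha) for a set of hypothesised links.

    hyp, sure, possible are sets of (source_index, target_index) pairs.
    Sure links are treated as a subset of Possible links.
    """
    if not hyp or not sure:
        raise ValueError("hyp and sure must be non-empty")
    possible = possible | sure
    precision = len(hyp & possible) / len(hyp)
    recall = len(hyp & sure) / len(sure)
    aer = 1.0 - (len(hyp & sure) + len(hyp & possible)) / (len(hyp) + len(sure))
    if precision == 0.0 or recall == 0.0:
        f_alpha = 0.0
    else:
        # Larger alpha puts more weight on precision, smaller alpha on recall.
        f_alpha = 1.0 / (alpha / precision + (1.0 - alpha) / recall)
    return precision, recall, aer, f_alpha


# Toy gold standard: two Sure links plus two extra Possible links.
sure = {(0, 0), (1, 1)}
possible = sure | {(2, 1), (2, 2)}

# A fuller alignment versus one that guesses only a single safe link.
full = {(0, 0), (2, 2), (3, 3)}
sparse = {(0, 0)}

print(alignment_scores(full, sure, possible))
print(alignment_scores(sparse, sure, possible))
```

In this toy case the sparse alignment gets the lower AER although it recovers
no more Sure links than the fuller one, which is the kind of gaming described
in point 1).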

Cheers, Alex
---
Alexander Fraser
Institute for Natural Language Processing
University of Stuttgart
Azenbergstrasse 12
70174 Stuttgart, Germany

phone: +49 (711) 685-81375
fax:   +49 (711) 685-71400
email: fra...@ims.uni-stuttgart.de
web:   http://www.ims.uni-stuttgart.de/~fraser


Re: [Moses-support] Error in running moses with randlm

2009-03-05 Thread Michael Zuckerman
Chris,

Is there a version of moses that works correctly with RandLM? If so, how
can I fetch it from the database?

Thanks,
 Michael.
