Re: [Moses-support] Tuning and decoding of lattices in the new Moses.

2013-09-06 Thread Hieu Hoang
Good to know. I don't think it's obvious that you need that switch for lattice 
input. Maybe there should be a check of some sort in the mert script.
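
A check of that kind could look roughly like the Python sketch below. It is only an illustration, not a patch to mert-moses.pl (which is written in Perl): the function name lattice_filter_warning is made up, and the only flags it looks at are --inputtype and --no-filter-phrase-table, both taken from this thread.

```python
def lattice_filter_warning(args):
    """Return a warning if lattice input is combined with phrase-table filtering.

    `args` is the list of arguments passed to the tuning script.
    Hypothetical helper for illustration; not part of Moses.
    """
    input_type = None
    for i, arg in enumerate(args):
        if arg.startswith("--inputtype="):
            input_type = arg.split("=", 1)[1]
        elif arg == "--inputtype" and i + 1 < len(args):
            input_type = args[i + 1]

    # Input type 2 is lattice input; the filter script may not understand lattices.
    if input_type == "2" and "--no-filter-phrase-table" not in args:
        return "lattice input (--inputtype 2) without --no-filter-phrase-table"
    return None


args = ["--working-dir", "./tuning", "--inputtype", "2"]
print(lattice_filter_warning(args))
```

Returning a message instead of aborting keeps the sketch neutral about whether such a check should be a warning or a hard error.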

Sent while bumping into things

On 6 Sep 2013, at 15:42, Yulia Tsvetkov yulia.tsvet...@gmail.com wrote:

 Hi Hieu, 
 
 A quick update: I should have used the --no-filter-phrase-table flag, 
 otherwise the phrase table gets filtered. Thanks a lot for your help.
 
 Yulia
 
 
 On Wed, Sep 4, 2013 at 12:34 PM, Hieu Hoang hieuho...@gmail.com wrote:
 Ok. If you're still stuck, please send me your phrase table and I'll try to 
 debug it.
 
 Sent while bumping into things
 
 On 4 Sep 2013, at 17:07, Yulia Tsvetkov yulia.tsvet...@gmail.com wrote:
 
 phrase table is not empty, it looks normal, here is the snippet:
 
 no one ||| aucun de ceux ||| 1 0.00157474 0.0060241 5.00684e-06 ||| 0-0 0-1 1-2 ||| 1 166 1
 no one ||| ce que personne ||| 0.5 3.7494e-05 0.0060241 5.89199e-06 ||| 0-0 1-2 ||| 2 166 1
 no one ||| il que personne ||| 1 9.31515e-05 0.0060241 1.11289e-05 ||| 0-0 1-2 ||| 1 166 1
 no one ||| n'est pas le seul ||| 0.0714286 0.0073779 0.0060241 4.54759e-07 ||| 0-0 0-1 1-3 ||| 14 166 1
 no one ||| on ne ||| 0.0044 0.000152764 0.0060241 0.000497078 ||| 1-0 0-1 ||| 225 166 1
 no one ||| pas ||| 6.5066e-05 0.000267155 0.0060241 0.294497 ||| 0-0 ||| 15369 166 1
 
 i don't filter the phrase table...
 
 I'll debug more, and Chris was going to look at it too, I will send you an 
 update.
 
 Thanks!
 
 Yulia
 
 
 
 On Wed, Sep 4, 2013 at 10:41 AM, Hieu Hoang hieu.ho...@ed.ac.uk wrote:
 hmm, strange. the moses.ini file looks ok. There shouldn't be an issue 
 with initialisation. Is the phrase-table empty? 
 
 Make sure you're not filtering the phrase table; I don't think the filter 
 script understands lattices.
 
 
 
 
 On 4 September 2013 15:10, Yulia Tsvetkov yulia.tsvet...@gmail.com wrote:
 Hi Hieu,
 
 did you manage to get moses working with lattices again? it would be 
 nice to get some feedback
 
 Sorry for not sending feedback earlier -- I was just trying to debug by 
 myself before I send feedback or ask next question...
 
 I was able to run a pipeline with the new settings, thanks a lot for the 
 detailed answer!
 
 There is still a problem (with feature initialization?), here is the 
 first lattice translation, looks like all input words are treated as OOVs 
 (and they are not), and then MERT gets killed:
 
 BEST TRANSLATION: no|UNK|UNK|UNK one|UNK|UNK|UNK of|UNK|UNK|UNK 
 the|UNK|UNK|UNK intense|UNK|UNK|UNK closures|UNK|UNK|UNK of|UNK|UNK|UNK 
 travel|UNK|UNK|UNK and|UNK|UNK|UNK one|UNK|UNK|UNK of|UNK|UNK|UNK 
 the|UNK|UNK|UNK delights|UNK|UNK|UNK of|UNK|UNK|UNK 
 ethnographic|UNK|UNK|UNK research|UNK|UNK|UNK is|UNK|UNK|UNK 
 the|UNK|UNK|UNK opportunity|UNK|UNK|UNK to|UNK|UNK|UNK live|UNK|UNK|UNK 
 amongst|UNK|UNK|UNK those|UNK|UNK|UNK who|UNK|UNK|UNK have|UNK|UNK|UNK 
 not|UNK|UNK|UNK forgotten|UNK|UNK|UNK the|UNK|UNK|UNK old|UNK|UNK|UNK 
 ways|UNK|UNK|UNK to|UNK|UNK|UNK still|UNK|UNK|UNK feel|UNK|UNK|UNK 
 their|UNK|UNK|UNK pass|UNK|UNK|UNK in|UNK|UNK|UNK the|UNK|UNK|UNK 
 when|UNK|UNK|UNK touch|UNK|UNK|UNK and|UNK|UNK|UNK stones|UNK|UNK|UNK 
 caused|UNK|UNK|UNK by|UNK|UNK|UNK rain|UNK|UNK|UNK tasted|UNK|UNK|UNK 
 leaves|UNK|UNK|UNK of|UNK|UNK|UNK the|UNK|UNK|UNK bitter|UNK|UNK|UNK 
 plants|UNK|UNK|UNK 
 [1]  
 [total=-6405.459] 
 core=(-6100.000,-50.000,61.000,0.000,0.000,0.000,0.000,-8.000,-1952.355,0.000)
   
 Line 0: Translation took 0.000 seconds total
 Translating line 1  in thread id 47061808453376
 sh: line 1:  7333 Killed  
 /home/ytsvetko/tools/mosesdecoder/bin/moses -config filtered/moses.ini 
 -inputtype 2 -weight-overwrite 'InputFeature0= 0.07 PhrasePenalty0= 
 0.07 WordPenalty0= -0.33 TranslationModel0= 0.07 0.07 
 0.07 0.07 Distortion0= 0.10 LM0= 0.17' -n-best-list 
 run1.best100.out 100 -input-file 
 /share/workhorse4/ytsvetko/projects/mt_proj/mt_eval/baselines/fr-base-1-lats/tuning/corpus.en
   run1.out
 Exit code: 137
 The decoder died. CONFIG WAS -weight-overwrite 'InputFeature0= 0.07 
 PhrasePenalty0= 0.07 WordPenalty0= -0.33 TranslationModel0= 
 0.07 0.07 0.07 0.07 Distortion0= 0.10 LM0= 0.17' 
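
 A note on reading the log above: by the usual shell convention, an exit code above 128 means the process was terminated by a signal, and 137 = 128 + 9, i.e. signal 9 (SIGKILL), which matches the "Killed" line. The log does not say who sent the signal; an out-of-memory kill is one common cause, but that is only a guess here. The small Python sketch below (function names are made up for illustration) does the arithmetic:

```python
import signal


def signal_from_exit_code(exit_code):
    """Return the signal number encoded in a shell exit code, or None."""
    if exit_code > 128:
        return exit_code - 128
    return None


def describe_exit_code(exit_code):
    """Describe an exit code using the 128 + signal-number convention."""
    signum = signal_from_exit_code(exit_code)
    if signum is None:
        return f"exit code {exit_code}: not a signal exit"
    try:
        name = signal.Signals(signum).name
    except ValueError:
        # The signal number is not known on this platform.
        name = f"signal {signum}"
    return f"exit code {exit_code}: killed by {name}"


print(describe_exit_code(137))
```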
 
 I attach my config file, and here is the exact command that I am 
 executing:
 
 mert-moses.pl ./tuning/corpus.en ./tuning/corpus.fr 
 /home/ytsvetko/tools/mosesdecoder/bin/moses ./moses.ini --working-dir 
 ./tuning --mertdir /home/ytsvetko/tools/mosesdecoder/mert --inputtype 2
 
 
 Thanks a lot for your help!
 Yulia
 
 
 
 
 On 2 September 2013 17:03, Hieu Hoang hieu.ho...@ed.ac.uk wrote:
 Hi Yulia
 
 
 On 1 September 2013 22:46, Yulia Tsvetkov yulia.tsvet...@gmail.com 
 wrote:
 Dear Moses developers, 
 
 I am trying to use a new version of Moses. It seems things have 
 changed quite a bit, and I have a hard time finding up-to-date 
 documentation. For debugging I used very small 

Re: [Moses-support] Tuning and decoding of lattices in the new Moses.

2013-09-06 Thread Chris Dyer
Yes, there definitely should be a few checks in various places. I've
got a list of recommendations to make lattice decoding a bit easier to
get started with. We'll discuss this next week.
-C


Re: [Moses-support] Tuning and decoding of lattices in the new Moses.

2013-09-02 Thread Hieu Hoang
Hi Yulia


On 1 September 2013 22:46, Yulia Tsvetkov yulia.tsvet...@gmail.com wrote:

 Dear Moses developers,

 I am trying to use a new version of Moses. It seems things have
 changed quite a bit, and I have a hard time finding up-to-date
 documentation. For debugging I used very small train/tune/test corpora (10
 lines each).

 First, running the following command produces a phrase table with
 only 4 features:
 train-model.perl --root-dir $root_dir --corpus $root_dir/$corpus_name --f
 $src_lng --e $trg_lng --alignment grow-diag-final --lm 0:3:$LM
 -external-bin-dir $external_bin_dir

 Here is a snippet from the produced moses.ini:
 PhraseDictionaryMemory name=TranslationModel0 table-limit=20 num-features=4 path=/usr1/projects/mt_proj/mt_eval/baselines/fr-base-1-lats/model/phrase-table.gz input-factor=0 output-factor=0


Yes, the phrase-table now has 4 scores instead of 5. The 5th score was a
constant 2.718. This has now moved into its own feature function,
PhrasePenalty.

It saves 3% of disk space, and I think it is better for research. E.g. we can
create better, non-constant phrase penalty feature functions; if we have 2
phrase tables, do we need just 1 phrase penalty? etc.
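
To make the 4-versus-5 scores point concrete, here is a small Python sketch. It is not part of Moses; the function names are made up. It reads the scores field of a phrase-table line and shows how an old-style entry would lose its constant 5th score. The sample line is an entry in the new 4-score format, and the old-style score list is constructed by appending 2.718 purely for illustration.

```python
PHRASE_PENALTY = 2.718  # the constant 5th score described above


def parse_scores(line):
    """Return the scores (third '|||'-separated field) of a phrase-table line."""
    fields = [field.strip() for field in line.split("|||")]
    return [float(score) for score in fields[2].split()]


def drop_phrase_penalty(scores, tol=1e-3):
    """Drop a trailing constant 2.718 from an old-style 5-score entry."""
    if len(scores) == 5 and abs(scores[-1] - PHRASE_PENALTY) < tol:
        return scores[:-1]
    return scores


new_line = ("no one ||| pas ||| 6.5066e-05 0.000267155 0.0060241 0.294497 "
            "||| 0-0 ||| 15369 166 1")
new_scores = parse_scores(new_line)
old_scores = new_scores + [PHRASE_PENALTY]  # constructed old-style entry

print(len(new_scores), len(old_scores), len(drop_phrase_penalty(old_scores)))
```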


 Second, I am trying to run tuning and decoding of lattices in plf format.
 Can you point me to example commands and moses.ini for running mert and
 decoding lattices with the new Moses?

An example ini file for lattices can be seen here:

https://github.com/moses-smt/moses-regression-tests/blob/master/tests/phrase.lattice-surface/moses.ini

Mert should run as it always has. However, if you upgrade the
decoder, you should use the upgraded mert script too.

Decoding a lattice is exactly the same as decoding a sentence, except for 2 things:
   1. inputtype=2. This can be on the command line or in the ini file, e.g.
   ./moses -inputtype 2

   or
[inputtype]
2

   2. You should use the InputFeature feature function. This is the score
of the path through the lattice. You can see the InputFeature in the ini
file:
  [feature]
  
  InputFeature num-features=1 num-input-features=1 real-word-count=0

  [weight]
  ...
  InputFeature0 = 1

   Before the refactoring, this was hacked in as an extra feature in the
phrase-table.
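
The two requirements above can be checked mechanically. The Python sketch below is an illustration only, not a Moses tool: the function names are invented, the parser handles just the simple "[section] followed by lines" layout shown above, and the sample ini contains only the lines quoted in this message.

```python
def parse_ini_sections(text):
    """Map each [section] name to its non-empty, non-comment lines."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line.startswith("[") and line.endswith("]"):
            current = line[1:-1]
            sections.setdefault(current, [])
        elif current is not None:
            sections[current].append(line)
    return sections


def lattice_ini_problems(text):
    """List what is missing for lattice decoding, per the two points above."""
    sections = parse_ini_sections(text)
    problems = []
    if sections.get("inputtype") != ["2"]:
        problems.append("inputtype is not 2 in the ini file "
                        "(it can still be given as -inputtype 2)")
    features = sections.get("feature", [])
    if not any(line.split()[0] == "InputFeature" for line in features):
        problems.append("no InputFeature line in [feature]")
    weights = sections.get("weight", [])
    if not any(line.split("=")[0].strip() == "InputFeature0" for line in weights):
        problems.append("no InputFeature0 weight in [weight]")
    return problems


SAMPLE_INI = """
[inputtype]
2

[feature]
InputFeature num-features=1 num-input-features=1 real-word-count=0

[weight]
InputFeature0 = 1
"""

print(lattice_ini_problems(SAMPLE_INI))
```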


 So far I tried training and tuning on text files and decoding on lattices
 because I could not figure out the right settings for tuning.
 According to some old documentation I am supposed to convert the phrase
 table to a binary format. Is it still needed?

You no longer need to convert it to binary format. It's good to convert to
binary format to save memory, but it is not required. Lattice decoding
works with all phrase-table implementations now.


 When I ran it with the following command:
 moses -inputtype 2 -weight-i 0.62 -weight-l 12.5 -f $tune_dir/moses.ini
  < $eval_dir/69.plf > $eval_dir/69.plf.out
 I got an error:
 *Don't mix old and new ini file format*
 What is the new equivalent of weight-i and weight-l?


   -weight-i 0.62
now becomes
   -weight-overwrite 'InputFeature0= 0.62'

  -weight-l 12.5
now becomes
   -weight-overwrite 'LM0= 12.5'

The updated mert script should be doing this anyway.
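
As a small illustration of the new syntax, the Python sketch below formats feature weights into the single quoted argument that -weight-overwrite takes, in the "Name= value value ..." form shown above. The helper name weight_overwrite is made up; it only builds the string and does not call the decoder.

```python
def weight_overwrite(weights):
    """Format {feature_name: [values]} as the argument of -weight-overwrite."""
    parts = []
    for name, values in weights.items():
        formatted = " ".join(f"{value:g}" for value in values)
        parts.append(f"{name}= {formatted}")
    return " ".join(parts)


# Old flags: -weight-i 0.62 -weight-l 12.5
argument = weight_overwrite({"InputFeature0": [0.62], "LM0": [12.5]})
print(f"-weight-overwrite '{argument}'")
# prints: -weight-overwrite 'InputFeature0= 0.62 LM0= 12.5'
```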


 Without those parameters I get a Segmentation Fault with both a .gz and a
 binary phrase table.


If you're still having problems, give me your ini file and the exact command
you're executing and I'll try to debug it.


 Could you help me figure out the right settings?

 Thanks in advance.

 ___
 Moses-support mailing list
 Moses-support@mit.edu
 http://mailman.mit.edu/mailman/listinfo/moses-support




-- 
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu


[Moses-support] Tuning and decoding of lattices in the new Moses.

2013-09-01 Thread Yulia Tsvetkov
Dear Moses developers,

I am trying to use a new version of Moses. It seems things have
changed quite a bit, and I have a hard time finding up-to-date
documentation. For debugging I used very small train/tune/test corpora (10
lines each).

First, running the following command produces a phrase table with
only 4 features:
train-model.perl --root-dir $root_dir --corpus $root_dir/$corpus_name --f
$src_lng --e $trg_lng --alignment grow-diag-final --lm 0:3:$LM
-external-bin-dir $external_bin_dir

Here is a snippet from the produced moses.ini:
PhraseDictionaryMemory name=TranslationModel0 table-limit=20 num-features=4 path=/usr1/projects/mt_proj/mt_eval/baselines/fr-base-1-lats/model/phrase-table.gz input-factor=0 output-factor=0

Second, I am trying to run tuning and decoding of lattices in plf format.
Can you point me to example commands and moses.ini for running mert and
decoding lattices with the new Moses?
So far I tried training and tuning on text files and decoding on lattices
because I could not figure out the right settings for tuning.
According to some old documentation I am supposed to convert the phrase
table to a binary format. Is it still needed?

When I ran it with the following command:
moses -inputtype 2 -weight-i 0.62 -weight-l 12.5 -f $tune_dir/moses.ini
 < $eval_dir/69.plf > $eval_dir/69.plf.out
I got an error:
*Don't mix old and new ini file format*
What is the new equivalent of weight-i and weight-l?

Without those parameters I get a Segmentation Fault with both a .gz and a
binary phrase table.

Could you help me figure out the right settings?

Thanks in advance.