Re: [Moses-support] Steps for setting up Cache Based Models

2016-07-27 Thread Parmpreet Singh
Hi all,

Moving CBPT to the foreground, as suggested by Prashant, is working for my use case.

Could anyone please suggest how to generate word alignments for CBPT? Is there 
a tool or script available for generating the alignments? I tried fast_align, but 
its output format is totally different from what CBPT expects.
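
(A minimal sketch of one way to bridge the gap, assuming CBPT consumes "source ||| target ||| alignment" triples with Pharaoh-style i-j alignment points; please check the CacheBased documentation page for the exact entry format, and note that all file names here are placeholders:)

$ fast_align -i corpus.en-fr -d -o -v > corpus.align      # one "0-0 1-2 ..." line per sentence pair
$ paste corpus.en corpus.fr corpus.align \
    | awk -F'\t' '{print $1" ||| "$2" ||| "$3}' > cbpt_entries.txt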


Any help is much appreciated. 


Kind Regards,
Parmpreet 


> On 21/07/2016, at 11:08 AM, Prashant Mathur wrote:
> 
> Hi Parmpreet, 
> 
> Just FYI...
> CBPT is meant to work as a foreground PT, but it depends on the use case of 
> course. The decoder is more likely to use the phrases in the cache if the 
> 0th phrase table is the CBPT rather than the background PT. That is, instead 
> of this order of PTs
> 
> PhraseDictionaryMemory name=TranslationModel0 num-features=4 
> path=/path/to/model/phrase-table.1.gz input-factor=0 output-factor=0
> DynamicCacheBasedLanguageModel name=CBLM0 num-features=1 cblm-score-type=12 
> cblm-query-type=1 cblm-max-age=1000
> 
> you have this order
> 
> DynamicCacheBasedLanguageModel name=CBLM0 num-features=1 cblm-score-type=12 
> cblm-query-type=1 cblm-max-age=1000
> PhraseDictionaryMemory name=TranslationModel0 num-features=4 
> path=/path/to/model/phrase-table.1.gz input-factor=0 output-factor=0
> 
> —Prashant
> 
>> On Jul 19, 2016, at 4:10 AM, Parmpreet Singh wrote:
>> 
>> Hi All,
>> 
>> 
>> Can anyone help me setting up Cache based Phrase tables?
>> 
>> Please see detailed information below.
>> 
>> Thanks,
>> Parmpreet Singh
>> 
>>> On 13/07/2016, at 11:52 AM, Parmpreet Singh wrote:
>>> 
>>> Hi All,
>>> 
>>> I am trying to set up a Dynamic Cache-Based Phrase Table for post-edit 
>>> translation adaptation.
>>> 
>>> I am following this tutorial: 
>>> http://www.statmt.org/moses/?n=Advanced.CacheBased 
>>> 
>>> Adaptive MT server:
>>> https://307d7cc8-a-db0463cf-s-sites.googlegroups.com/a/fbk.eu/mt4cat/file-cabinet/AdaptiveMTserver-manual.pdf?attachauth=ANoY7crb4vDqMv94wuQREg76SnBs0jk3KdfMwTml0T78EAwNQbUgbmnvJQzUNJRbAnSm6TQwduhwfmvMa84r7JX1TXqhJjtsUSx2mOdkyrWUEiXXhBrCbWFPwcuyw575nm8Co1DP_l1aSa5Ur3v4AsFRtvmevFoLnNRuYa0bM2g7nsjsI_4s_sooPMODoVAoM7a-mcn1wI6dKgHLfpOO2DDyA0kyvTFkK4xj4w87FUGGKXglNj1uTyg%3D=0
>>>  
>>> 
>>> 
>>> Moses.ini is:   
>>> 
>>> #
>>> ### MOSES CONFIG FILE ###
>>> #
>>> 
>>> # input factors
>>> [input-factors]
>>> 0
>>> 
>>> [xml-input]
>>> inclusive
>>> 
>>> # mapping steps
>>> [mapping]
>>> 0 T 0
>>> 1 T 1
>>> 
>>> 
>>> [distortion-limit]
>>> 6
>>> 
>>> # feature functions
>>> [feature]
>>> KENLM lazyken=0 name=LM0 factor=0 path=/path/to/lm/nc.binlm.1 order=5
>>> Distortion
>>> LexicalReordering name=LexicalReordering0 num-features=6 
>>> type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0 
>>> path=/path/to/model/reordering-table.1.wbe-msd-bidirectional-fe.gz
>>> UnknownWordPenalty
>>> WordPenalty
>>> PhrasePenalty
>>> PhraseDictionaryMemory name=TranslationModel0 num-features=4 
>>> path=/path/to/model/phrase-table.1.gz input-factor=0 output-factor=0
>>> 
>>> PhraseDictionaryDynamicCacheBased name=CBTM0 input-factor=0 output-factor=0 
>>> num-features=1 table-limit=20 cbtm-score-type=12
>>> DynamicCacheBasedLanguageModel name=CBLM0 num-features=1 cblm-score-type=12 
>>> cblm-query-type=1 cblm-max-age=1000
>>> 
>>> # dense weights for feature functions
>>> [weight]
>>> # The default weights are NOT optimized for translation quality. You MUST 
>>> tune the weights.
>>> # Documentation for tuning is here: 
>>> http://www.statmt.org/moses/?n=FactoredTraining.Tuning 
>>>  
>>> UnknownWordPenalty0= 1
>>> WordPenalty0= -1
>>> PhrasePenalty0= 0.2
>>> TranslationModel0= 0.2 0.2 0.2 0.2
>>> LexicalReordering0= 0.3 0.3 0.3 0.3 0.3 0.3
>>> Distortion0= 0.3
>>> LM0= 0.5
>>> CBTM0= 1
>>> CBLM0= 1
>>> 
>>> 
>>> 
>>> Command to start moses SMT is: mosesdecoder/bin/moses -f /path/to/moses.ini 
>>> -t
>>> 
>>> Output of the above command is:
>>> Defined parameters (per moses.ini or switch):
>>> beam-threshold: 0.03 
>>> config: /path/to/moses.ini 
>>> distortion-limit: 6 
>>> feature: KENLM lazyken=0 name=LM0 factor=0 path=/path/to/lm/nc.binlm.1 
>>> order=5 Distortion LexicalReordering name=LexicalReordering0 num-features=6 
>>> type=wbe-msd-bidirectional-fe-allff input-factor=0 output-factor=0 
>>> path=/mnt/data/apps/models/baseline/en_fr/model/reordering-table.1.wbe-msd-bidirectional-fe.gz
>>>  

[Moses-support] wrap-xml.perl HELP

2016-07-27 Thread Despina Mouratidi
Hi all,

I want to convert a txt file to .sgm. I know that I can do it with the
wrap-xml.perl script, and I have already seen the example,

$ scripts/wrap-xml.perl wmt08/devtest/devtest2006-ref.en.sgm en < working-dir/evaluation/devtest2006.output.detokenized > working-dir/evaluation/devtest2006.output.sgm

but I am still confused. The text that I want to convert is 1000.en in my
Desktop folder. Do you know what the command would look like?
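
(A minimal sketch of what that could look like, assuming the script takes a reference SGM file whose wrapper structure it copies, plus the language code, reading from stdin and writing to stdout; the reference path below is a placeholder:)

$ scripts/wrap-xml.perl /path/to/reference.en.sgm en < ~/Desktop/1000.en > ~/Desktop/1000.en.sgm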


Thanks in advance
Despina
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] train English to farsi binary language model

2016-07-27 Thread samane shahmohamadi
Hi, I need to train my English-Persian binary language model, but running
this command did not work. Are the -f and -e switches right? What is the
problem?
nohup nice ~/mizaan/moses/mosesdecoder/scripts/training/train-model.perl
-root-dir train -corpus ~/mizaan/corpus/Mizan.en-fa.clean -e en -f fa
-alignment grow-diag-final-and -reordering msd-bidirectional-fe -lm
0:3:$HOME/faraday/mizaan/lm/Mizan.en-fa.true.blm.fa:8 -external-bin-dir
~/mizaan/moses/mosesdecoder/tools >& training.out &
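
(A hedged note, not a definitive diagnosis: in train-model.perl, -f names the source-side file extension and -e the target side, and the LM given with -lm should match the target language. So for English-to-Farsi with the Farsi LM above, the invocation would typically swap the two switches:)

nohup nice ~/mizaan/moses/mosesdecoder/scripts/training/train-model.perl \
  -root-dir train -corpus ~/mizaan/corpus/Mizan.en-fa.clean \
  -f en -e fa \
  -alignment grow-diag-final-and -reordering msd-bidirectional-fe \
  -lm 0:3:$HOME/faraday/mizaan/lm/Mizan.en-fa.true.blm.fa:8 \
  -external-bin-dir ~/mizaan/moses/mosesdecoder/tools >& training.out &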
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] farsi language

2016-07-27 Thread samane.shahmohamadi
Hi guys, where can I find Moses scripts for the Farsi language?


Sent from Samsung tablet.
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


[Moses-support] Factored model configuration using stems and POS

2016-07-27 Thread Gmehlin Floran
Hi,

I have been trying to build a factored translation model using stems and 
part-of-speech tags for a week now and I cannot get satisfying results. This 
probably comes from my factor configuration, as I probably do not fully 
understand how it works (I am following the paper Factored Translation Models 
by Koehn and Hoang).

I previously built a standard phrase-based model (with the same corpus) which 
gave me a BLEU score of around 24-25 (DE-EN). For my current factored model, the 
BLEU score is around 1 (?).

I tried loading the moses.ini files (tuned or not) to see if I could get 
something translated by copy/pasting some lines from the original corpus, but 
it only translates from German to German and does not recognize most of the 
words, if not all of them.

The motivation behind the factored model is that there are too many OOVs with 
the standard phrase-based model, so I wanted to try using stems to reduce them.

I am annotating the corpus with TreeTagger, and the factor configuration is as 
follows:

input-factors = word stem pos
output-factors = word stem pos
alignment-factors = "word+stem -> word+stem"
translation-factors = "stem -> stem,pos -> pos"
reordering-factors = "word -> word"
generation-factors = "stem -> pos,stem+pos -> word"
decoding-steps = "t0,g0,t1,g1"
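
(A reading aid, my interpretation of the path above, assuming translation and generation steps are numbered in the order the mappings are listed:)

# decoding-steps = "t0,g0,t1,g1"
#   t0: stem -> stem        translate source stems into target stems
#   g0: stem -> pos         generate a target POS tag from the target stem
#   t1: pos -> pos          translate source POS tags into target POS tags
#   g1: stem+pos -> word    generate the target surface form from stem and POS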

Is there something wrong with that?

I only use a single language model over surface forms, as the LM over POS tags 
causes a segmentation fault in the tuning phase.
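
(For concreteness, a sketch of how such a POS LM entry would look in moses.ini, assuming the factor order word=0, stem=1, pos=2 and a hypothetical model path:)

KENLM name=LM1 factor=2 path=/path/to/pos.lm order=7
# plus a matching entry in [weight]:
LM1= 0.5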

Does anyone have an idea how I should configure my model to exploit stems in 
the source language?

Thanks a lot,

Floran
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support