Re: [Moses-support] Mix of source and target languages in Moses translation

Barry Haddow Wed, 06 Apr 2011 03:56:44 -0700

Hi Nakul

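Some of your source words are left untranslated because they have no
corresponding entries in your phrase table. If moses cannot find a word in
the phrase table, it cannot translate it, and by default it passes the word
straight through to the output (these are the tokens marked |UNK in your
decoder log). Tuning does not change this, since MERT only adjusts the model
weights and adds nothing to the phrase table. If you don't want unknown words
in the output, use the -drop-unknown flag.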
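
As a rough sketch, the flag simply goes on the decoder command line. MOSES_BIN
and MOSES_INI below are placeholder variables of my own, not something moses
defines; point them at your own binary and tuned moses.ini.

```shell
# Placeholders (assumptions): override these with your own paths.
MOSES_BIN=${MOSES_BIN:-moses}
MOSES_INI=${MOSES_INI:-moses.ini}

# Translate the text on stdin, dropping source words that have no
# phrase-table entry instead of copying them into the output.
translate_drop_unknown() {
    "$MOSES_BIN" -f "$MOSES_INI" -drop-unknown
}
```

For example: echo "Even amendments would require recommendation." | translate_drop_unknown
Bear in mind that dropping the words only hides the gap; the sentence is still
only partly translated.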


In particular, the following source tokens are unknown:

Even
require
recommendation.
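
If you want to check this yourself, something like the following gives a first
approximation. It assumes the usual phrase-table layout (fields separated by
" ||| ", source phrase first) and only looks at single-word source phrases;
unknown_tokens is just a helper name made up for this sketch.

```shell
# Print every whitespace-separated token of a sentence that never occurs
# as a complete source phrase in a gzipped Moses phrase table.
# Assumed table format: "source ||| target ||| scores ..."
unknown_tokens() {
    table=$1      # path to phrase-table.gz
    sentence=$2   # input sentence, exactly as it would be passed to moses
    gzip -dc "$table" | awk -F ' [|][|][|] ' -v sent="$sentence" '
        { known[$1] = 1 }                   # remember each source phrase
        END {
            n = split(sent, tok, " ")       # split the sentence on whitespace
            for (i = 1; i <= n; i++)
                if (!(tok[i] in known))
                    print tok[i]            # token has no single-word entry
        }'
}
```

For example: unknown_tokens model/phrase-table.gz "Even amendments would require recommendation."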

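You may want to lowercase and tokenise the source before passing it to moses.
Your model was trained on lowercased data (corpus.lowercased.en), so "Even"
with a capital letter cannot match, and "recommendation." with the full stop
attached is a different token from "recommendation".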
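
The Moses scripts include tokenizer.perl and lowercase.perl for this, and you
should use the same preprocessing as for your training data. Purely to
illustrate the effect, here is a crude stand-in using standard tools;
preprocess is a made-up name, and this is not the Moses tokeniser (it would
also split decimals such as 3.5, for instance).

```shell
# Crude illustration only: lowercase the input, put spaces around common
# punctuation marks, then squeeze repeated spaces and trim both ends.
preprocess() {
    tr '[:upper:]' '[:lower:]' |
        sed -e 's/\([.,!?;:]\)/ \1 /g' \
            -e 's/  */ /g' \
            -e 's/^ //' \
            -e 's/ $//'
}
```

For example: echo "Even amendments would require recommendation." | preprocess
turns the input into lowercased, space-separated tokens that have a chance of
matching phrase-table entries.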

best regards - Barry

On Wednesday 06 April 2011 10:47, nakul sharma wrote:
> Hi All,
>
> I am undertaking English to regional language translation.
> I have tuned the system with mert-moses.pl using the following command:
>
>  ./mert-moses.pl corpus/corpus.lowercased.en corpus/corpus.lowercased.hi \
>    /home/nakul/moses/mosesdecoder/trunk/moses-cmd/src/moses model/moses.ini \
>    --working-dir corpus/tuning/mert \
>    --rootdir /home/nakul/moses/mosesdecoder/trunk/scripts/training/moses-scripts/scripts-20110405-1055/ \
>    --decoder-flags "-v 0" >& mert.out &
>
> The moses.ini I got after tuning (at corpus/tuning/mert) has the following
> contents:
>
>
> # MERT optimized configuration
> # decoder /home/nakul/moses/mosesdecoder/trunk/moses-cmd/src/moses
> # BLEU 0.612451 -> 0.612451 on dev /home/nakul/moses/mosesdecoder/trunk/scripts/training/moses-scripts/scripts-20110405-1055/training/corpus/corpus.lowercased.en
>
> # We were before running iteration 8
> # finished बुध  अप्रेल  6 14:40:28 IST 2011
> ### MOSES CONFIG FILE ###
> #########################
>
> # input factors
> [input-factors]
> 0
>
> # mapping steps
> [mapping]
> 0 T 0
>
> # translation tables: source-factors, target-factors, number of scores, file
> [ttable-file]
> 0 0 0 5 /home/nakul/moses/mosesdecoder/trunk/scripts/training/moses-scripts/scripts-20110405-1055/training/model/phrase-table.gz
>
>
> # no generation models, no generation-file section
>
> # language models: type(srilm/irstlm), factors, order, file
> [lmodel-file]
> 0 0 3 /home/nakul/moses/mosesdecoder/trunk/scripts/training/lm_hin.lm
>
>
> # limit on how many phrase translations e for each phrase f are loaded
> # 0 = all elements loaded
> [ttable-limit]
> 20
>
> # distortion (reordering) files
> [distortion-file]
> 0-0 wbe-msd-bidirectional-fe-allff 6 /home/nakul/moses/mosesdecoder/trunk/scripts/training/moses-scripts/scripts-20110405-1055/training/model/reordering-table.wbe-msd-bidirectional-fe.gz
>
>
> # distortion (reordering) weight
> [weight-d]
> 0.009645
> 0.021906
> 0.008725
> 0.032902
> 0.019846
> 0.002462
> 0.016001
>
> # language model weights
> [weight-l]
> 0.022668
>
>
> # translation model weights
> [weight-t]
> -0.008632
> 0.019782
> 0.228404
> -0.005989
> -0.499862
>
> # no generation models, no weight-generation section
>
> # word penalty
> [weight-w]
> -0.103176
>
> [distortion-limit]
> 6
>
> [v]
> 0
>
>
> The contents of the initial moses.ini are as follows:
>
> #########################
> ### MOSES CONFIG FILE ###
> #########################
>
> # input factors
> [input-factors]
> 0
>
> # mapping steps
> [mapping]
> 0 T 0
>
> # translation tables: source-factors, target-factors, number of scores, file
> [ttable-file]
> 0 0 0 5 /home/nakul/moses/mosesdecoder/trunk/scripts/training/moses-scripts/scripts-20110405-1055/training/model/phrase-table.gz
>
>
> # no generation models, no generation-file section
>
> # language models: type(srilm/irstlm), factors, order, file
> [lmodel-file]
> 0 0 3 /home/nakul/moses/mosesdecoder/trunk/scripts/training/lm_hin.lm
>
>
> # limit on how many phrase translations e for each phrase f are loaded
> # 0 = all elements loaded
> [ttable-limit]
> 20
>
> # distortion (reordering) files
> [distortion-file]
> 0-0 wbe-msd-bidirectional-fe-allff 6 /home/nakul/moses/mosesdecoder/trunk/scripts/training/moses-scripts/scripts-20110405-1055/training/model/reordering-table.wbe-msd-bidirectional-fe.gz
>
>
> # distortion (reordering) weight
> [weight-d]
> 0.3
> 0.3
> 0.3
> 0.3
> 0.3
> 0.3
> 0.3
>
> # language model weights
> [weight-l]
> 0.5000
>
>
> # translation model weights
> [weight-t]
> 0.2
> 0.2
> 0.2
> 0.2
> 0.2
>
> # no generation models, no weight-generation section
>
> # word penalty
> [weight-w]
> -1
>
> [distortion-limit]
> 6
>
> When I run the translation, I get a mix of regional-language and English
> words, as follows:
> echo "Even amendments would require recommendation." | TMP=/tmp \
>   /home/nakul/moses/mosesdecoder/trunk/moses-cmd/src/moses \
>   -f /home/nakul/moses/mosesdecoder/trunk/scripts/training/moses-scripts/scripts-20110405-1055/training/corpus/tuning/mert/moses.ini \
>   > output.txt
>
> Defined parameters (per moses.ini or switch):
>     config: /home/nakul/moses/mosesdecoder/trunk/scripts/training/moses-scripts/scripts-20110405-1055/training/corpus/tuning/mert/moses.ini
>     distortion-file: 0-0 wbe-msd-bidirectional-fe-allff 6 /home/nakul/moses/mosesdecoder/trunk/scripts/training/moses-scripts/scripts-20110405-1055/training/model/reordering-table.wbe-msd-bidirectional-fe.gz
>     distortion-limit: 6
>     input-factors: 0
>     lmodel-file: 0 0 3 /home/nakul/moses/mosesdecoder/trunk/scripts/training/lm_hin.lm
>     mapping: 0 T 0
>     ttable-file: 0 0 0 5 /home/nakul/moses/mosesdecoder/trunk/scripts/training/moses-scripts/scripts-20110405-1055/training/model/phrase-table.gz
>     ttable-limit: 20
>     v: 0
>     weight-d: 0.009645 0.021906 0.008725 0.032902 0.019846 0.002462 0.016001
>     weight-l: 0.022668
>     weight-t: -0.008632 0.019782 0.228404 -0.005989 -0.499862
>     weight-w: -0.103176
> Loading lexical distortion models...have 1 models
> Creating lexical reordering...
> weights: 0.022 0.009 0.033 0.020 0.002 0.016
> Loading table into memory...done.
> Start loading LanguageModel /home/nakul/moses/mosesdecoder/trunk/scripts/training/lm_hin.lm : [1.000] seconds
> /home/nakul/moses/mosesdecoder/trunk/scripts/training/lm_hin.lm: line 122: warning: non-zero probability for <unk> in closed-vocabulary LM
> Finished loading LanguageModels : [1.000] seconds
> Start loading PhraseTable /home/nakul/moses/mosesdecoder/trunk/scripts/training/moses-scripts/scripts-20110405-1055/training/model/phrase-table.gz : [1.000] seconds
> filePath: /home/nakul/moses/mosesdecoder/trunk/scripts/training/moses-scripts/scripts-20110405-1055/training/model/phrase-table.gz
> Finished loading phrase tables : [1.000] seconds
> IO from STDOUT/STDIN
> Created input-output object : [1.000] seconds
> Translating: Even amendments would require recommendation.
>
> Collecting options took 0.000 seconds
> Search took 0.000 seconds
> BEST TRANSLATION: संशोधनों सहित Even|UNK|UNK|UNK प्रभाव require|UNK|UNK|UNK recommendation.|UNK|UNK|UNK [11111]  [total=-301.895] <<-4.000, -6.000, -300.000, 0.000, 0.000, -1.022, -0.511, -1.609, 0.000, -48.665, -1.792, -2.303, -1.386, -3.624, 2.000>>
> Translation took 0.000 seconds
> Finished translating
> End. : [1.000] seconds
>
>
> Please tell me how to improve this translation. I have already tuned the
> system, but the problem persists.

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
