Dear all,

After various manual setups, I wanted to try the EMS. After trying several experiment settings, I wanted to run it with MGIZA and KenLM, but I cannot get it to work. I tried again with a smaller corpus and got the same result. I also tried to continue the experiment with different fixes, without success.
The log tells me:

step TUNING:tune crashed

Looking further into TUNE_tune.1.STDERR in steps/1/, I found that IRSTLM is interfering with my project "against" my will (at least I thought so):

line=IRSTLM name=LM0 factor=0 path=/home/moses/project_test_mgiza/experiment/lm/project-syndicate.binlm.1 order=3
Exception: Error: 4 number of threads specified but IRST LM is not threadsafe.
Exit code: 1
Failed to run moses with the config /home/moses/project_test_mgiza/experiment/tuning/moses.filtered.ini.1 at /home/moses/mosesdecoder/scripts/training/mert-moses.pl line 1271.
cp: cannot stat ‘/home/moses/project_test_mgiza/experiment/tuning/tmp.1/moses.ini’: No such file or directory

Looking at what happened in the tuning folder, I found that moses.filtered.ini.1 sets IRSTLM for Distortion, whereas filtered.1/moses.ini sets KenLM for Distortion, which is what I was hoping for.

I attached the files mentioned above. This is the configuration file of the experiment:

################################################
### CONFIGURATION FILE FOR AN SMT EXPERIMENT ###
################################################

[GENERAL]

home-dir = /home/moses
working-dir = $home-dir/project_test_mgiza/experiment
moses-src-dir = $home-dir/mosesdecoder
moses-script-dir = $moses-src-dir/scripts
moses-bin-dir = $moses-src-dir/bin
external-bin-dir = $moses-src-dir/BINDIR
data-dir = $home-dir/project_test_mgiza/experiment/corpus
train-dir = $data-dir/training
dev-dir = $data-dir/dev
#irstlm-dir = $home-dir/irstlm/bin
ttable-binarizer = $moses-bin-dir/processPhraseTable
decoder = $moses-bin-dir/moses
input-tokenizer = "$moses-script-dir/tokenizer/tokenizer.perl -l $input-extension -threads 4"
output-tokenizer = "$moses-script-dir/tokenizer/tokenizer.perl -l $output-extension"
input-truecaser = $moses-script-dir/recaser/truecase.perl
output-truecaser = $moses-script-dir/recaser/truecase.perl
detruecaser = $moses-script-dir/recaser/detruecase.perl
input-extension = de
output-extension = en
pair-extension = de-en

#################################################################
# PARALLEL CORPUS PREPARATION:
# create a tokenized, sentence-aligned corpus, ready for training

[CORPUS]

max-sentence-length = 80

[CORPUS:project-syndicate]
raw-stem = $train-dir/news-commentary-v8.$pair-extension

[LM]

### tool to be used for language model training
# for instance: ngram-count (SRILM), train-lm-on-disk.perl (Edinburgh)
#
#lm-training = "$moses-script-dir/generic/trainlm-irst2.perl -cores 4 -irst-dir $irstlm-dir -temp-dir $working-dir/tmp"
#settings = "-s msb -p 0"
#order = 3
#type = 8
#lm-binarizer = $moses-bin-dir/build_binary

# path to lmplz binary
lmplz = $moses-bin-dir/lmplz
# order of the language model
order = 3
# additional parameters to lmplz (check lmplz help message)
settings = "-T $working-dir/tmp -S 10G"
# this tells EMS to use lmplz and tells EMS where lmplz is located
lm-training = "$moses-script-dir/generic/trainlm-lmplz.perl -lmplz $lmplz"
lm-binarizer = $moses-bin-dir/build_binary

[LM:project-syndicate]
raw-corpus = $train-dir/news-commentary-v8.$pair-extension.$output-extension

#################################################################
# TRANSLATION MODEL TRAINING

[TRAINING]

### training script to be used: either a legacy script or
# current moses training script (default)
#
#script = $moses-script-dir/training/train-model.perl

### general options
#
script = $moses-script-dir/training/train-model.perl
training-options = "-mgiza -mgiza-cpus 4 -cores 4 \
-parallel -sort-buffer-size 10G -sort-batch-size 253 \
-sort-compress gzip -sort-parallel 10"
parallel = yes

### symmetrization method to obtain word alignments from giza output
# (commonly used: grow-diag-final-and)
#
#alignment-symmetrization-method = berkeley
alignment-symmetrization-method = grow-diag-final-and

### lexicalized reordering: specify orientation type
# (default: only distance-based reordering model)
#
lexicalized-reordering = msd-bidirectional-fe

### if word alignment (giza symmetrization) should be skipped,
# point to word alignment files
#
#word-alignment =

### if phrase extraction should be skipped,
# point to stem for extract files
#
#extracted-phrases =

### if phrase table training should be skipped,
# point to phrase translation table
#
#phrase-translation-table =

### if reordering table training should be skipped,
# point to reordering table
#
#reordering-table =

### if training should be skipped,
# point to a configuration file that contains
# pointers to all relevant model files
#
#config =

### TUNING: finding good weights for model components

[TUNING]

### instead of tuning with this setting, old weights may be recycled

### tuning script to be used
#
tuning-script = $moses-script-dir/training/mert-moses.pl
tuning-settings = "-mertdir $moses-bin-dir -threads 4"

### specify the corpus used for tuning
# it should contain 100s if not 1000s of sentences
#
raw-input = $dev-dir/news-test2008.$input-extension
raw-reference = $dev-dir/news-test2008.$output-extension

### size of n-best list used (typically 100)
#
nbest = 100

### ranges for weights for random initialization
# if not specified, the tuning script will use generic ranges
# it is not clear, if this matters
#
# lambda =

### additional flags for the decoder
#
decoder-settings = "-threads 4"

### if tuning should be skipped, specify this here
# and also point to a configuration file that contains
# pointers to all relevant model files
#
#config =

#######################################################
## TRUECASER: train model to truecase corpora and input

[TRUECASER]

### script to train truecaser models
#
trainer = $moses-script-dir/recaser/train-truecaser.perl

### training data
# raw input needs to be still tokenized,
# also also tokenized input may be specified
#
raw-stem = CORPUS:raw-stem

### trained model
#
#truecase-model =

##################################
## EVALUATION: score system output

[EVALUATION]

### prepare system output for scoring
# this may include detokenization and wrapping output in sgm
# (needed for nist-bleu, ter, meteor)
#
detokenizer = "$moses-script-dir/tokenizer/detokenizer.perl -l $output-extension"
decoder-settings = "-threads 4"

### should output be scored case-sensitive (default: no)?
#
# case-sensitive = yes

### BLEU
#
multi-bleu = "$moses-script-dir/generic/multi-bleu.perl -lc"
# ibm-bleu =

### TER: translation error rate (BBN metric) based on edit distance
#
# ter = $edinburgh-script-dir/tercom_v6a.pl

### METEOR: gives credit to stem / worknet synonym matches
#
# meteor =

[EVALUATION:newstest2010]
raw-input = $dev-dir/newstest2011.$input-extension
raw-reference = $dev-dir/newstest2011.$output-extension

[REPORTING]

### what to do with result (default: store in file evaluation/report)
#
# email = [email protected]

____________________

I hope somebody can help or suggest what I should do.

Thank you and kind regards,
Daniel
issuefiles.tar.gz
Description: application/gzip
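The problem described above comes down to which language-model feature each generated moses.ini declares (IRSTLM in tuning/moses.filtered.ini.1 versus KenLM in tuning/filtered.1/moses.ini). A quick way to compare the two files is a small script that lists the LM feature lines in the [feature] section. This is only a sketch under stated assumptions: the feature line format (`IRSTLM name=LM0 factor=0 path=... order=3`) is taken from the log above, while the names `KENLM` and `SRILM`, the helper `lm_features`, and the script name `check_lm.py` are assumptions for illustration, not part of Moses or EMS.

```python
import sys
from typing import Dict, List

# Feature names treated as language models in this sketch.
# IRSTLM appears in the error log; KENLM and SRILM are assumed names.
LM_FEATURES = ("IRSTLM", "KENLM", "SRILM")


def lm_features(ini_text: str) -> List[Dict[str, str]]:
    """Return the LM feature lines found in the [feature] section of a moses.ini."""
    found = []
    section = None
    for raw in ini_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        # Track which [section] we are in.
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section != "feature":
            continue
        name, *pairs = line.split()
        if name not in LM_FEATURES:
            continue
        # Turn "key=value" tokens into a dict, e.g. {"name": "LM0", "order": "3"}.
        entry = {"type": name}
        for pair in pairs:
            key, _, value = pair.partition("=")
            entry[key] = value
        found.append(entry)
    return found


if __name__ == "__main__":
    # Hypothetical usage:
    #   python check_lm.py tuning/moses.filtered.ini.1 tuning/filtered.1/moses.ini
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            for feature in lm_features(handle.read()):
                print(f"{path}: {feature['type']} "
                      f"name={feature.get('name', '')} path={feature.get('path', '')}")
```

If the first file reports IRSTLM while decoding is run with `-threads 4` (as set in tuning-settings and decoder-settings above), that combination matches the "IRST LM is not threadsafe" exception in the log.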
_______________________________________________ Moses-support mailing list [email protected] http://mailman.mit.edu/mailman/listinfo/moses-support
