Re: [Moses-support] error in installing srilm
Hi Arththika,

Have you looked at the FAQ?
http://www.speech.sri.com/projects/srilm/manpages/srilm-faq.7.html

I had set up SRILM long ago, and we have Tcl support anyway. But if you don't have Tcl in your path or on your system, you might want to look at A1), part d.

Hope that helps.

On Sun, Dec 8, 2013 at 8:58 PM, Arththika Paramanathan <arthiparamanat...@gmail.com> wrote:
> Hi,
> I faced some problems in installing srilm. Can anyone help me?. I attached
> the error file.
> Thank you
>
> --
> regards,
> P.Arththika

___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
[Moses-support] error in installing srilm
Hi,

I ran into some problems installing SRILM. Can anyone help me? I have attached the error file.

Thank you

--
regards,
P.Arththika

[Attachment: srilm-error, binary data]
Re: [Moses-support] WARNING: directory exists but does not match parameters:
Hi,

the filtering script tries to re-use the model in the filtered directory. However, that directory was created by an earlier, erroneous call, so its parameters do not match and you get this warning.

What should you do? Delete the filtered directory, then run the filtering script again.

-phi

On Sun, Dec 8, 2013 at 5:31 PM, renubalyan wrote:
> Hi,
>
> I am using the http://www.statmt.org/moses/?n=Moses.Baseline to build the
> baseline system.
>
> I have trained, tuned and tested the sentence successfully.
>
> However, I am stuck up at the evaluation step (page 35)- while I am
> filtering using the following command:
>
> renu@sandeep-RS:~/Desktop/working$
> /home/renu/Desktop/mosesdecoder/scripts/training/filter-model-given-input.pl
> filtered-newstest2011 mert-work/moses.ini
> /home/renu/Desktop/corpus/newstest2011.true.fr -Binarizer
> /home/renu/Desktop/mosesdecoder/bin/processPhraseTable
>
> I get the following error:
>
> WARNING: directory exists but does not match parameters:
> (mert-work/moses.ini ne mert-work/moses.ini ||
> /home/renu/Desktop/working/filtered-newstest2011/input.8828 ne
> /home/renu/Desktop/corpus/newstest2011.true.fr)
>
> I do not understand the reason for the above error.
>
> Kindly help.
>
> Thanks
> Renu
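
The fix suggested in this reply (delete the filtered directory, then filter again) can be sketched in Python as follows. This is only an illustration: the helper names `remove_stale_filtered_dir` and `filter_command` are made up for this sketch, and the paths are copied from the command quoted in the thread. The sketch only prints the command; the line that would actually run it is left as a comment.

```python
import shutil
from pathlib import Path


def remove_stale_filtered_dir(filtered_dir):
    """Delete the filtered directory if it exists.

    Returns True if a directory was removed, False if there was nothing to do.
    """
    path = Path(filtered_dir)
    if path.is_dir():
        shutil.rmtree(path)
        return True
    return False


def filter_command(script, filtered_dir, moses_ini, input_file, binarizer):
    """Build the argument list in the same order as the command in the thread."""
    return [script, filtered_dir, moses_ini, input_file, "-Binarizer", binarizer]


if __name__ == "__main__":
    # Paths taken from the quoted command; adjust them to your own setup.
    filtered = "filtered-newstest2011"
    if remove_stale_filtered_dir(filtered):
        print("removed stale directory:", filtered)
    command = filter_command(
        "/home/renu/Desktop/mosesdecoder/scripts/training/filter-model-given-input.pl",
        filtered,
        "mert-work/moses.ini",
        "/home/renu/Desktop/corpus/newstest2011.true.fr",
        "/home/renu/Desktop/mosesdecoder/bin/processPhraseTable",
    )
    print(" ".join(command))
    # To re-run the filter for real: import subprocess; subprocess.run(command, check=True)
```

Run it from the working directory that contains filtered-newstest2011, since the directory name is relative.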
[Moses-support] using Moses in Monolingual dialogue setting
Hi,

I'm using Moses in a monolingual dialogue setting, as in http://aritter.github.io/mt_chat.pdf, where source and target are both in English and the target is a response to the source. I'd like to propose a little thought experiment in this setting and hear what you think would happen.

Suppose we have a conversation with six utterances, A1, B1, A2, B2, A3, B3, where A and B indicate the speakers and the number indicates the n-th statement by that speaker. They all belong to one conversation on a continuous topic. Now suppose we train Moses on it in two different ways:

1) The source file contains A1, A2, A3 and the target file contains B1, B2, B3, so that A1-B1 is a pair, and so on.
2) The source file contains A1, B1, A2, B2, A3 and the target file contains B1, A2, B2, A3, B3, taking advantage of the fact that each response is also the stimulus for the next response.

How will the results differ, and why? Since GIZA++ computes alignments in both directions, will 2) result in any of A1~B3 being the translation of any other?

This may be a strange question, but I would really like to get your insight.
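
To make the two setups concrete, here is a small Python sketch that builds the source/target pairs for each of them. The function names `paired_turns` and `chained_turns` are invented for this illustration, and the utterances are just the labels from the message, not real data.

```python
def paired_turns(utterances):
    """Setup 1: each A-turn is paired only with the B-turn that answers it."""
    sources = utterances[0::2]  # A1, A2, A3
    targets = utterances[1::2]  # B1, B2, B3
    return list(zip(sources, targets))


def chained_turns(utterances):
    """Setup 2: every utterance is paired with the utterance that follows it."""
    return list(zip(utterances[:-1], utterances[1:]))


conversation = ["A1", "B1", "A2", "B2", "A3", "B3"]

setup1 = paired_turns(conversation)
setup2 = chained_turns(conversation)

print(setup1)  # [('A1', 'B1'), ('A2', 'B2'), ('A3', 'B3')]
print(setup2)  # [('A1', 'B1'), ('B1', 'A2'), ('A2', 'B2'), ('B2', 'A3'), ('A3', 'B3')]
```

Written out this way, one difference is easy to see: in setup 2 the utterances B1, A2, B2 and A3 each occur once on the source side and once on the target side, while in setup 1 no utterance occurs on both sides.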
[Moses-support] WARNING: directory exists but does not match parameters:
Hi,

I am using http://www.statmt.org/moses/?n=Moses.Baseline to build the baseline system.

I have trained, tuned and tested the sentence successfully.

However, I am stuck at the evaluation step (page 35), where I filter the model with the following command:

renu@sandeep-RS:~/Desktop/working$ /home/renu/Desktop/mosesdecoder/scripts/training/filter-model-given-input.pl filtered-newstest2011 mert-work/moses.ini /home/renu/Desktop/corpus/newstest2011.true.fr -Binarizer /home/renu/Desktop/mosesdecoder/bin/processPhraseTable

I get the following error:

WARNING: directory exists but does not match parameters:
(mert-work/moses.ini ne mert-work/moses.ini || /home/renu/Desktop/working/filtered-newstest2011/input.8828 ne /home/renu/Desktop/corpus/newstest2011.true.fr)

I do not understand the reason for the above error.

Kindly help.

Thanks
Renu
Re: [Moses-support] error during testing
The file model/reordering-table.* is not empty, but the file evaluation/*.filtered.*/reordering-table.1.* is! My test set is not empty.

Thank you for your answers.

On Sun, Dec 8, 2013 at 3:29 PM, Hieu Hoang wrote:
> everything looks ok, I'm not sure why it's segfaulting
>
> is the file
> model/reordering-table.*
> empty? If it is, then you should look in the log file
> steps/*/TRAINING_build-reordering.*.STDERR
>
> or is
> evaluation/*.filtered.*/reordering-table.1.*
> empty? is your test set empty?
>
> On 8 December 2013 09:47, amir haghighi wrote:
>
>> yes, the parallel data is UTF8.(one is UTF8 and one is ascii).
>> all of the pre-processioning steps are done with moses scripts.
>>
>> here is the EMS config file content:
>> [...]
Re: [Moses-support] error during testing
Everything looks OK; I'm not sure why it's segfaulting.

Is the file
model/reordering-table.*
empty? If it is, then you should look in the log file
steps/*/TRAINING_build-reordering.*.STDERR

Or is
evaluation/*.filtered.*/reordering-table.1.*
empty? Is your test set empty?

On 8 December 2013 09:47, amir haghighi wrote:
> yes, the parallel data is UTF8.(one is UTF8 and one is ascii).
> all of the pre-processioning steps are done with moses scripts.
>
> here is the EMS config file content:
> [...]
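
The checks asked for in this message (is the reordering table empty, is the filtered copy empty) can be scripted. Below is a minimal Python sketch; the helper names `is_empty` and `report` are invented here, and the special handling of `.gz` files is an assumption, since the thread does not say whether the tables are compressed.

```python
import glob
import gzip
import os


def is_empty(path):
    """Return True if the file holds no data.

    Assumption: a file ending in .gz counts as empty when it decompresses
    to nothing, even though the compressed file itself has nonzero size.
    """
    if path.endswith(".gz"):
        with gzip.open(path, "rb") as handle:
            return handle.read(1) == b""
    return os.path.getsize(path) == 0


def report(patterns):
    """Map every path matching the glob patterns to True (empty) or False."""
    return {
        path: is_empty(path)
        for pattern in patterns
        for path in sorted(glob.glob(pattern))
    }


if __name__ == "__main__":
    # Patterns copied from the message; run this from the EMS working directory.
    patterns = [
        "model/reordering-table.*",
        "evaluation/*.filtered.*/reordering-table.1.*",
    ]
    for path, empty in report(patterns).items():
        print("EMPTY    " if empty else "non-empty", path)
```

If a pattern matches nothing at all, the report simply has no entry for it, which is itself worth noticing.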
Re: [Moses-support] error during testing
Yes, the parallel data is UTF-8 (one side is UTF-8 and one is ASCII).
All of the pre-processing steps are done with the Moses scripts.

Here is the EMS config file content:

### CONFIGURATION FILE FOR AN SMT EXPERIMENT ###

[GENERAL]

### directory in which experiment is run
#
working-dir = /opt/tools/workingEms

# specification of the language pair
input-extension = En
output-extension = Fa
pair-extension = En-Fa

### directories that contain tools and data
#
# moses
moses-src-dir = /opt/tools/mosesdecoder-RELEASE-1.0/mosesdecoder-RELEASE-1.0
#
# moses binaries
moses-bin-dir = $moses-src-dir/bin
#
# moses scripts
moses-script-dir = $moses-src-dir/scripts
#
# directory where GIZA++/MGIZA programs resides
external-bin-dir = $moses-src-dir/tools
#
# srilm
#srilm-dir = $moses-src-dir/srilm/bin/i686
#
# irstlm
irstlm-dir = /opt/tools/irstlm/bin
#
# randlm
#randlm-dir = $moses-src-dir/randlm/bin
#
# data
toy-data = /opt/tools/dataset/mizan

### basic tools
#
# moses decoder
decoder = $moses-bin-dir/moses

# conversion of phrase table into binary on-disk format
ttable-binarizer = $moses-bin-dir/processPhraseTable

# conversion of rule table into binary on-disk format
#ttable-binarizer = "$moses-bin-dir/CreateOnDiskPt 1 1 5 100 2"

# tokenizers - comment out if all your data is already tokenized
input-tokenizer = "$moses-script-dir/tokenizer/tokenizer.perl -a -l $input-extension"
output-tokenizer = "$moses-script-dir/tokenizer/tokenizer.perl -a -l $output-extension"

# truecasers - comment out if you do not use the truecaser
input-truecaser = $moses-script-dir/recaser/truecase.perl
output-truecaser = $moses-script-dir/recaser/truecase.perl
detruecaser = $moses-script-dir/recaser/detruecase.perl

### generic parallelizer for cluster and multi-core machines
# you may specify a script that allows the parallel execution
# parallizable steps (see meta file). you also need specify
# the number of jobs (cluster) or cores (multicore)
#
#generic-parallelizer = $moses-script-dir/ems/support/generic-parallelizer.perl
#generic-parallelizer = $moses-script-dir/ems/support/generic-multicore-parallelizer.perl

### cluster settings (if run on a cluster machine)
# number of jobs to be submitted in parallel
#
#jobs = 10

# arguments to qsub when scheduling a job
#qsub-settings = ""

# project for priviledges and usage accounting
#qsub-project = iccs_smt

# memory and time
#qsub-memory = 4
#qsub-hours = 48

### multi-core settings
# when the generic parallelizer is used, the number of cores
# specified here
cores = 8

#
# PARALLEL CORPUS PREPARATION:
# create a tokenized, sentence-aligned corpus, ready for training

[CORPUS]

### long sentences are filtered out, since they slow down GIZA++
# and are a less reliable source of data. set here the maximum
# length of a sentence
#
max-sentence-length = 80

[CORPUS:toy]

### command to run to get raw corpus files
#
# get-corpus-script =

### raw corpus files (untokenized, but sentence aligned)
#
raw-stem = $toy-data/M_Tr

### tokenized corpus files (may contain long sentences)
#
#tokenized-stem =

### if sentence filtering should be skipped,
# point to the clean training data
#
#clean-stem =

### if corpus preparation should be skipped,
# point to the prepared training data
#
#lowercased-stem =

#
# LANGUAGE MODEL TRAINING

[LM]

### tool to be used for language model training
# srilm
#lm-training = $srilm-dir/ngram-count
#settings = "-interpolate -kndiscount -unk"

# irstlm training
# msb = modified kneser ney; p=0 no singleton pruning
#lm-training = "$moses-script-dir/generic/trainlm-irst2.perl -cores $cores -irst-dir $irstlm-dir -temp-dir $working-dir/tmp"
#settings = "-s msb -p 0"

# order of the language model
order = 5

### tool to be used for training randomized language model from scratch
# (more commonly, a SRILM is trained)
#
#rlm-training = "$randlm-dir/buildlm -falsepos 8 -values 8"

### script to use for binary table format for irstlm or kenlm
# (default: no binarization)

# irstlm
#lm-binarizer = $irstlm-dir/compile-lm

# kenlm, also set type to 8
#lm-binarizer = $moses-bin-dir/build_binary
#type = 8

### script to create quantized language model format (irstlm)
# (default: no quantization)
#
#lm-quantizer = $irstlm-dir/quantize-lm

### script to use for converting into randomized table format
# (default: no randomization)
#
#lm-randomizer = "$randlm-dir/buildlm -falsepos 8 -values 8"

### each language model to be used has its own section here

[LM:toy]

### command to run to get raw corpus files
#
#get-corpus-script = ""

### raw corpus (untokenized)
#
raw-corpus = $toy-data/M_Tr.$output-extension

### tokenized corpus files (may contain long sentences)
#
#tokenized-corpus =

### if corpus preparation should be skipped,
# point to the prepared language mode