Sorry for bumping the thread again. Here is what I found after running the command by hand, as suggested.
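A note on reading the scores dump below: the header "SCORES_TXT_BEGIN_0 0 100 9 BLEU" announces nine BLEU statistics per hypothesis, which I take to be matched/total n-gram counts for n = 1..4 followed by the reference length (my reading of the extractor output, so treat the column layout as an assumption). With the matched counts (columns 1, 3, 5, 7) all zero, any BLEU computed from such a line is necessarily zero. A minimal sketch of the computation:

    # Sketch: recompute sentence-level BLEU from one run1.scores.dat row.
    # Column layout (matched_1, total_1, ..., matched_4, total_4, ref_len)
    # is my assumption about the Moses extractor output, not verified
    # against the source.
    import math

    def bleu_from_stats(stats, smooth=0.0):
        matched = stats[0:8:2]   # columns 1, 3, 5, 7 (1-indexed)
        total = stats[1:8:2]     # columns 2, 4, 6, 8
        ref_len = stats[8]
        if any(m + smooth == 0 for m in matched):
            return 0.0           # zero matches at any order => BLEU 0
        log_prec = sum(math.log((m + smooth) / (t + smooth))
                       for m, t in zip(matched, total)) / 4.0
        hyp_len = total[0]       # unigram total = hypothesis length
        log_bp = min(0.0, 1.0 - ref_len / hyp_len)  # brevity penalty
        return math.exp(log_bp + log_prec)

    row = "0 18 0 17 0 16 0 15 31"
    print(bleu_from_stats([float(x) for x in row.split()]))  # -> 0.0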
run1.init.opt:

0.300000 0.300000 0.300000 0.300000 0.300000 0.300000 0.300000 0.500000 -1.000000 0.200000 0.200000 0.200000 0.200000 0.200000
0 0 0 0 0 0 0 0 0 0 0 0 0 0
1 1 1 1 1 1 1 1 1 1 1 1 1 1

Things noticed:
* run1.features.dat has columns 3 and 6 all zeroes
* run1.scores.dat has columns 1, 3, 5 and 7 all zeroes

run1.features.dat:

FEATURES_TXT_BEGIN_0 0 100 14 d_0 d_1 d_2 d_3 d_4 d_5 d_6 lm_0 w_0 tm_0 tm_1 tm_2 tm_3 tm_4
-8 -16.1621 0 -1.28775 -23.5316 0 -1.16898 -1804.31 -18 -79.6613 -80.0959 -48.9898 -27.8995 17.9981
0 -20.6133 0 0 -27.4523 0 0 -1804.31 -18 -81.3889 -80.0948 -50.4425 -27.871 17.9981
-8 -16.9628 0 -1.28775 -22.4881 0 -1.16898 -1804.31 -18 -77.4669 -80.9714 -49.7377 -28.9981 17.9981
0 -21.414 0 0 -26.4089 0 0 -1804.31 -18 -79.1945 -80.9702 -51.1904 -28.9696 17.9981
-8 -16.8088 0 -1.28775 -23.0796 0 -1.16898 -1804.31 -18 -78.6891 -79.6433 -50.0519 -28.2473 17.9981
-8 -16.1621 0 -1.28775 -23.5316 0 -1.16898 -1804.31 -18 -79.1464 -79.9766 -48.9898 -28.83 17.9981

run1.scores.dat:

SCORES_TXT_BEGIN_0 0 100 9 BLEU
0 18 0 17 0 16 0 15 31
0 18 0 17 0 16 0 15 31
0 18 0 17 0 16 0 15 31
0 18 0 17 0 16 0 15 31
0 18 0 17 0 16 0 15 31

Issue: in the MIRA run, the BLEU score is initialized to zero.

I hope this information helps in solving the issue.

Thanking you,
Jayendra Rakesh.

On Wed, Jun 5, 2013 at 1:52 PM, jayendra rakesh <jayendra.rak...@gmail.com> wrote:

> Hi Phi,
>
> Thanks for the reply. I made a hand run of the command as you suggested
> and was able to repeat the crash. I have checked the run1.init.opt file;
> it seems to be fine.
>
> run1.init.opt:
> 0.300000 0.300000 0.300000 0.300000 0.300000 0.300000 0.300000 0.500000 -1.000000 0.200000 0.200000 0.200000 0.200000 0.200000
> 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> 1 1 1 1 1 1 1 1 1 1 1 1 1 1
>
> But I noticed in both run1.scores.dat and run1.features.dat that many
> features (columns 3 and 6 are all zero) and scores (columns 1, 3, 5 and 7
> are all zero) have zero values, because of which I think the BLEU score is
> initialized to zero in mert.log.
>
> run1.features.dat:
> FEATURES_TXT_BEGIN_0 0 100 14 d_0 d_1 d_2 d_3 d_4 d_5 d_6 lm_0 w_0 tm_0 tm_1 tm_2 tm_3 tm_4
> -8 -16.1621 0 -1.28775 -23.5316 0 -1.16898 -1804.31 -18 -79.6613 -80.0959 -48.9898 -27.8995 17.9981
> 0 -20.6133 0 0 -27.4523 0 0 -1804.31 -18 -81.3889 -80.0948 -50.4425 -27.871 17.9981
> -8 -16.9628 0 -1.28775 -22.4881 0 -1.16898 -1804.31 -18 -77.4669 -80.9714 -49.7377 -28.9981 17.9981
> 0 -21.414 0 0 -26.4089 0 0 -1804.31 -18 -79.1945 -80.9702 -51.1904 -28.9696 17.9981
> -8 -16.8088 0 -1.28775 -23.0796 0 -1.16898 -1804.31 -18 -78.6891 -79.6433 -50.0519 -28.2473 17.9981
> -8 -16.1621 0 -1.28775 -23.5316 0 -1.16898 -1804.31 -18 -79.1464 -79.9766 -48.9898 -28.83 17.9981
>
> run1.scores.dat:
> SCORES_TXT_BEGIN_0 0 100 9 BLEU
> 0 18 0 17 0 16 0 15 31
> 0 18 0 17 0 16 0 15 31
> 0 18 0 17 0 16 0 15 31
> 0 18 0 17 0 16 0 15 31
> 0 18 0 17 0 16 0 15 31
>
> Hoping the information helps in pointing out the issue.
>
> Regards,
> Jayendra Rakesh
>
> On Sun, Jun 2, 2013 at 5:19 PM, Philipp Koehn <pko...@inf.ed.ac.uk> wrote:
>
>> Hi,
>>
>> I have been running kbest MIRA with factored models many times, never
>> with any problems, so "this should work".
>>
>> The error is in the step: /tools/mosesdecoder-master_2/bin/kbmira -J
>> 100 -C 0.001 --dense-init run1.init.opt --ffile run1.features.dat [...]
>>
>> so that's where to start.
>>
>> Check if the features file looks sane.
>> Check the run1.init.opt file.
>> Run the step by hand.
>>
>> If this does not work, send us the input files for this command (maybe
>> even a smaller subset, if you can reproduce the error).
>>
>> -phi
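Philipp's "check if the features file looks sane" step can be mechanised; a check along the following lines reports exactly the all-zero columns listed at the top of this mail. A minimal sketch, assuming only that the .dat files hold whitespace-separated numeric rows of uniform width between the *_TXT_BEGIN/*_TXT_END marker lines:

    # Sketch: report 1-indexed columns that are all zero in a Moses
    # mert .dat file (features or scores). Assumes numeric rows of
    # uniform width between *_TXT_BEGIN_* / *_TXT_END_* marker lines.
    import sys

    def all_zero_columns(path):
        zero = None
        with open(path) as f:
            for line in f:
                if "_TXT_BEGIN_" in line or "_TXT_END_" in line:
                    continue  # skip section headers/footers
                fields = line.split()
                if not fields:
                    continue
                vals = [float(x) for x in fields]
                if zero is None:
                    zero = [True] * len(vals)
                for i, v in enumerate(vals):
                    zero[i] = zero[i] and v == 0.0
        return [i + 1 for i, z in enumerate(zero or []) if z]

    if __name__ == "__main__":
        # e.g. python check_zero_cols.py run1.features.dat run1.scores.dat
        for path in sys.argv[1:]:
            print(path, "all-zero columns:", all_zero_columns(path))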
>> On Sun, Jun 2, 2013 at 10:56 AM, jayendra rakesh
>> <jayendra.rak...@gmail.com> wrote:
>> > Hi,
>> > My EMS setup (factored, MIRA) crashes at the tuning stage after a single run.
>> >
>> > config.toy (attaching only the training and tuning sections):
>> >
>> > # TRANSLATION MODEL TRAINING
>> >
>> > [TRAINING]
>> >
>> > ### training script to be used: either a legacy script or
>> > # current moses training script (default)
>> > #
>> > script = $moses-script-dir/training/train-model.perl
>> >
>> > ### general options
>> > # these are options that are passed on to train-model.perl, for instance
>> > # * "-mgiza -mgiza-cpus 8" to use mgiza instead of giza
>> > # * "-sort-buffer-size 8G -sort-compress gzip" to reduce on-disk sorting
>> > # * "-sort-parallel 8 -cores 8" to speed up phrase table building
>> > #
>> > #training-options = ""
>> >
>> > ### factored training: specify here which factors used
>> > # if none specified, single factor training is assumed
>> > # (one translation step, surface to surface)
>> > #
>> > input-factors = word pos
>> > output-factors = word pos
>> > alignment-factors = "word -> word"
>> > translation-factors = "word+pos -> word+pos"
>> > reordering-factors = "word -> word"
>> > #generation-factors = "pos -> word"
>> > decoding-steps = "t0"
>> >
>> > ### parallelization of data preparation step
>> > # the two directions of the data preparation can be run in parallel
>> > # comment out if not needed
>> > #
>> > parallel = yes
>> >
>> > ### pre-computation for giza++
>> > # giza++ has a more efficient data structure that needs to be
>> > # initialized with snt2cooc. if run in parallel, this may reduce
>> > # memory requirements. set here the number of parts
>> > #
>> > #run-giza-in-parts = 5
>> >
>> > ### symmetrization method to obtain word alignments from giza output
>> > # (commonly used: grow-diag-final-and)
>> > #
>> > alignment-symmetrization-method = grow-diag-final-and
>> >
>> > ### use of berkeley aligner for word alignment
>> > #
>> > #use-berkeley = true
>> > #alignment-symmetrization-method = berkeley
>> > #berkeley-train = $moses-script-dir/ems/support/berkeley-train.sh
>> > #berkeley-process = $moses-script-dir/ems/support/berkeley-process.sh
>> > #berkeley-jar = /your/path/to/berkeleyaligner-1.1/berkeleyaligner.jar
>> > #berkeley-java-options = "-server -mx30000m -ea"
>> > #berkeley-training-options = "-Main.iters 5 5 -EMWordAligner.numThreads 8"
>> > #berkeley-process-options = "-EMWordAligner.numThreads 8"
>> > #berkeley-posterior = 0.5
>> >
>> > ### use of baseline alignment model (incremental training)
>> > #
>> > #baseline = 68
>> > #baseline-alignment-model = "$working-dir/training/prepared.$baseline/$input-extension.vcb \
>> > #  $working-dir/training/prepared.$baseline/$output-extension.vcb \
>> > #  $working-dir/training/giza.$baseline/${output-extension}-$input-extension.cooc \
>> > #  $working-dir/training/giza-inverse.$baseline/${input-extension}-$output-extension.cooc \
>> > #  $working-dir/training/giza.$baseline/${output-extension}-$input-extension.thmm.5 \
>> > #  $working-dir/training/giza.$baseline/${output-extension}-$input-extension.hhmm.5 \
>> > #  $working-dir/training/giza-inverse.$baseline/${input-extension}-$output-extension.thmm.5 \
>> > #  $working-dir/training/giza-inverse.$baseline/${input-extension}-$output-extension.hhmm.5"
>> >
>> > ### if word alignment should be skipped,
>> > # point to word alignment files
>> > #
>> > #word-alignment = $working-dir/model/aligned.1
>> >
>> > ### filtering some corpora with modified Moore-Lewis
>> > # specify corpora to be filtered and ratio to be kept, either before or after word alignment
>> > #mml-filter-corpora = toy
>> > #mml-before-wa = "-proportion 0.9"
>> > #mml-after-wa = "-proportion 0.9"
>> >
>> > ### create a bilingual concordancer for the model
>> > #
>> > #biconcor = $moses-script-dir/ems/biconcor/biconcor
>> >
>> > ### lexicalized reordering: specify orientation type
>> > # (default: only distance-based reordering model)
>> > #
>> > lexicalized-reordering = msd-bidirectional-fe
>> >
>> > ### hierarchical rule set
>> > #
>> > #hierarchical-rule-set = true
>> >
>> > ### settings for rule extraction
>> > #
>> > #extract-settings = ""
>> > max-phrase-length = 5
>> >
>> > ### add extracted phrases from baseline model
>> > #
>> > #baseline-extract = $working-dir/model/extract.$baseline
>> > #
>> > # requires aligned parallel corpus for re-estimating lexical translation probabilities
>> > #baseline-corpus = $working-dir/training/corpus.$baseline
>> > #baseline-alignment = $working-dir/model/aligned.$baseline.$alignment-symmetrization-method
>> >
>> > ### unknown word labels (target syntax only)
>> > # enables use of unknown word labels during decoding
>> > # label file is generated during rule extraction
>> > #
>> > #use-unknown-word-labels = true
>> >
>> > ### if phrase extraction should be skipped,
>> > # point to stem for extract files
>> > #
>> > # extracted-phrases =
>> >
>> > ### settings for rule scoring
>> > #
>> > score-settings = "--GoodTuring"
>> >
>> > ### include word alignment in phrase table
>> > #
>> > include-word-alignment-in-rules = yes
>> >
>> > ### sparse lexical features
>> > #
>> > #sparse-lexical-features = "target-word-insertion top 50, source-word-deletion top 50, word-translation top 50 50, phrase-length"
>> >
>> > ### domain adaptation settings
>> > # options: sparse, any of: indicator, subset, ratio
>> > #domain-features = "subset"
>> >
>> > ### if phrase table training should be skipped,
>> > # point to phrase translation table
>> > #
>> > # phrase-translation-table =
>> >
>> > ### if reordering table training should be skipped,
>> > # point to reordering table
>> > #
>> > # reordering-table =
>> >
>> > ### filtering the phrase table based on significance tests
>> > # Johnson, Martin, Foster and Kuhn. (2007): "Improving Translation Quality by Discarding Most of the Phrasetable"
>> > # options: -n number of translations; -l 'a+e', 'a-e', or a positive real value -log prob threshold
>> > #salm-index = /path/to/project/salm/Bin/Linux/Index/IndexSA.O64
>> > #sigtest-filter = "-l a+e -n 50"
>> >
>> > ### if training should be skipped,
>> > # point to a configuration file that contains
>> > # pointers to all relevant model files
>> > #
>> > #config-with-reused-weights =
>> >
>> > #####################################################
>> > ### TUNING: finding good weights for model components
>> >
>> > [TUNING]
>> >
>> > ### instead of tuning with this setting, old weights may be recycled
>> > # specify here an old configuration file with matching weights
>> > #
>> > #weight-config = $working-dir/model/weight.ini
>> >
>> > ### tuning script to be used
>> > #
>> > tuning-script = $moses-script-dir/training/mert-moses.pl
>> > tuning-settings = "-mertdir $moses-bin-dir --batch-mira --return-best-dev --batch-mira-args '-J 100 -C 0.001'"
>> >
>> > ### specify the corpus used for tuning
>> > # it should contain 1000s of sentences
>> > #
>> > input-sgm = $toy-data/dev.en.sgm
>> > #raw-input =
>> > #tokenized-input = $toy-data/dev.en
>> > factorized-input = $toy-data/dev.en
>> > #factorized-input =
>> > #input =
>> > #
>> > reference-sgm = $toy-data/dev.hi.sgm
>> > #raw-reference =
>> > factorized-reference = $toy-data/dev.hi
>> > #factorized-reference =
>> > #reference =
>> >
>> > ### size of n-best list used (typically 100)
>> > #
>> > nbest = 100
>> >
>> > ### ranges for weights for random initialization
>> > # if not specified, the tuning script will use generic ranges
>> > # it is not clear, if this matters
>> > #
>> > # lambda =
>> >
>> > ### additional flags for the filter script
>> > #
>> > filter-settings = ""
>> >
>> > ### additional flags for the decoder
>> > #
>> > decoder-settings = ""
>> >
>> > ### if tuning should be skipped, specify this here
>> > # and also point to a configuration file that contains
>> > # pointers to all relevant model files
>> > #
>> > #config =
>> >
>> > TUNING_tune.1.STDERR has the following lines:
>> >
>> > Translating line 1078 in thread id 139965279725312
>> > Translating line 1079 in thread id 139965279725312
>> > Translating line 1080 in thread id 139965279725312
>> > Translating line 1081 in thread id 139965279725312
>> > The decoder returns the scores in this order: d d d d d d d lm w tm tm tm tm tm
>> > Executing: gzip -f run1.best100.out
>> > Scoring the nbestlist.
>> > exec: /home/eilmt/wrk-dir/wrk-jhu-fact/tuning/tmp.1/extractor.sh
>> > Executing: /home/eilmt/wrk-dir/wrk-jhu-fact/tuning/tmp.1/extractor.sh > extract.out 2> extract.err
>> > Executing: \cp -f init.opt run1.init.opt
>> > Executing: echo 'not used' > weights.txt
>> > exec: /tools/mosesdecoder-master_2/bin/kbmira -J 100 -C 0.001 --dense-init run1.init.opt --ffile run1.features.dat --scfile run1.scores.dat$
>> > Executing: /tools/mosesdecoder-master_2/bin/kbmira -J 100 -C 0.001 --dense-init run1.init.opt --ffile run1.features.dat --scfile run1.score$
>> > Executing: \cp -f extract.err run1.extract.err
>> > Executing: \cp -f extract.out run1.extract.out
>> > Executing: \cp -f mert.out run1.mert.out
>> > cp: cannot stat `mert.out': No such file or directory
>> > Exit code: 1
>> > Died at /tools/mosesdecoder-master_2/scripts/training/mert-moses.pl line 956.
>> > cp: cannot stat `/home/eilmt/wrk-dir/wrk-jhu-fact/tuning/tmp.1/moses.ini': No such file or directory
>> >
>> > Opening mert.log shows that the BLEU score is initialized to a value of
>> > zero. On a side note, the BLEU score seems to initialize fine in the case
>> > of non-factored models.
>> >
>> > kbmira with c=0.001 decay=0.999 no_shuffle=0
>> > Initialising random seed from system clock
>> > ..........Initial BLEU = 0
>> > 0/1082 updates, avg loss = 0, BLEU = 0
>> > 0/1082 updates, avg loss = 0, BLEU = 0
>> > 0/1082 updates, avg loss = 0, BLEU = 0
>> > 0/1082 updates, avg loss = 0, BLEU = 0
>> > 0/1082 updates, avg loss = 0, BLEU = 0
>> > .
>> > .
>> > .
>> >
>> > Kindly do suggest a solution.
>> >
>> > Thanking you,
>> > --
>> > - Jayendra Rakesh.
>> > BTech CSD.
>
> --
> - Jayendra Rakesh.
> BTech CSD.

--
- Jayendra Rakesh.
BTech CSD.
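P.S. Two clarifications that may save the next reader some digging. The "cp: cannot stat `mert.out'" near the end of the log is a downstream symptom: kbmira evidently exited without writing mert.out, and mert-moses.pl then dies at line 956 when it tries to copy the file. More importantly, matched n-gram counts of zero at every order mean the n-best hypotheses share no tokens at all with the reference, which in a factored setup points towards a factor-layout mismatch between the decoder output and the tuning reference. A minimal sketch of that comparison (the file names follow the config above but are assumptions for this run; the "id ||| hypothesis ||| ..." n-best format is the standard Moses one):

    # Sketch: count n-best hypotheses that share at least one surface
    # token with the tuning reference. run1.best100.out is gzipped by
    # mert-moses.pl ("Executing: gzip -f run1.best100.out" in the log);
    # "dev.hi" mirrors factorized-reference in the config; adjust paths.
    import gzip

    def surface(token):
        return token.split("|")[0]  # first factor of word|pos tokens

    refs = [{surface(t) for t in line.split()} for line in open("dev.hi")]

    overlapping = 0
    with gzip.open("run1.best100.out.gz", "rt") as f:
        for line in f:
            sid, hyp = line.split(" ||| ")[:2]
            if {surface(t) for t in hyp.split()} & refs[int(sid)]:
                overlapping += 1
    print("hypotheses with any surface-token overlap:", overlapping)

A count of zero would confirm that the tuning references and the decoder output never match on the surface form, which is consistent with every BLEU statistic in run1.scores.dat being zero.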
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support