Sorry for bumping the thread again.
Here's what I found after running the command by hand, as suggested.

run1.init.opt:
 0.300000 0.300000 0.300000 0.300000 0.300000 0.300000 0.300000 0.500000 -1.000000 0.200000 0.200000 0.200000 0.200000 0.200000
 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 1 1 1 1 1 1 1 1 1 1 1 1 1 1
Things I noticed:
* run1.features.dat has columns 3 and 6 all zeroes
* run1.scores.dat has columns 1, 3, 5 and 7 all zeroes
(a quick check script is sketched after the excerpts below)

run1.features.dat:
FEATURES_TXT_BEGIN_0 0 100 14 d_0 d_1 d_2 d_3 d_4 d_5 d_6 lm_0 w_0 tm_0 tm_1 tm_2 tm_3 tm_4
-8 -16.1621 0 -1.28775 -23.5316 0 -1.16898 -1804.31 -18 -79.6613 -80.0959 -48.9898 -27.8995 17.9981
0 -20.6133 0 0 -27.4523 0 0 -1804.31 -18 -81.3889 -80.0948 -50.4425 -27.871 17.9981
-8 -16.9628 0 -1.28775 -22.4881 0 -1.16898 -1804.31 -18 -77.4669 -80.9714 -49.7377 -28.9981 17.9981
0 -21.414 0 0 -26.4089 0 0 -1804.31 -18 -79.1945 -80.9702 -51.1904 -28.9696 17.9981
-8 -16.8088 0 -1.28775 -23.0796 0 -1.16898 -1804.31 -18 -78.6891 -79.6433 -50.0519 -28.2473 17.9981
-8 -16.1621 0 -1.28775 -23.5316 0 -1.16898 -1804.31 -18 -79.1464 -79.9766 -48.9898 -28.83 17.9981


run1.scores.dat:
SCORES_TXT_BEGIN_0 0 100 9 BLEU
0 18 0 17 0 16 0 15 31
0 18 0 17 0 16 0 15 31
0 18 0 17 0 16 0 15 31
0 18 0 17 0 16 0 15 31
0 18 0 17 0 16 0 15 31
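In case it helps to double-check, here is a small Python sketch (my own quick illustration, assuming the plain whitespace-separated layout with the *_TXT_BEGIN/*_TXT_END marker lines shown above) that reports the columns that are zero in every row:

#!/usr/bin/env python
# check_zero_columns.py (sketch, not the exact commands I ran):
# report 1-based column indices that are zero in every data row of a
# Moses extractor file such as run1.features.dat or run1.scores.dat.
import sys

def all_zero_columns(path):
    nonzero = None
    for line in open(path):
        if "_TXT_BEGIN" in line or "_TXT_END" in line:
            continue  # skip the block marker / header lines
        values = [float(x) for x in line.split()]
        if not values:
            continue
        if nonzero is None:
            nonzero = [False] * len(values)
        for i, v in enumerate(values):
            if v != 0.0:
                nonzero[i] = True
    return [] if nonzero is None else [i + 1 for i, f in enumerate(nonzero) if not f]

for path in sys.argv[1:]:
    print("%s: all-zero columns %s" % (path, all_zero_columns(path)))

Running it as "python check_zero_columns.py run1.features.dat run1.scores.dat" should report columns 3 and 6 for the features and 1, 3, 5 and 7 for the scores, matching the lists above.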

Issue: in the MIRA run the BLEU score is initialized to zero.
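If I read the scorer output correctly, the nine numbers per row are the matched and total n-gram counts for n = 1 to 4 followed by the reference length, so with every match count at zero each candidate necessarily scores BLEU = 0, whatever the weights. A rough sketch of that computation (my own illustration, not the actual kbmira smoothing/aggregation):

import math

# BLEU from the assumed 9-column sufficient statistics in run1.scores.dat:
# match_1 total_1 match_2 total_2 match_3 total_3 match_4 total_4 ref_len
def bleu_from_stats(stats):
    matches, totals, ref_len = stats[0:8:2], stats[1:8:2], stats[8]
    hyp_len = totals[0]              # unigram total = hypothesis length
    if 0 in matches:
        return 0.0                   # a zero precision zeroes the geometric mean
    log_prec = sum(math.log(m / float(t)) for m, t in zip(matches, totals)) / 4.0
    brevity = min(0.0, 1.0 - float(ref_len) / hyp_len)
    return math.exp(brevity + log_prec)

print(bleu_from_stats([0, 18, 0, 17, 0, 16, 0, 15, 31]))   # -> 0.0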
I hope this information helps in solving the issue.

Thanking you,
Jayendra Rakesh.


On Wed, Jun 5, 2013 at 1:52 PM, jayendra rakesh
<jayendra.rak...@gmail.com> wrote:

> Hi Phi,
>
> Thanks for the reply. I ran the command by hand as you suggested and was
> able to reproduce the crash. I checked the run1.init.opt file and it seems
> to be fine.
>
> run1.init.opt:
>  0.300000 0.300000 0.300000 0.300000 0.300000 0.300000 0.300000 0.500000 -1.000000 0.200000 0.200000 0.200000 0.200000 0.200000
>  0 0 0 0 0 0 0 0 0 0 0 0 0 0
>  1 1 1 1 1 1 1 1 1 1 1 1 1 1
> But I noticed something in both run1.scores.dat and run1.features.dat:
> several feature columns (3 and 6) and score columns (1, 3, 5 and 7) are
> all zero, which I think is why the BLEU score is initialized to zero in
> mert.log.
>
> run1.features.dat:
> FEATURES_TXT_BEGIN_0 0 100 14 d_0 d_1 d_2 d_3 d_4 d_5 d_6 lm_0 w_0 tm_0 tm_1 tm_2 tm_3 tm_4
> -8 -16.1621 0 -1.28775 -23.5316 0 -1.16898 -1804.31 -18 -79.6613 -80.0959 -48.9898 -27.8995 17.9981
> 0 -20.6133 0 0 -27.4523 0 0 -1804.31 -18 -81.3889 -80.0948 -50.4425 -27.871 17.9981
> -8 -16.9628 0 -1.28775 -22.4881 0 -1.16898 -1804.31 -18 -77.4669 -80.9714 -49.7377 -28.9981 17.9981
> 0 -21.414 0 0 -26.4089 0 0 -1804.31 -18 -79.1945 -80.9702 -51.1904 -28.9696 17.9981
> -8 -16.8088 0 -1.28775 -23.0796 0 -1.16898 -1804.31 -18 -78.6891 -79.6433 -50.0519 -28.2473 17.9981
> -8 -16.1621 0 -1.28775 -23.5316 0 -1.16898 -1804.31 -18 -79.1464 -79.9766 -48.9898 -28.83 17.9981
>
>
> run1.scores.dat:
> SCORES_TXT_BEGIN_0 0 100 9 BLEU
> 0 18 0 17 0 16 0 15 31
> 0 18 0 17 0 16 0 15 31
> 0 18 0 17 0 16 0 15 31
> 0 18 0 17 0 16 0 15 31
> 0 18 0 17 0 16 0 15 31
>
>
> I hope this information helps in pointing out the issue.
>
> Regards,
> Jayendra Rakesh
>
>
>
> On Sun, Jun 2, 2013 at 5:19 PM, Philipp Koehn <pko...@inf.ed.ac.uk> wrote:
>
>> Hi,
>>
>> I have been running kbest MIRA with factored models many times, never
>> with any problems, so "this should work".
>>
>> The error is in the step:  /tools/mosesdecoder-master_2/bin/kbmira -J
>> 100 -C 0.001  --dense-init run1.init.opt  --ffile run1.features.data
>> [...]
>>
>> so that's where to start.
>>
>> Check if the features file looks sane.
>> Check the run1.init.opt file.
>> Run the step by hand.
>>
>> If this does not work, send us the input files for this command (maybe
>> even a smaller subset, if you can reproduce the error).
>>
>> -phi
>>
>> On Sun, Jun 2, 2013 at 10:56 AM, jayendra rakesh
>> <jayendra.rak...@gmail.com> wrote:
>> > Hi,
>> > My EMS setup (factored, MIRA) crashes at the tuning stage after a single run.
>> > config.toy: (attaching only training and tuning sections)
>> > # TRANSLATION MODEL TRAINING
>> >
>> > [TRAINING]
>> >
>> > ### training script to be used: either a legacy script or
>> > # current moses training script (default)
>> > #
>> > script = $moses-script-dir/training/train-model.perl
>> >
>> > ### general options
>> > # these are options that are passed on to train-model.perl, for instance
>> > # * "-mgiza -mgiza-cpus 8" to use mgiza instead of giza
>> > # * "-sort-buffer-size 8G -sort-compress gzip" to reduce on-disk sorting
>> > # * "-sort-parallel 8 -cores 8" to speed up phrase table building
>> > #
>> > #training-options = ""
>> >
>> > ### factored training: specify here which factors used
>> > # if none specified, single factor training is assumed
>> > # (one translation step, surface to surface)
>> > #
>> > input-factors = word pos
>> > output-factors = word pos
>> > alignment-factors = "word -> word"
>> > translation-factors = "word+pos -> word+pos"
>> > reordering-factors = "word -> word"
>> > #generation-factors = "pos -> word"
>> > decoding-steps = "t0"
>> >
>> > ### parallelization of data preparation step
>> > # the two directions of the data preparation can be run in parallel
>> > # comment out if not needed
>> > #
>> > parallel = yes
>> >
>> > ### pre-computation for giza++
>> > # giza++ has a more efficient data structure that needs to be
>> > # initialized with snt2cooc. if run in parallel, this may reduces
>> > # memory requirements. set here the number of parts
>> > #
>> > #run-giza-in-parts = 5
>> >
>> > ### symmetrization method to obtain word alignments from giza output
>> > # (commonly used: grow-diag-final-and)
>> > #
>> > alignment-symmetrization-method = grow-diag-final-and
>> >
>> > ### use of berkeley aligner for word alignment
>> > #
>> > #use-berkeley = true
>> > #alignment-symmetrization-method = berkeley
>> > #berkeley-train = $moses-script-dir/ems/support/berkeley-train.sh
>> > #berkeley-process =  $moses-script-dir/ems/support/berkeley-process.sh
>> > #berkeley-jar = /your/path/to/berkeleyaligner-1.1/berkeleyaligner.jar
>> > #berkeley-java-options = "-server -mx30000m -ea"
>> > #berkeley-training-options = "-Main.iters 5 5 -EMWordAligner.numThreads
>> 8"
>> > #berkeley-process-options = "-EMWordAligner.numThreads 8"
>> > #berkeley-posterior = 0.5
>> >
>> > ### use of baseline alignment model (incremental training)
>> > #
>> > #baseline = 68
>> > #baseline-alignment-model =
>> > "$working-dir/training/prepared.$baseline/$input-extension.vcb \
>> > #  $working-dir/training/prepared.$baseline/$output-extension.vcb \
>> > #
>> >
>> $working-dir/training/giza.$baseline/${output-extension}-$input-extension.cooc
>> > \
>> > #
>> >
>> $working-dir/training/giza-inverse.$baseline/${input-extension}-$output-extension.cooc
>> > \
>> > #
>> >
>> $working-dir/training/giza.$baseline/${output-extension}-$input-extension.thmm.5
>> > \
>> > #
>> >
>> $working-dir/training/giza.$baseline/${output-extension}-$input-extension.hhmm.5
>> > \
>> > #
>> >
>> $working-dir/training/giza-inverse.$baseline/${input-extension}-$output-extension.thmm.5
>> > \
>> > #
>> >
>> $working-dir/training/giza-inverse.$baseline/${input-extension}-$output-extension.hhmm.5"
>> >
>> > ### if word alignment should be skipped,
>> > # point to word alignment files
>> > #
>> > #word-alignment = $working-dir/model/aligned.1
>> >
>> > ### filtering some corpora with modified Moore-Lewis
>> > # specify corpora to be filtered and ratio to be kept, either before or
>> > after word alignment
>> > #mml-filter-corpora = toy
>> > #mml-before-wa = "-proportion 0.9"
>> > #mml-after-wa = "-proportion 0.9"
>> >
>> > ### create a bilingual concordancer for the model
>> > #
>> > #biconcor = $moses-script-dir/ems/biconcor/biconcor
>> >
>> > ### lexicalized reordering: specify orientation type
>> > # (default: only distance-based reordering model)
>> > #
>> > lexicalized-reordering = msd-bidirectional-fe
>> >
>> > ### hierarchical rule set
>> > #
>> > #hierarchical-rule-set = true
>> >
>> > ### settings for rule extraction
>> > #
>> > #extract-settings = ""
>> > max-phrase-length = 5
>> >
>> > ### add extracted phrases from baseline model
>> > #
>> > #baseline-extract = $working-dir/model/extract.$baseline
>> > #
>> > # requires aligned parallel corpus for re-estimating lexical translation
>> > probabilities
>> > #baseline-corpus = $working-dir/training/corpus.$baseline
>> > #baseline-alignment =
>> > $working-dir/model/aligned.$baseline.$alignment-symmetrization-method
>> >
>> > ### unknown word labels (target syntax only)
>> > # enables use of unknown word labels during decoding
>> > # label file is generated during rule extraction
>> > #
>> > #use-unknown-word-labels = true
>> >
>> > ### if phrase extraction should be skipped,
>> > # point to stem for extract files
>> > #
>> > # extracted-phrases =
>> >
>> > ### settings for rule scoring
>> > #
>> > score-settings = "--GoodTuring"
>> >
>> > ### include word alignment in phrase table
>> > #
>> > include-word-alignment-in-rules = yes
>> >
>> > ### sparse lexical features
>> > #
>> > #sparse-lexical-features = "target-word-insertion top 50,
>> > source-word-deletion top 50, word-translation top 50 50, phrase-length"
>> >
>> > ### domain adaptation settings
>> > # options: sparse, any of: indicator, subset, ratio
>> > #domain-features = "subset"
>> >
>> > ### if phrase table training should be skipped,
>> > # point to phrase translation table
>> > #
>> > # phrase-translation-table =
>> >
>> > ### if reordering table training should be skipped,
>> > # point to reordering table
>> > #
>> > # reordering-table =
>> >
>> > ### filtering the phrase table based on significance tests
>> > # Johnson, Martin, Foster and Kuhn. (2007): "Improving Translation
>> Quality
>> > by Discarding Most of the Phrasetable"
>> > # options: -n number of translations; -l 'a+e', 'a-e', or a positive
>> real
>> > value -log prob threshold
>> > #salm-index = /path/to/project/salm/Bin/Linux/Index/IndexSA.O64
>> > #sigtest-filter = "-l a+e -n 50"
>> >
>> > ### if training should be skipped,
>> > # point to a configuration file that contains
>> > # pointers to all relevant model files
>> > #
>> > #config-with-reused-weights =
>> >
>> > #####################################################
>> > ### TUNING: finding good weights for model components
>> >
>> > [TUNING]
>> >
>> > ### instead of tuning with this setting, old weights may be recycled
>> > # specify here an old configuration file with matching weights
>> > #
>> > #weight-config = $working-dir/model/weight.ini
>> >
>> > ### tuning script to be used
>> > #
>> > tuning-script = $moses-script-dir/training/mert-moses.pl
>> > tuning-settings = "-mertdir $moses-bin-dir --batch-mira
>> --return-best-dev
>> > --batch-mira-args '-J 100 -C 0.001'"
>> >
>> > ### specify the corpus used for tuning
>> > # it should contain 1000s of sentences
>> > #
>> > input-sgm = $toy-data/dev.en.sgm
>> > #raw-input =
>> > #tokenized-input = $toy-data/dev.en
>> > factorized-input = $toy-data/dev.en
>> > #factorized-input =
>> > #input =
>> > #
>> > reference-sgm = $toy-data/dev.hi.sgm
>> > #raw-reference =
>> > factorized-reference = $toy-data/dev.hi
>> > #factorized-reference =
>> > #reference =
>> >
>> > ### size of n-best list used (typically 100)
>> > #
>> > nbest = 100
>> >
>> > ### ranges for weights for random initialization
>> > # if not specified, the tuning script will use generic ranges
>> > # it is not clear, if this matters
>> > #
>> > # lambda =
>> >
>> > ### additional flags for the filter script
>> > #
>> > filter-settings = ""
>> >
>> > ### additional flags for the decoder
>> > #
>> > decoder-settings = ""
>> >
>> > ### if tuning should be skipped, specify this here
>> > # and also point to a configuration file that contains
>> > # pointers to all relevant model files
>> > #
>> > #config =
>> >
>> >
>> >
>> >
>> > TUNING_tune.1.STDERR file has the following lines
>> >
>> >
>> >
>> >
>> >
>> > Translating line 1078  in thread id 139965279725312
>> > Translating line 1079  in thread id 139965279725312
>> > Translating line 1080  in thread id 139965279725312
>> > Translating line 1081  in thread id 139965279725312
>> > The decoder returns the scores in this order: d d d d d d d lm w tm tm
>> tm tm
>> > tm
>> > Executing: gzip -f run1.best100.out
>> > Scoring the nbestlist.
>> > exec: /home/eilmt/wrk-dir/wrk-jhu-fact/tuning/tmp.1/extractor.sh
>> > Executing: /home/eilmt/wrk-dir/wrk-jhu-fact/tuning/tmp.1/extractor.sh >
>> > extract.out 2> extract.err
>> > Executing: \cp -f init.opt run1.init.opt
>> > Executing: echo 'not used' > weights.txt
>> > exec: /tools/mosesdecoder-master_2/bin/kbmira -J 100 -C 0.001
>>  --dense-init
>> > run1.init.opt  --ffile run1.features.dat --scfile run1.scores.dat$
>> > Executing: /tools/mosesdecoder-master_2/bin/kbmira -J 100 -C 0.001
>> > --dense-init run1.init.opt  --ffile run1.features.dat --scfile
>> run1.score$
>> > Executing: \cp -f extract.err run1.extract.err
>> > Executing: \cp -f extract.out run1.extract.out
>> > Executing: \cp -f mert.out run1.mert.out
>> > cp: cannot stat `mert.out': No such file or directory
>> > Exit code: 1
>> > Died at /tools/mosesdecoder-master_2/scripts/training/mert-moses.pl line
>> > 956.
>> > cp: cannot stat
>> `/home/eilmt/wrk-dir/wrk-jhu-fact/tuning/tmp.1/moses.ini':
>> > No such file or directory
>> >
>> >
>> >
>> >
>> >
>> > Opening mert.log shows that the BLEU score is initialized to a value of
>> > zero. On a side note, the BLEU score seems to initialize fine in the case
>> > of non-factored models.
>> >
>> >
>> >
>> >
>> > kbmira with c=0.001 decay=0.999 no_shuffle=0
>> > Initialising random seed from system clock
>> > ..........Initial BLEU = 0
>> > 0/1082 updates, avg loss = 0, BLEU = 0
>> > 0/1082 updates, avg loss = 0, BLEU = 0
>> > 0/1082 updates, avg loss = 0, BLEU = 0
>> > 0/1082 updates, avg loss = 0, BLEU = 0
>> > 0/1082 updates, avg loss = 0, BLEU = 0
>> > .
>> > .
>> > .
>> >
>> > Kindly suggest a solution.
>> >
>> > Thanking you,
>> > --
>> > - Jayendra Rakesh.
>> >    BTech CSD.
>> >
>> >
>>
>
>
>
> --
> - Jayendra Rakesh.
>    BTech CSD.
>



-- 
- Jayendra Rakesh.
   BTech CSD.
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
