Hi, I tried to run MERT manually with my own decoder. It seems that mert-moses.pl executes the following two commands for each run:

/home/leona/mosesdecoder/mert/extractor --scconfig case:true --scfile run1.scores.dat --ffile run1.features.dat -r /home72/leona/IWSLT10.zh-en/tuning/reference.tok.2.ref0,/home72/leona/IWSLT10.zh-en/tuning/reference.tok.2.ref1,/home72/leona/IWSLT10.zh-en/tuning/reference.tok.2.ref2,/home72/leona/IWSLT10.zh-en/tuning/reference.tok.2.ref3 -n run1.best100.out.gz > extract.out 2> extract.err

/home/leona/mosesdecoder/mert/mert -d 14 --scconfig case:true -n 20 --ffile run1.features.dat --scfile run1.scores.dat --ifile run1.init.opt 2> mert.log

Here extractor outputs the score file (run*.score.dat) and the feature file (run*.features.dat), and mert outputs weights.txt and messages on stderr.

My own decoder uses 7 features, as follows.

head -n1 run1.best100
0 ||| I name is Tanaka 希洛 grams of I want in your place for one a room . ||| phrase-f2e: -30.924257 lex-f2e: -27.824498 phrase-e2f: -22.619237 lex-e2f: -19.450964 phrase-penalty: -89.140054 n5gram: -99.894705 d2gram: 0.000000 ||| -389.853714627

Therefore I adjusted the argument (mert with -d 7).

$ /home/leona/mosesdecoder/mert/extractor --scconfig case:true --scfile run1.scores.dat --ffile run1.features.dat -r /home72/leona/IWSLT10.zh-en/tuning/reference.tok.2.ref0,/home72/leona/IWSLT10.zh-en/tuning/reference.tok.2.ref1,/home72/leona/IWSLT10.zh-en/tuning/reference.tok.2.ref2,/home72/leona/IWSLT10.zh-en/tuning/reference.tok.2.ref3 -n run1.best100 > extract.out 2> extract.err
Binary write mode is NOT selected
Scorer type: BLEU
Scorer config string: case:true
name: case value: true
Using scorer regularisation strategy: none
Using scorer regularisation window: 0
Using case preservation: 1
Using reference length strategy: closest
Loading reference from /home72/leona/IWSLT10.zh-en/tuning/reference.tok.2.ref0 .
Loading reference from /home72/leona/IWSLT10.zh-en/tuning/reference.tok.2.ref1 .
Loading reference from /home72/leona/IWSLT10.zh-en/tuning/reference.tok.2.ref2 .
Loading reference from /home72/leona/IWSLT10.zh-en/tuning/reference.tok.2.ref3 .
References loaded : [0] seconds
Data::score_type BLEU
Data::Scorer type from Scorer: BLEU
BleuScorer: 9
ScoreData: number_of_scores: 9
Previous data loaded : [0] seconds
loading nbest from run1.100best
Nbest entries loaded and scored : [1] seconds
Binary write mode is NOT selected
Binary write mode is NOT selected
saving the array into run1.features.dat
saving the array into run1.scores.dat
Stopping... : [1] seconds

$ /home/leona/mosesdecoder/mert/mert -d 7 --scconfig case:true -n 20 --ffile run1.features.dat --scfile run1.scores.dat --ifile run1.init.opt 2> mert.log

However, this gives the following error.

Seeding random numbers with system clock
Scorer config string: case:true
name: case value: true
Using scorer regularisation strategy: none
Using scorer regularisation window: 0
Using case preservation: 1
Using reference length strategy: closest
Data::score_type BLEU
Data::Scorer type from Scorer: BLEU
BleuScorer: 9
ScoreData: number_of_scores: 9
Loading Data from: run1.score.dat and run1.features.dat
loading feature data from run1.features.dat
loading score data from run1.score.dat
Data loaded : [0] seconds
error size mismatch between FeatureData and Scorer

Do you have any suggestions?

-- 
Hwidong Na <[email protected]>
KLE lab, POSTECH, KOREA
_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
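
P.S. As a sanity check on the -d value, here is a minimal sketch (not part of Moses) that counts the feature values in one n-best line. It assumes the "id ||| hypothesis ||| features ||| total" layout shown above, with every token ending in ":" treated as a feature label. The function name count_features is hypothetical, and the hypothesis text in the sample line is shortened for readability. It only confirms the feature count; it does not by itself explain the size mismatch error.

```python
def count_features(nbest_line):
    """Return the number of numeric feature values in one n-best line."""
    fields = [f.strip() for f in nbest_line.split("|||")]
    if len(fields) < 3:
        raise ValueError("expected at least 3 '|||'-separated fields")
    tokens = fields[2].split()
    # Assumption: tokens ending in ':' are labels, everything else is a value.
    values = [t for t in tokens if not t.endswith(":")]
    for v in values:
        float(v)  # raises ValueError if a token is not a number
    return len(values)


# Sample line based on the head -n1 output above (hypothesis shortened).
line = (
    "0 ||| I name is Tanaka ||| phrase-f2e: -30.924257 lex-f2e: -27.824498 "
    "phrase-e2f: -22.619237 lex-e2f: -19.450964 phrase-penalty: -89.140054 "
    "n5gram: -99.894705 d2gram: 0.000000 ||| -389.853714627"
)
print(count_features(line))  # prints 7
```

Running the same function over every line of run1.best100 would show whether any entry has a different number of feature values than the one passed to -d.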
