Hi,

It seems that whenever I use the option "-snt2cooc snt2cooc.pl", the training fails (see the error log below). When I try on the same corpora (rather small because of memory limitations) it works.
Does anyone have a clue about this?

Reading more sentence pairs into memory ... [sent:3400000]
Train total # sentence pairs (weighted): 6.74606e+06
Size of source portion of the training corpus: 2.05275e+08 tokens
Size of the target portion of the training corpus: 2.40869e+08 tokens
In source portion of the training corpus, only 4208312 unique tokens appeared
In target portion of the training corpus, only 1428380 unique tokens appeared
lambda for PP calculation in IBM-1,IBM-2,HMM:= 2.40869e+08/(2.12021e+08-6.74606e+06)== 1.1734
Dictionary Loading complete
Inputfile in /local/para_corpora/pattr/claims/giza.en-de/en-de.cooc
ERROR: Execution of: /nas/fgmehlin/bin/mgizapp/mgiza -CoocurrenceFile /local/para_corpora/pattr/claims/giza.en-de/en-de.cooc -c /local/para_corpora/pattr/claims/corpus/en-de-int-train.snt -m1 5 -m2 0 -m3 3 -m4 3 -model1dumpfrequency 1 -model4smoothfactor 0.4 -ncpus 4 -nodumps 1 -nsmooth 4 -o /local/para_corpora/pattr/claims/giza.en-de/en-de -onlyaldumps 1 -p0 0.999 -s /local/para_corpora/pattr/claims/corpus/de.vcb -t /local/para_corpora/pattr/claims/corpus/en.vcb
died with signal 11, without coredump
...
Reading more sentence pairs into memory ...
Reading more sentence pairs into memory ...
...
[sent:3500000]Compacted Vocabulary, eliminated 1 entries 1428381 remains
Compacted Vocabulary, eliminated 1 entries 4208312 remains
Train total # sentence pairs (weighted): 6.74606e+06
Size of source portion of the training corpus: 2.40869e+08 tokens
Size of the target portion of the training corpus: 2.05275e+08 tokens
In source portion of the training corpus, only 1428381 unique tokens appeared
In target portion of the training corpus, only 4208311 unique tokens appeared
lambda for PP calculation in IBM-1,IBM-2,HMM:= 2.05275e+08/(2.47615e+08-6.74606e+06)== 0.852227
Dictionary Loading complete
Inputfile in /local/para_corpora/pattr/claims/giza.de-en/de-en.cooc
ERROR: Execution of: /nas/fgmehlin/bin/mgizapp/mgiza -CoocurrenceFile /local/para_corpora/pattr/claims/giza.de-en/de-en.cooc -c /local/para_corpora/pattr/claims/corpus/de-en-int-train.snt -m1 5 -m2 0 -m3 3 -m4 3 -model1dumpfrequency 1 -model4smoothfactor 0.4 -ncpus 4 -nodumps 1 -nsmooth 4 -o /local/para_corpora/pattr/claims/giza.de-en/de-en -onlyaldumps 1 -p0 0.999 -s /local/para_corpora/pattr/claims/corpus/en.vcb -t /local/para_corpora/pattr/claims/corpus/de.vcb
died with signal 11, without coredump

The command I used to start the training is the following:

/local/moses/mosesdecoder/scripts/training/train-model.perl --root-dir . --corpus clean_lc_utf8 --f de --e en -cores 4 --parallel -external-bin-dir /nas/fgmehlin/bin/mgizapp -mgiza -mgiza-cpus 4 -snt2cooc snt2cooc.pl -lm 0:5:/local/para_corpora/pattr/claims/lm.arpa

Thank you in advance,
Floran