Hi,

It seems that whenever I use the option "-snt2cooc snt2cooc.pl", the training
fails (see the error log below). When I run the training without it on the same
corpora (kept rather small because of memory limitations), it works.

Does anyone have a clue about this?

Reading more sentence pairs into memory ...
[sent:3400000]
 Train total # sentence pairs (weighted): 6.74606e+06
Size of source portion of the training corpus: 2.05275e+08 tokens
Size of the target portion of the training corpus: 2.40869e+08 tokens
In source portion of the training corpus, only 4208312 unique tokens appeared
In target portion of the training corpus, only 1428380 unique tokens appeared
lambda for PP calculation in IBM-1,IBM-2,HMM:= 
2.40869e+08/(2.12021e+08-6.74606e+06)== 1.1734
Dictionary Loading complete
Inputfile in /local/para_corpora/pattr/claims/giza.en-de/en-de.cooc
ERROR: Execution of: /nas/fgmehlin/bin/mgizapp/mgiza  -CoocurrenceFile 
/local/para_corpora/pattr/claims/giza.en-de/en-de.cooc -c 
/local/para_corpora/pattr/claims/corpus/en-de-int-train.snt -m1 5 -m2 0 -m3 3 
-m4 3 -model1dumpfrequency 1 -model4smoothfactor 0.4 -ncpus 4 -nodumps 1 
-nsmooth 4 -o /local/para_corpora/pattr/claims/giza.en-de/en-de -onlyaldumps 1 
-p0 0.999 -s /local/para_corpora/pattr/claims/corpus/de.vcb -t 
/local/para_corpora/pattr/claims/corpus/en.vcb
  died with signal 11, without coredump
...
Reading more sentence pairs into memory ...
Reading more sentence pairs into memory ...
...
[sent:3500000]Compacted Vocabulary, eliminated 1 entries 1428381 remains
Compacted Vocabulary, eliminated 1 entries 4208312 remains
 Train total # sentence pairs (weighted): 6.74606e+06
Size of source portion of the training corpus: 2.40869e+08 tokens
Size of the target portion of the training corpus: 2.05275e+08 tokens
In source portion of the training corpus, only 1428381 unique tokens appeared
In target portion of the training corpus, only 4208311 unique tokens appeared
lambda for PP calculation in IBM-1,IBM-2,HMM:= 
2.05275e+08/(2.47615e+08-6.74606e+06)== 0.852227
Dictionary Loading complete
Inputfile in /local/para_corpora/pattr/claims/giza.de-en/de-en.cooc
ERROR: Execution of: /nas/fgmehlin/bin/mgizapp/mgiza  -CoocurrenceFile 
/local/para_corpora/pattr/claims/giza.de-en/de-en.cooc -c 
/local/para_corpora/pattr/claims/corpus/de-en-int-train.snt -m1 5 -m2 0 -m3 3 
-m4 3 -model1dumpfrequency 1 -model4smoothfactor 0.4 -ncpus 4 -nodumps 1 
-nsmooth 4 -o /local/para_corpora/pattr/claims/giza.de-en/de-en -onlyaldumps 1 
-p0 0.999 -s /local/para_corpora/pattr/claims/corpus/en.vcb -t 
/local/para_corpora/pattr/claims/corpus/de.vcb
  died with signal 11, without coredump

The command I used to start the training is the following:

/local/moses/mosesdecoder/scripts/training/train-model.perl --root-dir . 
--corpus clean_lc_utf8 --f de --e en -cores 4 --parallel -external-bin-dir 
/nas/fgmehlin/bin/mgizapp -mgiza -mgiza-cpus 4 -snt2cooc snt2cooc.pl -lm 
0:5:/local/para_corpora/pattr/claims/lm.arpa

Thank you in advance,

Floran
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
