Hello everyone,

I have tried to train a tree-based model. The corpus was annotated beforehand 
with syntax information, according to the manual.

I am using the German-English parallel corpus newcommentary-v10. I parsed the 
German side with the Berkeley Parser and the English side with the Collins 
parser (using the wrapper scripts from Moses).

After parsing, I trained the model with the following command:


moses/ubuntu-17.04/moses/scripts/training/train-model.perl \
-mgiza \
--mgiza-cpus 4 \
--root-dir train-short \
--corpus corpus/nc.parsed.short \
--f de --e en \
--lm 0:5:$PWD/lm/europarl-v7.de-en.blm.en:8 \
--hierarchical \
--glue-grammar \
--source-syntax \
--target-syntax \
--extract-options="--MaxSpan 999" \
-external-bin-dir /home/renat_sakenov/moses/ubuntu-17.04/training-tools -cores=4


which led to the following error:


==========================================================
Hmm Training Started at: Sat Jun  1 10:50:26 2019

-----------
Hmm: Iteration 1
Dump files 0 it 1 noIterations 5 dumpFreq 0
ERROR: Execution of: 
/home/renat_sakenov/moses/ubuntu-17.04/training-tools/mgiza  -CoocurrenceFile 
/home/renat_sakenov/moses-2/tree-based-parsed/train-short/giza.de-en/de-en.cooc 
-c 
/home/renat_sakenov/moses-2/tree-based-parsed/train-short/corpus/de-en-int-train.snt
 -m1 5 -m2 0 -m3 3 -m4 3 -model1dumpfrequency 1 -model4smoothfactor 0.4 -ncpus 
4 -nodumps 1 -nsmooth 4 -o 
/home/renat_sakenov/moses-2/tree-based-parsed/train-short/giza.de-en/de-en 
-onlyaldumps 1 -p0 0.999 -s 
/home/renat_sakenov/moses-2/tree-based-parsed/train-short/corpus/en.vcb -t 
/home/renat_sakenov/moses-2/tree-based-parsed/train-short/corpus/de.vcb
  died with signal 11, without coredump


This seems similar to this issue: 
https://www.mail-archive.com/moses-support@mit.edu/msg11270.html

I am using Moses Release 4.0, Collins Parser Version 1.0, MXPOST Version 1.0 
and Berkeley Parser Version 1.1. I have attached the complete output from the 
training procedure.

At some point this message appears:

Reading more sentence pairs into memory ...
ERROR: Forbidden zero sentence length 0
Corpus fits in memory, corpus has: 9256 sentence pairs.

But I'm certain there are no empty lines in the training corpora.
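
For anyone who wants to double-check this, a check along the following lines 
could be run. It is only a sketch under stated assumptions: the helper names 
are made up for illustration, and the first helper assumes that the tree 
markup is stripped before word alignment, so that a line holding only tags 
and no words would count as length zero. The second helper relies on the 
GIZA .snt layout of three lines per sentence pair (count, then the two 
sentences as word ids).

```shell
#!/bin/sh

# Hypothetical helper: print the numbers of all lines that contain no
# tokens once XML-style tree markup (<tree ...>, </tree>) is removed.
# Assumption: tags have the form <...> and never contain a literal '>'.
tokenless_lines() {
    awk '{
        gsub(/<[^>]*>/, " ")       # replace every tag with a space
        if (NF == 0) print NR      # nothing left but whitespace
    }' "$1"
}

# Hypothetical helper: print the 1-based index of every sentence pair in
# a GIZA .snt file whose source or target id line is empty.
empty_snt_pairs() {
    awk 'NR % 3 != 1 && NF == 0 {
        pair = int((NR - 1) / 3) + 1
        if (pair != last) print pair   # report each pair only once
        last = pair
    }' "$1"
}
```

With the command above, the calls would presumably be 
tokenless_lines corpus/nc.parsed.short.de, 
tokenless_lines corpus/nc.parsed.short.en and 
empty_snt_pairs train-short/corpus/de-en-int-train.snt 
(the corpus file names assume the usual stem plus language suffix for 
--corpus). No output from any of them would mean no zero-length sentence 
was found by this check.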


Is anyone familiar with this problem?


Thank you in advance


Best,


Renat Sakenov


Attachment: nohup.out