Hello everyone,
I have tried to train a tree-based model. The corpus was annotated with syntax information beforehand, according to the manual. I use the German-English parallel corpus newcommentary-v10, parsed with the Berkeley Parser for the German text and the Collins parser for the English text (using the wrapper scripts from Moses).

After parsing, I trained the model with the following command:

    moses/ubuntu-17.04/moses/scripts/training/train-model.perl \
      -mgiza \
      --mgiza-cpus 4 \
      --root-dir train-short \
      --corpus corpus/nc.parsed.short \
      --f de --e en \
      --lm 0:5:$PWD/lm/europarl-v7.de-en.blm.en:8 \
      --hierarchical \
      --glue-grammar \
      --source-syntax \
      --target-syntax \
      --extract-options="--MaxSpan 999" \
      -external-bin-dir /home/renat_sakenov/moses/ubuntu-17.04/training-tools -cores=4

which led to the following error:

    ==========================================================
    Hmm Training Started at: Sat Jun 1 10:50:26 2019
    -----------
    Hmm: Iteration 1
    Dump files 0 it 1 noIterations 5 dumpFreq 0
    ERROR: Execution of: /home/renat_sakenov/moses/ubuntu-17.04/training-tools/mgiza -CoocurrenceFile /home/renat_sakenov/moses-2/tree-based-parsed/train-short/giza.de-en/de-en.cooc -c /home/renat_sakenov/moses-2/tree-based-parsed/train-short/corpus/de-en-int-train.snt -m1 5 -m2 0 -m3 3 -m4 3 -model1dumpfrequency 1 -model4smoothfactor 0.4 -ncpus 4 -nodumps 1 -nsmooth 4 -o /home/renat_sakenov/moses-2/tree-based-parsed/train-short/giza.de-en/de-en -onlyaldumps 1 -p0 0.999 -s /home/renat_sakenov/moses-2/tree-based-parsed/train-short/corpus/en.vcb -t /home/renat_sakenov/moses-2/tree-based-parsed/train-short/corpus/de.vcb
    died with signal 11, without coredump

This seems similar to this issue:
https://www.mail-archive.com/moses-support@mit.edu/msg11270.html

I am using Moses Release 4.0, Collins Parser Version 1.0, MXPOST Version 1.0 and Berkeley Parser Version 1.1.

I have attached the complete output of the training procedure. At some point this message appears:

    Reading more sentence pairs into memory ...
    ERROR: Forbidden zero sentence length 0
    Corpus fits in memory, corpus has: 9256 sentence pairs.

But I am certain there are no empty lines in the training corpora.

Is anyone familiar with this problem?

Thank you in advance.

Best,
Renat Sakenov
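The "Forbidden zero sentence length" message and the claim that the corpus has no empty lines can be checked against each other. A line can be non-empty and still contain no words if it consists only of tree markup. The sketch below assumes the parsed files use XML-style `<tree label="..."> ... </tree>` annotation and that the two sides are stored as corpus/nc.parsed.short.de and corpus/nc.parsed.short.en; both are assumptions derived from the command above, not verified facts. The helper names `token_count` and `find_empty_pairs` are made up for illustration. It is a diagnostic only and does not confirm that this is the cause of the crash.

```python
import re
import sys
from itertools import zip_longest

# Matches any XML-style tag such as <tree label="NP"> or </tree>.
# Assumption: the parsed corpus uses this kind of markup.
TAG = re.compile(r"<[^>]*>")


def token_count(line):
    """Count whitespace-separated tokens left after removing tags."""
    return len(TAG.sub(" ", line).split())


def find_empty_pairs(src_lines, tgt_lines):
    """Return 1-based numbers of sentence pairs where either side has no tokens.

    If one file is shorter than the other, the missing lines count as empty.
    """
    bad = []
    pairs = zip_longest(src_lines, tgt_lines, fillvalue="")
    for number, (src, tgt) in enumerate(pairs, start=1):
        if token_count(src) == 0 or token_count(tgt) == 0:
            bad.append(number)
    return bad


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python check_empty.py SOURCE_FILE TARGET_FILE
    with open(sys.argv[1], encoding="utf-8") as f_src, \
         open(sys.argv[2], encoding="utf-8") as f_tgt:
        for number in find_empty_pairs(f_src, f_tgt):
            print(number)
```

Run with the two parsed files as arguments (the script name check_empty.py is hypothetical), it prints the number of every sentence pair in which at least one side has no words left once the markup is removed.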
Attachment: nohup.out