Hello!
I am trying to set up a working installation of Moses for our project. I
followed the guidelines on the Moses Web pages (Baseline System, ...) and it
was mostly successful, except for mgiza.
History of what I did:
My system: a virtual machine with Ubuntu 14.04 x64, 2 cores, 12 GB of memory.
1. Installed release 3.0 from the Web page.
   Tried the commands from "Baseline System" ==> mgiza fails with signal 11,
   coredump
2. Compiled and installed the latest version of mgiza from GitHub.
   Tried the commands from "Baseline System" ==> mgiza fails with signal 11,
   coredump
3. Compiled and installed the latest version of GIZA++ from GitHub.
   Tried the commands from "Baseline System" ==> all OK
4. Compiled and installed the latest versions of moses, GIZA++ and mgiza from
   GitHub.
   Tried the commands from "Baseline System" ==> OK with GIZA++, fails with
   mgiza
For both GIZA++ and mgiza I use the same command with the same input files.
The only difference is the following two switches:
GIZAOPT="-mgiza -mgiza-cpus 2"
Command:
$HOME/mosesdecoder/scripts/training/train-model.perl -cores 2 $GIZAOPT
-root-dir train -corpus $HOME/corpus/news-commentary-v8.fr-en.clean -f fr -e en
-alignment grow-diag-final-and -reordering msd-bidirectional-fe -lm
0:3:$HOME/lm/news-commentary-v8.fr-en.blm.en:8 -external-bin-dir
$HOME/mosesdecoder/training-tools 2>&1 > train.out
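A side note on the redirection at the end of this command: in a POSIX shell,
`2>&1 > train.out` first points stderr at the terminal and only then sends
stdout to the file, so error messages do not end up in train.out. The sketch
below illustrates the difference. `emit` is a hypothetical stand-in that
writes one line to each stream (it is not part of Moses), and the output files
go to a temporary directory.

```shell
#!/bin/sh
# Hypothetical stand-in for the training script: one line on stdout,
# one line on stderr.
emit() {
    echo "to-stdout"
    echo "to-stderr" >&2
}

tmpdir=$(mktemp -d)

# Order used in the command above: stderr is duplicated onto the current
# stdout (the terminal), then stdout alone is redirected to the file.
emit 2>&1 > "$tmpdir/a.out"

# Reversed order: stdout is redirected to the file first, then stderr
# is pointed at the same place.
emit > "$tmpdir/b.out" 2>&1

# Count how many of the two lines each file captured.
lines_a=$(wc -l < "$tmpdir/a.out" | tr -d ' ')
lines_b=$(wc -l < "$tmpdir/b.out" | tr -d ' ')
echo "2>&1 > file : $lines_a line(s) captured"
echo "> file 2>&1 : $lines_b line(s) captured"

rm -rf "$tmpdir"
```

With the reversed order (`> train.out 2>&1`) both streams would be captured in
the same file.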
If GIZA++ is called (when GIZAOPT=""), all is OK. When mgiza is called (when
GIZAOPT="-mgiza ..."), mgiza fails with:
Executing: $HOME/mosesdecoder/training-tools/mgiza -CoocurrenceFile
$HOME/tm/train/giza.fr-en/fr-en.cooc -c
$HOME/tm/train/corpus/fr-en-int-train.snt -m1 5 -m2 0 -m3 3 -m4 3
-model1dumpfrequency 1 -model4smoothfactor 0.4 -ncpus 2 -nodumps 1 -nsmooth 4
-o $HOME/tm/train/giza.fr-en/fr-en -onlyaldumps 1 -p0 0.999 -s
$HOME/tm/train/corpus/en.vcb -t $HOME/tm/train/corpus/fr.vcb
Starting MGIZA
Initializing Global Paras
DEBUG: EnterERROR: Execution of: $HOME/mosesdecoder/training-tools/mgiza
-CoocurrenceFile $HOME/tm/train/giza.fr-en/fr-en.cooc -c
$HOME/tm/train/corpus/fr-en-int-train.snt -m1 5 -m2 0 -m3 3 -m4 3
-model1dumpfrequency 1 -model4smoothfactor 0.4 -ncpus 2 -nodumps 1 -nsmooth 4
-o $HOME/tm/train/giza.fr-en/fr-en -onlyaldumps 1 -p0 0.999 -s
$HOME/tm/train/corpus/en.vcb -t $HOME/tm/train/corpus/fr.vcb
died with signal 11, with coredump
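One possible way to narrow this down would be to re-run the failing mgiza call
on its own, outside train-model.perl, under a debugger. The following is only
a sketch under some assumptions: gdb is installed (nothing above says it is),
the paths are the ones from the log, and the files written by the earlier
training steps are still in place. A build of mgiza with debug symbols would
probably give a more readable backtrace.

```shell
#!/bin/sh
# Paths copied from the log above; adjust them to the local layout.
MGIZA="$HOME/mosesdecoder/training-tools/mgiza"
TM="$HOME/tm/train"

# The same arguments that train-model.perl passed to mgiza.
set -- -CoocurrenceFile "$TM/giza.fr-en/fr-en.cooc" \
    -c "$TM/corpus/fr-en-int-train.snt" \
    -m1 5 -m2 0 -m3 3 -m4 3 \
    -model1dumpfrequency 1 -model4smoothfactor 0.4 \
    -ncpus 2 -nodumps 1 -nsmooth 4 \
    -o "$TM/giza.fr-en/fr-en" -onlyaldumps 1 -p0 0.999 \
    -s "$TM/corpus/en.vcb" -t "$TM/corpus/fr.vcb"

if [ -x "$MGIZA" ] && command -v gdb >/dev/null 2>&1; then
    # Run until the crash, print a backtrace, then exit gdb.
    gdb -batch -ex run -ex bt --args "$MGIZA" "$@"
else
    # Assumption not met: do nothing rather than fail.
    echo "mgiza or gdb not found; nothing was run"
fi
```

The backtrace printed after the segmentation fault should at least show which
function mgiza was in when it died.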
GIZA++ on the other hand works as follows:
Executing: $HOME/mosesdecoder/training-tools/GIZA++ -CoocurrenceFile
$HOME/tm/train/giza.fr-en/fr-en.cooc -c
$HOME/tm/train/corpus/fr-en-int-train.snt -m1 5 -m2 0 -m3 3 -m4 3
-model1dumpfrequency 1 -model4smoothfactor 0.4 -nodumps 1 -nsmooth 4 -o
$HOME/tm/train/giza.fr-en/fr-en -onlyaldumps 1 -p0 0.999 -s
$HOME/tm/train/corpus/en.vcb -t $HOME/tm/train/corpus/fr.vcb
Reading vocabulary file from:$HOME/tm/train/corpus/en.vcb
Reading vocabulary file from:$HOME/tm/train/corpus/fr.vcb
1
2
...
What can I do to help determine where mgiza fails and get it up & running?
Sub-question: is it really worth running mgiza instead of GIZA++?
Best regards,
Matjaz
PS: I changed /home/... to $HOME in the above examples.