Thanks Barry. RE: -- I don't know if it makes any difference --

I had the same question, so I asked it. From Qin's answer and my tests, MGIZA's snt2cooc works just the same as GIZA's snt2cooc.out, so I see no reason to install both MGIZA and GIZA. Rather than make an invasive change to the train-model.perl script, I wrapped MGIZA's snt2cooc. The additional piping costs a little performance, but that's an acceptable trade-off for my requirement of fewer installed packages.

Yes, I also use the -mgiza-cpus and -mgiza switches in train-model.perl. But there were two other hurdles to fully removing (more accurately, never installing) GIZA++. One was the command line interface. The other was the dependency on a file named GIZA++ in BINDIR.

Tom

On Sun, 7 Nov 2010 10:33:36 +0000, Barry Haddow <bhad...@inf.ed.ac.uk> wrote:
> Hi Tom
>
> There's already an option to train-model.perl (-mgiza) to specify that
> mgiza should be used, and you can use -mgiza-cpus to specify the number
> of cpus. I just use snt2cooc.out from GIZA++ with mgiza - I don't know if
> it makes any difference. To use mgiza, you need the binary (mgizapp) and
> the merge_alignment.py script in the bin directory for moses.
>
> There's an option in ems to parallelise snt2cooc. Just add
>   run-giza-in-parts = n
> to the TRAINING section, with n being the number of parts.
>
> best regards
> Barry
>
> On Sunday 07 Nov 2010 05:45:52 supp...@precisiontranslationtools.com wrote:
>> Thanks Qin,
>>
>> That's good to know. By using the MGIZA++ versions, I don't need to
>> install and track the extra package.
>>
>> RE: --- "I don't think it can be used without changing Moses script" ---
>>
>> Almost true. I'm contributing the shell script below. It's a wrapper that
>> lets MGIZA++'s snt2cooc be called with the same command line interface as
>> GIZA++'s snt2cooc.out. It works fine with train-model.perl, and I think it
>> will help others. Please feel free to add it to the source distribution.
>>
>> To use Moses and MGIZA++ without installing GIZA++, update the
>> Moses/MGIZA++ install instructions:
>>
>> ----------------------------
>>
>> The MGIZA binary and the script merge_alignment.py need to be copied
>> into the binary directory that contains GIZA++ (BINDIR below). Also note
>> the name change for the binary.
>>
>>   cp bin/mgiza BINDIR/mgizapp
>>   cp bin/snt2cooc BINDIR
>>   cp bin/mkcls BINDIR
>>   cp scripts/merge_alignment.py BINDIR
>>   cp scripts/snt2cooc.out BINDIR
>>   ln -sfn BINDIR/mgizapp BINDIR/GIZA++
>>
>> MGIZA works with the training script train-model.perl. You indicate its
>> use (as opposed to regular GIZA++) with the switch -mgiza. The switch
>> -mgiza-cpus NUMBER lets you specify the number of CPUs.
>>
>> ----------------------------
>>
>> Note: the GIZA++ symbolic link just keeps several scripts from failing
>> when they look for GIZA++.
>>
>> Add the shell script below to the source code's scripts folder. It's a
>> wrapper around snt2cooc, and it must be placed in the same folder as
>> snt2cooc to work.
>>
>> ---------------
>> snt2cooc.out
>> ---------------
>>
>> #!/bin/bash
>> # GIZA++  snt2cooc.out vcb1 vcb2 snt12 < input > output
>> # MGIZA++ snt2cooc output vcb1 vcb2 snt12 > output
>>
>> # Usage text goes to STDERR so it never ends up in a redirected .cooc file.
>> usage() {
>>   echo "Usage: snt2cooc.out vcb1 vcb2 snt12" >&2
>>   echo "Converts GIZA++ snt-format into plain text." >&2
>>   exit 1
>> }
>>
>> if [ $# -ne 3 ] ; then usage ; fi
>> # Run MGIZA++'s snt2cooc (expected in this script's folder), then print
>> # its output file to STDOUT the way GIZA++'s snt2cooc.out does.
>> cat /dev/stdin | "${0%/*}/snt2cooc" /tmp/ "$1" "$2" "$3"
>> cat /tmp/snt2cooc.out
>> rm -f /tmp/snt2cooc.out
>> exit 0
>>
>> On Sat, 6 Nov 2010 08:17:07 -0400, Qin Gao wrote:
>>
>> Hi,
>>
>> The mkcls and snt2cooc utilities in mgiza++ are almost unchanged from
>> GIZA++. But snt2cooc's command line interface has changed: instead of
>> writing to STDOUT, it writes to a file. I don't think it can be used
>> without changing the Moses script; the change was mainly made to support
>> the Hadoop-based training system Chaski.
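Because MGIZA++'s snt2cooc writes to a file instead of STDOUT, a wrapper has to turn that file back into a stream. Below is one hedged way to do it, a sketch rather than the wrapper Tom posted, and not tested against MGIZA++. It assumes snt2cooc writes to exactly the path given as its first argument (as in the usage line "snt2cooc output vcb1 vcb2 snt12") and reads only the three files named after it, so nothing is piped in on STDIN. A private mktemp file replaces the fixed /tmp/snt2cooc.out, so two trainings running at once cannot clobber each other's output. The function name snt2cooc_out and the MGIZA_SNT2COOC override variable are hypothetical.

```bash
#!/bin/bash
# Hypothetical variant of the snt2cooc.out wrapper (a sketch, not part of
# the MGIZA++ or Moses distributions).
# Assumption: MGIZA++'s snt2cooc writes its result to exactly the path given
# as its first argument:  snt2cooc output vcb1 vcb2 snt12

# Presents MGIZA++'s snt2cooc with GIZA++'s interface:
#   snt2cooc.out vcb1 vcb2 snt12 > output
# MGIZA_SNT2COOC (hypothetical) overrides the binary path; by default the
# binary is expected in the same folder as this script.
snt2cooc_out() {
  if [ $# -ne 3 ]; then
    echo "Usage: snt2cooc.out vcb1 vcb2 snt12" >&2
    echo "Converts GIZA++ snt-format into plain text." >&2
    return 1
  fi
  local bin="${MGIZA_SNT2COOC:-$(dirname "$0")/snt2cooc}"
  local out
  # A private temp file, so parallel runs never share an output path.
  out=$(mktemp) || return 1
  if ! "$bin" "$out" "$1" "$2" "$3"; then
    rm -f "$out"
    return 1
  fi
  # Stream the result to STDOUT, as GIZA++'s snt2cooc.out does.
  cat "$out"
  rm -f "$out"
}
```

Saved as BINDIR/snt2cooc.out next to snt2cooc and made executable, the file would also need a final line snt2cooc_out "$@", so that a GIZA++-style call such as snt2cooc.out vcb1 vcb2 snt12 > output reaches the function.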
>>
>> So the short answer is that they are almost the same, but you don't need
>> to use them in Moses.
>>
>> BTW, snt2cooc can be easily parallelized: just split the corpus, run
>> snt2cooc on the different parts, merge the outputs with sort -m -n, and
>> uniq the final output.
>>
>> Best
>> --Q
>>
>> On Sat, Nov 6, 2010 at 3:20 AM, wrote:
>>
>> MGIZA++ and GIZA++ both build mkcls and snt2cooc (snt2cooc.out). Other
>> than the file sizes (below) and the different command line interfaces of
>> snt2cooc and snt2cooc.out, are there any other differences between the
>> pairs of utilities? For example, are the snt2cooc and mkcls utilities in
>> MGIZA++ also multi-threaded?
>>
>>   Utility        Package   Size (bytes)
>>   mkcls          giza-pp      230,610
>>   mkcls          mgizapp    1,271,447
>>   snt2cooc.out   giza-pp    1,511,969
>>   snt2cooc       mgizapp      202,773
>>
>> Thanks,
>> Tom
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support
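Qin's parallelization tip (split the corpus, run snt2cooc on each part, merge, uniq) can be sketched in bash as follows. This is an illustration under stated assumptions, not code from Moses, MGIZA++ or EMS. It assumes each sentence pair in the .snt file occupies exactly three lines (a count line, then the source and target token ids), so the corpus is split on three-line boundaries, and that the co-occurrence command, passed as the first argument, prints one "id1 id2" pair per line to STDOUT, as GIZA++'s snt2cooc.out or Tom's wrapper do. In place of sort -m -n followed by uniq, it uses a full numeric sort on both fields with -u, which does not depend on each part's output already being sorted. The function name parallel_cooc is hypothetical.

```bash
#!/bin/bash
# Sketch of Qin's tip: split the corpus, run a co-occurrence command on each
# part in parallel, then merge the outputs and drop duplicate pairs.
# Assumptions: 3 lines per sentence pair in the .snt file, and a command
# that prints "id1 id2" pairs to STDOUT (e.g. snt2cooc.out).

# parallel_cooc (hypothetical name) COOC_CMD VCB1 VCB2 SNT12 PARTS
parallel_cooc() {
  local cmd=$1 vcb1=$2 vcb2=$3 snt=$4 parts=$5
  [ -s "$snt" ] || { echo "empty or missing corpus: $snt" >&2; return 1; }
  (( parts >= 1 )) || { echo "PARTS must be >= 1" >&2; return 1; }
  local work
  work=$(mktemp -d) || return 1

  # Lines per part: ceil(pairs / parts) sentence pairs, 3 lines each.
  local total per
  total=$(wc -l < "$snt")
  per=$(( ((total / 3 + parts - 1) / parts) * 3 ))
  (( per > 0 )) || per=3
  split -l "$per" "$snt" "$work/part."

  # Start one background job per part; the glob is expanded once, before
  # any .cooc file exists, so only the split parts are processed.
  local p pid status=0
  local -a pids=()
  for p in "$work"/part.*; do
    "$cmd" "$vcb1" "$vcb2" "$p" > "$p.cooc" &
    pids+=("$!")
  done
  for pid in "${pids[@]}"; do
    wait "$pid" || status=1
  done

  # Merge: numeric sort on both fields; -u keeps one copy of each pair.
  if [ "$status" -eq 0 ]; then
    LC_ALL=C sort -k1,1n -k2,2n -u "$work"/part.*.cooc || status=1
  fi
  rm -rf "$work"
  return "$status"
}
```

For example, parallel_cooc BINDIR/snt2cooc.out e.vcb f.vcb e-f.snt 4 > e-f.cooc would use four parts; the vocabulary and corpus file names are placeholders.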