Thanks Barry.

RE: -- I don't know if it makes any difference --

I had the same question, so I asked it. From Qin's answer and my tests,
MGIZA's snt2cooc works just the same as GIZA's snt2cooc.out.

I see no reason to install both MGIZA and GIZA. So, rather than making an
invasive modification to the train-model.perl script, I wrapped MGIZA's
snt2cooc. There's a slight performance hit from the additional piping, but
it's an acceptable trade-off for my requirement of fewer installed
packages.

Yes, I also use the -mgiza-cpus and -mgiza switches in train-model.perl.
But there were two other hurdles to fully removing (more accurately, never
installing) GIZA++. One was the command line interface. The other was the
dependency on a file named GIZA++ in BINDIR.

Tom

On Sun, 7 Nov 2010 10:33:36 +0000, Barry Haddow <bhad...@inf.ed.ac.uk> wrote:
> Hi Tom
> 
> There's already an option to train-model.perl (-mgiza) to specify that
> mgiza should be used, and you can use -mgiza-cpus to specify the number
> of cpus. I just use snt2cooc.out from GIZA++ with mgiza; I don't know if
> it makes any difference. To use mgiza, you need the binary (mgizapp) and
> the merge_alignment.py script in the bin directory for moses.
>
> There's an option in ems to parallelise snt2cooc. Just add
> run-giza-in-parts = n
> to the TRAINING section, with n being the number of parts.
> 
> best regards
> Barry
> 
> On Sunday 07 Nov 2010 05:45:52 supp...@precisiontranslationtools.com wrote:
>> Thanks Qin,
>>
>> That's good to know. By using the MGIZA++ versions, I don't need to
>> install and track the extra package.
>>
>> RE: --- "I don't think it can be used without changing Moses script" ---
>>
>> Almost true. I'm contributing the shell script below. It's a wrapper to
>> use MGIZA++'s snt2cooc with the same command line interface as GIZA++'s
>> snt2cooc.out. It works fine with train-model.perl, and I think it will
>> help others. Please feel free to add it to the source distribution. To
>> use Moses and MGIZA++ without installing GIZA++, update the Moses/MGIZA++
>> install instructions:
>> 
>> ----------------------------
>> 
>> The MGIZA binary and the script merge_alignment.py need to be copied
>> into your binary directory, BINDIR, where Moses expects to find GIZA++
>> (also note the name change for the binary).
>> 
>>  cp bin/mgiza BINDIR/mgizapp
>>  cp bin/snt2cooc BINDIR
>>  cp bin/mkcls BINDIR
>>  cp scripts/merge_alignment.py BINDIR
>>  cp scripts/snt2cooc.out BINDIR
>>  ln -sfn BINDIR/mgizapp BINDIR/GIZA++
>> 
>> MGIZA works with the training script train-model.perl. You indicate its
>> use (as opposed to regular GIZA++) with the switch -mgiza. The switch
>> -mgiza-cpus NUMBER lets you specify the number of CPUs.
>> 
>> ----------------------------
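For readers who want to script the install steps above, here is a minimal sketch as a bash function. The function name install_mgiza_for_moses and its two parameters (the MGIZA++ source/build directory and BINDIR) are hypothetical, and it assumes the snt2cooc.out wrapper has already been added to the MGIZA++ scripts folder as proposed in this message. One deliberate difference from the ln line above: the symlink target is relative, so GIZA++ still resolves if BINDIR is given as a relative path or the directory is moved later.

```shell
#!/bin/bash
# Hypothetical helper: copy the MGIZA++ pieces Moses needs into BINDIR.
# Usage: install_mgiza_for_moses MGIZA_DIR BINDIR
# Assumes MGIZA_DIR/scripts/snt2cooc.out is the wrapper proposed in this thread.
install_mgiza_for_moses() {
  local src=$1 bindir=$2
  mkdir -p "$bindir" || return 1

  # Note the rename: MGIZA++'s mgiza binary is installed as mgizapp.
  cp "$src/bin/mgiza" "$bindir/mgizapp" &&
    cp "$src/bin/snt2cooc" "$src/bin/mkcls" "$bindir/" &&
    cp "$src/scripts/merge_alignment.py" "$src/scripts/snt2cooc.out" "$bindir/" ||
    return 1

  # The wrapper is executed directly, so it has to be executable.
  chmod +x "$bindir/snt2cooc.out" || return 1

  # Relative link target: GIZA++ points at the mgizapp sitting next to it,
  # wherever BINDIR lives, so scripts looking for GIZA++ find MGIZA++.
  ln -sfn mgizapp "$bindir/GIZA++"
}
```

For example, install_mgiza_for_moses ~/mgizapp ~/moses/bin would populate ~/moses/bin (both paths are placeholders).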
>> 
>> Note: the GIZA++ symbolic link just prevents several scripts from
>> failing when they look for GIZA++.
>>
>> Add this shell script to the scripts folder of the source code. It's a
>> wrapper around snt2cooc and must be placed in the same folder as
>> snt2cooc.
>> 
>> ---------------
>> snt2cooc.out
>> ---------------
>> 
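>> #!/bin/bash
>> # Wrapper that gives MGIZA++'s snt2cooc the command line interface of
>> # GIZA++'s snt2cooc.out:
>> #   GIZA++:  snt2cooc.out vcb1 vcb2 snt12 > output
>> #   MGIZA++: snt2cooc output vcb1 vcb2 snt12
>>
>> usage() {
>>   echo "Usage: snt2cooc.out vcb1 vcb2 snt12" >&2
>>   echo "Writes the word co-occurrence list for snt12 to stdout." >&2
>>   exit 1
>> }
>>
>> if [ $# -ne 3 ] ; then usage ; fi
>>
>> # Private temporary file, so concurrent runs (e.g. both alignment
>> # directions at once) cannot overwrite each other's output.
>> tmp=$(mktemp) || exit 1
>> trap 'rm -f "$tmp"' EXIT
>>
>> # MGIZA++'s snt2cooc must sit in the same directory as this wrapper.
>> "${0%/*}/snt2cooc" "$tmp" "$1" "$2" "$3" || exit 1
>> cat "$tmp"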
>> 
>> On Sat, 6 Nov 2010 08:17:07 -0400, Qin Gao wrote:
>> 
>> Hi,
>> 
>> The mkcls and snt2cooc utilities in MGIZA++ are almost unchanged from
>> GIZA++, but snt2cooc's command line interface has changed: instead of
>> writing to STDOUT, it writes to a file. I don't think it can be used
>> without changing Moses script, and the change is mainly to support the
>> Hadoop-based training system Chaski.
>> 
>> So the short answer is that they are almost the same, but you don't
>> need to use them in Moses.
>>
>> BTW, snt2cooc can easily be parallelized: just split the corpus, run
>> snt2cooc on the different parts, merge the outputs with sort -m -n, and
>> run uniq on the final output.
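A minimal bash sketch of the split/run/merge recipe above follows. The function name parallel_snt2cooc and the SNT2COOC variable are hypothetical. It assumes the GIZA++ interface (snt2cooc.out vcb1 vcb2 snt12 > output), that every sentence pair in an .snt file takes exactly three lines (so slices are cut on multiples of three lines), and that each slice's output is a list of "i j" pairs sorted numerically on both columns. Under that last assumption the merge names both columns as numeric keys (-k1,1n -k2,2n): a bare sort -m -n compares only the leading number and breaks ties by comparing whole lines as text, so "1 10" can be placed before "1 5" and duplicate pairs from different slices may not end up adjacent, where uniq cannot remove them.

```shell
#!/bin/bash
# Sketch: run a GIZA-style snt2cooc on N slices of an .snt corpus in
# parallel, then merge the per-slice outputs.
# Assumptions (not taken from Moses or MGIZA++ documentation):
#  - each sentence pair in the .snt file occupies exactly 3 lines;
#  - each slice's output is "i j" pairs sorted numerically on both columns;
#  - SNT2COOC (hypothetical) names a tool with the interface
#    "snt2cooc.out vcb1 vcb2 snt12 > output".
parallel_snt2cooc() {
  local vcb1=$1 vcb2=$2 snt=$3 parts=$4
  local tool=${SNT2COOC:-snt2cooc.out}
  local tmp lines per p status=0
  tmp=$(mktemp -d) || return 1

  # Slice on sentence-pair boundaries: slice length is a multiple of 3 lines.
  lines=$(wc -l < "$snt")
  per=$(( (lines / 3 + parts - 1) / parts ))
  [ "$per" -gt 0 ] || per=1
  split -l $(( per * 3 )) "$snt" "$tmp/part."

  # One background job per slice; a failing job leaves a marker file.
  for p in "$tmp"/part.*; do
    ( "$tool" "$vcb1" "$vcb2" "$p" > "$p.cooc" || touch "$tmp/failed" ) &
  done
  wait

  if [ -e "$tmp/failed" ]; then
    status=1
  else
    # Merge the pre-sorted slices on both numeric columns, then drop
    # pairs that occurred in more than one slice.
    sort -m -k1,1n -k2,2n "$tmp"/part.*.cooc | uniq
  fi
  rm -rf "$tmp"
  return "$status"
}
```

For example, SNT2COOC=BINDIR/snt2cooc.out parallel_snt2cooc src.vcb trg.vcb src_trg.snt 4 > src_trg.cooc (file names are placeholders). Note that the slices run concurrently, so the tool must not write to a fixed shared path.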
>> 
>> Best
>> --Q
>> 
>>  On Sat, Nov 6, 2010 at 3:20 AM,  wrote:
>> 
>> MGIZA++ and GIZA++ both build mkcls and snt2cooc (snt2cooc.out). Other
>> than the file sizes (below) and the command line interfaces of snt2cooc
>> and snt2cooc.out, are there any other differences between the pairs of
>> utilities? For example, are the snt2cooc and mkcls utilities in MGIZA++
>> also multi-threaded?
>> 
>> mkcls
>>   giza-pp    230,610 bytes
>>   mgizapp  1,271,447 bytes
>>
>> snt2cooc.out
>>   giza-pp  1,511,969 bytes
>> snt2cooc
>>   mgizapp    202,773 bytes
>> 
>> Thanks,
>> Tom
>> _______________________________________________
>> Moses-support mailing list
>> Moses-support@mit.edu
>> http://mailman.mit.edu/mailman/listinfo/moses-support