Hi,

The mkcls and snt2cooc utilities in MGIZA++ are almost unchanged from GIZA++.
The one difference is snt2cooc's command-line interface: instead of writing
to STDOUT, it writes to a file.  I don't think it can be used without changing
the Moses scripts; the change was made mainly to support Chaski, the
Hadoop-based training system.

So the short answer is that they are almost the same, but you don't need to
use them in Moses.

BTW, snt2cooc can easily be parallelized: just split the corpus, run
snt2cooc on the different parts, merge the outputs with sort -m -n, and run
uniq on the final output.
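
As a rough illustration of the merge-and-uniq step, here is a minimal Python
sketch. It assumes each partial output is already sorted and holds one pair
of integer word IDs per line ("source target"); that layout, the sample data,
and the helper names parse_pair and merge_cooc are assumptions made for
illustration, not part of MGIZA++ or Moses.

```python
import heapq


def parse_pair(line):
    # Assumed line format: two whitespace-separated integer word IDs.
    src, tgt = line.split()
    return int(src), int(tgt)


def merge_cooc(parts):
    """Merge already-sorted partial outputs and drop duplicate pairs.

    Plays the role of `sort -m -n` followed by `uniq`: heapq.merge does a
    streaming merge of sorted inputs, and the comparison with the previous
    pair removes adjacent duplicates.
    """
    previous = None
    for pair in heapq.merge(*(map(parse_pair, part) for part in parts)):
        if pair != previous:
            yield pair
            previous = pair


# Hypothetical outputs from running snt2cooc on two halves of a corpus.
part_a = ["0 3", "1 2", "1 5"]
part_b = ["0 3", "1 4", "2 2"]

merged = list(merge_cooc([part_a, part_b]))
for src, tgt in merged:
    print(src, tgt)
```

Because the merge is streaming, the partial outputs never have to fit in
memory together; in practice each part would be an open file rather than a
list.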

Best
--Q


On Sat, Nov 6, 2010 at 3:20 AM, <supp...@precisiontranslationtools.com> wrote:

> MGIZA++ and GIZA++ build mkcls and snt2cooc (snt2cooc.out). Other than the
> file sizes (below), and command line interfaces between snt2cooc and
> snt2cooc.out, are there any other differences between the pairs of
> utilities? For example, are the snt2cooc and mkcls utilities in MGIZA++ also
> multi-threaded?
>
> mkcls
>     giza-pp         230,610 bytes
>     mgizapp    1,271,447 bytes
>
> snt2cooc.out
>     giza-pp    1,511,969 bytes
> snt2cooc
>     mgizapp     202,773 bytes
>
> Thanks,
> Tom
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>
