Hi,

The mkcls and snt2cooc utilities in MGIZA++ are almost unchanged from GIZA++. The one difference is snt2cooc's command-line interface: instead of writing to STDOUT, it writes to a file. I don't think it can be used without changing the Moses script; the change is mainly to support Chaski, the Hadoop-based training system.
So the short answer is: they are almost the same, but you don't need to use them in Moses.

BTW, snt2cooc can easily be parallelized: just split the corpus, run snt2cooc on the different parts, merge the outputs with sort -m -n, and uniq the final output.

Best,
--Q

On Sat, Nov 6, 2010 at 3:20 AM, <supp...@precisiontranslationtools.com> wrote:
> MGIZA++ and GIZA++ build mkcls and snt2cooc (snt2cooc.out). Other than the
> file sizes (below), and command line interfaces between snt2cooc and
> snt2cooc.out, are there any other differences between the pairs of
> utilities? For example, are the snt2cooc and mkcls utilities in MGIZA++ also
> multi-threaded?
>
> mkcls
> giza-pp 230,610 bytes
> mgizapp 1,271,447 bytes
>
> snt2cooc.out
> giza-pp 1,511,969 bytes
> snt2cooc
> mgizapp 202,773 bytes
>
> Thanks,
> Tom
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
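For anyone who wants to try the merge step of the parallelization described in the reply, here is a minimal Python sketch of what "sort -m -n, then uniq" does. It rests on assumptions not stated in the thread: that each partial snt2cooc output holds one "source_id target_id" pair per line, and that each part is already sorted numerically on both fields. The names parse_pair and merge_cooc are made up for illustration and are not part of GIZA++ or MGIZA++. Note that it compares both fields as integers, which may differ slightly from how sort -n breaks ties.

```python
import heapq
from itertools import groupby


def parse_pair(line):
    """Turn a 'source_id target_id' line into a tuple of ints (assumed format)."""
    src, tgt = line.split()
    return int(src), int(tgt)


def merge_cooc(parts):
    """Merge already-sorted iterables of pair lines and drop duplicate pairs.

    Each element of `parts` can be an open file or a list of lines; the
    inputs are streamed, so nothing is loaded fully into memory.
    """
    streams = (
        (parse_pair(line) for line in part if line.strip())
        for part in parts
    )
    merged = heapq.merge(*streams)      # the 'sort -m' step
    for pair, _ in groupby(merged):     # the 'uniq' step
        yield "%d %d" % pair


# Two hypothetical partial outputs, each sorted on its own.
part_a = ["0 3\n", "1 2\n", "1 5\n"]
part_b = ["0 3\n", "1 4\n", "2 2\n"]
result = list(merge_cooc([part_a, part_b]))
```

With real data, the parts would be open file handles over the per-chunk outputs and the yielded lines would be written to the final co-occurrence file.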