Hi,

GIZA++ can run more memory-efficiently when it knows in advance which words can possibly align to which other words (because they occur in the same sentence pair). Hence, snt2cooc collects a list of word pairs that may co-occur, and it is run before GIZA++.
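To illustrate the idea, here is a rough Python sketch. It is not the actual snt2cooc code: the function name and the in-memory input (a list of sentence pairs, each a pair of word-id lists) are made up for illustration.

```python
def collect_cooccurrences(sentence_pairs):
    """Return the set of (source_id, target_id) pairs that appear
    together in at least one sentence pair.

    Hypothetical helper for illustration only; the real snt2cooc
    works on GIZA++ corpus files, not on Python lists.
    """
    cooc = set()
    for source_ids, target_ids in sentence_pairs:
        # Deduplicate within the sentence so repeated words are handled once.
        for s in set(source_ids):
            for t in set(target_ids):
                cooc.add((s, t))
    return cooc


# Two toy sentence pairs, with words already mapped to integer ids.
pairs = [
    ([1, 2], [10, 11]),
    ([2, 3], [11]),
]
cooc = collect_cooccurrences(pairs)
print(sorted(cooc))
```

Only pairs in this set can ever be aligned, so nothing needs to be stored for a pair like (3, 10), which never shares a sentence pair.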
Since snt2cooc can itself run into memory problems (2GB limit on 32-bit machines), experiment.perl includes an option to break up the corpus and run it in parts. This setting is called "run-giza-in-parts", which is slightly misleading (it's snt2cooc that's run in parts, not GIZA++).

-phi

On Sat, Apr 7, 2012 at 2:50 AM, Fong Po Po <fongpui...@yahoo.com.hk> wrote:
> Dear all:
> What is snt2cooc.out used to do in training of Moses?
> Thanks!
> Best Regards,
>
> Fong Pui Chi
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support