Can snt2cooc be made more memory-efficient?

For a start, it seems to be designed to compute a count for each (source, target) word pair, but the count is never actually used.

It seems a bit silly that the simple initializer for GIZA++ takes up way more memory than GIZA++ itself.
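
Purely as a sketch of what I mean (names are made up, this is not the actual snt2cooc code): if the per-pair count is never read, a plain set of co-occurring word ids would do the same job with a smaller footprint than a count map, e.g.:

// illustrative only -- not the real snt2cooc internals
#include <map>
#include <set>
#include <utility>

std::map<std::pair<int,int>, int> coocCount; // count per (source, target) pair
std::set<std::pair<int,int> >     coocSeen;  // presence only

void observe(int srcId, int tgtId) {
    // coocCount[std::make_pair(srcId, tgtId)]++;  // count that nothing ever reads
    coocSeen.insert(std::make_pair(srcId, tgtId)); // enough if only the pair list is needed
}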

On 08/04/2012 04:21, Philipp Koehn wrote:
Hi,

GIZA++ can run more memory-efficiently when it knows in advance
which words can possibly align to which other words (because they
occur in the same sentence pair). Hence, snt2cooc collects the list
of word pairs that may co-occur prior to running GIZA++.
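
Conceptually, it boils down to something like this (a simplified sketch, not the actual snt2cooc source; the input here is plain alternating source/target lines of word ids, whereas the real .snt format also carries a count line):

// simplified sketch of the co-occurrence collection step
#include <iostream>
#include <set>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

int main() {
    std::set<std::pair<int,int> > cooc; // which (source, target) ids ever share a sentence pair
    std::string srcLine, tgtLine;
    while (std::getline(std::cin, srcLine) && std::getline(std::cin, tgtLine)) {
        std::istringstream src(srcLine), tgt(tgtLine);
        std::vector<int> srcIds, tgtIds;
        int id;
        while (src >> id) srcIds.push_back(id);
        while (tgt >> id) tgtIds.push_back(id);
        for (size_t i = 0; i < srcIds.size(); ++i)
            for (size_t j = 0; j < tgtIds.size(); ++j)
                cooc.insert(std::make_pair(srcIds[i], tgtIds[j]));
    }
    // GIZA++ only needs to know which pairs exist at all, so each pair is printed once
    for (std::set<std::pair<int,int> >::const_iterator it = cooc.begin(); it != cooc.end(); ++it)
        std::cout << it->first << " " << it->second << "\n";
    return 0;
}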

Since snt2cooc can also run into memory problems (2GB limit on
32-bit machines), experiment.perl includes the option to break
up the corpus and run it in parts. This setting is called "run-giza-in-parts",
which is slightly misleading (it's snt2cooc that is run in parts, not
GIZA++).
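
To make the in-parts idea concrete (again just a sketch, not what experiment.perl actually runs): if each part's co-occurrence list is written out sorted, the per-part lists can be combined with a streaming merge that holds only one line per part in memory:

// sketch: merge sorted per-part cooccurrence files, dropping duplicate pairs
#include <fstream>
#include <functional>
#include <iostream>
#include <queue>
#include <string>
#include <utility>
#include <vector>

int main(int argc, char** argv) {
    // argv[1..] are per-part files, one "srcId tgtId" line each, sorted lexicographically
    typedef std::pair<std::string, size_t> Item; // (line, index of the file it came from)
    std::vector<std::ifstream*> parts;
    std::priority_queue<Item, std::vector<Item>, std::greater<Item> > heap;
    for (int i = 1; i < argc; ++i) {
        parts.push_back(new std::ifstream(argv[i]));
        std::string line;
        if (std::getline(*parts.back(), line))
            heap.push(Item(line, parts.size() - 1));
    }
    std::string last;
    bool first = true;
    while (!heap.empty()) {
        Item top = heap.top();
        heap.pop();
        if (first || top.first != last) { // skip pairs already emitted by another part
            std::cout << top.first << "\n";
            last = top.first;
            first = false;
        }
        std::string line;
        if (std::getline(*parts[top.second], line))
            heap.push(Item(line, top.second));
    }
    for (size_t i = 0; i < parts.size(); ++i)
        delete parts[i];
    return 0;
}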

-phi

On Sat, Apr 7, 2012 at 2:50 AM, Fong Po Po <fongpui...@yahoo.com.hk> wrote:

    Dear all:
        What is snt2cooc.out used for in the training of Moses?
        Thanks!
    Best Regards,
    Fong Pui Chi




_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
