Dear Moses,

The PYTHONIOENCODING environment variable sets the encoding Python uses for stdin, stdout, and stderr, because Python is too lame to default to utf-8. When it is not set, and a particular flavor of python is installed, merge_alignment.py from mgiza fails. A snippet of the resulting train-model.perl output is below:

Executing: /external-bin/merge_alignment.py /experiment/giza.f-e/f-e.A3.final.part*>/experiment/giza.f-e/f-e.A3.final
Traceback (most recent call last):
  File "/external-bin/merge_alignment.py", line 57, in <module>
    sys.stdout.write("%s%s%s"%(sents[i][0],sents[i][1],sents[i][2]));
UnicodeEncodeError: 'ascii' codec can't encode character '\ufeff' in position 84: ordinal not in range(128)
Exit code: 1
Executing: rm -f /experiment/giza.f-e/f-e.A3.final.gz
Executing: gzip /experiment/giza.f-e/f-e.A3.final
Waiting for second GIZA process...

However, when run with -last-step 2 (so it stops after GIZA), the train-model.perl script still returns 0. So I think there are two bugs to fix:

1. Set (or at least document) PYTHONIOENCODING=utf-8 in the environment. Google doesn't see it anywhere on statmt.org.
2. train-model.perl should return non-zero on failure of a command like this.

Kenneth

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
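
P.S. In case it helps whoever picks this up, here is a minimal Python sketch of both points. It is not the mgiza or train-model.perl code. The child command is a hypothetical stand-in for the failing write in merge_alignment.py, and run_child and exit_status are names made up for illustration. The failing configuration is simulated by forcing PYTHONIOENCODING=ascii, since whether an unset variable falls back to ASCII depends on the Python version and locale.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for merge_alignment.py: write a string that
# contains U+FEFF (the character from the traceback) to stdout.
CHILD_CODE = r"import sys; sys.stdout.write('\ufeff' + 'ok\n')"


def run_child(io_encoding):
    """Run a child Python with PYTHONIOENCODING forced to io_encoding."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    return subprocess.run(
        [sys.executable, "-c", CHILD_CODE],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )


def exit_status(results):
    """Return the first non-zero exit code among results, or 0 if all passed.

    A wrapper script would pass this value to sys.exit() so that a failed
    step is visible to the caller instead of being silently ignored.
    """
    for result in results:
        if result.returncode != 0:
            return result.returncode
    return 0


# Simulates the failing setup: an ASCII stdout cannot encode U+FEFF.
ascii_run = run_child("ascii")
# Point 1: with PYTHONIOENCODING=utf-8 the same write succeeds.
utf8_run = run_child("utf-8")

print("ascii child exit code:", ascii_run.returncode)
print("utf-8 child exit code:", utf8_run.returncode)
# Point 2: the overall status reflects the failed step.
print("overall status:", exit_status([ascii_run, utf8_run]))
```

The same idea would apply on the Perl side by checking the return value of each executed command and exiting non-zero when one fails, but I have not looked at how train-model.perl runs its commands, so treat that as an assumption.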