Dear Moses,

        The PYTHONIOENCODING environment variable sets the encoding Python
uses for stdin, stdout and stderr, because Python is too lame to default to
utf-8.  When it is not set, and a particular flavor of python is installed,
merge_alignment.py from mgiza fails.  A snippet of the resulting
train-model.perl output is below:

Executing: /external-bin/merge_alignment.py
/experiment/giza.f-e/f-e.A3.final.part*>/experiment/giza.f-e/f-e.A3.final
Traceback (most recent call last):
  File "/external-bin/merge_alignment.py", line 57, in <module>
    sys.stdout.write("%s%s%s"%(sents[i][0],sents[i][1],sents[i][2]));
UnicodeEncodeError: 'ascii' codec can't encode character '\ufeff' in
position 84: ordinal not in range(128)
Exit code: 1
Executing: rm -f /experiment/giza.f-e/f-e.A3.final.gz
Executing: gzip /experiment/giza.f-e/f-e.A3.final
Waiting for second GIZA process...
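
The encoding failure can be reproduced without mgiza.  What follows is a
minimal sketch, assuming Python 3: it pushes a string containing U+FEFF
through a text stream encoded as ASCII and then through one encoded as
UTF-8.  The helper name write_with_encoding and the sample line are made up
for illustration and are not taken from merge_alignment.py.

```python
import io


def write_with_encoding(text, encoding):
    """Return the bytes a text stream with `encoding` would emit for `text`.

    Hypothetical helper for illustration; not part of merge_alignment.py.
    """
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding)
    stream.write(text)   # raises UnicodeEncodeError if `encoding` cannot represent text
    stream.flush()
    data = raw.getvalue()
    stream.detach()      # leave `raw` alone when the wrapper is discarded
    return data


line = "\ufeffhello\n"  # made-up sample line starting with U+FEFF

try:
    write_with_encoding(line, "ascii")
    ascii_ok = True
except UnicodeEncodeError:
    # Same error class as in the traceback above.
    ascii_ok = False

utf8_bytes = write_with_encoding(line, "utf-8")
```

With stdout encoded as ASCII the write fails, and with UTF-8 it succeeds.
On Python 3.7 or later, a script could also call
sys.stdout.reconfigure(encoding="utf-8") before its first write; that is a
suggestion only, not something the script does today.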

However, when run with -last-step 2 (so that it stops after GIZA), the
train-model.perl script still returns 0.  So I think there are two bugs:

1. Set (or at least document) PYTHONIOENCODING=utf-8 in the environment.
Google finds no mention of it anywhere on statmt.org.

2. train-model.perl should return non-zero on failure of a command like
this.
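
Both points can be sketched together.  The following is an illustration in
Python, not the actual train-model.perl code (which is Perl); the names
run_checked and run_steps are hypothetical.  It assumes a POSIX shell, sets
PYTHONIOENCODING=utf-8 for every child command, and stops at the first
command that fails so the caller can exit with that status.

```python
import os
import subprocess
import sys


def run_checked(command):
    """Run a shell command and return its exit code.

    Hypothetical helper, not taken from train-model.perl.
    PYTHONIOENCODING=utf-8 is forced in the child's environment so that
    any Python script it launches encodes stdout as UTF-8.
    """
    env = dict(os.environ)
    env["PYTHONIOENCODING"] = "utf-8"
    print("Executing: " + command, file=sys.stderr)
    result = subprocess.run(command, shell=True, env=env)
    if result.returncode != 0:
        print("Exit code: %d" % result.returncode, file=sys.stderr)
    return result.returncode


def run_steps(commands):
    """Run commands in order; return the first non-zero exit code, else 0."""
    for command in commands:
        code = run_checked(command)
        if code != 0:
            return code  # do not carry on to rm/gzip after a failure
    return 0
```

A driver script would then end with something like
sys.exit(run_steps(commands)), so a failed merge step makes the whole run
return non-zero.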

Kenneth
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
