Ah. I've rolled back Ken's change 'cos I need it to work with Python 2.7. Instead, I've set the PYTHONIOENCODING env variable in train-model.perl just before the call to merge-alignment.py. That should patch Ken's problem for now.
https://github.com/moses-smt/mosesdecoder/commit/acd3ac964a7df646e15e3c4210853e7b70bebcbf

But the better way is to add Rico's code to all the Python scripts.

On 14 November 2014 13:20, Rico Sennrich <rico.sennr...@gmx.ch> wrote:

> Hieu Hoang <Hieu.Hoang@...> writes:
> >
> > Ken - should we add encoding on open to all python scripts, rather than
> > set the PYTHONIOENCODING env variable? That's basically what happens with
> > the perl scripts.
> >
> > What python/Linux version are you using? I don't see it on my version
> > (Python 2.7.3, Ubuntu 12.04)
>
> Hi all,
>
> It's kinda tricky to have consistent encoding between Python 2.X and Python
> 3. The patch to merge_alignment.py will fail under 2.X. I suggest to use
> io.open instead, which works with all versions from 2.6 up. And if any
> string processing is done, I suggest using 'from __future__ import
> unicode_literals' to ensure that all string literals are interpreted as
> unicode, and making sure that all input/output is UTF-8 (including
> stdin/stdout/stderr). I usually do this with the following code block:
>
> import sys
> import codecs
> if sys.version_info < (3,0,0):
>     sys.stdin = codecs.getreader('UTF-8')(sys.stdin)
>     sys.stdout = codecs.getwriter('UTF-8')(sys.stdout)
>     sys.stderr = codecs.getwriter('UTF-8')(sys.stderr)
>
> best,
> Rico
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support

--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
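
P.S. For anyone patching a script along the lines Rico suggests, here is a minimal sketch of the io.open part. It is only an illustration, not the actual change to merge_alignment.py: the helper names read_lines and write_lines and the sample file are made up for the example. The idea is that io.open takes an explicit encoding on both Python 2.6+ and Python 3, so the script no longer depends on PYTHONIOENCODING or the locale for file I/O.

```python
from __future__ import unicode_literals

import io
import os
import tempfile


def read_lines(path):
    """Hypothetical helper: read a UTF-8 text file as a list of unicode lines."""
    # io.open accepts encoding= on Python 2.6+; on Python 3 it is the built-in open.
    with io.open(path, "r", encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def write_lines(path, lines):
    """Hypothetical helper: write unicode lines to a UTF-8 text file."""
    with io.open(path, "w", encoding="utf-8") as handle:
        for line in lines:
            # With unicode_literals, "\n" is unicode on Python 2 as well,
            # which is what a text-mode io.open handle expects.
            handle.write(line + "\n")


# Round trip through a temporary file (the path is made up for the example).
tmp_dir = tempfile.mkdtemp()
sample_path = os.path.join(tmp_dir, "sample.txt")
write_lines(sample_path, ["caf\u00e9 na\u00efve", "0-0 1-2"])
round_trip = read_lines(sample_path)
```

This only covers files opened by the script itself; stdin/stdout/stderr would still need the codecs wrapping from Rico's block on Python 2.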