New submission from Curtis Doty:

I first stumbled across this bug while attempting to install using pip's cool editable mode:
$ pip install -e git+git://github.com/appliedsec/pygeoip.git#egg=pygeoip
Obtaining pygeoip from git+git://github.com/appliedsec/pygeoip.git#egg=pygeoip
  Cloning git://github.com/appliedsec/pygeoip.git to ./src/pygeoip
  Running setup.py egg_info for package pygeoip
    Traceback (most recent call last):
      File "<string>", line 16, in <module>
      File "/home/curtis/python/3.3.3/lib/python3.3/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1098: ordinal not in range(128)
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 16, in <module>
      File "/home/curtis/python/3.3.3/lib/python3.3/encodings/ascii.py", line 26, in decode
        return codecs.ascii_decode(input, self.errors)[0]
    UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 1098: ordinal not in range(128)
----------------------------------------
Cleaning up...
Command python setup.py egg_info failed with error code 1 in /home/curtis/python/2013-11-20/src/pygeoip
Storing complete log in /home/curtis/.pip/pip.log

It turns out this is related to a local LANG=C environment. If I set LANG=en_US.UTF-8, the problem goes away. But it seems pip/python3 open() should handle this more intelligently. Worse, the file in this case, https://github.com/appliedsec/pygeoip/blob/master/setup.py, already has a source encoding declaration *declaring* it as utf-8.
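The LANG dependence described above can be reproduced without pip or the network. The following is a minimal sketch, assuming that open() without an encoding= argument falls back to locale.getpreferredencoding(False), and that under LANG=C on the Python 3.3 in this report that value is an ASCII alias. The file contents and the helper names (default_text_encoding, read_as) are hypothetical stand-ins, not pygeoip's real setup.py.

```python
import locale
import os
import tempfile

# Hypothetical stand-in for setup.py: a coding declaration plus one
# non-ASCII character (U+00E9 is encoded as bytes 0xc3 0xa9 in UTF-8).
SOURCE = "# -*- coding: utf-8 -*-\nauthor = 'Jos\u00e9'\n"


def default_text_encoding():
    """Return the locale-derived encoding used when open() gets no encoding=."""
    return locale.getpreferredencoding(False)


def read_as(path, encoding):
    """Read a text file with an explicit encoding; return the text or the error."""
    try:
        with open(path, encoding=encoding) as handle:
            return handle.read()
    except UnicodeDecodeError as exc:
        return exc


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "setup.py")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(SOURCE)
    # Assumption: 'ascii' mimics what LANG=C selected in the report.
    ascii_result = read_as(path, "ascii")
    # Assumption: 'utf-8' mimics what LANG=en_US.UTF-8 selects.
    utf8_result = read_as(path, "utf-8")

print("open() default on this system:", default_text_encoding())
print("read as ascii:", repr(ascii_result))
print("read as utf-8 matches source:", utf8_result == SOURCE)
```

The coding declaration on the first line plays no part here: open() is a general text-file API and does not inspect the file's contents to pick a codec.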
Ugly workaround patch is to force pip to always use UTF-8 encoding on setup.py:

--- pip.orig/req.py 2013-11-19 15:53:49.000000000 -0800
+++ pip/req.py 2013-11-20 16:37:23.642656132 -0800
@@ -281,7 +281,7 @@ def replacement_run(self):
             writer(self, ep.name, os.path.join(self.egg_info,ep.name))
     self.find_sources()
 egg_info.egg_info.run = replacement_run
-exec(compile(open(__file__).read().replace('\\r\\n', '\\n'), __file__, 'exec'))
+exec(compile(open(__file__,encoding='utf_8').read().replace('\\r\\n', '\\n'), __file__, 'exec'))
 """
 
     def egg_info_data(self, filename):
@@ -687,7 +687,7 @@ exec(compile(open(__file__).read().repla
             ## FIXME: should we do --install-headers here too?
             call_subprocess(
                 [sys.executable, '-c',
-                 "import setuptools; __file__=%r; exec(compile(open(__file__).read().replace('\\r\\n', '\\n'), __file__, 'exec'))" % self.setup_py]
+                 "import setuptools; __file__=%r; exec(compile(open(__file__,encoding='utf_8').read().replace('\\r\\n', '\\n'), __file__, 'exec'))" % self.setup_py]
                 + list(global_options) + ['develop', '--no-deps'] + list(install_options),
                 cwd=self.source_dir, filter_stdout=self._filter_install,

But that only treats the symptom. The root cause appears to be in python3, as demonstrated by this simple script:

wrong-codec.py:

#!/usr/bin/env python3
from urllib.request import urlretrieve

urlretrieve('https://raw.github.com/appliedsec/pygeoip/master/setup.py', filename='setup.py')
# if LANG=C then locale.py:getpreferredencoding()->'ANSI_X3.4-1968'
foo = open('setup.py')
# bang! ascii_decode() cannot handle the unicode
bar = foo.read()

This does not occur in python2. Is this a bug in pip or python3?
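For comparison with the hard-coded encoding='utf_8' in the patch, Python source can also be read using whatever encoding the file itself declares. This is a minimal sketch, assuming the standard library's tokenize.open() and tokenize.detect_encoding() (available since Python 3.2). It is not a proposed fix for pip, and the file contents and helper names (declared_encoding, read_source) are hypothetical.

```python
import os
import tempfile
import tokenize

# Hypothetical stand-in for setup.py: a coding declaration plus one
# non-ASCII character, written out as UTF-8 bytes.
SOURCE = "# -*- coding: utf-8 -*-\nauthor = 'Jos\u00e9'\n"


def declared_encoding(path):
    """Return the encoding named by the file's BOM or coding declaration."""
    with open(path, "rb") as handle:
        encoding, _lines_read = tokenize.detect_encoding(handle.readline)
    return encoding


def read_source(path):
    """Read Python source text using the encoding the file declares."""
    # tokenize.open() detects the encoding itself, so the result does not
    # depend on LANG or on locale.getpreferredencoding().
    with tokenize.open(path) as handle:
        return handle.read()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "setup.py")
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(SOURCE)
    encoding = declared_encoding(path)
    text = read_source(path)

print("declared encoding:", encoding)
print("round trip ok:", text == SOURCE)
```

Unlike a fixed encoding='utf_8', this approach would also cope with a setup.py that declares some other codec, at the cost of only being meaningful for Python source files.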
----------
components: Unicode
messages: 203673
nosy: GreenKey, ezio.melotti, haypo
priority: normal
severity: normal
status: open
title: open() fails to autodetect utf-8 if LANG=C
type: crash
versions: Python 3.3

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue19685>
_______________________________________