New submission from Sridhar Ratnakumar <sridh...@activestate.com>:

It'd be nice to have an API that returns the encoding declared by a specific Python source file. 'print' uses sys.stdout.encoding, and that attribute is None when the process's stdout is a pipe (as it is when the script is run by subprocess). In that situation, knowing the source encoding is necessary for decoding the output the script generates.
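For comparison, Python 3's tokenize module already applies the PEP 263 rules in tokenize.detect_encoding(), which takes a readline callable returning bytes. It is not available in Python 2.x. Below is a minimal sketch of using it to find the declared encoding of a script's source bytes. The helper source_encoding() and the setup_py bytes are hypothetical stand-ins, not the actual python-wifi file:

```python
import io
import tokenize


def source_encoding(source_bytes):
    """Return the PEP 263 encoding of ``source_bytes`` (hypothetical helper)."""
    # detect_encoding() reads at most two lines and honours a coding
    # cookie on line 1 or 2 as well as a UTF-8 BOM; it returns the
    # encoding name plus the raw lines it consumed.
    encoding, _consumed = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


# Hypothetical stand-in for a setup.py that declares latin1 on its first line.
setup_py = b"# -*- coding: latin1 -*-\nauthor = 'Ren\xe9'\n"

enc = source_encoding(setup_py)
print(enc)
print(b"Ren\xe9".decode(enc))  # René
```

When no cookie or BOM is present, detect_encoding() falls back to 'utf-8', the Python 3 default, rather than the 'ascii' default that Python 2 uses.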
e.g.: Run 'python setup.py --author' in the python-wifi-0.3.1 source package through a subprocess.Popen(...) call and print the stdout.read() string. You'll get an encoding error unless you use stdout.read().decode('latin1'), where latin1 is the encoding declared on the coding: line in setup.py.

The following function tries to detect the encoding. This guesswork would be unnecessary if such a function were part of the standard library, with an implementation that maps directly onto PEP 263:

    import re

    # PEP 263: a coding declaration must be a comment on line 1 or 2
    CODING_RE = re.compile(r'^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)')

    def get_python_source_encoding(python_file):
        """Detect the encoding used in the file ``python_file``.

        Detection is done as per http://www.python.org/dev/peps/pep-0263/
        """
        # read only the first two lines, as raw bytes
        with open(python_file, 'rb') as f:
            first_two_lines = [f.readline(), f.readline()]

        for line in first_two_lines:
            # latin-1 maps every byte, so this decode cannot fail
            m = CODING_RE.match(line.decode('latin-1'))
            if m:
                return m.group(1)

        # if no encoding is declared, use the default source encoding
        # ('ascii' in Python 2; Python 3 defaults to 'utf-8')
        return 'ascii'

ref: subprocess encoding mess: http://bugs.python.org/issue6135

----------
components: Interpreter Core, Library (Lib)
messages: 89097
nosy: lemburg, loewis, srid
severity: normal
status: open
title: API to get source encoding as defined by PEP 263
type: feature request
versions: Python 2.7, Python 3.1, Python 3.2

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue6240>
_______________________________________