[issue1278] imp.find_module() ignores -*- coding: Latin-1 -*-
Changes by Guido van Rossum:

----------
resolution:  -> fixed
status: open -> closed

__ Tracker [EMAIL PROTECTED] http://bugs.python.org/issue1278 __ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue1278] imp.find_module() ignores -*- coding: Latin-1 -*-
Christian Heimes added the comment:

The bug was fixed in r58553 together with http://bugs.python.org/issue1267. Please close this bug.
[issue1278] imp.find_module() ignores -*- coding: Latin-1 -*-
Guido van Rossum added the comment:

Can you suggest a patch?

Adding Brett Cannon to the list, possibly his import-in-python would supersede this?

----------
nosy: +brett.cannon, gvanrossum
[issue1278] imp.find_module() ignores -*- coding: Latin-1 -*-
Christian Heimes added the comment:

> Can you suggest a patch?
>
> Adding Brett Cannon to the list, possibly his import-in-python would
> supersede this?

No, I can't suggest a patch. I don't know how we could get the encoding from the tokenizer or AST. Brett is obviously the best man to fix the problem. :)

Christian
[issue1278] imp.find_module() ignores -*- coding: Latin-1 -*-
Guido van Rossum added the comment:

> No, I can't suggest a patch. I don't know how we could get the encoding
> from the tokenizer or AST.

Try harder. :-) Look at the code that accomplishes this feat in the regular parser...
[issue1278] imp.find_module() ignores -*- coding: Latin-1 -*-
Christian Heimes added the comment:

> Try harder. :-) Look at the code that accomplishes this feat in the
> regular parser...

I've already found the methods that find the encoding in Parser/tokenizer.c: check_coding_spec() and friends. But it seems like a waste of time to use PyTokenizer_FromFile() just to find the encoding.

*reading*

Mmh ... It's not a waste of time if I can stop the tokenizer. I think it may be possible to use the tokenizer to get the encoding efficiently. I could read until tok_state->read_coding_spec or tok_state->indent != 0.

Do you know a better way to stop the tokenizer when the line isn't a special comment line # -*-?

Christian
[issue1278] imp.find_module() ignores -*- coding: Latin-1 -*-
Guido van Rossum added the comment:

Call PyTokenizer_Get until the line number is 2?

On 10/15/07, Christian Heimes [EMAIL PROTECTED] wrote:
> Christian Heimes added the comment:
>
> > Try harder. :-) Look at the code that accomplishes this feat in the
> > regular parser...
>
> I've already found the methods that find the encoding in
> Parser/tokenizer.c: check_coding_spec() and friends. But it seems like a
> waste of time to use PyTokenizer_FromFile() just to find the encoding.
>
> *reading*
>
> Mmh ... It's not a waste of time if I can stop the tokenizer. I think it
> may be possible to use the tokenizer to get the encoding efficiently. I
> could read until tok_state->read_coding_spec or tok_state->indent != 0.
>
> Do you know a better way to stop the tokenizer when the line isn't a
> special comment line # -*-?
>
> Christian
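The "stop at line 2" idea matches PEP 263, which only recognizes a coding declaration on the first or second line of a file. The following is a minimal Python sketch of that stopping rule, not the C tokenizer code discussed in this thread. The helper name `find_coding_cookie` and the UTF-8 fallback are assumptions made for illustration, and the sketch ignores details the real tokenizer handles, such as BOMs and unknown codec names.

```python
import re

# Pattern adapted from PEP 263: a comment containing "coding:" or "coding=".
CODING_RE = re.compile(rb"^[ \t\f]*#.*?coding[:=][ \t]*([-\w.]+)")


def find_coding_cookie(path, default="utf-8"):
    """Return the encoding declared on line 1 or 2 of *path*, else *default*.

    Hypothetical helper: it only illustrates giving up after two lines.
    """
    with open(path, "rb") as source:
        for _ in range(2):  # never read past the second line
            line = source.readline()
            match = CODING_RE.match(line)
            if match:
                # The captured name is ASCII by construction of the pattern.
                return match.group(1).decode("ascii")
    return default
```

Because the file is opened in binary mode, no decoding happens before the declaration has been found, which is the ordering problem this report is about.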
[issue1278] imp.find_module() ignores -*- coding: Latin-1 -*-
Brett Cannon added the comment:

No, my work has the exact same problem. Actually, this bug report has confirmed for me why heapq could not be imported when I accidentally forced all open text files to use UTF-8. I just have not gotten around to trying to solve this issue yet. But since importlib just uses open() directly it has the same problems.

Since it looks like TextIOWrapper does not let one change the encoding after it has been set, some subclass might need to be written that reads the start of the file, looks for the stanza, or else immediately stops and uses the expected encoding (UTF-8 in the case of Py3K or ASCII for 2.6). That, or expose some C function that takes a file path or open file and returns a code object.

But I have bigger fish to fry, as my attempt to get around open() being defined in site.py is actually failing once I clobbered my .pyc files, since codecs requires importing modules, even for ASCII encoding.
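One way around a wrapper whose encoding cannot be changed is to decide the encoding before creating the wrapper at all. The sketch below does that in Python: it opens the file in binary mode, sniffs the declaration, rewinds, and only then builds the io.TextIOWrapper. It relies on tokenize.detect_encoding() from the Python 3 standard library. The helper name `open_source` is hypothetical, and this is not claimed to be what r58553 changed.

```python
import io
import tokenize


def open_source(path):
    """Open a Python source file as text using its declared encoding.

    Hypothetical helper, shown only to illustrate the ordering:
    detect first, wrap second.
    """
    raw = open(path, "rb")
    try:
        # detect_encoding() calls readline at most twice and falls back
        # to UTF-8 when no declaration or BOM is present.
        encoding, _first_lines = tokenize.detect_encoding(raw.readline)
        raw.seek(0)  # rewind so the caller sees the file from the start
        return io.TextIOWrapper(raw, encoding=encoding)
    except Exception:
        raw.close()
        raise
```

The caller is responsible for closing the returned wrapper, for example by using it in a `with` statement.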
[issue1278] imp.find_module() ignores -*- coding: Latin-1 -*-
Changes by Alexandre Vassalotti:

----------
nosy: +alexandre.vassalotti
[issue1278] imp.find_module() ignores -*- coding: Latin-1 -*-
New submission from Christian Heimes:

imp.find_module() returns an io.TextIOWrapper instance as its first value. The encoding of the TextIOWrapper isn't set from a -*- coding: Latin-1 -*- line.

>>> import imp
>>> imp.find_module("heapq")
(<io.TextIOWrapper object at 0xb7c8f50c>, '/home/heimes/dev/python/py3k/Lib/heapq.py', ('.py', 'U', 1))
>>> imp.find_module("heapq")[0].read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/heimes/dev/python/py3k/Lib/io.py", line 1224, in read
    res += decoder.decode(self.buffer.read(), True)
  File "/home/heimes/dev/python/py3k/Lib/codecs.py", line 291, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode bytes in position 1428-1430: invalid data
>>> imp.find_module("heapq")[0].encoding
'UTF-8'
>>> imp.find_module("heapq")[0].readline()
'# -*- coding: Latin-1 -*-\n'

----------
components: Interpreter Core
messages: 56431
nosy: tiran
severity: normal
status: open
title: imp.find_module() ignores -*- coding: Latin-1 -*-
type: behavior
versions: Python 3.0
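The decoding failure in this report does not depend on imp. Any source file that declares Latin-1 and contains a non-ASCII byte shows the same behavior when read as UTF-8. The sketch below is a stand-alone illustration in Python. The file contents and the helper name `read_as` are made up for the example and are not taken from heapq.py.

```python
import os
import tempfile

# Made-up module body: 0xE7 is "ç" in Latin-1, but here it is followed by
# an ASCII byte, which makes the sequence invalid as UTF-8.
source = b"# -*- coding: Latin-1 -*-\nname = 'Fran\xe7ois'\n"


def read_as(path, encoding):
    """Read the whole file as text with an explicitly chosen encoding."""
    with open(path, encoding=encoding) as handle:
        return handle.read()


fd, path = tempfile.mkstemp(suffix=".py")
with os.fdopen(fd, "wb") as handle:
    handle.write(source)

try:
    read_as(path, "utf-8")  # ignores the declared encoding, as in the report
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

latin1_text = read_as(path, "latin-1")  # honors the declared encoding
os.remove(path)
```

Reading with the declared encoding succeeds, while the UTF-8 read raises UnicodeDecodeError, mirroring the traceback in the submission.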