[issue3594] PyTokenizer_FindEncoding() never succeeds
Antoine Pitrou [EMAIL PROTECTED] added the comment: I don't understand the whole decoding machinery in the tokenizer, but the patch looks ok to me. (tested in debug mode under Linux and Windows) -- nosy: +pitrou ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3594 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue3594] PyTokenizer_FindEncoding() never succeeds
Benjamin Peterson [EMAIL PROTECTED] added the comment: The patch also looks pretty harmless to me. :) -- nosy: +benjamin.peterson ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3594 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue3594] PyTokenizer_FindEncoding() never succeeds
Hye-Shik Chang [EMAIL PROTECTED] added the comment: pitrou, that's because Python source code can't be correctly tokenized when it's encoded in few odd encodings like iso-2022 or shift-jis which utilizes \, (, ) and as second byte of two-byte character sequence. For example, '\x81\\' is HORIZONTAL BAR in shift-jis, exec('print \x81\\') fails. because of is ignored by second byte of '\x81\\'. -- nosy: +hyeshik.chang ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3594 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue3594] PyTokenizer_FindEncoding() never succeeds
Brett Cannon [EMAIL PROTECTED] added the comment: Committed in r66209. -- resolution: - accepted status: open - closed ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3594 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue3594] PyTokenizer_FindEncoding() never succeeds
Changes by Brett Cannon [EMAIL PROTECTED]: -- keywords: +needs review ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3594 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue3594] PyTokenizer_FindEncoding() never succeeds
New submission from Brett Cannon [EMAIL PROTECTED]: Turns out that PyTokenizer_FindEncoding() never properly succeeds because the tok_state used by it does not have tok-filename set, which is an error condition in the tokenizer. This error has been masked by the one place the function is used, imp.find_module() because a NULL return is never checked for an error, but instead just assumes the default source encoding suffices. -- components: Extension Modules messages: 71397 nosy: brett.cannon, christian.heimes priority: critical severity: normal status: open title: PyTokenizer_FindEncoding() never succeeds versions: Python 3.0 ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3594 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue3594] PyTokenizer_FindEncoding() never succeeds
Brett Cannon [EMAIL PROTECTED] added the comment: I have not bothered to check if this exists in 2.6, but I don't see why it would be any different. -- type: - behavior ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3594 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue3594] PyTokenizer_FindEncoding() never succeeds
Brett Cannon [EMAIL PROTECTED] added the comment: Turns out that the NULL return value can signal an error that manifests itself as SyntaxError(encoding problem: with BOM) thanks to the lack of tok-filename being set in Parser/tokenizer.c:fp_setreadl() which is called by check_coding_spec() and assumes that since tok-encoding was never set (because fp_setreadl() returned an error value) that it had something to do with the BOM. The only reason this was found is because my bootstrapping of importlib into Py3K, at some point, triggers a PyErr_Occurred() which finally notices the error. ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3594 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue3594] PyTokenizer_FindEncoding() never succeeds
Brett Cannon [EMAIL PROTECTED] added the comment: Attached is a patch that fixes where the error occurs. By opening the file by either file name or file descriptor, the problem goes away. Once this patch is accepted then PyErr_Occurred() should be added to all uses of PyTokenizer_FindEncoding(). -- keywords: +patch Added file: http://bugs.python.org/file11153/fix_findencoding.diff ___ Python tracker [EMAIL PROTECTED] http://bugs.python.org/issue3594 ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com