[issue3594] PyTokenizer_FindEncoding() never succeeds

2008-09-03 Thread Antoine Pitrou

Antoine Pitrou [EMAIL PROTECTED] added the comment:

I don't understand the whole decoding machinery in the tokenizer, but
the patch looks ok to me (tested in debug mode under Linux and Windows).

--
nosy: +pitrou

___
Python tracker [EMAIL PROTECTED]
http://bugs.python.org/issue3594
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue3594] PyTokenizer_FindEncoding() never succeeds

2008-09-03 Thread Benjamin Peterson

Benjamin Peterson [EMAIL PROTECTED] added the comment:

The patch also looks pretty harmless to me. :)

--
nosy: +benjamin.peterson




[issue3594] PyTokenizer_FindEncoding() never succeeds

2008-09-03 Thread Hye-Shik Chang

Hye-Shik Chang [EMAIL PROTECTED] added the comment:

pitrou, that's because Python source code can't be tokenized correctly
when it's encoded in one of a few odd encodings, such as iso-2022 or
shift-jis, which use \, (, ) and " as the second byte of a two-byte
character sequence.

For example, '\x81\\' is HORIZONTAL BAR in shift-jis, so

exec('print "\x81\\"')

fails, because the closing " is ignored: the second byte of '\x81\\'
escapes it.
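
The byte-level collision can be reproduced from Python 3 as a minimal sketch. It assumes the standard shift_jis codec and uses latin-1 purely as a byte-transparent stand-in for a tokenizer that looks at raw bytes; all variable names are illustrative.

```python
# Sketch: why a byte-oriented tokenizer misreads Shift-JIS source text.
raw = b'\x81\x5c'                  # the two-byte sequence from the comment above
char = raw.decode('shift_jis')     # decodes to a single character
second_byte_is_backslash = raw[1:] == b'\\'

source_bytes = b'x = "' + raw + b'"\n'

# Decoding first hands the tokenizer an ordinary one-character string literal.
namespace = {}
exec(compile(source_bytes.decode('shift_jis'), '<sjis>', 'exec'), namespace)
decoded_ok = namespace['x'] == char

# Treating every byte as its own character leaves a bare backslash in front
# of the closing quote, so the string literal never terminates.
try:
    compile(source_bytes.decode('latin-1'), '<bytes>', 'exec')
    bytewise_ok = True
except SyntaxError:
    bytewise_ok = False

print(second_byte_is_backslash, decoded_ok, bytewise_ok)
```

This is why the encoding has to be known before tokenizing rather than after.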

--
nosy: +hyeshik.chang




[issue3594] PyTokenizer_FindEncoding() never succeeds

2008-09-03 Thread Brett Cannon

Brett Cannon [EMAIL PROTECTED] added the comment:

Committed in r66209.

--
resolution:  -> accepted
status: open -> closed




[issue3594] PyTokenizer_FindEncoding() never succeeds

2008-08-21 Thread Brett Cannon

Changes by Brett Cannon [EMAIL PROTECTED]:


--
keywords: +needs review




[issue3594] PyTokenizer_FindEncoding() never succeeds

2008-08-18 Thread Brett Cannon

New submission from Brett Cannon [EMAIL PROTECTED]:

Turns out that PyTokenizer_FindEncoding() never properly succeeds
because the tok_state it uses does not have tok->filename set, which
is an error condition in the tokenizer. This error has been masked
because the one place the function is used, imp.find_module(), never
checks a NULL return for an error and instead just assumes that the
default source encoding suffices.
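
The masking can be modelled in pure Python as a rough sketch. Here tokenize.detect_encoding() stands in for the C-level PyTokenizer_FindEncoding(), and find_encoding_unchecked() and pick_encoding() are hypothetical helpers invented for illustration; they are not part of CPython.

```python
import io
import tokenize

def find_encoding_unchecked(source_bytes):
    """Hypothetical stand-in: returns None for every failure, like a NULL return."""
    try:
        encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
        return encoding
    except SyntaxError:
        return None          # the reason for the failure is lost here

def pick_encoding(source_bytes, default='utf-8'):
    """Caller that never asks why None came back and just uses the default."""
    return find_encoding_unchecked(source_bytes) or default

good = b'# -*- coding: latin-1 -*-\nx = 1\n'
bad = b'# -*- coding: no-such-codec -*-\nx = 1\n'

print(pick_encoding(good))   # encoding taken from the coding cookie
print(pick_encoding(bad))    # silently falls back to the default
```

Because the caller treats "nothing declared" and "lookup failed" the same way, a lookup that always fails can go unnoticed for a long time.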

--
components: Extension Modules
messages: 71397
nosy: brett.cannon, christian.heimes
priority: critical
severity: normal
status: open
title: PyTokenizer_FindEncoding() never succeeds
versions: Python 3.0




[issue3594] PyTokenizer_FindEncoding() never succeeds

2008-08-18 Thread Brett Cannon

Brett Cannon [EMAIL PROTECTED] added the comment:

I have not bothered to check if this exists in 2.6, but I don't see why
it would be any different.

--
type:  -> behavior




[issue3594] PyTokenizer_FindEncoding() never succeeds

2008-08-18 Thread Brett Cannon

Brett Cannon [EMAIL PROTECTED] added the comment:

Turns out that the NULL return value can signal an error that manifests
itself as SyntaxError("encoding problem: with BOM"). The cause is that
tok->filename is not set in Parser/tokenizer.c:fp_setreadl(), which is
called by check_coding_spec(). check_coding_spec() assumes that, since
tok->encoding was never set (because fp_setreadl() returned an error
value), the failure had something to do with the BOM.
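
For comparison, the legitimate trigger for this kind of error is a UTF-8 BOM followed by a conflicting coding cookie. The sketch below uses the pure-Python tokenize.detect_encoding(), which is assumed here to mirror the C tokenizer's check (its error text is not identical to the C message); detect() is a hypothetical helper.

```python
import codecs
import io
import tokenize

def detect(source_bytes):
    """Return the detected source encoding, or the SyntaxError that was raised."""
    try:
        return tokenize.detect_encoding(io.BytesIO(source_bytes).readline)[0]
    except SyntaxError as exc:
        return exc

consistent = codecs.BOM_UTF8 + b'# -*- coding: utf-8 -*-\n'
conflicting = codecs.BOM_UTF8 + b'# -*- coding: latin-1 -*-\n'

print(detect(consistent))          # BOM and cookie agree
print(repr(detect(conflicting)))   # BOM and cookie disagree
```

In the bug reported here no BOM is involved at all, which is what makes the message misleading.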

The only reason this was found is because my bootstrapping of importlib
into Py3K, at some point, triggers a PyErr_Occurred() which finally
notices the error.




[issue3594] PyTokenizer_FindEncoding() never succeeds

2008-08-18 Thread Brett Cannon

Brett Cannon [EMAIL PROTECTED] added the comment:

Attached is a patch that fixes where the error occurs. By opening the
file by either file name or file descriptor, the problem goes away. Once
this patch is accepted, a PyErr_Occurred() check should be added to all
uses of PyTokenizer_FindEncoding().
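
A rough Python analogue of an encoding lookup by file descriptor that reports failures instead of hiding them is sketched below. The attached patch is C and is not reproduced here; find_encoding_from_fd() is a hypothetical name, and tokenize.detect_encoding() again stands in for the C tokenizer.

```python
import os
import tempfile
import tokenize

def find_encoding_from_fd(fd):
    """Hypothetical helper: detect the source encoding of an open descriptor.

    Errors propagate as exceptions instead of being folded into a None result.
    """
    # Work on a duplicate so closing the file object leaves the caller's fd open.
    # The duplicate shares the file offset, so the seek below also rewinds the
    # caller's descriptor; a real implementation would have to account for that.
    with os.fdopen(os.dup(fd), 'rb') as stream:
        stream.seek(0)
        encoding, _ = tokenize.detect_encoding(stream.readline)
    return encoding

with tempfile.TemporaryFile() as tmp:
    tmp.write(b'# -*- coding: shift_jis -*-\nx = 1\n')
    tmp.flush()
    detected = find_encoding_from_fd(tmp.fileno())

print(detected)
```

Letting the exception escape plays the role that an explicit PyErr_Occurred() check plays at a C call site.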

--
keywords: +patch
Added file: http://bugs.python.org/file11153/fix_findencoding.diff
