[issue25388] tokenizer crash/misbehavior -- heap use-after-free

Terry J. Reedy Fri, 16 Oct 2015 19:36:55 -0700

Terry J. Reedy added the comment:

According to 
https://docs.python.org/3/reference/lexical_analysis.html#lexical-analysis, the 
encoding of a sourcefile (in Python 3) defaults to utf-8* and a decoding error 
is (should be) reported as a SyntaxError. Since 
b"\x7f\x00\x00\n''s\x01\xfd\n'S" is not invalid as utf-8, I expect a 
UnicodeDecodeError converted to SyntaxError.


* compile(bytes, filename, mode) defaults to latin1 instead.  It has no 
decoding problem, but quits with "ValueError: source code string cannot contain 
null bytes".  On 2.7, I might expect that as the error.

I expect '''self.assertIn(b"Non-UTF-8", res.err)''' to always fail because 
error messages are strings, not bytes.  That aside, have you ever seen that 
particular text (as a string) in a SyntaxError message?).

Why do you think the crash is during the tokenizing phase?  I could not see 
anything in the AS report.

----------
nosy: +terry.reedy
versions: +Python 3.5

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue25388>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue25388] tokenizer crash/misbehavior -- heap use-after-free

Reply via email to