Sean Gillespie added the comment:
Went ahead and did it since I had the time. The issue: when doing a
token of lookahead to decide whether an 'async' at the top level begins an
'async def' function or is just an identifier, a shallow copy of the current
token is made and passed to another call to tok_get, which frees the token's
buffer if a decoding error occurs. Because the shallow copy shares the
original token's buffer pointer, the still-live token is left holding a
dangling pointer to the freed buffer, which is then freed again later on.
By explicitly nulling out the live token's buffer pointer whenever the copied
token's buffer pointer was nulled out (mirroring what tok_get does), we avoid
the double free and present the correct syntax error:
$ ./python vuln.py
File "vuln.py", line 1
SyntaxError: Non-UTF-8 code starting with '\xef' in file vuln.py on line 2, but
no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
William Bowling's second program is also fixed by this change, with one
additional wrinkle: if a token's first character is a null byte, an invalid
write occurs when we attempt to replace the null character with a newline.
The fix checks for this case before performing the newline insertion.
With this change, both of William Bowling's programs pass valgrind and
present the appropriate syntax error. I tried to add this to the coroutine
syntax tests, but any way of loading the file other than passing it to
./python itself fails (correctly) because the program contains a null byte.
--
keywords: +patch
Added file: http://bugs.python.org/file41995/tokenizer_double_free.patch
___
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue26000>
___