[issue1943] improved allocation of PyUnicode objects

Marc-Andre Lemburg Mon, 01 Feb 2010 02:39:26 -0800

Marc-Andre Lemburg <[email protected]> added the comment:

Antoine Pitrou wrote:
> 
> Antoine Pitrou <[email protected]> added the comment:
> 
>> I find that the null termination for 8-bit strings makes low-level
>> parsing operations (e.g., parsing a numeric string) safer and easier:
> 
> Not to mention faster. The new IO library makes use of it (for newline
> detection), on both bytestrings and unicode strings.


I'd consider that a bug. The IO lib in particular should be 8-bit clean,
in the sense that it doesn't attach any special meaning to NUL
characters or code points.
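
To make "8-bit clean" concrete, here is a minimal sketch in plain C. The helper names are hypothetical and this is not CPython or IO-library code. A scan that stops at a NUL sentinel silently ignores everything after an embedded NUL byte. A scan bounded by an explicit length treats NUL as just another byte value.

```c
#include <stddef.h>

/* Sentinel-driven scan (hypothetical helper): stops at the first NUL,
   so any data after an embedded NUL byte is silently ignored. */
static size_t count_newlines_sentinel(const char *s)
{
    size_t n = 0;
    for (; *s != '\0'; s++)
        if (*s == '\n')
            n++;
    return n;
}

/* Length-driven scan (hypothetical helper): looks at exactly `len`
   bytes, and NUL has no special meaning. */
static size_t count_newlines_bounded(const char *s, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++)
        if (s[i] == '\n')
            n++;
    return n;
}
```

For the 9-byte data "one\n\0two\n", the sentinel version reports 1 newline and the bounded version reports 2.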

Besides, using a for-loop with a counter is both safer and faster
than checking each and every character for NUL.
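
As a sketch of the counter-driven style, here it is applied to the numeric-parsing case from the quoted text. `parse_decimal` is a hypothetical helper, not a CPython function. The loop is bounded by `len`, so it never depends on a NUL byte following the data.

```c
#include <stddef.h>
#include <limits.h>

/* Hypothetical helper: parse an unsigned decimal integer from exactly
   `len` bytes at `s`. The counter `i` bounds the loop, so no
   terminator is needed or consulted.
   Returns 0 on success, -1 on empty input, a non-digit, or overflow. */
static int parse_decimal(const char *s, size_t len, unsigned long *out)
{
    unsigned long value = 0;

    if (len == 0)
        return -1;
    for (size_t i = 0; i < len; i++) {
        unsigned int digit;
        if (s[i] < '0' || s[i] > '9')
            return -1;                      /* not a decimal digit */
        digit = (unsigned int)(s[i] - '0');
        if (value > (ULONG_MAX - digit) / 10)
            return -1;                      /* value * 10 + digit would overflow */
        value = value * 10 + digit;
    }
    *out = value;
    return 0;
}
```

Because the length is explicit, the same function can parse a sub-range of a larger buffer in place without writing a temporary terminator. For example, the first three bytes of "12345" parse as 123.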

Just think of what can happen if you have buggy code that overwrites
the NUL byte in some corner case, and other code then relies on
that NUL byte being there as a terminator: a classic buffer overrun.
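
The overrun scenario can be simulated safely with a short sketch. It uses hypothetical helpers and places a 3-byte string inside a larger array, so the "neighbouring" bytes stay within one object and the example remains well-defined C. Once the terminator is overwritten, the sentinel search runs into the neighbouring data. The length-bounded search never reads past the string.

```c
#include <stddef.h>

/* Sentinel-driven search (hypothetical helper): returns the index of
   the first '\n', or -1 if a NUL byte is reached first. */
static long find_newline_sentinel(const char *s)
{
    for (long i = 0; s[i] != '\0'; i++)
        if (s[i] == '\n')
            return i;
    return -1;
}

/* Length-bounded search (hypothetical helper): never reads s[len]
   or beyond, whatever those bytes contain. */
static long find_newline_bounded(const char *s, size_t len)
{
    for (size_t i = 0; i < len; i++)
        if (s[i] == '\n')
            return (long)i;
    return -1;
}
```

Take `char region[16] = "abc\0SECRET\n";`, which holds the string "abc" followed by unrelated data. After a bug sets `region[3] = '!'`, `find_newline_sentinel(region)` returns 10, a hit in the neighbouring bytes. `find_newline_bounded(region, 3)` still returns -1. In real code those neighbouring bytes would belong to some other object.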

If you're lucky, you get a segfault. If not, you end up with
data corruption, or with data being manipulated in a way that
could lead to unwanted code execution.

The Python Unicode API deliberately tries to always use the combination
of a Py_UNICODE* pointer and a length integer to avoid such issues.
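
The pointer-plus-length convention shows up in C API signatures such as `PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size)`. Below is a minimal sketch of the idea, not CPython code. `ucs_t` is an assumed stand-in for `Py_UNICODE` so the example compiles without Python.h, and `size_t` replaces `Py_ssize_t`. `ustr_view`, `view_slice` and `view_get` are hypothetical names. Bundling the pointer with its length allows sub-ranges to be taken without copying and without a terminator, and lets every access be bounds-checked.

```c
#include <stddef.h>

/* Assumed stand-in for Py_UNICODE; the real type's width depends on
   how the interpreter was built. */
typedef unsigned int ucs_t;

/* Hypothetical view: a code-point pointer always paired with its length. */
typedef struct {
    const ucs_t *ptr;
    size_t len;
} ustr_view;

/* Return the sub-range [start, stop), clamped to the view's bounds.
   No copy is made and no terminator is written. */
static ustr_view view_slice(ustr_view v, size_t start, size_t stop)
{
    ustr_view r;
    if (stop > v.len)
        stop = v.len;
    if (start > stop)
        start = stop;
    r.ptr = v.ptr + start;
    r.len = stop - start;
    return r;
}

/* Bounds-checked read: stores the code point and returns 0, or
   returns -1 if `i` is outside the view. */
static int view_get(ustr_view v, size_t i, ucs_t *out)
{
    if (i >= v.len)
        return -1;
    *out = v.ptr[i];
    return 0;
}
```

A U+0000 code point inside the data is handled like any other value, since the length rather than a sentinel decides where the string ends.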

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue1943>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
