New submission from Dave Malcolm <dmalc...@redhat.com>:

Running this code (seen via 
http://bugs.python.org/file10013/python-2.5.2-unicode_resize-utf16.py for issue 
2620):
>>> msg = 'A'*2147483647 ; msg.decode('utf16')
leads to the python process exiting with an assertion failure:
python: Objects/exceptions.c:1787: PyUnicodeDecodeError_Create: Assertion 
`length < 2147483647' failed.

The reason is that PyUnicodeDecodeError_Create contains this code:
PyObject *
PyUnicodeDecodeError_Create(
    const char *encoding, const char *object, Py_ssize_t length,
    Py_ssize_t start, Py_ssize_t end, const char *reason)
{
    assert(length < INT_MAX);
    assert(start < INT_MAX);
    assert(end < INT_MAX);
    return PyObject_CallFunction(PyExc_UnicodeDecodeError, "ss#nns",
                                 encoding, object, length, start, end, reason);
}
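A minimal standalone sketch of what those three assertions check. It assumes a 64-bit build and uses `long long` as a stand-in for Py_ssize_t. `passes_int_max_asserts` is a hypothetical name and not a CPython function. The comparison is strict, so a length of exactly INT_MAX, as in the reproducer, already fails.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical stand-in for Py_ssize_t on a 64-bit build (assumption). */
typedef long long ssize_like_t;

/* Mirrors the three assert() calls in PyUnicodeDecodeError_Create:
   returns true only if every argument is strictly below INT_MAX. */
bool passes_int_max_asserts(ssize_like_t length, ssize_like_t start,
                            ssize_like_t end)
{
    return length < INT_MAX && start < INT_MAX && end < INT_MAX;
}
```

For example, passes_int_max_asserts(2147483647LL, 0, 1) is false, which matches the "length < 2147483647" failure in the report.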

In the example above, we're creating a buffer containing a very large but 
odd number of bytes, and trying to UTF-16 decode it, which requires an even 
number of bytes.
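A small sketch of the byte arithmetic involved. It assumes only that UTF-16 consumes input two bytes at a time. `split_utf16_bytes` and `utf16_split` are hypothetical names for illustration.

```c
#include <stddef.h>

/* Hypothetical result type: complete 2-byte code units plus leftover bytes. */
typedef struct {
    size_t code_units; /* number of complete 2-byte code units */
    size_t leftover;   /* 0 or 1 trailing byte that cannot form a code unit */
} utf16_split;

/* Splits a byte count into whole UTF-16 code units and a remainder.
   A non-zero remainder is the single trailing byte that cannot be
   decoded in the reproducer. */
utf16_split split_utf16_bytes(size_t nbytes)
{
    utf16_split s;
    s.code_units = nbytes / 2;
    s.leftover = nbytes % 2;
    return s;
}
```

With the reproducer's 2147483647 bytes, this gives 1073741823 complete code units and 1 leftover byte.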

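This leads to a UnicodeDecodeError with a very large value for each of length, 
start, and end. Although they fit in a Py_ssize_t on 64-bit, they are not all 
below INT_MAX: here length is exactly 2147483647 (INT_MAX itself), which already 
fails the strict "length < INT_MAX" check, and a larger buffer would not fit in 
an int at all.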

It seems that this will affect any other decoding errors for buffers that are 
>= INT_MAX in size: unicode decode errors will cause the python process to bail 
out with an assert failure.

It appears that throughout the UnicodeDecodeError representation, length, 
start and end are meant to be of type Py_ssize_t rather than int: they are 
stored in fields of type Py_ssize_t, and they are printed using format "%zd", 
which does indeed take a Py_ssize_t (see 
http://docs.python.org/c-api/string.html#PyString_FromFormat ).

In PyUnicodeDecodeError_Create, "ss#nns" is:
  - "s" (string) [char *]: "encoding"
  - "s#" (string) [char *, int]: "object", "length"
  - "n" (int) [Py_ssize_t]: "start"
  - "n" (int) [Py_ssize_t]: "end"
  - "s" (string) [char *]: "reason"

See Python/modsupport.c: do_mkvalue: "s#" uses this logic:
    if (flags & FLAG_SIZE_T)
        n = va_arg(*p_va, Py_ssize_t);
    else
        n = va_arg(*p_va, int);
where FLAG_SIZE_T is set by _Py_BuildValue_SizeT, but not by Py_BuildValue. 
PyObject_CallFunction builds its arguments via the latter, non-size_t path.
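A simplified, standalone sketch of that branch. It assumes POSIX `ssize_t` as a stand-in for Py_ssize_t. `fetch_hash_length` and `FLAG_SIZE_T_SKETCH` are hypothetical names, not CPython's. When the flag is clear, the callee reads an int from the argument list, so the caller has to pass "length" as an int.

```c
#include <stdarg.h>
#include <sys/types.h>

/* Hypothetical analogue of FLAG_SIZE_T in Python/modsupport.c. */
#define FLAG_SIZE_T_SKETCH 1

/* Mimics the "s#" length fetch in do_mkvalue: reads the length as
   ssize_t when the flag is set, otherwise as int. The caller must pass
   an argument of the matching type; reading an int when a 64-bit value
   was passed (or vice versa) is undefined behaviour in C. */
ssize_t fetch_hash_length(int flags, ...)
{
    va_list ap;
    ssize_t n;

    va_start(ap, flags);
    if (flags & FLAG_SIZE_T_SKETCH)
        n = va_arg(ap, ssize_t);
    else
        n = va_arg(ap, int);
    va_end(ap);
    return n;
}
```

On a 64-bit build, fetch_hash_length(FLAG_SIZE_T_SKETCH, (ssize_t)3000000000LL) returns the full value, while the flag-less path can only ever carry an int.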

Hence, as written, "length" must fit within an "int", but "start" and "end" 
don't have to.

"s#" calls PyString_FromStringAndSize(str, n) on the data, which takes a 
Py_ssize_t.  The resulting string is going to be big; it may well fit in RAM, 
but might not.

Calling the exception type leads to a call to UnicodeError_init, which calls:
    if (!PyArg_ParseTuple(args, "O!O!nnO!",
        &PyString_Type, &self->encoding,
        objecttype, &self->object,
        &self->start,
        &self->end,
        &PyString_Type, &self->reason)) {

"O!": (object) [typeobject, PyObject *]: &PyString_Type, &self->encoding,
"O!": (object) [typeobject, PyObject *]: objecttype, &self->object  (objecttype 
is passed as &PyString_Type)
"n": (integer) [Py_ssize_t]: &self->start,
"n": (integer) [Py_ssize_t]: &self->end,
"O!": (object) [typeobject, PyObject *]: &PyString_Type, &self->reason,

So it looks like the only place in construction where we actually restrict to 
"int" is in that "s#" in PyUnicodeDecodeError_Create, which looks fixable.

After construction, start and end are passed to various calls to 
PyString_FromFormat using format "%zd", which, as noted above, takes a 
Py_ssize_t.
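A sketch of why "%zd" copes with values above INT_MAX. It assumes a 64-bit POSIX build and uses plain snprintf as a stand-in for PyString_FromFormat. `format_position_range` is a hypothetical helper, and the message text is illustrative only, not CPython's exact wording.

```c
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical helper: formats a start-end position range with "%zd",
   the conversion used for start and end. snprintf stands in for
   PyString_FromFormat here (assumption). */
int format_position_range(char *buf, size_t bufsize, ssize_t start, ssize_t end)
{
    return snprintf(buf, bufsize, "in position %zd-%zd", start, end);
}
```

Formatting start = 2147483647 and end = 3000000000 gives "in position 2147483647-3000000000", with no truncation to int.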

So it looks like the only issue here is the restriction to "int" in 
PyUnicodeDecodeError_Create due to the use of "s#" in the call to 
PyObject_CallFunction; apart from that, it looks like the assertions can be 
removed.  

I'll attach a patch which removes this restriction.

(for my reference, I'm tracking this as 
https://bugzilla.redhat.com/show_bug.cgi?id=540518 )

----------
components: Unicode
messages: 108404
nosy: dmalcolm
priority: normal
severity: normal
status: open
title: PyUnicodeDecodeError_Create asserts that various arguments are less than INT_MAX
type: crash
versions: Python 2.7

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue9058>
_______________________________________
_______________________________________________
Python-bugs-list mailing list