[issue27946] issues in elementtree and elsewhere due to PyDict_GetItem
New submission from tehybel:

I would like to describe an issue in the _elementtree module, and then propose a fix which would prevent this type of bug everywhere in the codebase.

The issue exists in _elementtree_Element_get_impl in /Modules/_elementtree.c. Here is the code:

    static PyObject *
    _elementtree_Element_get_impl(ElementObject *self, PyObject *key,
                                  PyObject *default_value)
    {
        PyObject* value;

        if (...)
            value = default_value;
        else {
            value = PyDict_GetItem(self->extra->attrib, key);
            ...
        }
        ...
    }

We look up "key" in the dictionary, that is, in self->extra->attrib. This is done with the call to PyDict_GetItem(self->extra->attrib, key).

We need to hash the "key" object in order to find it in the dictionary. This sometimes requires calling the key's __hash__ function, i.e., it might call into Python code.

What happens if the Python code gets the dictionary (self->extra->attrib) freed? Then PyDict_GetItem will use it after it has been freed.

I've attached a proof-of-concept script which causes a use-after-free through _elementtree_Element_get_impl due to this issue.

We could fix this by calling Py_INCREF on self->extra->attrib before calling PyDict_GetItem. But grepping for "PyDict_GetItem.*\->" shows that there are many places in the codebase where this situation occurs. Some of these may not have much impact, but some of them likely will.

All these bugs could be fixed at once by changing PyDict_GetItem to call Py_INCREF on the dictionary before it hashes the key.

Here's a PoC for the _elementtree module.
--- begin script ---

import _elementtree as et

state = {
    "tag": "tag",
    "_children": [],
    "attrib": "attr",
    "tail": "tail",
    "text": "text",
}

class X:
    def __hash__(self):
        e.__setstate__(state)  # this frees e->extra->attrib
        return 13

e = et.Element("elem", {1: 2})
e.get(X())

--- end script ---

Running it (64-bit Python 3.5.2, --with-pydebug) causes a use-after-free with control of the program counter:

(gdb) r ./poc13.py
Starting program: /home/xx/Python-3.5.2/python ./poc13.py

Program received signal SIGSEGV, Segmentation fault.
0x004939af in PyDict_GetItem (op=0x76d5b1a8, key=0x76d528e0) at Objects/dictobject.c:1079
1079        ep = (mp->ma_keys->dk_lookup)(mp, key, hash, &value_addr);
(gdb) p mp->ma_keys->dk_lookup
$8 = (dict_lookup_func) 0x7b7b7b7b7b7b7b7b

--
components: Extension Modules
messages: 274276
nosy: tehybel
priority: normal
severity: normal
status: open
title: issues in elementtree and elsewhere due to PyDict_GetItem
versions: Python 3.5, Python 3.6

___
Python tracker <http://bugs.python.org/issue27946>
___
___
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
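The re-entrancy at the heart of issue27946 can be observed safely from pure Python. The sketch below is only an illustration, not the C fix: `Holder`, `ReentrantKey` and `safe_get` are hypothetical names, and the local variable merely plays the role that Py_INCREF on self->extra->attrib would play in C. It shows that a dictionary lookup runs the key's __hash__, that this code can drop the owner's reference to the dictionary, and that the lookup still completes when the caller holds its own strong reference.

```python
class Holder:
    """Hypothetical stand-in for an object that owns a dict."""

    def __init__(self):
        self.attrib = {1: 2}


class ReentrantKey:
    """Key whose __hash__ drops the holder's reference to its dict."""

    def __init__(self, holder, log):
        self.holder = holder
        self.log = log

    def __hash__(self):
        # Arbitrary Python code runs here, in the middle of the lookup.
        self.log.append("hash called during lookup")
        self.holder.attrib = None
        return 13


def safe_get(holder, key, default=None):
    # Take a strong local reference first (the role Py_INCREF plays in C),
    # so the dict stays alive even if __hash__ detaches it from the holder.
    attrib = holder.attrib
    return attrib.get(key, default)
```

A short usage example: after `log = []; h = Holder()`, calling `safe_get(h, ReentrantKey(h, log), "missing")` returns the default, and `h.attrib` is `None` afterwards, which confirms that __hash__ ran during the lookup.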
[issue27945] five dictobject issues
Changes by tehybel: -- versions: +Python 3.5, Python 3.6 ___ Python tracker <http://bugs.python.org/issue27945> ___
[issue27945] five dictobject issues
New submission from tehybel:

Here I'll describe five distinct issues I found. Common to them all is that they reside in the built-in dictionary object. Four of them are use-after-frees and one is an array-out-of-bounds indexing bug.

All of the described functions reside in /Objects/dictobject.c.

Issue 1: use-after-free when initializing a dictionary

Initialization of a dictionary happens via the function dict_init which calls dict_update_common. From there, PyDict_MergeFromSeq2 may be called, and that is where this issue resides.

In PyDict_MergeFromSeq2 we retrieve a sequence of size 2 with this line:

    fast = PySequence_Fast(item, "");

After checking its size, we take out a key and value:

    key = PySequence_Fast_GET_ITEM(fast, 0);
    value = PySequence_Fast_GET_ITEM(fast, 1);

Then we call PyDict_GetItem. This calls back to Python code if the key has a __hash__ function. From there the "item" sequence could get modified, resulting in "key" or "value" getting used after having been freed.

Here's a PoC:

---

class X:
    def __hash__(self):
        pair[:] = []
        return 13

pair = [X(), 123]
dict([pair])

---

It crashes while trying to use freed memory as a PyObject:

(gdb) run ./poc24.py
Program received signal SIGSEGV, Segmentation fault.
0x0048fe25 in insertdict (mp=mp@entry=0x76d5c4b8, key=key@entry=0x76d52538, hash=0xd, value=value@entry=0x8d1ac0 ) at Objects/dictobject.c:831
831         MAINTAIN_TRACKING(mp, key, value);
(gdb) print *key
$26 = {_ob_next = 0xdbdbdbdbdbdbdbdb, _ob_prev = 0xdbdbdbdbdbdbdbdb, ob_refcnt = 0xdbdbdbdbdbdbdbdb, ob_type = 0xdbdbdbdbdbdbdbdb}

Issue 2: use-after-free in dictitems_contains

In the function dictitems_contains we call PyDict_GetItem to look up a value in the dictionary:

    found = PyDict_GetItem((PyObject *)dv->dv_dict, key);

However this "found" variable is borrowed. We then go ahead and compare it:

    return PyObject_RichCompareBool(value, found, Py_EQ);

But PyObject_RichCompareBool could call back into Python code and e.g. release the GIL.
As a result, the dictionary may be mutated. Thus "found" could get freed.

Then, inside PyObject_RichCompareBool (actually in do_richcompare), the "found" variable gets used after being freed.

PoC:

---

class X:
    def __eq__(self, other):
        d.clear()
        return NotImplemented

d = {0: set()}
(0, X()) in d.items()

---

Result:

(gdb) run ./poc25.py
Program received signal SIGSEGV, Segmentation fault.
0x004a03b6 in do_richcompare (v=v@entry=0x76d52468, w=w@entry=0x76ddf7c8, op=op@entry=0x2) at Objects/object.c:673
673         if (!checked_reverse_op && (f = w->ob_type->tp_richcompare) != NULL) {
(gdb) print w->ob_type
$26 = (struct _typeobject *) 0xdbdbdbdbdbdbdbdb

Issue 3: use-after-free in dict_equal

In the function dict_equal, we call the "lookdict" function via b->ma_keys->dk_lookup to look up a value:

    if ((b->ma_keys->dk_lookup)(b, key, ep->me_hash, &vaddr) == NULL)

This value's address is stored into the "vaddr" variable and the value is fetched into the "bval" variable:

    bval = *vaddr;

Then we call Py_DECREF(key) which can call back into Python code. This could release the GIL and mutate dictionary b. Therefore "bval" could become freed at this point. We then proceed to use "bval":

    cmp = PyObject_RichCompareBool(aval, bval, Py_EQ);

This results in a use-after-free.

PoC:

---

class X():
    def __del__(self):
        dict_b.clear()

    def __eq__(self, other):
        dict_a.clear()
        return True

    def __hash__(self):
        return 13

dict_a = {X(): 0}
dict_b = {X(): X()}
dict_a == dict_b

---

Result:

(gdb) run ./poc26.py
Program received signal SIGSEGV, Segmentation fault.
PyType_IsSubtype (a=0xdbdbdbdbdbdbdbdb, b=0x87ec60 ) at Objects/typeobject.c:1343
1343        mro = a->tp_mro;
(gdb) print a
$59 = (PyTypeObject *) 0xdbdbdbdbdbdbdbdb

Issue 4: use-after-free in _PyDict_FromKeys

The function _PyDict_FromKeys takes an iterable as argument. If the iterable is a dict, _PyDict_FromKeys loops over it like this:

    while (_PyDict_Next(iterable, &pos, &key, &oldvalue, &hash)) {
        if (insertdict(mp, key, hash, value)) {
            ...
        }
    }

However if we look at the comment for PyDict_Next, we see this:

     * CAUTION: In general, it isn't safe to use PyDict_Next in a loop that
     * mutates the dict.

But insertdict can call into Python code which might mutate the dict. In that case we perform a use-after-free of the "key" variable.

Here's a PoC:

---

class X(int):
    def __hash__(self):
        return 13

    def __eq__(self, other):
        if len(d) >
[issue27944] two hotshot module issues
New submission from tehybel:

Here I'll describe two issues in the "hotshot" module which can be found in /Modules/_hotshot.c. Note that this module is for Python 2.7 only.

The issues are (1) an uninitialized variable use and (2) a double free.

Issue 1: uninitialized variable usage in unpack_add_info

In the function unpack_add_info we have this code:

    static int
    unpack_add_info(LogReaderObject *self)
    {
        PyObject *key;
        ...
        err = unpack_string(self, &key);
        if (!err) {
            ...
        }
     finally:
        Py_XDECREF(key);
        Py_XDECREF(value);
        return err;
    }

The variable "key" is not initialized to NULL. If the call to unpack_string fails, the code will directly call Py_XDECREF on key which is an uninitialized variable.

To fix this we could initialize "key" to NULL.

Here's a PoC:

---

import hotshot.log

WHAT_ADD_INFO = 0x13
open("./tmp", "w").write(chr(WHAT_ADD_INFO))
logreader = hotshot.log.LogReader("./tmp")

---

It segfaults here:

(gdb) run poc_27_1.py
Program received signal SIGSEGV, Segmentation fault.
0x7696e585 in unpack_add_info (self=0x76b82098) at /home/xx/Python-2.7.12/Modules/_hotshot.c:370
370         Py_XDECREF(key);
(gdb) p *key
$3 = {_ob_next = 0x438b480f74fff883, _ob_prev = 0x53894801508d4808, ob_refcnt = 0x9066c35b00b60f08, ob_type = 0x841f0f2e66}

Issue 2: double free (Py_DECREF) in unpack_add_info

This is a separate issue from the one described above, but it exists in the same function, unpack_add_info:

    static int
    unpack_add_info(LogReaderObject *self)
    {
        PyObject *key;
        PyObject *value = NULL;
        int err;

        err = unpack_string(self, &key);
        if (!err) {
            err = unpack_string(self, &value);
            if (err)
                Py_DECREF(key);
            else {
                ...
            }
        }
     finally:
        Py_XDECREF(key);
        Py_XDECREF(value);
        return err;
    }

If the second call to unpack_string fails, we call Py_DECREF(key). Then we reach the "finally" block and again call Py_XDECREF(key). So key is getting its refcount dropped twice.

To fix this we could simply remove the "if (err)" clause and turn the "else" into "if (!err)".
Then we would only have the single call to Py_XDECREF(key) in the "finally" block.

Here's a PoC:

---

import hotshot.log

content = "\x13\x20"
content += "A"*0x20
open("./tmp", "w").write(content)
logreader = hotshot.log.LogReader("./tmp")

---

When run, python dies with SIGABRT here (because it's a debug build with Py_REF_DEBUG defined; otherwise memory would silently be corrupted):

(gdb) r ./poc_27_2.py
Fatal Python error: /home/xx/Python-2.7.12/Modules/_hotshot.c:370 object at 0x76b7e220 has negative ref count -2604246222170760230

Program received signal SIGABRT, Aborted.
0x77143295 in raise () from /usr/lib/libc.so.6

--
components: Extension Modules
messages: 274274
nosy: tehybel
priority: normal
severity: normal
status: open
title: two hotshot module issues
versions: Python 2.7

___
Python tracker <http://bugs.python.org/issue27944>
___
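The two hotshot fixes proposed in this message (initialize "key" to NULL, and release it only in the "finally" block) can be modelled with a toy reference counter in Python. This is a sketch under stated assumptions, not the C code: `ToyObject`, `xdecref`, and the two `unpack_add_info_*` functions are hypothetical stand-ins, and the boolean parameters simulate whether each unpack_string call fails.

```python
class ToyObject:
    """Hypothetical object with a manually tracked reference count."""

    def __init__(self, name):
        self.name = name
        self.refcount = 1

    def decref(self):
        self.refcount -= 1
        if self.refcount < 0:
            raise RuntimeError(self.name + " has negative ref count")


def xdecref(obj):
    # Like Py_XDECREF: tolerate a missing (NULL) object.
    if obj is not None:
        obj.decref()


def unpack_add_info_buggy(second_fails):
    key = ToyObject("key")  # first unpack succeeded
    value = None if second_fails else ToyObject("value")
    err = second_fails
    if err:
        key.decref()  # early release on the error path
    # "finally" block: releases key a second time when err is set
    xdecref(key)
    xdecref(value)
    return err


def unpack_add_info_fixed(first_fails, second_fails):
    key = None  # initialised, like "PyObject *key = NULL"
    value = None
    err = first_fails
    if not err:
        key = ToyObject("key")
        err = second_fails
        if not err:
            value = ToyObject("value")
    # Single cleanup path: each object is released exactly once.
    xdecref(key)
    xdecref(value)
    return err, key, value
```

In this model, `unpack_add_info_buggy(True)` raises because the count goes negative, while `unpack_add_info_fixed` leaves every created object with a count of exactly zero on both error paths.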
[issue27867] various issues due to misuse of PySlice_GetIndicesEx
New submission from tehybel:

Here I will describe 6 issues with various core objects (bytearray, list) and the array module.

Common to them all is that they arise due to a misuse of the function PySlice_GetIndicesEx.

This type of issue results in out-of-bounds array indexing which leads to memory disclosure, use-after-frees or memory corruption, depending on the circumstances.

For each issue I've attached a proof-of-concept script which either prints leaked heap memory or segfaults on my machine (64-bit linux, --with-pydebug, python 3.5.2).

Issue 1: out-of-bounds indexing when taking a bytearray's subscript

While taking the subscript of a bytearray, the function bytearray_subscript in /Objects/bytearrayobject.c calls PySlice_GetIndicesEx to validate the given indices.

Some of these indices might be objects with an __index__ method, and thus PySlice_GetIndicesEx could call back into python code.

If the evaluation of the indices modifies the bytearray, the indices might no longer be safe, despite PySlice_GetIndicesEx saying so.

Here is a PoC which lets us read out 64 bytes of uninitialized memory from the heap:

---

class X:
    def __index__(self):
        b[:] = []
        return 1

b = bytearray(b"A"*0x1000)
print(b[0:64:X()])

---

Here's the result on my system:

$ ./python poc17.py
bytearray(b'\x00\x00\x00\x00\x00\x00\x00\x00\xb0\xce\x86\x9ff\x7f\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00')

Issue 2: memory corruption in bytearray_ass_subscript

This issue is similar to the one above. The problem exists when assigning to a bytearray via subscripting. The relevant function is bytearray_ass_subscript. The relevant line is again the one calling PySlice_GetIndicesEx.
Here's a PoC which leads to memory corruption of the heap:

---

class X:
    def __index__(self):
        del b[0:0x1]
        return 1

b = bytearray(b"A"*0x1)
b[0:0x8000:X()] = bytearray(b"B"*0x8000)

---

Here's the result of running it:

(gdb) r poc20.py
Program received signal SIGSEGV, Segmentation fault.
PyCFunction_NewEx (ml=0x8b4140 , self=self@entry=0x77f0e898, module=module@entry=0x0) at Objects/methodobject.c:31
31          free_list = (PyCFunctionObject *)(op->m_self);
(gdb) p op
$13 = (PyCFunctionObject *) 0x4242424242424242

Issue 3: use-after-free when taking the subscript of a list

This issue is similar to the one above, but it occurs when taking the subscript of a list rather than a bytearray. The relevant code is in list_subscript which exists in /Objects/listobject.c.

Here's a PoC:

---

class X:
    def __index__(self):
        b[:] = [1, 2, 3]
        return 2

b = [123]*0x1000
print(b[0:64:X()])

---

It results in a segfault here because of a use-after-free:

(gdb) run ./poc18.py
Program received signal SIGSEGV, Segmentation fault.
0x00483553 in list_subscript (self=0x76d53988, item=) at Objects/listobject.c:2441
2441        Py_INCREF(it);
(gdb) p it
$2 = (PyObject *) 0xfbfbfbfbfbfbfbfb

Issue 4: use-after-free when assigning to a list via subscripting

The same type of issue exists in list_ass_subscript where we assign to the list using a subscript.

Here's a PoC which also results in a use-after-free:

---

class X:
    def __index__(self):
        b[:] = [1, 2, 3]
        return 2

b = [123]*0x1000
b[0:64:X()] = [0]*32

---

(gdb) r poc19.py
Program received signal SIGSEGV, Segmentation fault.
0x00483393 in list_ass_subscript (self=, item=, value=) at Objects/listobject.c:2603
2603        Py_DECREF(garbage[i]);
(gdb) p garbage[i]
$4 = (PyObject *) 0xfbfbfbfbfbfbfbfb

Issue 5: out-of-bounds indexing in array_subscr

Same type of issue. The problem is in the function array_subscr in /Modules/arraymodule.c.
Here's a PoC which leaks and prints uninitialized memory from the heap:

---

import array

class X:
    def __index__(self):
        del a[:]
        a.append(2)
        return 1

a = array.array("b")
for _ in range(0x10):
    a.append(1)

print(a[0:0x10:X()])

---

And the result:

$ ./python poc22.py
array('b', [2, -53, -53, -53, -5, -5, -5, -5, -5, -5, -5, -5, 0, 0, 0, 0])

Issue 6: out-of-bounds indexing in array_ass_subscr

Same type of issue, also in the array module. Here's a PoC which segfaults here:

---

import array

class X:
    def __index__(self):
        del a[:]
        return 1

a = array.array("b")
a.frombytes(b"A"*0x100)
del a[::X()]

---

How should these be fixed?

I would suggest that in each instance we could add a check after calling PySlice_GetIndicesEx. The check should validate that the "length" argument passed to PySlice_GetIndicesEx did not change during the call. But ma
[issue27863] multiple issues in _elementtree module
New submission from tehybel:

I'll describe 7 issues in the /Modules/_elementtree.c module here. They include multiple use-after-frees, type confusions and instances of out-of-bounds array indexing.

Issue 1: use-after-free in element_get_text

The problematic code looks like this:

    LOCAL(PyObject*)
    element_get_text(ElementObject* self)
    {
        /* return borrowed reference to text attribute */

        PyObject* res = self->text;

        if (JOIN_GET(res)) {
            res = JOIN_OBJ(res);
            if (PyList_CheckExact(res)) {
                res = list_join(res);
                if (!res)
                    return NULL;
                self->text = res;
            }
        }

        return res;
    }

As we can see, if res is a list, we call list_join with res, which is self->text. list_join will decrease self->text's reference count. When that happens, arbitrary python code can run. If that code uses self->text, a use-after-free occurs.

PoC (Proof-of-Concept segfaulting script):

---

import _elementtree as et

class X(str):
    def __del__(self):
        print(elem.text)

b = et.TreeBuilder()
b.start("test")
b.data(["", X("")])
b.start("test2")

elem = b.close()
print(elem.text)

---

Issue 2: use-after-free in element_get_tail

The same type of issue also exists in element_get_tail and should be fixed there, too.

Issue 3: type confusion in elementiter_next

The function elementiter_next iterates over a tree consisting of elements. Each element has an array of children.

Before doing any casts, most of the elementtree code is careful to check that these children are, indeed, elements; that is, that their type is correct. The problem is that elementiter_next does not validate these child objects before performing a cast. Here is the relevant line:

    elem = (ElementObject *)cur_parent->extra->children[child_index];

If the child is not an element, a type confusion occurs.
Here's a PoC:

---

import _elementtree as et

state = {
    "tag": "tag",
    "_children": [1,2,3],
    "attrib": "attr",
    "tail": "tail",
    "text": "text",
}

e = et.Element("ttt")
e.__setstate__(state)

for x in e.iter():
    print(x)

---

Issue 4: array-out-of-bounds indexing in element_subscr

This issue occurs when taking the subscript of an element. This is done using the element_subscr function. The function calls PySlice_GetIndicesEx:

    if (PySlice_GetIndicesEx(item,
            self->extra->length,
            &start, &stop, &step, &slicelen) < 0) {
        return NULL;
    }

The problem is that to evaluate the indices, PySlice_GetIndicesEx might call back into python code. That code might cause the element's self->extra->length to change. If this happens, the variables "start", "stop" and "step" might no longer be appropriate.

The code then uses these variables for array indexing:

    for (cur = start, i = 0; i < slicelen;
         cur += step, i++) {
        PyObject* item = self->extra->children[cur];
        ...
    }

But this could go out of bounds and interpret arbitrary memory as a PyObject.

Here's a PoC that reproduces this:

---

import _elementtree as et

class X:
    def __index__(self):
        e[:] = []
        return 1

e = et.Element("elem")
for _ in range(10):
    e.insert(0, et.Element("child"))

print(e[0:10:X()])

---

Running it results in a segfault:

(gdb) r ./poc14.py
Starting program: /home/xx/Python-3.5.2/python ./poc14.py

Program received signal SIGSEGV, Segmentation fault.
0x0049f933 in PyObject_Repr (v=0x768af058) at Objects/object.c:471
471         if (Py_TYPE(v)->tp_repr == NULL)
(gdb) print *v
$37 = {_ob_next = 0xdbdbdbdbdbdbdbdb, _ob_prev = 0xdbdbdbdbdbdbdbdb, ob_refcnt = 0xdbdbdbdbdbdbdbdc, ob_type = 0xdbdbdbdbdbdbdbdb}

As we can see, "v" is freed memory with arbitrary contents.

Issue 5: array-out-of-bounds indexing in element_ass_subscr

A separate issue of the same type, also due to a call to PySlice_GetIndicesEx, exists in element_ass_subscr.
Here's a proof-of-concept script for that:

---

import _elementtree as et

class X:
    def __index__(self):
        e[:] = []
        return 1

e = et.Element("elem")
for _ in range(10):
    e.insert(0, et.Element("child"))

e[0:10:X()] = []

---

To fix these, I believe we could check whether self->extra->length changed after calling PySlice_GetIndicesEx, and bail out if so. (You can grep the codebase for "changed size during iteration" for examples of some similarish cases.)

Issue 6: use-after-free in treebuilder_handle_start

In the function treebuilder_handle_start we see these lines:

    if (treebuilder_set_eleme
[issue27861] sqlite3 type confusion and multiple frees
New submission from tehybel:

The first issue is a type confusion which resides in the sqlite3 module, in the file connection.c. The function pysqlite_connection_cursor takes an optional argument, a factory callable:

    if (!PyArg_ParseTupleAndKeywords(args, kwargs, "|O", kwlist,
                                     &factory)) {
        return NULL;
    }

If the factory callable is given, it is called to initialize a cursor:

    cursor = PyObject_CallFunction(factory, "O", self);

After this the cursor, which is a PyObject *, is cast directly to a pysqlite_Cursor without performing any type checking:

    if (cursor && self->row_factory != Py_None) {
        Py_INCREF(self->row_factory);
        Py_XSETREF(((pysqlite_Cursor *)cursor)->row_factory, self->row_factory);
    }

Here is a small script which is tested on Python-3.5.2, 64-bit, with --with-pydebug enabled:

--- begin script ---

import sqlite3

conn = sqlite3.connect('poc2.db')
conn.row_factory = 12
conn.cursor(lambda x: "A"*0x1)

--- end script ---

When run, this produces a segfault:

(gdb) r ./poc2.py

Program received signal SIGSEGV, Segmentation fault.
0x76496ad8 in pysqlite_connection_cursor (self=0x768cc370, args=, kwargs=) at /home/xx/Python-3.5.2/Modules/_sqlite/connection.c:322
warning: Source file is more recent than executable.
322         Py_XSETREF(((pysqlite_Cursor *)cursor)->row_factory, self->row_factory);
(gdb) p cursor
$13 = (PyObject *) 0xa46b90
(gdb) p self->row_factory
$14 = (PyObject *) 0x8d05f0
(gdb) x/3i $pc
=> 0x76496ad8 :  mov    rax,QWORD PTR [rdi+0x10]
   0x76496adc :  sub    rax,0x1
   0x76496ae0 :  mov    QWORD PTR [rdi+0x10],rax
(gdb) p $rdi
$15 = 0x4141414141414141

An arbitrary word in memory is decremented.

--

The second issue exists in the function pysqlite_connection_set_isolation_level which resides in /Modules/_sqlite/connection.c. It can result in memory getting freed multiple times.

The problem is that the variable self->isolation_level is not cleared before being DECREF'd.
The code looks like this:

    static int pysqlite_connection_set_isolation_level(pysqlite_Connection* self,
                                                       PyObject* isolation_level)
    {
        ...
        Py_XDECREF(self->isolation_level);
        ...
    }

This call to Py_XDECREF can trigger an arbitrary amount of python code, e.g. via self->isolation_level's __del__ method. That code could then call pysqlite_connection_set_isolation_level again, which would trigger another Py_XDECREF call on the same self->isolation_level, which can thus be freed an arbitrary number of times.

One way to fix this is to use Py_CLEAR instead.

Here's a proof-of-concept script which results in a segfault here:

--- begin script ---

import sqlite3

class S(str):
    def __del__(self):
        conn.isolation_level = S("B")

conn = sqlite3.connect('poc6.db')
conn.isolation_level = S("A")
conn.isolation_level = ""

--- end script ---

When run it segfaults here, with Python-3.5.2 and --with-pydebug enabled:

(gdb) r ./poc6.py
Starting program: /home/xx/Python-3.5.2/python ./poc6.py

Program received signal SIGSEGV, Segmentation fault.
_Py_ForgetReference (op=op@entry=0x76d81b80) at Objects/object.c:1757
1757        if (op == &refchain ||
(gdb) bt
#0  _Py_ForgetReference (op=op@entry=0x76d81b80) at Objects/object.c:1757
#1  0x0049f8c0 in _Py_Dealloc (op=0x76d81b80) at Objects/object.c:1785
#2  0x0046ced8 in method_dealloc (im=im@entry=0x77f25de8) at Objects/classobject.c:198
#3  0x0049f8c5 in _Py_Dealloc (op=op@entry=0x77f25de8) at Objects/object.c:1786
...

--
components: Extension Modules
messages: 273659
nosy: ghaering, tehybel
priority: normal
severity: normal
status: open
title: sqlite3 type confusion and multiple frees
versions: Python 2.7, Python 3.5

___
Python tracker <https://bugs.python.org/issue27861>
___
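The difference between releasing a field in place (Py_XDECREF) and clearing it first (Py_CLEAR), as suggested in this message, can be modelled in Python. This is a toy sketch, not sqlite3 code: `Finalizable`, `Conn`, the two setter functions and `run` are hypothetical names, and `free()` stands in for the reference count reaching zero and running __del__.

```python
class Finalizable:
    """Hypothetical object that counts how often it is freed."""

    def __init__(self):
        self.on_free = None
        self.times_freed = 0

    def free(self):
        self.times_freed += 1
        # Run the finalizer only on the first free, like __del__.
        if self.times_freed == 1 and self.on_free is not None:
            self.on_free()


class Conn:
    level = None


def set_level_xdecref_style(conn, new):
    old = conn.level
    if old is not None:
        old.free()  # finalizer runs while conn.level still points at old
    conn.level = new


def set_level_clear_style(conn, new):
    old = conn.level
    conn.level = None  # clear the field first, as Py_CLEAR does
    if old is not None:
        old.free()  # a re-entrant call now sees an empty field
    conn.level = new


def run(setter):
    conn = Conn()
    obj = Finalizable()
    # The finalizer re-enters the setter, as S.__del__ does in the PoC.
    obj.on_free = lambda: setter(conn, None)
    conn.level = obj
    setter(conn, "new")
    return obj.times_freed
```

In this model `run(set_level_xdecref_style)` reports two frees of the same object, while `run(set_level_clear_style)` reports one, because the re-entrant call finds the field already cleared.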
[issue27760] integer overflow in binascii.b2a_qp
tehybel added the comment: The patch seems correct to me. -- nosy: +tehybel ___ Python tracker <http://bugs.python.org/issue27760> ___
[issue27758] integer overflow in the _csv module's join_append_data function
tehybel added the comment: Thanks for fixing this. I looked at the patch and it seems correct. -- nosy: +tehybel ___ Python tracker <http://bugs.python.org/issue27758> ___