Josh Rosenberg <shadowranger+pyt...@gmail.com> added the comment:

Victor found the same bug I found while I was composing this; I'm posting 
anyway to incorporate my proposed solution:

I *think* I have a cause for this, but someone else with a greater 
understanding of the cycle collector should check whether the suggested fix has 
non-trivial performance implications (I suspect the answer is no; performance 
is unaffected).

slice_richcompare borrows its behavior from tuple by creating a temporary tuple 
for each slice, then delegating to the tuple comparison ( 
https://github.com/python/cpython/blob/master/Objects/sliceobject.c#L591 ).
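The delegation is observable from pure Python: slices compare exactly as their 
(start, stop, step) tuples do, including ordering. A quick sanity check (no 
CPython internals needed):

```python
# Slice comparison is defined by comparing (start, stop, step) as tuples.
s1, s2 = slice(1, 10, 2), slice(1, 10, 3)

assert (s1 == s2) == ((1, 10, 2) == (1, 10, 3))  # both comparisons agree
assert slice(1, 10, 2) == slice(1, 10, 2)        # equal components -> equal
assert slice(1, 10, 2) < slice(1, 10, 3)         # ordering is tuple-like too
```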

The problem is that it uses borrowed references when creating said tuples, not 
owned references. Because test_slice's BadCmp.__eq__ is implemented in Python, 
the comparison can be interrupted by cycle collection during the __eq__ call. 
When that happens, there are precisely two references to the BadCmp object:

1. In the slice (owned)
2. In the temporary tuple (borrowed)

When a cycle collection occurs during the comparison, and subtract_refs ( 
https://github.com/python/cpython/blob/master/Modules/gcmodule.c#L398 ) is 
called, the BadCmp object in question is visited via both the slice and the 
tuple, and since it has no non-container objects referencing it, it ends up 
with the initial reference count of 1 attempting to drop to -1, and the 
assertion is violated. While the code of gcmodule.c appears to have been 
refactored since 3.7 so the assert occurs in a different function, with a 
slightly different message, it would break the same way in both 3.7 and master, 
and whether or not it triggers the bug, the broken behavior of 
slice_richcompare hasn't changed for a *long* time. 
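A minimal sketch of the triggering pattern (modeled on test_slice's BadCmp; 
the class body here is my own illustration, not the test's exact code). On a 
release build this runs without visible error; only a --with-pydebug build 
hitting a collection at the wrong moment would trip the assertion:

```python
import gc

class BadCmp:
    def __eq__(self, other):
        # Force a cycle collection in the middle of the comparison; on an
        # unfixed debug build, the temporary tuples holding only borrowed
        # references to self could trip the gcmodule.c refcount assertion.
        gc.collect()
        return False

# Comparing the slices delegates to tuple comparison, which calls
# BadCmp.__eq__ while the temporary tuples exist.
result = slice(None, None, BadCmp()) == slice(None, None, BadCmp())
```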

The underlying problem would seem to be slice's richcompare believing it's okay 
to make a tuple from borrowed references, then making a call on it that can 
trigger calls into Python-level code (and therefore into the cycle collector); 
everything else is behaving correctly here. I'm guessing the only reason this 
hasn't been seen in the wild is that slices built from Python-defined types are 
almost never compared at all, let alone compared on debug builds that check the 
assert, with cycle collection occurring frequently enough to make a hit likely.

The solution would be to stop trying to micro-optimize slice_richcompare to 
avoid reference count manipulation and just build proper tuples. It would even 
simplify the code, since we could just use PyTuple_Pack, reducing custom code 
by replacing:

    t1 = PyTuple_New(3);
    if (t1 == NULL)
        return NULL;
    t2 = PyTuple_New(3);
    if (t2 == NULL) {
        Py_DECREF(t1);
        return NULL;
    }

    PyTuple_SET_ITEM(t1, 0, ((PySliceObject *)v)->start);
    PyTuple_SET_ITEM(t1, 1, ((PySliceObject *)v)->stop);
    PyTuple_SET_ITEM(t1, 2, ((PySliceObject *)v)->step);
    PyTuple_SET_ITEM(t2, 0, ((PySliceObject *)w)->start);
    PyTuple_SET_ITEM(t2, 1, ((PySliceObject *)w)->stop);
    PyTuple_SET_ITEM(t2, 2, ((PySliceObject *)w)->step);

with:

    t1 = PyTuple_Pack(3, ((PySliceObject *)v)->start,
                         ((PySliceObject *)v)->stop,
                         ((PySliceObject *)v)->step);
    if (t1 == NULL)
        return NULL;
    t2 = PyTuple_Pack(3, ((PySliceObject *)w)->start,
                         ((PySliceObject *)w)->stop,
                         ((PySliceObject *)w)->step);
    if (t2 == NULL) {
        Py_DECREF(t1);
        return NULL;
    }

and makes cleanup simpler, since you can just delete:

    PyTuple_SET_ITEM(t1, 0, NULL);
    PyTuple_SET_ITEM(t1, 1, NULL);
    PyTuple_SET_ITEM(t1, 2, NULL);
    PyTuple_SET_ITEM(t2, 0, NULL);
    PyTuple_SET_ITEM(t2, 1, NULL);
    PyTuple_SET_ITEM(t2, 2, NULL);

and let the DECREFs for t1/t2 do their work normally (PyTuple_Pack gives the 
tuples owned references, so decrefing the tuples releases the items correctly).

If for some reason the reference count manipulation is unacceptable, this 
*could* switch between two behaviors depending on whether or not 
start/stop/step are of known types (e.g. if all are NoneType/int, the borrowed 
refs code path could be used safely) where a call back into Python-level code 
is impossible; given that slices are usually made of None and/or ints, this 
would remove most of the cost for the common case, at the expense of more 
complicated code. It wouldn't help numpy types though, and I suspect the cost 
of pre-checking the types of all six values involved would eliminate most of 
the savings.
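As a rough illustration of that type check (Python pseudocode for what the C 
fast path would test; the helper name is hypothetical):

```python
# Hypothetical sketch: the borrowed-refs fast path would apply only when no
# slice component can call back into Python-level code during comparison.
SAFE_TYPES = (type(None), int)

def borrowed_refs_safe(v, w):
    # True only if all six start/stop/step values are exactly None or int,
    # so tuple comparison cannot run arbitrary __eq__ code.
    return all(type(x) in SAFE_TYPES
               for x in (v.start, v.stop, v.step, w.start, w.stop, w.step))

assert borrowed_refs_safe(slice(1, 2), slice(None, 5, 2))      # all None/int
assert not borrowed_refs_safe(slice(1.0, 2), slice(None))      # float present
```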

Sorry for not submitting a proper PR; the work machine I use during the day is 
not suitable for development (doesn't even have Python installed).

----------
nosy: +josh.r

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue35961>
_______________________________________