Nathaniel Smith added the comment:

Actually, I fail at grep, and actually there are already tons of good tests for 
__bases__ assignment in test_descr.py. So, patch attached, and analysis follows.


Background on __class__ assignment
----------------------------------

While it's certainly surprising, assignment to obj.__class__ and to 
type.__bases__ has been an explicitly supported feature ever since the advent 
of new-style classes back in 2.ancient, and typeobject.c goes to great lengths 
to handle all the weird resulting cases (including stuff like "what if __del__ 
changes the __class__", "what if someone mutates __bases__ while we are in the 
middle of method lookup inside a metaclass's mro() method", etc.).

To make this safe, of course, we need to somehow check that after making the 
assignment then the resulting object will still be a valid, self-consistent 
Python object. The types in question need to have consistent in-memory 
representations, that their __dict__ slot (if present) is at the same offset, 
that their memory management callbacks are consistent (if we mutate an object 
from type A to type B, then A and B must be written in such a way that it's 
legal to pass an object allocated with A.tp_new to B.tp_dealloc), etc.

This compatibility checking occurs in two places: object_set_class and 
type_set_bases (and several utility functions that they share).

For historical reasons, there are two different C representations for type 
objects in Python: the original representation, which involves 
statically-allocated structs, and the newer representation, which involves 
heap-allocated structs. Aside from their allocation, there are a bunch of 
arcane little differences in things like reference counting, how you set them, 
etc., which are mostly motivated by the need to maintain compatibility with old 
code using the original representation -- all the places where they differ are 
places where the newer representation does something more sensible, it's not 
possible to create a statically-allocated type except by writing C code that 
*doesn't* use the stable ABI, etc. Otherwise they're supposed to work the same 
(eliminating differences between built-in and user-defined classes was the 
whole point of new-style classes!), but all these quirky implementation 
differences do leak out into little semantic differences in various places.

One of the places where they've historically leaked out is in __class__ and 
__bases__ assignment. When the compatibility-checking code for types was 
originally written, it punted on trying to handle the quirky differences 
between statically-allocated and heap-allocated type objects, and just declared 
that all statically-allocated types were incompatible with each other.


The previous patch
------------------

The patch that is causing all the debate here was reviewed in issue22986, after 
extensive discussion on python-dev, and the actual diff can be seen here:
    https://hg.python.org/cpython/rev/c0d25de5919e/

It did two things:

1) It cleaned up a bunch of nasty stuff in the compatibility-checking and 
assignment code. This fixed some real bugs (e.g., a case where incompatible 
classes could be considered compatible based on the contents of random memory 
found by following bogus pointers), and in the process made it robust enough to 
handle all types, both statically-allocated and heap-allocated.

2) It removed the check in object_set_class that protected the downstream code 
from ever seeing statically-allocated types - that's the 'if' check here:
    https://hg.python.org/cpython/rev/c0d25de5919e/#l3.97

As far as we know, this is all working exactly as designed! For example, 
Serhiy's int subclass at the beginning of the thread is compatible with the int 
class in the sense that the modified version of the '42' object is still a 
valid Python object, all its fields are valid, it won't crash the interpreter 
when deallocated, etc.

But of course what we didn't think of is that the interpreter assumes that 
instances of some particular built-in types are truly immutable, and the 
interpreter's internal invariants are violated if you can in any way modify an 
'int' object in place. Doh.


This new patch
--------------

So the attached patch modifies the original patch by leaving the cleanups in 
place, but puts back a guard in object_set_class that disallows __class__ 
assignment for statically-allocated types EXCEPT that it still allows it 
specifically in the case of ModuleType subclasses. It also adds a big comment 
explaining the purpose of the check, and adds tests to make sure that __class__ 
assignment is disallowed for builtin immutable types.

Analysis of the mergeability of this patch:

- We still believe that the actual compatibility checking code is strictly 
better than it used to be, so in any case where these parts of the changes 
affect Python's semantics, the new results should be better (i.e. less likely 
to break memory safety and crash the interpreter).

- The new guard added by this patch is conservative, but allows a strict 
superset of what 3.4 and earlier allowed, so it won't break any existing code.

- The one way that the proposed new guard is less conservative than the one in 
3.4 is that the new one allows ModuleType objects through. Since we believe 
that the compatibility-checking code is now correct for statically-allocated 
types, this should be fine from a memory-safety point of view, and because the 
interpreter definitely does not assume that ModuleType objects are immutable,  
then this should be safe from that perspective as well.

Larry also asked for a "regression test that tried mutating fields on a bunch 
of interned objects". I'm not sure which fields he was thinking of in 
particular, but the only entry points that lead to code touched by the original 
patch are attempts to set __class__ or to set __bases__. *__bases__assignment 
has always been illegal on statically-allocated types, and has remained illegal 
through all of these patches*, plus it already has a bunch of tests, so I left 
it alone. This patch adds tests for __class__ assignment on all the 
potentially-interned types that I could think of.

So..... I think that covers all the bases about what this patch is and why it 
is appropriate for 3.5.0 (assuming that Larry sticks with treating this as a 
release-critical bug). I'll post a follow-up with replies to a few specific 
side comments that people made.

----------
keywords: +patch
Added file: http://bugs.python.org/file40310/issue24912.patch

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue24912>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

Reply via email to