Nathaniel Smith added the comment:
Actually, I fail at grep, and actually there are already tons of good tests for
__bases__ assignment in test_descr.py. So, patch attached, and analysis follows.
Background on __class__ assignment
----------------------------------
While it's certainly surprising, assignment to obj.__class__ and to
type.__bases__ has been an explicitly supported feature ever since the advent
of new-style classes back in 2.ancient, and typeobject.c goes to great lengths
to handle all the weird resulting cases (including stuff like "what if __del__
changes the __class__", "what if someone mutates __bases__ while we are in the
middle of method lookup inside a metaclass's mro() method", etc.).
To make this safe, of course, we need to somehow check that after making the
assignment then the resulting object will still be a valid, self-consistent
Python object. The types in question need to have consistent in-memory
representations, that their __dict__ slot (if present) is at the same offset,
that their memory management callbacks are consistent (if we mutate an object
from type A to type B, then A and B must be written in such a way that it's
legal to pass an object allocated with A.tp_new to B.tp_dealloc), etc.
This compatibility checking occurs in two places: object_set_class and
type_set_bases (and several utility functions that they share).
For historical reasons, there are two different C representations for type
objects in Python: the original representation, which involves
statically-allocated structs, and the newer representation, which involves
heap-allocated structs. Aside from their allocation, there are a bunch of
arcane little differences in things like reference counting, how you set them,
etc., which are mostly motivated by the need to maintain compatibility with old
code using the original representation -- all the places where they differ are
places where the newer representation does something more sensible, it's not
possible to create a statically-allocated type except by writing C code that
*doesn't* use the stable ABI, etc. Otherwise they're supposed to work the same
(eliminating differences between built-in and user-defined classes was the
whole point of new-style classes!), but all these quirky implementation
differences do leak out into little semantic differences in various places.
One of the places where they've historically leaked out is in __class__ and
__bases__ assignment. When the compatibility-checking code for types was
originally written, it punted on trying to handle the quirky differences
between statically-allocated and heap-allocated type objects, and just declared
that all statically-allocated types were incompatible with each other.
The previous patch
------------------
The patch that is causing all the debate here was reviewed in issue22986, after
extensive discussion on python-dev, and the actual diff can be seen here:
https://hg.python.org/cpython/rev/c0d25de5919e/
It did two things:
1) It cleaned up a bunch of nasty stuff in the compatibility-checking and
assignment code. This fixed some real bugs (e.g., a case where incompatible
classes could be considered compatible based on the contents of random memory
found by following bogus pointers), and in the process made it robust enough to
handle all types, both statically-allocated and heap-allocated.
2) It removed the check in object_set_class that protected the downstream code
from ever seeing statically-allocated types - that's the 'if' check here:
https://hg.python.org/cpython/rev/c0d25de5919e/#l3.97
As far as we know, this is all working exactly as designed! For example,
Serhiy's int subclass at the beginning of the thread is compatible with the int
class in the sense that the modified version of the '42' object is still a
valid Python object, all its fields are valid, it won't crash the interpreter
when deallocated, etc.
But of course what we didn't think of is that the interpreter assumes that
instances of some particular built-in types are truly immutable, and the
interpreter's internal invariants are violated if you can in any way modify an
'int' object in place. Doh.
This new patch
--------------
So the attached patch modifies the original patch by leaving the cleanups in
place, but puts back a guard in object_set_class that disallows __class__
assignment for statically-allocated types EXCEPT that it still allows it
specifically in the case of ModuleType subclasses. It also adds a big comment
explaining the purpose of the check, and adds tests to make sure that __class__
assignment is disallowed for builtin immutable types.
Analysis of the mergeability of this patch:
- We still believe that the actual compatibility checking code is strictly
better than it used to be, so in any case where these parts of the changes
affect Python's semantics, the new results should be better (i.e. less likely
to break memory safety and crash the interpreter).
- The new guard added by this patch is conservative, but allows a strict
superset of what 3.4 and earlier allowed, so it won't break any existing code.
- The one way that the proposed new guard is less conservative than the one in
3.4 is that the new one allows ModuleType objects through. Since we believe
that the compatibility-checking code is now correct for statically-allocated
types, this should be fine from a memory-safety point of view, and because the
interpreter definitely does not assume that ModuleType objects are immutable,
then this should be safe from that perspective as well.
Larry also asked for a "regression test that tried mutating fields on a bunch
of interned objects". I'm not sure which fields he was thinking of in
particular, but the only entry points that lead to code touched by the original
patch are attempts to set __class__ or to set __bases__. *__bases__assignment
has always been illegal on statically-allocated types, and has remained illegal
through all of these patches*, plus it already has a bunch of tests, so I left
it alone. This patch adds tests for __class__ assignment on all the
potentially-interned types that I could think of.
So..... I think that covers all the bases about what this patch is and why it
is appropriate for 3.5.0 (assuming that Larry sticks with treating this as a
release-critical bug). I'll post a follow-up with replies to a few specific
side comments that people made.
----------
keywords: +patch
Added file: http://bugs.python.org/file40310/issue24912.patch
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue24912>
_______________________________________
_______________________________________________
Python-bugs-list mailing list
Unsubscribe:
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com