[Python-Dev] Re: Problems with dict subclassing performance

Jeff Allen Mon, 16 Aug 2021 01:58:05 -0700

On 06/08/2021 20:29, Marco Sulla wrote:

I've done an answer on SO about why subclassing `dict` makes the subclass so much slower than `dict`. The answer is interesting: https://stackoverflow.com/questions/59912147/why-does-subclassing-in-python-slow-things-down-so-much

What do you think about?

I have spent a lot of time reading typeobject.c over the years I've been looking at an alternative implementation. It's quite difficult to follow, and full of tweaks for special circumstances. So I'm impressed with the understanding that "user2357112 supports Monica" brings to the subject. (Yes, I want to call them Monica too, but I don't think that's their actual name.) I don't think I understand it better than they do, but here's my reading of that, informed by my reading of typeobject.c, in case it helps.

When a built-in type like dict is defined in C, pointers to its C implementation functions are hard-coded into slots in the type object. In order to make each appear as a method to Python, a descriptor is created when building the type that delegates to the slot (so sq_contains generates a descriptor __contains__ in the dictionary of the type).
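
A small sketch of what that looks like from the Python side, assuming CPython. It uses list rather than dict, because list (as far as I can tell) only fills the sq_contains slot and leaves the descriptor to the type builder:

```python
import types

# In CPython, list supplies containment only through the C slot, so the
# type builder synthesises a slot wrapper and stores it in the type's dict.
wrapper = vars(list)["__contains__"]
print(type(wrapper).__name__)
print(isinstance(wrapper, types.WrapperDescriptorType))

# Calling the descriptor explicitly delegates to the same C function
# that the `in` operator reaches through the slot.
items = [1, 2, 3]
print(wrapper(items, 2), 2 in items)
```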

Conversely, if in a sub-class you define __contains__, then the type builder will insert a function pointer in the slot of the new type that arranges a call to __contains__. This will overwrite whatever was in the slot.
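
The slot itself is not visible from Python, but its effect is: the `in` operator goes through the slot, and in the sub-class that now leads to the Python method. A minimal illustration (the class name Loud and its counter are made up for the example):

```python
class Loud(dict):
    """A dict sub-class that counts calls to its own __contains__."""

    calls = 0

    def __contains__(self, key):
        Loud.calls += 1
        return super().__contains__(key)


d = Loud(a=1)

# Neither test names __contains__ explicitly, yet both reach the method,
# because the sub-class slot now dispatches to the Python definition.
found = "a" in d
missing = "b" in d
print(found, missing, Loud.calls)
```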

In a C implementation, you can also define methods (by creating a PyMethodDef in the tp_methods table) that become descriptors in the dictionary of the type. You would not normally define both a C function to place in the slot *and* the corresponding method via a PyMethodDef. If you do, the version from the dictionary of the type will win the slot, *unless* you mark the method definition (in its PyMethodDef) as METH_COEXIST.

This exception is used in the special case of dict (and hardly anywhere else but set I think). I assume this is because some important code calls __contains__ via the descriptor, rather than via the slot (which would be quicker), and because an explicit definition is faster than a descriptor created automatically to wrap the slot.
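
One way to see where this has been done, sketched for CPython only. The C API documentation describes METH_COEXIST as loading the PyCFunction in place of the slot wrapper, so a plain method descriptor under a slot-backed name in a built-in type's dictionary is a fair hint that the flag was used. The SLOT_BACKED tuple and the helper name are my own choices for the illustration, not an exhaustive list:

```python
import types

# Hand-picked special names that are backed by a C slot on containers.
SLOT_BACKED = ("__contains__", "__getitem__", "__len__", "__iter__")


def explicitly_defined(cls):
    """Slot-backed names whose entry in cls.__dict__ is a PyMethodDef-style
    method descriptor rather than an automatically created slot wrapper."""
    namespace = vars(cls)
    return [
        name
        for name in SLOT_BACKED
        if isinstance(namespace.get(name), types.MethodDescriptorType)
    ]


for cls in (dict, set, list, str):
    print(cls.__name__, explicitly_defined(cls))
```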

Now, when you create a sub-class, the table of slots is copied first, then the type is checked for definitions of special methods, and these are allowed to overwrite the slot, unless they are slot wrappers on the same function pointer the slot already contains. I think at this point the slot is re-written to contain a wrapper on __contains__, which has been inherited from dict.__contains__, because it isn't a *slot wrapper* on the same function. For example:


    >>> dict.__contains__
    <method '__contains__' of 'dict' objects>
    >>> str.__contains__
    <slot wrapper '__contains__' of 'str' objects>

    >>> class S(str): pass
    >>> class D(dict): pass

    >>> S.__contains__
    <slot wrapper '__contains__' of 'str' objects>
    >>> D.__contains__
    <method '__contains__' of 'dict' objects>
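
To see whether this costs anything on a given interpreter, a rough timing sketch follows. The helper name and the repetition count are arbitrary, and the ratio will vary with the CPython version and the machine, so I quote no numbers:

```python
import timeit


class D(dict):
    """A dict sub-class that overrides nothing."""


def time_membership(mapping, number=200_000):
    """Return the seconds taken by `number` successful `in` tests."""
    return timeit.timeit(
        "key in mapping",
        globals={"key": "a", "mapping": mapping},
        number=number,
    )


plain = time_membership(dict(a=1))
sub = time_membership(D(a=1))
print(f"dict: {plain:.4f}s  D(dict): {sub:.4f}s  ratio: {sub / plain:.2f}")
```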

I think that when filling the slots of a sub-class, one could check for the METH_COEXIST flag at the point one checks to see whether the definition from look-up on the type is a PyWrapperDescr on the same pointer. One might have to know that the slot and descriptor come from the same base. I'm not suggesting this would be worthwhile.

FYI, in the approach I am toying with, the slot wrapper descriptor is always created from the function definition, then the slot is filled from the available definitions by lookup. Defining __contains__ twice would be impossible or an error. I think this has the semantics required by Python, but we'll have to wait for proof.
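
Purely as a toy model of that ordering, and not the actual implementation: every name here (ToyType, SLOT_NAMES, lookup) is hypothetical, and a single base stands in for a proper MRO.

```python
class ToyType:
    """Hypothetical model of a type: a namespace plus a table of slots."""

    # Assumed mapping from special method name to slot name.
    SLOT_NAMES = {"__contains__": "sq_contains", "__len__": "sq_length"}

    def __init__(self, name, definitions, base=None):
        self.name = name
        self.base = base
        self.namespace = {}
        # Step 1: every definition becomes an entry in the namespace.
        for special_name, func in definitions:
            if special_name in self.namespace:
                raise TypeError(f"{special_name} defined twice in {name}")
            self.namespace[special_name] = func
        # Step 2: each slot is filled by ordinary lookup along the bases.
        self.slots = {
            slot: self.lookup(special_name)
            for special_name, slot in self.SLOT_NAMES.items()
        }

    def lookup(self, special_name):
        """Return the first definition found on this type or its bases."""
        t = self
        while t is not None:
            if special_name in t.namespace:
                return t.namespace[special_name]
            t = t.base
        return None


def base_contains(obj, key):
    return key in obj


base = ToyType("base", [("__contains__", base_contains)])
sub = ToyType("sub", [], base=base)

# The sub-type's slot refers to the very same function as the base's slot.
print(sub.slots["sq_contains"] is base.slots["sq_contains"])
```

In this model an inherited definition lands in the sub-type's slot unchanged, so there is nothing to re-wrap, and a second definition of the same name is rejected when the type is built.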

-- Jeff Allen

Message archived at 
https://mail.python.org/archives/list/python-dev@python.org/message/ZKMVZ5M3V76SOZH7FOURQ66VFZQY2BTG/