Eryk Sun added the comment:
This is a consequence of several factors. It starts with the __init__ method of
ctypes.Array, Array_init. This function doesn't hard-code calling the base
sq_ass_item slot function, Array_ass_item. If it did, it wouldn't be nearly as
slow. Instead it calls the abstract function PySequence_SetItem. Doing it this
way accommodates an array subclass that overrides __setitem__.
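For example, a subclass along these lines (a hypothetical illustration, not from the issue) gets its override called for every constructor argument:

import ctypes

class ClampedArray(ctypes.c_uint8 * 4):
    # Clamp each value into 0..255 before storing it.
    def __setitem__(self, index, value):
        super().__setitem__(index, max(0, min(value, 255)))

a = ClampedArray(10, 20, 300, -5)   # Array_init routes each value through __setitem__
print(list(a))                      # [10, 20, 255, 0]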
What I'd like to do here is check whether the sq_ass_item slot is defined as
Array_ass_item, and if so call it directly instead of PySequence_SetItem. But
it turns out that in a subclass the slot isn't set to Array_ass_item even when
the subclass doesn't override __setitem__, and this, more than anything, is the
real culprit for the relative slowness of Array_init.
If a built-in type such as ctypes.Array defines both mp_ass_subscript and
sq_ass_item, then the __setitem__ wrapper_descriptor wraps the more generic
mp_ass_subscript slot function. Then for a subclass, update_one_slot in
Objects/typeobject.c plays it safe when updating the sq_ass_item slot. It sees
that the inherited __setitem__ descriptor doesn't call wrap_sq_setitem, so it
defines the slot in the subclass to use the generic function slot_sq_ass_item.
This generic slot function goes the long way around: it looks up and binds the
__setitem__ method, converts the Py_ssize_t index to a Python integer, and then
calls the wrapper, which in turn calls the mp_ass_subscript slot. To add insult
to injury, the implementation of that slot for a ctypes Array,
Array_ass_subscript, has to convert the index back to a Py_ssize_t via
PyNumber_AsSsize_t.
I don't know if this can be resolved while preserving the generic design of the
initializer. As is, calling PySequence_SetItem in a tight loop is ridiculously
slow. I experimented with calling Array_ass_item directly, and with that change
the constructor is about as fast as assigning to a slice of the whole array.
With a list it's a bit slower, because *t has to be copied to a tuple for the
call, but when t is already a tuple, such as tuple(range(1000000)), it takes
about the same amount of time as slice assignment.
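Roughly, the comparison looks like the following sketch (the helper name and
sizes are mine), pitting per-item initialization against slice assignment on an
unpatched interpreter:

import ctypes
import timeit

t = tuple(range(100000))
ArrType = ctypes.c_uint32 * len(t)

# Per-item path: Array_init calls PySequence_SetItem once per value.
print(timeit.timeit(lambda: ArrType(*t), number=10))

# Slice assignment: a single Array_ass_subscript call covers the whole array.
def fill():
    a = ArrType()
    a[:] = t

print(timeit.timeit(fill, number=10))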
I doubt any amount of tweaking will make ctypes as fast as an array.array.
ctypes has a generic design that accommodates simple C data, pointers, and
aggregates (arrays, structs, and unions), and that generality comes with some
cost to performance. However, where performance is critical you can and should
rely on the buffer protocol and use arrays from the array module or numpy. It's
trivial to create a ctypes array from an object that supports the buffer
protocol. For example:
import array, ctypes

v = array.array('I', t)                           # t: the source tuple from above
a = (ctypes.c_uint32 * len(v)).from_buffer(v)     # shares v's buffer
There's no need to use the array.array's buffer_info() or ctypes.cast(). The
from_buffer() method creates an array that shares the buffer of the source
object, so it's relatively fast. It also returns a sized array instead of a
lengthless pointer (though with cast() it is possible to cast to an array
pointer and immediately dereference the array).
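For concreteness, continuing from the snippet above (the names addr, n, p, and
b are mine), the buffer sharing and the cast-and-dereference alternative look
roughly like this:

a[0] = 42
print(v[0])       # 42: the ctypes array shares v's buffer

# The cast-and-dereference alternative mentioned above:
addr, n = v.buffer_info()
p = ctypes.cast(addr, ctypes.POINTER(ctypes.c_uint32 * n))
b = p.contents    # also a sized array over the same memory (but it doesn't keep v alive)
print(b[0])       # 42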
----------
nosy: +eryksun
_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue27926>
_______________________________________