Eryk Sun added the comment:

This is a consequence of several factors. It starts with the __init__ method of 
ctypes.Array, Array_init. This function doesn't hard-code a call to the base 
sq_ass_item slot function, Array_ass_item; if it did, it wouldn't be nearly as 
slow. Instead it calls the abstract API PySequence_SetItem for each positional 
argument. Doing it this way accommodates array subclasses that override 
__setitem__. 
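
Here's a minimal sketch of the kind of subclass this design supports; the 
LoggingArray name and the logging behavior are purely illustrative. Because 
Array_init goes through PySequence_SetItem, the override is honored for each 
positional argument:

    import ctypes

    class LoggingArray(ctypes.c_int * 4):
        def __setitem__(self, index, value):
            # Array_init reaches this override via PySequence_SetItem.
            print('set', index, '->', value)
            super().__setitem__(index, value)

    a = LoggingArray(10, 20, 30, 40)  # prints one line per positional argument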

What I'd like to do here is check whether the sq_ass_item slot is set to 
Array_ass_item and, if so, call it directly instead of PySequence_SetItem. But 
it turns out that the slot isn't Array_ass_item even when the subclass doesn't 
override __setitem__, and that, more than anything, is the real culprit for the 
relative slowness of Array_init.

If a built-in type such as ctypes.Array defines both mp_ass_subscript and 
sq_ass_item, then the __setitem__ wrapper_descriptor wraps the more generic 
mp_ass_subscript slot function. Then for a subclass, update_one_slot in 
Objects/typeobject.c plays it safe when updating the sq_ass_item slot. It sees 
that the inherited __setitem__ descriptor doesn't call wrap_sq_setitem, so it 
defines the slot in the subclass to use the generic function slot_sq_ass_item. 

This generic slot function takes the long way around: it looks up and binds the 
__setitem__ method, converts the Py_ssize_t index to a Python integer, and 
calls the wrapper, which in turn calls the mp_ass_subscript slot. To add insult 
to injury, the implementation of that slot for a ctypes array, 
Array_ass_subscript, has to convert the index back to a Py_ssize_t via 
PyNumber_AsSsize_t.
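
Part of this is visible from Python. A small check (assuming current CPython 
behavior) shows that an array type created by multiplication doesn't define its 
own __setitem__; it inherits the slot wrapper from ctypes.Array, which is what 
update_one_slot inspects:

    import ctypes

    A = ctypes.c_int * 8
    # The multiplied type has no __setitem__ of its own; it inherits the slot
    # wrapper that ctypes.Array defines for Array_ass_subscript.
    print('__setitem__' in vars(A))                    # False
    print(A.__setitem__.__objclass__ is ctypes.Array)  # True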

I don't know if this can be resolved while preserving the generic design of the 
initializer. As it stands, calling PySequence_SetItem in a tight loop is 
ridiculously slow. I experimented with calling Array_ass_item directly; with 
that change, initialization is as fast as assigning to a slice of the whole 
array. With a list it's a bit slower, because *t has to be copied into a tuple 
for the call, but it takes about the same time as slice assignment when t is 
already a tuple, such as tuple(range(1000000)).
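
To give a sense of the gap on a stock build (no patch), a rough comparison 
along the following lines shows the difference between the constructor and 
slice assignment; exact numbers vary by machine and build, so treat it only as 
a sketch:

    import ctypes
    import timeit

    n = 100000
    t = tuple(range(n))
    A = ctypes.c_uint32 * n

    # Constructor: Array_init calls PySequence_SetItem once per element.
    print(timeit.timeit(lambda: A(*t), number=10))

    # Slice assignment: one Array_ass_subscript call consumes the whole tuple.
    def via_slice():
        a = A()
        a[:] = t

    print(timeit.timeit(via_slice, number=10))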

I doubt any amount of tweaking will make ctypes as fast as an array.array. 
ctypes has a generic design in order to accommodate simple C data, pointers, 
and aggregates (arrays, structs, and unions), and that comes with some cost to 
performance. However, where performance is critical, you can and should use the 
buffer protocol to work with arrays from the array module or numpy. It's 
trivial to create a ctypes array from an object that supports the buffer 
protocol. For example: 

    v = array.array('I', t)
    a = (ctypes.c_uint32 * len(v)).from_buffer(v)

There's no need to use the array.array's buffer_info() or ctypes.cast(). The 
from_buffer() method creates an array that shares the buffer of the source 
object, so it's relatively fast. It also returns a sized array instead of a 
lengthless pointer (though it is possible to cast to an array pointer and 
immediately dereference the array, as shown below).
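
Continuing the snippet above: a quick demonstration of the shared buffer, plus 
the cast-and-dereference alternative mentioned parenthetically (the names addr 
and b are just for illustration; from_buffer() remains the simpler route):

    a[0] = 123
    print(v[0])    # 123 -- a and v share the same memory

    # Alternative: cast the buffer address to a pointer to a sized array type
    # and dereference it. Unlike from_buffer(), this doesn't keep v alive.
    addr = v.buffer_info()[0]
    b = ctypes.cast(addr, ctypes.POINTER(ctypes.c_uint32 * len(v)))[0]
    print(b[0])    # also 123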

----------
nosy: +eryksun

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue27926>
_______________________________________