Re: [Cython] Cython array type: Summary, introducing CEP 518

Dag Sverre Seljebotn Wed, 17 Jun 2009 14:27:05 -0700

Stefan Behnel wrote:
>> There are three levels:
>> 1) Memory-holding object. When Cython allocates, we need a new type 
>> (probably inheriting directly from object), which allocates memory and 
>> stores shape/stride information, and returns the right information in 
>> tp_getbuffer. Would be a 20-liner in Cython unless we want to go with 
>> PyVarObject for allocation in which case I suppose it is a 200-liner in C.
> 
> I'm all for keeping such an implementation in Cython, at least for now. We
> can always optimise/reimplement later.


Sounds right. It would also motivate doing something about Cython 
support for PyVarObject (perhaps a magic type you'd inherit from, which 
has a constructor taking a size and a void* field pointing to the 
allocated buffer? So #152 would get us 90% of the way.)

>> 2) The acquired Py_buffer on that object.
> 
> Erm, buffers are not memory managed, right? They are just the plain structs
> that I think of?

Yes, they need to be wrapped in an object of some kind.

> 
>> That would happen the same way 
>> as with every array; preferably by using memoryview, or if not, another 
>> custom Python type (need to backport memoryview anyway) or if we can't 
>> seem to avoid it a custom refcounted struct.
> 
> If we implement the object ourselves, what do we need memoryview for? Would
> it not just implement the buffer protocol? Without any ref-counting? And
> when coercing to an object, why not just return the object itself? It could
> just have a C-level interface and a Python interface.

I don't quite follow you. But here are some points I think are relevant:

a) The memory will often live in objects from 3rd party libraries 
accessed through PEP 3118. It seems cleanest to have Cython-allocated 
memory just be another such object, treated in the same fashion.

b) If you load 10 GB of data into memory in one array (yes some numerics 
users do that), copying is usually not an option; copying must always be 
easily predictable and user-managed.

c) When taking slices, you get a new view of the same data. However, the 
original Py_buffer, likely stored in a memoryview object, cannot be 
modified under the protocol, i.e. the new slice needs to be "stored" 
elsewhere, and the original object can't be told to reallocate/reslice.
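
Point c) can be illustrated with CPython's own memoryview, which is only 
an analogy for the proposed Cython type and not its implementation. In 
this minimal sketch the names data, full and part are made up for 
illustration: slicing gives a new view object with its own shape and 
strides, the original view is left untouched, and both refer to the same 
memory-holding object.

```python
from array import array

# Memory-holding object exporting its data through PEP 3118.
data = array("i", range(10))

# The acquired buffer, wrapped in an object.
full = memoryview(data)

# Slicing does not modify `full`; it produces a new view object that
# carries its own shape/stride information.
part = full[2:8:2]

full_shape = full.shape          # still describes all 10 items
part_shape = part.shape          # describes the 3 selected items
part_strides = part.strides      # two items per step, expressed in bytes

# Writing through the slice reaches the original memory: no copy was made.
part[0] = 100
```

Here part.obj is the same array as full.obj, so the slice keeps the 
owner alive without the owner knowing anything about the slice.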

> I also read in CEP 517 that resizing is not to be supported. Why not? It
> could just fail with an exception when it notices that there are live
> buffers on it.

It just seemed kind of useless if only Cython-allocated memory can be 
resized, and not C arrays or memory in other Python objects. But we 
could do it for Cython-allocated memory, perhaps by extending PEP 3118 
with some Cython-specific flags etc.

I'd rather leave it for later, but it would be cool to have. It's a 
matter of developer time vs. utility too.
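
For comparison, CPython's bytearray already behaves the way Stefan 
suggests: resizing fails with an exception while buffers are exported. 
The sketch below only shows that existing behaviour (the variable names 
are made up); it says nothing about how a Cython-allocated array would 
implement it.

```python
buf = bytearray(b"12345678")
view = memoryview(buf)           # a live buffer on `buf`

try:
    buf.extend(b"9")             # resize attempt while the view is alive
    resized_while_exported = True
except BufferError:
    # CPython refuses to resize an object with existing exports.
    resized_while_exported = False

view.release()                   # drop the live buffer
buf.extend(b"9")                 # now the resize succeeds
```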

>> 3) Accessing Py_buffer directly is too inefficient, so it must be 
>> unpacked in to a custom struct on the stack which basically holds 
>> shape/stride information and a reference to the Py_buffer-holding thing 
>> in 2). This is the actual variable/temporary type, and is passed-by-value.
>>
>> When taking a slice, the struct in 3) is copied (while adjusting the 
>> shape/strides), increfing the view 2) in the process, and 2) holds on to 1).
> 
> So a slice would be a view? Because if it was a copy, you'd need a separate
> memory object to back it. A view would be easiest handled by the normal
> memoryview(), though, right? And I find a copy much less surprising...

Sorry, another NumPy-ism which I forgot wasn't there in core Python. 
I'll fix the CEP at some point. Yes, slices are views and absolutely 
have to be. Slices being natural and efficient views to work with is 
kind of one of the main points. A NumPy example for you:

arr[start:end:32, 3] += value * arr[start:end:32, 1]

is much more convenient than

for i in range(start, end, 32):
     arr[i, 3] += value * arr[i, 1]

(it may not look more convenient, but when you do it 50 times a day, 
with bigger and more complicated expressions, it really is)

Another point is that it is the only way of sanely manipulating very big 
arrays (which BTW can be memory mapped -- a single 10 GB array is rather 
common I think).
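
To make the three levels and the slice-as-view behaviour concrete, here 
is a rough pure-Python sketch. Everything in it is hypothetical: 
ArrayView, slice_rows and flat_index are illustrative names, strides are 
counted in items rather than bytes, and bounds are not clamped. In 
Cython, level 3 would be a C struct on the stack, not a Python object.

```python
from array import array
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayView:
    """Stand-in for the by-value struct of level 3 (2-D only)."""
    holder: memoryview  # level 2: wraps the buffer, keeps level 1 alive
    offset: int         # flat index of element [0, 0]
    shape: tuple        # (rows, cols)
    strides: tuple      # steps in items, not bytes

    def flat_index(self, i, j):
        return self.offset + i * self.strides[0] + j * self.strides[1]


def slice_rows(view, start, stop, step):
    """Copy the struct with adjusted shape/strides; no data is copied.

    Assumes 0 <= start <= stop <= rows and step > 0.
    """
    rows = len(range(start, stop, step))
    return ArrayView(
        holder=view.holder,                         # shared, not copied
        offset=view.offset + start * view.strides[0],
        shape=(rows, view.shape[1]),
        strides=(view.strides[0] * step, view.strides[1]),
    )


storage = array("d", range(12))                     # level 1: 4 x 3 doubles
arr = ArrayView(memoryview(storage), 0, (4, 3), (3, 1))

sub = slice_rows(arr, 1, 4, 2)                      # rows 1 and 3
value = 10.0
for i in range(sub.shape[0]):
    # Same effect as: arr[1:4:2, 2] += value * arr[1:4:2, 1]
    sub.holder[sub.flat_index(i, 2)] += value * sub.holder[sub.flat_index(i, 1)]
```

The update made through sub is visible in storage, because the slice 
shares the holder and only the small struct was copied.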

-- 
Dag Sverre
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
