Robert Bradshaw wrote:
> On Jun 15, 2009, at 9:12 AM, Dag Sverre Seljebotn wrote:
> 
>> Thanks to everybody who contributed to the discussion on a Cython  
>> array
>> type last week! Here's a summary to attempt focusing the discussion.
>>
>> There are now two CEPs:
>> - CEP 517, array type: http://wiki.cython.org/enhancements/array
>> - CEP 518, SIMD operations: http://wiki.cython.org/enhancements/simd
>>
>> I mostly just added a "what does this facilitate" section near the
>> beginning of each, and the multidimensional aspect of the arrays has
>> been emphasised. No need to reread them.
> 
> Looks good. I assume this supersedes http://wiki.cython.org/ 
> enhancements/buffersyntax ; are there any other wiki pages that are  
> made obsolete by these proposals?

It is connected with http://wiki.cython.org/enhancements/arraytypes, 
although not all points from that page are covered (like conversion to 
list); furthermore, it should perhaps be made into your and Stefan's 
proposed type (which you called [int] or int[]) instead, with + for 
concatenation.

BTW, those list-like types can likely share a lot of implementation with 
my proposed int[:]; the major changes would be restricting it to 1D, 
different arithmetic behaviour, and, say, default coercion to list 
instead of memoryview (perhaps!, but let's not go there now). But it 
could still be PEP 3118-backed, coercible from C pointers, etc., and 
share implementation for that. Basically it would be two different 
frontends to the same underlying type.

> I still have some questions, but I am certainly in favor of something  
> like this happening.

Yes, I didn't intend to settle all the questions now, just to focus the 
discussion.

I suppose I'm still waiting for Stefan's opinion though, given his 
comment last week. If it is positive, I think Kurt and I can do the main 
work of hammering out the details, though of course you can comment as 
much (or as little) as you want.

(I notice that I'm promoted to lead developer on the Cython front page 
-- thanks! -- but I don't take it for granted that it should be a case 
of majority vote, and at any rate I'd never push for something which 
would make Stefan less interested in the project.)

> One thing that isn't quite clear is how exactly the reference  
> counting/memory allocation is going to work. You give an example of  
> explicitly creating an int[:,:] via int[:100,:100](). Would some kind  
> of memoryview be created in the background? A string to hold the  
> data? (This doesn't have to be decided now, just curious.) This is  
> also needed for the copy "method," or any implicit copying that happens.
>
> On a related note, it's still a bit unclear how these things can be
> passed around and stored. Are they just a Py_buffer + PyObject*? (I'm
> hoping you're thinking they can be passed around and stored with
> ease, with allocation either taken care of by the corresponding object
> (which will clean up the memory when it gets collected) or, if there
> is no object attached, the user needs to treat it as they would a raw
> pointer.)

I skipped over it as it is a long story. If you really are curious:

An int[:] is a pass-by-value struct, containing subslice information and 
a reference to an acquired view, which in turn is acquired from a 
memory-holding object.

This seems heavy but really is necessary due to how PEP 3118 works. 
("The only problem which can't be solved by another layer of indirection 
is too many layers of indirection.")

In detail:

There are three levels:
1) Memory-holding object. When Cython allocates, we need a new type 
(probably inheriting directly from object) which allocates memory, 
stores shape/stride information, and returns the right information from 
bf_getbuffer. This would be a 20-liner in Cython, unless we want to go 
with PyVarObject for allocation, in which case I suppose it is a 
200-liner in C.

2) The acquired Py_buffer on that object. That would happen the same way 
as with any array; preferably by using memoryview, or, if not, another 
custom Python type (we need to backport memoryview anyway), or, if we 
can't avoid it, a custom refcounted struct.

3) Accessing the Py_buffer directly is too inefficient, so it must be 
unpacked into a custom struct on the stack which basically holds 
shape/stride information and a reference to the Py_buffer-holding thing 
in 2). This is the actual variable/temporary type, and it is passed by 
value.

When taking a slice, the struct in 3) is copied (while adjusting the 
shape/strides), increfing the view in 2) in the process; 2) in turn 
holds on to 1).

Then there are fields and global variables, which call for separate 
decisions; probably the most consistent option, and the most efficient 
in both speed and memory, is to store the structs of 3), although it is 
a bit counter-intuitive.
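To make the three levels concrete, here is a hypothetical pure-Python 
model of the scheme (the names MemoryOwner, AcquiredView and MemorySlice 
are mine, not from any CEP, and the real thing would of course be C 
structs and Py_buffer, not Python classes):

```python
class MemoryOwner:
    """Level 1: owns the allocation and knows its shape/strides."""
    def __init__(self, shape):
        self.shape = tuple(shape)
        # C-contiguous strides, measured in elements for simplicity
        strides, acc = [], 1
        for extent in reversed(self.shape):
            strides.append(acc)
            acc *= extent
        self.strides = tuple(reversed(strides))
        self.data = [0] * acc


class AcquiredView:
    """Level 2: a refcounted acquired view on the owner (stands in
    for the Py_buffer held by a memoryview-like object)."""
    def __init__(self, owner):
        self.owner = owner
        self.refcount = 1

    def incref(self):
        self.refcount += 1

    def decref(self):
        self.refcount -= 1


class MemorySlice:
    """Level 3: the pass-by-value struct stored in variables and
    temporaries: unpacked shape/strides plus a view reference."""
    def __init__(self, view, shape, strides, offset=0):
        self.view = view
        self.shape = tuple(shape)
        self.strides = tuple(strides)
        self.offset = offset

    def slice_axis0(self, start, stop):
        """Subslicing copies the struct, adjusts shape/offset, and
        increfs the underlying view -- no data is copied."""
        self.view.incref()
        new_shape = (stop - start,) + self.shape[1:]
        new_offset = self.offset + start * self.strides[0]
        return MemorySlice(self.view, new_shape, self.strides, new_offset)


# Allocation chains the three levels: owner -> view -> stack struct.
owner = MemoryOwner((100, 100))
view = AcquiredView(owner)
a = MemorySlice(view, owner.shape, owner.strides)

b = a.slice_axis0(10, 20)   # like a[10:20, :]
```

The point of the sketch is just that level 3 is cheap to copy around by 
value, while the refcount on level 2 keeps level 1 alive for as long as 
any slice exists.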

> Until CEP 518, how would arithmetic happen? (Assuming I'm too lazy to  
> write the loops myself) would I create two numpy objects to wrap  
> them, add those objects, and then "unpack" the result? For large  
> enough datasets, that's not too much overhead.

I suppose the real answer is that if you're too lazy to write the loops 
yourself, you'll be using ndarray[int] in the first place :-)

But yes, you can do

cdef int[:] a = ..., b = ...
a = np.array(a) + b

(Or, I'm not 100% sure of that, but at least

a = np.array(a) + np.array(b)

will work.)

Even before NumPy supports PEP 3118 this can work, as we can coerce 
int[:] to our own subclass of memoryview which implements NumPy's 
__array__ special method.

-- 
Dag Sverre
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
