Robert Bradshaw wrote:
> On Jun 15, 2009, at 9:12 AM, Dag Sverre Seljebotn wrote:
>
>> Thanks to everybody who contributed to the discussion on a Cython array
>> type last week! Here's a summary to attempt focusing the discussion.
>>
>> There are now two CEPs:
>> - CEP 517, array type: http://wiki.cython.org/enhancements/array
>> - CEP 518, SIMD operations: http://wiki.cython.org/enhancements/simd
>>
>> I mostly just added a "what does this facilitate" section near the
>> beginning of each, and the multidimensional aspect of the arrays has
>> been emphasised. No need to reread it.
>
> Looks good. I assume this supersedes
> http://wiki.cython.org/enhancements/buffersyntax ; are there any other
> wiki pages that are made obsolete by these proposals?
It is connected with http://wiki.cython.org/enhancements/arraytypes,
although not all points are in (like conversion to list), and furthermore
it should perhaps be made into your and Stefan's proposed type (which you
called [int] or int[]) instead, with + for concatenation.

BTW, those list-like types can likely share a lot of implementation with
my proposed int[:]; the major changes would be restricting it to 1D,
different arithmetic behaviour, and, say, default coercion to list
instead of memoryview (perhaps!, but let's not go there now). But it
could still be PEP 3118-backed, coercible from C pointers, etc., and
share implementation for that. Basically it would be two different
frontends to the same underlying type.

> I still have some questions, but I am certainly in favor of something
> like this happening.

Yes, I didn't intend to solve all the questions now, just focus the
discussion. I suppose I'm still waiting for Stefan's opinion though,
given his last comment last week. If positive, I think Kurt and I can do
the main work of hammering out the details, though of course you can
then comment as much (or little) as you want. (I notice that I'm
promoted to lead developer on the Cython front page -- thanks! -- but I
don't take it for granted that it should be a case of majority vote, and
at any rate I'd never push for something which would make Stefan less
interested in the project.)

> One thing that isn't quite clear is how exactly the reference
> counting/memory allocation is going to work. You give an example of
> explicitly creating an int[:,:] via int[:100,:100](). Would some kind
> of memoryview be created in the background? A string to hold the data?
> (This doesn't have to be decided now, just curious.) This is also
> needed for the copy "method," or any implicit copying that happens.
>
> On a related note, it's still a bit unclear how these things can be
> passed around and stored. Are they just a Py_buffer + PyObject*?
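To illustrate the "two different frontends to the same underlying type" idea, here is a minimal Python sketch. The class names (`_Backing`, `ListLike`, `SliceLike`) are purely illustrative, not actual or proposed Cython internals; a real implementation would share the backing buffer rather than copying it.

```python
# Hypothetical sketch: one shared backend, two thin frontends.

class _Backing:
    """Shared backend: owns a flat typed buffer (modelled as a list)."""
    def __init__(self, items):
        self.items = list(items)

class ListLike:
    """1-D, list-flavoured frontend: + means concatenation."""
    def __init__(self, items):
        self._b = _Backing(items)
    def __add__(self, other):
        return ListLike(self._b.items + other._b.items)
    def tolist(self):
        return list(self._b.items)

class SliceLike:
    """memoryview-flavoured frontend over the same backend: [a:b] slices.
    (A real frontend would alias the backing buffer, not copy it.)"""
    def __init__(self, items):
        self._b = _Backing(items)
    def __getitem__(self, s):
        return SliceLike(self._b.items[s])
    def tolist(self):
        return list(self._b.items)

print((ListLike([1, 2]) + ListLike([3])).tolist())   # [1, 2, 3]
print(SliceLike([1, 2, 3, 4])[1:3].tolist())         # [2, 3]
```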
> (I'm hoping you're thinking they can be passed around and stored with
> ease, with allocation either taken care of by the corresponding object
> (which will clean up the memory when it gets collected) or, if there is
> no object attached, the user needs to treat it as they would a raw
> pointer.)

I skipped over it as it is a long story. If you really are curious: an
int[:] is a pass-by-value struct, containing subslice info and a
reference to an acquired view, which in turn is acquired from a
memory-holding object. This seems heavy but really is necessary due to
how PEP 3118 works. ("The only problem which can't be solved by another
layer of indirection is too many layers of indirection.")

In detail, there are three levels:

1) The memory-holding object. When Cython allocates, we need a new type
(probably inheriting directly from object), which allocates memory,
stores shape/stride information, and returns the right information in
tp_getbuffer. Would be a 20-liner in Cython, unless we want to go with
PyVarObject for allocation, in which case I suppose it is a 200-liner
in C.

2) The acquired Py_buffer on that object. That would happen the same way
as with every array: preferably by using memoryview, or if not, another
custom Python type (we need to backport memoryview anyway), or, if we
can't seem to avoid it, a custom refcounted struct.

3) Accessing the Py_buffer directly is too inefficient, so it must be
unpacked into a custom struct on the stack which basically holds
shape/stride information and a reference to the Py_buffer-holding thing
in 2). This is the actual variable/temporary type, and is
passed-by-value.

When taking a slice, the struct in 3) is copied (while adjusting the
shape/strides), incref-ing the view 2) in the process, and 2) holds on
to 1). Then there are fields and global variables, which call for
separate decisions; probably the most consistent approach, and the most
efficient in both speed and memory, is to store the structs of 3),
although it is a bit counter-intuitive.
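The three levels above can be modelled in a few lines of plain Python. This is only a sketch of the ownership chain, under the assumption described in the text; the names (`MemoryHolder`, `AcquiredView`, `SliceStruct`) are hypothetical, not actual Cython internals, and the refcount is made explicit to show what incref/decref on the acquired view would do.

```python
class MemoryHolder:
    """Level 1: owns the memory and the shape/stride metadata."""
    def __init__(self, data, shape, strides):
        self.data = data          # the backing buffer
        self.shape = shape
        self.strides = strides

class AcquiredView:
    """Level 2: a refcounted handle on an acquired Py_buffer."""
    def __init__(self, holder):
        self.holder = holder      # keeps level 1 alive
        self.refcount = 1
    def incref(self):
        self.refcount += 1
    def decref(self):
        self.refcount -= 1        # the buffer would be released at zero

class SliceStruct:
    """Level 3: the pass-by-value struct stored in variables/temporaries."""
    def __init__(self, view, shape, strides, offset=0):
        self.view = view          # reference to level 2
        self.shape = shape
        self.strides = strides
        self.offset = offset
    def slice1d(self, start, stop):
        # Taking a slice copies the struct, adjusts shape/offset, and
        # increfs the view; no data is copied.
        self.view.incref()
        return SliceStruct(self.view,
                           (stop - start,),
                           self.strides,
                           self.offset + start * self.strides[0])

holder = MemoryHolder(bytearray(100), (100,), (1,))
view = AcquiredView(holder)
a = SliceStruct(view, holder.shape, holder.strides)
b = a.slice1d(10, 20)   # both a and b now keep the view (and memory) alive
```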
> Until CEP 518, how would arithmetic happen? (Assuming I'm too lazy to
> write the loops myself) would I create two numpy objects to wrap them,
> add those objects, and then "unpack" the result? For large enough
> datasets, that's not too much overhead.

I suppose the real answer is that if you're too lazy to write the loops
yourself, you'll be using ndarray[int] in the first place :-) But yes,
you can do

    cdef int[:] a = ..., b = ...
    a = np.array(a) + b

(Or, not 100% sure, but at least a = np.array(a) + np.array(b) will
work.) Even before NumPy supports PEP 3118 this can work, as we can
coerce int[:] to our own subclass of memoryview which implements NumPy's
__array__ special method.

-- 
Dag Sverre
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
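A Python-level analogue of the pattern described above, assuming NumPy is available: wrap existing typed buffers in NumPy arrays via the buffer protocol (no copy in the wrap itself) to get elementwise arithmetic without hand-written loops.

```python
import array
import numpy as np

a = array.array('i', range(5))    # some typed 1-D buffer
b = array.array('i', [10] * 5)

# np.frombuffer shares memory with the underlying buffer rather than
# copying it, so wrapping is cheap; the arithmetic allocates the result.
result = np.frombuffer(a, dtype=np.intc) + np.frombuffer(b, dtype=np.intc)
print(result.tolist())            # [10, 11, 12, 13, 14]
```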
