Hi,
Dag Sverre Seljebotn wrote:
> Sorry about the medium-sized length, but I'd like this to be close to my
> last email on the subject. I'd just refer to Robert's mail, but I guess
> some more explanation about NumPy semantics is in order for the benefit
> of non-NumPy-users, so I've made a summary of that.
You honestly call that a summary? ;)
Anyway, thanks for doing that, that was pretty interesting to see.
> Stefan Behnel wrote:
>> Dag Sverre Seljebotn wrote:
>>> Stefan Behnel wrote:
>>>> we have three types:
>>>>
>>>> 1) a dynamic array type
>>>> - allocates memory on creation
>>>> - reallocates on (explicit) resizing, e.g. a .resize() method
>>>> - supports PEP 3118 (and disables shrinking with live buffers)
>>>> - returns a typed value on indexing
>>>> - returns a typed array copy on slicing
>>>> - behaves like a tuple otherwise
>>>>
>>>> 2) a typed memory view
>>>> - created on top of a buffer (or array)
>>>> - never allocates memory (for data, that is)
>>>> - creates a new view object on slicing
>>>> - behaves like an array otherwise
>>> This last point is dangerous as we seem to disagree about what an array
>>> is.
>> It's what I described under 1).
>>
>>>> 3) a SIMD memory view
>>>> - created on top of a buffer, array or memory view
>>>> - supports parallel per-item arithmetic
>>>> - behaves like a memory view otherwise
>>> Good summary. Starting from this: I want int[:,:] to be the combination
>>> of 2) and 3)
>> You mean "3) and not 2)", right? Could you explain why you need a syntax
>> for this if it's only a view?
>
> I suppose I meant some variation of 3) with some extra bullet points
> (slicing in particular).
Erm, I thought you wanted slicing to return views here? When I wrote "memory
view" in 3), I meant the memory view defined in 2). You might want to
re-read my post...
> We need a syntax because SIMD operations must
> be handled as a special case at compile time.
What I'm asking is why this can't work:
cdef SIMD some_simd_value = SIMD(array[20,20])
or, if you need to add parameters, what about
cdef SIMD[20,20, stride="whatever"] some_simd_value = \
SIMD(array[20,20])
I'm not actually repeating the parameters here, the RHS is for creating the
array object at runtime, the LHS is for declaring the type of the SIMD to
Cython at compile time. Some variation of this should work, I think, and
has the advantage that it's clear where view objects are created. It's just
a lot more explicit.
> 1) Nobody is claming this is elegant or Pythonic. It is catering for a
> numerical special interest, nothing more nor less.
Memory managed arrays and typed memory views have a much wider applicability.
> That said, here's a long list of what I mean with NumPy semantics,
> assuming both CEPs are implemented.
>
> # make x a compile-time-optimizeable 2D view on memoryview(obj)
> cdef int[:,:] x = obj
That's exactly what I'm questioning. This hurts my eyes in an almost
perlish way, and it's totally unclear that this type has SIMD properties.
Given that we wanted to have a template syntax anyway, I'd largely prefer
cdef SIMD[dim=2] x = obj
If we ever allow typedefs for Cython types, users can just define their own
int2D type or whatever makes sense for them.
> # Indexing
> x[2,3]
>
> # Access shape, stride info, raw data pointer
> x.shape
> x.strides
> x.data
Sure.
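For the benefit of non-NumPy readers, here is what those three attributes expose in plain NumPy today (which the proposed view would mirror):

```python
import numpy as np

# A 3x4 C-contiguous array of 32-bit ints
x = np.zeros((3, 4), dtype=np.int32)

print(x.shape)    # (3, 4)
print(x.strides)  # (16, 4): bytes to step to the next element along each axis
print(x.ctypes.data != 0)  # the raw data pointer, as an integer address
```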
> # Slicing out new view of third row (in two ways)
> y = x[2,:]
> y = x[2,...]
Shouldn't there be one way to do it?
> # Now, modifying y modifies what x points to too.
> # Make a copy so that y points to separate memory:
> y = y.copy()
That's ok, assuming that the type specific behaviour of y is obvious.
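To make the view-vs-copy distinction concrete, this is how the quoted lines behave in plain NumPy today:

```python
import numpy as np

x = np.arange(12).reshape(3, 4)
y = x[2, :]    # slicing returns a view: y shares x's memory
z = x[2, ...]  # equivalent spelling; also a view
y[0] = 99      # writing through the view modifies x as well
# x[2, 0] is now 99
y = y.copy()   # now y owns separate memory
y[0] = 0       # x is no longer affected
```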
> # Set all entries in the array to 12
> x[...] = 12
Now, *that* is a weird syntax. This is called broadcasting, right?
> # Set only first row to 10
> x[0, :] = 10
I like that, although it conflicts with the above syntax for broadcasting.
Given the example above, I would have expected
x[0, ...] = 10
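In NumPy as it exists today all three spellings work, so the conflict is stylistic rather than semantic; a quick sketch of the current behaviour:

```python
import numpy as np

x = np.zeros((3, 4), dtype=int)
x[...] = 12     # broadcast the scalar over every element of x
x[0, :] = 10    # broadcast over the first row only
x[0, ...] = 10  # equivalent spelling for the same row assignment
```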
> # Some ways of multiplying all elements with 2
> x *= 2
> x[...] *= 2
> x[:,:] *= 2
> x += x
> x[...] += x
Supporting this efficiently should be easy as it's a pure type feature.
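All of the quoted forms are in-place in NumPy, i.e. none of them rebinds x to freshly allocated memory; a quick check (using the 1-D analogue of the x[:,:] spelling):

```python
import numpy as np

x = np.arange(4, dtype=float)  # [0., 1., 2., 3.]
alias = x                      # second reference, to detect reallocation
x *= 2
x[...] *= 2
x[:] *= 2                      # 1-D analogue of x[:,:] *= 2
x += x
assert alias is x              # x was never rebound to a new array
```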
> # A more complicated expression...allocates memory
> x = stdmath.sqrt(x*x + x*(x+1)/(x+2))
>
> # A more complicated expression...overwrites existing
> # memory
> x[...] = stdmath.sqrt(x*x + x*(x+1)/(x+2))
Supporting this efficiently obviously requires compiler support. Otherwise,
we'd always end up with a copy in between.
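The difference is already visible in NumPy: the x = ... form rebinds x to a newly allocated result, while x[...] = ... writes the result back into x's existing buffer (though without compiler support the right-hand side still allocates temporaries along the way, which is the copy I mean):

```python
import numpy as np

x = np.arange(1.0, 5.0)
ptr_before = x.ctypes.data  # address of x's current buffer
x[...] = np.sqrt(x * x + x * (x + 1) / (x + 2))
assert x.ctypes.data == ptr_before  # same memory was reused
x = np.sqrt(x * x + x * (x + 1) / (x + 2))  # this form allocates a new array
```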
> # Boolean operators
> cdef bint[:,:] b # perhaps we could support 8-bit bool too
> b = (x == 2)
> # b is now an array the shape of x, containing True where x[i,j] == 2
That's awful, but I guess it gets better when you think about it long enough.
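For reference, this is standard NumPy behaviour: the comparison is elementwise and yields a boolean array with the same shape as the operand:

```python
import numpy as np

x = np.array([[1, 2],
              [2, 3]])
b = (x == 2)  # elementwise comparison, dtype bool, same shape as x
# b == [[False, True],
#       [True, False]]
```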
> # Get sum of elements
> import numpy as np
> np.sum(x)
>
> # As for printing/coercion to Python object, that remains
> # TBD. Either memoryview, or a pretty-printing subclass
> # of memoryview, implementing NumPy's __toarray__ protocol
> # as well for better compatibility
Fine.
> Here's what I do NOT want to include from NumPy:
>
> # Get sum and mean
> x.sum()
> x.mean()
> # and so on, you have to do np.sum(x).
Sure.
> # "Fancy indexing" is a mess because the returned object
> # (due to implementation constraints) is a copy, not a view,
> # thus being inconsistent with the above. My stance is that
> # this can go in when we can support treating it as a view,
> # instead of following NumPy with making a copy. I have ideas
> # for how to do this.
>
> # Get the intersecting array of rows 1, 4 and 5 and
> # columns 2 and 1
> new_data_copy = x[[1,4,5], [2,1]]
>
> # Set the same intersection to 0. This is where NumPy gets
> # really inconsistent; making an exception specifically
> # in __setitem__ for this case.
> x[[1,4,5], [2,1]] = 0
> # modified x
>
> # If y has length 4, pick out elements 0 and 3
> y[[True, False, False, True]]
Right so.
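For comparison, the copy behaviour in today's NumPy (note that the row/column intersection is spelled with np.ix_ there, since plain x[[1,4,5], [2,1]] would need the index lists to broadcast against each other):

```python
import numpy as np

x = np.arange(36).reshape(6, 6)
sub = x[np.ix_([1, 4, 5], [2, 1])]  # fancy indexing: returns a copy
sub[0, 0] = -1                      # does not touch x
assert x[1, 2] == 1 * 6 + 2         # x unchanged
x[np.ix_([1, 4, 5], [2, 1])] = 0    # but __setitem__ does modify x
assert x[1, 2] == 0
```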
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev