Stefan Behnel wrote:
> Hi,
>
> Dag Sverre Seljebotn wrote:
>> Sorry about the medium-sized length, but I'd like this to be close to my
>> last email on the subject. I'd just refer to Robert's mail, but I guess
>> some more explanation about NumPy semantics is in order for the benefit
>> of non-NumPy-users, so I've made a summary of that.
>
> You honestly call that a summary? ;)
>
> Anyway, thanks for doing that, that was pretty interesting to see.
It wasn't perfect, as we'll see. Anyway, the best source on this is simply
to read the NumPy docs; this is a good start:
http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html

>>> You mean "3) and not 2)", right? Could you explain why you need a syntax
>>> for this if it's only a view?
>> I suppose I meant some variation of 3) with some extra bullet points
>> (slicing in particular).
>
> Erm, I thought you wanted slicing to return views here? When I wrote "memory
> view" in 3), I meant the memory view defined in 2). You might want to
> re-read my post...

I misunderstood what you referred to with "memory view". I won't even go
into details...

>> We need a syntax because SIMD operations must
>> be handled as a special case at compile time.
>
> What I'm asking is why this can't work:
>
> cdef SIMD some_simd_value = SIMD(array[20,20])
>
> or, if you need to add parameters, what about
>
> cdef SIMD[20,20, stride="whatever"] some_simd_value = \
>     SIMD(array[20,20])
>
> I'm not actually repeating the parameters here, the RHS is for creating the
> array object at runtime, the LHS is for declaring the type of the SIMD to
> Cython at compile time. Some variation of this should work, I think, and
> has the advantage that it's clear where view objects are created. It's just
> a lot more explicit.

OK, I misunderstood you again. Breaking down what you say into two points:

1) Syntax of the SIMD view type. You propose SIMD[something] instead of
basetype[something] -- that can obviously be made to work. Like all syntax,
it is a matter of taste. I'll certainly go with whatever the majority vote
says here, it's not very important to me, but I vote for basetype[something]
as it looks a lot more attractive to potential numerical users IMO.

2) Acquisition. You could rewrite your example with my syntax like e.g. this:

obj = cython_array_type[int](20, 20)  # create array object
cdef int[:,:] some_simd
some_simd = int[:,:](obj)

IIUC you would just switch out int[:,:] with SIMD[...]
to get something close to what you want.

Hmm. I actually like the explicitness that you propose. But it is a matter
of verbosity/repetition, as replacing the third line with "some_simd = obj"
could only mean one thing. I'm +0 on this issue. In pure Python mode, one
would definitely have to write "some_simd = int[:,:](obj)" though.

>> # Slicing out a new view of the third row (in two ways)
>> y = x[2,:]
>> y = x[2,...]
>
> Shouldn't there be one way to do it?

Sorry: "..." means ":" repeated "as many times as necessary" (including 0
times). So if you have a 5D array x, x[2, ..., 2] means the same as
x[2,:,:,:,2]. See the NumPy docs for further questions here.

>> # Now, modifying y modifies what x points to too.
>> # Make a copy so that y points to separate memory:
>> y = y.copy()
>
> That's ok, assuming that the type-specific behaviour of y is obvious.
>
>> # Set all entries in the array to 12
>> x[...] = 12
>
> Now, *that* is a weird syntax. This is called broadcasting, right?

Well, I explained "..." above; beyond that I can only say that it makes
perfect sense when you actually use it. It is basically "SIMD assignment".
Broadcasting is something else: it is the set of rules for what happens when
you use SIMD operators or functions on arrays of different shape or
dimensionality; the operation can be allowed in more situations by repeating
the smaller array following certain rules. Seems weird, but *really*,
*really* useful.
http://docs.scipy.org/doc/numpy/reference/ufuncs.html#broadcasting

>> # Set only the first row to 10
>> x[0, :] = 10
>
> I like that, although it conflicts with the above syntax for broadcasting.
> Given the example above, I would have expected
>
> x[0, ...] = 10

I hope the explanation of "..." solved this.

>> # Some ways of multiplying all elements by 2
>> x *= 2
>> x[...] *= 2
>> x[:,:] *= 2
>> x += x
>> x[...] += x
>
> Supporting this efficiently should be easy as it's a pure type feature.
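(Aside for non-NumPy readers: everything above can be checked in plain NumPy
today. The snippet below illustrates these runtime semantics only, not the
proposed Cython syntax; x, y and row are fresh example arrays.)

```python
import numpy as np

x = np.arange(12).reshape(3, 4)  # rows: [0..3], [4..7], [8..11]

# "..." expands to as many ":" as needed, so for a 2-D array
# x[2, ...] and x[2, :] are the same view of the third row.
assert (x[2, ...] == x[2, :]).all()

# Slicing returns a view: it shares memory with x,
# so modifying y modifies x too.
y = x[2, :]
y[0] = 99
assert x[2, 0] == 99

# .copy() gives y its own separate memory.
y = y.copy()
y[0] = -1
assert x[2, 0] == 99  # x is no longer affected

# x[...] = 12 overwrites every element in place; the name x
# still refers to the same array object.
x[...] = 12
assert (x == 12).all()

# Setting only the first row:
x[0, :] = 10
assert (x[0] == 10).all() and (x[1] == 12).all()

# In-place arithmetic updates the existing buffer
# rather than allocating a new array for the result.
x *= 2
assert x[0, 0] == 20 and x[1, 0] == 24

# Broadcasting: the shape-(4,) array is repeated along
# each of x's 3 rows to match x's shape (3, 4).
row = np.array([1, 2, 3, 4])
total = x + row
assert total.shape == (3, 4)
assert total[0, 3] == 20 + 4
```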
Yes, but if we do the below, it is essentially just a special case which it
makes no sense to treat differently (scalars, like 2, are "broadcast" up to
entire arrays, and then the rest is the same).

>> # A more complicated expression... allocates memory
>> x = stdmath.sqrt(x*x + x*(x+1)/(x+2))
>>
>> # A more complicated expression... overwrites existing
>> # memory
>> x[...] = stdmath.sqrt(x*x + x*(x+1)/(x+2))
>
> Supporting this efficiently obviously requires compiler support. Otherwise,
> we'd always end up with a copy in between.

Yep, that's what NumPy does, and that's how I get my 3x speedup.

>> # Boolean operators
>> cdef bint[:,:] b  # perhaps we could support 8-bit bool too
>> b = (x == 2)
>> # b is now an array the shape of x, containing True where x[i,j] == 2
>
> That's awful, but I guess it gets better when you think about it long enough.

Again, extremely useful. To compare two arrays, you do

np.any(x == y)  # there is an element such that x[i] == y[i]

or

np.all(x == y)  # for all i, x[i] == y[i]

-- 
Dag Sverre

_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
