Hoyt Koepke wrote:
> Hello Robert & Dag,
> 
> I'm sending this to you guys first; feel free to bump it to the whole
> list if you think it's appropriate.

Doing so; I think at least it should be available to Kurt as well.

> I'm planning on making time this weekend to attend part of the
> upcoming sage days, and I definitely want to participate in the coding
> sprint and get my hands dirty with some cython development.
> 
> What I'd like to work on, which would be quite relevant in my research
> code, is to write an optimization routine that unrolls numpy array
> expressions involving slices, array operators like +-/*, and scalar
> values into a set of nested loops that avoids the use of any temporary
> array copies.  Thus something like:
> 
> A[0, a:b] = B[0, a:b]*C[a:2*b:2] + D + n
> 
> would get turned into some code that checked broadcasting requirements
> and then into a loop that ran from a to b and looped over the
> appropriate vertices.   Thus we would basically replicate what blitz++
> does with templates or fortran 90 does (so I've heard).  I've sketched
> out an approach to actually coding it up and it seems doable for a
> sage days coding project.

Great that you want to do this :-)

I think doing it for a Sage days project is a bit ambitious, but you 
might be able to get a good start on it. It also depends on what route 
is taken, see below.

> I guess how I see this coming out then is that I'd create a routine
> that takes the cython AST for this expression and a few configuration
> parameters and returns the code for an inline function that accepts as
> parameters all the needed buffer parameters (data ptr, strides, shape,
> and slice parameters if there are some needed (e.g. a, b above),
> constants, etc. ),  and does the appropriate looping and copying.
> Part of the reason I want to do the inline function approach is that I
> want this code to be minimally wrapped up in cython's internals, as a
> tested routine to generate such code might be more generally
> applicable.  Though if there's a stronger argument in favor of the
> other way, that's okay too.

I have given this a lot of thought as well. My thoughts diverge from 
yours in that I want to improve Cython numerics in lot of areas, and 
that the above would come more as a natural consequence than the first 
step. What I'll scetch below is thus a more "patient" route.

My experience is that I'm not sure how good results one can get if one 
is set from the beginning on getting tangible results during a 
conference. A concrete goal like the above seems great for things like a 
GSoC going over one or two months, but perhaps not a week -- the danger 
is that one skips important parts of the design etc. just in order to 
get some example working, but in the end nothing but the example 
works.... (I hope I don't kill your enthusiasm off!, I just want the 
perspective noted :-) ).

Also I should not that I don't have much time the rest of this week to 
be involved in designing this.

Anyway, here's my thoughts. It's pretty much designed to give the 
feeling of "oh no not all this, I just want to do something simple and 
get this done" -- that's OK, just read it and write your rejoinder :-)

A) There's a clash with existing semantics to be resolved. For instance 
if you have a NumPy "matrix" instance, then * does matrix multiplication 
instead, and just implementing this now would break code.

http://wiki.cython.org/enhancements/buffersyntax
http://thread.gmane.org/gmane.comp.python.numeric.general/28439

contains my thoughts on how this could be resolved. I'll use this syntax 
below.

B) I think the feature could be broken down in smaller parts:

1) Efficient slicing (#178):

cdef int[:,:] a
cdef int[:] b = a[3,:]

perhaps also broadcasting?

cdef int[:,:,:] c = a # ?

2) Efficient ndenumerate for-loop construct. This is the "engine" of 
what you want to achieve.

from cython import ndenumerate
cdef int[:,:,:] a = ...
cdef int[:,:,:,:] b = ...

# a[...] = 2 * b[3,...]
for idx, value in ndenumerate(a, b[3,...]):
     a[idx] = 2 * b[3, idx]

Or, perhaps

for a_item, b_item in ndenumerate(a, b[3,...]):
     a_item.set(2*b_item.get())

or similar. The code making sure that the dimension with the smallest 
strides is iterated innermost would be generated here (likely one would 
have a while loop around a for loop; no matter how many dimensions there 
are; see also NumPy's ND-generic iterators... we could also allow this 
to work for "cdef int[...] a", i.e. no declared number of dimensions).

3) Transform "a[...] = 2 * b[3, ...]" into the construct in 2). This is 
easily done using a transform once 2) is done.

3) Ufuncs. I was imagining it like this:

cdef int multiply(int x, int y): return x * y

...
cdef int[:,:] a = ..., b = ..
a[...] = multiply(a, b) # Calls for each item!

So basically "every function with primitive arguments is an ufunc". 
(Fortran has a similar concept, but you need to declare that the 
function has no side-effects -- I say let's just make N invocations part 
of the spec of how this works.)

Obviously this is very easily implemented once 2) is in place.

4) New buffer results (you didn't mention this). Most of this will 
likely be done in Kurt's GSoC. Basically allow Cython to allocate new 
memory, so that you can do

a = 2 * b[3,...]

that is, allocate a new memory area rather than overwrite what is 
already allocated in a.


So..where does this leave your Sage project? Of course you don't need to 
heed this, your patch might get accepted anyway :-) but what I'd suggest 
is to have a crack at doing 2). This can also be done without having a 
new syntax as supporting ndenumerate for the current arrays wouldn't 
hurt at all. If you agree we can move to discussing 2) in more detail.

(BTW if you find time, it being a Sage conference after all, to make 
Sage play more nicely with NumPy so that NumPy would be somewhat usable 
without reassigning Integer and RealNumber I'd be very grateful :-) ).

> Next, what requirements and considerations should I keep in mind when
> designing such a routine?  Here's what I have so far.

Hmm, let's see if I covered it.

> 
> 1.  Since for buffers, the number of dimensions and things like
> contiguity are known at compile time, I want to have these things hard
> coded in.
> 2.  The looping should be ordered so that the smallest dimension is
> copied first (which would have to be determined at runtime, with the
> possible simplification of contiguous arrays).

These two apply directly to 2) above.

> 3.  Ufuncs like exp, cos, sin, etc.

3)

> 3.  Considerations from the current GsoC work ??????

Kurt might change how buffers will be stored in the local scope, but 
nothing very significant. You might want to read over

http://trac.cython.org/cython_trac/query?status=new&status=assigned&status=reopened&order=priority&keywords=~kurtgsoc

(If that is wrapped: Just hit "candidate tickets for Kurt's GSoC" on the 
trac front page)

> 4.  Type escalation stuff.
> 
> Finally, down the line, I'd also like to support functions that do
> reductions like A.sum(axis=0) or A.sum().

I haven't thought much about this. Assuming a new syntax so that we 
don't have to keep NumPy semantics, I think we don't support methods for 
this, but add a decorator on functions:

@cython.buffer.reduction
cdef sum[T](T a, T b): # using future templates as well
     return a + b

and the reduction decorator would add arguments like "axis" etc. and 
just do the right thing.

-- 
Dag Sverre
_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev

Reply via email to