Roland Schulz wrote:
> On Wed, May 13, 2009 at 3:16 AM, Dag Sverre Seljebotn
> <[email protected]> wrote:
>
>> I'm planning on making time this weekend to attend part of the
>> upcoming Sage days, and I definitely want to participate in the
>> coding sprint and get my hands dirty with some Cython development.
>>
>> What I'd like to work on, which would be quite relevant in my
>> research code, is to write an optimization routine that unrolls
>> NumPy array expressions involving slices, array operators like
>> +-/*, and scalar values into a set of nested loops that avoid the
>> use of any temporary array copies. Thus something like:
>>
>>     A[0, a:b] = B[0, a:b]*C[a:2*b:2] + D + n
>>
>> would get turned into some code that checked broadcasting
>> requirements, and then into a loop that ran from a to b over the
>> appropriate elements. Thus we would basically replicate what
>> Blitz++ does with templates, or what Fortran 90 does (so I've
>> heard). I've sketched out an approach to actually coding it up,
>> and it seems doable for a Sage days coding project.
>
> I think this would be very, very useful. But it seems like reinventing
> the wheel to implement the actual vector math. It is not that trivial.
> Just as one example: sometimes intermediate buffers are faster, so you
> need a heuristic to decide that.
>
> Why not use a C++ template library for that? One could use Blitz++,
> but I think Eigen (http://eigen.tuxfamily.org) is newer and
> significantly faster (e.g. it automatically uses SSE). Since
> everything is in include files, no external library requirement is
> added.
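To make the transformation quoted above concrete, here is a plain-Python sketch of the "before" and "after": the vectorized expression (which allocates a temporary for each intermediate result) versus the single fused loop the proposed optimization would emit. The shapes and values are made up for illustration (a = 0, b = 3 is chosen so the slices broadcast); the real generated code would of course be C, with the broadcasting check done at runtime.

```python
import numpy as np

a, b, n = 0, 3, 10
A = np.zeros((2, 8), dtype=np.int64)
B = np.arange(16, dtype=np.int64).reshape(2, 8)
C = np.arange(8, dtype=np.int64)
D = np.array([100, 200, 300], dtype=np.int64)

# Vectorized form: NumPy allocates a temporary for B[0, a:b] * C[a:2*b:2],
# another for "+ D", and another for "+ n", before the final assignment.
A[0, a:b] = B[0, a:b] * C[a:2*b:2] + D + n

# Hand-unrolled form, as the optimization would emit: one fused loop,
# no temporary arrays at all.
A2 = np.zeros((2, 8), dtype=np.int64)
for i in range(b - a):
    A2[0, a + i] = B[0, a + i] * C[a + 2 * i] + D[i] + n

assert (A == A2).all()
```

The point of fusing is that each intermediate in the vectorized form is a full array traversal plus an allocation, while the loop touches each output element exactly once.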
Thanks for pitching in; those are some very interesting thoughts! Here's my immediate reaction:

- We should avoid reinventing the wheel at all costs. Heuristics for
  intermediate buffers in Cython are probably a no-go.
- Still, relying explicitly on one specific C++ library makes me uneasy.
- So perhaps what is needed here is some kind of plugin system. Getting
  the basics working (just unoptimized, "what you would have written in
  C") is more trivial, would get us up and running fast, and would allow
  operation without an external library. Then, given a good protocol for
  it, one would ideally be able to easily add support for other backends
  as well. (These would all be required to produce the same result; they
  would just differ in implementation detail.)

On one hand, one shouldn't over-design; on the other hand, a vector math library is a fairly sophisticated thing, and one could argue that such things should be made swappable if possible.

>> A) There's a clash with existing semantics to be resolved. For
>> instance, if you have a NumPy "matrix" instance, then * does matrix
>> multiplication instead, and just implementing this now would break
>> code.
>>
>> http://wiki.cython.org/enhancements/buffersyntax
>> http://thread.gmane.org/gmane.comp.python.numeric.general/28439
>>
>> contain my thoughts on how this could be resolved. I'll use this
>> syntax below.
>
> I think this could be solved by allowing one to define custom mappings
> of operators/methods of Python types/classes to C functions. With that,
> one could specify that for a NumPy ndarray the element-wise
> multiplication is used, and for a NumPy matrix the matrix
> multiplication is used. And both could be translated into calls to
> Eigen or some other vector library.
>
> This would allow one to directly transform normal NumPy code, which
> tries to avoid using loops, into highly optimized C with SSE, without
> any intermediate buffers.
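For reference, the semantic clash quoted above is easy to demonstrate in plain NumPy (the values here are made up for illustration): the same `*` operator means two different things depending on the runtime type.

```python
import numpy as np

x = np.array([[1, 2], [3, 4]])
m = np.matrix([[1, 2], [3, 4]])  # ndarray subclass

ew = x * x  # ndarray: element-wise product
mm = m * m  # matrix subclass: matrix product

assert ew.tolist() == [[1, 4], [9, 16]]
assert np.asarray(mm).tolist() == [[7, 10], [15, 22]]
```

So any compile-time decision that `*` on a buffer-typed argument means element-wise multiplication silently changes behaviour when a matrix instance is passed in, which is exactly the compile-time vs. runtime mismatch the thread is wrestling with.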
Well, if you write

    def f(np.ndarray[int] arr):
        print arr * arr

    f(np.matrix(...))  # subclass, so allowed

then the Python semantics of that call would be np.dot, but the Cython version would use np.multiply. The problem is that we really need to know the behaviour at compile time. Should we perhaps disallow subclasses when a buffer is declared?

Another perspective is that it is nice to have a consistent way of working with any Python 3 buffer. This advantage is not obvious yet, as there are very few such buffers, but I think that in time Python might contain more and more of what NumPy does now (i.e. the functions in the math module might become "ufuncs" able to operate on any PEP 3118 buffer, and so on).

But I'm definitely not certain here; more input welcome!

-- 
Dag Sverre

_______________________________________________
Cython-dev mailing list
[email protected]
http://codespeak.net/mailman/listinfo/cython-dev
