Re: [Numpy-discussion] ndarray and lazy evaluation (was: Proposed Roadmap Overview)

2012-02-20 Thread Francesc Alted
On Feb 20, 2012, at 6:46 PM, Dag Sverre Seljebotn wrote:

 On 02/20/2012 09:24 AM, Olivier Delalleau wrote:
 Hi Dag,
 
 Would you mind elaborating a bit on that example you mentioned at the
 end of your email? I don't quite understand what behavior you would like
 to achieve
 
 Sure, see below. I think we should continue discussion on numpy-discuss.
 
 I wrote:
 
 You need at least a slightly different Python API to get anywhere, so
 numexpr/Theano is the right place to work on an implementation of this
 idea. Of course it would be nice if numexpr/Theano offered something as
 convenient as
 
 with lazy:
     arr = A + B + C  # with all of these NumPy arrays
 # compute upon exiting...
 
 More information:
 
 The disadvantage today of using Theano (or numexpr) is that they require 
 using a different API, so that one has to learn and use Theano from the 
 ground up, rather than just slap it on in an optimization phase.
 
 The alternative would require extensive changes to NumPy, so I guess 
 Theano authors or Francesc would need to push for this.
 
 The alternative would be (with A, B, C ndarray instances):
 
 with theano.lazy:
     arr = A + B + C
 
 On __enter__, the context manager would hook into NumPy to override its 
 arithmetic operators. Then it would build a Theano symbolic tree instead 
 of performing computations right away.
 
 In addition to providing support for overriding arithmetic operators, 
 slicing etc., it would be necessary for arr to be an ndarray instance 
 which is not yet computed (data pointer set to NULL, plus a stored 
 compute-me callback and some context information).
 
 Finally, the __exit__ would trigger computation. For other operations 
 which need the data pointer (e.g., single element lookup) one could 
 either raise an exception or trigger computation.
 
 This is just a rough sketch. It is not difficult in principle, but of 
 course there's really a massive amount of work involved in building 
 support for this into the NumPy APIs.
 
 Probably, we're talking a NumPy 3.0 thing, after the current round of 
 refactorings have settled...
 
 Please: Before discussing this further one should figure out if there's 
 manpower available for it; no sense in hashing out a castle in the sky 
 in details.
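[Editor's note: the sketch above can be prototyped today without touching NumPy internals by wrapping arrays rather than hooking ndarray itself. Everything below (`LazyArray`, `wrap`, `force`) is purely illustrative and not an existing API.]

```python
import numpy as np

class LazyArray:
    """Hypothetical deferred array: arithmetic records a thunk instead of running."""
    def __init__(self, compute, shape):
        self._compute = compute   # zero-argument callback producing the data
        self.shape = shape
        self._value = None        # analogue of the NULL data pointer

    def __add__(self, other):
        # Build a node of the expression tree; no arithmetic happens here.
        return LazyArray(lambda: self.force() + force(other), self.shape)

    def force(self):
        """Trigger the deferred computation (the __exit__ / data-access hook)."""
        if self._value is None:
            self._value = self._compute()
        return self._value

def force(x):
    return x.force() if isinstance(x, LazyArray) else x

def wrap(a):
    return LazyArray(lambda: a, a.shape)

A, B, C = (np.arange(4.0) for _ in range(3))
arr = wrap(A) + wrap(B) + C   # builds a tree of thunks; no arithmetic yet
result = arr.force()          # evaluation happens only here
```

A real implementation would hand the recorded tree to a compiler (Theano, numexpr) instead of walking it with plain Python lambdas.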

I see.  Mark Wiebe already suggested the same thing some time ago:

https://github.com/numpy/numpy/blob/master/doc/neps/deferred-ufunc-evaluation.rst

 Also it would be better to talk in person about this if 
 possible (I'm in Berkeley now and will attend PyData and PyCon).

Nice.  Most of the Continuum crew (me included) will be attending both 
conferences.  Mark W. will make PyCon only, but it will be a good occasion to 
discuss this further.

See you,

-- Francesc Alted



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] ndarray and lazy evaluation (was: Proposed Roadmap Overview)

2012-02-20 Thread Olivier Delalleau
Never mind. The link Francesc posted answered my question :)

-=- Olivier

On 20 February 2012 12:54, Olivier Delalleau delal...@iro.umontreal.ca
wrote:

 On 20 February 2012 12:46, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no
 wrote:

 On 02/20/2012 09:24 AM, Olivier Delalleau wrote:
  Hi Dag,
 
  Would you mind elaborating a bit on that example you mentioned at the
  end of your email? I don't quite understand what behavior you would like
  to achieve

 Sure, see below. I think we should continue discussion on numpy-discuss.

 I wrote:

  You need at least a slightly different Python API to get anywhere, so
  numexpr/Theano is the right place to work on an implementation of this
  idea. Of course it would be nice if numexpr/Theano offered something as
  convenient as
 
  with lazy:
      arr = A + B + C  # with all of these NumPy arrays
  # compute upon exiting...

 More information:

 The disadvantage today of using Theano (or numexpr) is that they require
 using a different API, so that one has to learn and use Theano from the
 ground up, rather than just slap it on in an optimization phase.

 The alternative would require extensive changes to NumPy, so I guess
 Theano authors or Francesc would need to push for this.

 The alternative would be (with A, B, C ndarray instances):

 with theano.lazy:
     arr = A + B + C

 On __enter__, the context manager would hook into NumPy to override its
 arithmetic operators. Then it would build a Theano symbolic tree instead
 of performing computations right away.

 In addition to providing support for overriding arithmetic operators,
 slicing etc., it would be necessary for arr to be an ndarray instance
 which is not yet computed (data pointer set to NULL, plus a stored
 compute-me callback and some context information).

 Finally, the __exit__ would trigger computation. For other operations
 which need the data pointer (e.g., single element lookup) one could
 either raise an exception or trigger computation.

 This is just a rough sketch. It is not difficult in principle, but of
 course there's really a massive amount of work involved in building
 support for this into the NumPy APIs.

 Probably, we're talking a NumPy 3.0 thing, after the current round of
 refactorings have settled...

 Please: Before discussing this further one should figure out if there's
 manpower available for it; no sense in hashing out a castle in the sky
 in details. Also it would be better to talk in person about this if
 possible (I'm in Berkeley now and will attend PyData and PyCon).

 Dag


 Thanks for the additional details.

 I feel like this must be a stupid question, but I have to ask: what is the
 point of being lazy here, since the computation is performed on exit anyway?

 -=- Olivier



Re: [Numpy-discussion] ndarray and lazy evaluation

2012-02-20 Thread Dag Sverre Seljebotn
On 02/20/2012 10:04 AM, Francesc Alted wrote:
 On Feb 20, 2012, at 6:46 PM, Dag Sverre Seljebotn wrote:

 On 02/20/2012 09:24 AM, Olivier Delalleau wrote:
 Hi Dag,

 Would you mind elaborating a bit on that example you mentioned at the
 end of your email? I don't quite understand what behavior you would like
 to achieve

 Sure, see below. I think we should continue discussion on numpy-discuss.

 I wrote:

 You need at least a slightly different Python API to get anywhere, so
 numexpr/Theano is the right place to work on an implementation of this
 idea. Of course it would be nice if numexpr/Theano offered something as
 convenient as

 with lazy:
     arr = A + B + C  # with all of these NumPy arrays
 # compute upon exiting...

 More information:

 The disadvantage today of using Theano (or numexpr) is that they require
 using a different API, so that one has to learn and use Theano from the
 ground up, rather than just slap it on in an optimization phase.

 The alternative would require extensive changes to NumPy, so I guess
 Theano authors or Francesc would need to push for this.

 The alternative would be (with A, B, C ndarray instances):

 with theano.lazy:
     arr = A + B + C

 On __enter__, the context manager would hook into NumPy to override its
 arithmetic operators. Then it would build a Theano symbolic tree instead
 of performing computations right away.

 In addition to providing support for overriding arithmetic operators,
 slicing etc., it would be necessary for arr to be an ndarray instance
 which is not yet computed (data pointer set to NULL, plus a stored
 compute-me callback and some context information).

 Finally, the __exit__ would trigger computation. For other operations
 which need the data pointer (e.g., single element lookup) one could
 either raise an exception or trigger computation.

 This is just a rough sketch. It is not difficult in principle, but of
 course there's really a massive amount of work involved in building
 support for this into the NumPy APIs.

 Probably, we're talking a NumPy 3.0 thing, after the current round of
 refactorings have settled...

 Please: Before discussing this further one should figure out if there's
 manpower available for it; no sense in hashing out a castle in the sky
 in details.

 I see.  Mark Wiebe already suggested the same thing some time ago:

 https://github.com/numpy/numpy/blob/master/doc/neps/deferred-ufunc-evaluation.rst

Thanks, I didn't know about that (though I did really assume this was on 
Mark's radar already).


 Also it would be better to talk in person about this if
 possible (I'm in Berkeley now and will attend PyData and PyCon).

 Nice.  Most of the Continuum crew (me included) will be attending both 
 conferences.  Mark W. will make PyCon only, but it will be a good occasion to 
 discuss this further.

I certainly don't think I have anything to add to this discussion beyond 
what Mark wrote up. But will be nice to meet up anyway.

Dag


Re: [Numpy-discussion] ndarray and lazy evaluation

2012-02-20 Thread James Bergstra
On Mon, Feb 20, 2012 at 12:28 PM, Francesc Alted franc...@continuum.io wrote:

 On Feb 20, 2012, at 6:18 PM, Dag Sverre Seljebotn wrote:
  You need at least a slightly different Python API to get anywhere, so
  numexpr/Theano is the right place to work on an implementation of this
  idea. Of course it would be nice if numexpr/Theano offered something as
  convenient as
 
  with lazy:
      arr = A + B + C  # with all of these NumPy arrays
  # compute upon exiting…

 Hmm, that would be cute indeed.  Do you have an idea on how the code in
 the with context could be passed to the Python AST compiler (à la
 numexpr.evaluate("A + B + C"))?


The biggest problem with the numexpr approach (e.g. evaluate("A + B + C")),
whether the programmer has to type the quotes or not, is that the
sub-program has to be completely expressed in the sub-language.

If I write

 def f(x): return x[:3]
 numexpr.evaluate("A + B + f(C)")

I would like that to be fast, but it's not obvious at all how that would
work. We would be asking numexpr to introspect arbitrary callable Python
objects and recompile arbitrary Python code, effectively setting up the
expectation in the user's mind that numexpr is re-implementing an entire
compiler. That can be fast obviously, but it seems to me to represent a
significant departure from numpy's focus, which I always thought was the
data container rather than expression evaluation (though maybe this
firestorm of discussion is aimed at changing that?)

Theano went with another option which was to replace the A, B, and C
variables with objects that have a modified __add__. Theano's back-end can
be slow at times and the codebase can feel like a heavy dependency, but my
feeling is still that this is a great approach to getting really fast
implementations of compound expressions.
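[Editor's note: a minimal illustration of this "modified __add__" approach follows; the names are hypothetical and do not reflect Theano's actual API. Arithmetic on symbolic variables builds a graph, which can then be evaluated (or handed to a compiler) separately.]

```python
import operator

class Var:
    """Hypothetical symbolic variable: arithmetic builds a graph, not values."""
    def __init__(self, name=None, op=None, args=()):
        self.name, self.op, self.args = name, op, args

    def __add__(self, other):
        return Var(op=operator.add, args=(self, other))

    def __mul__(self, other):
        return Var(op=operator.mul, args=(self, other))

    def eval(self, env):
        """Walk the graph; a real system would compile it instead."""
        if self.op is None:
            return env[self.name]   # leaf: look up the bound input
        vals = [a.eval(env) if isinstance(a, Var) else a for a in self.args]
        return self.op(*vals)

A, B, C = Var('A'), Var('B'), Var('C')
expr = A + B * C   # no computation happens here, only graph construction
print(expr.eval({'A': 1, 'B': 2, 'C': 3}))   # 1 + 2*3 = 7
```

Because evaluation is decoupled from construction, the same `expr` can be re-evaluated against different inputs, including NumPy arrays.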

The context syntax you suggest is a little ambiguous in that the indented
block of a with statement includes *statements*, whereas what you mean to
build in the indented block is a *single expression* graph. You could maybe
get the right effect with something like

A, B, C = np.random.rand(3, 5)

expr = np.compound_expression()
with np.expression_builder(expr) as foo:
    arr = A + B + C
    brr = A + B * C
    foo.returns((arr, brr))  # "return" is a reserved word, hence "returns"

# compute arr and brr as quickly as possible
a, b = expr.run()

# modify one of the arrays that the expression was compiled to use
A[:] += 1

# re-run the compiled expression on the new value
a, b = expr.run()
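[Editor's note: for concreteness, here is one minimal, purely hypothetical way such a `compound_expression` / `expression_builder` API could behave. Since `return` is a reserved word in Python, this sketch records the outputs via a callable.]

```python
import numpy as np

class CompoundExpression:
    """Hypothetical stand-in for np.compound_expression in the sketch above."""
    def __init__(self):
        self._thunk = None
    def run(self):
        # Re-executes against the input arrays' *current* contents.
        return self._thunk()

class expression_builder:
    """Hypothetical context manager capturing the block's outputs."""
    def __init__(self, expr):
        self.expr = expr
    def __enter__(self):
        return self
    def returns(self, thunk):
        # Outputs are recorded as a callable so they can be re-run later.
        self.expr._thunk = thunk
    def __exit__(self, *exc_info):
        return False

A, B, C = np.arange(5.0), np.ones(5), 2 * np.ones(5)

expr = CompoundExpression()
with expression_builder(expr) as foo:
    foo.returns(lambda: (A + B + C, A + B * C))

a, b = expr.run()     # first evaluation
A[:] += 1             # modify an input in place
a2, b2 = expr.run()   # re-run the same expression on the new values
```

A real compiler would of course capture a graph rather than a closure, but the observable behavior (compile once, re-run after in-place mutation) matches the sketch.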

- JB

-- 
James Bergstra, Ph.D.
Research Scientist
Rowland Institute, Harvard University


Re: [Numpy-discussion] ndarray and lazy evaluation

2012-02-20 Thread James Bergstra
On Mon, Feb 20, 2012 at 1:01 PM, James Bergstra james.bergs...@gmail.com wrote:

 On Mon, Feb 20, 2012 at 12:28 PM, Francesc Alted franc...@continuum.io wrote:

 On Feb 20, 2012, at 6:18 PM, Dag Sverre Seljebotn wrote:
  You need at least a slightly different Python API to get anywhere, so
  numexpr/Theano is the right place to work on an implementation of this
  idea. Of course it would be nice if numexpr/Theano offered something as
  convenient as
 
  with lazy:
      arr = A + B + C  # with all of these NumPy arrays
  # compute upon exiting…

 Hmm, that would be cute indeed.  Do you have an idea on how the code in
 the with context could be passed to the Python AST compiler (à la
 numexpr.evaluate("A + B + C"))?


 The biggest problem with the numexpr approach (e.g. evaluate("A + B + C")),
 whether the programmer has to type the quotes or not, is that the
 sub-program has to be completely expressed in the sub-language.

 If I write

  def f(x): return x[:3]
  numexpr.evaluate("A + B + f(C)")

 I would like that to be fast, but it's not obvious at all how that would
 work. We would be asking numexpr to introspect arbitrary callable Python
 objects and recompile arbitrary Python code, effectively setting up the
 expectation in the user's mind that numexpr is re-implementing an entire
 compiler. That can be fast obviously, but it seems to me to represent a
 significant departure from numpy's focus, which I always thought was the
 data container rather than expression evaluation (though maybe this
 firestorm of discussion is aimed at changing that?)

 Theano went with another option which was to replace the A, B, and C
 variables with objects that have a modified __add__. Theano's back-end can
 be slow at times and the codebase can feel like a heavy dependency, but my
 feeling is still that this is a great approach to getting really fast
 implementations of compound expressions.

 The context syntax you suggest is a little ambiguous in that the indented
 block of a with statement includes *statements*, whereas what you mean to
 build in the indented block is a *single expression* graph. You could maybe
 get the right effect with something like

 A, B, C = np.random.rand(3, 5)

 expr = np.compound_expression()
 with np.expression_builder(expr) as foo:
     arr = A + B + C
     brr = A + B * C
     foo.returns((arr, brr))  # "return" is a reserved word, hence "returns"

 # compute arr and brr as quickly as possible
 a, b = expr.run()

 # modify one of the arrays that the expression was compiled to use
 A[:] += 1

 # re-run the compiled expression on the new value
 a, b = expr.run()

 - JB


I should add that the biggest benefit of expressing things as compound
expressions in this way is not in saving temporaries (though that is nice);
it's being able to express enough computation work at a time that it
offsets the time required to ship the arguments off to a GPU for
evaluation!  This has been a *huge* win reaped by the Theano approach; it
works really well.  The abstraction boundary offered by this sort of
expression graph has been really effective.

This speaks even more to the importance of distinguishing between the data
container (e.g. numpy's, Theano's internal ones, PyOpenCL's, PyCUDA's)
and the expression compilation and evaluation infrastructure (e.g.
Theano, numexpr, cython).  The goal should be to separate these two as
much as possible, so that programs can be expressed in a natural way and
then evaluated using containers that are suited to the program.
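[Editor's note: as a toy illustration of this separation, the same container-agnostic expression description can be handed to interchangeable evaluation backends. Nothing here reflects a real Theano or numexpr interface; all names are made up.]

```python
import numpy as np

# A tiny container-agnostic expression format: strings are inputs,
# ('op', args) tuples are operations.
graph = ('add', ('x', ('mul', ('y', 'z'))))   # x + y * z

def evaluate(node, env, backend):
    """Walk the description, using whichever backend supplies the kernels."""
    if isinstance(node, str):
        return env[node]                       # leaf: bound input value
    op, args = node
    return getattr(backend, op)(*(evaluate(a, env, backend) for a in args))

class NumpyBackend:
    """One possible backend; a GPU container could supply the same names."""
    add = staticmethod(np.add)
    mul = staticmethod(np.multiply)

env = {'x': np.arange(3.0), 'y': 2.0, 'z': np.ones(3)}
result = evaluate(graph, env, NumpyBackend)   # [0,1,2] + 2*[1,1,1]
```

The program (`graph`) never mentions a container; swapping `NumpyBackend` for a hypothetical GPU backend changes where the work runs without changing the expression.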

- JB

-- 
James Bergstra, Ph.D.
Research Scientist
Rowland Institute, Harvard University


Re: [Numpy-discussion] ndarray and lazy evaluation

2012-02-20 Thread Lluís
James Bergstra writes:
[...]
 I should add that the biggest benefit of expressing things as compound
 expressions in this way is not in saving temporaries (though that is nice);
 it's being able to express enough computation work at a time that it offsets
 the time required to ship the arguments off to a GPU for evaluation!

Right, that's exactly what you need for an external computation to pay off.

Just out of curiosity (feel free to respond with a RTFM or a RTFP :)), do you
support any of these? (sorry for the made-up names)

* automatic transfer double-buffering

* automatic problem partitioning into domains (e.g., multiple GPUs; even better
  if it also supports multiple nodes, via MPI)

* point-specific computations (e.g., code dependent on the thread id, although
  this can also be expressed in other ways, like index ranges)

* point-relative computations (the most common would be a stencil)

If you have all of them, then I'd say the project has a huge potential for total
world dominance :)


Lluis

-- 
 And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer.
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom
 Tollbooth


Re: [Numpy-discussion] ndarray and lazy evaluation

2012-02-20 Thread James Bergstra
On Mon, Feb 20, 2012 at 2:57 PM, Lluís xscr...@gmx.net wrote:

 James Bergstra writes:
 [...]
  I should add that the biggest benefit of expressing things as compound
  expressions in this way is not in saving temporaries (though that is nice);
  it's being able to express enough computation work at a time that it offsets
  the time required to ship the arguments off to a GPU for evaluation!

 Right, that's exacly what you need for an external computation to pay
 off.

 Just out of curiosity (feel free to respond with a RTFM or a RTFP :)), do you
 support any of these? (sorry for the made-up names)

 * automatic transfer double-buffering


Not currently, but it would be quite straightforward to do it. Email
theano-dev and ask how if you really want to know.



 * automatic problem partitioning into domains (e.g., multiple GPUs; even
   better if it also supports multiple nodes, via MPI)


Not currently, and it would be hard.



 * point-specific computations (e.g., code dependent on the thread id,
   although this can also be expressed in other ways, like index ranges)


No.


 * point-relative computations (the most common would be a stencil)


No, but I think Theano provides a decent expression language to tackle
this. The Composite element-wise code generator is an example of how I
would think about this. It provides point-relative computations across
several arguments.  You might want something different that applies a
stencil computation across one or several arguments... the scan operator
was another foray into this territory, and it got tricky when the stencil
operation could have side effects (like random number generation) and could
define its own input domain (stencil shape), but the result is quite
powerful.
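[Editor's note: for reference, a point-relative (stencil) computation in plain NumPy is typically written with shifted views, which is exactly the kind of slice pattern an expression compiler could fuse into a single loop. This example is illustrative only.]

```python
import numpy as np

def laplacian_1d(u):
    """Second difference: each output point depends on its two neighbours."""
    # Three shifted views of u; a fusing compiler would emit one loop.
    return u[:-2] - 2.0 * u[1:-1] + u[2:]

u = np.array([0.0, 1.0, 4.0, 9.0, 16.0])   # u[i] = i**2
d = laplacian_1d(u)                         # second difference of i**2 is 2
```

Written eagerly, this allocates temporaries for each slice expression; a graph-based evaluator could compute the whole stencil in one pass.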

-- 
http://www-etud.iro.umontreal.ca/~bergstrj