[Numpy-discussion] lazy evaluation

2012-06-05 Thread mark florisson
Hey,

Another discussion on lazy evaluation, given the recent activity here:
https://github.com/ContinuumIO/numba/pull/6#issuecomment-6117091
A somewhat recent previous thread can be found here:
http://mail.scipy.org/pipermail/numpy-discussion/2012-February/060862.html
, and a NEP here:
https://github.com/numpy/numpy/blob/master/doc/neps/deferred-ufunc-evaluation.rst

I think trying to parse bytecode and build an expression graph for
array expressions from it has disadvantages and is harder in
general. For instance, it won't be able to deal with branching at
execution time, and things like inter-procedural analysis will be
harder (not to mention you'd have to parse dtype creation). Instead,
what you really want to do is hook into a lazily evaluating version of
numpy, and generate your own code from the operations it records.
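A minimal sketch of that recording idea (the LazyArray class and its
methods are purely illustrative, not an existing numpy API): operations
build an expression graph instead of computing, and a force() call walks
the graph on demand.

```python
# Hypothetical sketch: record operations into an expression graph
# rather than parsing bytecode, then evaluate the graph when forced.
import numpy as np

class LazyArray:
    def __init__(self, op, args):
        self.op, self.args = op, args

    @classmethod
    def wrap(cls, value):
        # leaf node holding concrete data
        return value if isinstance(value, cls) else cls(None, [np.asarray(value)])

    def __add__(self, other):
        return LazyArray(np.add, [self, LazyArray.wrap(other)])

    def __mul__(self, other):
        return LazyArray(np.multiply, [self, LazyArray.wrap(other)])

    def force(self):
        if self.op is None:          # leaf: return the stored data
            return self.args[0]
        return self.op(*[a.force() for a in self.args])

a = LazyArray.wrap(np.arange(3))
b = LazyArray.wrap(np.ones(3))
expr = a + b * 2                     # builds a graph, computes nothing yet
print(expr.force())                  # evaluates the whole graph: [2. 3. 4.]
```

A real recorder would hand this graph to a code generator instead of
walking it with Python-level numpy calls.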

It would be great if we implemented the NEP listed above, but with a few
extensions. I think Numpy should handle the lazy evaluation part, and
determine when expressions should be evaluated, etc. However, for each
user operation, Numpy will call back a user-installed hook
implementing some interface, to allow various packages to provide
their own hooks to evaluate vector operations however they want. This
will include packages such as Theano, which could run things on the
GPU, Numexpr, and in the future
https://github.com/markflorisson88/minivect (which will likely gain an
LLVM backend, possibly integrated with Numba to
allow inlining of numba ufuncs). The project above tries to bring
all the different array expression compilers together in a
single framework, to provide efficient array expressions specialized
for any data layout (nditer on steroids if you will, with SIMD,
threaded and inlining capabilities).

We could allow each hook to specify which dtypes it supports, and a
minimal data size needed before it should be invoked (to avoid
overhead for small arrays, like the OpenMP 'if' clause). If an
operation is not supported, it will simply raise NotImplementedError,
which means Numpy will evaluate the expression built so far and run
its own implementation, resulting in a non-lazy array. E.g. if a
library supports adding things together, but doesn't support the 'sin'
function, np.sin(a + b) will result in the library executing a + b,
and numpy evaluating sin on the result. So the idea is that the numpy
lazy array will wrap an expression graph, which is built when the user
performs operations and evaluated when needed (when a result is
required or when someone tells numpy to evaluate all lazy arrays).
Numpy will simply use the first hook willing to operate on data of the
specified size and dtype, and will keep using that hook to build the
expression until evaluated.
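A sketch of what that hook protocol could look like in Python (all
names here, hooks, can_handle, lazy_eval, are hypothetical, invented to
illustrate the proposal): numpy picks the first hook willing to handle
the dtypes and data size, and falls back to its own implementation when
the hook raises NotImplementedError.

```python
# Hypothetical hook protocol: dtype support, a minimum size threshold
# (like the OpenMP 'if' clause), and NotImplementedError as the
# "evaluate what you have, numpy takes over" signal.
import numpy as np

class NumexprLikeHook:
    supported_dtypes = {np.dtype('float64'), np.dtype('int64')}
    min_size = 1024                       # skip the hook for tiny arrays

    def can_handle(self, op, arrays):
        return (all(a.dtype in self.supported_dtypes for a in arrays)
                and max(a.size for a in arrays) >= self.min_size)

    def evaluate(self, op, arrays):
        if op not in ('add', 'multiply'):
            raise NotImplementedError(op)
        return getattr(np, op)(*arrays)   # stand-in for compiled code

hooks = [NumexprLikeHook()]

def lazy_eval(op, *arrays):
    for hook in hooks:
        if hook.can_handle(op, arrays):
            try:
                return hook.evaluate(op, arrays)
            except NotImplementedError:
                break                     # hook gives up; numpy evaluates
    return getattr(np, op)(*arrays)       # numpy's own (non-lazy) path

a = np.ones(2000)
b = np.ones(2000)
print(lazy_eval('add', a, b)[0])          # handled by the hook: 2.0
print(lazy_eval('sin', a)[0])             # hook refuses 'sin'; numpy evaluates
```

This mirrors the np.sin(a + b) example: the library executes the
addition it supports, and numpy computes sin on the result.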

Anyway, this is somewhat of a high-level overview. If there is any
interest, we can flesh out the details and extend the NEP.

Mark
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] lazy evaluation

2012-06-05 Thread Nathaniel Smith
On Tue, Jun 5, 2012 at 12:55 PM, mark florisson
 wrote:
> It would be great if we implement the NEP listed above, but with a few
> extensions. I think Numpy should handle the lazy evaluation part, and
> determine when expressions should be evaluated, etc. However, for each
> user operation, Numpy will call back a user-installed hook
> implementing some interface, to allow various packages to provide
> their own hooks to evaluate vector operations however they want. This
> will include packages such as Theano, which could run things on the
> GPU, Numexpr, and in the future
> https://github.com/markflorisson88/minivect (which will likely have an
> LLVM backend in the future, and possibly integrated with Numba to
> allow inlining of numba ufuncs). The project above tries to bring
> together all the different array expression compilers together in a
> single framework, to provide efficient array expressions specialized
> for any data layout (nditer on steroids if you will, with SIMD,
> threaded and inlining capabilities).

A global hook sounds ugly and hard to control -- it's hard to tell
which operations should be deferred and which should be forced, etc.
While it would be less magical, I think a more explicit API would in
the end be easier to use... something like

  a, b, c, d = deferred([a, b, c, d])
  e = a + b * c  # 'e' is a deferred object too
  f = np.dot(e, d)  # so is 'f'
  g = force(f)  # 'g' is an ndarray
  # or
  force(f, out=g)

But at that point, this could easily be an external library, right?
All we'd need from numpy would be some way for external types to
override the evaluation of ufuncs, np.dot, etc.? We've recently seen
several reasons to want that functionality, and it seems like
developing these "improved numexpr" ideas would be much easier if they
didn't require doing deep surgery to numpy itself...

-N


Re: [Numpy-discussion] lazy evaluation

2012-06-05 Thread mark florisson
On 5 June 2012 14:58, Nathaniel Smith  wrote:
>
> A global hook sounds ugly and hard to control -- it's hard to tell
> which operations should be deferred and which should be forced, etc.

Yes, but for the user the difference should not be visible (unless
operations can raise exceptions, in which case you choose the safe
path, or let the user configure what to do).

> While it would be less magical, I think a more explicit API would in
> the end be easier to use... something like
>
>  a, b, c, d = deferred([a, b, c, d])
>  e = a + b * c  # 'e' is a deferred object too
>  f = np.dot(e, d)  # so is 'f'
>  g = force(f)  # 'g' is an ndarray
>  # or
>  force(f, out=g)
>
> But at that point, this could easily be an external library, right?
> All we'd need from numpy would be some way for external types to
> override the evaluation of ufuncs, np.dot, etc.? We've recently seen
> several reasons to want that functionality, and it seems like
> developing these "improved numexpr" ideas would be much easier if they
> didn't require doing deep surgery to numpy itself...

Definitely, but besides monkey-patch-chaining I think some
modifications would be required; they would be reasonably simple,
though. Most of the functionality would be handled in one function,
which most ufuncs (the ones you care about, as well as ufunc methods
like add) call. E.g. if ((result = NPy_LazyEval("add", op1, op2)))
return result; which is inserted after argument unpacking and sanity
checking. You could also do a per-module hook, and have the function
look at sys._getframe(1).f_globals, but that is fragile and won't work
from C or Cython code.

How did you have overrides in mind? I also found this thread:
http://mail.scipy.org/pipermail/numpy-discussion/2011-June/056945.html
, but I think you want more than just to override ufuncs, you want
numpy to govern when stuff is allowed to be lazy and when stuff should
be evaluated (e.g. when it is indexed, slice assigned (although that
itself may also be lazy), etc). You don't want some funny object back
that doesn't work with things which are not overridden in numpy.



Re: [Numpy-discussion] lazy evaluation

2012-06-05 Thread Nathaniel Smith
On Tue, Jun 5, 2012 at 4:12 PM, mark florisson
 wrote:
> On 5 June 2012 14:58, Nathaniel Smith  wrote:
>> On Tue, Jun 5, 2012 at 12:55 PM, mark florisson
>>  wrote:
>>> It would be great if we implement the NEP listed above, but with a few
>>> extensions. I think Numpy should handle the lazy evaluation part, and
>>> determine when expressions should be evaluated, etc. However, for each
>>> user operation, Numpy will call back a user-installed hook
>>> implementing some interface, to allow various packages to provide
>>> their own hooks to evaluate vector operations however they want. This
>>> will include packages such as Theano, which could run things on the
>>> GPU, Numexpr, and in the future
>>> https://github.com/markflorisson88/minivect (which will likely have an
>>> LLVM backend in the future, and possibly integrated with Numba to
>>> allow inlining of numba ufuncs). The project above tries to bring
>>> together all the different array expression compilers together in a
>>> single framework, to provide efficient array expressions specialized
>>> for any data layout (nditer on steroids if you will, with SIMD,
>>> threaded and inlining capabilities).
>>
>> A global hook sounds ugly and hard to control -- it's hard to tell
>> which operations should be deferred and which should be forced, etc.
>
> Yes, but for the user the difference should not be visible (unless
> operations can raise exceptions, in which case you choose the safe
> path, or let the user configure what to do).
>
>> While it would be less magical, I think a more explicit API would in
>> the end be easier to use... something like
>>
>>  a, b, c, d = deferred([a, b, c, d])
>>  e = a + b * c  # 'e' is a deferred object too
>>  f = np.dot(e, d)  # so is 'f'
>>  g = force(f)  # 'g' is an ndarray
>>  # or
>>  force(f, out=g)
>>
>> But at that point, this could easily be an external library, right?
>> All we'd need from numpy would be some way for external types to
>> override the evaluation of ufuncs, np.dot, etc.? We've recently seen
>> several reasons to want that functionality, and it seems like
>> developing these "improved numexpr" ideas would be much easier if they
>> didn't require doing deep surgery to numpy itself...
>
> Definitely, but besides monkey-patch-chaining I think some
> modifications would be required, but they would be reasonably simple.
> Most of the functionality would be handled in one function, which most
> ufuncs (the ones you care about, as well as ufunc (methods) like add)
> call. E.g. if ((result = NPy_LazyEval("add", op1, op2)) return result;
> , which is inserted after argument unpacking and sanity checking. You
> could also do a per-module hook, and have the function look at
> sys._getframe(1).f_globals, but that is fragile and won't work from C
> or Cython code.
>
> How did you have overrides in mind?

My vague idea is that core numpy operations are about as fundamental
for scientific users as the Python builtin operations are, so they
should probably be overrideable in a similar way. So we'd teach numpy
functions to check for methods named like "__numpy_ufunc__" or
"__numpy_dot__" and let themselves be overridden if found. Like how
__gt__ and __add__ and stuff work. Or something along those lines.
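A toy sketch of that override idea (the "__numpy_ufunc__" name is the
one floated here; the Deferred class and apply_ufunc dispatcher are
hypothetical): a numpy entry point checks the operand for the special
method and defers to it, much like Python does for __add__.

```python
# Hypothetical dispatch: a special method lets foreign objects
# intercept ufunc application, here by recording it instead of computing.
import numpy as np

class Deferred:
    def __init__(self, graph):
        self.graph = graph

    def __numpy_ufunc__(self, ufunc, *args):
        # instead of computing, extend the recorded expression graph
        return Deferred((ufunc.__name__, args))

def apply_ufunc(ufunc, *args):
    for arg in args:
        override = getattr(arg, '__numpy_ufunc__', None)
        if override is not None:
            return override(ufunc, *args)   # the object takes over
    return ufunc(*args)                     # normal numpy behaviour

x = np.arange(3)
print(apply_ufunc(np.sin, x))               # plain ndarray: computed eagerly
d = Deferred(('leaf', x))
print(apply_ufunc(np.sin, d).graph[0])      # Deferred: records 'sin' instead
```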

> I also found this thread:
> http://mail.scipy.org/pipermail/numpy-discussion/2011-June/056945.html
> , but I think you want more than just to override ufuncs, you want
> numpy to govern when stuff is allowed to be lazy and when stuff should
> be evaluated (e.g. when it is indexed, slice assigned (although that
> itself may also be lazy), etc). You don't want some funny object back
> that doesn't work with things which are not overridden in numpy.

My point is that probably numpy should *not* govern the decision about
what stuff should be lazy and what should be evaluated; that should be
governed by some combination of the user and
Numba/Theano/minivect/whatever. The toy API I sketched out would make
those decisions obvious and explicit. (And if the funny objects had an
__array_interface__ attribute that automatically forced evaluation
when accessed, then they'd work fine with code that was expecting an
array, or if they were assigned to a "real" ndarray, etc.)
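The __array_interface__ point can be sketched like this (DeferredSum is
a made-up class for illustration): exposing the interface as a property
means any consumer that expects an ndarray transparently triggers
evaluation.

```python
# Sketch: accessing __array_interface__ forces the deferred computation,
# so np.asarray() and friends see an ordinary array.
import numpy as np

class DeferredSum:
    def __init__(self, a, b):
        self.a, self.b = a, b
        self.forced = None

    @property
    def __array_interface__(self):
        if self.forced is None:
            self.forced = self.a + self.b    # evaluation happens here
        return self.forced.__array_interface__

f = DeferredSum(np.arange(3), np.ones(3))
g = np.asarray(f)        # asarray reads __array_interface__, forcing f
print(g)                 # [1. 2. 3.]
```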

-n


Re: [Numpy-discussion] lazy evaluation

2012-06-05 Thread Neal Becker
Would lazy eval be able to eliminate temps in doing operations such as:

np.sum (u != 23)?

That is, now ops involving selecting elements of matrices are often performed by
first constructing temp matrices, and then operating on them.



Re: [Numpy-discussion] lazy evaluation

2012-06-05 Thread mark florisson
On 5 June 2012 18:21, Neal Becker  wrote:
> Would lazy eval be able to eliminate temps in doing operations such as:
>
> np.sum (u != 23)?
>
> That is, now ops involving selecting elements of matrices are often performed by
> first constructing temp matrices, and then operating on them.

Sure, yeah, it's pretty easy to generate a loop with an if statement
and a reduction.
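For np.sum(u != 23), a generator could emit exactly that: one fused loop
with an 'if' and a running count, never materialising the boolean
temporary. A pure-Python stand-in for such generated code (the function
name is illustrative):

```python
# Stand-in for generated code: compare and reduce in a single pass,
# instead of building the boolean temp that np.sum(u != 23) creates.
import numpy as np

def fused_count_ne(u, value):
    count = 0
    for x in u.flat:         # one loop, no temporary array
        if x != value:
            count += 1
    return count

u = np.array([23, 1, 23, 7, 9])
print(fused_count_ne(u, 23))        # 3
print(np.sum(u != 23))              # same result, but via a bool temp: 3
```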


Re: [Numpy-discussion] lazy evaluation

2012-06-05 Thread mark florisson
On 5 June 2012 17:38, Nathaniel Smith  wrote:
> My point is that probably numpy should *not* govern the decision about
> what stuff should be lazy and what should be evaluated; that should be
> governed by some combination of the user and
> Numba/Theano/minivect/whatever. The toy API I sketched out would make
> those decisions obvious and explicit. (And if the funny objects had an
> __array_interface__ attribute that automatically forced evaluation
> when accessed, then they'd work fine with code that was expecting an
> array, or if they were assigned to a "real" ndarray, etc.)

That's disappointing though, since the performance drawbacks can
severely limit the usefulness for people with big data sets. Ideally,
you would take your intuitive numpy code and make it go fast, without
jumping through hoops. Numpypy has lazy evaluation; I don't know how
good a job it does, but it does mean you can finally get fast numpy
code in an intuitive way (and even run it on a GPU if that is possible
and beneficial).


Re: [Numpy-discussion] lazy evaluation

2012-06-05 Thread Nathaniel Smith
On Tue, Jun 5, 2012 at 7:08 PM, mark florisson
 wrote:
> That's disappointing though, since the performance drawbacks can
> severely limit the usefulness for people with big data sets. Ideally,
> you would take your intuitive numpy code, and make it go fast, without
> jumping through hoops. Numpypy has lazy evaluation,  I don't know how
> good a job it does, but it does mean you can finally get fast numpy
> code in an intuitive way (and even run it on a GPU if that is possible
> and beneficial).

All of these proposals require the user 

Re: [Numpy-discussion] lazy evaluation

2012-06-05 Thread mark florisson
On 5 June 2012 20:17, Nathaniel Smith  wrote:
>> That's disappointing though, since the performance drawbacks can
>> severely limit the usefulness for people with big data sets. Ideally,
>> you would take your intuitive numpy code, and make it go fast, without
>> jumping through hoops. Numpypy has lazy evaluation,  I don't know how
>> good a job it does, but it does mean you can finally get fast nu

Re: [Numpy-discussion] lazy evaluation

2012-06-05 Thread Nathaniel Smith
On Tue, Jun 5, 2012 at 9:47 PM, mark florisson
 wrote:
> On 5 June 2012 20:17, Nathaniel Smith  wrote:
>> On Tue, Jun 5, 2012 at 7:08 PM, mark florisson
>>  wrote:
>>> On 5 June 2012 17:38, Nathaniel Smith  wrote:
 On Tue, Jun 5, 2012 at 4:12 PM, mark florisson
  wrote:
> On 5 June 2012 14:58, Nathaniel Smith  wrote:
>> On Tue, Jun 5, 2012 at 12:55 PM, mark florisson
>>  wrote:
>>> It would be great if we implement the NEP listed above, but with a few
>>> extensions. I think Numpy should handle the lazy evaluation part, and
>>> determine when expressions should be evaluated, etc. However, for each
>>> user operation, Numpy will call back a user-installed hook
>>> implementing some interface, to allow various packages to provide
>>> their own hooks to evaluate vector operations however they want. This
>>> will include packages such as Theano, which could run things on the
>>> GPU, Numexpr, and in the future
>>> https://github.com/markflorisson88/minivect (which will likely have an
>>> LLVM backend in the future, and possibly integrated with Numba to
>>> allow inlining of numba ufuncs). The project above tries to bring
>>> together all the different array expression compilers together in a
>>> single framework, to provide efficient array expressions specialized
>>> for any data layout (nditer on steroids if you will, with SIMD,
>>> threaded and inlining capabilities).
>>
>> A global hook sounds ugly and hard to control -- it's hard to tell
>> which operations should be deferred and which should be forced, etc.
>
> Yes, but for the user the difference should not be visible (unless
> operations can raise exceptions, in which case you choose the safe
> path, or let the user configure what to do).
>
>> While it would be less magical, I think a more explicit API would in
>> the end be easier to use... something like
>>
>>  a, b, c, d = deferred([a, b, c, d])
>>  e = a + b * c  # 'e' is a deferred object too
>>  f = np.dot(e, d)  # so is 'f'
>>  g = force(f)  # 'g' is an ndarray
>>  # or
>>  force(f, out=g)
>>
>> But at that point, this could easily be an external library, right?
>> All we'd need from numpy would be some way for external types to
>> override the evaluation of ufuncs, np.dot, etc.? We've recently seen
>> several reasons to want that functionality, and it seems like
>> developing these "improved numexpr" ideas would be much easier if they
>> didn't require doing deep surgery to numpy itself...
>
> Definitely, but besides monkey-patch-chaining I think some
> modifications would be required, but they would be reasonably simple.
> Most of the functionality would be handled in one function, which most
> ufuncs (the ones you care about, as well as ufunc (methods) like add)
> call. E.g. if ((result = NPy_LazyEval("add", op1, op2)) return result;
> , which is inserted after argument unpacking and sanity checking. You
> could also do a per-module hook, and have the function look at
> sys._getframe(1).f_globals, but that is fragile and won't work from C
> or Cython code.
>
> How did you have overrides in mind?

 My vague idea is that core numpy operations are about as fundamental
 for scientific users as the Python builtin operations are, so they
 should probably be overrideable in a similar way. So we'd teach numpy
 functions to check for methods named like "__numpy_ufunc__" or
 "__numpy_dot__" and let themselves be overridden if found. Like how
 __gt__ and __add__ and stuff work. Or something along those lines.
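A toy version of that dispatch might look like the following. Everything here is invented for illustration (nothing is an actual numpy interface): a wrapper checks operands for an override method, spelled __numpy_ufunc__ as in the discussion, and defers to it, much as __add__/__radd__ work:

```python
# Hypothetical protocol: before evaluating, check each operand's type
# for an override hook and hand the whole operation to it if found.
import operator

def apply_ufunc(name, impl, *operands):
    for op in operands:
        override = getattr(type(op), "__numpy_ufunc__", None)
        if override is not None:
            return override(op, name, *operands)
    return impl(*operands)  # no override found: evaluate eagerly

class LoggedValue:
    def __init__(self, value):
        self.value = value
        self.log = []

    def __numpy_ufunc__(self, name, *operands):
        self.log.append(name)  # a lazy backend could build a graph here
        vals = [getattr(o, "value", o) for o in operands]
        return LoggedValue(sum(vals))

plain = apply_ufunc("add", operator.add, 1, 2)  # no override: plain add
x = LoggedValue(1)
lazy = apply_ufunc("add", operator.add, x, 2)   # dispatches to x's hook
```

With a hook like this, a deferred-evaluation library needs nothing from numpy beyond the check itself.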

> I also found this thread:
> http://mail.scipy.org/pipermail/numpy-discussion/2011-June/056945.html
> , but I think you want more than just to override ufuncs, you want
> numpy to govern when stuff is allowed to be lazy and when stuff should
> be evaluated (e.g. when it is indexed or slice-assigned, though that
> itself may also be lazy). You don't want some funny object back
> that doesn't work with things which are not overridden in numpy.

 My point is that probably numpy should *not* govern the decision about
 what stuff should be lazy and what should be evaluated; that should be
 governed by some combination of the user and
 Numba/Theano/minivect/whatever. The toy API I sketched out would make
 those decisions obvious and explicit. (And if the funny objects had an
 __array_interface__ attribute that automatically forced evaluation
 when accessed, then they'd work fine with code that was expecting an
 array, or if they were assigned to a "real" ndarray, etc.)
>>>
>>> That's disappointing though, since the performance drawbacks can
>>> severely limit the usefulness for people with big data sets. Ideally,
>>> you would take your intuitive numpy code, and make it go fast, with

Re: [Numpy-discussion] lazy evaluation

2012-06-05 Thread Dag Sverre Seljebotn

Re: [Numpy-discussion] lazy evaluation

2012-06-05 Thread mark florisson

Re: [Numpy-discussion] lazy evaluation

2012-06-05 Thread mark florisson

Re: [Numpy-discussion] lazy evaluation

2012-06-06 Thread Dag Sverre Seljebotn
Re: [Numpy-discussion] lazy evaluation

2012-06-10 Thread James Bergstra
Hi all, (sorry for missing the debate, I don't often check my
numpy-list folder.)

I agree that an "official" numpy solution to this problem is
premature, but at the same time I think the failure to approach
anything remotely resembling a consensus on how to deal with lazy
evaluation is really gumming up the works across the numpy community.
With apologies in advance to projects I don't cite (sorry!) it is
currently the case that many high-level libraries (e.g. pacal, pymc,
theano, sympy, pylearn2) offer more or less symbolic math features for
scientific applications, but each one defines its own lazy-numpy AST
thing to track functional relationships between inputs and/or random
variables. At the same time, none of these ASTs (except arguably
Theano's) is handled natively by the many competing lazy-evaluation
compiler runtimes (e.g. cython, numba, theano, numexpr).
Consequently, the feature-specific ASTs often become more of a
performance *problem* than a part of the optimizing-compiler pathway
and libraries that provide end-user APIs (in my work I think of
sklearn and skimage) continue to "wait and see" and don't commit to
*any* of the options (except labour-intensive cython), so we all lose.

The interesting development/insight I got from numba's byte-code
parsing technique is the illustration that *Python byte code* is:

a) a standard data structure that all Python code is already using

b) editable (see e.g. http://code.google.com/p/byteplay)

c) in pretty direct correspondence with high level (e.g. Theano's)
"abstract" syntax graphs

d) an unambiguous and obvious program specification for optimization
(e.g. numba)


After a little proof of concept work, I think that many high-level
semantic features (e.g. turning a stochastic function into a sampler
via PyMC, tracking uncertainty through computations, or minimizing a
numpy function directly by automatic differentiation) can and should
be done as bytecode -> bytecode transforms.  An implementation of e.g.
auto-diff will have to recognize when it can (and cannot) make sense
of a code object... so functions with lots of control flow, yield
statements, exception handling and the like may just be rejected.
That's OK because mathematical code often does not require complex
(often even *any*) control flow constructs.
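As a small stdlib-only illustration of points (a)-(d) above, the bytecode of a straight-line arithmetic function really is a postfix serialization of its expression tree. This only inspects rather than transforms, since opcode names vary across Python versions:

```python
# Inspect the bytecode of a simple arithmetic function and observe that
# it is a postfix walk of the expression tree (+ a (* b c)).
import dis

def expr(a, b, c):
    return a + b * c

ops = [ins.opname for ins in dis.get_instructions(expr)]
# Loads of a, b and c come first, followed by the two binary operations
# (the multiply before the add), then a return.
binary_ops = [op for op in ops if op.startswith("BINARY")]
```

A bytecode->bytecode transform of the kind described would rewrite such instruction sequences (or reject functions whose control flow it cannot handle) rather than merely listing them.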

With regards to users being surprised by strange resource usage
levels... this surprise can be avoided because a user who is applying
such transforms will be well aware that he/she has transformed the
original function into a new function.  That transformation would be
explicit, so there will be little suggestion from the program syntax
that the new function  has any statements in common with the original.
The new function will have different statements, different resource
usage profile, etc. I think APIs for this sort of bytecode->bytecode
transformation can avoid surprising users if they are done right.


If anyone is interested in my ongoing API & bytecode adventure in why
/ how lazy computing could be useful, I've put together a few tiny
hypothetically-runnable examples here:

https://github.com/jaberg/numba/tree/master/examples
https://github.com/jaberg/numba/blob/master/examples/linear_svm.py
https://github.com/jaberg/numba/blob/master/examples/mcmc.py

The purpose of the examples is to show how the features of e.g. Theano
and PyMC could be expressed as operators on raw Python code. Perhaps
most importantly of all, these transforms would work together: a PaCal
transform could automatically generate a likelihood function from a
model and data, and then a Theano transform could provide the
parameter gradients required to fit the likelihood. This natural
chaining is a complete PITA when every project uses its own AST.

That numba fork also includes very sketchy pseudocode of the main work
routines in the numba/ad.py and numba/rv.py files. The linear_svm
example was recently using Theano as a backend. I don't think it works
right now but FWIW it is still close to running.


Sorry for the long post,

- James



Re: [Numpy-discussion] lazy evaluation

2012-06-11 Thread James Bergstra

For those interested, the linear_svm example works again.

-- 
http://www-etud.iro.umontreal.ca/~bergstrj
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion