Re: [Numpy-discussion] Looking for people interested in helping with Python compiler to LLVM

2012-03-28 Thread Frédéric Bastien
Hi,

Up to now, I haven't posted frequently to this mailing list, so let me
introduce myself a little. I'm Frédéric Bastien, one of Theano's
senior developers. I'm neither a student nor a professor, but a staff
member of our lab, so I have more time for Theano than most people in
our lab, though I have other work too.

From now on, I'm going to read the numpy mailing list every weekday. I
think this will help communication between the Theano and numpy
communities. We have many common goals, and this can only be useful to
all of us. Also, if possible, I'll try to attend the conferences where
the big changes coming to numpy will be discussed.


So now, here are my comments and questions about those changes.

Currently, Theano includes some types of functionality that I haven't
seen in this thread or the other threads related to the refactoring of
Numpy.

1) Theano does numerical stability optimizations (simple example:
log(1 + x) -> log1p(x)).
2) It allows symbolic "graph generation". We use this for symbolic
differentiation, and to compute the R-op and L-op used in Hessian-free
optimization.
3) It automatically makes some operations inplace on their inputs:
a + b -> a += b.
4) Mathematical optimizations like a*2 + a -> a*3.
5) Implementation optimizations (planned in the new numpy).
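To make point 1 concrete, here is a minimal pure-Python illustration (not Theano code) of why the log(1 + x) -> log1p(x) rewrite matters for numerical stability:

```python
import math

x = 1e-12

# Forming 1 + x first rounds away most of x's significant digits,
# so the subsequent log is computed on an already-inaccurate value.
naive = math.log(1 + x)

# log1p evaluates log(1 + x) without ever forming 1 + x explicitly,
# preserving the full precision of the small argument.
stable = math.log1p(x)

print(naive)   # noticeably off from 1e-12 in the trailing digits
print(stable)  # agrees with the Taylor expansion x - x**2/2 + ...
```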

Is there any plan to add 1-4 to the new project? Will it be doable
(especially 2)? The Theano interface asks the user to create a Theano
function because we found it hard and time-consuming to work on the
AST, which we only looked at briefly. I haven't studied the Python AST
myself, so I don't know whether what we need is impossible or just
hard. Does anyone know whether it carries all the information we would
need?

I don't recall any discussion of having Theano work at the bytecode
level. I have never looked at Python bytecode, but I remember reading
that it changes with each version of Python. If that is true, how do
you plan to handle this? I don't think rewriting each optimization for
every Python release is acceptable, even if Python changes the
bytecode only at each X.Y release.
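As a point of reference, the bytecode such a compiler would consume can be inspected with the standard-library dis module. A small sketch (the opcode names it prints are indeed version-dependent; CPython has renamed and merged arithmetic opcodes across releases, which is exactly the maintenance burden raised above):

```python
import dis

def axpy(a, x, y):
    return a * x + y

# The opcode stream for this one-liner differs across CPython
# versions (e.g. separate BINARY_MULTIPLY / BINARY_ADD opcodes were
# later merged into a generic BINARY_OP), so a bytecode-level
# optimizer must track each release.
opnames = [ins.opname for ins in dis.get_instructions(axpy)]
print(opnames)
```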

I like the idea of having a common library to do array expression
optimization (the ArrayExprOpt idea). As noted above, Theano's
optimizations cover many more types of optimization than just
element-wise ones. Up to now, I only remember reading about
element-wise optimization. Did I miss something? Do you plan more than
this? If the goal stays at that level, it means the project ultimately
aims to replace numexpr, but I think we should target more than that,
as there are many useful high-level optimizations possible.
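As a sketch of the kind of high-level rewrite meant here (point 4 above is the simplest case), a toy optimizer over Python's own ast module might look like the following. This is purely illustrative; it is neither Theano's machinery nor the proposed library's API:

```python
import ast

class SimplifyAddOfScaled(ast.NodeTransformer):
    """Toy rewrite:  x*c + x  ->  x*(c+1)  for a numeric constant c."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # rewrite subtrees first
        if (isinstance(node.op, ast.Add)
                and isinstance(node.left, ast.BinOp)
                and isinstance(node.left.op, ast.Mult)
                and isinstance(node.left.left, ast.Name)
                and isinstance(node.left.right, ast.Constant)
                and isinstance(node.right, ast.Name)
                and node.left.left.id == node.right.id):
            return ast.BinOp(left=node.left.left, op=ast.Mult(),
                             right=ast.Constant(node.left.right.value + 1))
        return node

tree = ast.parse("a * 2 + a", mode="eval")
new = ast.fix_missing_locations(SimplifyAddOfScaled().visit(tree))
print(ast.unparse(new))  # -> a * 3
```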


There are many projects that generate optimized code using SSE*
instructions for tensor operations, like blitz++ (already mentioned)
and eigen2. Redoing all this work is time-consuming, so any reuse
would be valuable. Don't forget that many implementation optimizations
are size- and hardware-dependent. Do you plan to go that low-level? If
there are such low-level optimizations, I suppose they will cover only
the frequent cases.



I'm not sure if this thread is the right one for this, but we started
a new project for a full tensor type on the GPU. It will be used by
Theano, PyCUDA and PyOpenCL, but it requires changes to the ndarray
interface. The current code is a C library that uses Cython for its
interface. We made a mailing list for it [1], and the current version
lives in a fork. I think it would be highly valuable to end up with
the same GPU nd array in numpy too; up to now this has only been
announced as a goal for the new Numpy. We have some work done, and it
will be usable shortly, but don't expect all Numpy functions to be
implemented yet :)


Frédéric Bastien

[1] https://github.com/inducer/compyte/wiki
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Looking for people interested in helping with Python compiler to LLVM

2012-03-25 Thread Dag Sverre Seljebotn
On 03/25/2012 08:44 PM, Mic wrote:
> How about:
> * http://www.hotpy.org/

The front page says a 10x speedup. That's a bit short of the almost 
1000x speedup required for numerical code (that is, for some examples 
Python is thousands of times slower than C or Fortran).

Well -- I'm sure hotpy could get beyond 10x as well -- it's just to say 
that they have probably not looked much at the case of numerical 
computation.

> * http://pypy.org/numpydonate.html

Well, you can start with these...:

http://technicaldiscovery.blogspot.com/2011/10/thoughts-on-porting-numpy-to-pypy.html

http://blog.streamitive.com/2011/10/19/more-thoughts-on-arrays-in-pypy/

Dag


Re: [Numpy-discussion] Looking for people interested in helping with Python compiler to LLVM

2012-03-25 Thread Mic
How about:
* http://www.hotpy.org/
* http://pypy.org/numpydonate.html



Re: [Numpy-discussion] Looking for people interested in helping with Python compiler to LLVM

2012-03-21 Thread mark florisson
On 20 March 2012 20:49, Olivier Delalleau  wrote:
> I doubt Theano is already as smart as you'd want it to be right now, however
> the core mechanisms are there to perform graph optimizations and move
> computations to GPU. It may save time to start from there instead of
> starting all over from scratch. I'm not sure though, but it looks like it
> would be worth considering it at least.

Thanks for the suggestion Olivier. As Dag said, we discussed it, and
indeed we (or I) should look a lot deeper into it, see which
components are reusable, and discuss with the Theano community if and
how we can collaborate.

> -=- Olivier
>
> On 20 March 2012 15:40, Dag Sverre Seljebotn wrote:
>
>> We talked some about Theano. There are some differences in project goals
>> which means that it makes sense to make this a separate project: Cython
>> wants to use this to generate C code up front from the Cython AST at
>> compilation time; numba also has a different frontend (parsing of python
>> bytecode) and a different backend (LLVM).
>>
>> However, it may very well be possible that Theano could be refactored so
>> that the more essential algorithms working on the syntax tree could be
>> pulled out and shared with cython and numba. Then the question is whether
>> the core of Theano is smart enough to compete with Fortran compilers and
>> support arbitrarily strided inputs optimally. Otherwise one might as well
>> start from scratch. I'll leave that for Mark to figure out...
>>
>> Dag
>> --
>> Sent from my Android phone with K-9 Mail. Please excuse my brevity.
>>
>>
>> Olivier Delalleau  wrote:
>>>
>>> This sounds a lot like Theano, did you look into it?
>>>
>>> -=- Olivier
>>>
>>> On 20 March 2012 13:49, mark florisson wrote:

 On 13 March 2012 18:18, Travis Oliphant  wrote:
 >>>
 >>> (Mark F., how does the above match how you feel about this?)
 >>
 >> I would like collaboration, but from a technical perspective I think
 >> this would be much more involved than just dumping the AST to an IR
 >> and generating some code from there. For vector expressions I think
 >> sharing code would be more feasible than arbitrary (parallel) loops,
 >> etc. Cython as a compiler can make many decisions that a Python
 >> (bytecode) compiler can't make (at least without annotations and a
 >> well-defined subset of the language (not so much the syntax as the
 >> semantics)). I think in numba, if parallelism is to be supported, you
 >> will want a prange-like construct, as proving independence between
 >> iterations can be very hard to near impossible for a compiler.
 >
 > I completely agree that you have to define some kind of syntax to get
 > parallelism.  But, a prange construct would not be out of the question,
 > of course.
 >
 >>
 >> As for code generation, I'm not sure how llvm would do things like
 >> slicing arrays, reshaping, resizing etc (for vector expressions you
 >> can first evaluate all slicing and indexing operations and then
 >> compile the remaining vector expression), but for loops and array
 >> reassignment within loops this would have to invoke the actual
 >> slicing
 >> code from the llvm code (I presume).
 >
 > There could be some analysis on the byte-code, prior to emitting the
 > llvm code in order to handle lots of things.   Basically, you have to 
 > "play"
 > the byte-code on a simple machine anyway in order to emit the correct 
 > code.
 >   The big thing about Cython is you have to typedef too many things that 
 > are
 > really quite knowable from the code.   If Cython could improve its type
 > inference, then it would be a more suitable target.
 >
 >> There are many other things, like
 >> bounds checking, wraparound, etc, that are all supported in both
 >> numpy
 >> and Cython, but going through an llvm layer would as far as I can
 >> see,
 >> require re-implementing those, at least if you want top-notch
 >> performance. Personally, I think for non-trivial performance-critical
 >> code (for loops with indexing, slicing, function calls, etc) Cython
 >> is
 >> a better target.
 >
 > With libclang it is really quite possible to imagine a cython -> C
 > target that itself compiles to llvm so that you can do everything at that
 > intermediate layer.   However,  LLVM is a much better layer for 
 > optimization
 > than C now that there are a lot of people collaborating on that layer.   
 > I
 > think it would be great if Cython targeted LLVM actually instead of C.
 >
 >>
 >> Finally, as for non-vector-expression code, I really believe Cython
 >> is
 >> a better target. cython.inline can have high overhead (at least the
 >> first time it has to compile), but with better (numpy-aware) type
 >> inference or profile guided optimizations (see recent



Re: [Numpy-discussion] Looking for people interested in helping with Python compiler to LLVM

2012-03-20 Thread Dag Sverre Seljebotn
On 03/20/2012 12:56 PM, Francesc Alted wrote:
> On Mar 20, 2012, at 2:29 PM, Dag Sverre Seljebotn wrote:
>> Francesc Alted  wrote:
>>
>>> On Mar 20, 2012, at 12:49 PM, mark florisson wrote:
> Cython and Numba certainly overlap.  However, Cython requires:
>
>1) learning another language
>2) creating an extension module --- loading bit-code files
>>> and dynamically executing (even on a different machine from the one
>>> that initially created them) can be a powerful alternative for run-time
>>> compilation and distribution of code.
>
> These aren't show-stoppers obviously.   But, I think some users
>>> would prefer an even simpler approach to getting fast-code than Cython
>>> (which currently doesn't do enough type inference and requires
>>> building a dlopen extension module).

 Dag and I have been discussing this at PyCon, and here is my take on
 it (at this moment :).

 Definitely, if you can avoid Cython then that is easier and more
 desirable in many ways. So perhaps we can create a third project
 called X (I'm not very creative, maybe ArrayExprOpt), that takes an
 abstract syntax tree in a rather simple form, performs code
 optimizations such as rewriting loops with array accesses to vector
 expressions, fusing vector expressions and loops, etc, and spits out
>>> a
 transformed AST containing these optimizations. If runtime
>>> information
 is given such as actual shape and stride information the
 transformations could figure out there and then whether to do things
 like collapsing, axes swapping, blocking (as in, introducing more
>>> axes
 or loops to retain discontiguous blocks in the cache), blocked memory
 copies to contiguous chunks, etc. The AST could then also say whether
 the final expressions are vectorizable. Part of this functionality is
 already in numpy's nditer, except that this would be implicit and do
 more (and hopefully with minimal overhead).

 So numba, Cython and maybe numexpr could use the functionality,
>>> simply
 by building the AST from Python and converting back (if necessary) to
 its own AST. As such, the AST optimizer would be only part of any
 (runtime) compiler's pipeline, and it should be very flexible to
 retain any information (metadata regarding actual types, control flow
 information, etc) provided by the original AST. It would not do
 control flow analysis, type inference or promotion, etc, but only
>>> deal
 with abstract types like integers, reals and arrays (C, Fortran or
 partly contiguous or strided). It would not deal with objects, but
 would allow to insert nodes like UnreorderableNode and SideEffectNode
 wrapping parts of the original AST. In short, it should be as easy as
 possible to convert from an original AST to this project's AST and
 back again afterwards.
>>>
>>> I think this is a very interesting project, and certainly projects like
>>> numba can benefit from it.  So, in order for us to have an idea of what
>>> you are after, can we assume that your project (call it X) would be kind
>>> of a compiler optimizer, and that the produced, optimized code could be
>>> fed into numba for optimized LLVM code generation (which, in turn, can
>>> run on top of CPUs or GPUs or a combination)?  Is that correct?
>>
>> I think so. Another way of thinking about it is that it is a 
>> reimplementation of the logic in the (good and closed source) Fortran 90 
>> compilers, in a reusable component for inclusion in various compilers.
>>
>> Various c++ metaprogramming libraries (like Blitz++) are similar too.
>
> Aha, thanks.
>
>>> Given that my interpretation above is correct, it is a bit more
>>> difficult for me to see how your X project could be of benefit for
>>> numexpr.  In fact, I actually see this the other way round: once the
>>> optimizer has discovered the vectorizable parts, go one step further
>>> and generate code that uses numexpr automatically (call this
>>> vectorization through numexpr).  Is this what you mean, or am I
>>> missing something?
>>
>> No. I think in some ways this is a competitor to numexpr -- you would gut 
>> out the middle of numexpr and keep the frontend and backend, but use this to 
>> optimize iteration order and blocking strategies.
>
> I see.  Yes, I can easily see Mark's project X + numba more as a competitor 
> (killer?) to numexpr too.
>
>>
>> I think the goal is for higher performance than what I understand numexpr 
>> can provide (in some cases, not all!). For instance, can numexpr deal well 
>> with
>>
>> a + a.T
>>
>> where a is a c-contiguous array? Any numpy-like iteration order will not 
>> work well, one needs to use higher-dimensional (eg 2D) blocking, not 1D 
>> blocking.
>
> No.  numexpr cannot deal with the above problem efficiently.  numexpr is 
> about 1d blocking, so its approach is pretty naive (but effective for these 
> 1d blocking tasks).

Re: [Numpy-discussion] Looking for people interested in helping with Python compiler to LLVM

2012-03-20 Thread Francesc Alted
On Mar 20, 2012, at 2:29 PM, Dag Sverre Seljebotn wrote:
> Francesc Alted  wrote:
> 
>> On Mar 20, 2012, at 12:49 PM, mark florisson wrote:
 Cython and Numba certainly overlap.  However, Cython requires:
 
   1) learning another language
   2) creating an extension module --- loading bit-code files
>> and dynamically executing (even on a different machine from the one
>> that initially created them) can be a powerful alternative for run-time
>> compilation and distribution of code.
 
 These aren't show-stoppers obviously.   But, I think some users
>> would prefer an even simpler approach to getting fast-code than Cython
>> (which currently doesn't do enough type inference and requires
>> building a dlopen extension module).
>>> 
>>> Dag and I have been discussing this at PyCon, and here is my take on
>>> it (at this moment :).
>>> 
>>> Definitely, if you can avoid Cython then that is easier and more
>>> desirable in many ways. So perhaps we can create a third project
>>> called X (I'm not very creative, maybe ArrayExprOpt), that takes an
>>> abstract syntax tree in a rather simple form, performs code
>>> optimizations such as rewriting loops with array accesses to vector
>>> expressions, fusing vector expressions and loops, etc, and spits out
>> a
>>> transformed AST containing these optimizations. If runtime
>> information
>>> is given such as actual shape and stride information the
>>> transformations could figure out there and then whether to do things
>>> like collapsing, axes swapping, blocking (as in, introducing more
>> axes
>>> or loops to retain discontiguous blocks in the cache), blocked memory
>>> copies to contiguous chunks, etc. The AST could then also say whether
>>> the final expressions are vectorizable. Part of this functionality is
>>> already in numpy's nditer, except that this would be implicit and do
>>> more (and hopefully with minimal overhead).
>>> 
>>> So numba, Cython and maybe numexpr could use the functionality,
>> simply
>>> by building the AST from Python and converting back (if necessary) to
>>> its own AST. As such, the AST optimizer would be only part of any
>>> (runtime) compiler's pipeline, and it should be very flexible to
>>> retain any information (metadata regarding actual types, control flow
>>> information, etc) provided by the original AST. It would not do
>>> control flow analysis, type inference or promotion, etc, but only
>> deal
>>> with abstract types like integers, reals and arrays (C, Fortran or
>>> partly contiguous or strided). It would not deal with objects, but
>>> would allow to insert nodes like UnreorderableNode and SideEffectNode
>>> wrapping parts of the original AST. In short, it should be as easy as
>>> possible to convert from an original AST to this project's AST and
>>> back again afterwards.
>> 
>> I think this is a very interesting project, and certainly projects like
>> numba can benefit from it.  So that we have an idea of what you are
>> after, can we assume that your project (call it X) would be a kind of
>> compiler optimizer, and that the produced, optimized code could then be
>> fed into numba for optimized LLVM code generation (which, in turn, can
>> be run on top of CPUs or GPUs or a combination)?  Is that correct?
> 
> I think so. Another way of thinking about it is that it is a reimplementation 
> of the logic in the (good and closed source) Fortran 90 compilers, in a 
> reusable component for inclusion in various compilers.
> 
> Various c++ metaprogramming libraries (like Blitz++) are similar too.

Aha, thanks.

>> Given that my interpretation above is correct, it is a bit more
>> difficult for me to see how your X project could benefit numexpr.  In
>> fact, I actually see this the other way round: once the optimizer has
>> discovered the vectorizable parts, it could go one step further and
>> generate code that uses numexpr automatically (call this vectorization
>> through numexpr).  Is this what you mean, or am I missing something?
> 
> No. I think in some ways this is a competitor to numexpr -- you would gut out 
> the middle of numexpr and keep the frontend and backend, but use this to 
> optimize iteration order and blocking strategies.

I see.  Yes, I can easily see Mark's project X + numba more as a competitor 
(killer?) to numexpr too.

> 
> I think the goal is for higher performance than what I understand numexpr can 
> provide (in some cases, not all!). For instance, can numexpr deal well with
> 
> a + a.T
> 
> where a is a c-contiguous array? Any numpy-like iteration order will not work 
> well, one needs to use higher-dimensional (eg 2D) blocking, not 1D blocking.

No.  numexpr cannot deal with the above problem efficiently.  numexpr is about 
1d blocking, so its approach is pretty naive (but effective for these 1d 
blocking tasks).  
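To make the 2D-blocking point concrete, here is a toy sketch in plain
Python (illustrative only: the function name, the flat-list layout and
the block size are made up for this example, and this is not anything
numexpr or the proposed project X actually implements):

```python
def blocked_add_transpose(a, n, bs=64):
    """Compute c = a + a.T for an n x n row-major matrix stored as a
    flat list.  Processing bs x bs tiles keeps both the row-wise
    a[i*n+j] accesses and the column-wise a[j*n+i] accesses inside a
    cache-sized working set, which no single 1D iteration order can do."""
    c = [0.0] * (n * n)
    for ii in range(0, n, bs):          # loop over tile rows
        for jj in range(0, n, bs):      # loop over tile columns
            for i in range(ii, min(ii + bs, n)):
                for j in range(jj, min(jj + bs, n)):
                    c[i * n + j] = a[i * n + j] + a[j * n + i]
    return c
```

In C or Fortran the same tiling pays off directly in cache misses; in
pure Python it only illustrates the iteration order such an optimizer
would choose.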

> (if numexpr can do this then great, the task might then reduce to refactoring 
> numexpr so that cython and numba can use the same logic)

Re: [Numpy-discussion] Looking for people interested in helping with Python compiler to LLVM

2012-03-20 Thread Olivier Delalleau
I doubt Theano is already as smart as you'd want it to be right now,
however the core mechanisms are there to perform graph optimizations and
move computations to GPU. It may save time to start from there instead of
starting all over from scratch. I'm not sure though, but it looks like it
would be worth considering it at least.

-=- Olivier

Le 20 mars 2012 15:40, Dag Sverre Seljebotn  a
écrit :

> ** We talked some about Theano. There are some differences in project
> goals which means that it makes sense to make this a separate project:
> Cython wants to use this to generate C code up front from the Cython AST at
> compilation time; numba also has a different frontend (parsing of python
> bytecode) and a different backend (LLVM).
>
> However, it may very well be possible that Theano could be refactored so
> that the more essential algorithms working on the syntax tree could be
> pulled out and shared with cython and numba. Then the question is whether
> the core of Theano is smart enough to compete with Fortran compilers and
> support arbitrarily strided inputs optimally. Otherwise one might as well
> start from scratch. I'll leave that for Mark to figure out...
>
> Dag
> --
> Sent from my Android phone with K-9 Mail. Please excuse my brevity.
>
>
> Olivier Delalleau  wrote:
>>
>> This sounds a lot like Theano, did you look into it?
>>
>> -=- Olivier
>>
>> Le 20 mars 2012 13:49, mark florisson  a
>> écrit :
>>
>>> On 13 March 2012 18:18, Travis Oliphant  wrote:
>>> >>>
>>> >>> (Mark F., how does the above match how you feel about this?)
>>> >>
>>> >> I would like collaboration, but from a technical perspective I think
>>> >> this would be much more involved than just dumping the AST to an IR
>>> >> and generating some code from there. For vector expressions I think
>>> >> sharing code would be more feasible than arbitrary (parallel) loops,
>>> >> etc. Cython as a compiler can make many decisions that a Python
>>> >> (bytecode) compiler can't make (at least without annotations and a
>>> >> well-defined subset of the language (not so much the syntax as the
>>> >> semantics)). I think in numba, if parallelism is to be supported, you
>>> >> will want a prange-like construct, as proving independence between
>>> >> iterations can be very hard to near impossible for a compiler.
>>> >
>>> > I completely agree that you have to define some kind of syntax to get
>>> parallelism.  But, a prange construct would not be out of the question, of
>>> course.
>>> >
>>> >>
>>> >> As for code generation, I'm not sure how llvm would do things like
>>> >> slicing arrays, reshaping, resizing etc (for vector expressions you
>>> >> can first evaluate all slicing and indexing operations and then
>>> >> compile the remaining vector expression), but for loops and array
>>> >> reassignment within loops this would have to invoke the actual slicing
>>> >> code from the llvm code (I presume).
>>> >
>>> > There could be some analysis on the byte-code, prior to emitting the
>>> llvm code in order to handle lots of things.   Basically, you have to
>>> "play" the byte-code on a simple machine anyway in order to emit the
>>> correct code.   The big thing about Cython is you have to typedef too many
>>> things that are really quite knowable from the code.   If Cython could
>>> improve its type inference, then it would be a more suitable target.
>>> >
>>> >> There are many other things, like
>>> >> bounds checking, wraparound, etc, that are all supported in both numpy
>>> >> and Cython, but going through an llvm layer would as far as I can see,
>>> >> require re-implementing those, at least if you want top-notch
>>> >> performance. Personally, I think for non-trivial performance-critical
>>> >> code (for loops with indexing, slicing, function calls, etc) Cython is
>>> >> a better target.
>>> >
>>> > With libclang it is really quite possible to imagine a cython -> C
>>> target that itself compiles to llvm so that you can do everything at that
>>> intermediate layer.   However,  LLVM is a much better layer for
>>> optimization than C now that there are a lot of people collaborating on
>>> that layer.   I think it would be great if Cython targeted LLVM actually
>>> instead of C.
>>> >
>>> >>
>>> >> Finally, as for non-vector-expression code, I really believe Cython is
>>> >> a better target. cython.inline can have high overhead (at least the
>>> >> first time it has to compile), but with better (numpy-aware) type
>>> >> inference or profile guided optimizations (see recent threads on the
>>> >> cython-dev mailing list), in addition to things like prange, I
>>> >> personally believe Cython targets most of the use cases where numba
>>> >> would be able to generate performing code.
>>> >
>>> > Cython and Numba certainly overlap.  However, Cython requires:
>>> >
>>> >1) learning another language
>>> >2) creating an extension module --- loading bit-code files and
>>> dynamically executing (even on a different machine from the one that

Re: [Numpy-discussion] Looking for people interested in helping with Python compiler to LLVM

2012-03-20 Thread Dag Sverre Seljebotn
Sorry, forgot to CC the list on this. Lines starting with a single 
greater-than are mine.
-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

Dag Sverre Seljebotn  wrote:



Francesc Alted  wrote:

>On Mar 20, 2012, at 12:49 PM, mark florisson wrote:
>>> Cython and Numba certainly overlap. However, Cython requires:
>>> 
>>> 1) learning another language
>>> 2) creating an extension module --- loading bit-code files
>>> and dynamically executing (even on a different machine from the one
>>> that initially created them) can be a powerful alternative for run-time
>>> compilation and distribution of code.
>>> 
>>> These aren't show-stoppers obviously. But, I think some users
>>> would prefer an even simpler approach to getting fast-code than Cython
>>> (which currently doesn't do enough type inference and requires
>>> building a dlopen extension module).
>> 
>> Dag and I have been discussing this at PyCon, and here is my take on
>> it (at this moment :).
>> 
>> Definitely, if you can avoid Cython then that is easier and more
>> desirable in many ways. So perhaps we can create a third project
>> called X (I'm not very creative, maybe ArrayExprOpt), that takes an
>> abstract syntax tree in a rather simple form, performs code
>> optimizations such as rewriting loops with array accesses to vector
>> expressions, fusing vector expressions and loops, etc, and spits out a
>> transformed AST containing these optimizations. If runtime information
>> is given such as actual shape and stride information the
>> transformations could figure out there and then whether to do things
>> like collapsing, axes swapping, blocking (as in, introducing more axes
>> or loops to retain discontiguous blocks in the cache), blocked memory
>> copies to contiguous chunks, etc. The AST could then also say whether
>> the final expressions are vectorizable. Part of this functionality is
>> already in numpy's nditer, except that this would be implicit and do
>> more (and hopefully with minimal overhead).
>> 
>> So numba, Cython and maybe numexpr could use the functionality, simply
>> by building the AST from Python and converting back (if necessary) to
>> its own AST. As such, the AST optimizer would be only part of any
>> (runtime) compiler's pipeline, and it should be very flexible to
>> retain any information (metadata regarding actual types, control flow
>> information, etc) provided by the original AST. It would not do
>> control flow analysis, type inference or promotion, etc, but only deal
>> with abstract types like integers, reals and arrays (C, Fortran or
>> partly contiguous or strided). It would not deal with objects, but
>> would allow to insert nodes like UnreorderableNode and SideEffectNode
>> wrapping parts of the original AST. In short, it should be as easy as
>> possible to convert from an original AST to this project's AST and
>> back again afterwards.
>
>I think this is a very interesting project, and certainly projects like
>numba can benefit from it. So that we have an idea of what you are
>after, can we assume that your project (call it X) would be a kind of
>compiler optimizer, and that the produced, optimized code could then be
>fed into numba for optimized LLVM code generation (which, in turn, can
>be run on top of CPUs or GPUs or a combination)? Is that correct?

I think so. Another way of thinking about it is that it is a reimplementation 
of the logic in the (good and closed source) Fortran 90 compilers, in a 
reusable component for inclusion in various compilers.

Various c++ metaprogramming libraries (like Blitz++) are similar too.

>
>Given that my interpretation above is correct, it is a bit more
>difficult for me to see how your X project could benefit numexpr. In
>fact, I actually see this the other way round: once the optimizer has
>discovered the vectorizable parts, it could go one step further and
>generate code that uses numexpr automatically (call this vectorization
>through numexpr). Is this what you mean, or am I missing something?

No. I think in some ways this is a competitor to numexpr -- you would gut out 
the middle of numexpr and keep the frontend and backend, but use this to 
optimize iteration order and blocking strategies.

I think the goal is for higher performance than what I understand numexpr can 
provide (in some cases, not all!). For instance, can numexpr deal well with

a + a.T

where a is a c-contiguous array? Any numpy-like iteration order will not work 
well, one needs to use higher-dimensional (eg 2D) blocking, not 1D blocking.

(if numexpr can do this then great, the task might then reduce to refactoring 
numexpr so that cython and numba can use the same logic)

Dag

>
>> As the project matures many optimizations may be added that deal with
>> all sorts of loop restructuring and ways to efficiently utilize the
>> cache as well as enable vectorization and possibly parallelism.
>> Perhaps it could even generate a different AST depending on whether
>> execution target the

Re: [Numpy-discussion] Looking for people interested in helping with Python compiler to LLVM

2012-03-20 Thread Dag Sverre Seljebotn
We talked some about Theano. There are some differences in project goals which 
means that it makes sense to make this a separate project: Cython wants to use 
this to generate C code up front from the Cython AST at compilation time; numba 
also has a different frontend (parsing of python bytecode) and a different 
backend (LLVM).

However, it may very well be possible that Theano could be refactored so that 
the more essential algorithms working on the syntax tree could be pulled out 
and shared with cython and numba. Then the question is whether the core of 
Theano is smart enough to compete with Fortran compilers and support arbitrarily 
strided inputs optimally. Otherwise one might as well start from scratch. I'll 
leave that for Mark to figure out...

Dag
-- 
Sent from my Android phone with K-9 Mail. Please excuse my brevity.

Olivier Delalleau  wrote:

This sounds a lot like Theano, did you look into it?

-=- Olivier

Le 20 mars 2012 13:49, mark florisson  a écrit :

On 13 March 2012 18:18, Travis Oliphant  wrote:
>>>
>>> (Mark F., how does the above match how you feel about this?)
>>
>> I would like collaboration, but from a technical perspective I think
>> this would be much more involved than just dumping the AST to an IR
>> and generating some code from there. For vector expressions I think
>> sharing code would be more feasible than arbitrary (parallel) loops,
>> etc. Cython as a compiler can make many decisions that a Python
>> (bytecode) compiler can't make (at least without annotations and a
>> well-defined subset of the language (not so much the syntax as the
>> semantics)). I think in numba, if parallelism is to be supported, you
>> will want a prange-like construct, as proving independence between
>> iterations can be very hard to near impossible for a compiler.
>
> I completely agree that you have to define some kind of syntax to get 
> parallelism.  But, a prange construct would not be out of the question, of 
> course.
>
>>
>> As for code generation, I'm not sure how llvm would do things like
>> slicing arrays, reshaping, resizing etc (for vector expressions you
>> can first evaluate all slicing and indexing operations and then
>> compile the remaining vector expression), but for loops and array
>> reassignment within loops this would have to invoke the actual slicing
>> code from the llvm code (I presume).
>
> There could be some analysis on the byte-code, prior to emitting the llvm 
> code in order to handle lots of things.   Basically, you have to "play" the 
> byte-code on a simple machine anyway in order to emit the correct code.   The 
> big thing about Cython is you have to typedef too many things that are really 
> quite knowable from the code.   If Cython could improve its type inference, 
> then it would be a more suitable target.
>
>> There are many other things, like
>> bounds checking, wraparound, etc, that are all supported in both numpy
>> and Cython, but going through an llvm layer would as far as I can see,
>> require re-implementing those, at least if you want top-notch
>> performance. Personally, I think for non-trivial performance-critical
>> code (for loops with indexing, slicing, function calls, etc) Cython is
>> a better target.
>
> With libclang it is really quite possible to imagine a cython -> C target 
> that itself compiles to llvm so that you can do everything at that 
> intermediate layer.   However,  LLVM is a much better layer for optimization 
> than C now that there are a lot of people collaborating on that layer.   I 
> think it would be great if Cython targeted LLVM actually instead of C.
>
>>
>> Finally, as for non-vector-expression code, I really believe Cython is
>> a better target. cython.inline can have high overhead (at least the
>> first time it has to compile), but with better (numpy-aware) type
>> inference or profile guided optimizations (see recent threads on the
>> cython-dev mailing list), in addition to things like prange, I
>> personally believe Cython targets most of the use cases where numba
>> would be able to generate performing code.
>
> Cython and Numba certainly overlap.  However, Cython requires:
>
>1) learning another language
>2) creating an extension module --- loading bit-code files and 
> dynamically executing (even on a different machine from the one that 
> initially created them) can be a powerful alternative for run-time 
> compilation and distribution of code.
>
> These aren't show-stoppers obviously.   But, I think some users would prefer 
> an even simpler approach to getting fast-code than Cython (which currently 
> doesn't do enough type inference and requires building a dlopen extension 
> module).

Dag and I have been discussing this at PyCon, and here is my take on
it (at this moment :).

Definitely, if you can avoid Cython then that is easier and more
desirable in many ways. So perhaps we can create a third project
called X (I'm not very creative, maybe ArrayExprOpt), that takes an
abstract syn

Re: [Numpy-discussion] Looking for people interested in helping with Python compiler to LLVM

2012-03-20 Thread Francesc Alted
On Mar 20, 2012, at 12:49 PM, mark florisson wrote:
>> Cython and Numba certainly overlap.  However, Cython requires:
>> 
>>1) learning another language
>>2) creating an extension module --- loading bit-code files and 
>> dynamically executing (even on a different machine from the one that 
>> initially created them) can be a powerful alternative for run-time 
>> compilation and distribution of code.
>> 
>> These aren't show-stoppers obviously.   But, I think some users would prefer 
>> an even simpler approach to getting fast-code than Cython (which currently 
>> doesn't do enough type inference and requires building a dlopen extension 
>> module).
> 
> Dag and I have been discussing this at PyCon, and here is my take on
> it (at this moment :).
> 
> Definitely, if you can avoid Cython then that is easier and more
> desirable in many ways. So perhaps we can create a third project
> called X (I'm not very creative, maybe ArrayExprOpt), that takes an
> abstract syntax tree in a rather simple form, performs code
> optimizations such as rewriting loops with array accesses to vector
> expressions, fusing vector expressions and loops, etc, and spits out a
> transformed AST containing these optimizations. If runtime information
> is given such as actual shape and stride information the
> transformations could figure out there and then whether to do things
> like collapsing, axes swapping, blocking (as in, introducing more axes
> or loops to retain discontiguous blocks in the cache), blocked memory
> copies to contiguous chunks, etc. The AST could then also say whether
> the final expressions are vectorizable. Part of this functionality is
> already in numpy's nditer, except that this would be implicit and do
> more (and hopefully with minimal overhead).
> 
> So numba, Cython and maybe numexpr could use the functionality, simply
> by building the AST from Python and converting back (if necessary) to
> its own AST. As such, the AST optimizer would be only part of any
> (runtime) compiler's pipeline, and it should be very flexible to
> retain any information (metadata regarding actual types, control flow
> information, etc) provided by the original AST. It would not do
> control flow analysis, type inference or promotion, etc, but only deal
> with abstract types like integers, reals and arrays (C, Fortran or
> partly contiguous or strided). It would not deal with objects, but
> would allow to insert nodes like UnreorderableNode and SideEffectNode
> wrapping parts of the original AST. In short, it should be as easy as
> possible to convert from an original AST to this project's AST and
> back again afterwards.

I think this is a very interesting project, and certainly projects like numba 
can benefit from it.  So that we have an idea of what you are after, can we 
assume that your project (call it X) would be a kind of compiler optimizer, 
and that the produced, optimized code could then be fed into numba for 
optimized LLVM code generation (which, in turn, can be run on top of CPUs or 
GPUs or a combination)?  Is that correct?

Given that my interpretation above is correct, it is a bit more difficult for 
me to see how your X project could benefit numexpr.  In fact, I actually see 
this the other way round: once the optimizer has discovered the vectorizable 
parts, it could go one step further and generate code that uses numexpr 
automatically (call this vectorization through numexpr).  Is this what you 
mean, or am I missing something?
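As a minimal picture of what discovering the vectorizable parts buys you, 
consider loop fusion eliminating a temporary. This is a pure-Python sketch 
(the function names are invented for illustration; it is not numexpr's or 
project X's actual code):

```python
def unfused(a, b):
    # Naive evaluation of d = 2 * (a + b): the intermediate c is a
    # full temporary array that is written out and then re-read.
    c = [x + y for x, y in zip(a, b)]
    return [2.0 * x for x in c]


def fused(a, b):
    # After loop fusion the temporary disappears and each element is
    # produced in a single pass -- the form a vector-expression
    # optimizer would hand to a backend such as numexpr or LLVM.
    return [2.0 * (x + y) for x, y in zip(a, b)]
```

Both return the same result; the fused form simply makes one pass over 
memory instead of two.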

> As the project matures many optimizations may be added that deal with
> all sorts of loop restructuring and ways to efficiently utilize the
> cache as well as enable vectorization and possibly parallelism.
> Perhaps it could even generate a different AST depending on whether
> execution targets the CPU or the GPU (with optionally available
> information such as cache sizes, GPU shared/local memory sizes, etc).
> 
> Seeing that this would be a part of my master's dissertation, my
> supervisor would require me to write the code, so at least until
> August I think I would have to write (at least the bulk of) this.
> Otherwise I can also make other parts of my dissertation's project
> more prominent to make up for it. Anyway, my question is, is there
> interest from at least the numba and numexpr projects (if code can be
> transformed into vector operations, it makes sense to use numexpr for
> that, I'm not sure what numba's interest is in that).

I'm definitely interested for the numexpr part.  It is just that I'm still 
struggling to see the big picture on this.  But the general idea is really 
appealing.

Thanks,

-- Francesc Alted



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Looking for people interested in helping with Python compiler to LLVM

2012-03-20 Thread Olivier Delalleau
This sounds a lot like Theano, did you look into it?

-=- Olivier

Le 20 mars 2012 13:49, mark florisson  a écrit :

> On 13 March 2012 18:18, Travis Oliphant  wrote:
> >>>
> >>> (Mark F., how does the above match how you feel about this?)
> >>
> >> I would like collaboration, but from a technical perspective I think
> >> this would be much more involved than just dumping the AST to an IR
> >> and generating some code from there. For vector expressions I think
> >> sharing code would be more feasible than arbitrary (parallel) loops,
> >> etc. Cython as a compiler can make many decisions that a Python
> >> (bytecode) compiler can't make (at least without annotations and a
> >> well-defined subset of the language (not so much the syntax as the
> >> semantics)). I think in numba, if parallelism is to be supported, you
> >> will want a prange-like construct, as proving independence between
> >> iterations can be very hard to near impossible for a compiler.
> >
> > I completely agree that you have to define some kind of syntax to get
> parallelism.  But, a prange construct would not be out of the question, of
> course.
> >
> >>
> >> As for code generation, I'm not sure how llvm would do things like
> >> slicing arrays, reshaping, resizing etc (for vector expressions you
> >> can first evaluate all slicing and indexing operations and then
> >> compile the remaining vector expression), but for loops and array
> >> reassignment within loops this would have to invoke the actual slicing
> >> code from the llvm code (I presume).
> >
> > There could be some analysis on the byte-code, prior to emitting the
> llvm code in order to handle lots of things.   Basically, you have to
> "play" the byte-code on a simple machine anyway in order to emit the
> correct code.   The big thing about Cython is you have to typedef too many
> things that are really quite knowable from the code.   If Cython could
> improve its type inference, then it would be a more suitable target.
> >
> >> There are many other things, like
> >> bounds checking, wraparound, etc, that are all supported in both numpy
> >> and Cython, but going through an llvm layer would as far as I can see,
> >> require re-implementing those, at least if you want top-notch
> >> performance. Personally, I think for non-trivial performance-critical
> >> code (for loops with indexing, slicing, function calls, etc) Cython is
> >> a better target.
> >
> > With libclang it is really quite possible to imagine a cython -> C
> target that itself compiles to llvm so that you can do everything at that
> intermediate layer.   However,  LLVM is a much better layer for
> optimization than C now that there are a lot of people collaborating on
> that layer.   I think it would be great if Cython targeted LLVM actually
> instead of C.
> >
> >>
> >> Finally, as for non-vector-expression code, I really believe Cython is
> >> a better target. cython.inline can have high overhead (at least the
> >> first time it has to compile), but with better (numpy-aware) type
> >> inference or profile guided optimizations (see recent threads on the
> >> cython-dev mailing list), in addition to things like prange, I
> >> personally believe Cython targets most of the use cases where numba
> >> would be able to generate performing code.
> >
> > Cython and Numba certainly overlap.  However, Cython requires:
> >
> >1) learning another language
> >2) creating an extension module --- loading bit-code files and
> dynamically executing (even on a different machine from the one that
> initially created them) can be a powerful alternative for run-time
> compilation and distribution of code.
> >
> > These aren't show-stoppers obviously.   But, I think some users would
> prefer an even simpler approach to getting fast-code than Cython (which
> currently doesn't do enough type inference and requires building a dlopen
> extension module).
>
> Dag and I have been discussing this at PyCon, and here is my take on
> it (at this moment :).
>
> Definitely, if you can avoid Cython then that is easier and more
> desirable in many ways. So perhaps we can create a third project
> called X (I'm not very creative, maybe ArrayExprOpt), that takes an
> abstract syntax tree in a rather simple form, performs code
> optimizations such as rewriting loops with array accesses to vector
> expressions, fusing vector expressions and loops, etc, and spits out a
> transformed AST containing these optimizations. If runtime information
> is given such as actual shape and stride information the
> transformations could figure out there and then whether to do things
> like collapsing, axes swapping, blocking (as in, introducing more axes
> or loops to retain discontiguous blocks in the cache), blocked memory
> copies to contiguous chunks, etc. The AST could then also say whether
> the final expressions are vectorizable. Part of this functionality is
> already in numpy's nditer, except that this would be implicit and do
> more (and h

Re: [Numpy-discussion] Looking for people interested in helping with Python compiler to LLVM

2012-03-20 Thread mark florisson
On 13 March 2012 18:18, Travis Oliphant  wrote:
>>>
>>> (Mark F., how does the above match how you feel about this?)
>>
>> I would like collaboration, but from a technical perspective I think
>> this would be much more involved than just dumping the AST to an IR
>> and generating some code from there. For vector expressions I think
>> sharing code would be more feasible than arbitrary (parallel) loops,
>> etc. Cython as a compiler can make many decisions that a Python
>> (bytecode) compiler can't make (at least without annotations and a
>> well-defined subset of the language (not so much the syntax as the
>> semantics)). I think in numba, if parallelism is to be supported, you
>> will want a prange-like construct, as proving independence between
>> iterations can be very hard to near impossible for a compiler.
>
> I completely agree that you have to define some kind of syntax to get 
> parallelism.  But, a prange construct would not be out of the question, of 
> course.
>
>>
>> As for code generation, I'm not sure how llvm would do things like
>> slicing arrays, reshaping, resizing etc (for vector expressions you
>> can first evaluate all slicing and indexing operations and then
>> compile the remaining vector expression), but for loops and array
>> reassignment within loops this would have to invoke the actual slicing
>> code from the llvm code (I presume).
>
> There could be some analysis on the byte-code, prior to emitting the llvm 
> code in order to handle lots of things.   Basically, you have to "play" the 
> byte-code on a simple machine anyway in order to emit the correct code.   The 
> big thing about Cython is you have to typedef too many things that are really 
> quite knowable from the code.   If Cython could improve its type inference, 
> then it would be a more suitable target.
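The byte-code that would be "played" here can be inspected with Python's 
standard-library dis module. A small sketch (this is plain introspection, 
not numba's actual pipeline; the function f is just an example):

```python
import dis

def f(a, b):
    return a * 2 + b

# Each instruction printed below is what a bytecode-level compiler
# would interpret symbolically, tracking the types that flow through
# the value stack in order to emit typed (e.g. LLVM) code.
for ins in dis.get_instructions(f):
    print(ins.offset, ins.opname, ins.argrepr)
```

The exact opcodes vary between Python versions, which is precisely the 
maintenance burden of bytecode-level approaches raised elsewhere in this 
thread.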
>
>> There are many other things, like
>> bounds checking, wraparound, etc, that are all supported in both numpy
>> and Cython, but going through an llvm layer would as far as I can see,
>> require re-implementing those, at least if you want top-notch
>> performance. Personally, I think for non-trivial performance-critical
>> code (for loops with indexing, slicing, function calls, etc) Cython is
>> a better target.
>
> With libclang it is really quite possible to imagine a cython -> C target 
> that itself compiles to llvm so that you can do everything at that 
> intermediate layer.   However,  LLVM is a much better layer for optimization 
> than C now that there are a lot of people collaborating on that layer.   I 
> think it would be great if Cython targeted LLVM actually instead of C.
>
>>
>> Finally, as for non-vector-expression code, I really believe Cython is
>> a better target. cython.inline can have high overhead (at least the
>> first time it has to compile), but with better (numpy-aware) type
>> inference or profile guided optimizations (see recent threads on the
>> cython-dev mailing list), in addition to things like prange, I
>> personally believe Cython targets most of the use cases where numba
>> would be able to generate performing code.
>
> Cython and Numba certainly overlap.  However, Cython requires:
>
>        1) learning another language
>        2) creating an extension module --- loading bit-code files and 
> dynamically executing (even on a different machine from the one that 
> initially created them) can be a powerful alternative for run-time 
> compilation and distribution of code.
>
> These aren't show-stoppers obviously.   But, I think some users would prefer 
> an even simpler approach to getting fast-code than Cython (which currently 
> doesn't do enough type inference and requires building a dlopen extension 
> module).

Dag and I have been discussing this at PyCon, and here is my take on
it (at this moment :).

Definitely, if you can avoid Cython then that is easier and more
desirable in many ways. So perhaps we can create a third project
called X (I'm not very creative, maybe ArrayExprOpt), that takes an
abstract syntax tree in a rather simple form, performs code
optimizations such as rewriting loops with array accesses to vector
expressions, fusing vector expressions and loops, etc, and spits out a
transformed AST containing these optimizations. If runtime information
is given such as actual shape and stride information the
transformations could figure out there and then whether to do things
like collapsing, axes swapping, blocking (as in, introducing more axes
or loops to retain discontiguous blocks in the cache), blocked memory
copies to contiguous chunks, etc. The AST could then also say whether
the final expressions are vectorizable. Part of this functionality is
already in numpy's nditer, except that this would be implicit and do
more (and hopefully with minimal overhead).
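A minimal sketch of the temporary-elimination such an optimizer targets. Here `naive` is what `C = A + B + D` does today (one temporary array per binary op), while `fused` shows the single-pass loop the optimizer would emit — in generated C/LLVM rather than interpreted Python; the function names are illustrative only:

```python
import numpy as np

# Naive evaluation: each binary op allocates a full temporary array.
def naive(A, B, D):
    tmp = A + B          # temporary array no. 1
    return tmp + D       # final result

# "Fused" evaluation: one pass over the data, no intermediate arrays.
# A real optimizer would emit this loop as C/LLVM, not Python.
def fused(A, B, D):
    out = np.empty_like(A)
    a, b, d, o = A.ravel(), B.ravel(), D.ravel(), out.ravel()
    for i in range(o.size):
        o[i] = a[i] + b[i] + d[i]
    return out

A = np.arange(4.0); B = np.ones(4); D = np.full(4, 2.0)
assert np.allclose(naive(A, B, D), fused(A, B, D))
```

The two give identical results; the difference is purely in memory traffic, which is exactly what a fusing AST pass would optimize away.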

So numba, Cython and maybe numexpr could use the functionality, simply
by building the AST from Python and converting back (if necessary) to
its own AST. As such, the AST optimizer would be only part of any
(

Re: [Numpy-discussion] Looking for people interested in helping with Python compiler to LLVM

2012-03-13 Thread Ilan Schnell
As far as I understand, the goal is not to handle arbitrary
Python code, because this would become too difficult and is
not necessary when you have a simple math-oriented function
which you want to speed up.  The idea is to create something
similar to http://www.enthought.com/~ischnell/paper.html which
only handles some restricted Python code, but using LLVM
(instead of C) as the target language.

- Ilan


On Tue, Mar 13, 2012 at 1:39 PM, Nathaniel Smith  wrote:
> On Tue, Mar 13, 2012 at 5:18 PM, Travis Oliphant  wrote:
>> Cython and Numba certainly overlap.  However, Cython requires:
>>
>>        1) learning another language
>
> So is the goal for numba to actually handle arbitrary Python code with
> correct semantics, i.e., it's actually a compiled implementation of
> Python-the-language? (I feel like most of where Cython-the-language
> differs from Python-the-language is that Cython adds extensions for
> stuff where getting speed out of Python-the-language would just be too
> hard. Dynamic type inference for numpy arrays is definitely a good
> start, but you can't even, say, promote a Python integer to a C
> integer without changing semantics...)
>
>>        2) creating an extension module --- loading bit-code files and 
>> dynamically executing (even on a different machine from the one that 
>> initially created them) can be a powerful alternative for run-time 
>> compilation and distribution of code.
>
> Totally agreed on this point, the workflow for Cython could definitely
> be smoother.
>
> -- Nathaniel
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Looking for people interested in helping with Python compiler to LLVM

2012-03-13 Thread Nathaniel Smith
On Tue, Mar 13, 2012 at 5:18 PM, Travis Oliphant  wrote:
> Cython and Numba certainly overlap.  However, Cython requires:
>
>        1) learning another language

So is the goal for numba to actually handle arbitrary Python code with
correct semantics, i.e., it's actually a compiled implementation of
Python-the-language? (I feel like most of where Cython-the-language
differs from Python-the-language is that Cython adds extensions for
stuff where getting speed out of Python-the-language would just be too
hard. Dynamic type inference for numpy arrays is definitely a good
start, but you can't even, say, promote a Python integer to a C
integer without changing semantics...)
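A quick illustration of that semantic gap, using NumPy's fixed-width integers to stand in for C ints:

```python
import numpy as np

x = 2**62
# Python-the-language: integers have arbitrary precision.
assert x + x == 2**63

# After "promotion" to a C-like int64, the same expression wraps
# around instead, so the semantics genuinely differ.
y = np.int64(2**62)
with np.errstate(over='ignore'):   # suppress the overflow warning
    wrapped = y + y
assert wrapped == np.int64(-2**63)
```

Any compiler that silently promotes Python ints to machine ints has changed the language, which is the point being made above.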

>        2) creating an extension module --- loading bit-code files and 
> dynamically executing (even on a different machine from the one that 
> initially created them) can be a powerful alternative for run-time 
> compilation and distribution of code.

Totally agreed on this point, the workflow for Cython could definitely
be smoother.

-- Nathaniel
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Looking for people interested in helping with Python compiler to LLVM

2012-03-13 Thread Travis Oliphant
>> 
>> (Mark F., how does the above match how you feel about this?)
> 
> I would like collaboration, but from a technical perspective I think
> this would be much more involved than just dumping the AST to an IR
> and generating some code from there. For vector expressions I think
> sharing code would be more feasible than arbitrary (parallel) loops,
> etc. Cython as a compiler can make many decisions that a Python
> (bytecode) compiler can't make (at least without annotations and a
> well-defined subset of the language (not so much the syntax as the
> semantics)). I think in numba, if parallelism is to be supported, you
> will want a prange-like construct, as proving independence between
> iterations can be very hard to near impossible for a compiler.

I completely agree that you have to define some kind of syntax to get 
parallelism.  But, a prange construct would not be out of the question, of 
course.  

> 
> As for code generation, I'm not sure how llvm would do things like
> slicing arrays, reshaping, resizing etc (for vector expressions you
> can first evaluate all slicing and indexing operations and then
> compile the remaining vector expression), but for loops and array
> reassignment within loops this would have to invoke the actual slicing
> code from the llvm code (I presume).

There could be some analysis on the byte-code, prior to emitting the llvm code 
in order to handle lots of things.   Basically, you have to "play" the 
byte-code on a simple machine anyway in order to emit the correct code.   The 
big thing about Cython is you have to declare types for too many things that 
are really quite knowable from the code.   If Cython could improve its type 
inference, then it would be a more suitable target. 

> There are many other things, like
> bounds checking, wraparound, etc, that are all supported in both numpy
> and Cython, but going through an llvm layer would as far as I can see,
> require re-implementing those, at least if you want top-notch
> performance. Personally, I think for non-trivial performance-critical
> code (for loops with indexing, slicing, function calls, etc) Cython is
> a better target.

With libclang it is really quite possible to imagine a cython -> C target that 
itself compiles to llvm so that you can do everything at that intermediate 
layer.   However,  LLVM is a much better layer for optimization than C now that 
there are a lot of people collaborating on that layer.   I think it would be 
great if Cython targeted LLVM actually instead of C.  

> 
> Finally, as for non-vector-expression code, I really believe Cython is
> a better target. cython.inline can have high overhead (at least the
> first time it has to compile), but with better (numpy-aware) type
> inference or profile guided optimizations (see recent threads on the
> cython-dev mailing list), in addition to things like prange, I
> personally believe Cython targets most of the use cases where numba
would be able to generate performant code.

Cython and Numba certainly overlap.  However, Cython requires:

1) learning another language
2) creating an extension module --- loading bit-code files and 
dynamically executing (even on a different machine from the one that initially 
created them) can be a powerful alternative for run-time compilation and 
distribution of code. 

These aren't show-stoppers obviously.   But, I think some users would prefer an 
even simpler approach to getting fast-code than Cython (which currently doesn't 
do enough type inference and requires building a dlopen extension module). 

-Travis




> 
>> Dag



Re: [Numpy-discussion] Looking for people interested in helping with Python compiler to LLVM

2012-03-13 Thread mark florisson
On 13 March 2012 09:19, Travis Oliphant  wrote:
>
> On Mar 13, 2012, at 12:58 AM, Dag Sverre Seljebotn wrote:
>
> On 03/10/2012 10:35 PM, Travis Oliphant wrote:
>
> Hey all,
>
> I gave a lightning talk this morning on numba which is the start of a
> Python compiler to machine code through the LLVM tool-chain. It is proof
> of concept stage only at this point (use it only if you are interested
> in helping develop the code at this point). The only thing that works is
> a fast-vectorize capability on a few functions (without for-loops). But,
> it shows how functions created in Python can be used by the NumPy
> runtime in various ways. Several NEPs that will be discussed in the
> coming months will use this concept.
>
> Right now there is very little design documentation, but I will be
> adding some in the days ahead, especially if I get people who are
> interested in collaborating on the project. I did talk to Fijal and Alex
> of the PyPy project at PyCon and they both graciously suggested that I
> look at some of the PyPy code which walks the byte-code and does
> translation to their intermediate representation for inspiration.
>
> Again, the code is not ready for use, it is only proof of concept, but I
> would like to get feedback and help especially from people who might
> have written compilers before. The code lives at:
> https://github.com/ContinuumIO/numba
>
> Hi Travis,
>
> Mark F. and I have been talking today about whether some of numba and
> Cython development could overlap -- not right away, but in the sense
> that if Cython gets some features for optimization of numerical code,
> then make it easy for numba to reuse that functionality.
>
>
> That would be very, very interesting.
>
>
> This may be sort of off-topic re: the above-- but part of the goal of
> this post is to figure out numba's intended scope. If there isn't an
> overlap, that's good to know in itself.
>
> Question 1: Did you look at Clyther and/or Copperhead? Though similar,
> they target GPUs...but at first glance they look as though they may be
> parsing Python bytecode to get their ASTs... (didn't check though)
>
>
> I have looked at both projects although Clyther more in depth.    Clyther is
> parsing bytecode to get the AST (through a sub-project by the same author
> called Meta:  http://srossross.github.com/Meta/html/index.html).
>
>
> Question 2: What kind of performance are you targeting -- in the short
> term, and in the long term? Is competing with "Fortran-level"
> performance a goal at all?
>
>
> In the short-term, I'm targeting C-equivalent performance (like weave).   In
> the long-term, I'm targeting optimized high-level expressions (i.e.
> Fortran-level) with GPU and multi-core.
>
>
> E.g., for ufunc computations with different iteration orders such
> as "a + b.T" (a and b in C-order), one must do blocking to get good
> performance. And when dealing with strided arrays, copying small chunks
> at the time will sometimes help performance (and sometimes not).
>
> This is optimization strategies which (as I understand it) is quite
> beyond what NumPy iterators etc. can provide.
>

As for blocking, this could be done by the numpy iterators themselves,
by simply introducing more dimensions with appropriate shape and
strides (not saying that's a solution :).
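One way to introduce such blocking dimensions with plain shape/stride manipulation — a sketch only; the 64×64 block size is an arbitrary choice for illustration:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

# Reinterpret a 1024x1024 array as a 16x16 grid of 64x64 blocks,
# purely by adding dimensions with adjusted shape and strides --
# no data is copied.
a = np.arange(1024 * 1024, dtype=np.float64).reshape(1024, 1024)
bs = 64
blocked = as_strided(
    a,
    shape=(1024 // bs, 1024 // bs, bs, bs),
    strides=(a.strides[0] * bs, a.strides[1] * bs, *a.strides),
)

# Iterating block-by-block now visits cache-friendly chunks.
assert blocked.shape == (16, 16, 64, 64)
assert blocked[2, 3][0, 0] == a[2 * bs, 3 * bs]
```

The same trick is how an iterator could hand out contiguous-ish tiles to a blocked kernel without copying.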

>
>
> And the LLVM level could
> be too low -- one has quite a lot of information when generating the
> ufunc/reduction/etc. that would be thrown away when generating LLVM
> code.
>
>
> It doesn't need to be thrown away at all.   It could be used to generate
> appropriate code for the arrays being used.   The long-term idea is to
> actually be aware of NumPy arrays and encourage expression of high-level
> constructs which generate optimized code using chunking, blocking, AVX
> instructions, multiple threads, etc.
>
> To do this, it may make more sense to actually emit OpenMP (unless LLVM
> grows standard threading intrinsics).   This is not out of the question.

That would be interesting. My experience with OpenMP is that the
standard (ironically enough) doesn't define the use of OpenMP in the
context of threading, and indeed, trying to use OpenMP outside of the
main thread simply segfaults your program. If LLVM were to get such
features, one must be prepared to make the OpenMP runtime thread-safe
as well (hopefully it will be thread-safe in the first place, as I
believe Intel's implementation is).

> Vectorizing compilers do their best to reconstruct this
> information; I know nothing about what actually exists here for
> LLVM. They are certainly a lot more complicated to implement and work
> with than making use of higher-level information available before
> code generation.
>
> The idea we've been playing with is for Cython to define a limited
> subset of its syntax tree (essentially the "GIL-less" subset) separate
> from the rest of Cython, with a more well-defined API for optimization
> passes etc., and targeted for a numerical optim

Re: [Numpy-discussion] Looking for people interested in helping with Python compiler to LLVM

2012-03-13 Thread Travis Oliphant

On Mar 13, 2012, at 12:58 AM, Dag Sverre Seljebotn wrote:

> On 03/10/2012 10:35 PM, Travis Oliphant wrote:
>> Hey all,
>> 
>> I gave a lightning talk this morning on numba which is the start of a
>> Python compiler to machine code through the LLVM tool-chain. It is proof
>> of concept stage only at this point (use it only if you are interested
>> in helping develop the code at this point). The only thing that works is
>> a fast-vectorize capability on a few functions (without for-loops). But,
>> it shows how functions created in Python can be used by the NumPy
>> runtime in various ways. Several NEPs that will be discussed in the
>> coming months will use this concept.
>> 
>> Right now there is very little design documentation, but I will be
>> adding some in the days ahead, especially if I get people who are
>> interested in collaborating on the project. I did talk to Fijal and Alex
>> of the PyPy project at PyCon and they both graciously suggested that I
>> look at some of the PyPy code which walks the byte-code and does
>> translation to their intermediate representation for inspiration.
>> 
>> Again, the code is not ready for use, it is only proof of concept, but I
>> would like to get feedback and help especially from people who might
>> have written compilers before. The code lives at:
>> https://github.com/ContinuumIO/numba
> 
> Hi Travis,
> 
> Mark F. and I have been talking today about whether some of numba and
> Cython development could overlap -- not right away, but in the sense
> that if Cython gets some features for optimization of numerical code,
> then make it easy for numba to reuse that functionality.

That would be very, very interesting.  

> 
> This may be sort of off-topic re: the above-- but part of the goal of 
> this post is to figure out numba's intended scope. If there isn't an 
> overlap, that's good to know in itself.
> 
> Question 1: Did you look at Clyther and/or Copperhead? Though similar, 
> they target GPUs...but at first glance they look as though they may be 
> parsing Python bytecode to get their ASTs... (didn't check though)

I have looked at both projects although Clyther more in depth. Clyther is 
parsing bytecode to get the AST (through a sub-project by the same author 
called Meta:  http://srossross.github.com/Meta/html/index.html). 

> 
> Question 2: What kind of performance are you targeting -- in the short
> term, and in the long term? Is competing with "Fortran-level"
> performance a goal at all?

In the short-term, I'm targeting C-equivalent performance (like weave).   In 
the long-term, I'm targeting optimized high-level expressions (i.e. 
Fortran-level) with GPU and multi-core. 

> 
> E.g., for ufunc computations with different iteration orders such
> as "a + b.T" (a and b in C-order), one must do blocking to get good
> performance. And when dealing with strided arrays, copying small chunks 
> at the time will sometimes help performance (and sometimes not).
> 
> This is optimization strategies which (as I understand it) is quite
> beyond what NumPy iterators etc. can provide.



> And the LLVM level could
> be too low -- one has quite a lot of information when generating the
> ufunc/reduction/etc. that would be thrown away when generating LLVM
> code.

It doesn't need to be thrown away at all.   It could be used to generate 
appropriate code for the arrays being used.   The long-term idea is to actually 
be aware of NumPy arrays and encourage expression of high-level constructs 
which generate optimized code using chunking, blocking, AVX instructions, 
multiple threads, etc.  

To do this, it may make more sense to actually emit OpenMP (unless LLVM grows 
standard threading intrinsics).   This is not out of the question. 

> Vectorizing compilers do their best to reconstruct this
> information; I know nothing about what actually exists here for
> LLVM. They are certainly a lot more complicated to implement and work
> with than making use of higher-level information available before
> code generation.
> 
> The idea we've been playing with is for Cython to define a limited
> subset of its syntax tree (essentially the "GIL-less" subset) separate
> from the rest of Cython, with a more well-defined API for optimization
> passes etc., and targeted for a numerical optimization pipeline.
> 
> This subset would actually be pretty close to what numba needs to
> compile, even if the overlap isn't perfect. So such a pipeline could
> possibly be shared between Cython and numba, even if Cython would use
> it at compile-time and numba at runtime, and even if the code
> generation backend is different (the code generation backend is
> probably not the hard part...). To be concrete, the idea is:

> 
> (Cython|numba) -> high-level numerical compiler and
> loop-structure/blocking optimizer (by us on a shared parse tree
> representation) -> (LLVM/C/OpenCL) -> low-level optimization (by the
> respective compilers)
> 
> Some algorithms that could be shareable are ite

Re: [Numpy-discussion] Looking for people interested in helping with Python compiler to LLVM

2012-03-12 Thread Dag Sverre Seljebotn
On 03/10/2012 10:35 PM, Travis Oliphant wrote:
> Hey all,
>
> I gave a lightning talk this morning on numba which is the start of a
> Python compiler to machine code through the LLVM tool-chain. It is proof
> of concept stage only at this point (use it only if you are interested
> in helping develop the code at this point). The only thing that works is
> a fast-vectorize capability on a few functions (without for-loops). But,
> it shows how functions created in Python can be used by the NumPy
> runtime in various ways. Several NEPs that will be discussed in the
> coming months will use this concept.
>
> Right now there is very little design documentation, but I will be
> adding some in the days ahead, especially if I get people who are
> interested in collaborating on the project. I did talk to Fijal and Alex
> of the PyPy project at PyCon and they both graciously suggested that I
> look at some of the PyPy code which walks the byte-code and does
> translation to their intermediate representation for inspiration.
>
> Again, the code is not ready for use, it is only proof of concept, but I
> would like to get feedback and help especially from people who might
> have written compilers before. The code lives at:
> https://github.com/ContinuumIO/numba

Hi Travis,

Mark F. and I have been talking today about whether some of numba and
Cython development could overlap -- not right away, but in the sense
that if Cython gets some features for optimization of numerical code,
then make it easy for numba to reuse that functionality.

This may be sort of off-topic re: the above-- but part of the goal of 
this post is to figure out numba's intended scope. If there isn't an 
overlap, that's good to know in itself.

Question 1: Did you look at Clyther and/or Copperhead? Though similar, 
they target GPUs...but at first glance they look as though they may be 
parsing Python bytecode to get their ASTs... (didn't check though)

Question 2: What kind of performance are you targeting -- in the short
term, and in the long term? Is competing with "Fortran-level"
performance a goal at all?

E.g., for ufunc computations with different iteration orders such
as "a + b.T" (a and b in C-order), one must do blocking to get good
performance. And when dealing with strided arrays, copying small chunks 
at the time will sometimes help performance (and sometimes not).

This is optimization strategies which (as I understand it) is quite
beyond what NumPy iterators etc. can provide. And the LLVM level could
be too low -- one has quite a lot of information when generating the
ufunc/reduction/etc. that would be thrown away when generating LLVM
code. Vectorizing compilers do their best to reconstruct this
information; I know nothing about what actually exists here for
LLVM. They are certainly a lot more complicated to implement and work
with than making use of higher-level information available before
code generation.

The idea we've been playing with is for Cython to define a limited
subset of its syntax tree (essentially the "GIL-less" subset) separate
from the rest of Cython, with a more well-defined API for optimization
passes etc., and targeted for a numerical optimization pipeline.

This subset would actually be pretty close to what numba needs to
compile, even if the overlap isn't perfect. So such a pipeline could
possibly be shared between Cython and numba, even if Cython would use
it at compile-time and numba at runtime, and even if the code
generation backend is different (the code generation backend is
probably not the hard part...). To be concrete, the idea is:

(Cython|numba) -> high-level numerical compiler and
loop-structure/blocking optimizer (by us on a shared parse tree
representation) -> (LLVM/C/OpenCL) -> low-level optimization (by the
respective compilers)

Some algorithms that could be shareable are iteration strategies
(already in NumPy though), blocking strategies, etc.

Even if this may be beyond numba's (and perhaps Cython's) current
ambition, it may be worth thinking about, if nothing else then just
for how Cython's code should be structured.

(Mark F., how does the above match how you feel about this?)

Dag


Re: [Numpy-discussion] Looking for people interested in helping with Python compiler to LLVM

2012-03-12 Thread Pierre Haessig
Hi,
On 12/03/2012 00:21, Sturla Molden wrote:
>
> It could also put Python/Numba high up on the Debian shootout ;-)
Can you tell us a bit more about it? (I just didn't understand the whole
sentence, in fact ;-) )

Thanks !
-- 
Pierre





Re: [Numpy-discussion] Looking for people interested in helping with Python compiler to LLVM

2012-03-12 Thread Olivier Delalleau
One major difference is that Theano doesn't attempt to parse existing
Python (byte)code: you need to explicitly code with the Theano syntax
(which tries to be close to Numpy, but can end up looking quite different,
especially if you want to control the program flow with loops and ifs for
instance).

A potentially interesting avenue would be to parse Python (byte)code to
generate a Theano graph. It'd be nice if numba could output some
intermediate information that would represent the computational graph being
compiled, so that Theano could re-use it directly :) (probably much easier
said than done though)

-=- Olivier

On 12 March 2012 12:57, Till Stensitzki wrote:

> Doesn't Theano do the same, only via GCC compilation?
>


Re: [Numpy-discussion] Looking for people interested in helping with Python compiler to LLVM

2012-03-12 Thread Till Stensitzki
Doesn't Theano do the same, only via GCC compilation?



Re: [Numpy-discussion] Looking for people interested in helping with Python compiler to LLVM

2012-03-11 Thread Sturla Molden

On 11.03.2012 23:11, Travis Oliphant wrote:


* Numba will be much closer to Cython in spirit than Unladen Swallow 
(or PyPy) --- people who just use Cython for a loop or two will be 
able to use Numba instead


This is perhaps the most important issue for scientific and algorithmic 
codes.


Not having to resort to Cython, C, C++ or Fortran to get decent 
performance for algorithmic code would be great.


It could also put Python/Numba high up on the Debian shootout ;-)

Or one could benchmark a pure Python version of timsort against the 
standard library.


Sturla


Re: [Numpy-discussion] Looking for people interested in helping with Python compiler to LLVM

2012-03-11 Thread Sturla Molden
On 11.03.2012 15:52, Pauli Virtanen wrote:
> To get speed gains, you need to optimize not only the bytecode 
> interpreter side, but also the object space --- Python classes, 
> strings and all that. Keeping in mind Python's dynamism, there are 
> potential side effects everywhere. I guess this is what sunk the swallow. 

I'm not sure what scared off or killed the bird. Maybe they just 
approached it from the wrong side.

Psyco has proven that some "algorithmic" Python code can be accelerated 
by an order of magnitude or two. But Psyco did not use an optimizing 
compiler like LLVM, it just generated unoptimized x86 binary code from 
Python.

Also, a JIT for a dynamic language is possible. One example is the 
Strongtalk JIT compiler for Smalltalk. It was purchased by Sun and 
renamed the "Java HotSpot VM", because Sun could not produce a JIT for its 
static language Java. The static nature of Java does not have anything 
to do with the performance of its JIT compiler.

Maybe there are differences between Python and Smalltalk that make the 
latter easier to JIT compile? Or perhaps there are differences 
between Hotspot/Strongtalk and LLVM that makes the latter less efficient 
for dynamic languages? I don't know. But making a performant JIT for all 
of Python is obviously not easy.

A third example of a fast JIT for a dynamic language is LuaJIT. It can 
often make "interpreted" and duck-typed Lua run faster than statically 
compiled C. Like psyco, LuaJIT just focuses on making algorithmic code 
with a few elementary types run fast.

Another approach to speeding up dynamic languages is optional static 
typing and static compilation. Cython, Bigloo (et al.), and CMUCL/SBCL 
are excellent examples of that. Maybe one could use type annotations like 
Cython in pure Python mode? Python 3 has even got type annotations in 
the syntax.
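For example, Python 3 annotations are already available at runtime, so a hypothetical annotation-driven compiler could read the declared types without any new syntax:

```python
# Plain Python 3 function with type annotations -- still runs as
# ordinary dynamic Python, but the types are recorded.
def saxpy(a: float, x: float, y: float) -> float:
    return a * x + y

# The annotations are exposed at runtime for a hypothetical
# ahead-of-time or JIT compiler to inspect:
assert saxpy.__annotations__ == {
    'a': float, 'x': float, 'y': float, 'return': float}
assert saxpy(2.0, 3.0, 1.0) == 7.0
```

Nothing here compiles anything; it only shows that the type information a static-typing pass would need is already carried by the syntax.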

I think (but I am not 100% sure) that the main problem with JIT'ing 
Python is its dynamic attributes. So maybe some magic with __slots__ or 
__metaclass__ could be used to turn that dynamicity off? So a JIT like 
Numba would only work with builtin types (int, float, list, dict, set, 
tuple), NumPy arrays, and some less-dynamic classes. It would not speed 
up all of Python, but a sufficient subset to make scientists happy.
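A small illustration of how `__slots__` turns off the dynamic attribute dictionary, giving the fixed layout a JIT could rely on:

```python
class Dynamic:
    pass

class Static:
    __slots__ = ('x', 'y')   # fixed attribute layout, no per-instance __dict__

d = Dynamic()
d.anything = 1               # allowed: attributes live in a mutable dict

s = Static()
s.x = 1.0                    # fine: 'x' is a declared slot
try:
    s.z = 2.0                # rejected: the layout is fixed
    rejected = False
except AttributeError:
    rejected = True

assert rejected
assert not hasattr(s, '__dict__')
```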

Sturla


Re: [Numpy-discussion] Looking for people interested in helping with Python compiler to LLVM

2012-03-11 Thread Travis Oliphant

On Mar 11, 2012, at 1:59 AM, Mic wrote:

> what is the difference to http://www.python.org/dev/peps/pep-3146/ ?
> 

To fully expound the difference would take a lot of discussion.   But, 
summarizing: 

* Numba is not nearly as ambitious as US (Numba will be fine with some 
user-directed information and with a subset of the language that gets compiled)
* Numba will focus on compilation rather than JITing --- in other words 
it won't be trying to detect hot-spots and compile segments (actually LLVM is 
not a good candidate for that sort of thing as it is not a very fast or 
memory-efficient compiler for a lot of reasons,  I believe this is why PyPy 
does not use it anymore, for example). 
* Numba will be much closer to Cython in spirit than Unladen Swallow 
(or PyPy) --- people who just use Cython for a loop or two will be able to use 
Numba instead
* Numba will borrow the idea of call-sites from IronPython wherein a 
function is replaced by a generic function that dispatches based on input types 
to either cached compiled code for the types specified or the generic function.
* Numba will be mainly about trying to leverage the code-generation of 
LLVM which multiple hardware manufacturers are using (Intel's OpenCL support, 
Nvidia's PTX backend, Apple's CLang, etc.) for NumPy arrays
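A toy sketch of the call-site idea described above — all names here are hypothetical, and the "compilation" step is just a cache store where a real implementation would generate typed LLVM code:

```python
def specialize(fn):
    """Replace fn with a dispatcher keyed on the argument types."""
    cache = {}

    def call_site(*args):
        key = tuple(type(a) for a in args)
        if key not in cache:
            # A real implementation would compile a typed version of
            # fn here; we just record the specialization.
            cache[key] = fn
        return cache[key](*args)

    call_site.cache = cache
    return call_site

@specialize
def add(a, b):
    return a + b

add(1, 2)          # creates the (int, int) specialization
add(1.0, 2.0)      # creates the (float, float) specialization
add(3, 4)          # hits the cached (int, int) entry
assert len(add.cache) == 2
```

The first call for each type signature pays the (here trivial) specialization cost; subsequent calls dispatch straight to the cached version.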


-Travis







Re: [Numpy-discussion] Looking for people interested in helping with Python compiler to LLVM

2012-03-11 Thread Sturla Molden


On 11 March 2012 at 15:52, Pauli Virtanen wrote:

> To get speed gains, you need to optimize not only the bytecode
> interpreter side, but also the object space --- Python classes, strings
> and all that. Keeping in mind Python's dynamism, there are potential
> side effects everywhere. I guess this is what sunk the swallow.
> 
> Just speeding up effectively statically typed code dealing with arrays
> and basic types, on the other hand, sounds much easier.

Yes, it would be like psyco, except it has LLVM as compiler and knows about 
NumPy.

Sturla






Re: [Numpy-discussion] Looking for people interested in helping with Python compiler to LLVM

2012-03-11 Thread xavier.gn...@gmail.com
On 03/11/2012 03:52 PM, Pauli Virtanen wrote:
> On 11.03.2012 15:12, xavier.gn...@gmail.com wrote:
> [clip]
>> If this description is correct, Numba is an additional pass once the
>> cpython bytecode has been produced by cpython.
>> Is that correct?
>> Is Python bytecode a good intermediate representation to perform
>> numpy-related optimization?
>>
>> One current big issue with numpy is that C=A+B+D produces temporaries.
>> numexpr addresses this issue and it would be great to get the same
>> result by default in numpy.
>> numexpr also optimizes polynomials using Horner's method. It is hard to
>> do that at bytecode level, isn't it?
> My impression is that dealing with Python's bytecode is not necessarily
> significantly harder than dealing with the AST.
>
> Your example reads
>
>   1       0 LOAD_NAME      0 (A)
>           3 LOAD_NAME      1 (B)
>           6 BINARY_ADD
>           7 LOAD_NAME      2 (D)
>          10 BINARY_ADD
>          11 STORE_NAME     3 (C)
>
> For instance, interpreting the bytecode (e.g. loop body) once with dummy
> objects lets you know what the final compound expression is.

ok.
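That dummy-object replay can be sketched with operator overloading standing in for the actual bytecode walk (a real pass would drive the `dis` instruction stream; this just shows how execution with recording objects recovers the compound expression):

```python
class Sym:
    """Dummy object: records operations instead of computing them."""
    def __init__(self, expr):
        self.expr = expr

    def __add__(self, other):
        return Sym('(%s + %s)' % (self.expr, other.expr))

# Executing the body C = A + B + D with dummies recovers the compound
# expression that the two BINARY_ADD instructions build up:
A, B, D = Sym('A'), Sym('B'), Sym('D')
C = A + B + D
assert C.expr == '((A + B) + D)'
```

Once the whole expression is known, a backend (numexpr-style or LLVM) can evaluate it without the intermediate temporaries.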
>
>> Unladen swallow wanted to replace the full cpython implementation by a
>> jit compiler built using LLVM... but unladen swallow is dead.
> To get speed gains, you need to optimize not only the bytecode
> interpreter side, but also the object space --- Python classes, strings
> and all that. Keeping in mind Python's dynamism, there are potential
> side effects everywhere. I guess this is what sunk the swallow.
Correct: the impact of dynamism was underestimated, plus unexpected bugs 
in LLVM --> dead swallow.

> Just speeding up effectively statically typed code dealing with arrays
> and basic types, on the other hand, sounds much easier.
it is :)

> The PyPy guys have a much more ambitious approach, and are getting nice
> results. Also with arrays --- as I understand, the fact that they want
> to be able to do this sort of optimization is the main reason why they
> want to reimplement the core parts of Numpy in RPython.
It sounds OK for numpy, but it makes little sense because numpy is only 
the beginning of the story.
Numpy is not *that* useful without scipy, matplotlib and so on.

> The second issue is that unfortunately their emulation of CPython's
> C-API at the moment seems to have quite large overheads. Porting
> Numpy on top of that is possible --- I managed to get basic things
> (apart from string/unicode arrays) to work, but things took very large
> speed hits (of the order ~ 100x for things like `arange(1).sum()`).
> This pushes the speed advantage of Numpy to a bit too large array sizes.
> The reason is probably that Numpy uses PyObjects internally heavily,
> which accumulates the cost of passing objects through the emulation layer.
Yes. We have two types of users:
1) Guys using python without numpy. These guys want the python core 
language to go faster (with the full complexity of a dynamic language).
2) Science users. These guys are fine with python speed because their 
codes spend 99% of the time in 
numpy/scipy/any_other_science_module_using_c-api. Who cares if it takes 
1.1s (in Python) instead of 1.0s (in C) to read the input data file if it 
takes hours (or even minutes) to process it (using the C-API, with 
temporaries and unoptimized loops)?


Re: [Numpy-discussion] Looking for people interested in helping with Python compiler to LLVM

2012-03-11 Thread Pauli Virtanen
11.03.2012 15:12, xavier.gn...@gmail.com kirjoitti:
[clip]
> If this description is correct, Numba is an additional pass run once the
> CPython bytecode has been produced by CPython.
> Is that correct?
> Is Python bytecode a good intermediate representation for performing
> NumPy-related optimizations?
> 
> One current big issue with numpy is that C=A+B+D produces temporaries.
> numexpr addresses this issue and it would be great to get the same 
> result by default in numpy.
> numexpr also optimizes polynomials using Horner's method. It is hard to 
> do that at bytecode level, isn't it?

My impression is that dealing with Python's bytecode is not necessarily
significantly harder than dealing with the AST.

Your example reads

  1           0 LOAD_NAME                0 (A)
              3 LOAD_NAME                1 (B)
              6 BINARY_ADD
              7 LOAD_NAME                2 (D)
             10 BINARY_ADD
             11 STORE_NAME               3 (C)

For instance, interpreting the bytecode (e.g. loop body) once with dummy
objects lets you know what the final compound expression is.
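A minimal sketch of this trick in plain Python (the `Sym` proxy class is invented here purely for illustration):

```python
# Proxy objects that record operations instead of performing them.
# Running "C = A + B + D" with these executes the very same bytecode
# as with real arrays, but yields the compound expression as a string.
class Sym:
    def __init__(self, name):
        self.name = name

    def __add__(self, other):
        return Sym("(%s + %s)" % (self.name, other.name))

A, B, D = Sym("A"), Sym("B"), Sym("D")
C = A + B + D
print(C.name)  # prints ((A + B) + D)
```

The same idea extends to other dunder methods (`__mul__`, `__getitem__`, ...), which is essentially how tracing approaches recover a whole loop body.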

> Unladen swallow wanted to replace the full cpython implementation by a 
> jit compiler built using LLVM... but unladen swallow is dead.

To get speed gains, you need to optimize not only the bytecode
interpreter side, but also the object space --- Python classes, strings
and all that. Keeping in mind Python's dynamism, there are potential
side effects everywhere. I guess this is what sunk the swallow.

Just speeding up effectively statically typed code dealing with arrays
and basic types, on the other hand, sounds much easier.
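A toy illustration of what such a compiler buys (plain Python with the loops written out; with real ndarrays, `A + B` would allocate a temporary before `+ D` runs):

```python
# "C = A + B + D" on ndarrays allocates a temporary for A + B, then a
# second array for the final result.  A compiler that knows the loop is
# effectively statically typed can fuse everything into one pass:
def add3_fused(A, B, D):
    C = [0.0] * len(A)
    for i in range(len(A)):
        C[i] = A[i] + B[i] + D[i]  # single pass, no temporary arrays
    return C

print(add3_fused([1.0, 2.0], [3.0, 4.0], [5.0, 6.0]))  # [9.0, 12.0]
```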

The PyPy guys have a much more ambitious approach, and are getting nice
results. Also with arrays --- as I understand, the fact that they want
to be able to do this sort of optimization is the main reason why they
want to reimplement the core parts of Numpy in RPython.

The second issue is that unfortunately their emulation of CPython's
C-API at the moment seems to have quite large overheads. Porting
Numpy on top of that is possible --- I managed to get basic things
(apart from string/unicode arrays) to work, but things took very large
speed hits (of the order ~ 100x for things like `arange(1).sum()`).
This pushes the speed advantage of Numpy to a bit too large array sizes.
The reason is probably that Numpy uses PyObjects internally heavily,
which accumulates the cost of passing objects through the emulation layer.

-- 
Pauli Virtanen



Re: [Numpy-discussion] Looking for people interested in helping with Python compiler to LLVM

2012-03-11 Thread xavier.gn...@gmail.com
It says: "Numba is a NumPy aware optimizing compiler for Python. It 
uses the remarkable LLVM compiler infrastructure to compile Python 
byte-code to machine code especially for use in the NumPy run-time and 
SciPy modules."
If this description is correct, Numba is an additional pass run once the 
CPython bytecode has been produced by CPython.
Is that correct?
Is Python bytecode a good intermediate representation for performing 
NumPy-related optimizations?

One current big issue with numpy is that C=A+B+D produces temporaries.
numexpr addresses this issue and it would be great to get the same 
result by default in numpy.
numexpr also optimizes polynomials using Horner's method. It is hard to 
do that at bytecode level, isn't it?
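Horner's rule itself is simple; the hard part alluded to above is recognizing a polynomial at the bytecode level, not the rewrite. A plain-Python sketch of the rewrite's payoff:

```python
# Evaluating p(x) = 3x^3 + 2x^2 + x + 5 naively needs repeated
# exponentiation; Horner's form ((3x + 2)x + 1)x + 5 needs only
# three multiplies and three adds.
def horner(coeffs, x):
    """Evaluate a polynomial; coeffs go from highest to lowest degree."""
    result = 0
    for c in coeffs:
        result = result * x + c
    return result

print(horner([3, 2, 1, 5], 2))  # 3*8 + 2*4 + 2 + 5 = 39
```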

Unladen swallow wanted to replace the full cpython implementation by a 
jit compiler built using LLVM...but unladen swallow is dead.

Xavier


> what is the difference to http://www.python.org/dev/peps/pep-3146/ ?
>
> On Sun, Mar 11, 2012 at 4:35 PM, Travis Oliphant wrote:
>
> Hey all,
>
> I gave a lightning talk this morning on numba which is the start
> of a Python compiler to machine code through the LLVM tool-chain.
>   It is proof of concept stage only at this point (use it only if
> you are interested in helping develop the code at this point).  
> The only thing that works is a fast-vectorize capability on a few
> functions (without for-loops).   But it shows how functions created
> in Python can be used by the NumPy runtime in various ways.
> Several NEPs that will be discussed in the coming
> months will use this concept.
>
> Right now there is very little design documentation, but I will be
> adding some in the days ahead, especially if I get people who are
> interested in collaborating on the project.   I did talk to Fijal
> and Alex of the PyPy project at PyCon and they both graciously
> suggested that I look at some of the PyPy code which walks the
> byte-code and does translation to their intermediate
> representation for inspiration.
>
> Again, the code is not ready for use, it is only proof of concept,
> but I would like to get feedback and help especially from people
> who might have written compilers before.  The code lives at:
> https://github.com/ContinuumIO/numba
>
> -Travis



Re: [Numpy-discussion] Looking for people interested in helping with Python compiler to LLVM

2012-03-10 Thread Mic
what is the difference to http://www.python.org/dev/peps/pep-3146/ ?

On Sun, Mar 11, 2012 at 4:35 PM, Travis Oliphant wrote:

> Hey all,
>
> I gave a lightning talk this morning on numba which is the start of a
> Python compiler to machine code through the LLVM tool-chain.   It is proof
> of concept stage only at this point (use it only if you are interested in
> helping develop the code at this point).   The only thing that works is a
> fast-vectorize capability on a few functions (without for-loops).   But it
> shows how functions created in Python can be used by the NumPy runtime
> in various ways.   Several NEPs that will be discussed in the
> coming months will use this concept.
>
> Right now there is very little design documentation, but I will be adding
> some in the days ahead, especially if I get people who are interested in
> collaborating on the project.   I did talk to Fijal and Alex of the PyPy
> project at PyCon and they both graciously suggested that I look at some of
> the PyPy code which walks the byte-code and does translation to their
> intermediate representation for inspiration.
>
> Again, the code is not ready for use, it is only proof of concept, but I
> would like to get feedback and help especially from people who might have
> written compilers before.  The code lives at:
> https://github.com/ContinuumIO/numba
>
> -Travis
>
>
>
>