Re: [Numpy-discussion] low level optimization in NumPy and minivect

2013-06-27 Thread mark florisson
On 27 June 2013 01:48, Frédéric Bastien no...@nouiz.org wrote:



 On Wed, Jun 26, 2013 at 7:30 AM, mark florisson markflorisso...@gmail.com
 wrote:

 On 26 June 2013 09:05, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no
 wrote:
  On 06/25/2013 04:21 PM, Frédéric Bastien wrote:
 
  Hi,
 
  I wasn't able to attend this year Scipy Conference. My tutorial
  proposal
  was rejected and other deadline intefered with this conference date.
 
  Will the presentation be recorded? If not, can you make the slide
  available?
 
  What is your opinion on this question:
 
  - Should other lib like NumPy/Theano/Cython/Numba base their elemwise
  implemention (or part of it) on dynd or minivect? I know cython and
  Numba do it, but it was before dynd and I don't know where dynd fit in
  the big picture. Do dynd reuse minivect itself?
 
 
  Actually, I think the Cython branch with minivect support was in the end
  not merged, due to lack of interest/manpower to maintain support for
  vectorization in the long term (so it was better to not add the feature
  than have a badly supported feature).
 
  My understanding is that Numba is based on minivect and not on dynd, so
  it's more of a competitor.
 
  Perhaps Mark Florisson will be able to comment.
 
  Dag Sverre

 Hey Dag,

 Indeed, numba uses it for its array expression support, but it will
 likely remove the minivect dependency and generate a simple loop nest
 for now. I'm working on pykit now
 (https://github.com/ContinuumIO/pykit) which similarly to minivect
 defines its own intermediate representation, with array expressions in
 the form of map/reduce/scan/etc functions. The project has a broader
 scope than minivect, to be used by projects like numba, what but a
 minivect baked in.

 As such, minivect isn't really maintained any longer, and I wouldn't
 recommend anyone using the code at this point (just maybe some of the
 ideas :)).


 Hi,

 thanks for the information. I checked the repo rapidly and didn't found
 information on how to use it the way I expect to use it. I would like to be
 able to take a small Theano graph (like just elemwise operation) and make a
 graph in it to have it generate the c code. Do you have some tests/tests/doc
 that demonstrate something in that direction?

 Ideally I would like to be able to implement something like this simple
 example:

 (x ** 2).sum(1) or (x ** 2).sum()

 Is pykit or Numba IR ready for that?

 thanks

 Frédéric

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


Hey Fred,

It's in no way ready for public use; it doesn't really do anything
yet :) Numba doesn't optimize reductions yet either, so I don't think
it addresses any of your needs, but the input would be a Python
function (compiled from generated source code, or an AST).

I don't know how much further pykit would go beyond simple fusion and
perhaps tiling. I imagine it will defer to libraries like dynd to
perform the actual work. This is off-topic for numpy itself, but it may
be useful to Theano in the future. I'll be sure to keep you in the loop
and bounce ideas off you for feedback and collaboration.
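
To make "a Python function (compiled from generated source code, or an AST)" concrete for the (x ** 2).sum(1) example, here is a minimal sketch using only the standard library. The names make_row_sum_of_squares and row_sum_sq are hypothetical, nested lists stand in for a 2-D array, and none of this is the Numba or pykit API; it only shows how such an input function could be produced.

```python
import ast


def make_row_sum_of_squares(name="row_sum_sq"):
    """Generate source for (x ** 2).sum(1) and compile it to a function.

    Hypothetical helper: a front end such as Theano could emit source
    like this and hand the resulting function (or its AST) to a compiler.
    """
    source = (
        f"def {name}(x):\n"
        "    # x is a list of rows, standing in for a 2-D array\n"
        "    return [sum(v ** 2 for v in row) for row in x]\n"
    )
    tree = ast.parse(source)  # AST form of the same function
    code = compile(tree, filename="<generated>", mode="exec")
    namespace = {}
    exec(code, namespace)     # defines the function inside `namespace`
    return namespace[name], tree


row_sum_sq, tree = make_row_sum_of_squares()
print(row_sum_sq([[1, 2], [3, 4]]))  # [5, 25]
print(type(tree.body[0]).__name__)   # FunctionDef
```

Either the compiled function or the parsed tree could serve as the input; which one a given compiler prefers is not stated in this thread.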

Cheers,

Mark


Re: [Numpy-discussion] low level optimization in NumPy and minivect

2013-06-26 Thread Dag Sverre Seljebotn
On 06/25/2013 04:21 PM, Frédéric Bastien wrote:
 Hi,

 I wasn't able to attend this year Scipy Conference. My tutorial proposal
 was rejected and other deadline intefered with this conference date.

 Will the presentation be recorded? If not, can you make the slide available?

 What is your opinion on this question:

 - Should other lib like NumPy/Theano/Cython/Numba base their elemwise
 implemention (or part of it) on dynd or minivect? I know cython and
 Numba do it, but it was before dynd and I don't know where dynd fit in
 the big picture. Do dynd reuse minivect itself?

Actually, I think the Cython branch with minivect support was in the end 
not merged, due to lack of interest/manpower to maintain support for 
vectorization in the long term (so it was better to not add the feature 
than have a badly supported feature).

My understanding is that Numba is based on minivect and not on dynd, so 
it's more of a competitor.

Perhaps Mark Florisson will be able to comment.

Dag Sverre


Re: [Numpy-discussion] low level optimization in NumPy and minivect

2013-06-26 Thread mark florisson
On 26 June 2013 09:05, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no wrote:
 On 06/25/2013 04:21 PM, Frédéric Bastien wrote:

 Hi,

 I wasn't able to attend this year Scipy Conference. My tutorial proposal
 was rejected and other deadline intefered with this conference date.

 Will the presentation be recorded? If not, can you make the slide
 available?

 What is your opinion on this question:

 - Should other lib like NumPy/Theano/Cython/Numba base their elemwise
 implemention (or part of it) on dynd or minivect? I know cython and
 Numba do it, but it was before dynd and I don't know where dynd fit in
 the big picture. Do dynd reuse minivect itself?


 Actually, I think the Cython branch with minivect support was in the end not
 merged, due to lack of interest/manpower to maintain support for
 vectorization in the long term (so it was better to not add the feature than
 have a badly supported feature).

 My understanding is that Numba is based on minivect and not on dynd, so it's
 more of a competitor.

 Perhaps Mark Florisson will be able to comment.

 Dag Sverre

Hey Dag,

Indeed, numba uses it for its array expression support, but it will
likely remove the minivect dependency and generate a simple loop nest
for now. I'm working on pykit now
(https://github.com/ContinuumIO/pykit), which, similarly to minivect,
defines its own intermediate representation, with array expressions in
the form of map/reduce/scan/etc. functions. The project has a broader
scope than minivect, to be used by projects like numba, but with a
minivect baked in.

As such, minivect isn't really maintained any longer, and I wouldn't
recommend that anyone use the code at this point (just maybe some of the
ideas :)).
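
As a rough picture of what "array expressions in the form of map/reduce/scan functions" means, the sketch below spells out three array operations with standard-library primitives. This is not pykit's intermediate representation, only an assumption-free illustration of the three function shapes; a plain list stands in for a 1-D array.

```python
from functools import reduce
from itertools import accumulate
import operator

x = [1.0, 2.0, 3.0, 4.0]

# map: apply a scalar function to every element, like x ** 2
squared = list(map(lambda v: v ** 2, x))

# reduce: fold the elements into one value, like (x ** 2).sum()
total = reduce(operator.add, squared, 0.0)

# scan: keep every intermediate result of the fold, like (x ** 2).cumsum()
running = list(accumulate(squared, operator.add))

print(squared)  # [1.0, 4.0, 9.0, 16.0]
print(total)    # 30.0
print(running)  # [1.0, 5.0, 14.0, 30.0]
```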


Re: [Numpy-discussion] low level optimization in NumPy and minivect

2013-06-26 Thread mark florisson
On 19 June 2013 01:43, Frédéric Bastien no...@nouiz.org wrote:
 Hi,


 On Mon, Jun 17, 2013 at 5:03 PM, Julian Taylor
 jtaylor.deb...@googlemail.com wrote:

 On 17.06.2013 17:11, Frédéric Bastien wrote:
  Hi,
 
  I saw that recently Julian Taylor is doing many low level optimization
  like using SSE instruction. I think it is great.
 
  Last year, Mark Florisson released the minivect[1] project that he
  worked on during is master thesis. minivect is a compiler for
  element-wise expression that do some of the same low level optimization
  that Julian is doing in NumPy right now.
 
  Mark did minivect in a way that allow it to be reused by other project.
  It is used now by Cython and Numba I think. I had plan to reuse it in
  Theano, but I didn't got the time to integrate it up to now.
 
  What about reusing it in NumPy? I think that some of Julian optimization
  aren't in minivect (I didn't check to confirm). But from I heard,
  minivect don't implement reduction and there is a pull request to
  optimize this in NumPy.

 Hi,
 what I vectorized is just the really easy cases of unit stride
 continuous operations, so the min/max reductions which is now in numpy
 is in essence pretty trivial.
 minivect goes much further in optimizing general strided access and
 broadcasting via loop optimizations (it seems to have a lot of overlap
 with the graphite loop optimizer available in GCC [0]) so my code is
 probably not of very much use to minivect.

 The most interesting part in minivect for numpy is probably the
 optimization of broadcasting loops which seem to be pretty inefficient
 in numpy [0].

 Concerning the rest I'm not sure how much of a bottleneck general
 strided operations really are in common numpy using code.


 I guess a similar discussion about adding an expression compiler to
 numpy has already happened when numexpr was released?
 If yes what was the outcome of that?


 I don't recall a discussion when numexpr was done as this is before I read
 this list. numexpr do optimization that can't be done by NumPy: fusing
 element-wise operation in one call. So I don't see how it could be done to
 reuse it in NumPy.

 You call your optimization trivial, but I don't. In the git log of NumPy,
 the first commit is in 2001. It is the first time someone do this in 12
 years! Also, this give 1.5-8x speed up (from memory from your PR
 description). This is not negligible. But how much time did you spend on
 them? Also, some of them are processor dependent, how many people in this
 list already have done this? I suppose not many.

 Yes, your optimization don't cover all cases that minivect do. I see 2 level
 of optimization. 1) The inner loop/contiguous cases, 2) the strided,
 broadcasted level. We don't need all optimization being done for them to be
 useful. Any of them are useful.

 So what I think is that we could reuse/share that work. NumPy have c code
 generator. They could call minivect code generator for some of them when
 compiling NumPy. This will make optimization done to those code generator
 reused by more people. For example, when new processor are launched, we will
 need only 1 place to change for many projects. Or for example, it the call
 to MKL vector library is done there, more people will benefit from it. Right
 now, only numexpr do it.

 About the level 2 optimization (strides, broadcast), I never read NumPy code
 that deal with that. Do someone that know it have an idea if it would be
 possible to reuse minivect for this?

I wouldn't attempt to; it's not really maintained any longer, though
pykit will likely address what minivect did in the future (more in the
following email). Many of the optimizations in minivect will really only
shine in a runtime context where it can perform fusion, and where you
can hoist repeated computation out of inner loops. I like the code
reuse, especially between dynd/blaze/theano.
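
For readers unfamiliar with the two terms, here is a minimal pure-Python sketch of fusion and of hoisting loop-invariant work out of an inner loop. The function names are hypothetical and lists stand in for arrays; this is not minivect code, only the general idea.

```python
def unfused(a, b, c):
    """a * b + c evaluated one operation at a time, with a temporary."""
    tmp = [x * y for x, y in zip(a, b)]     # first pass builds a full temporary
    return [t + z for t, z in zip(tmp, c)]  # second pass reads it back


def fused(a, b, c):
    """Same expression in a single pass, with no temporary array."""
    return [x * y + z for x, y, z in zip(a, b, c)]


def broadcast_naive(col, row):
    """out[i][j] = col[i] ** 2 * row[j], squaring inside the inner loop."""
    return [[(ci ** 2) * rj for rj in row] for ci in col]


def broadcast_hoisted(col, row):
    """Same result, with the loop-invariant square computed once per row."""
    out = []
    for ci in col:
        ci_sq = ci ** 2  # hoisted: does not depend on j
        out.append([ci_sq * rj for rj in row])
    return out


print(fused([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # [11, 18, 27]
print(broadcast_hoisted([1, 2], [10, 20]))     # [[10, 20], [40, 80]]
```

Both rewrites need the whole expression to be visible at once, which is why they fit a runtime compiler better than precompiled per-operation loops.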

 Frédéric



Re: [Numpy-discussion] low level optimization in NumPy and minivect

2013-06-26 Thread Frédéric Bastien
On Wed, Jun 26, 2013 at 7:30 AM, mark florisson
markflorisso...@gmail.com wrote:

 On 26 June 2013 09:05, Dag Sverre Seljebotn d.s.seljeb...@astro.uio.no
 wrote:
  On 06/25/2013 04:21 PM, Frédéric Bastien wrote:
 
  Hi,
 
  I wasn't able to attend this year Scipy Conference. My tutorial proposal
  was rejected and other deadline intefered with this conference date.
 
  Will the presentation be recorded? If not, can you make the slide
  available?
 
  What is your opinion on this question:
 
  - Should other lib like NumPy/Theano/Cython/Numba base their elemwise
  implemention (or part of it) on dynd or minivect? I know cython and
  Numba do it, but it was before dynd and I don't know where dynd fit in
  the big picture. Do dynd reuse minivect itself?
 
 
  Actually, I think the Cython branch with minivect support was in the end
  not merged, due to lack of interest/manpower to maintain support for
  vectorization in the long term (so it was better to not add the feature
  than have a badly supported feature).
 
  My understanding is that Numba is based on minivect and not on dynd, so
  it's more of a competitor.
 
  Perhaps Mark Florisson will be able to comment.
 
  Dag Sverre

 Hey Dag,

 Indeed, numba uses it for its array expression support, but it will
 likely remove the minivect dependency and generate a simple loop nest
 for now. I'm working on pykit now
 (https://github.com/ContinuumIO/pykit) which similarly to minivect
 defines its own intermediate representation, with array expressions in
 the form of map/reduce/scan/etc functions. The project has a broader
 scope than minivect, to be used by projects like numba, what but a
 minivect baked in.

 As such, minivect isn't really maintained any longer, and I wouldn't
 recommend anyone using the code at this point (just maybe some of the
 ideas :)).


Hi,

Thanks for the information. I had a quick look at the repo and didn't find
information on how to use it the way I expect to. I would like to be able
to take a small Theano graph (for example, just elementwise operations) and
build the corresponding graph in pykit to have it generate the C code. Do
you have some tests/docs that demonstrate something in that direction?

Ideally I would like to be able to implement something like this simple
example:

(x ** 2).sum(1) or (x ** 2).sum()

Is pykit or Numba IR ready for that?
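
For reference, the two expressions above correspond to the following explicit loop nests. This is a sketch of the computation a code generator would need to emit, written in plain Python with nested lists standing in for a 2-D array; the function names are hypothetical and no pykit or Numba API is implied.

```python
def sum_sq_axis1(x):
    """Loop nest for (x ** 2).sum(1): one accumulator per row."""
    out = []
    for row in x:
        acc = 0.0
        for v in row:
            acc += v * v  # square fused into the reduction, no temporary
        out.append(acc)
    return out


def sum_sq_all(x):
    """Loop nest for (x ** 2).sum(): a single accumulator overall."""
    acc = 0.0
    for row in x:
        for v in row:
            acc += v * v
    return acc


x = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
print(sum_sq_axis1(x))  # [14.0, 77.0]
print(sum_sq_all(x))    # 91.0
```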

thanks

Frédéric


Re: [Numpy-discussion] low level optimization in NumPy and minivect

2013-06-25 Thread Frédéric Bastien
Hi,

I wasn't able to attend this year's SciPy conference. My tutorial proposal
was rejected and other deadlines interfered with the conference dates.

Will the presentation be recorded? If not, can you make the slides available?

What is your opinion on this question:

- Should other libraries like NumPy/Theano/Cython/Numba base their elementwise
implementation (or part of it) on dynd or minivect? I know Cython and Numba
do it, but that was before dynd, and I don't know where dynd fits in the big
picture. Does dynd reuse minivect itself?

thanks

Frédéric


On Mon, Jun 24, 2013 at 11:46 AM, Mark Wiebe mwwi...@gmail.com wrote:

 On Wed, Jun 19, 2013 at 7:48 AM, Charles R Harris 
 charlesr.har...@gmail.com wrote:



 On Wed, Jun 19, 2013 at 5:45 AM, Matthew Brett 
 matthew.br...@gmail.com wrote:

 Hi,

 On Wed, Jun 19, 2013 at 1:43 AM, Frédéric Bastien no...@nouiz.org
 wrote:
  Hi,
 
 
  On Mon, Jun 17, 2013 at 5:03 PM, Julian Taylor
  jtaylor.deb...@googlemail.com wrote:
 
  On 17.06.2013 17:11, Frédéric Bastien wrote:
   Hi,
  
   I saw that recently Julian Taylor is doing many low level
 optimization
   like using SSE instruction. I think it is great.
  
   Last year, Mark Florisson released the minivect[1] project that he
   worked on during is master thesis. minivect is a compiler for
   element-wise expression that do some of the same low level
 optimization
   that Julian is doing in NumPy right now.
  
   Mark did minivect in a way that allow it to be reused by other
 project.
   It is used now by Cython and Numba I think. I had plan to reuse it
 in
   Theano, but I didn't got the time to integrate it up to now.
  
   What about reusing it in NumPy? I think that some of Julian
 optimization
   aren't in minivect (I didn't check to confirm). But from I heard,
   minivect don't implement reduction and there is a pull request to
   optimize this in NumPy.
 
  Hi,
  what I vectorized is just the really easy cases of unit stride
  continuous operations, so the min/max reductions which is now in numpy
  is in essence pretty trivial.
  minivect goes much further in optimizing general strided access and
  broadcasting via loop optimizations (it seems to have a lot of overlap
  with the graphite loop optimizer available in GCC [0]) so my code is
  probably not of very much use to minivect.
 
  The most interesting part in minivect for numpy is probably the
  optimization of broadcasting loops which seem to be pretty inefficient
  in numpy [0].
 
  Concerning the rest I'm not sure how much of a bottleneck general
  strided operations really are in common numpy using code.
 
 
  I guess a similar discussion about adding an expression compiler to
  numpy has already happened when numexpr was released?
  If yes what was the outcome of that?
 
 
  I don't recall a discussion when numexpr was done as this is before I
 read
  this list. numexpr do optimization that can't be done by NumPy: fusing
  element-wise operation in one call. So I don't see how it could be
 done to
  reuse it in NumPy.
 
  You call your optimization trivial, but I don't. In the git log of
 NumPy,
  the first commit is in 2001. It is the first time someone do this in 12
  years! Also, this give 1.5-8x speed up (from memory from your PR
  description). This is not negligible. But how much time did you spend
 on
  them? Also, some of them are processor dependent, how many people in
 this
  list already have done this? I suppose not many.
 
  Yes, your optimization don't cover all cases that minivect do. I see 2
 level
  of optimization. 1) The inner loop/contiguous cases, 2) the strided,
  broadcasted level. We don't need all optimization being done for them
 to be
  useful. Any of them are useful.
 
  So what I think is that we could reuse/share that work. NumPy have c
 code
  generator. They could call minivect code generator for some of them
 when
  compiling NumPy. This will make optimization done to those code
 generator
  reused by more people. For example, when new processor are launched,
 we will
  need only 1 place to change for many projects. Or for example, it the
 call
  to MKL vector library is done there, more people will benefit from it.
 Right
  now, only numexpr do it.
 
  About the level 2 optimization (strides, broadcast), I never read
 NumPy code
  that deal with that. Do someone that know it have an idea if it would
 be
  possible to reuse minivect for this?

 Would someone be able to guide some of the numpy C experts into a room
 to do some thinking / writing on this at the scipy conference?

 I completely agree that these kind of optimizations and code sharing
 seem likely to be very important for the future.

 I'm not at the conference, but if there's anything I can do to help,
 please someone let me know.


 Concerning the future development of numpy, I'd also suggest that we look
 at libdynd https://github.com/ContinuumIO/libdynd. It looks to me like
 it is reaching a level of maturity where it is worth trying to plan out a
 long term path 

Re: [Numpy-discussion] low level optimization in NumPy and minivect

2013-06-24 Thread Mark Wiebe
On Wed, Jun 19, 2013 at 7:48 AM, Charles R Harris charlesr.har...@gmail.com
 wrote:



 On Wed, Jun 19, 2013 at 5:45 AM, Matthew Brett matthew.br...@gmail.com wrote:

 Hi,

 On Wed, Jun 19, 2013 at 1:43 AM, Frédéric Bastien no...@nouiz.org
 wrote:
  Hi,
 
 
  On Mon, Jun 17, 2013 at 5:03 PM, Julian Taylor
  jtaylor.deb...@googlemail.com wrote:
 
  On 17.06.2013 17:11, Frédéric Bastien wrote:
   Hi,
  
   I saw that recently Julian Taylor is doing many low level
 optimization
   like using SSE instruction. I think it is great.
  
   Last year, Mark Florisson released the minivect[1] project that he
   worked on during is master thesis. minivect is a compiler for
   element-wise expression that do some of the same low level
 optimization
   that Julian is doing in NumPy right now.
  
   Mark did minivect in a way that allow it to be reused by other
 project.
   It is used now by Cython and Numba I think. I had plan to reuse it in
   Theano, but I didn't got the time to integrate it up to now.
  
   What about reusing it in NumPy? I think that some of Julian
 optimization
   aren't in minivect (I didn't check to confirm). But from I heard,
   minivect don't implement reduction and there is a pull request to
   optimize this in NumPy.
 
  Hi,
  what I vectorized is just the really easy cases of unit stride
  continuous operations, so the min/max reductions which is now in numpy
  is in essence pretty trivial.
  minivect goes much further in optimizing general strided access and
  broadcasting via loop optimizations (it seems to have a lot of overlap
  with the graphite loop optimizer available in GCC [0]) so my code is
  probably not of very much use to minivect.
 
  The most interesting part in minivect for numpy is probably the
  optimization of broadcasting loops which seem to be pretty inefficient
  in numpy [0].
 
  Concerning the rest I'm not sure how much of a bottleneck general
  strided operations really are in common numpy using code.
 
 
  I guess a similar discussion about adding an expression compiler to
  numpy has already happened when numexpr was released?
  If yes what was the outcome of that?
 
 
  I don't recall a discussion when numexpr was done as this is before I
 read
  this list. numexpr do optimization that can't be done by NumPy: fusing
  element-wise operation in one call. So I don't see how it could be done
 to
  reuse it in NumPy.
 
  You call your optimization trivial, but I don't. In the git log of
 NumPy,
  the first commit is in 2001. It is the first time someone do this in 12
  years! Also, this give 1.5-8x speed up (from memory from your PR
  description). This is not negligible. But how much time did you spend on
  them? Also, some of them are processor dependent, how many people in
 this
  list already have done this? I suppose not many.
 
  Yes, your optimization don't cover all cases that minivect do. I see 2
 level
  of optimization. 1) The inner loop/contiguous cases, 2) the strided,
  broadcasted level. We don't need all optimization being done for them
 to be
  useful. Any of them are useful.
 
  So what I think is that we could reuse/share that work. NumPy have c
 code
  generator. They could call minivect code generator for some of them when
  compiling NumPy. This will make optimization done to those code
 generator
  reused by more people. For example, when new processor are launched, we
 will
  need only 1 place to change for many projects. Or for example, it the
 call
  to MKL vector library is done there, more people will benefit from it.
 Right
  now, only numexpr do it.
 
  About the level 2 optimization (strides, broadcast), I never read NumPy
 code
  that deal with that. Do someone that know it have an idea if it would be
  possible to reuse minivect for this?

 Would someone be able to guide some of the numpy C experts into a room
 to do some thinking / writing on this at the scipy conference?

 I completely agree that these kind of optimizations and code sharing
 seem likely to be very important for the future.

 I'm not at the conference, but if there's anything I can do to help,
 please someone let me know.


 Concerning the future development of numpy, I'd also suggest that we look
 at libdynd https://github.com/ContinuumIO/libdynd. It looks to me like
 it is reaching a level of maturity where it is worth trying to plan out a
 long term path to merger.


I'm in Austin for SciPy, and will be giving a talk on the dynd library on
Thursday. Please drop by if you can make it; I'm very interested in
cross-pollination of ideas between numpy, libdynd, blaze, and other array
programming projects. The Python exposure of dynd as it is now can
transform data to/from numpy via views very easily, where the data is
compatible, and I expect libdynd and numpy to live alongside each other for
quite some time. One possible way things could work is to think of libdynd
as a more rapidly changing playground for functionality that would be
nice to have in numpy, without the 

Re: [Numpy-discussion] low level optimization in NumPy and minivect

2013-06-20 Thread Frédéric Bastien
I didn't know about this project. It is interesting.

If some of you discuss this at the SciPy conference, it would be
appreciated if you could write a summary of it here. I won't be there this year.

thanks

Frédéric


On Wed, Jun 19, 2013 at 8:48 AM, Charles R Harris charlesr.har...@gmail.com
 wrote:



 On Wed, Jun 19, 2013 at 5:45 AM, Matthew Brett matthew.br...@gmail.com wrote:

 Hi,

 On Wed, Jun 19, 2013 at 1:43 AM, Frédéric Bastien no...@nouiz.org
 wrote:
  Hi,
 
 
  On Mon, Jun 17, 2013 at 5:03 PM, Julian Taylor
  jtaylor.deb...@googlemail.com wrote:
 
  On 17.06.2013 17:11, Frédéric Bastien wrote:
   Hi,
  
   I saw that recently Julian Taylor is doing many low level
 optimization
   like using SSE instruction. I think it is great.
  
   Last year, Mark Florisson released the minivect[1] project that he
   worked on during is master thesis. minivect is a compiler for
   element-wise expression that do some of the same low level
 optimization
   that Julian is doing in NumPy right now.
  
   Mark did minivect in a way that allow it to be reused by other
 project.
   It is used now by Cython and Numba I think. I had plan to reuse it in
   Theano, but I didn't got the time to integrate it up to now.
  
   What about reusing it in NumPy? I think that some of Julian
 optimization
   aren't in minivect (I didn't check to confirm). But from I heard,
   minivect don't implement reduction and there is a pull request to
   optimize this in NumPy.
 
  Hi,
  what I vectorized is just the really easy cases of unit stride
  continuous operations, so the min/max reductions which is now in numpy
  is in essence pretty trivial.
  minivect goes much further in optimizing general strided access and
  broadcasting via loop optimizations (it seems to have a lot of overlap
  with the graphite loop optimizer available in GCC [0]) so my code is
  probably not of very much use to minivect.
 
  The most interesting part in minivect for numpy is probably the
  optimization of broadcasting loops which seem to be pretty inefficient
  in numpy [0].
 
  Concerning the rest I'm not sure how much of a bottleneck general
  strided operations really are in common numpy using code.
 
 
  I guess a similar discussion about adding an expression compiler to
  numpy has already happened when numexpr was released?
  If yes what was the outcome of that?
 
 
  I don't recall a discussion when numexpr was done as this is before I
 read
  this list. numexpr do optimization that can't be done by NumPy: fusing
  element-wise operation in one call. So I don't see how it could be done
 to
  reuse it in NumPy.
 
  You call your optimization trivial, but I don't. In the git log of
 NumPy,
  the first commit is in 2001. It is the first time someone do this in 12
  years! Also, this give 1.5-8x speed up (from memory from your PR
  description). This is not negligible. But how much time did you spend on
  them? Also, some of them are processor dependent, how many people in
 this
  list already have done this? I suppose not many.
 
  Yes, your optimization don't cover all cases that minivect do. I see 2
 level
  of optimization. 1) The inner loop/contiguous cases, 2) the strided,
  broadcasted level. We don't need all optimization being done for them
 to be
  useful. Any of them are useful.
 
  So what I think is that we could reuse/share that work. NumPy have c
 code
  generator. They could call minivect code generator for some of them when
  compiling NumPy. This will make optimization done to those code
 generator
  reused by more people. For example, when new processor are launched, we
 will
  need only 1 place to change for many projects. Or for example, it the
 call
  to MKL vector library is done there, more people will benefit from it.
 Right
  now, only numexpr do it.
 
  About the level 2 optimization (strides, broadcast), I never read NumPy
 code
  that deal with that. Do someone that know it have an idea if it would be
  possible to reuse minivect for this?

 Would someone be able to guide some of the numpy C experts into a room
 to do some thinking / writing on this at the scipy conference?

 I completely agree that these kind of optimizations and code sharing
 seem likely to be very important for the future.

 I'm not at the conference, but if there's anything I can do to help,
 please someone let me know.


 Concerning the future development of numpy, I'd also suggest that we look
 at libdynd https://github.com/ContinuumIO/libdynd. It looks to me like
 it is reaching a level of maturity where it is worth trying to plan out a
 long term path to merger.

 Chuck




Re: [Numpy-discussion] low level optimization in NumPy and minivect

2013-06-19 Thread Matthew Brett
Hi,

On Wed, Jun 19, 2013 at 1:43 AM, Frédéric Bastien no...@nouiz.org wrote:
 Hi,


 On Mon, Jun 17, 2013 at 5:03 PM, Julian Taylor
 jtaylor.deb...@googlemail.com wrote:

 On 17.06.2013 17:11, Frédéric Bastien wrote:
  Hi,
 
  I saw that recently Julian Taylor is doing many low level optimization
  like using SSE instruction. I think it is great.
 
  Last year, Mark Florisson released the minivect[1] project that he
  worked on during is master thesis. minivect is a compiler for
  element-wise expression that do some of the same low level optimization
  that Julian is doing in NumPy right now.
 
  Mark did minivect in a way that allow it to be reused by other project.
  It is used now by Cython and Numba I think. I had plan to reuse it in
  Theano, but I didn't got the time to integrate it up to now.
 
  What about reusing it in NumPy? I think that some of Julian optimization
  aren't in minivect (I didn't check to confirm). But from I heard,
  minivect don't implement reduction and there is a pull request to
  optimize this in NumPy.

 Hi,
 what I vectorized is just the really easy cases of unit stride
 continuous operations, so the min/max reductions which is now in numpy
 is in essence pretty trivial.
 minivect goes much further in optimizing general strided access and
 broadcasting via loop optimizations (it seems to have a lot of overlap
 with the graphite loop optimizer available in GCC [0]) so my code is
 probably not of very much use to minivect.

 The most interesting part in minivect for numpy is probably the
 optimization of broadcasting loops which seem to be pretty inefficient
 in numpy [0].

 Concerning the rest I'm not sure how much of a bottleneck general
 strided operations really are in common numpy using code.


 I guess a similar discussion about adding an expression compiler to
 numpy has already happened when numexpr was released?
 If yes what was the outcome of that?


 I don't recall a discussion when numexpr was done as this is before I read
 this list. numexpr do optimization that can't be done by NumPy: fusing
 element-wise operation in one call. So I don't see how it could be done to
 reuse it in NumPy.

 You call your optimization trivial, but I don't. In the git log of NumPy,
 the first commit is in 2001. It is the first time someone do this in 12
 years! Also, this give 1.5-8x speed up (from memory from your PR
 description). This is not negligible. But how much time did you spend on
 them? Also, some of them are processor dependent, how many people in this
 list already have done this? I suppose not many.

 Yes, your optimization don't cover all cases that minivect do. I see 2 level
 of optimization. 1) The inner loop/contiguous cases, 2) the strided,
 broadcasted level. We don't need all optimization being done for them to be
 useful. Any of them are useful.

 So what I think is that we could reuse/share that work. NumPy have c code
 generator. They could call minivect code generator for some of them when
 compiling NumPy. This will make optimization done to those code generator
 reused by more people. For example, when new processor are launched, we will
 need only 1 place to change for many projects. Or for example, it the call
 to MKL vector library is done there, more people will benefit from it. Right
 now, only numexpr do it.

 About the level 2 optimization (strides, broadcast), I never read NumPy code
 that deal with that. Do someone that know it have an idea if it would be
 possible to reuse minivect for this?

Would someone be able to guide some of the numpy C experts into a room
to do some thinking / writing on this at the scipy conference?

I completely agree that these kinds of optimizations and code sharing
seem likely to be very important for the future.

I'm not at the conference, but if there's anything I can do to help,
please someone let me know.

Cheers,

Matthew
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] low level optimization in NumPy and minivect

2013-06-19 Thread Charles R Harris
On Wed, Jun 19, 2013 at 5:45 AM, Matthew Brett matthew.br...@gmail.com wrote:

 Hi,

 On Wed, Jun 19, 2013 at 1:43 AM, Frédéric Bastien no...@nouiz.org wrote:
  Hi,
 
 
  On Mon, Jun 17, 2013 at 5:03 PM, Julian Taylor
  jtaylor.deb...@googlemail.com wrote:
 
  On 17.06.2013 17:11, Frédéric Bastien wrote:
   Hi,
  
   I saw that recently Julian Taylor is doing many low level optimization
   like using SSE instruction. I think it is great.
  
   Last year, Mark Florisson released the minivect[1] project that he
   worked on during is master thesis. minivect is a compiler for
   element-wise expression that do some of the same low level
 optimization
   that Julian is doing in NumPy right now.
  
   Mark did minivect in a way that allow it to be reused by other
 project.
   It is used now by Cython and Numba I think. I had plan to reuse it in
   Theano, but I didn't got the time to integrate it up to now.
  
   What about reusing it in NumPy? I think that some of Julian
 optimization
   aren't in minivect (I didn't check to confirm). But from I heard,
   minivect don't implement reduction and there is a pull request to
   optimize this in NumPy.
 
  Hi,
  what I vectorized is just the really easy cases of unit stride
  continuous operations, so the min/max reductions which is now in numpy
  is in essence pretty trivial.
  minivect goes much further in optimizing general strided access and
  broadcasting via loop optimizations (it seems to have a lot of overlap
  with the graphite loop optimizer available in GCC [0]) so my code is
  probably not of very much use to minivect.
 
  The most interesting part in minivect for numpy is probably the
  optimization of broadcasting loops which seem to be pretty inefficient
  in numpy [0].
 
  Concerning the rest I'm not sure how much of a bottleneck general
  strided operations really are in common numpy using code.
 
 
  I guess a similar discussion about adding an expression compiler to
  numpy has already happened when numexpr was released?
  If yes what was the outcome of that?
 
 
  I don't recall a discussion when numexpr was done as this is before I
 read
  this list. numexpr do optimization that can't be done by NumPy: fusing
  element-wise operation in one call. So I don't see how it could be done
 to
  reuse it in NumPy.
 
  You call your optimization trivial, but I don't. In the git log of NumPy,
  the first commit is in 2001. It is the first time someone do this in 12
  years! Also, this give 1.5-8x speed up (from memory from your PR
  description). This is not negligible. But how much time did you spend on
  them? Also, some of them are processor dependent, how many people in this
  list already have done this? I suppose not many.
 
  Yes, your optimization don't cover all cases that minivect do. I see 2
 level
  of optimization. 1) The inner loop/contiguous cases, 2) the strided,
  broadcasted level. We don't need all optimization being done for them to
 be
  useful. Any of them are useful.
 
  So what I think is that we could reuse/share that work. NumPy have c code
  generator. They could call minivect code generator for some of them when
  compiling NumPy. This will make optimization done to those code generator
  reused by more people. For example, when new processor are launched, we
 will
  need only 1 place to change for many projects. Or for example, it the
 call
  to MKL vector library is done there, more people will benefit from it.
 Right
  now, only numexpr do it.
 
  About the level 2 optimization (strides, broadcast), I never read NumPy
 code
  that deal with that. Do someone that know it have an idea if it would be
  possible to reuse minivect for this?

 Would someone be able to guide some of the numpy C experts into a room
 to do some thinking / writing on this at the scipy conference?

 I completely agree that these kind of optimizations and code sharing
 seem likely to be very important for the future.

 I'm not at the conference, but if there's anything I can do to help,
 please someone let me know.


Concerning the future development of numpy, I'd also suggest that we look
at libdynd https://github.com/ContinuumIO/libdynd. It looks to me like it
is reaching a level of maturity where it is worth trying to plan out a long
term path to merger.

Chuck


Re: [Numpy-discussion] low level optimization in NumPy and minivect

2013-06-18 Thread Frédéric Bastien
Hi,


On Mon, Jun 17, 2013 at 5:03 PM, Julian Taylor 
jtaylor.deb...@googlemail.com wrote:

 On 17.06.2013 17:11, Frédéric Bastien wrote:
  Hi,
 
  I saw that recently Julian Taylor is doing many low level optimization
  like using SSE instruction. I think it is great.
 
  Last year, Mark Florisson released the minivect[1] project that he
  worked on during is master thesis. minivect is a compiler for
  element-wise expression that do some of the same low level optimization
  that Julian is doing in NumPy right now.
 
  Mark did minivect in a way that allow it to be reused by other project.
  It is used now by Cython and Numba I think. I had plan to reuse it in
  Theano, but I didn't got the time to integrate it up to now.
 
  What about reusing it in NumPy? I think that some of Julian optimization
  aren't in minivect (I didn't check to confirm). But from I heard,
  minivect don't implement reduction and there is a pull request to
  optimize this in NumPy.

 Hi,
 what I vectorized is just the really easy cases of unit stride
 continuous operations, so the min/max reductions which is now in numpy
 is in essence pretty trivial.
 minivect goes much further in optimizing general strided access and
 broadcasting via loop optimizations (it seems to have a lot of overlap
 with the graphite loop optimizer available in GCC [0]) so my code is
 probably not of very much use to minivect.

 The most interesting part in minivect for numpy is probably the
 optimization of broadcasting loops which seem to be pretty inefficient
 in numpy [0].

 Concerning the rest I'm not sure how much of a bottleneck general
 strided operations really are in common numpy using code.


 I guess a similar discussion about adding an expression compiler to
 numpy has already happened when numexpr was released?
 If yes what was the outcome of that?


I don't recall a discussion when numexpr was released, as that was before I
started reading this list. numexpr does an optimization that NumPy can't do:
fusing several element-wise operations into one call. So I don't see how it
could be reused in NumPy.
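
To make "fusing element-wise operations in one call" concrete, here is a
minimal sketch. It does not use numexpr itself; it only emulates the idea in
plain NumPy by working on small chunks so that the intermediate result stays
small. The function names and the chunk size are assumptions made up for this
illustration.

```python
import numpy as np

def unfused(a, b, c):
    # Two separate NumPy calls: a * b allocates a full-size temporary,
    # then the addition makes a second pass over memory.
    return a * b + c

def fused_in_chunks(a, b, c, chunk=4096):
    # Emulated fusion (hypothetical helper): handle one small chunk at a
    # time so the intermediate product lives in a small reusable buffer.
    # Assumes 1-D arrays of equal length and dtype.
    out = np.empty_like(a)
    tmp = np.empty(chunk, dtype=a.dtype)
    for start in range(0, a.size, chunk):
        stop = min(start + chunk, a.size)
        t = tmp[: stop - start]
        np.multiply(a[start:stop], b[start:stop], out=t)
        np.add(t, c[start:stop], out=out[start:stop])
    return out

a = np.linspace(0.0, 1.0, 10_000)
b = np.linspace(1.0, 2.0, 10_000)
c = np.linspace(2.0, 3.0, 10_000)
result_unfused = unfused(a, b, c)
result_fused = fused_in_chunks(a, b, c)
```

A real expression compiler would generate a single loop for the whole
expression; the chunked version only shows why a per-operation call
interface cannot do this by itself.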

You call your optimization trivial, but I don't. In the git log of NumPy,
the first commit is from 2001. This is the first time in 12 years that
someone has done this! Also, it gives a 1.5-8x speedup (quoting your PR
description from memory). This is not negligible. How much time did you
spend on them? Also, some of them are processor dependent; how many people
on this list have already done this? I suppose not many.

Yes, your optimizations don't cover all the cases that minivect does. I see
2 levels of optimization: 1) the inner loop/contiguous cases, and 2) the
strided, broadcasted level. We don't need every optimization at both levels
for them to be useful. Any one of them is useful.
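
As a rough illustration of the two levels, the sketch below shows how the
cases can be told apart from array metadata alone. The array shapes are
borrowed from Julian's footnote; the variable names are assumptions for
this example.

```python
import numpy as np

a = np.ones((5000, 100))
row = np.ones((100,))

# Level 1: a contiguous operand with unit stride in the inner dimension.
contiguous_case = a.flags.c_contiguous and a.strides[-1] == a.itemsize

# Level 2, strided: a view that takes every second column.
strided = a[:, ::2]

# Level 2, broadcast: the broadcast axis gets stride 0.
broadcasted = np.broadcast_to(row, a.shape)

print(a.strides)            # (800, 8)
print(strided.strides)      # (800, 16)
print(broadcasted.strides)  # (0, 8)
```

The level 1 case is where a plain vectorized inner loop applies; the other
two need stride-aware iteration.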

So what I think is that we could reuse/share that work. NumPy has a C code
generator. It could call the minivect code generator for some of the loops
when compiling NumPy. This would make optimizations done in those code
generators reusable by more people. For example, when new processors are
launched, we would need to change only 1 place for many projects. Or, for
example, if the call to the MKL vector library is done there, more people
will benefit from it. Right now, only numexpr does it.

About the level 2 optimizations (strides, broadcast), I have never read the
NumPy code that deals with them. Does someone who knows it have an idea
whether it would be possible to reuse minivect for this?

Frédéric


Re: [Numpy-discussion] low level optimization in NumPy and minivect

2013-06-17 Thread Julian Taylor
On 17.06.2013 17:11, Frédéric Bastien wrote:
 Hi,
 
 I saw that recently Julian Taylor is doing many low level optimization
 like using SSE instruction. I think it is great.
 
 Last year, Mark Florisson released the minivect[1] project that he
 worked on during is master thesis. minivect is a compiler for
 element-wise expression that do some of the same low level optimization
 that Julian is doing in NumPy right now.
 
 Mark did minivect in a way that allow it to be reused by other project.
 It is used now by Cython and Numba I think. I had plan to reuse it in
 Theano, but I didn't got the time to integrate it up to now.
 
 What about reusing it in NumPy? I think that some of Julian optimization
 aren't in minivect (I didn't check to confirm). But from I heard,
 minivect don't implement reduction and there is a pull request to
 optimize this in NumPy.

Hi,
what I vectorized is just the really easy cases of unit stride
contiguous operations, so the min/max reductions which are now in numpy
are in essence pretty trivial.
minivect goes much further in optimizing general strided access and
broadcasting via loop optimizations (it seems to have a lot of overlap
with the graphite loop optimizer available in GCC [0]) so my code is
probably not of very much use to minivect.

The most interesting part in minivect for numpy is probably the
optimization of broadcasting loops which seem to be pretty inefficient
in numpy [1].

Concerning the rest, I'm not sure how much of a bottleneck general
strided operations really are in common numpy-using code.


I guess a similar discussion about adding an expression compiler to
numpy has already happened when numexpr was released?
If yes what was the outcome of that?


[0] http://gcc.gnu.org/wiki/Graphite
[1] ones((5000,100)) - ones((100,)) spends about 40% of its time copying
stuff around in buffers
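
The case in footnote [1] can be timed with a short script. This is only a
sketch: it compares the broadcast subtraction against a same-shape
subtraction, the variable names and the repeat count are assumptions, and
the timings depend on the machine and the NumPy version, so no expected
numbers are given.

```python
import timeit
import numpy as np

big = np.ones((5000, 100))
row = np.ones((100,))
full = np.ones((5000, 100))  # same shape as big, so no broadcasting

# Both expressions compute the same values.
broadcast_result = big - row
same_shape_result = big - full

# Time each variant; number=20 is an arbitrary choice for illustration.
t_broadcast = timeit.timeit(lambda: big - row, number=20)
t_same_shape = timeit.timeit(lambda: big - full, number=20)
print(f"broadcast:  {t_broadcast:.4f} s")
print(f"same shape: {t_same_shape:.4f} s")
```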


Re: [Numpy-discussion] low level optimization in NumPy and minivect

2013-06-17 Thread Dag Sverre Seljebotn
On 06/17/2013 11:03 PM, Julian Taylor wrote:
 On 17.06.2013 17:11, Frédéric Bastien wrote:
 Hi,

 I saw that recently Julian Taylor is doing many low level optimization
 like using SSE instruction. I think it is great.

 Last year, Mark Florisson released the minivect[1] project that he
 worked on during is master thesis. minivect is a compiler for
 element-wise expression that do some of the same low level optimization
 that Julian is doing in NumPy right now.

 Mark did minivect in a way that allow it to be reused by other project.
 It is used now by Cython and Numba I think. I had plan to reuse it in
 Theano, but I didn't got the time to integrate it up to now.

 What about reusing it in NumPy? I think that some of Julian optimization
 aren't in minivect (I didn't check to confirm). But from I heard,
 minivect don't implement reduction and there is a pull request to
 optimize this in NumPy.

 Hi,
 what I vectorized is just the really easy cases of unit stride
 continuous operations, so the min/max reductions which is now in numpy
 is in essence pretty trivial.
 minivect goes much further in optimizing general strided access and
 broadcasting via loop optimizations (it seems to have a lot of overlap
 with the graphite loop optimizer available in GCC [0]) so my code is
 probably not of very much use to minivect.

 The most interesting part in minivect for numpy is probably the
 optimization of broadcasting loops which seem to be pretty inefficient
 in numpy [0].

There are also related things like

arr + arr.T

which has far from optimal performance in NumPy (unless there were
recent changes). This example was one of the motivating examples for
minivect.
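
A small sketch of why this expression is awkward: arr.T is a view with
swapped strides, so one operand is always read with a large stride. The
blocked variant below uses loop tiling, which is an assumption about the
kind of loop optimization that applies here and not a description of what
minivect actually does. The helper name and the block size of 64 are made
up for this example, and no speedup is claimed for the pure-Python version.

```python
import numpy as np

def add_transpose_blocked(arr, block=64):
    # Hypothetical helper: computes arr + arr.T tile by tile.
    # Assumes a square 2-D array.
    n = arr.shape[0]
    out = np.empty_like(arr)
    for i in range(0, n, block):
        for j in range(0, n, block):
            # Tile (i, j) of arr pairs with tile (j, i) of arr, transposed.
            out[i:i + block, j:j + block] = (
                arr[i:i + block, j:j + block]
                + arr[j:j + block, i:i + block].T
            )
    return out

arr = np.arange(200 * 200, dtype=np.float64).reshape(200, 200)
print(arr.strides, arr.T.strides)  # (1600, 8) (8, 1600)
blocked = add_transpose_blocked(arr)
```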

Dag Sverre