[Numpy-discussion] Comparing NumPy/IDL Performance

2011-09-26 Thread Keith Hughitt
Hi all,

Myself and several colleagues have recently started work on a Python library
for solar physics, in order to provide an alternative to the current mainstay
for solar physics, which is written in IDL.

One of the first steps we have taken is to create a Python port of
a popular benchmark for IDL (time_test3) which measures performance for a
variety of (primarily matrix) operations. In our initial attempt, however,
Python performs significantly poorer than IDL for several of the tests. I
have attached a graph which shows the results for one machine: the x-axis is
the test # being compared, and the y-axis is the time it took to complete
the test, in milliseconds. While it is possible that this is simply due to
limitations in Python/Numpy, I suspect that this is due at least in part to
our lack of familiarity with NumPy and SciPy.

So my question is, does anyone see any places where we are doing things very
inefficiently in Python?

In order to try and ensure a fair comparison between IDL and Python there
are some things (e.g. the style of timing and output) which we have
deliberately chosen to do a certain way. In other cases, however, it is
likely that we just didn't know a better method.

Any feedback or suggestions people have would be greatly appreciated.
Unfortunately, due to the proprietary nature of IDL, we cannot share the
original version of time_test3, but hopefully the comments in time_test3.py
will be clear enough.

Thanks!
Keith
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Comparing NumPy/IDL Performance

2011-09-26 Thread Johann Cohen-Tanugi

hi Keith,

I do not think that your primary concern should be with this kind of
speed test at this stage:
1/ Rest assured that this sort of test has been performed in other
contexts, and you can always do some hard work on high-level computing
languages like IDL and Python to improve performance.

2/ "Premature optimization is the root of all evil" (Knuth).
3/ I believe that your primary motivation is to provide an alternative
library to a piece of proprietary software. If this is so, then your effort is
most welcome, and I would suggest first porting an interesting but small
piece of the IDL solar physics library and then studying the path to speed
improvements on such a concrete use case.


As for your Python time_test3: if it is a benchmark proprietary to the
IDL codebase, it is no wonder IDL performs well on it! :)

At any rate, I would suggest timing pieces of your code with IPython:

In [1]: import numpy as np
In [2]: a = np.zeros([512, 512], dtype=np.uint8)
In [3]: a[200:250, 200:250] = 10
In [4]: from scipy import ndimage
In [5]: %timeit ndimage.filters.median_filter(a, size=(5, 5))
10 loops, best of 3: 98 ms per loop

I am not sure what unit your vertical axis is in.

best,
Johann



Re: [Numpy-discussion] Comparing NumPy/IDL Performance

2011-09-26 Thread Peter

Looking at the plot there are five stand-out tests: 1, 2, 3, 6 and 21.

Tests 1, 2 and 3 are testing Python itself (no numpy or scipy),
but they exercise things you should be avoiding when using numpy
anyway (don't use loops; use vectorised calculations, etc.).
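As a minimal illustration of that point (not code from the benchmark itself): an elementwise Python loop versus the equivalent single array expression.

```python
import numpy as np

a = np.arange(100000, dtype=np.float64)

# Loop version: one interpreted Python iteration per element (slow)
b = np.empty_like(a)
for i in range(a.size):
    b[i] = 2.0 * a[i] + 1.0

# Vectorised version: the whole loop runs once, in compiled code (fast)
c = 2.0 * a + 1.0

assert np.array_equal(b, c)
```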

This is test 6,

#Test 6 - Shift 512 by 512 byte and store
nrep = 300 * scale_factor
for i in range(nrep):
    c = np.roll(np.roll(b, 10, axis=0), 10, axis=1)  # pylint: disable=W0612
timer.log('Shift 512 by 512 byte and store, %d times.' % nrep)

The precise contents of b are determined by the previous tests
(is that deliberate? It makes testing this in isolation hard). I'm unsure
what you are trying to do and whether this is the best way to do it.
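If the goal really is a cyclic shift, one alternative worth timing against np.roll is explicit slicing plus concatenation. This is only a sketch, under the assumption that each shift is non-negative and smaller than the corresponding axis length; whether it actually beats np.roll will depend on the NumPy version.

```python
import numpy as np

def shift2d(arr, s0, s1):
    """Cyclic shift: rows by s0, columns by s1, via slicing.
    Assumes 0 <= s0 < arr.shape[0] and 0 <= s1 < arr.shape[1]."""
    n0, n1 = arr.shape
    out = np.concatenate((arr[n0 - s0:], arr[:n0 - s0]), axis=0)
    return np.concatenate((out[:, n1 - s1:], out[:, :n1 - s1]), axis=1)

b = np.arange(16, dtype=np.uint8).reshape(4, 4)
assert np.array_equal(shift2d(b, 2, 1),
                      np.roll(np.roll(b, 2, axis=0), 1, axis=1))
```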

This is test 21, which is just calling a scipy function repeatedly.
Questions about this might be better directed to the scipy
mailing list - also check what version of SciPy etc you have.

n = 2**(17 * scale_factor)
a = np.arange(n, dtype=np.float32)
...
#Test 21 - Smooth 512 by 512 byte array, 5x5 boxcar
for i in range(nrep):
    b = scipy.ndimage.filters.median_filter(a, size=(5, 5))
timer.log('Smooth 512 by 512 byte array, 5x5 boxcar, %d times' % nrep)

After that, tests 10, 15 and 18 stand out. Test 10 is another use
of roll, so whatever advice you get on test 6 may apply. Test 10:

#Test 10 - Shift 512 x 512 array
nrep = 60 * scale_factor
for i in range(nrep):
    c = np.roll(np.roll(b, 10, axis=0), 10, axis=1)
#for i in range(nrep): c = d.rotate(
timer.log('Shift 512 x 512 array, %d times' % nrep)

Test 15 is a loop-based version of test 16, where Python wins. Test 18
is a loop-based version of test 19 (log), where the difference is small.

So in terms of numpy speed, your question just seems to be
about numpy.roll and how else one might achieve this result?

Peter


Re: [Numpy-discussion] Comparing NumPy/IDL Performance

2011-09-26 Thread Charles R Harris
The first three tests are of Python loops over Python lists, so I'm not much
surprised at the results. Number 6 uses numpy's roll, which is not implemented
in a particularly efficient way, so could use some improvement. I haven't
looked at the rest of the results, but I suspect they are similar. So in
some cases I think the benchmark isn't particularly useful, but in a few
others numpy could be improved.

It would be interesting to see which features are actually widely used in
IDL code and weight them accordingly. In general, for loops are to be
avoided, but if some numpy routine is a bottleneck we should fix it.

Chuck


Re: [Numpy-discussion] Comparing NumPy/IDL Performance

2011-09-26 Thread Zachary Pincus
Hello Keith,

While I also echo Johann's points about the arbitrariness and non-utility of
benchmarking, I'll briefly comment on just a few tests to help out with
getting things into idiomatic Python/NumPy:

Tests 1 and 2 are fairly pointless (an empty for loop and an empty procedure
call) and won't actually influence the running time of well-written,
non-pathological code.

Test 3: 
#Test 3 - Add 20 scalar ints
nrep = 200 * scale_factor
for i in range(nrep):
    a = i + 1

Well, Python looping is slow... one doesn't write such loops in idiomatic code
if the underlying intent can be re-cast into array operations in numpy. But
here the test is on such a simple operation that it's not clear how to recast
it in a way that would remain reasonable. Ideally you'd test something like:
i = numpy.arange(20)
for j in range(scale_factor):
  a = i + 1

but that sort of changes what the test is testing.
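Concretely, a fully vectorised recast could look like this (a sketch of the idea only; as noted above, it no longer measures the same thing as the scalar loop):

```python
import numpy as np

# Scalar loop, in the spirit of the original test: 20 separate int additions
result_loop = [i + 1 for i in range(20)]

# Vectorised: all 20 additions happen in one array operation
result_vec = np.arange(20) + 1

assert result_loop == list(result_vec)
```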


Finally, test 21:
#Test 21 - Smooth 512 by 512 byte array, 5x5 boxcar
for i in range(nrep):
    b = scipy.ndimage.filters.median_filter(a, size=(5, 5))
timer.log('Smooth 512 by 512 byte array, 5x5 boxcar, %d times' % nrep)

A median filter is definitely NOT a boxcar filter! You want "uniform_filter":

In [4]: a = numpy.empty((1000,1000))

In [5]: timeit scipy.ndimage.filters.median_filter(a, size=(5, 5))
10 loops, best of 3: 93.2 ms per loop

In [6]: timeit scipy.ndimage.filters.uniform_filter(a, size=(5, 5))
10 loops, best of 3: 27.7 ms per loop

Zach




Re: [Numpy-discussion] Comparing NumPy/IDL Performance

2011-09-26 Thread Nathaniel Smith
On Mon, Sep 26, 2011 at 8:24 AM, Zachary Pincus wrote:
> Test 3:
>    #Test 3 - Add 20 scalar ints
>    nrep = 200 * scale_factor
>    for i in range(nrep):
>        a = i + 1
>
> well, python looping is slow... one doesn't do such loops in idiomatic code 
> if the underlying intent can be re-cast into array operations in numpy.

Also, in this particular case, what you're mostly measuring is how
much time it takes to allocate a giant list of integers by calling
'range'. Using 'xrange' instead speeds things up by a factor of two:

def f():
    nrep = 200
    for i in range(nrep):
        a = i + 1

def g():
    nrep = 200
    for i in xrange(nrep):
        a = i + 1

In [8]: timeit f()
10 loops, best of 3: 138 ms per loop
In [9]: timeit g()
10 loops, best of 3: 72.1 ms per loop

Usually I don't worry about the difference between xrange and range --
it doesn't really matter for small loops or loops that are doing more
work inside each iteration -- and that's every loop I actually write
in practice :-). And if I really did need to write a loop like this
(lots of iterations with a small amount of work in each and speed is
critical) then I'd use cython. But, you might as well get in the habit
of using 'xrange'; it won't hurt and occasionally will help.

-- Nathaniel


Re: [Numpy-discussion] Comparing NumPy/IDL Performance

2011-09-26 Thread Olivier Delalleau
One minor thing is you should use xrange rather than range. Although it will
probably only make a difference for the empty loop ;)

Otherwise, from what I can see, tests where numpy is really much worse are:
- 1, 2, 3, 15, 18: Not numpy but Python related: for loops are not efficient
- 6, 10: Maybe numpy.roll is indeed not efficiently implemented
- 21: Same for this scipy function

-=- Olivier



Re: [Numpy-discussion] Comparing NumPy/IDL Performance

2011-09-29 Thread Zachary Pincus
I think the remaining delta between the integer and float "boxcar" smoothing is 
that the integer version (test 21) still uses median_filter(), while the float 
one (test 22) is using uniform_filter(), which is a boxcar.

Other than that and the slow roll() implementation in numpy, things look pretty 
solid, yes?

Zach


On Sep 29, 2011, at 12:11 PM, Keith Hughitt wrote:

> Thank you all for the comments and suggestions.
> 
> First off, I would like to say that I entirely agree with people's 
> suggestions about lack of objectiveness in the test design, and the caveat 
> about optimizing early. The main reason we put together the Python version of 
> the benchmark was as a quick "sanity check" to make sure that there are no 
> major show-stoppers before we began work on the library. We also wanted to 
> put together something to show other people who are firmly in the IDL camp 
> that this is a viable option.
> 
> We did in fact put together another short test-suite (test_testr.py & 
> time_testr.pro) which consists of operations that are frequently used 
> by us, but it also tests only a very small portion of the kinds of things 
> our library will eventually do.
> 
> That said, I made a few small changes to the original benchmark, based on 
> people's feedback, and put together a new plot.
> 
> The changes made include:
> 
> 1. Using xrange instead of range
> 2. Using uniform filter instead of median filter
> 3. Fixed a typo for tests 2 & 3 which resulted in slower Python results
> 
> Again, note that some of the tests are testing non-numpy functionality. 
> Several of the results still stand out,  but overall the results are much 
> more reasonable than before.
> 
> Cheers,
> Keith


Re: [Numpy-discussion] Comparing NumPy/IDL Performance

2011-09-29 Thread Keith Hughitt
Ah. Thanks for catching that!

Otherwise though I think everything looks pretty good.

Thanks all,
Keith



Re: [Numpy-discussion] Comparing NumPy/IDL Performance

2011-09-30 Thread David Verelst
Just want to point to some excellent material that was recently presented at
the course Advanced Scientific Programming in Python at St Andrews. Day 3 was
titled "The Quest for Speed" (see https://python.g-node.org/wiki/schedule)
and might interest you as well.

Regards,
David
