[Numpy-discussion] Comparing NumPy/IDL Performance
Hi all,

Several colleagues and I have recently started work on a Python library for solar physics (http://www.sunpy.org/), in order to provide an alternative to the current mainstay for solar physics (http://www.lmsal.com/solarsoft/), which is written in IDL.

One of the first steps we have taken is to create a Python port (https://github.com/sunpy/sunpy/blob/master/benchmarks/time_test3.py) of a popular benchmark for IDL (time_test3) which measures performance for a variety of (primarily matrix) operations. In our initial attempt, however, Python performs significantly worse than IDL on several of the tests. I have attached a graph which shows the results for one machine: the x-axis is the number of the test being compared, and the y-axis is the time it took to complete the test, in milliseconds.

While it is possible that this is simply due to limitations in Python/NumPy, I suspect that it is due at least in part to our lack of familiarity with NumPy and SciPy. So my question is: does anyone see any places where we are doing things very inefficiently in Python?

In order to ensure a fair comparison between IDL and Python, there are some things (e.g. the style of timing and output) which we have deliberately chosen to do a certain way. In other cases, however, it is likely that we just didn't know a better method.

Any feedback or suggestions people have would be greatly appreciated. Unfortunately, due to the proprietary nature of IDL, we cannot share the original version of time_test3, but hopefully the comments in time_test3.py will be clear enough.

Thanks!
Keith

attachment: sunpy_time_test3_idl_python_2011-09-26.png
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] Comparing NumPy/IDL Performance
Hi Keith,

I do not think that your primary concern should be with this kind of speed test at this stage:

1/ Rest assured that this sort of test has been performed in other contexts, and you can always do some hard work in high-level computing languages like IDL and Python to improve performance.
2/ Premature optimization is the root of all evil (Knuth).
3/ I believe that your primary motivation is to provide an alternative library to proprietary software. If this is so, then your effort is most welcome, and I would suggest first porting an interesting but small piece of the IDL solar physics library and then studying the path to speed improvements on such a concrete use case.

As for your Python time_test3: if it is a benchmark code proprietary to the IDL codebase, it is no wonder it performs well there! :)

At any rate, I would suggest simplifying your code with IPython:

    In [1]: import numpy as np
    In [2]: a = np.zeros([512, 512], dtype=np.uint8)
    In [3]: a[200:250, 200:250] = 10
    In [4]: from scipy import ndimage
    In [5]: %timeit ndimage.filters.median_filter(a, size=(5, 5))
    10 loops, best of 3: 98 ms per loop

I am not sure what the unit of your vertical axis is.

Best,
Johann
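Johann's %timeit measurement can also be approximated outside IPython with the standard-library timeit module. The following is only a minimal sketch: the helper name best_of is hypothetical, and it times np.roll on Johann's test array purely as a stand-in; the same helper could presumably wrap the median filter call instead.

```python
import timeit

import numpy as np


def best_of(func, repeat=3, number=10):
    """Return the best per-call time in milliseconds (hypothetical helper).

    Mirrors the "N loops, best of R" reporting style of IPython's %timeit.
    """
    totals = timeit.repeat(func, repeat=repeat, number=number)
    return min(totals) / number * 1000.0


# Same test array as in Johann's IPython session.
a = np.zeros([512, 512], dtype=np.uint8)
a[200:250, 200:250] = 10

# Time a single roll along the first axis as an illustrative workload.
ms = best_of(lambda: np.roll(a, 10, axis=0))
print("%d loops, best of %d: %.3g ms per loop" % (10, 3, ms))
```

Taking the minimum over several repeats, rather than the mean, reduces the influence of other processes running on the machine.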
Re: [Numpy-discussion] Comparing NumPy/IDL Performance
On Mon, Sep 26, 2011 at 3:19 PM, Keith Hughitt keith.hugh...@gmail.com wrote:
While it is possible that this is simply due to limitations in Python/Numpy, I suspect that this is due at least in part to our lack in familiarity with NumPy and SciPy. So my question is, does anyone see any places where we are doing things very inefficiently in Python?

Looking at the plot, there are five stand-out tests: 1, 2, 3, 6 and 21.

Tests 1, 2 and 3 are testing Python itself (no numpy or scipy), but are things you should be avoiding when using numpy anyway (don't use loops, use vectorised calculations, etc.).

This is test 6:

    #Test 6 - Shift 512 by 512 byte and store
    nrep = 300 * scale_factor
    for i in range(nrep):
        c = np.roll(np.roll(b, 10, axis=0), 10, axis=1) #pylint: disable=W0612
    timer.log('Shift 512 by 512 byte and store, %d times.' % nrep)

The precise contents of b are determined by the previous tests (is that deliberate? It makes testing it in isolation hard). I'm unsure what you are trying to do and whether this is the best way.

This is test 21, which is just calling a scipy function repeatedly. Questions about this might be better directed to the scipy mailing list; also check what version of SciPy etc. you have.

    n = 2**(17 * scale_factor)
    a = np.arange(n, dtype=np.float32)
    ...
    #Test 21 - Smooth 512 by 512 byte array, 5x5 boxcar
    for i in range(nrep):
        b = scipy.ndimage.filters.median_filter(a, size=(5, 5))
    timer.log('Smooth 512 by 512 byte array, 5x5 boxcar, %d times' % nrep)

After that, tests 10, 15 and 18 stand out. Test 10 is another use of roll, so whatever advice you get on test 6 may apply. Test 10:

    #Test 10 - Shift 512 x 512 array
    nrep = 60 * scale_factor
    for i in range(nrep):
        c = np.roll(np.roll(b, 10, axis=0), 10, axis=1)
    #for i in range(nrep): c = d.rotate(
    timer.log('Shift 512 x 512 array, %d times' % nrep)

Test 15 is a loop-based version of 16, where Python wins. Test 18 is a loop-based version of 19 (log), where the difference is small.

So in terms of numpy speed, your question just seems to be about numpy.roll and how else one might achieve this result?

Peter
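On Peter's closing question about how else the double roll might be achieved, one possible answer is to copy the four wrapped blocks directly with slices, so the array is written once instead of being rolled twice. This is only a sketch: shift2d is a hypothetical helper rather than anything from the benchmark, and whether it is actually faster would need to be measured on the machine in question.

```python
import numpy as np


def shift2d(b, dy, dx):
    """Circularly shift a 2-D array by dy rows and dx columns (hypothetical helper)."""
    ny, nx = b.shape
    dy %= ny
    dx %= nx
    out = np.empty_like(b)
    # Each assignment copies one of the four rectangular blocks to its
    # wrapped-around destination.
    out[dy:, dx:] = b[:ny - dy, :nx - dx]
    out[:dy, dx:] = b[ny - dy:, :nx - dx]
    out[dy:, :dx] = b[:ny - dy, nx - dx:]
    out[:dy, :dx] = b[ny - dy:, nx - dx:]
    return out


# Arbitrary 512 x 512 byte array standing in for the benchmark's b.
b = (np.arange(512 * 512) % 251).astype(np.uint8).reshape(512, 512)

reference = np.roll(np.roll(b, 10, axis=0), 10, axis=1)
sliced = shift2d(b, 10, 10)

# NumPy 1.12 and later also accept tuples, rolling both axes in one call.
combined = np.roll(b, (10, 10), axis=(0, 1))
```

All three results should be identical; only the number of intermediate copies differs.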
Re: [Numpy-discussion] Comparing NumPy/IDL Performance
On Mon, Sep 26, 2011 at 8:19 AM, Keith Hughitt keith.hugh...@gmail.com wrote:
While it is possible that this is simply due to limitations in Python/Numpy, I suspect that this is due at least in part to our lack in familiarity with NumPy and SciPy. So my question is, does anyone see any places where we are doing things very inefficiently in Python?

The first three tests are Python loops over Python lists, so I'm not much surprised at the results. Number 6 uses numpy roll, which is not implemented in a particularly efficient way, so it could use some improvement. I haven't looked at the rest of the results, but I suspect they are similar. So in some cases I think the benchmark isn't particularly useful, but in a few others numpy could be improved.

It would be interesting to see which features are actually widely used in IDL code and weight them accordingly. In general, for loops are to be avoided, but if some numpy routine is a bottleneck we should fix it.

Chuck
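To illustrate the advice that for loops are to be avoided, here is a small sketch contrasting an element-by-element loop with the equivalent single array call. The logarithm is used only because the thread mentions a loop-based log test; the array size and variable names are assumptions, not taken from time_test3.py.

```python
import math

import numpy as np

# Illustrative input: the values 1..100000 as doubles (size is an assumption).
x = np.arange(1, 100001, dtype=np.float64)

# Loop version: one Python-level function call and one store per element.
loop_result = np.empty_like(x)
for k in range(x.size):
    loop_result[k] = math.log(x[k])

# Vectorised version: one call, with the loop running in compiled code.
vec_result = np.log(x)
```

Both versions compute the same values; the vectorised form avoids the per-element interpreter overhead, which is usually where the time goes in loops like this.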
Re: [Numpy-discussion] Comparing NumPy/IDL Performance
Hello Keith,

While I also echo Johann's points about the arbitrariness and non-utility of benchmarking, I'll briefly comment on just a few tests to help out with getting things into idiomatic Python/numpy.

Tests 1 and 2 (empty for loop and empty procedure) are fairly pointless: they won't actually influence the running time of well-written, non-pathological code.

Test 3:

    #Test 3 - Add 20 scalar ints
    nrep = 200 * scale_factor
    for i in range(nrep):
        a = i + 1

Well, Python looping is slow... one doesn't write such loops in idiomatic code if the underlying intent can be recast into array operations in numpy. But here the test is on such a simple operation that it's not clear how to recast it in a way that would remain reasonable. Ideally you'd test something like:

    i = numpy.arange(20)
    for j in range(scale_factor):
        a = i + 1

but that sort of changes what the test is testing.

Finally, test 21:

    #Test 21 - Smooth 512 by 512 byte array, 5x5 boxcar
    for i in range(nrep):
        b = scipy.ndimage.filters.median_filter(a, size=(5, 5))
    timer.log('Smooth 512 by 512 byte array, 5x5 boxcar, %d times' % nrep)

A median filter is definitely NOT a boxcar filter! You want uniform_filter:

    In [4]: a = numpy.empty((1000,1000))
    In [5]: timeit scipy.ndimage.filters.median_filter(a, size=(5, 5))
    10 loops, best of 3: 93.2 ms per loop
    In [6]: timeit scipy.ndimage.filters.uniform_filter(a, size=(5, 5))
    10 loops, best of 3: 27.7 ms per loop

Zach
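Zach's distinction between a median filter and a boxcar (moving-average) filter can be seen on a tiny array containing a single spike. This is a minimal sketch, assuming a SciPy version in which the filters are importable directly from scipy.ndimage; the 9 x 9 array and the spike value are arbitrary choices for illustration.

```python
import numpy as np
from scipy import ndimage

# A 9 x 9 array of zeros with one spike of height 25 in the centre.
a = np.zeros((9, 9))
a[4, 4] = 25.0

# Boxcar: each output pixel is the mean of its 5 x 5 neighbourhood,
# so the spike is spread out over a 5 x 5 block.
boxcar = ndimage.uniform_filter(a, size=5)

# Median: each output pixel is the median of its 5 x 5 neighbourhood,
# so an isolated spike is removed entirely.
median = ndimage.median_filter(a, size=5)

print(boxcar[4, 4], median[4, 4])
```

The boxcar result keeps the total intensity but smears it, while the median result discards the outlier, which is why the two are not interchangeable in a benchmark that claims to measure boxcar smoothing.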
Re: [Numpy-discussion] Comparing NumPy/IDL Performance
On Mon, Sep 26, 2011 at 8:24 AM, Zachary Pincus zachary.pin...@yale.edu wrote:
Test 3:

    #Test 3 - Add 20 scalar ints
    nrep = 200 * scale_factor
    for i in range(nrep):
        a = i + 1

well, python looping is slow... one doesn't do such loops in idiomatic code if the underlying intent can be re-cast into array operations in numpy.

Also, in this particular case, what you're mostly measuring is how much time it takes to allocate a giant list of integers by calling 'range'. Using 'xrange' instead speeds things up by a factor of two:

    def f():
        nrep = 200
        for i in range(nrep):
            a = i + 1

    def g():
        nrep = 200
        for i in xrange(nrep):
            a = i + 1

    In [8]: timeit f()
    10 loops, best of 3: 138 ms per loop
    In [9]: timeit g()
    10 loops, best of 3: 72.1 ms per loop

Usually I don't worry about the difference between xrange and range -- it doesn't really matter for small loops or loops that are doing more work inside each iteration -- and that's every loop I actually write in practice :-). And if I really did need to write a loop like this (lots of iterations with a small amount of work in each and speed is critical) then I'd use cython. But, you might as well get in the habit of using 'xrange'; it won't hurt and occasionally will help.

-- Nathaniel
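Nathaniel's range/xrange comparison applies to Python 2. In Python 3, xrange no longer exists and range itself is lazy, so the list-allocation cost he describes only appears if the range is explicitly turned into a list. The sketch below assumes Python 3; the loop count n and the function names are arbitrary choices for illustration.

```python
import sys
import timeit

n = 10**6  # illustrative loop count, not taken from the benchmark

lazy = range(n)         # Python 3: small constant-size object (like xrange)
eager = list(range(n))  # materialises every integer (like Python 2's range)

print(sys.getsizeof(lazy), sys.getsizeof(eager))


def loop_lazy():
    for i in range(n):
        a = i + 1


def loop_eager():
    for i in list(range(n)):
        a = i + 1


# Total seconds for three runs of each variant.
t_lazy = timeit.timeit(loop_lazy, number=3)
t_eager = timeit.timeit(loop_eager, number=3)
print(t_lazy, t_eager)
```

The size difference is guaranteed by how the two objects are built; the timing difference depends on the interpreter and machine, so it is printed rather than assumed.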
[Numpy-discussion] Trouble installing numpy
Using the SourceForge download of the NumPy installer package: numpy-1.6.1-win32-superpack-python 2.7.exe.

The installation wizard starts, but then installation fails with the error message:

    Python version 2.7 required, which was not found in the registry

IDLE says it's using:

    Python 2.7.2 64 bit AMD64 on Win 32

So what's holding up the installation, and what do I need to do to install numpy?

The Helmbolds
2645 E Southern Ave A241
Tempe AZ 85282
Email: hel...@yahoo.com
VOX: 480-831-3611
CELL: 602-568-6948 (but not often turned on)
Re: [Numpy-discussion] Comparing NumPy/IDL Performance
One minor thing is that you should use xrange rather than range, although it will probably only make a difference for the empty loop ;)

Otherwise, from what I can see, the tests where numpy is really much worse are:
- 1, 2, 3, 15, 18: Not numpy but Python related: for loops are not efficient
- 6, 10: Maybe numpy.roll is indeed not efficiently implemented
- 21: Same for this scipy function

-=- Olivier
Re: [Numpy-discussion] Trouble installing numpy
On Mon, Sep 26, 2011 at 9:43 AM, The Helmbolds hel...@yahoo.com wrote:
Using Source Forge download of NumPy installer package: numpy-1.6.1-win32-superpack-python 2.7.exe. Installation Wizard starts, but then installation fails with error message: Python version 2.7 required, which was not found in the registry. Idle says it's using: Python 2.7.2 64 bit AMD64 on Win 32. So what's holding up the installation, and what do I need to do to install numpy?

Your Python is 64 bits; the numpy package is 32 bits and needs 32-bit Python. If you need a free 64-bit numpy on Windows, your best bet is probably here: http://www.lfd.uci.edu/%7Egohlke/pythonlibs/.

Chuck
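Since the installer's error message does not mention the 32-bit/64-bit mismatch, it can help to ask the interpreter directly which kind of build it is. The following sketch uses only the standard library and should work on Python 2.7 as well as Python 3.

```python
import platform
import struct
import sys

# Size of a C pointer in this interpreter, in bits: 32 or 64.
bits = struct.calcsize("P") * 8

print("Python %s, %d-bit build" % (platform.python_version(), bits))

# Equivalent check: sys.maxsize exceeds 2**32 only on 64-bit builds.
print("64-bit interpreter: %s" % (sys.maxsize > 2**32))
```

A binary installer has to match this value, regardless of whether the operating system itself is 64-bit.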
Re: [Numpy-discussion] Trouble installing numpy
You are probably trying to install the 32 bit version of numpy on your 64 bit Python. Either switch to 64 bit numpy or 32 bit Python.

-=- Olivier
Re: [Numpy-discussion] Trouble installing numpy
Thanks. The error message did not refer to the difference between 32-bit and 64-bit, only to the supposed absence of Python 2.7. And IDLE says it's using Python 2.7.2 64 bit AMD64 on Win 32, which confused me.

However, I see that the suggested web page contains the following four options, each of which apparently offers a NumPy-like Windows installation package for 64-bit Windows machines:

    Bottleneck-0.5.0.win-amd64-py2.7.exe
    numpy-MKL-1.6.1.win-amd64-py2.7.exe [*-see comment below]
    numpy-unoptimized-1.6.1.win-amd64-py2.7.exe
    numscons-012.0.win-amd64-py2.7.exe

[*-comment] This item is described as not compatible with the official SciPy distributions.

Any comments on which ones I should try first?

The Helmbolds
2645 E Southern Ave A241
Tempe AZ 85282
Email: hel...@yahoo.com
VOX: 480-831-3611
CELL: 602-568-6948 (but not often turned on)
[Numpy-discussion] [ANN] ODE dy/dt = f(t) solver with guaranteed specifiable accuracy
Hi all,

The free solver interalg from the OpenOpt framework (based on interval analysis) can now solve ODEs of the form dy/dt = f(t) with guaranteed, specifiable accuracy. See the ODE webpage for more details; it includes an example comparing interalg with scipy.integrate.odeint, where the latter fails to solve the problem.

Future plans include solving some general ODE systems dy/dt = f(y, t).

Regards, D.