Hello Keith,

While I also echo Johann's points about the arbitrariness and limited utility of benchmarking, I'll briefly comment on just a few of the tests, to help with getting things into idiomatic Python/NumPy:
Tests 1 and 2 (an empty for loop and an empty procedure call) are fairly pointless: they measure overhead that won't actually influence the running time of well-written, non-pathological code.

Test 3:

    # Test 3 - Add 200000 scalar ints
    nrep = 2000000 * scale_factor
    for i in range(nrep):
        a = i + 1

Well, Python looping is slow... one doesn't write such loops in idiomatic code if the underlying intent can be re-cast into array operations in NumPy. But here the test is of such a simple operation that it's not clear how to recast it in a way that would remain reasonable. Ideally you'd test something like:

    i = numpy.arange(200000)
    for j in range(scale_factor):
        a = i + 1

but that sort of changes what the test is testing.

Finally, Test 21:

    # Test 21 - Smooth 512 by 512 byte array, 5x5 boxcar
    for i in range(nrep):
        b = scipy.ndimage.filters.median_filter(a, size=(5, 5))
    timer.log('Smooth 512 by 512 byte array, 5x5 boxcar, %d times' % nrep)

A median filter is definitely NOT a boxcar filter! You want "uniform_filter":

    In [4]: a = numpy.empty((1000,1000))
    In [5]: timeit scipy.ndimage.filters.median_filter(a, size=(5, 5))
    10 loops, best of 3: 93.2 ms per loop
    In [6]: timeit scipy.ndimage.filters.uniform_filter(a, size=(5, 5))
    10 loops, best of 3: 27.7 ms per loop

Zach

On Sep 26, 2011, at 10:19 AM, Keith Hughitt wrote:

> Hi all,
>
> Myself and several colleagues have recently started work on a Python library
> for solar physics, in order to provide an alternative to the current mainstay
> for solar physics, which is written in IDL.
>
> One of the first steps we have taken is to create a Python port of a popular
> benchmark for IDL (time_test3) which measures performance for a variety of
> (primarily matrix) operations. In our initial attempt, however, Python
> performs significantly poorer than IDL for several of the tests. I have
> attached a graph which shows the results for one machine: the x-axis is the
> test # being compared, and the y-axis is the time it took to complete the
> test, in milliseconds.
> While it is possible that this is simply due to
> limitations in Python/Numpy, I suspect that this is due at least in part to
> our lack in familiarity with NumPy and SciPy.
>
> So my question is, does anyone see any places where we are doing things very
> inefficiently in Python?
>
> In order to try and ensure a fair comparison between IDL and Python there are
> some things (e.g. the style of timing and output) which we have deliberately
> chosen to do a certain way. In other cases, however, it is likely that we
> just didn't know a better method.
>
> Any feedback or suggestions people have would be greatly appreciated.
> Unfortunately, due to the proprietary nature of IDL, we cannot share the
> original version of time_test3, but hopefully the comments in time_test3.py
> will be clear enough.
>
> Thanks!
> Keith
>
> <sunpy_time_test3_idl_python_2011-09-26.png>

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
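P.S. To make the loop-vs-array point in the discussion of Test 3 above concrete, here is a small self-contained timing sketch (this is my own illustration, not part of the original time_test3 benchmark; the array size and the use of `time.perf_counter` are my choices):

```python
# Sketch: the same "a = i + 1" work done as a pure-Python loop over scalars
# versus a single vectorized NumPy array operation.
import time
import numpy as np

n = 200_000

# Pure-Python loop over scalar ints, as in the ported Test 3
t0 = time.perf_counter()
for i in range(n):
    a = i + 1
loop_time = time.perf_counter() - t0

# Vectorized equivalent: one array operation replaces n scalar additions
i_arr = np.arange(n)
t0 = time.perf_counter()
a_arr = i_arr + 1
vec_time = time.perf_counter() - t0

print("loop: %.2f ms, vectorized: %.2f ms" % (loop_time * 1e3, vec_time * 1e3))

# Both forms compute the same values; the final scalar result from the loop
# matches the last element of the array result.
assert a == a_arr[-1] == n
```

On typical machines the vectorized version is one to two orders of magnitude faster, which is exactly why recasting the benchmark this way "sort of changes what the test is testing": it would then measure NumPy's array machinery rather than Python's loop overhead.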