Re: [Numpy-discussion] numexpr with the new iterator
On Monday 10 January 2011 19:29:33, Mark Wiebe wrote:

>> so, the new code is just 5% slower. I suppose that removing the NPY_ITER_ALIGNED flag would give us a bit more performance, but that's great as it is now. How did you do that? Your new_iter branch in NumPy already deals with unaligned data, right?

> Take a look at lowlevel_strided_loops.c.src. In this case, the buffering setup code calls PyArray_GetDTypeTransferFunction, which in turn calls PyArray_GetStridedCopyFn, which on an x86 platform returns _aligned_strided_to_contig_size8. This function has a simple loop of copies using a npy_uint64 data type.

I see. Brilliant!

>> Well, if you can support reduce operations with your patch, that would be extremely good news, as I'm afraid that the current reduce code is a bit broken in Numexpr (at least, I vaguely remember seeing it working badly in some cases).

> Cool, I'll take a look at some point. I imagine with the most obvious implementation small reductions would perform poorly.

IMO, reductions like sum() or prod() are mainly limited by memory access, so my advice would be to not try to over-optimize here, and just make use of the new iterator. We can refine performance later on.

--
Francesc Alted

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
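The buffering behavior discussed above can be observed from Python without touching the C sources. The following sketch (not from the original thread) builds a deliberately unaligned float64 array and sums it with NumPy's buffered iterator; with the `buffered` flag, each chunk is copied into an aligned internal buffer by the strided-copy machinery mentioned above, so the inner loop only ever sees aligned data:

```python
import numpy as np

# Build a deliberately unaligned float64 array: a view into a byte buffer
# offset by one byte, so its data pointer is not 8-byte aligned.
buf = np.zeros(8 * 100 + 1, dtype=np.uint8)
unaligned = buf[1:].view(np.float64)
unaligned[:] = np.arange(100)
assert not unaligned.flags.aligned

# The 'buffered' flag makes the iterator copy each chunk into an aligned
# buffer behind the scenes; 'external_loop' yields whole chunks instead
# of one scalar at a time.
total = 0.0
it = np.nditer(unaligned, flags=['buffered', 'external_loop'],
               op_flags=[['readonly']])
for chunk in it:
    total += chunk.sum()

print(total)  # 0 + 1 + ... + 99 = 4950.0
```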
Re: [Numpy-discussion] numexpr with the new iterator
On Tuesday 11 January 2011 06:45:28, Mark Wiebe wrote:

> On Mon, Jan 10, 2011 at 11:35 AM, Mark Wiebe <mwwi...@gmail.com> wrote:
>> I'm a bit curious why the jump from 1 to 2 threads is scaling so poorly. Your timings have improvement factors of 1.85, 1.68, 1.64, and 1.79. Since the computation is trivial data parallelism, and I believe it's still pretty far off the memory bandwidth limit, I would expect a speedup of 1.95 or higher.
> It looks like it is the memory bandwidth which is limiting the scalability.

Indeed, this is an increasingly important problem for modern computers. You may want to read: http://www.pytables.org/docs/CISE-12-2-ScientificPro.pdf ;-)

> The slower operations scale much better than faster ones. Below are some timings of successively faster operations. When the operation is slow enough, it scales like I was expecting...
[clip]

Yeah, for another example of this with more threads, see: http://code.google.com/p/numexpr/wiki/MultiThreadVM

OTOH, I was curious about the performance of the new iterator with Intel's VML, and it seems to work decently too:

$ python bench/vml_timing.py   (original numexpr, *no* VML support)
*** Numexpr vs NumPy speed-ups ***
Contiguous case:  1.72 (mean), 0.92 (min), 3.07 (max)
Strided case:     2.1  (mean), 0.98 (min), 3.52 (max)
Unaligned case:   2.35 (mean), 1.35 (min), 3.31 (max)

$ python bench/vml_timing.py   (original numexpr, VML support)
*** Numexpr vs NumPy speed-ups ***
Contiguous case:  3.83 (mean), 1.1  (min), 10.19 (max)
Strided case:     3.21 (mean), 0.98 (min), 7.45 (max)
Unaligned case:   3.6  (mean), 1.47 (min), 7.87 (max)

$ python bench/vml_timing.py   (new iter numexpr, VML support)
*** Numexpr vs NumPy speed-ups ***
Contiguous case:  3.56 (mean), 1.12 (min), 7.38 (max)
Strided case:     2.37 (mean), 0.09 (min), 7.63 (max)
Unaligned case:   3.56 (mean), 2.08 (min), 5.88 (max)

However, there are a couple of quirks here:

1) The original Numexpr generally performs faster than the iter version.

2) The strided case is quite a bit worse for the iter version. I've isolated the tests that perform worse for the iter version, and here are a couple of samples:

*** Expression: exp(f3)
numpy:             0.0135
numpy strided:     0.0144
numpy unaligned:   0.0200
numexpr:           0.0020   Speed-up of numexpr over numpy: 6.6584
numexpr strided:   0.1495   Speed-up of numexpr over numpy: 0.0962
numexpr unaligned: 0.0049   Speed-up of numexpr over numpy: 4.0859

*** Expression: sin(f3)*cos(f4)
numpy:             0.0291
numpy strided:     0.0366
numpy unaligned:   0.0407
numexpr:           0.0166   Speed-up of numexpr over numpy: 1.7518
numexpr strided:   0.1551   Speed-up of numexpr over numpy: 0.2361
numexpr unaligned: 0.0175   Speed-up of numexpr over numpy: 2.3246

Maybe you can shed some light on what's going on here (shall we discuss this off-list so as not to bore people too much?).

--
Francesc Alted
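For readers wanting to reproduce the three cases above, here is a sketch of how contiguous, strided, and unaligned operands are typically constructed for such benchmarks (the exact setup in bench/vml_timing.py may differ):

```python
import numpy as np

n = 1000

# Contiguous: a plain C-ordered array, 8-byte element stride.
contiguous = np.arange(n, dtype=np.float64)

# Strided: every other element of a larger array, i.e. a 16-byte
# stride instead of the contiguous 8 bytes.
strided = np.arange(2 * n, dtype=np.float64)[::2]

# Unaligned: a float64 view offset by one byte into a byte buffer.
buf = np.zeros(8 * n + 1, dtype=np.uint8)
unaligned = buf[1:].view(np.float64)
unaligned[:] = np.arange(n)

assert contiguous.flags.c_contiguous
assert not strided.flags.c_contiguous
assert not unaligned.flags.aligned
```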
Re: [Numpy-discussion] Drawing circles in a numpy array
On Mon, Jan 10, 2011 at 11:53 AM, totonixs...@gmail.com wrote:
> Hi all, I have this problem: given some point, draw a circle centered at this point with radius r. I'm doing that using numpy this way (snippet code from here [1]):
>
> # Create the initial black and white image
> import numpy as np
> from scipy import ndimage
> a = np.zeros((512, 512)).astype('uint8')  # unsigned integer type needed by watershed
> y, x = np.ogrid[0:512, 0:512]
> m1 = ((y-200)**2 + (x-100)**2 < 30**2)
> m2 = ((y-350)**2 + (x-400)**2 < 20**2)
> m3 = ((y-260)**2 + (x-200)**2 < 20**2)
> a[m1+m2+m3] = 1
> imshow(a, cmap=cm.gray)  # left plot in the image above
>
> The problem is that it has to evaluate all values from 0 to the image size (in the snippet, from 0 to 512 in the X and Y dimensions). Is there a faster way of doing that, without evaluating all those values? For example: only evaluate from 0 to 30 for a circle centered at (0, 0) with radius 30.
>
> Thanks! Thiago Franco de Moraes
>
> [1] - http://www.scipy.org/Cookbook/Watershed

Hi, I've just seen I can do something like this:

radius = 10
a = np.zeros((512, 512)).astype('uint8')
cx, cy = 100, 100  # The center of the circle
y, x = np.ogrid[-radius:radius, -radius:radius]
index = x**2 + y**2 <= radius**2
a[cy-radius:cy+radius, cx-radius:cx+radius][index] = 255

Numpy is very cool! Is there another way of doing that? Just to know... Thanks! Thiago Franco de Moraes
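Building on the bounding-box idea in the second snippet, here is a sketch (the `draw_circle` name is ours, not from the thread) that wraps it into a helper and also clips the box at the image edges, which the snippet above would need before it could draw circles near a border:

```python
import numpy as np

def draw_circle(img, cx, cy, radius, value=255):
    """Set pixels within `radius` of (cx, cy) to `value`, evaluating only
    the circle's bounding box, clipped to the image (illustrative helper)."""
    h, w = img.shape
    x0, x1 = max(cx - radius, 0), min(cx + radius + 1, w)
    y0, y1 = max(cy - radius, 0), min(cy + radius + 1, h)
    # Coordinate grids relative to the center, over the clipped box only.
    y, x = np.ogrid[y0 - cy:y1 - cy, x0 - cx:x1 - cx]
    mask = x**2 + y**2 <= radius**2
    img[y0:y1, x0:x1][mask] = value
    return img

a = np.zeros((512, 512), dtype='uint8')
draw_circle(a, 100, 100, 10)
draw_circle(a, 0, 0, 10)  # near the corner: no negative-index surprises
```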
Re: [Numpy-discussion] Rolling window (moving average, moving std, and more)
On Tue, Jan 4, 2011 at 8:14 AM, Keith Goodman <kwgood...@gmail.com> wrote:
> On Tue, Jan 4, 2011 at 8:06 AM, Sebastian Haase <seb.ha...@gmail.com> wrote:
>> On Mon, Jan 3, 2011 at 5:32 PM, Erik Rigtorp <e...@rigtorp.com> wrote:
>>> On Mon, Jan 3, 2011 at 11:26, Eric Firing <efir...@hawaii.edu> wrote:
>>>> Instead of calculating statistics independently each time the window is advanced one data point, the statistics are updated. I have not done any benchmarking, but I expect this approach to be quick.
>>> This might accumulate numerical errors. But it could be fine for many applications.
>>>> The code is old; I have not tried to update it to take advantage of cython's advances over pyrex. If I were writing it now, I might not bother with the C level at all; it could all be done in cython, probably with no speed penalty, and maybe even with reduced overhead.
>>> No doubt this would be faster, I just wanted to offer a general way to do this in NumPy.
>> BTW, some of these operations can be done using scipy's ndimage, right? Any comments? How does the performance compare? ndimage might have more options regarding edge handling, or?
> Take a look at the moving window functions in the development version of the la package: https://github.com/kwgoodman/la/blob/master/la/farray/mov.py
> Many of the moving window functions offer three calculation methods: filter (ndimage), strides (the strides trick discussed in this thread), and loop (a simple python loop). For example:
>
> >>> a = np.random.rand(500, 2000)
> >>> timeit la.farray.mov_max(a, window=252, axis=-1, method='filter')
> 1 loops, best of 3: 336 ms per loop
> >>> timeit la.farray.mov_max(a, window=252, axis=-1, method='strides')
> 1 loops, best of 3: 609 ms per loop
> >>> timeit la.farray.mov_max(a, window=252, axis=-1, method='loop')
> 1 loops, best of 3: 638 ms per loop
>
> No one method is best for all situations. That is one of the reasons I started the Bottleneck package. I figured Cython could beat them all.

I added four new functions to Bottleneck: move_min, move_max, move_nanmin, move_nanmax. They are much faster than using SciPy's ndimage.maximum_filter1d or the strides trick:

>>> a = np.random.rand(500, 2000)
>>> timeit la.farray.mov_max(a, window=252, axis=-1, method='filter')  # ndimage
1 loops, best of 3: 336 ms per loop
>>> timeit bn.move_max(a, window=252, axis=-1)  # bottleneck
100 loops, best of 3: 14.1 ms per loop

> That looks too good to be true. Are the outputs the same?

Yes:

>>> a1 = la.farray.mov_max(a, window=252, axis=-1, method='filter')
>>> a2 = bn.move_max(a, window=252, axis=-1)
>>> np.testing.assert_array_almost_equal(a1, a2)
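For reference, the "strides" method mentioned in the thread can be sketched in a few lines with numpy.lib.stride_tricks.as_strided. This is an illustrative version, not the implementation in la or Bottleneck, and unlike bn.move_max it returns a shortened array of length n - window + 1 instead of padding the first window - 1 positions:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def move_max_strides(a, window):
    # Build an (..., n - window + 1, window) view where each row along the
    # new last axis is one window of `a`; no data is copied, only the
    # strides are reinterpreted.
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    windows = as_strided(a, shape=shape, strides=strides)
    return windows.max(axis=-1)

x = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
move_max_strides(x, 3)  # array([3., 5., 5.])
```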
[Numpy-discussion] histogram2d and decreasing bin edges
Hi all,

I've noticed a change in numpy.histogram2d between (possibly very much) older versions and the current one: the function can no longer handle the situation where bin edges decrease instead of increasing monotonically. The reason for this seems to be the handling of outliers in histogramdd; see the output of the minimal example below.

If I understand correctly, this is the only place where histogramdd implicitly assumes monotonically increasing bin edges. If so, this could be fixed to work with both increasing and decreasing bin edges by taking abs(dedges[i]).min() when calculating the rounding precision. If not, it might be more consistent, and produce a more meaningful error message, if histogram2d asserted that bin edges increase monotonically and otherwise raised an AttributeError, as the 1-d histogram() function does in that case (see below).

Matthias

In [1]: import numpy
In [2]: numpy.__version__
Out[2]: '1.5.1'
In [3]: ascending = numpy.array([0, 1])
In [4]: descending = numpy.array([1, 0])
In [5]: numpy.histogram2d([0.5], [0.5], bins=(ascending, ascending))
Out[5]: (array([[ 1.]]), array([ 0., 1.]), array([ 0., 1.]))
In [6]: numpy.histogram2d([0.5], [0.5], bins=(descending, descending))
Warning: invalid value encountered in log10
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/lib/python2.6/site-packages/numpy/lib/twodim_base.pyc in histogram2d(x, y, bins, range, normed, weights)
    613         xedges = yedges = asarray(bins, float)
    614         bins = [xedges, yedges]
--> 615     hist, edges = histogramdd([x,y], bins, range, normed, weights)
    616     return hist, edges[0], edges[1]
    617
/lib/python2.6/site-packages/numpy/lib/function_base.pyc in histogramdd(sample, bins, range, normed, weights)
    312     for i in arange(D):
    313         # Rounding precision
--> 314         decimal = int(-log10(dedges[i].min())) + 6
    315         # Find which points are on the rightmost edge.
    316         on_edge = where(around(sample[:,i], decimal) == around(edges[i][-1],
ValueError: cannot convert float NaN to integer

Behavior of the 1-d histogram():

In [8]: numpy.histogram([0.5], bins=ascending)
Out[8]: (array([1]), array([0, 1]))
In [9]: numpy.histogram([0.5], bins=descending)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/lib/python2.6/site-packages/numpy/lib/function_base.pyc in histogram(a, bins, range, normed, weights)
    160     if (np.diff(bins) < 0).any():
    161         raise AttributeError(
--> 162             'bins must increase monotonically.')
    163
    164     # Histogram is an integer or a float array depending on the weights.
AttributeError: bins must increase monotonically.
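Until histogramdd handles decreasing edges, a user-side workaround is to flip decreasing edges to ascending order and flip the resulting counts back. A hypothetical helper (not a NumPy API) sketching this:

```python
import numpy as np

def histogram2d_any_order(x, y, xedges, yedges):
    """Like numpy.histogram2d, but also accepts monotonically
    decreasing bin edges (hypothetical helper, not a NumPy API)."""
    xflip = xedges[0] > xedges[-1]
    yflip = yedges[0] > yedges[-1]
    xasc = xedges[::-1] if xflip else xedges
    yasc = yedges[::-1] if yflip else yedges
    H, _, _ = np.histogram2d(x, y, bins=(xasc, yasc))
    # Reverse the counts along any flipped axis so that bin order
    # matches the caller's (decreasing) edge order.
    if xflip:
        H = H[::-1, :]
    if yflip:
        H = H[:, ::-1]
    return H, xedges, yedges

H, _, _ = histogram2d_any_order([0.5], [0.5],
                                np.array([1.0, 0.0]), np.array([1.0, 0.0]))
# H is [[1.]]: the single sample lands in the single (flipped) bin.
```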