Re: [Numpy-discussion] numexpr with the new iterator

2011-01-11 Thread Francesc Alted
On Monday 10 January 2011 19:29:33, Mark Wiebe wrote:
  so, the new code is just 5% slower.  I suppose that removing the
  NPY_ITER_ALIGNED flag would give us a bit more performance, but
  that's great as it is now.  How did you do that?  Your new_iter
  branch in NumPy already deals with unaligned data, right?
 
 Take a look at lowlevel_strided_loops.c.src.  In this case, the
 buffering setup code calls PyArray_GetDTypeTransferFunction, which
 in turn calls PyArray_GetStridedCopyFn, which on an x86 platform
 returns
 _aligned_strided_to_contig_size8.  This function has a simple loop of
 copies using a npy_uint64 data type.

I see.  Brilliant!
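
For intuition, here is a rough Python analogue of what that size-8 copy
loop does (illustrative only, not NumPy's actual C code): each aligned
8-byte element is moved as a single integer, with no float
interpretation involved:

import numpy as np

src = np.arange(30, dtype=np.float64)[::3]   # strided float64 input
dst = np.empty(src.shape, dtype=np.float64)  # contiguous output buffer

# Same-itemsize view: every float64 element is copied as one
# npy_uint64-style 8-byte move, as in _aligned_strided_to_contig_size8.
dst.view(np.uint64)[...] = src.view(np.uint64)

assert np.array_equal(dst, np.ascontiguousarray(src))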

  Well, if you can support reduce operations with your patch that
  would be extremely good news as I'm afraid that the current reduce
  code is a bit broken in Numexpr (at least, I vaguely remember
  seeing it working badly in some cases).
 
 Cool, I'll take a look at some point.  I imagine with the most
 obvious implementation small reductions would perform poorly.

IMO, reductions like sum() or prod() are mainly limited by memory 
access, so my advice would be to not try to over-optimize here and just 
make use of the new iterator.  We can refine performance later on.
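
For instance, a reduction is spelled inside the expression string (a
sketch; assuming the reduce path works as advertised):

import numpy as np
import numexpr as ne

a = np.random.rand(1000, 1000)

# Reductions in numexpr are written inside the expression string:
s0 = ne.evaluate("sum(a**2, axis=0)")   # reduce along the first axis
s = ne.evaluate("sum(a**2)")            # full reduction

assert np.allclose(s0, (a**2).sum(axis=0))
assert np.allclose(s, (a**2).sum())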

-- 
Francesc Alted


Re: [Numpy-discussion] numexpr with the new iterator

2011-01-11 Thread Francesc Alted
On Tuesday 11 January 2011 06:45:28, Mark Wiebe wrote:
 On Mon, Jan 10, 2011 at 11:35 AM, Mark Wiebe mwwi...@gmail.com wrote:
  I'm a bit curious why the jump from 1 to 2 threads is scaling so
  poorly.  Your timings have improvement factors of 1.85, 1.68, 1.64,
  and 1.79.  Since the computation is trivial data parallelism, and I
  believe it's still pretty far off the memory bandwidth limit, I would
  expect a speedup of 1.95 or higher.
 
 It looks like it is the memory bandwidth which is limiting the
 scalability.

Indeed, this is an increasingly important problem for modern computers.  
You may want to read:

http://www.pytables.org/docs/CISE-12-2-ScientificPro.pdf

;-)

 The slower operations scale much better than faster
 ones.  Below are some timings of successively faster operations. 
 When the operation is slow enough, it scales like I was expecting...
[clip]

Yeah, for another example on this with more threads, see:

http://code.google.com/p/numexpr/wiki/MultiThreadVM
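
To see that effect with numexpr directly, something along these lines
works (sizes and expressions are illustrative, run at module scope):

import numpy as np
import numexpr as ne
from timeit import timeit

a = np.random.rand(10 * 1000 * 1000)

# A cheap, memory-bound expression scales worse than a compute-bound one.
for expr in ("a + 1", "sin(a)**2 + cos(a)**2"):
    for nthreads in (1, 2):
        ne.set_num_threads(nthreads)
        t = timeit(lambda: ne.evaluate(expr), number=10)
        print("%-22s %d thread(s): %.3f s" % (expr, nthreads, t))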

OTOH, I was curious about the performance of the new iterator with 
Intel's VML, but it seems to work decently too:

$ python bench/vml_timing.py (original numexpr, *no* VML support)
*** Numexpr vs NumPy speed-ups ***
Contiguous case: 1.72 (mean), 0.92 (min), 3.07 (max)
Strided case:    2.1 (mean), 0.98 (min), 3.52 (max)
Unaligned case:  2.35 (mean), 1.35 (min), 3.31 (max)

$ python bench/vml_timing.py  (original numexpr, VML support)
*** Numexpr vs NumPy speed-ups ***
Contiguous case: 3.83 (mean), 1.1 (min), 10.19 (max)
Strided case:    3.21 (mean), 0.98 (min), 7.45 (max)
Unaligned case:  3.6 (mean), 1.47 (min), 7.87 (max)

$ python bench/vml_timing.py (new iter numexpr, VML support)
*** Numexpr vs NumPy speed-ups ***
Contiguous case: 3.56 (mean), 1.12 (min), 7.38 (max)
Strided case:    2.37 (mean), 0.09 (min), 7.63 (max)
Unaligned case:  3.56 (mean), 2.08 (min), 5.88 (max)

However, there are a couple of quirks here: 1) the original Numexpr 
generally performs faster than the iter version, and 2) the strided case 
is quite a bit worse for the iter version.  I've isolated the tests that 
perform worse for the iter version, and here are a couple of samples:

*** Expression: exp(f3)
numpy:             0.0135
numpy strided:     0.0144
numpy unaligned:   0.0200
numexpr:           0.0020  Speed-up of numexpr over numpy: 6.6584
numexpr strided:   0.1495  Speed-up of numexpr over numpy: 0.0962
numexpr unaligned: 0.0049  Speed-up of numexpr over numpy: 4.0859


*** Expression: sin(f3)*cos(f4)
numpy:             0.0291
numpy strided:     0.0366
numpy unaligned:   0.0407
numexpr:           0.0166  Speed-up of numexpr over numpy: 1.7518
numexpr strided:   0.1551  Speed-up of numexpr over numpy: 0.2361
numexpr unaligned: 0.0175  Speed-up of numexpr over numpy: 2.3246
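
For reference, a minimal way to reproduce one strided-case measurement
(the array size is illustrative, not the benchmark's exact setup):

import numpy as np
import numexpr as ne
from timeit import timeit

f3 = np.random.rand(2 * 1000 * 1000)[::2]   # strided view over float64 data
f3c = np.ascontiguousarray(f3)              # contiguous baseline

print("contiguous:", timeit(lambda: ne.evaluate("exp(f3c)"), number=20))
print("strided:   ", timeit(lambda: ne.evaluate("exp(f3)"), number=20))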

Maybe you can shed some light on what's going on here (shall we discuss 
this off-list so as not to bore people too much?).

-- 
Francesc Alted


Re: [Numpy-discussion] Drawing circles in a numpy array

2011-01-11 Thread totonixs...@gmail.com
On Mon, Jan 10, 2011 at 11:53 AM, totonixs...@gmail.com wrote:
 Hi all,

 I have this problem: given a point, draw a circle centered at that
 point with radius r.  I'm doing that using numpy this way (snippet code
 from here [1]):

 # Create the initial black and white image
 import numpy as np
 from scipy import ndimage
 from matplotlib import pyplot as plt
 a = np.zeros((512, 512)).astype('uint8')  # unsigned integer type needed by watershed
 y, x = np.ogrid[0:512, 0:512]
 m1 = ((y-200)**2 + (x-100)**2 < 30**2)
 m2 = ((y-350)**2 + (x-400)**2 < 20**2)
 m3 = ((y-260)**2 + (x-200)**2 < 20**2)
 a[m1+m2+m3] = 1
 plt.imshow(a, cmap=plt.cm.gray)  # left plot in the image above

 The problem is that it has to evaluate all values from 0 to the image
 size (in the snippet, from 0 to 512 in the X and Y dimensions).  Is
 there a faster way of doing that, without evaluating all those values?
 For example: only evaluating from 0 to 30 for a circle centered at
 (0, 0) with radius 30.

 Thanks!
 Thiago Franco de Moraes

 [1] - http://www.scipy.org/Cookbook/Watershed

Hi,

I've just seen I can do something like this:

 radius = 10
 a = np.zeros((512, 512)).astype('uint8')
 cx, cy = 100, 100 # The center of circle
 y, x = np.ogrid[-radius: radius, -radius: radius]
 index = x**2 + y**2 <= radius**2
 a[cy-radius:cy+radius, cx-radius:cx+radius][index] = 255
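
One caveat with the snippet above: when the circle extends past the
array border, the negative slice indices wrap around.  A clipped
variant might look like this (the helper name is just illustrative):

import numpy as np

def draw_circle(a, cy, cx, radius, value=255):
    # Evaluate only the circle's bounding box, clipped at the borders.
    y0, y1 = max(cy - radius, 0), min(cy + radius, a.shape[0])
    x0, x1 = max(cx - radius, 0), min(cx + radius, a.shape[1])
    y, x = np.ogrid[y0 - cy:y1 - cy, x0 - cx:x1 - cx]
    a[y0:y1, x0:x1][x**2 + y**2 <= radius**2] = value

a = np.zeros((512, 512), dtype='uint8')
draw_circle(a, 100, 100, 10)
draw_circle(a, 5, 505, 30)   # partly outside: clipped, no wrap-around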

Numpy is very cool!

Is there another way of doing that? Just curious ...

Thanks!
Thiago Franco de Moraes


Re: [Numpy-discussion] Rolling window (moving average, moving std, and more)

2011-01-11 Thread Keith Goodman
On Tue, Jan 4, 2011 at 8:14 AM, Keith Goodman kwgood...@gmail.com wrote:
 On Tue, Jan 4, 2011 at 8:06 AM, Sebastian Haase seb.ha...@gmail.com wrote:
 On Mon, Jan 3, 2011 at 5:32 PM, Erik Rigtorp e...@rigtorp.com wrote:
 On Mon, Jan 3, 2011 at 11:26, Eric Firing efir...@hawaii.edu wrote:
 Instead of calculating statistics independently each time the window is
 advanced one data point, the statistics are updated.  I have not done
 any benchmarking, but I expect this approach to be quick.

 This might accumulate numerical errors, but could be fine for many
 applications.
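
For concreteness, the update-instead-of-recompute idea (and where that
numerical error can creep in) might look like this moving-mean sketch:

import numpy as np

def move_mean_incremental(a, window):
    """Moving mean via a running sum: each step is O(1), but the
    running sum can accumulate floating-point error over long inputs."""
    out = np.full(len(a), np.nan)
    s = a[:window].sum()
    out[window - 1] = s / window
    for i in range(window, len(a)):
        s += a[i] - a[i - window]   # update instead of recomputing the window
        out[i] = s / window
    return out

a = np.random.rand(1000)
ref = np.convolve(a, np.ones(50) / 50, mode='valid')
assert np.allclose(move_mean_incremental(a, 50)[49:], ref)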

 The code is old; I have not tried to update it to take advantage of
 cython's advances over pyrex.  If I were writing it now, I might not
 bother with the C level at all; it could all be done in cython, probably
 with no speed penalty, and maybe even with reduced overhead.


 No doubt this would be faster, I just wanted to offer a general way to
 do this in NumPy.

 BTW, some of these operations can be done using scipy's ndimage - right?
 Any comments?  How does the performance compare?
 ndimage might have more options regarding edge handling, right?

 Take a look at the moving window functions in the development version
 of the la package:

 https://github.com/kwgoodman/la/blob/master/la/farray/mov.py

 Many of the moving window functions offer three calculation methods:
 filter (ndimage), strides (the strides trick discussed in this
 thread), and loop (a simple python loop).

 For example:

 a = np.random.rand(500,2000)
 timeit la.farray.mov_max(a, window=252, axis=-1, method='filter')
 1 loops, best of 3: 336 ms per loop
 timeit la.farray.mov_max(a, window=252, axis=-1, method='strides')
 1 loops, best of 3: 609 ms per loop
 timeit la.farray.mov_max(a, window=252, axis=-1, method='loop')
 1 loops, best of 3: 638 ms per loop

 No one method is best for all situations. That is one of the reasons I
 started the Bottleneck package. I figured Cython could beat them all.
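
For reference, the "strides trick" mentioned in this thread can be
sketched as follows (a zero-copy view of overlapping windows; not la's
actual implementation):

import numpy as np
from numpy.lib.stride_tricks import as_strided

def rolling_window(a, window):
    # Append a trailing window axis by reusing the last stride: no copy.
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return as_strided(a, shape=shape, strides=strides)

a = np.random.rand(500, 2000)
mov_max = rolling_window(a, 252).max(axis=-1)   # moving max over the view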

I added four new functions to Bottleneck: move_min, move_max,
move_nanmin, move_nanmax.  They are much faster than using SciPy's
ndimage.maximum_filter1d or the strides trick:

 a = np.random.rand(500,2000)
 timeit la.farray.mov_max(a, window=252, axis=-1, method='filter') # ndimage
1 loops, best of 3: 336 ms per loop
 timeit bn.move_max(a, window=252, axis=-1) # bottleneck
100 loops, best of 3: 14.1 ms per loop

That looks too good to be true. Are the outputs the same?

 a1 = la.farray.mov_max(a, window=252, axis=-1, method='filter')
 a2 = bn.move_max(a, window=252, axis=-1)
 np.testing.assert_array_almost_equal(a1, a2)


Yes.


[Numpy-discussion] histogram2d and decreasing bin edges

2011-01-11 Thread Matthias Frank
Hi all,

I've noticed a change in numpy.histogram2d between (possibly very much)
older versions and the current one: the function can no longer handle
bin edges that decrease instead of increasing monotonically.

The reason for this seems to be the handling of outliers in histogramdd;
see the output of the minimal example below.
If I understand correctly, this is the only place where histogramdd
implicitly assumes monotonically increasing bin edges.  If so, it could
be fixed to work with both increasing and decreasing bin edges by taking
abs(dedges[i]).min() when calculating the rounding precision.  If not, it
might be more consistent, and produce a more meaningful error message,
if histogram2d asserted that bin edges increase monotonically and
otherwise raised an AttributeError, as the 1-d histogram() function does
in that case (see below).
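
In code, the proposed one-line change to the rounding-precision
computation would presumably be (a sketch against the expression shown
in the traceback below):

import numpy as np

# Current: fails for decreasing edges, since dedges[i].min() is negative
# and log10 of a negative number is NaN:
#   decimal = int(-log10(dedges[i].min())) + 6
# Proposed: use the absolute spacing so the edge direction is irrelevant:
dedges_i = np.diff(np.array([1.0, 0.0]))               # decreasing edges -> [-1.]
decimal = int(-np.log10(np.abs(dedges_i).min())) + 6   # == 6, no NaN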

Matthias

In [1]: import numpy
In [2]: numpy.__version__
Out[2]: '1.5.1'

In [3]: ascending=numpy.array([0,1])
In [4]: descending=numpy.array([1,0])
In [5]: numpy.histogram2d([0.5],[0.5],bins=(ascending,ascending))
Out[5]: (array([[ 1.]]), array([ 0.,  1.]), array([ 0.,  1.]))

In [6]: numpy.histogram2d([0.5],[0.5],bins=(descending,descending))
Warning: invalid value encountered in log10
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/lib/python2.6/site-packages/numpy/lib/twodim_base.pyc in histogram2d(x,
y, bins, range, normed, weights)
    613         xedges = yedges = asarray(bins, float)
    614         bins = [xedges, yedges]
--> 615     hist, edges = histogramdd([x,y], bins, range, normed, weights)
    616     return hist, edges[0], edges[1]
    617

/lib/python2.6/site-packages/numpy/lib/function_base.pyc in
histogramdd(sample, bins, range, normed, weights)
    312     for i in arange(D):
    313         # Rounding precision
--> 314         decimal = int(-log10(dedges[i].min())) + 6
    315         # Find which points are on the rightmost edge.
    316         on_edge = where(around(sample[:,i], decimal) ==
around(edges[i][-1],

ValueError: cannot convert float NaN to integer


Behavior of the 1-d histogram()

In [8]: numpy.histogram([0.5],bins=ascending)
Out[8]: (array([1]), array([0, 1]))

In [9]: numpy.histogram([0.5],bins=descending)
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/lib/python2.6/site-packages/numpy/lib/function_base.pyc in histogram(a,
bins, range, normed, weights)
    160     if (np.diff(bins) < 0).any():
    161         raise AttributeError(
--> 162             'bins must increase monotonically.')
    163
    164     # Histogram is an integer or a float array depending on the
weights.

AttributeError: bins must increase monotonically.
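
In the meantime, a workaround is to reverse the edges before the call
and the counts afterwards (a sketch):

import numpy as np

descending = np.array([1.0, 0.5, 0.0])
counts, edges = np.histogram([0.1, 0.6, 0.7], bins=descending[::-1])
counts = counts[::-1]   # reorder counts to match the descending edges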
