Still slower and, worse, it uses 2x the memory for the intermediate temporary.

I propose allowing implicit reductions with ufuncs.  Specifically, if out is
provided with shape[axis] = 1, pass it on to the ufunc with a stride
of 0.  That should allow this to work:

x = np.arange(10)
def add_square_diff(x1, x2, x3):
    return x1 + (x2-x3)**2
result = np.zeros(1)
np.frompyfunc(add_square_diff, 3, 1)(result, x, np.mean(x), result)
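A minimal sketch of the semantics intended above, emulated with an explicit Python loop, since as far as I know current NumPy rejects an output operand that would need broadcasting. The helper name emulate_implicit_reduce is hypothetical and only illustrates the accumulation order; np.broadcast_to is used just to show what a stride-0 view of the output looks like.

```python
import numpy as np

def add_square_diff(x1, x2, x3):
    return x1 + (x2 - x3) ** 2

def emulate_implicit_reduce(func, out, x, c):
    # Hypothetical helper: feed out[0] back in as the first input for
    # every element of x, which is what a stride-0 output would imply.
    for xi in x:
        out[0] = func(out[0], xi, c)
    return out

x = np.arange(10)
result = np.zeros(1)

# A stride-0 (read-only) view of the length-1 output, stretched to x's shape.
stride0_view = np.broadcast_to(result, x.shape)

emulate_implicit_reduce(add_square_diff, result, x, np.mean(x))

# Reference value computed the usual way, with an intermediate temporary.
expected = np.sum(np.square(x - np.mean(x)))
```

The loop is of course slow; the point is only that result[0] ends up equal to the two-step np.sum(np.square(...)) value without allocating the temporary.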

Essentially it creates a reduce for a function which isn't binary.  I think
this would be generally useful.  For instance, finding the min and max in
one pass would be nice:

def minmax(x1, x2, x3):
    return min(x1,x3), max(x2,x3)
minn = np.array([np.inf])
maxx = np.array([-np.inf])
np.frompyfunc(minmax, 3, 2)(minn, maxx, x, minn, maxx)
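The same idea with two accumulators can be sketched as a plain loop. This only emulates the proposed behaviour; it is not how NumPy would implement it, and the loop stands in for the ufunc inner loop.

```python
import numpy as np

def minmax(x1, x2, x3):
    # x1: running minimum, x2: running maximum, x3: next element
    return min(x1, x3), max(x2, x3)

x = np.arange(10)
minn = np.array([np.inf])
maxx = np.array([-np.inf])

# Single pass over x, updating both length-1 outputs in place.
for xi in x:
    minn[0], maxx[0] = minmax(minn[0], maxx[0], xi)
```

After the loop, minn[0] and maxx[0] match x.min() and x.max(), with x read only once.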

Note it also allows for arbitrary initial values or identity to be
specified, possibly determined at run time.  I think this would make ufuncs
even more universal.

On Mon, Nov 14, 2016 at 3:38 AM, Jerome Kieffer <> wrote:

> On Fri, 11 Nov 2016 11:25:58 -0500
> Matthew Harrigan <> wrote:
> > I started a ufunc to compute the sum of square differences here
> > <>.
> > It is about 4x faster and uses half the memory compared to
> > np.sum(np.square(x-c)).
> Hi Matt,
> Using *blas* you already win a factor of two (maybe more, depending on your
> blas implementation):
> % python -m timeit -s "import numpy as np;x=np.linspace(0,1,int(1e7))"
> "np.sum(np.square(x-2.))"
> 10 loops, best of 3: 135 msec per loop
> % python -m timeit -s "import numpy as np;x=np.linspace(0,1,int(1e7))"
> "y=x-2.;np.dot(y,y)"
> 10 loops, best of 3: 70.2 msec per loop
> Cheers,
> --
> Jérôme Kieffer
> _______________________________________________
> NumPy-Discussion mailing list