Re: [Numpy-discussion] Bringing order to higher dimensional operations

2013-07-22 Thread Stéfan van der Walt
On Sat, Jul 20, 2013 at 7:44 AM, josef.pktd wrote:
> related: is there any advantage to np.add.reduce?
> I find it more difficult to read than sum() and still see it used sometimes.

I think ``np.add.reduce`` just falls out of the ufunc
implementation--there's no "per ufunc" choice to remove certain parts
of the API, if I recall correctly.
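
For example, every binary ufunc grows a .reduce method automatically
(all of these are today's behaviour):

  import numpy as np
  a = np.arange(12).reshape(3, 4)
  np.add.reduce(a)       # same result as np.sum(a, axis=0)
  np.multiply.reduce(a)  # same result as np.prod(a, axis=0)
  np.maximum.reduce(a)   # same result as np.max(a, axis=0)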

Stéfan


Re: [Numpy-discussion] Bringing order to higher dimensional operations

2013-07-19 Thread josef . pktd
On Fri, Jul 19, 2013 at 12:10 PM, Sebastian Berg wrote:
> On Fri, 2013-07-19 at 16:14 +0100, Nathaniel Smith wrote:
>> On Thu, Jul 18, 2013 at 2:23 PM, Sebastian Berg wrote:
>> > On Thu, 2013-07-18 at 13:52 +0100, Nathaniel Smith wrote:
>> >> Hi all,
>> >>
> <snip>
>>
>> What I mean is: Suppose we wrote a gufunc for 'sum', where the
>> intrinsic operation took a vector and returned a scalar. (E.g. we want
>> to implement one of the specialized algorithms for vector summation,
>> like Kahan summation, which can be more accurate than applying scalar
>> addition repeatedly.)
>>
>> Then we'd have:
>>
>> np.sum(ones((2, 3))).shape == ()
>> np.add.reduce(ones((2, 3))).shape == (3,)
>> gufunc_sum(ones((2, 3))).shape == (2,)
>>
>
> Ah, indeed! So we have a different default behaviour for ufunc.reduce
> than for all other reduce-like functions; I didn't realize that. Changing
> that would be one huge thing...

I thought reduce, accumulate and reduceat (and map in Python) are
functions on iterators, and numpy still iterates over axis=0 for them.

related: is there any advantage to np.add.reduce?
I find it more difficult to read than sum() and still see it used sometimes.

(dot with more than 3 dimensions is weird, and I never found a use for it.)
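
For example:

  import numpy as np
  # with both inputs >2-d, dot takes all pairwise combinations of the
  # trailing matrices instead of broadcasting the leading axes:
  np.dot(np.ones((2, 3, 4)), np.ones((5, 4, 6))).shape  # (2, 3, 5, 6)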

Josef


> As to implementing something like Kahan summation: it is true, I also
> can't see how it fits into the machinery. Maybe it shouldn't even be a
> gufunc; rather we need a way to specialize the reduction, or tack more
> information onto the ufunc itself?
>
> - Sebastian
>
>> These are three names for exactly the same underlying function... but
>> they all have different defaults for how they vectorize.
>>
>> -n


Re: [Numpy-discussion] Bringing order to higher dimensional operations

2013-07-19 Thread Nathaniel Smith
On Thu, Jul 18, 2013 at 2:23 PM, Sebastian Berg wrote:
> On Thu, 2013-07-18 at 13:52 +0100, Nathaniel Smith wrote:
>> Hi all,
>>
> <snip>
>>
>> So:
>>
>> QUESTION 1: does that sound right: that in a perfect world, the
>> current gufunc convention would be the only one, and that's what we
>> should work towards, at least in the cases where that's possible?
>>
>
> Sounds right to me; ufunc/gufunc broadcasting assumes the "inner"
> dimensions are the right-most. Since arrays are normally in C order,
> this also seems the sensible choice if you consider the memory layout.
>
>> QUESTION 2: Assuming that's right, it would be *really nice* if we
>> could at least get np.dot onto our new convention, for consistency
>> with the rest of np.linalg, and to allow it to be overridden. I'm sort
>> of terrified to touch np.dot's API, but the only cases where it would
>> act differently is when *both* arguments have *3 or more dimensions*,
>> and I guess there are very very few users who fall into that category.
>> So maybe we could start raising some big FutureWarnings for this case
>> in the next release, and eventually switch?
>>
>
> It is noble to try to get dot to use the gufunc convention, but if you
> look at the new gufunc linalg functions, they already need some weird
> tricks in the case of np.linalg.solve.
> It is so difficult because dot is basically a combination of many
> functions:
>   o vector * vector -> scalar
>   o vector * matrix -> vector (a 1 is prepended to the vector's shape)
>   o matrix * vector -> vector (a 1 is appended to the vector's shape)
>   o matrix * matrix -> matrix
> plus scalar cases.

Oh ugh, I forgot about all those special cases.

> I somewhat believe we should not touch dot, or should deprecate
> everything but the most basic dot functionality. Then we can point to
> matrix_multiply, inner1d, etc., which are gufuncs (even if they are not
> exposed at this time). The dance that is already done for
> np.linalg.solve is not pretty, and it would be worse for dot. Because
> dot is basically overloaded, marrying it with the broadcasting machinery
> in a general way is impossible.

While it would be kind of nice if we could eventually make
isinstance(np.dot, np.ufunc) be True, I'm not so bothered if it
remains a wrapper around gufuncs (like the np.linalg wrappers
currently are). Most of these special cases, while ugly, can be
handled perfectly well by this sort of mechanism. What I'm most
bothered about is the pseudo-outer case:
  np.dot(array with ndim >2, array with ndim >2)
This simply *can't* be emulated with a gufunc. And as long as that's
true it's going to be very hard to get 'dot' to play along with the
general ufunc machinery.

So that's specifically the case I was talking about in question 2.
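
Concretely, with today's np.dot:

  import numpy as np
  np.dot(np.ones((2, 3, 4, 5)), np.ones((6, 7, 5, 8))).shape
  # -> (2, 3, 4, 6, 7, 8); a (m,k),(k,n)->(m,n) gufunc would instead try
  # to broadcast the loop dims (2, 3) against (6, 7) and fail, since
  # those shapes are incompatible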

>> (I'm even more terrified of trying to mess with np.sum or
>> np.add.reduce, so I'll leave that alone for now -- maybe we're just
>> stuck with them. And at least they do already go through the ufunc
>> machinery.)
>
> I did not understand where the inconsistency/problem for the reductions
> is.

What I mean is: Suppose we wrote a gufunc for 'sum', where the
intrinsic operation took a vector and returned a scalar. (E.g. we want
to implement one of the specialized algorithms for vector summation,
like Kahan summation, which can be more accurate than applying scalar
addition repeatedly.)

Then we'd have:

np.sum(ones((2, 3))).shape == ()
np.add.reduce(ones((2, 3))).shape == (3,)
gufunc_sum(ones((2, 3))).shape == (2,)

These are three names for exactly the same underlying function... but
they all have different defaults for how they vectorize.

-n


Re: [Numpy-discussion] Bringing order to higher dimensional operations

2013-07-19 Thread Sebastian Berg
On Fri, 2013-07-19 at 16:14 +0100, Nathaniel Smith wrote:
> On Thu, Jul 18, 2013 at 2:23 PM, Sebastian Berg wrote:
> > On Thu, 2013-07-18 at 13:52 +0100, Nathaniel Smith wrote:
> >> Hi all,
> >>
<snip>
> 
> What I mean is: Suppose we wrote a gufunc for 'sum', where the
> intrinsic operation took a vector and returned a scalar. (E.g. we want
> to implement one of the specialized algorithms for vector summation,
> like Kahan summation, which can be more accurate than applying scalar
> addition repeatedly.)
> 
> Then we'd have:
> 
> np.sum(ones((2, 3))).shape == ()
> np.add.reduce(ones((2, 3))).shape == (3,)
> gufunc_sum(ones((2, 3))).shape == (2,)
> 

Ah, indeed! So we have a different default behaviour for ufunc.reduce
than for all other reduce-like functions; I didn't realize that. Changing
that would be one huge thing...
As to implementing something like Kahan summation: it is true, I also
can't see how it fits into the machinery. Maybe it shouldn't even be a
gufunc; rather we need a way to specialize the reduction, or tack more
information onto the ufunc itself?
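
For reference, the intrinsic 1->0 operation being discussed, as a
plain-Python sketch (not how a real gufunc inner loop would be written,
but it shows the algorithm):

  def kahan_sum(x):
      # compensated summation: track the low-order bits lost at each step
      s = 0.0
      c = 0.0
      for v in x:
          y = v - c        # subtract the previously lost low-order bits
          t = s + y
          c = (t - s) - y  # recover what this addition just lost
          s = t
      return s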

- Sebastian

> These are three names for exactly the same underlying function... but
> they all have different defaults for how they vectorize.
> 
> -n


Re: [Numpy-discussion] Bringing order to higher dimensional operations

2013-07-19 Thread Sebastian Berg
On Fri, 2013-07-19 at 16:31 +0100, Nathaniel Smith wrote:
> On Thu, Jul 18, 2013 at 2:23 PM, Sebastian Berg wrote:
> > It is so difficult because dot is basically a combination of many
> > functions:
> >   o vector * vector -> scalar
> >   o vector * matrix -> vector (a 1 is prepended to the vector's shape)
> >   o matrix * vector -> vector (a 1 is appended to the vector's shape)
> >   o matrix * matrix -> matrix
> > plus scalar cases.
> 
> Though, just throwing this out there for the archives since I was
> thinking about it...
> 
> I think we *could* consolidate all dot's functionality into a single
> gufunc, with a few small changes:
> 
> 1) Deprecate and get rid of the scalar special cases. (For those
> following along: right now, np.dot(10, array) does scalar
> multiplication, but this doesn't make much sense conceptually, it's
> not documented, and I don't think anyone uses it. Except maybe
> np.matrix.__mul__, but that could be fixed.)
> 
> 2) Deprecate the strange "broadcasting" behaviour for high-dimensional
> inputs, in favor of the gufunc version suggested in the previous
> email.
> 
> That leaves the vector * vector, vector * matrix, matrix * vector,
> matrix * matrix cases. To handle these:
> 
> 3) Extend the gufunc machinery to understand the idea that some core
> dimensions are allowed to take on a special "nonexistent" size. So the
> signature for dot would be:
>   (m*,k) x (k, n*) -> (m*, n*)
> where '*' denotes dimensions which are allowed to take on the
> "nonexistent" size if necessary. So dot(ones((2, 3)), ones((3, 4)))
> would have
>   m = 2
>   k = 3
>   n = 4
> and produce an output with shape (m, n) = (2, 4). But dot(ones((2,
> 3)), ones((3,))) would have
>   m = 2
>   k = 3
>   n = <nonexistent>
> and produce an output with shape (m, n) = (2, <nonexistent>) = (2,). And
> dot(ones((3,)), ones((3,))) would have
>   m = <nonexistent>
>   k = 3
>   n = <nonexistent>
> and produce an output with shape (m, n) = (<nonexistent>, <nonexistent>) = (),
> i.e., dot(vector, vector) would return a scalar.
> 
> I'm not sure if there are any other cases where this would be useful,
> but even if it were just for 'dot', that's still a pretty important
> case that might justify the mechanism all on its own.
> 

Yeah, this would work. It is basically what np.linalg.solve currently
does in the preparation step, so maybe it is not that bad to implement
in the machinery; the logic itself is pretty simple after all. Though it
would be one of those features I would probably not want to see used a
lot ;).
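
For reference, the preparation step amounts to a dispatch along these
lines (the gufunc names here are made up for illustration):

  def solve_like(a, b):
      # if b has exactly one dimension less than a, treat it as a
      # (stack of) vector(s) and use the vector-signature gufunc
      if b.ndim == a.ndim - 1:
          return gu_solve_vector(a, b)  # hypothetical, signature (m,m),(m)->(m)
      return gu_solve_matrix(a, b)      # hypothetical, signature (m,m),(m,n)->(m,n)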

- Sebastian

> -n


Re: [Numpy-discussion] Bringing order to higher dimensional operations

2013-07-19 Thread Stéfan van der Walt
On Fri, Jul 19, 2013 at 5:31 PM, Nathaniel Smith wrote:
> 3) Extend the gufunc machinery to understand the idea that some core
> dimensions are allowed to take on a special "nonexistent" size. So the
> signature for dot would be:
>   (m*,k) x (k, n*) -> (m*, n*)
> where '*' denotes dimensions which are allowed to take on the
> "nonexistent" size if necessary. So dot(ones((2, 3)), ones((3, 4)))
> would have
>   m = 2
>   k = 3
>   n = 4
> and produce an output with shape (m, n) = (2, 4). But dot(ones((2,
> 3)), ones((3,))) would have
>   m = 2
>   k = 3
>   n = <nonexistent>
> and produce an output with shape (m, n) = (2, <nonexistent>) = (2,). And
> dot(ones((3,)), ones((3,))) would have
>   m = <nonexistent>
>   k = 3
>   n = <nonexistent>
> and produce an output with shape (m, n) = (<nonexistent>, <nonexistent>) = (),
> i.e., dot(vector, vector) would return a scalar.

This looks like a fairly clean solution; it could be implemented in a
shape pre- and post-processing step, where we pad the array dimensions
to match the full signature, and remove the padding again afterwards.
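
Something along these lines, as a rough sketch (gu_matmul stands in for
any (m,k),(k,n)->(m,n) gufunc, e.g. np.linalg._umath_linalg.matrix_multiply
on current master):

  import numpy as np

  def dot_via_gufunc(a, b, gu_matmul):
      # pad 1-d inputs up to the full (m,k) x (k,n) signature
      a2 = a[np.newaxis, :] if a.ndim == 1 else a  # missing m* -> length 1
      b2 = b[:, np.newaxis] if b.ndim == 1 else b  # missing n* -> length 1
      out = gu_matmul(a2, b2)
      # remove the padding again afterwards
      if a.ndim == 1:
          out = out[..., 0, :]
      if b.ndim == 1:
          out = out[..., 0]
      return out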

Stéfan


Re: [Numpy-discussion] Bringing order to higher dimensional operations

2013-07-19 Thread Nathaniel Smith
On Thu, Jul 18, 2013 at 2:23 PM, Sebastian Berg wrote:
> It is so difficult because dot is basically a combination of many
> functions:
>   o vector * vector -> scalar
>   o vector * matrix -> vector (a 1 is prepended to the vector's shape)
>   o matrix * vector -> vector (a 1 is appended to the vector's shape)
>   o matrix * matrix -> matrix
> plus scalar cases.

Though, just throwing this out there for the archives since I was
thinking about it...

I think we *could* consolidate all dot's functionality into a single
gufunc, with a few small changes:

1) Deprecate and get rid of the scalar special cases. (For those
following along: right now, np.dot(10, array) does scalar
multiplication, but this doesn't make much sense conceptually, it's
not documented, and I don't think anyone uses it. Except maybe
np.matrix.__mul__, but that could be fixed.)

2) Deprecate the strange "broadcasting" behaviour for high-dimensional
inputs, in favor of the gufunc version suggested in the previous
email.

That leaves the vector * vector, vector * matrix, matrix * vector,
matrix * matrix cases. To handle these:

3) Extend the gufunc machinery to understand the idea that some core
dimensions are allowed to take on a special "nonexistent" size. So the
signature for dot would be:
  (m*,k) x (k, n*) -> (m*, n*)
where '*' denotes dimensions which are allowed to take on the
"nonexistent" size if necessary. So dot(ones((2, 3)), ones((3, 4)))
would have
  m = 2
  k = 3
  n = 4
and produce an output with shape (m, n) = (2, 4). But dot(ones((2,
3)), ones((3,))) would have
  m = 2
  k = 3
  n = <nonexistent>
and produce an output with shape (m, n) = (2, <nonexistent>) = (2,). And
dot(ones((3,)), ones((3,))) would have
  m = <nonexistent>
  k = 3
  n = <nonexistent>
and produce an output with shape (m, n) = (<nonexistent>, <nonexistent>) = (),
i.e., dot(vector, vector) would return a scalar.

I'm not sure if there are any other cases where this would be useful,
but even if it were just for 'dot', that's still a pretty important
case that might justify the mechanism all on its own.

-n


Re: [Numpy-discussion] Bringing order to higher dimensional operations

2013-07-19 Thread Stéfan van der Walt
On Thu, Jul 18, 2013 at 2:52 PM, Nathaniel Smith wrote:
> Compare:
>   gu_dot_leftwards(ones((10, 11, 4)), ones((11, 12, 3, 4))) -> (10, 12, 3, 4)
> versus
>   gu_dot_rightwards(ones((4, 10, 11)), ones((3, 4, 11, 12))) -> (3, 4, 10, 12)

The second makes quite a bit more sense to me, and fits with the
current way we match broadcasting dimensions (align to the right,
match right to left).

The np.dot outer example you gave and other exceptions like that will
probably give us headaches in the future, so I'd opt for moving away
from them. The way ellipses are broadcast, well, that's a battle for
another day.
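
And the loop dimensions in the second spelling, (4,) and (3, 4), line
up under exactly the rules we already have:

  import numpy as np
  np.broadcast(np.ones((4,)), np.ones((3, 4))).shape  # (3, 4): aligned right
  np.add(np.ones((4,)), np.ones((3, 4))).shape        # (3, 4): same matching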

Stéfan


Re: [Numpy-discussion] Bringing order to higher dimensional operations

2013-07-18 Thread Sebastian Berg
On Thu, 2013-07-18 at 13:52 +0100, Nathaniel Smith wrote:
> Hi all,
> 
<snip>
> 
> So:
> 
> QUESTION 1: does that sound right: that in a perfect world, the
> current gufunc convention would be the only one, and that's what we
> should work towards, at least in the cases where that's possible?
> 

Sounds right to me; ufunc/gufunc broadcasting assumes the "inner"
dimensions are the right-most. Since arrays are normally in C order,
this also seems the sensible choice if you consider the memory layout.

> QUESTION 2: Assuming that's right, it would be *really nice* if we
> could at least get np.dot onto our new convention, for consistency
> with the rest of np.linalg, and to allow it to be overridden. I'm sort
> of terrified to touch np.dot's API, but the only cases where it would
> act differently is when *both* arguments have *3 or more dimensions*,
> and I guess there are very very few users who fall into that category.
> So maybe we could start raising some big FutureWarnings for this case
> in the next release, and eventually switch?
> 

It is noble to try to get dot to use the gufunc convention, but if you
look at the new gufunc linalg functions, they already need some weird
tricks in the case of np.linalg.solve.
It is so difficult because dot is basically a combination of many
functions:
  o vector * vector -> scalar
  o vector * matrix -> vector (a 1 is prepended to the vector's shape)
  o matrix * vector -> vector (a 1 is appended to the vector's shape)
  o matrix * matrix -> matrix
plus scalar cases.
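
For reference, all of these are today's behaviour:

  import numpy as np
  np.dot(np.ones(3), np.ones(3)).shape            # ()      vector * vector
  np.dot(np.ones(3), np.ones((3, 4))).shape       # (4,)    vector * matrix
  np.dot(np.ones((2, 3)), np.ones(3)).shape       # (2,)    matrix * vector
  np.dot(np.ones((2, 3)), np.ones((3, 4))).shape  # (2, 4)  matrix * matrix
  np.dot(2.0, np.ones(3))                         # scalar case: array([ 2., 2., 2.])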

I somewhat believe we should not touch dot, or should deprecate
everything but the most basic dot functionality. Then we can point to
matrix_multiply, inner1d, etc., which are gufuncs (even if they are not
exposed at this time). The dance that is already done for
np.linalg.solve is not pretty, and it would be worse for dot. Because
dot is basically overloaded, marrying it with the broadcasting machinery
in a general way is impossible.

> (I'm even more terrified of trying to mess with np.sum or
> np.add.reduce, so I'll leave that alone for now -- maybe we're just
> stuck with them. And at least they do already go through the ufunc
> machinery.)

I did not understand where the inconsistency/problem for the reductions
is.

- Sebastian

> 
> -n


[Numpy-discussion] Bringing order to higher dimensional operations

2013-07-18 Thread Nathaniel Smith
Hi all,

I hadn't realized until Pauli just pointed it out that np.dot and the
new gufuncs actually have different rules for how they handle extra
axes:
  https://github.com/numpy/numpy/pull/3524#issuecomment-21117601
This is turning into a big mess, and it may take some hard decisions to fix it.

Before I explain the problem, a bit of terminology: a "vectorized
operation" is built by taking an "intrinsic operation" and putting a
loop around it, to apply it many times. So for np.add for example, the
intrinsic operation is scalar addition, and if you put a loop around
this you get vectorized addition.

The question then is: given some input arrays, which parts do you loop
over, and which parts do you pass to the intrinsic operation?

In the case of scalar operations (like classic ufuncs), this is pretty
straightforward: we broadcast the input arrays together, and loop over
them in parallel. So
  np.add(ones((2, 3, 4)), ones((3, 4))).shape == (2, 3, 4)
The other obvious option is, instead of looping over the two arrays in
parallel, to find all combinations. This is what the .outer method on
ufuncs does, so
  np.add.outer(ones((2, 3)), ones((4, 5))).shape == (2, 3, 4, 5)

Now, that's for vectorized versions of scalar operations. We also have a
bunch of vectorized operations whose "intrinsic operation" is itself a
function over multidimensional arrays. For example, reduction
operations like 'sum' intrinsically take a 1-dim array and return a
0-dim (scalar), but they can be vectorized to apply to a single axis
of a >1-dim array. Or matrix multiplication intrinsically takes two
2-dim arrays and returns a 2-dim array; it can be vectorized to apply
to >2 dim inputs.

(As shorthand I'll write "takes two 2-dim arrays and returns a 2-dim
array" as "2,2->2"; so 'sum' is 1->0 and 'cumsum' is 1->1.)

-

Okay, now I can explain the problem. For vectorized multidimensional
operations, we have four (!) different conventions deployed:

Convention 1: Ufunc .reduce (1->0), .accumulate (1->1), .reduceat
(1,1->1): These pick the 0th axis for the intrinsic axis and loop over
the rest. (By default; can be modified by specifying axis.)
  np.add.reduce(np.ones((2, 3, 4))).shape == (3, 4)
  np.add.accumulate(np.ones((2, 3, 4))).shape == (2, 3, 4)

Convention 2: Reduction (1->0) and accumulation (1->1) operations
defined as top-level functions and ndarray methods (np.sum,
ndarray.sum, np.mean, np.cumprod, etc.): These flatten the array and
use the whole thing as the intrinsic axis. (By default; can be
modified by specifying axis=.)
  np.sum(np.ones((2, 3, 4))).shape == ()
  np.cumsum(np.ones((2, 3, 4))).shape == (24,)

Convention 3: gufuncs (any->any): These take the *last* k axes for the
intrinsic axes, and then broadcast and parallel-loop over the rest.
Cannot currently be modified.
  gu_dot = np.linalg._umath_linalg.matrix_multiply # requires current master
  gu_dot(np.ones((2, 3, 4, 5)), np.ones((1, 3, 5, 6))).shape == (2, 3, 4, 6)

(So what's happened here is that the gufunc pulled off the last two
axes of each array, so the intrinsic operation is always going to be a
matrix multiply of a (4, 5) array by a (5, 6) array, producing a (4,
6) array. Then it broadcast the remaining axes together: (2, 3) and
(1, 3) broadcast to (2, 3), and did a parallel iteration over them:
output[0, 0, :, :] is the result of dot(input1[0, 0, :, :], input2[0,
0, :, :]).)

Convention 4: np.dot (2->2): this one is bizarre:
  np.dot(np.ones((1, 2, 10, 11)),
         np.ones((101, 102, 11, 12))).shape == (1, 2, 10, 101, 102, 12)

So what it's done is picked the last two axes to be the intrinsic
axes, just like the gufunc -- so it always does a bunch of matrix
multiplies of a (10, 11) array with an (11, 12) array. But then it
didn't do a ufunc-style parallel loop. Instead it did a
ufunc.outer-style outer loop, in which it found all n^2 ways of
matching up a matrix in the first input with a matrix in the second
input, and took the dot of each. And then it packed these up into an
array with a rather weird shape: first all the looping axes from the
first input, then the first axis of the output matrix, then all the
looping axes from the second input, and then finally the second axis
of the output matrix.
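
You can check the correspondence directly (today's behaviour, with
smaller dims):

  import numpy as np
  a = np.random.rand(2, 3, 10, 11)
  b = np.random.rand(4, 5, 11, 12)
  out = np.dot(a, b)  # shape (2, 3, 10, 4, 5, 12)
  # each output matrix is the dot of one matrix from a with one from b:
  np.allclose(out[1, 2, :, 3, 4, :], np.dot(a[1, 2], b[3, 4]))  # True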

-

There are plenty of general reasons to want to simplify this -- it'd
make numpy easier to explain and understand, simplify the code, etc.
-- but also a few more specific reasons that make it urgent:

- gufuncs haven't really been used much yet, so maybe it'd be easy to
change how they work now. But in the next release, everything in
np.linalg will become a gufunc, so it'll become much harder to change
after that.

- we'd really like np.dot to become a gufunc -- in particular, in
combination with Blake's work on ufunc overrides, this would allow
np.dot() to work on scipy.sparse matrices.

- pretty soon we'll want to turn things like 'mean' into gufuncs too,
for the same reason.

-

Okay, what to do?

The gufunc convention actually seems like the right one to aim for, at
least in the cases where that's possible.