Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-05 Thread Sebastian Berg
On So, 2015-04-05 at 14:13 +0200, Sebastian Berg wrote:
 On So, 2015-04-05 at 00:45 -0700, Jaime Fernández del Río wrote:
  On Fri, Apr 3, 2015 at 10:59 AM, Jaime Fernández del Río 
 <snip>
  
  
  A PR it is, #5749 to be precise. I think it has all the bells and
  whistles: integers, boolean and integer 1-D arrays, slices, ellipsis,
  and even newaxis, both for getting and setting. No tests yet, so
  correctness of the implementation is dubious at best. As a small
  example:
  
 
 Looks neat, I am sure there will be some details. Just a quick thought,
 I wonder if it might make sense to even introduce a context manager. Not
 sure how easy it is to make sure that it is thread safe, etc?

I am also wondering, because while I think that actually changing numpy itself
is probably impossible, I do think we can talk about something like:

np.enable_outer_indexing()

or along the lines of:

from numpy.future import outer_indexing

or some such, to do a module-wide switch, and maybe at some point also make it
easier to write code that is compatible with a possible follow-up such as blaze
(or also pandas, I guess) that uses incompatible indexing.
I have no clue whether this is technically feasible, though.

The python equivalent would be teaching someone to use:

from __future__ import division

even though you don't even tell them that python 3 exists ;), just
because you like the behaviour more.
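(Purely as a sketch of what such a module-wide switch could look like, with
completely made-up names that do not exist in numpy, and using a thread-local
flag so the thread-safety question at least has an answer:)

import threading

_indexing_mode = threading.local()

def enable_outer_indexing():
    # hypothetical opt-in: make this thread's default indexing outer/orthogonal
    _indexing_mode.outer = True

def outer_indexing_enabled():
    # ndarray.__getitem__ would have to consult this before interpreting an index
    return getattr(_indexing_mode, "outer", False)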


 
<snip>
  >>> a = np.arange(60).reshape(3, 4, 5)
  >>> a.ix_
 <snip>
  
  Jaime
  
  
  -- 
  (\__/)
  ( O.o)
  ( > <) This is Bunny. Copy Bunny into your signature and help him with his
  plans for world domination.


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-05 Thread Sebastian Berg
On So, 2015-04-05 at 00:45 -0700, Jaime Fernández del Río wrote:
 On Fri, Apr 3, 2015 at 10:59 AM, Jaime Fernández del Río 
<snip>
 
 
 A PR it is, #5749 to be precise. I think it has all the bells and
 whistles: integers, boolean and integer 1-D arrays, slices, ellipsis,
 and even newaxis, both for getting and setting. No tests yet, so
 correctness of the implementation is dubious at best. As a small
 example:
 

Looks neat, I am sure there will be some details. Just a quick thought,
I wonder if it might make sense to even introduce a context manager. Not
sure how easy it is to make sure that it is thread safe, etc?
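(For concreteness, a minimal sketch of such a context manager, using a
thread-local flag to sidestep the thread-safety worry; the names are made up
and nothing in numpy reads such a flag today:)

import threading
from contextlib import contextmanager

_local = threading.local()

@contextmanager
def outer_indexing():
    previous = getattr(_local, "outer", False)
    _local.outer = True           # __getitem__ would check this flag
    try:
        yield
    finally:
        _local.outer = previous   # restored even if the body raises

Code inside "with outer_indexing(): ..." would then get the alternative
behaviour only for that block and only for the current thread.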

If the code is not too difficult, maybe it can even be moved to C. Though I
have to think about it: I think we currently parse from the first index to
the last; maybe it would be plausible to parse from last to first, so that
adding dimensions could be done easily inside the preparation function. The
second step, the axis remapping, is probably reasonably easy (if, like the
first, tedious).

- Sebastian


PS: One side comment about the discussion. I don't think anyone suggests
that we should not, or do not, even consider proposals as such, even if it
might look like that. Not that I can compare, but my guess is that numpy is
actually very open (though I have no idea whether it appears that way, too).

But also, to me it does seem like a lost cause to try to actually change
indexing itself. So maybe that does not sound diplomatic, but without
specific reasoning about how the change would not wreak havoc, talking
about switching indexing behaviour seems a waste of time to me. Please try
to surprise me, but until then...


 
  >>> a = np.arange(60).reshape(3, 4, 5)
  >>> a.ix_
<snip>
 
 Jaime
 
 
 -- 
 (\__/)
 ( O.o)
  ( > <) This is Bunny. Copy Bunny into your signature and help him with his
  plans for world domination.


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-05 Thread Jaime Fernández del Río
On Fri, Apr 3, 2015 at 10:59 AM, Jaime Fernández del Río 
jaime.f...@gmail.com wrote:

 I have an all-Python implementation of an OrthogonalIndexer class, loosely
 based on Stephan's code plus some axis remapping, that provides all the
 needed functionality for getting and setting with orthogonal indices.

 Would those interested rather see it as a gist to play around with, or as
  a PR adding an orthogonally indexable `.ix_` attribute to ndarray?


A PR it is, #5749 (https://github.com/numpy/numpy/pull/5749) to be precise.
I think it has all the bells and whistles: integers, boolean and integer
1-D arrays, slices, ellipsis, and even newaxis, both for getting and
setting. No tests yet, so correctness of the implementation is dubious at
best. As a small example:

>>> a = np.arange(60).reshape(3, 4, 5)
>>> a.ix_
<numpy.core._indexer.OrthogonalIndexer at 0x1027979d0>
>>> a.ix_[[0, 1], :, [True, False, True, False, True]]
array([[[ 0,  2,  4],
        [ 5,  7,  9],
        [10, 12, 14],
        [15, 17, 19]],

       [[20, 22, 24],
        [25, 27, 29],
        [30, 32, 34],
        [35, 37, 39]]])
>>> a.ix_[[0, 1], :, [True, False, True, False, True]] = 0
>>> a
array([[[ 0,  1,  0,  3,  0],
        [ 0,  6,  0,  8,  0],
        [ 0, 11,  0, 13,  0],
        [ 0, 16,  0, 18,  0]],

       [[ 0, 21,  0, 23,  0],
        [ 0, 26,  0, 28,  0],
        [ 0, 31,  0, 33,  0],
        [ 0, 36,  0, 38,  0]],

       [[40, 41, 42, 43, 44],
        [45, 46, 47, 48, 49],
        [50, 51, 52, 53, 54],
        [55, 56, 57, 58, 59]]])

Jaime

-- 
(\__/)
( O.o)
( > <) This is Bunny. Copy Bunny into your signature and help him with his
plans for world domination.


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-05 Thread Robert Kern
On Sat, Apr 4, 2015 at 10:38 PM, Nathaniel Smith n...@pobox.com wrote:

 On Apr 4, 2015 4:12 AM, Todd toddr...@gmail.com wrote:
 
 
  On Apr 4, 2015 10:54 AM, Nathaniel Smith n...@pobox.com wrote:
  
   Core python broke backcompat on a regular basis throughout the python
   2 series, and almost certainly will again -- the bar to doing so is
   *very* high, and they use elaborate mechanisms to ease the way
   (__future__, etc.), but they do it. A few months ago there was even
   some serious consideration given to changing py3 bytestring indexing
   to return bytestrings instead of integers. (Consensus was
   unsurprisingly that this was a bad idea, but there were core devs
   seriously exploring it, and no-one complained about the optics.)
 
  There was no break as large as this. In fact I would say this is even a
larger change than any individual change we saw in the python 2 to 3
switch.  The basic mechanics of indexing are just too fundamental and touch
on too many things to make this sort of change feasible.

 I'm afraid I'm not clever enough to know how large or feasible a change
is without even seeing the proposed change.

It doesn't take any cleverness. The change in question was to make the
default indexing semantics orthogonal indexing. No matter the details of
the ultimate proposal to achieve that end, it has known minimum
consequences, at least in the broad outline. Current documentation and
books become obsolete for a fundamental operation. Current code must be
modified by some step to continue working. These are consequences inherent
in the end, not just the means to the end; we don't need a concrete
proposal in front of us to know what they are. There are ways to mitigate
these consequences, but there are no silver bullets that eliminate them.
And we can compare those consequences to approaches like Jaime's that
achieve a majority of the benefits of such a change without any of the
negative consequences. That comparison does not bode well for any proposal.

--
Robert Kern


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-04 Thread Robert Kern
On Sat, Apr 4, 2015 at 9:54 AM, Nathaniel Smith n...@pobox.com wrote:

 On Sat, Apr 4, 2015 at 12:17 AM, Ralf Gommers ralf.gomm...@gmail.com
wrote:
 
  On Sat, Apr 4, 2015 at 1:54 AM, Nathaniel Smith n...@pobox.com wrote:

  So I'd be very happy to see worked out proposals for any or
  all of these approaches. It strikes me as really premature to be
  issuing proclamations about what changes might be considered. There is
  really no danger to *considering* a proposal;
 
  Sorry, I have to disagree. Numpy is already seen by some as having a
poor
  track record on backwards compatibility. Having core developers say
propose
  some backcompat break to how indexing works and we'll consider it
makes our
  stance on that look even worse. Of course everyone is free to make any
  technical proposal they deem fit and we'll consider the merits of it.
  However I'd like us to be clear that we do care strongly about backwards
  compatibility and that the fundamentals of the core of Numpy (things
like
  indexing, broadcasting, dtypes and ufuncs) will not be changed in
  backwards-incompatible ways.
 
  Ralf
 
  P.S. also not for a possible numpy 2.0 (or have we learned nothing from
  Python3?).

 I agree 100% that we should and do care strongly about backwards
 compatibility. But you're saying in one sentence that we should tell
 people that we won't consider backcompat breaks, and then in the next
 sentence that of course we actually will consider them (even if we
 almost always reject them). Basically, I think saying one thing and
 doing another is not a good way to build people's trust.

There is a difference between politely considering what proposals people
send us uninvited and inviting people to work on specific proposals. That
is what Ralf was getting at.

--
Robert Kern


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-04 Thread Todd
On Apr 4, 2015 10:54 AM, Nathaniel Smith n...@pobox.com wrote:

 On Sat, Apr 4, 2015 at 12:17 AM, Ralf Gommers ralf.gomm...@gmail.com
wrote:
 
 
  On Sat, Apr 4, 2015 at 1:54 AM, Nathaniel Smith n...@pobox.com wrote:
 
 
  But, the real problem here is that we have two different array duck
  types that force everyone to write their code twice. This is a
  terrible state of affairs! (And exactly analogous to the problems
   caused by np.ndarray disagreeing with np.matrix & scipy.sparse about
   the proper definition of *, which PEP 465 may eventually
  alleviate.) IMO we should be solving this indexing problem directly,
  not applying bandaids to its symptoms, and the way to do that is to
  come up with some common duck type that everyone can agree on.
 
  Unfortunately, AFAICT this means our only options here are to have
  some kind of backcompat break in numpy, some kind of backcompat break
  in pandas, or to do nothing and continue indefinitely with the status
  quo where the same indexing operation might silently return different
  results depending on the types passed in. All of these options have
  real costs for users, and it isn't at all clear to me what the
  relative costs will be when we dig into the details of our various
  options.
 
 
  I doubt that there is a reasonable way to quantify those costs,
especially
  those of breaking backwards compatibility. If someone has a good
method, I'd
  be interested though.

 I'm a little nervous about how easily this argument might turn into
 either A or B is better but we can't be 100% *certain* which it is so
 instead of doing our best using the data available we should just
 choose B. Being a maintainer means accepting uncertainty and doing
 our best anyway.

I think the burden of proof needs to be on the side proposing a change, and
the more invasive the change the higher that burden needs to be.

When faced with a situation like this, where the proposed change will cause
fundamental alterations to the most basic, high-level operation of numpy,
and where there is an alternative approach with no backwards-compatibility
issues, I think the burden of proof would necessarily be nearly impossibly
large.

 But that said I'm still totally on board with erring on the side of
 caution (in particular, you can never go back and *un*break
 backcompat). An obvious challenge to anyone trying to take this
 forward (in any direction!) would definitely be to gather the most
 useful data possible. And it's not obviously impossible -- maybe one
 could do something useful by scanning ASTs of lots of packages (I have
 a copy of pypi if anyone wants it, that I downloaded with the idea of
 making some similar arguments for why core python should slightly
  break backcompat to allow overloading of a < b < c syntax), or adding
 instrumentation to numpy, or running small-scale usability tests, or
 surveying people, or ...

 (I was pretty surprised by some of the data gathered during the PEP
 465 process, e.g. on how common dot() calls are relative to existing
 built-in operators, and on its associativity in practice.)

Surveys like this have the problem of small sample size and selection bias.
Usability studies can't measure the effect of the compatibility break, not
to mention the effect on numpy's reputation. This is considerably more
difficult to scan existing projects for than .dot, because it depends on the
type being passed (which may not even be defined in the same project). And
I am not sure I much like the idea of numpy phoning home by default, and
an opt-in has the same issues as a survey.

So to make a long story short, in this sort of situation I have a hard time
imagining ways to get enough reliable, representative data to justify this
level of backwards compatibility break.

 Core python broke backcompat on a regular basis throughout the python
 2 series, and almost certainly will again -- the bar to doing so is
 *very* high, and they use elaborate mechanisms to ease the way
 (__future__, etc.), but they do it. A few months ago there was even
 some serious consideration given to changing py3 bytestring indexing
 to return bytestrings instead of integers. (Consensus was
 unsurprisingly that this was a bad idea, but there were core devs
 seriously exploring it, and no-one complained about the optics.)

There was no break as large as this. In fact I would say this is even a
larger change than any individual change we saw in the python 2 to 3
switch.  The basic mechanics of indexing are just too fundamental and touch
on too many things to make this sort of change feasible. It would be better
to have a new language, or in this case a new project.

 It's true that numpy has something of a bad reputation in this area,
 and I think it's because until ~1.7 or so, we randomly broke stuff by
 accident on a pretty regular basis, even in bug fix releases. I
 think the way to rebuild that trust is to honestly say to our users
 that when we do break backcompat, we will never do it by 

Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-04 Thread Ralf Gommers
On Sat, Apr 4, 2015 at 1:54 AM, Nathaniel Smith n...@pobox.com wrote:


 But, the real problem here is that we have two different array duck
 types that force everyone to write their code twice. This is a
 terrible state of affairs! (And exactly analogous to the problems
 caused by np.ndarray disagreeing with np.matrix & scipy.sparse about
 the proper definition of *, which PEP 465 may eventually
 alleviate.) IMO we should be solving this indexing problem directly,
 not applying bandaids to its symptoms, and the way to do that is to
 come up with some common duck type that everyone can agree on.

 Unfortunately, AFAICT this means our only options here are to have
 some kind of backcompat break in numpy, some kind of backcompat break
 in pandas, or to do nothing and continue indefinitely with the status
 quo where the same indexing operation might silently return different
 results depending on the types passed in. All of these options have
 real costs for users, and it isn't at all clear to me what the
 relative costs will be when we dig into the details of our various
 options.


I doubt that there is a reasonable way to quantify those costs, especially
those of breaking backwards compatibility. If someone has a good method,
I'd be interested though.


 So I'd be very happy to see worked out proposals for any or
 all of these approaches. It strikes me as really premature to be
 issuing proclamations about what changes might be considered. There is
 really no danger to *considering* a proposal;


Sorry, I have to disagree. Numpy is already seen by some as having a poor
track record on backwards compatibility. Having core developers say
"propose some backcompat break to how indexing works and we'll consider it"
makes our stance on that look even worse. Of course everyone is free to
make any technical proposal they deem fit and we'll consider the merits of
it. However I'd like us to be clear that we do care strongly about
backwards compatibility and that the fundamentals of the core of Numpy
(things like indexing, broadcasting, dtypes and ufuncs) will not be changed
in backwards-incompatible ways.

Ralf

P.S. also not for a possible numpy 2.0 (or have we learned nothing from
Python3?).


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-04 Thread Nathaniel Smith
On Sat, Apr 4, 2015 at 12:17 AM, Ralf Gommers ralf.gomm...@gmail.com wrote:


 On Sat, Apr 4, 2015 at 1:54 AM, Nathaniel Smith n...@pobox.com wrote:


 But, the real problem here is that we have two different array duck
 types that force everyone to write their code twice. This is a
 terrible state of affairs! (And exactly analogous to the problems
 caused by np.ndarray disagreeing with np.matrix & scipy.sparse about
 the proper definition of *, which PEP 465 may eventually
 alleviate.) IMO we should be solving this indexing problem directly,
 not applying bandaids to its symptoms, and the way to do that is to
 come up with some common duck type that everyone can agree on.

 Unfortunately, AFAICT this means our only options here are to have
 some kind of backcompat break in numpy, some kind of backcompat break
 in pandas, or to do nothing and continue indefinitely with the status
 quo where the same indexing operation might silently return different
 results depending on the types passed in. All of these options have
 real costs for users, and it isn't at all clear to me what the
 relative costs will be when we dig into the details of our various
 options.


 I doubt that there is a reasonable way to quantify those costs, especially
 those of breaking backwards compatibility. If someone has a good method, I'd
 be interested though.

I'm a little nervous about how easily this argument might turn into
"either A or B is better but we can't be 100% *certain* which it is, so
instead of doing our best using the data available we should just
choose B". Being a maintainer means accepting uncertainty and doing
our best anyway.

But that said I'm still totally on board with erring on the side of
caution (in particular, you can never go back and *un*break
backcompat). An obvious challenge to anyone trying to take this
forward (in any direction!) would definitely be to gather the most
useful data possible. And it's not obviously impossible -- maybe one
could do something useful by scanning ASTs of lots of packages (I have
a copy of pypi if anyone wants it, that I downloaded with the idea of
making some similar arguments for why core python should slightly
break backcompat to allow overloading of a < b < c syntax), or adding
instrumentation to numpy, or running small-scale usability tests, or
surveying people, or ...

(I was pretty surprised by some of the data gathered during the PEP
465 process, e.g. on how common dot() calls are relative to existing
built-in operators, and on its associativity in practice.)


 So I'd be very happy to see worked out proposals for any or
 all of these approaches. It strikes me as really premature to be
 issuing proclamations about what changes might be considered. There is
 really no danger to *considering* a proposal;


 Sorry, I have to disagree. Numpy is already seen by some as having a poor
 track record on backwards compatibility. Having core developers say propose
 some backcompat break to how indexing works and we'll consider it makes our
 stance on that look even worse. Of course everyone is free to make any
 technical proposal they deem fit and we'll consider the merits of it.
 However I'd like us to be clear that we do care strongly about backwards
 compatibility and that the fundamentals of the core of Numpy (things like
 indexing, broadcasting, dtypes and ufuncs) will not be changed in
 backwards-incompatible ways.

 Ralf

 P.S. also not for a possible numpy 2.0 (or have we learned nothing from
 Python3?).

I agree 100% that we should and do care strongly about backwards
compatibility. But you're saying in one sentence that we should tell
people that we won't consider backcompat breaks, and then in the next
sentence that of course we actually will consider them (even if we
almost always reject them). Basically, I think saying one thing and
doing another is not a good way to build people's trust.

Core python broke backcompat on a regular basis throughout the python
2 series, and almost certainly will again -- the bar to doing so is
*very* high, and they use elaborate mechanisms to ease the way
(__future__, etc.), but they do it. A few months ago there was even
some serious consideration given to changing py3 bytestring indexing
to return bytestrings instead of integers. (Consensus was
unsurprisingly that this was a bad idea, but there were core devs
seriously exploring it, and no-one complained about the optics.)

It's true that numpy has something of a bad reputation in this area,
and I think it's because until ~1.7 or so, we randomly broke stuff by
accident on a pretty regular basis, even in bug fix releases. I
think the way to rebuild that trust is to honestly say to our users
that when we do break backcompat, we will never do it by accident, and
we will do it only rarely, after careful consideration, with the
smoothest transition possible, only in situations where we are
convinced that it is the net best possible solution for our users, and
only after public discussion and 

Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-04 Thread Nathaniel Smith
On Sat, Apr 4, 2015 at 2:15 AM, Robert Kern robert.k...@gmail.com wrote:
 On Sat, Apr 4, 2015 at 9:54 AM, Nathaniel Smith n...@pobox.com wrote:

 On Sat, Apr 4, 2015 at 12:17 AM, Ralf Gommers ralf.gomm...@gmail.com
 wrote:
 
  On Sat, Apr 4, 2015 at 1:54 AM, Nathaniel Smith n...@pobox.com wrote:

  So I'd be very happy to see worked out proposals for any or
  all of these approaches. It strikes me as really premature to be
  issuing proclamations about what changes might be considered. There is
  really no danger to *considering* a proposal;
 
  Sorry, I have to disagree. Numpy is already seen by some as having a
  poor
  track record on backwards compatibility. Having core developers say
  propose
  some backcompat break to how indexing works and we'll consider it makes
  our
  stance on that look even worse. Of course everyone is free to make any
  technical proposal they deem fit and we'll consider the merits of it.
  However I'd like us to be clear that we do care strongly about backwards
  compatibility and that the fundamentals of the core of Numpy (things
  like
  indexing, broadcasting, dtypes and ufuncs) will not be changed in
  backwards-incompatible ways.
 
  Ralf
 
  P.S. also not for a possible numpy 2.0 (or have we learned nothing from
  Python3?).

 I agree 100% that we should and do care strongly about backwards
 compatibility. But you're saying in one sentence that we should tell
 people that we won't consider backcompat breaks, and then in the next
 sentence that of course we actually will consider them (even if we
 almost always reject them). Basically, I think saying one thing and
 doing another is not a good way to build people's trust.

 There is a difference between politely considering what proposals people
 send us uninvited and inviting people to work on specific proposals. That is
 what Ralf was getting at.

I mean, I get that Ralf read my bit quoted above and got worried that
people would read it as "numpy core team announces they don't care
about backcompat", which is fair enough. Sometimes people jump to all
kinds of conclusions, esp. when confirmation bias meets skim-reading
meets hastily-written emails.

But it's just not true that I read people's proposals out of
politeness; I read them because I'm interested, because they might
surprise us by being more practical/awesome/whatever than we expect,
and because we all learn things by giving them due consideration
regardless of the final outcome. So yeah, I do honestly do want to see
people work on specific proposals for important problems (and this
indexing thing strikes me as important), even proposals that involve
breaking backcompat. Pretending otherwise would still be a lie, at
least on my part. So the distinction you're making here doesn't help
me much.

-n

-- 
Nathaniel J. Smith -- http://vorpus.org


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-04 Thread Ralf Gommers
On Sat, Apr 4, 2015 at 1:11 PM, Todd toddr...@gmail.com wrote:


 There was no break as large as this. In fact I would say this is even a
 larger change than any individual change we saw in the python 2 to 3
 switch.

Well, the impact of what Python 3 did to everyone's string handling code
caused so much work that it's close to impossible to top that within numpy,
I'd say :)

Ralf

The basic mechanics of indexing are just too fundamental and touch on too
 many things to make this sort of change feasible. It would be better to
 have a new language, or in this case a new project.



Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-04 Thread Ralf Gommers
On Sat, Apr 4, 2015 at 11:38 AM, Nathaniel Smith n...@pobox.com wrote:

 On Sat, Apr 4, 2015 at 2:15 AM, Robert Kern robert.k...@gmail.com wrote:
  On Sat, Apr 4, 2015 at 9:54 AM, Nathaniel Smith n...@pobox.com wrote:
 
  On Sat, Apr 4, 2015 at 12:17 AM, Ralf Gommers ralf.gomm...@gmail.com
  wrote:
  
   On Sat, Apr 4, 2015 at 1:54 AM, Nathaniel Smith n...@pobox.com
 wrote:
 
   So I'd be very happy to see worked out proposals for any or
   all of these approaches. It strikes me as really premature to be
   issuing proclamations about what changes might be considered. There
 is
   really no danger to *considering* a proposal;
  
   Sorry, I have to disagree. Numpy is already seen by some as having a
   poor
   track record on backwards compatibility. Having core developers say
   propose
   some backcompat break to how indexing works and we'll consider it
 makes
   our
   stance on that look even worse. Of course everyone is free to make any
   technical proposal they deem fit and we'll consider the merits of it.
   However I'd like us to be clear that we do care strongly about
 backwards
   compatibility and that the fundamentals of the core of Numpy (things
   like
   indexing, broadcasting, dtypes and ufuncs) will not be changed in
   backwards-incompatible ways.
  
   Ralf
  
   P.S. also not for a possible numpy 2.0 (or have we learned nothing
 from
   Python3?).
 
  I agree 100% that we should and do care strongly about backwards
  compatibility. But you're saying in one sentence that we should tell
  people that we won't consider backcompat breaks, and then in the next
  sentence that of course we actually will consider them (even if we
  almost always reject them). Basically, I think saying one thing and
  doing another is not a good way to build people's trust.
 
  There is a difference between politely considering what proposals people
  send us uninvited and inviting people to work on specific proposals.
 That is
  what Ralf was getting at.

 I mean, I get that Ralf read my bit quoted above and got worried that
 people would read it as numpy core team announces they don't care
 about backcompat, which is fair enough. Sometimes people jump to all
 kinds of conclusions, esp. when confirmation bias meets skim-reading
 meets hastily-written emails.

 But it's just not true that I read people's proposals out of
 politeness; I read them because I'm interested, because they might
 surprise us by being more practical/awesome/whatever than we expect,
 and because we all learn things by giving them due consideration
 regardless of the final outcome.


Thanks for explaining, good perspective.


 So yeah, I do honestly do want to see
 people work on specific proposals for important problems (and this
 indexing thing strikes me as important), even proposals that involve
 breaking backcompat. Pretending otherwise would still be a lie, at
 least on my part. So the distinction you're making here doesn't help
 me much.


A change in semantics would help already. If you'd phrased it for example
as:

  "I'd personally be interested in seeing a description of what changes,
including backwards-incompatible ones, would need to be made to numpy
indexing behavior to resolve this situation. We could learn a lot from such
an exercise.",

that would have invited the same investigation from interested people
without creating worries about Numpy stability. And without potentially
leading new enthusiastic contributors to believe that this is an
opportunity to make an important change to Numpy: 99.9% chance that they'd
be disappointed after having their well thought out proposal rejected.

Cheers,
Ralf



 -n

 --
 Nathaniel J. Smith -- http://vorpus.org


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-04 Thread Nathaniel Smith
On Apr 4, 2015 4:12 AM, Todd toddr...@gmail.com wrote:


 On Apr 4, 2015 10:54 AM, Nathaniel Smith n...@pobox.com wrote:
 
  Core python broke backcompat on a regular basis throughout the python
  2 series, and almost certainly will again -- the bar to doing so is
  *very* high, and they use elaborate mechanisms to ease the way
  (__future__, etc.), but they do it. A few months ago there was even
  some serious consideration given to changing py3 bytestring indexing
  to return bytestrings instead of integers. (Consensus was
  unsurprisingly that this was a bad idea, but there were core devs
  seriously exploring it, and no-one complained about the optics.)

 There was no break as large as this. In fact I would say this is even a
larger change than any individual change we saw in the python 2 to 3
switch.  The basic mechanics of indexing are just too fundamental and touch
on too many things to make this sort of change feasible.

I'm afraid I'm not clever enough to know how large or feasible a change is
without even seeing the proposed change. I may well agree with you when I
do see it; I just prefer to base important decisions on as much data as
possible.

-n


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-03 Thread Pauli Virtanen
On 03.04.2015 04:09, josef.p...@gmail.com wrote:
[clip]
 I think numpy indexing is not too difficult and follows a consistent
 pattern, and I completely avoid mixing slices and index arrays with
 ndim > 2.
 
 I think it should be DOA, except as a discussion topic for numpy 3000.

If you change how Numpy indexing works, you need to scrap a nontrivial
amount of existing code, at which point everybody should just go back to
Matlab, which at least provides a stable API.



Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-03 Thread Jaime Fernández del Río
I have an all-Python implementation of an OrthogonalIndexer class, loosely
based on Stephan's code plus some axis remapping, that provides all the
needed functionality for getting and setting with orthogonal indices.

Would those interested rather see it as a gist to play around with, or as a
PR adding an orthogonally indexable `.ix_` attribute to ndarray?
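(For the sake of illustration only, since the exact spelling is what the
gist/PR would pin down, the intended usage would be along these lines, next
to today's np.ix_ equivalent:)

import numpy as np

a = np.arange(60).reshape(3, 4, 5)
# proposed: each index applies to its own axis, giving a (2, 4, 3) result
# a.ix_[[0, 1], :, [0, 2, 4]].shape
# today's equivalent, since np.ix_ does not accept slices:
a[np.ix_([0, 1], range(4), [0, 2, 4])].shape   # (2, 4, 3)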

Jaime

-- 
(\__/)
( O.o)
( > <) This is Bunny. Copy Bunny into your signature and help him with his
plans for world domination.


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-03 Thread Stephan Hoyer
On Fri, Apr 3, 2015 at 10:59 AM, Jaime Fernández del Río 
jaime.f...@gmail.com wrote:

 I have an all-Python implementation of an OrthogonalIndexer class, loosely
 based on Stephan's code plus some axis remapping, that provides all the
 needed functionality for getting and setting with orthogonal indices.


Awesome, thanks!


 Would those interested rather see it as a gist to play around with, or as
  a PR adding an orthogonally indexable `.ix_` attribute to ndarray?


My preference would be for a PR (even if it's purely a prototype) because
it supports inline comments better than a gist.

Stephan


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-03 Thread Eric Firing
On 2015/04/03 7:59 AM, Jaime Fernández del Río wrote:
 I have an all-Python implementation of an OrthogonalIndexer class,
 loosely based on Stephan's code plus some axis remapping, that provides
 all the needed functionality for getting and setting with orthogonal
 indices.

Excellent!


 Would those interested rather see it as a gist to play around with, or
  as a PR adding an orthogonally indexable `.ix_` attribute to ndarray?

I think the PR would be easier to test.

Eric


 Jaime

 --
 (\__/)
 ( O.o)
  ( > <) This is Bunny. Copy Bunny into your signature and help him with his
  plans for world domination.






Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-03 Thread Nathaniel Smith
On Apr 1, 2015 2:17 AM, R Hattersley rhatters...@gmail.com wrote:

 There are two different interpretations in common use of how to handle 
 multi-valued (array/sequence) indexes. The numpy style is to consider all 
 multi-valued indices together which allows arbitrary points to be extracted. 
 The orthogonal style (e.g. as provided by netcdf4-python) is to consider each 
 multi-valued index independently.

 For example:

 >>> type(v)
 <type 'netCDF4.Variable'>
 >>> v.shape
 (240, 37, 49)
 >>> v[(0, 1), (0, 2, 3)].shape
 (2, 3, 49)
 >>> np.array(v)[(0, 1), (0, 2, 3)].shape
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
 IndexError: shape mismatch: indexing arrays could not be broadcast together
 with shapes (2,) (3,)


 In a netcdf4-python GitHub issue the authors of various orthogonal indexing 
 packages have been discussing how to distinguish the two behaviours and have 
 currently settled on a boolean __orthogonal_indexing__ attribute.

I guess my feeling is that this attribute is a fine solution to the
wrong problem. If I understand the situation correctly: users are
writing two copies of their indexing code to handle two different
array-duck-types (those that do broadcasting indexing and those that
do Cartesian product indexing), and then have trouble knowing which
set of code to use for a given object. The problem that
__orthogonal_indexing__ solves is that it makes it easier to decide which
code to use. It works well for this, great.
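(To make that concrete, this is roughly the kind of dispatch the attribute
enables; the attribute name is the one settled on in the netcdf4-python
discussion, while the helper function itself is hypothetical:)

import numpy as np

def orthogonal_take(arr, *indices):
    # index `arr` orthogonally whether or not it already indexes that way
    if getattr(arr, "__orthogonal_indexing__", False):
        return arr[indices]           # e.g. a netCDF4 Variable: already orthogonal
    return arr[np.ix_(*indices)]      # plain ndarray: emulate it with np.ix_

For the quoted example, orthogonal_take(np.array(v), (0, 1), (0, 2, 3)) would
give the (2, 3, 49) result either way.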

But, the real problem here is that we have two different array duck
types that force everyone to write their code twice. This is a
terrible state of affairs! (And exactly analogous to the problems
caused by np.ndarray disagreeing with np.matrix & scipy.sparse about
the proper definition of *, which PEP 465 may eventually
alleviate.) IMO we should be solving this indexing problem directly,
not applying bandaids to its symptoms, and the way to do that is to
come up with some common duck type that everyone can agree on.

Unfortunately, AFAICT this means our only options here are to have
some kind of backcompat break in numpy, some kind of backcompat break
in pandas, or to do nothing and continue indefinitely with the status
quo where the same indexing operation might silently return different
results depending on the types passed in. All of these options have
real costs for users, and it isn't at all clear to me what the
relative costs will be when we dig into the details of our various
options. So I'd be very happy to see worked out proposals for any or
all of these approaches. It strikes me as really premature to be
issuing proclamations about what changes might be considered. There is
really no danger to *considering* a proposal; the worst case is that
we end up rejecting it anyway, but based on better information.

-n


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-03 Thread Stephan Hoyer
On Fri, Apr 3, 2015 at 4:54 PM, Nathaniel Smith n...@pobox.com wrote:

 Unfortunately, AFAICT this means our only options here are to have
 some kind of backcompat break in numpy, some kind of backcompat break
 in pandas, or to do nothing and continue indefinitely with the status
 quo where the same indexing operation might silently return different
 results depending on the types passed in.


For what it's worth, DataFrame.__getitem__ is also pretty broken in pandas
(even worse than in NumPy). Not even the pandas devs can keep straight how
it works!
https://github.com/pydata/pandas/issues/9595

So we'll probably need a backwards incompatible switch there at some point,
too.

That said, the issues are somewhat different, and in my experience the
strict label and integer based indexers .loc and .iloc work pretty well. I
haven't heard any complaints about how they do cartesian indexing rather
than fancy indexing.

Stephan


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-02 Thread Stephan Hoyer
On Wed, Apr 1, 2015 at 7:06 AM, Jaime Fernández del Río 
jaime.f...@gmail.com wrote:

 Is there any other package implementing non-orthogonal indexing aside from
 numpy?


I think we can safely say that NumPy's implementation of broadcasting
indexing is unique :).

The issue is that many other packages rely on numpy for implementation of
custom array objects (e.g., scipy.sparse and scipy.io.netcdf). It's not
immediately obvious what sort of indexing these objects represent.

If the functionality is lacking, e.g. use of slices in `np.ix_`, I'm all
 for improving that to provide the full functionality of orthogonal
 indexing. I just need a little more convincing that those new
 attributes/indexers are going to ever see any real use.


Orthogonal indexing is close to the norm for packages that implement
labeled data structures, both because it's easier to understand and
implement, and because it's difficult to maintain associations with labels
through complex broadcasting indexing.

Unfortunately, the lack of a full-featured implementation of orthogonal
indexing has led to that wheel being reinvented at least three times (in
Iris, xray [1] and pandas). So it would be nice to have a canonical
implementation that supports slices and integers in numpy for that reason
alone. This could be done by building on the existing `np.ix_` function,
but a new indexer seems more elegant: there's just much less noise with
`arr.ix_[:1, 2, [3]]` than `arr[np.ix_(slice(1), 2, [3])]`.
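(As it stands, np.ix_ rejects slices and scalars outright, so today the same
selection has to be spelled out by hand; a small illustration with made-up
shapes:)

import numpy as np

arr = np.arange(24).reshape(2, 3, 4)
rows = np.arange(arr.shape[0])[:1]    # the slice :1 as an explicit index array
plane = [2]                           # the scalar 2, kept 1-D so np.ix_ accepts it
cols = [3]
out = arr[np.ix_(rows, plane, cols)][:, 0, :]   # drop the axis the scalar should remove
# out.shape == (1, 1), which is what arr.ix_[:1, 2, [3]] is meant to return directly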

It's also well known that indexing with __getitem__ can be much slower than
np.take. It seems plausible to me that a careful implementation of
orthogonal indexing could close or eliminate this speed gap, because the
model for orthogonal indexing is so much simpler than that for broadcasting
indexing: each element of the key tuple can be applied separately along the
corresponding axis.
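(A minimal pure-Python sketch of that model, just to show how little machinery
the orthogonal case needs; this is not how the PR implements it, and boolean
keys are simply converted to positions:)

import numpy as np

def orthogonal_getitem(arr, key):
    # apply each piece of `key` independently along its own axis
    axis = 0
    for k in key:
        if isinstance(k, slice):
            arr = arr[(slice(None),) * axis + (k,)]
            axis += 1
        elif np.isscalar(k):
            arr = np.take(arr, k, axis=axis)      # scalar: the axis is dropped
        else:
            k = np.asarray(k)
            if k.dtype == bool:
                k = k.nonzero()[0]                # boolean mask becomes positions
            arr = np.take(arr, k, axis=axis)
            axis += 1
    return arr

Calling it as orthogonal_getitem(np.arange(60).reshape(3, 4, 5), ([0, 1],
slice(None), [True, False, True, False, True])) gives a (2, 4, 3) result,
matching the a.ix_ example from earlier in the thread.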

So I think there could be a real benefit to having the feature in numpy. In
particular, if somebody is up for implementing it in C or Cython, I would
be very pleased.

 Cheers,
Stephan

[1] Here is my implementation of remapping from orthogonal to broadcasting
indexing. It works, but it's a real mess, especially because I try to
optimize by minimizing the number of times slices are converted into arrays:
https://github.com/xray/xray/blob/0d164d848401209971ded33aea2880c1fdc892cb/xray/core/indexing.py#L68


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-02 Thread Sebastian Berg
On Do, 2015-04-02 at 01:29 -0700, Stephan Hoyer wrote:
 On Wed, Apr 1, 2015 at 7:06 AM, Jaime Fernández del Río
 jaime.f...@gmail.com wrote:
 Is there any other package implementing non-orthogonal
 indexing aside from numpy?
 
 
 I think we can safely say that NumPy's implementation of broadcasting
 indexing is unique :).
 
 
 The issue is that many other packages rely on numpy for implementation
 of custom array objects (e.g., scipy.sparse and scipy.io.netcdf). It's
 not immediately obvious what sort of indexing these objects represent.
 
 
  If the functionality is lacking, e.g. use of slices in
 `np.ix_`, I'm all for improving that to provide the full
 functionality of orthogonal indexing. I just need a little
 more convincing that those new attributes/indexers are going
 to ever see any real use.
 
 
 
 Orthogonal indexing is close to the norm for packages that implement
 labeled data structures, both because it's easier to understand and
 implement, and because it's difficult to maintain associations with
 labels through complex broadcasting indexing.
 
 
 Unfortunately, the lack of a full featured implementation of
 orthogonal indexing has lead to that wheel being reinvented at least
 three times (in Iris, xray [1] and pandas). So it would be nice to
 have a canonical implementation that supports slices and integers in
 numpy for that reason alone. This could be done by building on the
 existing `np.ix_` function, but a new indexer seems more elegant:
 there's just much less noise with `arr.ix_[:1, 2, [3]]` than
 `arr[np.ix_(slice(1), 2, [3])]`.
 
 
 It's also well known that indexing with __getitem__ can be much slower
 than np.take. It seems plausible to me that a careful implementation
 of orthogonal indexing could close or eliminate this speed gap,
 because the model for orthogonal indexing is so much simpler than that
 for broadcasting indexing: each element of the key tuple can be
 applied separately along the corresponding axis.
 

Wrong (sorry, couldn't resist ;)): since 1.9, take is not typically
faster unless you have a small subspace (the subspace being the
non-indexed/slice-indexed axes, though I guess a small subspace is common
in some cases, e.g. an Nx3 array); it should typically be noticeably slower
for large subspaces at the moment.
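(For anyone who wants to check this on their own machine, a rough comparison
along those lines; the numbers will of course depend on the NumPy version and
on the subspace size:)

import timeit
import numpy as np

a = np.random.rand(1000, 1000)                 # large subspace: each row is 1000 wide
idx = np.random.randint(0, 1000, size=100)

print(timeit.timeit(lambda: a[idx], number=1000))                   # fancy __getitem__
print(timeit.timeit(lambda: np.take(a, idx, axis=0), number=1000))  # np.take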

Anyway, unfortunately, while orthogonal indexing may seem simpler, as you
probably noticed, mapping it in full generality onto advanced indexing does
not seem like a walk in the park, due to how axis remapping works when
you have a combination of slices and advanced indices.

It might be possible to basically implement a second MapIterSwapaxis in
addition to adding extra axes to the inputs (which I think would need a
post-processing step, but that is not that bad). If you do that, you can
mostly reuse the current machinery and avoid most of the really annoying
code blocks which set up the iterators for the various special cases.
Otherwise, for hacking it of course you can replace the slices by arrays
as well ;).
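(The replace-the-slices-by-arrays hack, spelled out as a rough sketch; it
ignores scalars and newaxis, but shows the idea:)

import numpy as np

def slices_to_arrays(key, shape):
    # turn every slice in `key` into the explicit arange it selects
    return tuple(np.arange(*k.indices(n)) if isinstance(k, slice) else k
                 for k, n in zip(key, shape))

a = np.arange(60).reshape(3, 4, 5)
key = slices_to_arrays(([0, 1], slice(None), [0, 2, 4]), a.shape)
a[np.ix_(*key)].shape    # (2, 4, 3): the same outer-product selection, no slices left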

 
 So I think there could be a real benefit to having the feature in
 numpy. In particular, if somebody is up for implementing it in C or
 Cython, I would be very pleased.
 
 
  Cheers,
 
 Stephan
 
 
 [1] Here is my implementation of remapping from orthogonal to
 broadcasting indexing. It works, but it's a real mess, especially
 because I try to optimize by minimizing the number of times slices are
 converted into arrays:
 https://github.com/xray/xray/blob/0d164d848401209971ded33aea2880c1fdc892cb/xray/core/indexing.py#L68
 
 
 


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-02 Thread Colin J. Williams


On 02-Apr-15 4:35 PM, Eric Firing wrote:
 On 2015/04/02 10:22 AM, josef.p...@gmail.com wrote:
 Swapping the axis when slices are mixed with fancy indexing was a
 design mistake, IMO. But not fancy indexing itself.
 I'm not saying there should be no fancy indexing capability; I am saying
 that it should be available through a function or method, rather than
 via the square brackets.  Square brackets should do things that people
 expect them to do--the most common and easy-to-understand style of indexing.

 Eric
+1


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-02 Thread Eric Firing
On 2015/04/02 10:22 AM, josef.p...@gmail.com wrote:
 Swapping the axis when slices are mixed with fancy indexing was a
 design mistake, IMO. But not fancy indexing itself.

I'm not saying there should be no fancy indexing capability; I am saying 
that it should be available through a function or method, rather than 
via the square brackets.  Square brackets should do things that people 
expect them to do--the most common and easy-to-understand style of indexing.

Eric


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-02 Thread josef.pktd
On Thu, Apr 2, 2015 at 2:03 PM, Eric Firing efir...@hawaii.edu wrote:
 On 2015/04/02 4:15 AM, Jaime Fernández del Río wrote:
 We probably need more traction on the should this be done? discussion
 than on the can this be done? one, the need for a reordering of the
 axes swings me slightly in favor, but I mostly don't see it yet.

 As a long-time user of numpy, and an advocate and teacher of Python for
 science, here is my perspective:

 Fancy indexing is a horrible design mistake--a case of cleverness run
 amok.  As you can read in the Numpy documentation, it is hard to
 explain, hard to understand, hard to remember.  Its use easily leads to
 unreadable code and hard-to-see errors.  Here is the essence of an
 example that a student presented me with just this week, in the context
 of reordering eigenvectors based on argsort applied to eigenvalues:

 In [25]: xx = np.arange(2*3*4).reshape((2, 3, 4))

 In [26]: ii = np.arange(4)

 In [27]: print(xx[0])
 [[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 In [28]: print(xx[0, :, ii])
 [[ 0  4  8]
  [ 1  5  9]
  [ 2  6 10]
  [ 3  7 11]]

 Quickly now, how many numpy users would look at that last expression and
 say, "Of course, that is equivalent to transposing xx[0]"?  And, "Of
 course that expression should give a completely different result from
 xx[0][:, ii]."?

 I would guess it would be less than 1%.  That should tell you right away
 that we have a real problem here.  Fancy indexing can't be *read* by a
 sub-genius--it has to be laboriously figured out piece by piece, with
 frequent reference to the baffling descriptions in the Numpy docs.

 So I think you should turn the question around and ask, "What is the
 actual real-world use case for fancy indexing? How often does real
 code rely on it?"  I have taken advantage of it occasionally, maybe you
 have too, but I think a survey of existing code would show that the need
 for it is *far* less common than the need for simple orthogonal
 indexing.  That tells me that it is fancy indexing, not orthogonal
 indexing, that should be available through a function and/or special
 indexing attribute.  The question is then how to make that transition.


Swapping the axis when slices are mixed with fancy indexing was a
design mistake, IMO. But not fancy indexing itself.

>>> np.triu_indices(5)
(array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4], dtype=int64),
 array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4], dtype=int64))
>>> m = np.arange(25).reshape(5, 5)[np.triu_indices(5)]
>>> m
array([ 0,  1,  2,  3,  4,  6,  7,  8,  9, 12, 13, 14, 18, 19, 24])

>>> m2 = np.zeros((5,5))
>>> m2[np.triu_indices(5)] = m
>>> m2
array([[  0.,   1.,   2.,   3.,   4.],
       [  0.,   6.,   7.,   8.,   9.],
       [  0.,   0.,  12.,  13.,  14.],
       [  0.,   0.,   0.,  18.,  19.],
       [  0.,   0.,   0.,   0.,  24.]])

(I don't remember what's fancy in indexing, just that broadcasting
rules apply.)

Josef



 Eric







Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-02 Thread josef.pktd
On Thu, Apr 2, 2015 at 10:30 PM, Matthew Brett matthew.br...@gmail.com wrote:
 Hi,

 On Thu, Apr 2, 2015 at 6:09 PM,  josef.p...@gmail.com wrote:
 On Thu, Apr 2, 2015 at 8:02 PM, Eric Firing efir...@hawaii.edu wrote:
 On 2015/04/02 1:14 PM, Hanno Klemm wrote:
 Well, I have written quite a bit of code that relies on fancy
 indexing, and I think the question, if the behaviour of the []
 operator should be changed has sailed with numpy now at version 1.9.
 Given the amount packages that rely on numpy, changing this
 fundamental behaviour would not be a clever move.

 Are you *positive* that there is no clever way to make a transition?
 It's not worth any further thought?

 I guess it would be similar to python 3 string versus bytes, but
 without the overwhelming benefits.

 I don't think I would be in favor of deprecating fancy indexing even
 if it were possible. In general, my impression is that if there is a
 trade-off in numpy between powerful machinery versus easy to learn and
 teach, then the design philosophy went in favor of power.

 I think numpy indexing is not too difficult and follows a consistent
 pattern, and I completely avoid mixing slices and index arrays with
 ndim  2.

 I'm sure y'all are totally on top of this, but for myself, I would
 like to distinguish:

 * fancy indexing with boolean arrays - I use it all the time and don't
 get confused;
 * fancy indexing with non-boolean arrays - horrendously confusing,
 almost never use it, except on a single axis when I can't confuse it
 with orthogonal indexing:

 In [3]: a = np.arange(24).reshape(6, 4)

 In [4]: a
 Out[4]:
 array([[ 0,  1,  2,  3],
[ 4,  5,  6,  7],
[ 8,  9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]])

 In [5]: a[[1, 2, 4]]
 Out[5]:
 array([[ 4,  5,  6,  7],
[ 8,  9, 10, 11],
[16, 17, 18, 19]])

 I also remember a discussion with Travis O where he was also saying
 that this indexing was confusing and that it would be good if there
 was some way to transition to what he called outer product indexing (I
 think that's the same as 'orthogonal' indexing).

 I think it should be DOA, except as a discussion topic for numpy 3000.

 I think there are two proposals here:

 1) Add some syntactic sugar to allow orthogonal indexing of numpy
 arrays, no backward compatibility break.

 That seems like a very good idea to me - were there any big objections to 
 that?

 2) Over some long time period, move the default behavior of np.array
 non-boolean indexing from the current behavior to the orthogonal
 behavior.

 That is going to be very tough, because it will cause very confusing
 breakage of legacy code.

 On the other hand, maybe it is worth going some way towards that, like this:

 * implement orthogonal indexing as a method arr.sensible_index[...]
 * implement the current non-boolean fancy indexing behavior as a
 method - arr.crazy_index[...]
 * deprecate non-boolean fancy indexing as standard arr[...] indexing;
 * wait a long time;
 * remove non-boolean fancy indexing as standard arr[...] (errors are
 preferable to change in behavior)

 Then if we are brave we could:

 * wait a very long time;
 * make orthogonal indexing the default.

 But the not-brave steps above seem less controversial, and fairly reasonable.

 What about that as an approach?

I also thought the transition would have to be something like that, or
a clear break point like numpy 3.0. I would be in favor of something
like this for the axis-swapping case with ndim > 2.

However, before going to that, you would still have to provide a list
of behaviors that will be deprecated, and poll various libraries for
how much those are actually used.

My impression is that fancy indexing is used more often than
orthogonal indexing (beyond the trivial case x[:, idx]).
Also, many use cases for orthogonal indexing have moved to pandas, and
numpy is left with the non-orthogonal indexing use cases.
And third, fancy indexing is a superset of orthogonal indexing (with
proper broadcasting), and you still need to justify why everyone
should be restricted to the subset rather than adopting a voluntary
constraint to use code that is easier to understand.
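(The superset point in one small example: reshaping the index arrays so that
they broadcast against each other, which is exactly what np.ix_ automates,
turns fancy indexing into the orthogonal selection:)

import numpy as np

a = np.arange(12).reshape(3, 4)
rows = np.array([0, 2])[:, None]   # shape (2, 1)
cols = np.array([1, 3])            # shape (2,), broadcasts against rows
a[rows, cols]                      # rows 0 and 2 crossed with columns 1 and 3
# array([[ 1,  3],
#        [ 9, 11]])
a[np.ix_([0, 2], [1, 3])]          # np.ix_ builds the same broadcastable index arrays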

I checked numpy.random.choice which I would have implemented with
fancy indexing, but it uses only `take`, AFAICS.

Switching to using an explicit method is not really a problem for
maintained library code, but I still don't really see why we should do
this.

Josef


 Cheers,

 Matthew


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-02 Thread Matthew Brett
Hi,

On Thu, Apr 2, 2015 at 8:20 PM, Jaime Fernández del Río
jaime.f...@gmail.com wrote:
 On Thu, Apr 2, 2015 at 7:30 PM, Matthew Brett matthew.br...@gmail.com
 wrote:

 Hi,

 On Thu, Apr 2, 2015 at 6:09 PM,  josef.p...@gmail.com wrote:
  On Thu, Apr 2, 2015 at 8:02 PM, Eric Firing efir...@hawaii.edu wrote:
  On 2015/04/02 1:14 PM, Hanno Klemm wrote:
  Well, I have written quite a bit of code that relies on fancy
  indexing, and I think the question, if the behaviour of the []
  operator should be changed has sailed with numpy now at version 1.9.
  Given the amount packages that rely on numpy, changing this
  fundamental behaviour would not be a clever move.
 
  Are you *positive* that there is no clever way to make a transition?
  It's not worth any further thought?
 
  I guess it would be similar to python 3 string versus bytes, but
  without the overwhelming benefits.
 
  I don't think I would be in favor of deprecating fancy indexing even
  if it were possible. In general, my impression is that if there is a
  trade-off in numpy between powerful machinery versus easy to learn and
   teach, then the design philosophy went in favor of power.
 
  I think numpy indexing is not too difficult and follows a consistent
  pattern, and I completely avoid mixing slices and index arrays with
  ndim > 2.

 I'm sure y'all are totally on top of this, but for myself, I would
 like to distinguish:

 * fancy indexing with boolean arrays - I use it all the time and don't
 get confused;
 * fancy indexing with non-boolean arrays - horrendously confusing,
 almost never use it, except on a single axis when I can't confuse it
 with orthogonal indexing:

 In [3]: a = np.arange(24).reshape(6, 4)

 In [4]: a
 Out[4]:
 array([[ 0,  1,  2,  3],
[ 4,  5,  6,  7],
[ 8,  9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]])

 In [5]: a[[1, 2, 4]]
 Out[5]:
 array([[ 4,  5,  6,  7],
[ 8,  9, 10, 11],
[16, 17, 18, 19]])

 I also remember a discussion with Travis O where he was also saying
 that this indexing was confusing and that it would be good if there
 was some way to transition to what he called outer product indexing (I
 think that's the same as 'orthogonal' indexing).

  I think it should be DOA, except as a discussion topic for numpy 3000.

 I think there are two proposals here:

 1) Add some syntactic sugar to allow orthogonal indexing of numpy
 arrays, no backward compatibility break.

 That seems like a very good idea to me - were there any big objections to
 that?

 2) Over some long time period, move the default behavior of np.array
 non-boolean indexing from the current behavior to the orthogonal
 behavior.

 That is going to be very tough, because it will cause very confusing
 breakage of legacy code.

 On the other hand, maybe it is worth going some way towards that, like
 this:

 * implement orthogonal indexing as a method arr.sensible_index[...]
 * implement the current non-boolean fancy indexing behavior as a
 method - arr.crazy_index[...]
 * deprecate non-boolean fancy indexing as standard arr[...] indexing;
 * wait a long time;
 * remove non-boolean fancy indexing as standard arr[...] (errors are
 preferable to change in behavior)

 Then if we are brave we could:

 * wait a very long time;
 * make orthogonal indexing the default.

 But the not-brave steps above seem less controversial, and fairly
 reasonable.

 What about that as an approach?


 Your option 1 was what was being discussed before the posse was assembled to
 bring fancy indexing before justice... ;-)

Yes, sorry - I was trying to bring the argument back there.

 My background is in image processing, and I have used fancy indexing in all
 its fanciness far more often than orthogonal or outer product indexing. I
 actually have a vivid memory of the moment I fell in love with NumPy: after
 seeing a code snippet that ran a huge image through a look-up table by
 indexing the LUT with the image. Beautifully simple. And here is a younger
 me, learning to ride NumPy without the training wheels.

 Another obvious use case that you can find all over the place in
 scikit-image is drawing a curve on an image from the coordinates.

No question at all that it does have its uses - but then again, no-one
thinks it should become unavailable; only that, maybe in the very far
future, it would not be what you get by default...

Cheers,

Matthew
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-02 Thread josef.pktd
On Thu, Apr 2, 2015 at 11:30 PM, Nathaniel Smith n...@pobox.com wrote:
 On Thu, Apr 2, 2015 at 6:35 PM,  josef.p...@gmail.com wrote:
 (I thought about this because I was looking at accessing off-diagonal
 elements, m2[np.arange(4), np.arange(4) + 1] )

 Psst: np.diagonal(m2, offset=1)

It was just an example (banded or toeplitz).
(I know how indexing works, kind of, but don't remember exactly what
diag or other functions are doing.)

>>> m2b = m2.copy()
>>> m2b[np.arange(4), np.arange(4) + 1]
array([  1.,   7.,  13.,  19.])
>>> m2b[np.arange(4), np.arange(4) + 1] = np.nan
>>> m2b
array([[  0.,  nan,   2.,   3.,   4.],
       [  0.,   6.,  nan,   8.,   9.],
       [  0.,   0.,  12.,  nan,  14.],
       [  0.,   0.,   0.,  18.,  nan],
       [  0.,   0.,   0.,   0.,  24.]])

>>> m2c = m2.copy()
>>> np.diagonal(m2c, offset=1) = np.nan
SyntaxError: can't assign to function call
>>> dd = np.diagonal(m2c, offset=1)
>>> dd[:] = np.nan
Traceback (most recent call last):
  File "<pyshell#89>", line 1, in <module>
    dd[:] = np.nan
ValueError: assignment destination is read-only
>>> np.__version__
'1.9.2rc1'

>>> m2d = m2.copy()
>>> m2d[np.arange(4)[::-1], np.arange(4) + 1] = np.nan

Josef


 --
 Nathaniel J. Smith -- http://vorpus.org
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-02 Thread Jaime Fernández del Río
On Thu, Apr 2, 2015 at 7:30 PM, Matthew Brett matthew.br...@gmail.com
wrote:

 Hi,

 On Thu, Apr 2, 2015 at 6:09 PM,  josef.p...@gmail.com wrote:
  On Thu, Apr 2, 2015 at 8:02 PM, Eric Firing efir...@hawaii.edu wrote:
  On 2015/04/02 1:14 PM, Hanno Klemm wrote:
  Well, I have written quite a bit of code that relies on fancy
  indexing, and I think the question of whether the behaviour of the []
  operator should be changed has sailed, with numpy now at version 1.9.
  Given the amount of packages that rely on numpy, changing this
  fundamental behaviour would not be a clever move.
 
  Are you *positive* that there is no clever way to make a transition?
  It's not worth any further thought?
 
  I guess it would be similar to python 3 string versus bytes, but
  without the overwhelming benefits.
 
  I don't think I would be in favor of deprecating fancy indexing even
  if it were possible. In general, my impression is that if there is a
  trade-off in numpy between powerful machinery versus easy to learn and
  teach, then the design philosophy went in favor of power.
 
  I think numpy indexing is not too difficult and follows a consistent
  pattern, and I completely avoid mixing slices and index arrays with
  ndim > 2.

 I'm sure y'all are totally on top of this, but for myself, I would
 like to distinguish:

 * fancy indexing with boolean arrays - I use it all the time and don't
 get confused;
 * fancy indexing with non-boolean arrays - horrendously confusing,
 almost never use it, except on a single axis when I can't confuse it
 with orthogonal indexing:

 In [3]: a = np.arange(24).reshape(6, 4)

 In [4]: a
 Out[4]:
 array([[ 0,  1,  2,  3],
[ 4,  5,  6,  7],
[ 8,  9, 10, 11],
[12, 13, 14, 15],
[16, 17, 18, 19],
[20, 21, 22, 23]])

 In [5]: a[[1, 2, 4]]
 Out[5]:
 array([[ 4,  5,  6,  7],
[ 8,  9, 10, 11],
[16, 17, 18, 19]])

 I also remember a discussion with Travis O where he was also saying
 that this indexing was confusing and that it would be good if there
 was some way to transition to what he called outer product indexing (I
 think that's the same as 'orthogonal' indexing).

  I think it should be DOA, except as a discussion topic for numpy 3000.

 I think there are two proposals here:

 1) Add some syntactic sugar to allow orthogonal indexing of numpy
 arrays, no backward compatibility break.

 That seems like a very good idea to me - were there any big objections to
 that?

 2) Over some long time period, move the default behavior of np.array
 non-boolean indexing from the current behavior to the orthogonal
 behavior.

 That is going to be very tough, because it will cause very confusing
 breakage of legacy code.

 On the other hand, maybe it is worth going some way towards that, like
 this:

 * implement orthogonal indexing as a method arr.sensible_index[...]
 * implement the current non-boolean fancy indexing behavior as a
 method - arr.crazy_index[...]
 * deprecate non-boolean fancy indexing as standard arr[...] indexing;
 * wait a long time;
 * remove non-boolean fancy indexing as standard arr[...] (errors are
 preferable to change in behavior)

 Then if we are brave we could:

 * wait a very long time;
 * make orthogonal indexing the default.

 But the not-brave steps above seem less controversial, and fairly
 reasonable.

 What about that as an approach?


Your option 1 was what was being discussed before the posse was assembled
to bring fancy indexing before justice... ;-)

My background is in image processing, and I have used fancy indexing in all
its fanciness far more often than orthogonal or outer product indexing. I
actually have a vivid memory of the moment I fell in love with NumPy: after
seeing a code snippet that ran a huge image through a look-up table by
indexing the LUT with the image. Beautifully simple. And here
http://stackoverflow.com/questions/12014186/fancier-fancy-indexing-in-numpy
is a younger me, learning to ride NumPy without the training wheels.

Another obvious use case that you can find all over the place in
scikit-image is drawing a curve on an image from the coordinates.

If there is such strong agreement on an orthogonal indexer, we might as
well go ahead and implement it. But before considering any bolder steps, we
should probably give it a couple of releases to see how many people out
there really use it.

Jaime

P.S. As an aside on the remapping of axes when arrays and slices are mixed,
there really is no better way. Once you realize that the array indexing a
dimension does not have to be 1-D, it becomes clear that what seems like
the obvious way does not generalize. E.g.:

One may rightfully think that:

>>> a = np.arange(60).reshape(3, 4, 5)
>>> a[np.array([1])[:, None], ::2, [0, 1, 3]].shape
(1, 3, 2)

should not reorder the axes, and return an array of shape (1, 2, 3). But
what do you do in the following case?

>>> idx0 = np.random.randint(3, size=(10, 1, 10))
>>> idx2 = np.random.randint(5, size=(1, 20, 

Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-02 Thread josef.pktd
On Thu, Apr 2, 2015 at 9:09 PM,  josef.p...@gmail.com wrote:
 On Thu, Apr 2, 2015 at 8:02 PM, Eric Firing efir...@hawaii.edu wrote:
 On 2015/04/02 1:14 PM, Hanno Klemm wrote:
 Well, I have written quite a bit of code that relies on fancy
 indexing, and I think the question of whether the behaviour of the []
 operator should be changed has sailed, with numpy now at version 1.9.
 Given the amount of packages that rely on numpy, changing this
 fundamental behaviour would not be a clever move.

 Are you *positive* that there is no clever way to make a transition?
 It's not worth any further thought?

 I guess it would be similar to python 3 string versus bytes, but
 without the overwhelming benefits.

 I don't think I would be in favor of deprecating fancy indexing even
 if it were possible. In general, my impression is that if there is a
 trade-off in numpy between powerful machinery versus easy to learn and
 teach, then the design philosophy went in favor of power.

 I think numpy indexing is not too difficult and follows a consistent
 pattern, and I completely avoid mixing slices and index arrays with
 ndim > 2.

 I think it should be DOA, except as a discussion topic for numpy 3000.

 just my opinion


is this fancy?

>>> vals
array([6, 5, 4, 1, 2, 3])
>>> a+b
array([[3, 2, 1, 0],
       [4, 3, 2, 1],
       [5, 4, 3, 2]])
>>> vals[a+b]
array([[1, 4, 5, 6],
       [2, 1, 4, 5],
       [3, 2, 1, 4]])

https://github.com/scipy/scipy/blob/v0.14.0/scipy/linalg/special_matrices.py#L178
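
(A guess at how the undefined a and b above are built; something along
these lines is what the linked scipy code does: broadcasting two aranges
gives a 2-D array of indices, and indexing vals with it assembles the
whole matrix in one shot.)

import numpy as np

vals = np.array([6, 5, 4, 1, 2, 3])
a = np.arange(3).reshape(-1, 1)   # column offsets 0, 1, 2
b = np.arange(3, -1, -1)          # row offsets 3, 2, 1, 0

# a + b broadcasts to the (3, 4) index array shown above, and
# vals[a + b] reproduces the printed matrix
print(vals[a + b])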

(I thought about this because I was looking at accessing off-diagonal
elements, m2[np.arange(4), np.arange(4) + 1] )


How would you find all the code that would not be correct anymore with
a changed definition of indexing and slicing, if there is insufficient
test coverage and it doesn't raise an exception?
If we find it, who fixes all the legacy code? (I don't think it will
be minor, unless there is a new method `fix_[...]` (fancy ix).)

Josef


 Josef



 If people want to implement orthogonal indexing with another method,
 by all means I might use it at some point in the future. However,
 adding even more complexity to the behaviour of the bracket slicing
 is probably not a good idea.

 I'm not advocating adding even more complexity, I'm trying to think
 about ways to make it *less* complex from the typical user's standpoint.

 Eric
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-02 Thread josef.pktd
On Thu, Apr 2, 2015 at 8:02 PM, Eric Firing efir...@hawaii.edu wrote:
 On 2015/04/02 1:14 PM, Hanno Klemm wrote:
 Well, I have written quite a bit of code that relies on fancy
 indexing, and I think the question of whether the behaviour of the []
 operator should be changed has sailed, with numpy now at version 1.9.
 Given the amount of packages that rely on numpy, changing this
 fundamental behaviour would not be a clever move.

 Are you *positive* that there is no clever way to make a transition?
 It's not worth any further thought?

I guess it would be similar to python 3 string versus bytes, but
without the overwhelming benefits.

I don't think I would be in favor of deprecating fancy indexing even
if it were possible. In general, my impression is that if there is a
trade-off in numpy between powerful machinery versus easy to learn and
teach, then the design philosophy went in favor of power.

I think numpy indexing is not too difficult and follows a consistent
pattern, and I completely avoid mixing slices and index arrays with
ndim > 2.

I think it should be DOA, except as a discussion topic for numpy 3000.

just my opinion

Josef



 If people want to implement orthogonal indexing with another method,
 by all means I might use it at some point in the future. However,
 adding even more complexity to the behaviour of the bracket slicing
 is probably not a good idea.

 I'm not advocating adding even more complexity, I'm trying to think
 about ways to make it *less* complex from the typical user's standpoint.

 Eric
 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-02 Thread Matthew Brett
Hi,

On Thu, Apr 2, 2015 at 6:09 PM,  josef.p...@gmail.com wrote:
 On Thu, Apr 2, 2015 at 8:02 PM, Eric Firing efir...@hawaii.edu wrote:
 On 2015/04/02 1:14 PM, Hanno Klemm wrote:
 Well, I have written quite a bit of code that relies on fancy
 indexing, and I think the question of whether the behaviour of the []
 operator should be changed has sailed, with numpy now at version 1.9.
 Given the amount of packages that rely on numpy, changing this
 fundamental behaviour would not be a clever move.

 Are you *positive* that there is no clever way to make a transition?
 It's not worth any further thought?

 I guess it would be similar to python 3 string versus bytes, but
 without the overwhelming benefits.

 I don't think I would be in favor of deprecating fancy indexing even
 if it were possible. In general, my impression is that if there is a
 trade-off in numpy between powerful machinery versus easy to learn and
 teach, then the design philosophy went in favor of power.

 I think numpy indexing is not too difficult and follows a consistent
 pattern, and I completely avoid mixing slices and index arrays with
 ndim > 2.

I'm sure y'all are totally on top of this, but for myself, I would
like to distinguish:

* fancy indexing with boolean arrays - I use it all the time and don't
get confused;
* fancy indexing with non-boolean arrays - horrendously confusing,
almost never use it, except on a single axis when I can't confuse it
with orthogonal indexing:

In [3]: a = np.arange(24).reshape(6, 4)

In [4]: a
Out[4]:
array([[ 0,  1,  2,  3],
   [ 4,  5,  6,  7],
   [ 8,  9, 10, 11],
   [12, 13, 14, 15],
   [16, 17, 18, 19],
   [20, 21, 22, 23]])

In [5]: a[[1, 2, 4]]
Out[5]:
array([[ 4,  5,  6,  7],
   [ 8,  9, 10, 11],
   [16, 17, 18, 19]])

I also remember a discussion with Travis O where he was also saying
that this indexing was confusing and that it would be good if there
was some way to transition to what he called outer product indexing (I
think that's the same as 'orthogonal' indexing).

 I think it should be DOA, except as a discussion topic for numpy 3000.

I think there are two proposals here:

1) Add some syntactic sugar to allow orthogonal indexing of numpy
arrays, no backward compatibility break.

That seems like a very good idea to me - were there any big objections to that?

2) Over some long time period, move the default behavior of np.array
non-boolean indexing from the current behavior to the orthogonal
behavior.

That is going to be very tough, because it will cause very confusing
breakage of legacy code.

On the other hand, maybe it is worth going some way towards that, like this:

* implement orthogonal indexing as a method arr.sensible_index[...]
* implement the current non-boolean fancy indexing behavior as a
method - arr.crazy_index[...]
* deprecate non-boolean fancy indexing as standard arr[...] indexing;
* wait a long time;
* remove non-boolean fancy indexing as standard arr[...] (errors are
preferable to change in behavior)

Then if we are brave we could:

* wait a very long time;
* make orthogonal indexing the default.

But the not-brave steps above seem less controversial, and fairly reasonable.

What about that as an approach?

Cheers,

Matthew
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-02 Thread Hanno Klemm

 On 03 Apr 2015, at 00:04, Colin J. Williams c...@ncf.ca wrote:
 
 
 
 On 02-Apr-15 4:35 PM, Eric Firing wrote:
 On 2015/04/02 10:22 AM, josef.p...@gmail.com wrote:
 Swapping the axis when slices are mixed with fancy indexing was a
 design mistake, IMO. But not fancy indexing itself.
 I'm not saying there should be no fancy indexing capability; I am saying
 that it should be available through a function or method, rather than
 via the square brackets.  Square brackets should do things that people
 expect them to do--the most common and easy-to-understand style of indexing.
 
 Eric
 +1

Well, I have written quite a bit of code that relies on fancy indexing, and I 
think the question of whether the behaviour of the [] operator should be
changed has sailed, with numpy now at version 1.9. Given the amount of
packages that rely on numpy, changing this fundamental behaviour would not
be a clever move.

If people want to implement orthogonal indexing with another method, by all 
means I might use it at some point in the future. However, adding even more 
complexity to the behaviour of the bracket slicing is probably not a good idea.

Hanno


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-02 Thread Eric Firing
On 2015/04/02 1:14 PM, Hanno Klemm wrote:
 Well, I have written quite a bit of code that relies on fancy
 indexing, and I think the question of whether the behaviour of the []
 operator should be changed has sailed, with numpy now at version 1.9.
 Given the amount of packages that rely on numpy, changing this
 fundamental behaviour would not be a clever move.

Are you *positive* that there is no clever way to make a transition? 
It's not worth any further thought?


 If people want to implement orthogonal indexing with another method,
 by all means I might use it at some point in the future. However,
 adding even more complexity to the behaviour of the bracket slicing
 is probably not a good idea.

I'm not advocating adding even more complexity, I'm trying to think 
about ways to make it *less* complex from the typical user's standpoint.

Eric
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-02 Thread Nathaniel Smith
On Thu, Apr 2, 2015 at 6:35 PM,  josef.p...@gmail.com wrote:
 (I thought about this because I was looking at accessing off-diagonal
 elements, m2[np.arange(4), np.arange(4) + 1] )

Psst: np.diagonal(m2, offset=1)

-- 
Nathaniel J. Smith -- http://vorpus.org
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-02 Thread Benjamin Root
The distinction that boolean indexing has over the other two methods of
indexing is that it can guarantee that it references a position at most
once. Slicing and scalar indexes are also this way, which is why these
methods allow for in-place assignments. I don't see boolean indexing as
an extension of orthogonal indexing because of that.
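
A small sketch of why that at-most-once guarantee matters in practice
(standard numpy behaviour, nothing new here): with repeated integer
indices the buffered in-place update lands only once per position,
whereas a boolean mask can never repeat a position.

import numpy as np

a = np.zeros(3)
# index 0 appears twice, but the read-modify-write buffering means the
# increment is applied only once per referenced position
a[[0, 0, 1]] += 1
print(a)            # [ 1.  1.  0.]

b = np.zeros(3)
# a boolean mask references each position at most once, so there is no
# such surprise
b[np.array([True, False, True])] += 1
print(b)            # [ 1.  0.  1.]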

Ben Root

On Thu, Apr 2, 2015 at 2:41 PM, Stephan Hoyer sho...@gmail.com wrote:

 On Thu, Apr 2, 2015 at 11:03 AM, Eric Firing efir...@hawaii.edu wrote:

 Fancy indexing is a horrible design mistake--a case of cleverness run
 amok.  As you can read in the Numpy documentation, it is hard to
 explain, hard to understand, hard to remember.


 Well put!

 I also failed to correctly predict your example.


 So I think you should turn the question around and ask, "What is the
 actual real-world use case for fancy indexing?  How often does real
 code rely on it?"


 I'll just note that indexing with a boolean array with the same shape as
 the array (e.g., x[x > 0] when x has greater than 1 dimension) technically
 falls outside a strict interpretation of orthogonal indexing. But there's
 not any ambiguity in adding that as an extension to orthogonal indexing
 (which otherwise does not allow ndim > 1), so I think your point still
 stands.

 Stephan

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-02 Thread Jaime Fernández del Río
On Thu, Apr 2, 2015 at 1:29 AM, Stephan Hoyer sho...@gmail.com wrote:

 On Wed, Apr 1, 2015 at 7:06 AM, Jaime Fernández del Río 
 jaime.f...@gmail.com wrote:

 Is there any other package implementing non-orthogonal indexing aside
 from numpy?


 I think we can safely say that NumPy's implementation of broadcasting
 indexing is unique :).

 The issue is that many other packages rely on numpy for implementation of
 custom array objects (e.g., scipy.sparse and scipy.io.netcdf). It's not
 immediately obvious what sort of indexing these objects represent.

 If the functionality is lacking, e,g, use of slices in `np.ix_`, I'm all
 for improving that to provide the full functionality of orthogonal
 indexing. I just need a little more convincing that those new
 attributes/indexers are going to ever see any real use.


 Orthogonal indexing is close to the norm for packages that implement
 labeled data structures, both because it's easier to understand and
 implement, and because it's difficult to maintain associations with labels
 through complex broadcasting indexing.

 Unfortunately, the lack of a full featured implementation of orthogonal
 indexing has led to that wheel being reinvented at least three times (in
 Iris, xray [1] and pandas). So it would be nice to have a canonical
 implementation that supports slices and integers in numpy for that reason
 alone. This could be done by building on the existing `np.ix_` function,
 but a new indexer seems more elegant: there's just much less noise with
 `arr.ix_[:1, 2, [3]]` than `arr[np.ix_(slice(1), 2, [3])]`.
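
 If np.ix_ grew slice support, the remapping could be as small as the
 following hypothetical helper (the name ortho_index and the choice to
 keep the axis of an integer index are made up here; a real indexer
 would have to pin those details down):

 import numpy as np

 def ortho_index(arr, key):
     # expand slices (and wrap scalars) into 1-D index arrays so that
     # np.ix_ can emulate orthogonal indexing
     parts = []
     for axis, k in enumerate(key):
         if isinstance(k, slice):
             parts.append(np.arange(arr.shape[axis])[k])
         else:
             parts.append(np.atleast_1d(k))
     return arr[np.ix_(*parts)]

 a = np.arange(24).reshape(4, 3, 2)
 print(ortho_index(a, (slice(None, 2), 1, [0, 1])).shape)   # (2, 1, 2)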

 It's also well known that indexing with __getitem__ can be much slower
 than np.take. It seems plausible to me that a careful implementation of
 orthogonal indexing could close or eliminate this speed gap, because the
 model for orthogonal indexing is so much simpler than that for broadcasting
 indexing: each element of the key tuple can be applied separately along the
 corresponding axis.
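
 As a rough illustration of that simpler model (just a sketch, not a
 tuned implementation), orthogonal indexing with 1-D integer arrays can
 be written as one np.take per axis:

 import numpy as np

 def ortho_take(arr, indices_per_axis):
     # apply each index array to its own axis, one np.take at a time
     out = arr
     for axis, idx in enumerate(indices_per_axis):
         out = np.take(out, idx, axis=axis)
     return out

 a = np.arange(60).reshape(3, 4, 5)
 print(ortho_take(a, ([0, 2], [1, 3], [0, 1, 4])).shape)   # (2, 2, 3)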

 So I think there could be a real benefit to having the feature in numpy.
 In particular, if somebody is up for implementing it in C or Cython, I
 would be very pleased.

  Cheers,
 Stephan

 [1] Here is my implementation of remapping from orthogonal to broadcasting
 indexing. It works, but it's a real mess, especially because I try to
 optimize by minimizing the number of times slices are converted into arrays:

 https://github.com/xray/xray/blob/0d164d848401209971ded33aea2880c1fdc892cb/xray/core/indexing.py#L68


I believe you can leave all slices unchanged if you later reshuffle your
axes. Basically all the fancy-indexed axes go in the front of the shape in
order, and the subspace follows, e.g.:

>>> a = np.arange(60).reshape(3, 4, 5)
>>> a[np.array([1])[:, None], ::2, np.array([1, 2, 3])].shape
(1, 3, 2)

So you would need to swap the second and last axes and be done. You would
not get a contiguous array without a copy, but that's a different story.
Assigning to an orthogonally indexed subarray is an entirely different
beast, not sure if there is a use case for that.
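
In code, the reshuffling for the example above would be something like
this (a sketch using nothing beyond current numpy):

import numpy as np

a = np.arange(60).reshape(3, 4, 5)

# mixed fancy/slice indexing: the broadcast index dimensions go first,
# the sliced subspace follows, giving shape (1, 3, 2)
mixed = a[np.array([1])[:, None], ::2, np.array([1, 2, 3])]
print(mixed.shape)        # (1, 3, 2)

# swapping the second and last axes recovers the (1, 2, 3) layout that
# an orthogonal reading of the same index would produce
ortho_like = mixed.swapaxes(1, 2)
print(ortho_like.shape)   # (1, 2, 3)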

We probably need more traction on the "should this be done?" discussion
than on the "can this be done?" one; the need for a reordering of the axes
swings me slightly in favor, but I mostly don't see it yet. Nathaniel
usually has good insights on the "who we are, where do we come from, where
are we going to" type of questions; it would be good to have him chime in.

Jaime

-- 
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-02 Thread Stephan Hoyer
On Thu, Apr 2, 2015 at 11:03 AM, Eric Firing efir...@hawaii.edu wrote:

 Fancy indexing is a horrible design mistake--a case of cleverness run
 amok.  As you can read in the Numpy documentation, it is hard to
 explain, hard to understand, hard to remember.


Well put!

I also failed to correctly predict your example.


 So I think you should turn the question around and ask, "What is the
 actual real-world use case for fancy indexing?  How often does real
 code rely on it?"


I'll just note that indexing with a boolean array with the same shape as
the array (e.g., x[x > 0] when x has greater than 1 dimension) technically
falls outside a strict interpretation of orthogonal indexing. But there's
not any ambiguity in adding that as an extension to orthogonal indexing
(which otherwise does not allow ndim > 1), so I think your point still
stands.
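
For concreteness, the case I mean is the familiar full-shape mask
(plain numpy, nothing new):

import numpy as np

x = np.arange(-3, 3).reshape(2, 3)

# the mask has the same 2-D shape as x; the selection is element-wise
# and the result is flattened, so it is not per-axis (orthogonal),
# but there is only one sensible reading of it
print(x[x > 0])     # [1 2]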

Stephan
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-02 Thread Eric Firing
On 2015/04/02 4:15 AM, Jaime Fernández del Río wrote:
 We probably need more traction on the "should this be done?" discussion
 than on the "can this be done?" one; the need for a reordering of the
 axes swings me slightly in favor, but I mostly don't see it yet.

As a long-time user of numpy, and an advocate and teacher of Python for 
science, here is my perspective:

Fancy indexing is a horrible design mistake--a case of cleverness run 
amok.  As you can read in the Numpy documentation, it is hard to 
explain, hard to understand, hard to remember.  Its use easily leads to 
unreadable code and hard-to-see errors.  Here is the essence of an 
example that a student presented me with just this week, in the context 
of reordering eigenvectors based on argsort applied to eigenvalues:

In [25]: xx = np.arange(2*3*4).reshape((2, 3, 4))

In [26]: ii = np.arange(4)

In [27]: print(xx[0])
[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

In [28]: print(xx[0, :, ii])
[[ 0  4  8]
  [ 1  5  9]
  [ 2  6 10]
  [ 3  7 11]]

Quickly now, how many numpy users would look at that last expression and
say, "Of course, that is equivalent to transposing xx[0]"?  And, "Of
course that expression should give a completely different result from
xx[0][:, ii]."?

I would guess it would be less than 1%.  That should tell you right away 
that we have a real problem here.  Fancy indexing can't be *read* by a 
sub-genius--it has to be laboriously figured out piece by piece, with 
frequent reference to the baffling descriptions in the Numpy docs.
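
For the record, the mechanics behind that surprise (this is just the
standard broadcasting rule for mixed indices, spelled out): the scalar 0
and the array ii are both advanced indices, and because a slice sits
between them their broadcast result is moved to the front of the output,
so the sliced axis ends up last.

import numpy as np

xx = np.arange(2 * 3 * 4).reshape((2, 3, 4))
ii = np.arange(4)

# the broadcast of 0 and ii (shape (4,)) goes first, the slice axis last
print(xx[0, :, ii].shape)                            # (4, 3)

# hence the result is the transpose of the two-step version
print(np.array_equal(xx[0, :, ii], xx[0][:, ii].T))  # True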

So I think you should turn the question around and ask, "What is the
actual real-world use case for fancy indexing?  How often does real
code rely on it?"  I have taken advantage of it occasionally, maybe you
have too, but I think a survey of existing code would show that the need 
for it is *far* less common than the need for simple orthogonal 
indexing.  That tells me that it is fancy indexing, not orthogonal 
indexing, that should be available through a function and/or special 
indexing attribute.  The question is then how to make that transition.

Eric





___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-01 Thread R Hattersley
There are two different interpretations in common use of how to handle
multi-valued (array/sequence) indexes. The numpy style is to consider all
multi-valued indices together which allows arbitrary points to be
extracted. The orthogonal style (e.g. as provided by netcdf4-python) is to
consider each multi-valued index independently.

For example:

>>> type(v)
<type 'netCDF4.Variable'>
>>> v.shape
(240, 37, 49)
>>> v[(0, 1), (0, 2, 3)].shape
(2, 3, 49)
>>> np.array(v)[(0, 1), (0, 2, 3)].shape
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: shape mismatch: indexing arrays could not be broadcast together
with shapes (2,) (3,)


In a netcdf4-python GitHub issue
https://github.com/Unidata/netcdf4-python/issues/385 the authors of
various orthogonal indexing packages have been discussing how to
distinguish the two behaviours and have currently settled on a boolean
__orthogonal_indexing__ attribute.

1. Is there any appetite for adding that attribute (with the value `False`)
to ndarray?

2. As suggested by shoyer
https://github.com/Unidata/netcdf4-python/issues/385#issuecomment-87775034,
is there any appetite for adding an alternative indexer to ndarray where
__orthogonal_indexing__ = True? For example: myarray.ix_[(0,1), (0, 2, 3)]

Richard
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] Advanced indexing: fancy vs. orthogonal

2015-04-01 Thread Jaime Fernández del Río
On Wed, Apr 1, 2015 at 2:17 AM, R Hattersley rhatters...@gmail.com wrote:

 There are two different interpretations in common use of how to handle
 multi-valued (array/sequence) indexes. The numpy style is to consider all
 multi-valued indices together which allows arbitrary points to be
 extracted. The orthogonal style (e.g. as provided by netcdf4-python) is to
 consider each multi-valued index independently.

 For example:

 >>> type(v)
 <type 'netCDF4.Variable'>
 >>> v.shape
 (240, 37, 49)
 >>> v[(0, 1), (0, 2, 3)].shape
 (2, 3, 49)
 >>> np.array(v)[(0, 1), (0, 2, 3)].shape
 Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
 IndexError: shape mismatch: indexing arrays could not be broadcast together
 with shapes (2,) (3,)


 In a netcdf4-python GitHub issue
 https://github.com/Unidata/netcdf4-python/issues/385 the authors of
 various orthogonal indexing packages have been discussing how to
 distinguish the two behaviours and have currently settled on a boolean
 __orthogonal_indexing__ attribute.

 1. Is there any appetite for adding that attribute (with the value
 `False`) to ndarray?

 2. As suggested by shoyer
 https://github.com/Unidata/netcdf4-python/issues/385#issuecomment-87775034,
 is there any appetite for adding an alternative indexer to ndarray where
 __orthogonal_indexing__ = True? For example: myarray.ix_[(0,1), (0, 2, 3)]


Is there any other package implementing non-orthogonal indexing aside from
numpy? I understand that it would be nice to do:

if x.__orthogonal_indexing__:
    return x[idx]
else:
    return x.ix_[idx]

But I think you would get the exact same result doing:

if isinstance(x, np.ndarray):
    return x[np.ix_(*idx)]
else:
    return x[idx]

If `not x.__orthogonal_indexing__` is going to be a proxy for
`isinstance(x, ndarray)`, I don't really see the point of disguising it;
explicit is better than implicit and all that.

If the functionality is lacking, e,g, use of slices in `np.ix_`, I'm all
for improving that to provide the full functionality of orthogonal
indexing. I just need a little more convincing that those new
attributes/indexers are going to ever see any real use.

Jaime


 Richard

 ___
 NumPy-Discussion mailing list
 NumPy-Discussion@scipy.org
 http://mail.scipy.org/mailman/listinfo/numpy-discussion




-- 
(\__/)
( O.o)
(  ) Este es Conejo. Copia a Conejo en tu firma y ayúdale en sus planes
de dominación mundial.
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion