I agree with Dag: NumPy should provide consistent handling of empty arrays. That does require some work, but any case where NumPy fails to do so should at least be declared a bug.
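To make the behaviour under discussion concrete, here is a minimal sketch. It assumes a NumPy version in which indexing with an empty integer array is accepted. The names `select_columns` and `safe_det` are hypothetical helpers made up for illustration, not NumPy functions, and the explicit 0-by-0 check is just the "extra if-test" mentioned in the quoted thread below.

```python
import numpy as np


def select_columns(A, selected_columns):
    """Return A[:, selected_columns], also when nothing is selected."""
    # An explicit integer dtype keeps an empty selection a valid index array.
    idx = np.asarray(selected_columns, dtype=np.intp)
    return A[:, idx]


def safe_det(a):
    """Hypothetical wrapper: determinant with an explicit 0-by-0 check."""
    a = np.asarray(a)
    if a.shape == (0, 0):
        # The determinant of the empty matrix is the empty product, i.e. 1.
        return 1.0
    return np.linalg.det(a)


A = np.zeros((3, 5))
B = select_columns(A, [])   # m-by-0 matrix, no special-casing by the caller
C = np.ones((0, 5))         # 0-by-n matrix
product = np.dot(B, C)      # 3-by-5 matrix of zeros
```

With helpers like these the caller never needs its own `len(selected_columns) == 0` branch; the point of the thread is that NumPy itself should ideally behave this way.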
Travis

--
Travis Oliphant
(on a mobile)
512-826-7480

On Dec 28, 2011, at 7:45 AM, Dag Sverre Seljebotn <d.s.seljeb...@astro.uio.no> wrote:

> On 12/28/2011 02:21 PM, Ralf Gommers wrote:
>>
>> On Wed, Dec 28, 2011 at 1:57 PM, Dag Sverre Seljebotn
>> <d.s.seljeb...@astro.uio.no> wrote:
>>
>> On 12/28/2011 01:52 PM, Dag Sverre Seljebotn wrote:
>>> On 12/28/2011 09:33 AM, Ralf Gommers wrote:
>>>>
>>>> 2011/12/27 Jordi Gutiérrez Hermoso <jord...@octave.org>
>>>>
>>>> On 26 December 2011 14:56, Ralf Gommers <ralf.gomm...@googlemail.com> wrote:
>>>>>
>>>>> On Mon, Dec 26, 2011 at 8:50 PM, <josef.p...@gmail.com> wrote:
>>>>>> I have a hard time thinking through empty 2-dim arrays, and don't know
>>>>>> what rules should apply.
>>>>>> However, in my code I might want to catch these cases rather early
>>>>>> than late and then having to work my way backwards to find out where
>>>>>> the content disappeared.
>>>>>
>>>>> Same here. Almost always, my empty arrays are either due to bugs or they
>>>>> signal that I do need to special-case something. Silent passing through of
>>>>> empty arrays to all numpy functions is not what I would want.
>>>>
>>>> I find it quite annoying to treat the empty set with special
>>>> deference. "All of my great-grandkids live in Antarctica" should be
>>>> true for me (I'm only 30 years old). If you decide that is not true
>>>> for me, it leads to a bunch of other logical annoyances up there
>>>>
>>>> Guess you don't mean true/false, because it's neither. But I understand
>>>> you want an empty array back instead of an error.
>>>>
>>>> Currently the problem is that when you do get that empty array back,
>>>> you'll then use that for something else and it will probably still
>>>> crash. Many numpy functions do not check for empty input and will still
>>>> give exceptions. My impression is that you're better off handling these
>>>> where you create the empty array, rather than in some random place later
>>>> on. The alternative is to have consistent rules for empty arrays, and
>>>> handle them explicitly in all functions. Can be done, but is of course a
>>>> lot of work and has some overhead.
>>>
>>> Are you saying that the existence of other bugs means that this bug
>>> shouldn't be fixed? I just fail to see the relevance of these other bugs
>>> to this discussion.
>>
>> See below.
>>
>>> For the record, I've encountered this bug many times myself and it's
>>> rather irritating, since it leads to more verbose code.
>>>
>>> It is useful whenever you want to return data that is a subset of the
>>> input data (since the selected subset can sometimes be zero-sized;
>>> remember, in computer science the only numbers are 0, 1,
>>> and "any number").
>>>
>>> Here's one of the examples I've had. The Interpolative Decomposition
>>> decomposes an m-by-n matrix A of rank k as
>>>
>>>     A = B C
>>>
>>> where B is an m-by-k matrix consisting of a subset of the columns of A,
>>> and C is a k-by-n matrix.
>>>
>>> Now, if A is all zeros (which is often the case for me), then k is 0. I
>>> would still like to create the m-by-0 matrix B by doing
>>>
>>>     B = A[:, selected_columns]
>>>
>>> But now I have to do this instead:
>>>
>>>     if len(selected_columns) == 0:
>>>         B = np.zeros((A.shape[0], 0), dtype=A.dtype)
>>>     else:
>>>         B = A[:, selected_columns]
>>>
>>> In this case, zero-sized B and C are of course perfectly valid and
>>> useful results:
>>>
>>> In [2]: np.dot(np.ones((3,0)), np.ones((0, 5)))
>>> Out[2]:
>>> array([[ 0., 0., 0., 0., 0.],
>>>        [ 0., 0., 0., 0., 0.],
>>>        [ 0., 0., 0., 0., 0.]])
>>>
>>
>> And to answer the obvious question: Yes, this is a real use case. It is
>> used for something similar to image compression, where sub-sections of
>> the images may well be all-zero and have zero rank (full story at [1]).
>>
>> Thanks for the example. I was a little surprised that dot works. Then I
>> read what Wikipedia had to say about empty arrays. It mentions dot like
>> you do, and that the determinant of the 0-by-0 matrix is 1. So I try:
>>
>> In [1]: a = np.zeros((0,0))
>>
>> In [2]: a
>> Out[2]: array([], shape=(0, 0), dtype=float64)
>>
>> In [3]: np.linalg.det(a)
>> Parameter 4 to routine DGETRF was incorrect
>> <segfault>
>
> :-)
>
> Well, a segfault is most certainly a bug, so this must be fixed one way
> or the other anyway, and returning 1 seems at least as good a
> solution as raising an exception. Both solutions require an extra if-test.
>
>>
>> Reading the above thread I understand Ralf's reasoning better, but
>> really, relying on NumPy's buggy behaviour to discover bugs in user code
>> seems like the wrong approach. Tools should be dumb unless there are
>> good reasons to make them smart. I'd be rather irritated about my hammer
>> if it refused to drive in nails that it decided were in the wrong spot.
>>
>> The point is not that we shouldn't fix it, but that it's a waste of time
>> to fix it in only one place. I remember fixing several functions to
>> explicitly check for empty arrays and then returning an empty array or
>> giving a sensible error.
>>
>> So can you answer my question: do you think it's worth the time and
>> computational overhead to handle empty arrays in all functions?
>
> I'd hope the computational overhead is negligible?
>
> I do believe that handling this correctly everywhere is the right thing
> to do and would improve overall code quality (as witnessed by the
> segfault found above).
>
> Of course, likely nobody is ready to actually perform all that work. So
> the right thing to do seems to be to state that any place where NumPy does
> not handle zero-size arrays is a bug, but not do anything about it until
> somebody actually submits a patch. That means ending this email
> discussion by verifying that this is indeed a bug on Trac, and then waiting
> to see if anybody bothers to submit a patch.
>
> Dag Sverre
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion