On 5 May 2011, at 22:31, Ralf Gommers wrote:

> On Thu, May 5, 2011 at 9:46 PM, Benjamin Root <ben.r...@ou.edu> wrote:
>
> > On Thu, May 5, 2011 at 2:33 PM, Ralf Gommers <ralf.gomm...@googlemail.com> wrote:
> >
> > > On Thu, May 5, 2011 at 9:18 PM, Benjamin Root <ben.r...@ou.edu> wrote:
> > >
> > > > On Thu, May 5, 2011 at 1:08 PM, Paul Anton Letnes <paul.anton.let...@gmail.com> wrote:
> > > >
> > > > > On 5. mai 2011, at 08.49, Benjamin Root wrote:
> > > > >
> > > > > > On Wed, May 4, 2011 at 11:08 PM, Paul Anton Letnes <paul.anton.let...@gmail.com> wrote:
> > > > > >
> > > > > > > On 4. mai 2011, at 20.33, Benjamin Root wrote:
> > > > > > >
> > > > > > > > On Wed, May 4, 2011 at 7:54 PM, Derek Homeier <de...@astro.physik.uni-goettingen.de> wrote:
> > > > > > > >
> > > > > > > > > On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote:
> > > > > > > > >
> > > > > > > > > > But: aren't the numpy.atleast_2d and numpy.atleast_1d functions written for this? Shouldn't we reuse them? Perhaps it's overkill, and perhaps it will reintroduce the 'transposed' problem?
> > > > > > > > >
> > > > > > > > > Yes, good point: one could replace X.shape = (X.size,) with X = np.atleast_1d(X), but for the ndmin=2 case we'd need to replace X.shape = (X.size, 1) with X = np.atleast_2d(X).T. Not sure which solution is more efficient in terms of memory access etc.
> > > > > > > > >
> > > > > > > > > Cheers,
> > > > > > > > > Derek
> > > > > > > >
> > > > > > > > I can confirm that the current behavior is not sufficient for all of the original corner cases that ndmin was supposed to address. Keep in mind that np.loadtxt takes a one-column data file and a one-row data file down to the same shape. I don't see how the current code is able to produce the correct array shape when ndmin=2. Do we have some sort of counter in loadtxt for counting the number of rows and columns read? Could we use those to help guide the ndmin=2 case?
> > > > > > > >
> > > > > > > > I think that using atleast_1d(X) might be a bit overkill, but it would be very clear as to the code's intent.
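[Editorial note: the corner case Ben describes above can be reproduced directly. The following small demo is not part of the original thread; it assumes loadtxt's default ndmin=0 behavior.]

```python
import io
import numpy as np

# Demo of the corner case discussed above (not part of the original
# thread): with the default ndmin=0, a one-row file and a one-column
# file load to the same 1-D shape, so row/column orientation is lost.
one_row = io.StringIO("1 2 3 4 5\n")       # single row, five columns
one_col = io.StringIO("1\n2\n3\n4\n5\n")   # five rows, single column

a = np.loadtxt(one_row)
b = np.loadtxt(one_col)
print(a.shape, b.shape)  # both come back as (5,)
```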
> > > > > > > > I don't think we have to worry about memory usage if we limit its use to only situations where ndmin is greater than the number of dimensions of the array. In those cases, the array is either an empty result, a scalar value (where memory access is trivial), or 1-d (where a transpose is cheap).
> > > > > > >
> > > > > > > What if one does things the other way around - avoid calling squeeze until _after_ doing the atleast_Nd() magic? That way the row/column information should be conserved, right? Also, we avoid transposing, memory use, ...
> > > > > > >
> > > > > > > Oh, and someone could conceivably have a _looong_ 1D file, but would want it read as a 2D array.
> > > > > > >
> > > > > > > Paul
> > > > > >
> > > > > > @Derek, good catch with noticing the error in the tests. We do still need to handle the case I mentioned, however. I have attached an example script to demonstrate the issue. In this script, I would expect the second-to-last array to have a shape of (1, 5). I believe that the single-row, multi-column case would actually be a more common edge case for users to encounter than the others. Therefore, I believe that this ndmin fix is not adequate until this is addressed.
> > > > > >
> > > > > > @Paul, we can't call squeeze after doing the atleast_Nd() magic. That would just undo whatever we had just done. Also, wrt the transpose, a (1, 100000) array looks the same in memory as a (100000, 1) array, right?
> > > > >
> > > > > Agree. I thought more along the lines of (pseudocode-ish):
> > > > >
> > > > > if ndmin == 0:
> > > > >     squeeze()
> > > > > if ndmin == 1:
> > > > >     atleast_1D()
> > > > > elif ndmin == 2:
> > > > >     atleast_2D()
> > > > > else:
> > > > >     # I don't rightly know what would go here, maybe raise ValueError?
> > > > >
> > > > > That would avoid the squeeze call before the atleast_Nd magic. But the code was changed, so I think my comment doesn't make sense anymore. It's probably fine the way it is!
> > > > > Paul
> > > >
> > > > I have thought of that too, but the problem with that approach is that after reading the file, X will have 2 or 3 dimensions, regardless of how many singleton dims were in the file. A squeeze will always be needed. Also, the purpose of squeeze is the opposite of that of the atleast_*d() functions: squeeze reduces dimensions, while atleast_*d will add dimensions.
> > > >
> > > > Therefore, I re-iterate: the patch by Derek gets the job done. I have tested it for a wide variety of inputs for both regular arrays and record arrays. Is there room for improvements? Yes, but I think that can wait for later. Derek's patch, however, fixes an important bug in the ndmin implementation and should be included for the release.

Thanks for the additional tests and discussion, Paul and Ben! Yes, I think the main issue is that multi-column input first comes back as 3-dimensional, with a "singleton" dimension in front, so I was getting rid of that first; maybe I could add a comment on that. Single-column, multi-row input apparently already comes back as 1-dimensional, so we only need the expansion there.
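[Editorial note: putting the pieces of the discussion together, the squeeze-then-expand logic might be sketched as below. This is a hypothetical helper based on the thread, not Derek's actual patch; the function name is invented for illustration.]

```python
import numpy as np

def enforce_ndmin(X, ndmin):
    # Hypothetical sketch of the logic discussed in the thread, not
    # the actual loadtxt code. Squeeze only when the array has MORE
    # dimensions than requested, so a one-row file read back as
    # (1, N) keeps its shape under ndmin=2 instead of being
    # collapsed and rebuilt as a column.
    if ndmin not in (0, 1, 2):
        raise ValueError("Illegal value of ndmin keyword: %s" % ndmin)
    if X.ndim > ndmin:
        X = np.squeeze(X)
    if X.ndim < ndmin:
        if ndmin == 1:
            X = np.atleast_1d(X)
        elif ndmin == 2:
            # A 1-D result can only have come from a single column,
            # so expand it back to a column vector.
            X = np.atleast_2d(X).T
    return X

print(enforce_ndmin(np.ones((1, 5)), 2).shape)  # (1, 5): row kept
print(enforce_ndmin(np.ones(5), 2).shape)       # (5, 1): column restored
print(enforce_ndmin(np.ones((5, 1)), 0).shape)  # (5,): squeezed
```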
> > > Two questions: can you point me to the patch/ticket, and is this a regression?
> > >
> > > Thanks,
> > > Ralf

This was ticket #1562; it had been closed as "resolved" in the meantime, though.

> > I don't know if he did a pull-request or not, but here is the link he provided earlier in the thread.
> >
> > https://github.com/dhomeier/numpy/compare/master...ndmin-cols
> >
> > Technically, this is not a "regression", as the ndmin feature is new in this release.
>
> Yes right, I forgot this was a recent change.
>
> > However, the problem that ndmin is supposed to address is not fixed by the current implementation for the rc. Essentially, a single-row, multi-column file with ndmin=2 comes out as an Nx1 array, which is the same result as for a multi-row, single-column file. My feeling is that if we let the current implementation stand as is, and developers use it in their code, then fixing it in a later release would introduce more problems (maybe the devels would transpose the result themselves or something). Better to fix it now in the rc with the two lines of code (and the correction to the tests) than to introduce a buggy feature that will be hard to fix in future releases, IMHO.
>
> Looks okay, and I agree that it's better to fix it now. The timing is a bit unfortunate though, just after RC2. I'll have a closer look tomorrow and, if it can go in, probably tag RC3.
>
> If in the meantime a few more people could test this, that would be helpful.
>
> Ralf

I agree; I wish I had time to push this before rc2. I could add the explanatory comments mentioned above, switch to the atleast_[12]d() solution, test that, and push it in a couple of minutes. Or would it be better to leave it as is for now, for testing?

Cheers,
Derek

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion