On Thu, May 5, 2011 at 9:46 PM, Benjamin Root <ben.r...@ou.edu> wrote:
> > > On Thu, May 5, 2011 at 2:33 PM, Ralf Gommers > <ralf.gomm...@googlemail.com>wrote: > >> >> >> On Thu, May 5, 2011 at 9:18 PM, Benjamin Root <ben.r...@ou.edu> wrote: >> >>> >>> >>> On Thu, May 5, 2011 at 1:08 PM, Paul Anton Letnes < >>> paul.anton.let...@gmail.com> wrote: >>> >>>> >>>> On 5. mai 2011, at 08.49, Benjamin Root wrote: >>>> >>>> > >>>> > >>>> > On Wed, May 4, 2011 at 11:08 PM, Paul Anton Letnes < >>>> paul.anton.let...@gmail.com> wrote: >>>> > >>>> > On 4. mai 2011, at 20.33, Benjamin Root wrote: >>>> > >>>> > > On Wed, May 4, 2011 at 7:54 PM, Derek Homeier < >>>> de...@astro.physik.uni-goettingen.de> wrote: >>>> > > On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote: >>>> > > >>>> > > > But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions >>>> written for this? Shouldn't we reuse them? Perhaps it's overkill, and >>>> perhaps it will reintroduce the 'transposed' problem? >>>> > > >>>> > > Yes, good point, one could replace the >>>> > > X.shape = (X.size, ) with X = np.atleast_1d(X), >>>> > > but for the ndmin=2 case, we'd need to replace >>>> > > X.shape = (X.size, 1) with X = np.atleast_2d(X).T - >>>> > > not sure which solution is more efficient in terms of memory access >>>> etc... >>>> > > >>>> > > Cheers, >>>> > > Derek >>>> > > >>>> > > >>>> > > I can confirm that the current behavior is not sufficient for all of >>>> the original corner cases that ndmin was supposed to address. Keep in mind >>>> that np.loadtxt takes a one-column data file and a one-row data file down >>>> to >>>> the same shape. I don't see how the current code is able to produce the >>>> correct array shape when ndmin=2. Do we have some sort of counter in >>>> loadtxt for counting the number of rows and columns read? Could we use >>>> those to help guide the ndmin=2 case? >>>> > > >>>> > > I think that using atleast_1d(X) might be a bit overkill, but it >>>> would be very clear as to the code's intent. I don't think we have to >>>> worry >>>> about memory usage if we limit its use to only situations where ndmin is >>>> greater than the number of dimensions of the array. In those cases, the >>>> array is either an empty result, a scalar value (in which memory access is >>>> trivial), or 1-d (in which a transpose is cheap). >>>> > >>>> > What if one does things the other way around - avoid calling squeeze >>>> until _after_ doing the atleast_Nd() magic? That way the row/column >>>> information should be conserved, right? Also, we avoid transposing, memory >>>> use, ... >>>> > >>>> > Oh, and someone could conceivably have a _looong_ 1D file, but would >>>> want it read as a 2D array. >>>> > >>>> > Paul >>>> > >>>> > >>>> > >>>> > @Derek, good catch with noticing the error in the tests. We do still >>>> need to handle the case I mentioned, however. I have attached an example >>>> script to demonstrate the issue. In this script, I would expect the >>>> second-to-last array to be a shape of (1, 5). I believe that the >>>> single-row, multi-column case would actually be the more common type of >>>> edge-case encountered by users than the others. Therefore, I believe that >>>> this ndmin fix is not adequate until this is addressed. >>>> > >>>> > @Paul, we can't call squeeze after doing the atleast_Nd() magic. That >>>> would just undo whatever we had just done. Also, wrt the transpose, a (1, >>>> 100000) array looks the same in memory as a (100000, 1) array, right? >>>> Agree. I thought more along the lines of (pseudocode-ish) >>>> if ndmin == 0: >>>> squeeze() >>>> if ndmin == 1: >>>> atleast_1D() >>>> elif ndmin == 2: >>>> atleast_2D() >>>> else: >>>> I don't rightly know what would go here, maybe raise ValueError? >>>> >>>> That would avoid the squeeze call before the atleast_Nd magic. But the >>>> code was changed, so I think my comment doesn't make sense anymore. It's >>>> probably fine the way it is! >>>> >>>> Paul >>>> >>>> >>> I have thought of that too, but the problem with that approach is that >>> after reading the file, X will have 2 or 3 dimensions, regardless of how >>> many singleton dims were in the file. A squeeze will always be needed. >>> Also, the purpose of squeeze is opposite that of the atleast_*d() >>> functions: squeeze reduces dimensions, while atleast_*d will add >>> dimensions. >>> >>> Therefore, I re-iterate... the patch by Derek gets the job done. I have >>> tested it for a wide variety of inputs for both regular arrays and record >>> arrays. Is there room for improvements? Yes, but I think that can wait for >>> later. Derek's patch however fixes an important bug in the ndmin >>> implementation and should be included for the release. >>> >>> Two questions: can you point me to the patch/ticket, and is this a >> regression? >> >> Thanks, >> Ralf >> >> >> > I don't know if he did a pull-request or not, but here is the link he > provided earlier in the thread. > > https://github.com/dhomeier/numpy/compare/master...ndmin-cols > > Technically, this is not a "regression" as the ndmin feature is new in this > release. > Yes right, I forgot this was a recent change. > However, the problem that ndmin is supposed to address is not fixed by the > current implementation for the rc. Essentially, a single-row, multi-column > file with ndmin=2 comes out as a Nx1 array which is the same result for a > multi-row, single-column file. My feeling is that if we let the current > implementation stand as is, and developers use it in their code, then fixing > it in a later release would introduce more problems (maybe the devels would > transpose the result themselves or something). Better to fix it now in rc > with the two lines of code (and the correction to the tests), then to > introduce a buggy feature that will be hard to fix in future releases, IMHO. > > Looks okay, and I agree that it's better to fix it now. The timing is a bit unfortunate though, just after RC2. I'll have closer look tomorrow and if it can go in, probably tag RC3. If in the meantime a few more people could test this, that would be helpful. Ralf
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion