Re: [Numpy-discussion] loadtxt ndmin option

Paul Anton Letnes Thu, 05 May 2011 11:08:42 -0700

On 5. mai 2011, at 08.49, Benjamin Root wrote:

> 
> 
> On Wed, May 4, 2011 at 11:08 PM, Paul Anton Letnes 
> <paul.anton.let...@gmail.com> wrote:
> 
> On 4. mai 2011, at 20.33, Benjamin Root wrote:
> 
> > On Wed, May 4, 2011 at 7:54 PM, Derek Homeier 
> > <de...@astro.physik.uni-goettingen.de> wrote:
> > On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote:
> >
> > > But: Aren't the numpy.atleast_2d and numpy.atleast_1d functions written 
> > > for this? Shouldn't we reuse them? Perhaps it's overkill, and perhaps it 
> > > will reintroduce the 'transposed' problem?
> >
> > Yes, good point, one could replace the
> > X.shape = (X.size, ) with X = np.atleast_1d(X),
> > but for the ndmin=2 case, we'd need to replace
> > X.shape = (X.size, 1) with X = np.atleast_2d(X).T -
> > not sure which solution is more efficient in terms of memory access etc...
> >
> > Cheers,
> >                                                Derek
> >
> >
> > I can confirm that the current behavior is not sufficient for all of the 
> > original corner cases that ndmin was supposed to address.  Keep in mind 
> > that np.loadtxt takes a one-column data file and a one-row data file down 
> > to the same shape.  I don't see how the current code is able to produce the 
> > correct array shape when ndmin=2.  Do we have some sort of counter in 
> > loadtxt for counting the number of rows and columns read?  Could we use 
> > those to help guide the ndmin=2 case?
> >
> > I think that using atleast_1d(X) might be a bit overkill, but it would be 
> > very clear as to the code's intent.  I don't think we have to worry about 
> > memory usage if we limit its use to only situations where ndmin is greater 
> > than the number of dimensions of the array.  In those cases, the array is 
> > either an empty result, a scalar value (in which memory access is trivial), 
> > or 1-d (in which a transpose is cheap).
> 
> What if one does things the other way around - avoid calling squeeze until 
> _after_ doing the atleast_Nd() magic? That way the row/column information 
> should be conserved, right? Also, we avoid transposing, memory use, ...
> 
> Oh, and someone could conceivably have a _looong_ 1D file, but would want it 
> read as a 2D array.
> 
> Paul
> 
> 
> 
> @Derek, good catch with noticing the error in the tests. We do still need to 
> handle the case I mentioned, however.  I have attached an example script to 
> demonstrate the issue.  In this script, I would expect the second-to-last 
> array to have a shape of (1, 5).  I believe the single-row, multi-column 
> case is actually the edge case users are most likely to encounter. 
> Therefore, I don't think this ndmin fix is adequate until this is 
> addressed.
> 
> @Paul, we can't call squeeze after doing the atleast_Nd() magic.  That would 
> just undo whatever we had just done.  Also, wrt the transpose, a (1, 100000) 
> array looks the same in memory as a (100000, 1) array, right?
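Agreed. I was thinking more along the lines of this (X being the array read 
from the file, before any squeezing):

import numpy as np

if ndmin == 0:
    X = np.squeeze(X)
elif ndmin == 1:
    X = np.atleast_1d(X)
elif ndmin == 2:
    X = np.atleast_2d(X)
else:
    # I don't rightly know what would go here; maybe raise ValueError?
    raise ValueError("ndmin must be 0, 1 or 2, got %r" % (ndmin,))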
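A runnable sketch of the shape logic under discussion, assuming X is the 
un-squeezed (rows, columns) array that loadtxt builds from the file. 
`apply_ndmin` is a hypothetical helper, not a NumPy function. Unlike a 
branch per ndmin value, it squeezes only when X has more dimensions than 
requested and then pads up to ndmin, so a one-row file and a one-column file 
stay distinguishable for ndmin=2.

```python
import numpy as np

def apply_ndmin(X, ndmin=0):
    """Hypothetical helper: adjust the dimensions of an array read by a
    loadtxt-like reader. X is assumed to be the un-squeezed 2-D
    (rows, columns) array built from the file."""
    if ndmin not in (0, 1, 2):
        raise ValueError("Illegal value of ndmin keyword: %s" % ndmin)
    # Drop length-1 axes only when X has more dimensions than requested,
    # so the row/column orientation survives for ndmin=2.
    if X.ndim > ndmin:
        X = np.squeeze(X)
    # Pad back up to ndmin if squeeze removed too much (e.g. a single
    # value with ndmin=1) or X arrived with fewer dimensions.
    if X.ndim < ndmin:
        if ndmin == 1:
            X = np.atleast_1d(X)
        elif ndmin == 2:
            # atleast_2d turns shape (n,) into (1, n); .T makes it a column.
            X = np.atleast_2d(X).T
    return X

one_row = np.array([[1.0, 2.0, 3.0, 4.0, 5.0]])  # one line, five values
one_col = np.array([[1.0], [2.0], [3.0]])        # three lines, one value each
single = np.array([[7.0]])                       # a file holding one value

for name, X in [("one_row", one_row), ("one_col", one_col), ("single", single)]:
    print(name, [apply_ndmin(X, n).shape for n in (0, 1, 2)])
# one_row [(5,), (5,), (1, 5)]
# one_col [(3,), (3,), (3, 1)]
# single [(), (1,), (1, 1)]
```

With ndmin=2 the single row keeps its (1, 5) shape, which is the case 
Benjamin's script is about, while ndmin=0 still collapses both layouts to 1-D.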


That would avoid calling squeeze before the atleast_Nd magic. But the code has 
since been changed, so my comment may no longer make sense. It's probably fine 
the way it is!
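On the memory question raised in the thread (Derek's X.shape = (X.size, 1) 
versus np.atleast_2d(X).T, and whether a (1, 100000) array is laid out like a 
(100000, 1) one), a quick check with standard NumPy calls suggests neither 
route copies data: both return views on the same buffer. The array size is 
only illustrative.

```python
import numpy as np

X = np.arange(100000, dtype=float)         # a long 1-D result, e.g. a one-column file

via_shape = X.reshape(X.size, 1)           # like X.shape = (X.size, 1), but as a new view
via_atleast = np.atleast_2d(X).T           # (1, n) view, transposed to (n, 1)

print(via_shape.shape, via_atleast.shape)  # (100000, 1) (100000, 1)
# Both are views on X's buffer: no element was copied.
print(np.shares_memory(via_shape, X), np.shares_memory(via_atleast, X))  # True True
# With relaxed stride checking (the default in current NumPy releases), a
# length-1 axis is ignored for contiguity, so these views count as both
# C- and Fortran-contiguous.
print(via_shape.flags['C_CONTIGUOUS'], via_shape.flags['F_CONTIGUOUS'])
print(via_atleast.flags['C_CONTIGUOUS'], via_atleast.flags['F_CONTIGUOUS'])
```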

Paul
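For comparison, the two layouts Benjamin describes (a one-row file and a 
one-column file) can be fed to a current np.loadtxt, which accepts ndmin 
values 0, 1 and 2, using in-memory files. The file contents are made up for 
illustration.

```python
import io
import numpy as np

one_row = "1 2 3 4 5\n"        # a single line with five columns
one_col = "1\n2\n3\n4\n5\n"    # five lines with one column each

shapes = {}
for ndmin in (0, 1, 2):
    a = np.loadtxt(io.StringIO(one_row), ndmin=ndmin)
    b = np.loadtxt(io.StringIO(one_col), ndmin=ndmin)
    shapes[ndmin] = (a.shape, b.shape)
    print(ndmin, a.shape, b.shape)
# 0 (5,) (5,)
# 1 (5,) (5,)
# 2 (1, 5) (5, 1)
```

Only ndmin=2 keeps the two files apart; with the default ndmin=0 both come 
back with the same 1-D shape, as described earlier in the thread.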

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
