Re: [Numpy-discussion] loadtxt ndmin option

2011-06-19 Thread Derek Homeier
On 31 May 2011, at 21:28, Pierre GM wrote:

> On May 31, 2011, at 6:37 PM, Derek Homeier wrote:
>
>> On 31 May 2011, at 18:25, Pierre GM wrote:
>>
>>> On May 31, 2011, at 5:52 PM, Derek Homeier wrote:

 I think stuff like multiple delimiters should have been dealt with
 before, as the right place to insert the ndmin code (which includes
 the decision to squeeze or not to squeeze as well as to add
 additional
 dimensions, if required) would be right at the end before the
 'unpack'
 switch, or  rather replacing the bit:

  if usemask:
  output = output.view(MaskedArray)
  output._mask = outputmask
  if unpack:
  return output.squeeze().T
  return output.squeeze()

 But there it's already not clear to me how to deal with the
 MaskedArray case...
>>>
>>> Oh, easy.
>>> You need to replace only the last three lines of genfromtxt with the
>>> ones from loadtxt  (808-833). Then, if usemask is True, you need to
>>> use ma.atleast_Xd instead of np.atleast_Xd. Et voilà.
>>> Comments:
>>> * I would raise an exception if ndmin isn't correct *before* trying
>>> to read the file...
>>> * You could define a `collapse_function` that would be
>>> `np.atleast_1d`, `np.atleast_2d`, `ma.atleast_1d`... depending on
>>> the values of `usemask` and `ndmin`...

Thanks, that helped to clean up a little bit.

>>> If you have any question about numpy.ma, don't hesitate to contact
>>> me directly.
>>
>> Thanks for the directions! I was not sure about the usemask case
>> because it presently does not invoke .squeeze() either...
>
> The idea is that if `usemask` is True, you build a second array (the  
> mask), that you attach to your main array at the very end (in the  
> `output=output.view(MaskedArray), output._mask = mask` combo...).  
> Afterwards, it's a regular MaskedArray that supports the .squeeze()  
> method...
>
OK, in both cases output.squeeze() is now used if ndim>ndmin and  
usemask is False - at least it does not break any tests, so it seems  
to work with MaskedArrays as well.
>
>> On a
>> possibly related note, genfromtxt also treats the 'unpack'ing of
>> structured arrays differently from loadtxt (which returns a list of
>> arrays in that case) - do you know if this is on purpose, or also
>> rather missing functionality (I guess it might break  
>> recfromtxt()...)?
>
> Keep in mind that I haven't touched genfromtxt since 8-10 months or  
> so. I wouldn't be surprised that it were lagging a bit behind  
> loadtxt in terms of development. Yes, there'll be some tweaking to  
> do for recfromtxt (it's OK for now if `ndmin` and `unpack` are the  
> defaults) and others, but nothing major.

Well, at long last I got to implement the above and added the  
corresponding tests for genfromtxt - with the exception of the  
dimension-0 cases, since genfromtxt raises an error on empty files.  
There already is a comment it should perhaps better return an empty  
array, so I am putting that idea up for discussion here again.
I tried to devise a very basic test with masked arrays, just added to  
test_withmissing now.
I also implemented the same unpacking behaviour for structured arrays  
and just made recfromtxt set unpack=False to work (or should it issue  
a warning?).

The patches are up for review as commit 8ac01636 in my iocnv-wildcard  
branch:
https://github.com/dhomeier/numpy/compare/master...iocnv-wildcard

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt ndmin option

2011-06-03 Thread cgraves


Derek Homeier wrote:
> 
> Hi Chris,
> 
> On 31 May 2011, at 13:56, cgraves wrote:
> 
>> I've downloaded the latest numpy (1.6.0) and loadtxt has the ndmin  
>> option,
>> however neither genfromtxt nor recfromtxt, which use loadtxt, have it.
>> Should they have inherited the option? Who can make it happen?
> 
> you are mistaken, genfromtxt is not using loadtxt (and could not  
> possibly, since it has the more complex parser to handle missing  
> data); thus ndmin could not be inherited automatically.
> It certainly would make sense to provide the same functionality for  
> genfromtxt (which should then be inherited by [nd,ma,rec]fromtxt), so  
> I'd go ahead and file a feature (enhancement) request. I can't promise  
> I can take care of it myself, as I am less familiar with genfromtxt,  
> but I'd certainly have a look at it.
> 
> Does anyone have an opinion whether this is a case for reopening (yet  
> again...)
> http://projects.scipy.org/numpy/ticket/1562
> or create a new ticket?
> 

Thanks Derek. That would be greatly appreciated! Based on the 
follow-up messages in this thread, it looks like (hopefully) there will 
not be too much additional work in implementing it. For now I'll just 
use the temporary fix, a .reshape(-1), on any recfromtxt's that 
might read in a single row of data..

Kind regards,
Chris
-- 
View this message in context: 
http://old.nabble.com/loadtxt-savetxt-tickets-tp31238871p31769169.html
Sent from the Numpy-discussion mailing list archive at Nabble.com.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-31 Thread Pierre GM

On May 31, 2011, at 6:37 PM, Derek Homeier wrote:

> On 31 May 2011, at 18:25, Pierre GM wrote:
> 
>> On May 31, 2011, at 5:52 PM, Derek Homeier wrote:
>>> 
>>> I think stuff like multiple delimiters should have been dealt with
>>> before, as the right place to insert the ndmin code (which includes
>>> the decision to squeeze or not to squeeze as well as to add  
>>> additional
>>> dimensions, if required) would be right at the end before the  
>>> 'unpack'
>>> switch, or  rather replacing the bit:
>>> 
>>>   if usemask:
>>>   output = output.view(MaskedArray)
>>>   output._mask = outputmask
>>>   if unpack:
>>>   return output.squeeze().T
>>>   return output.squeeze()
>>> 
>>> But there it's already not clear to me how to deal with the
>>> MaskedArray case...
>> 
>> Oh, easy.
>> You need to replace only the last three lines of genfromtxt with the  
>> ones from loadtxt  (808-833). Then, if usemask is True, you need to  
>> use ma.atleast_Xd instead of np.atleast_Xd. Et voilà.
>> Comments:
>> * I would raise an exception if ndmin isn't correct *before* trying  
>> to read the file...
>> * You could define a `collapse_function` that would be  
>> `np.atleast_1d`, `np.atleast_2d`, `ma.atleast_1d`... depending on  
>> the values of `usemask` and `ndmin`...
>> If you have any question about numpy.ma, don't hesitate to contact  
>> me directly.
> 
> Thanks for the directions! I was not sure about the usemask case  
> because it presently does not invoke .squeeze() either...

The idea is that if `usemask` is True, you build a second array (the mask), 
that you attach to your main array at the very end (in the 
`output=output.view(MaskedArray), output._mask = mask` combo...). Afterwards, 
it's a regular MaskedArray that supports the .squeeze() method...


> On a  
> possibly related note, genfromtxt also treats the 'unpack'ing of  
> structured arrays differently from loadtxt (which returns a list of  
> arrays in that case) - do you know if this is on purpose, or also  
> rather missing functionality (I guess it might break recfromtxt()...)?

Keep in mind that I haven't touched genfromtxt since 8-10 months or so. I 
wouldn't be surprised that it were lagging a bit behind loadtxt in terms of 
development. Yes, there'll be some tweaking to do for recfromtxt (it's OK for 
now if `ndmin` and `unpack` are the defaults) and others, but nothing major.



___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-31 Thread Derek Homeier
On 31 May 2011, at 18:25, Pierre GM wrote:

> On May 31, 2011, at 5:52 PM, Derek Homeier wrote:
>>
>> I think stuff like multiple delimiters should have been dealt with
>> before, as the right place to insert the ndmin code (which includes
>> the decision to squeeze or not to squeeze as well as to add  
>> additional
>> dimensions, if required) would be right at the end before the  
>> 'unpack'
>> switch, or  rather replacing the bit:
>>
>>if usemask:
>>output = output.view(MaskedArray)
>>output._mask = outputmask
>>if unpack:
>>return output.squeeze().T
>>return output.squeeze()
>>
>> But there it's already not clear to me how to deal with the
>> MaskedArray case...
>
> Oh, easy.
> You need to replace only the last three lines of genfromtxt with the  
> ones from loadtxt  (808-833). Then, if usemask is True, you need to  
> use ma.atleast_Xd instead of np.atleast_Xd. Et voilà.
> Comments:
> * I would raise an exception if ndmin isn't correct *before* trying  
> to read the file...
> * You could define a `collapse_function` that would be  
> `np.atleast_1d`, `np.atleast_2d`, `ma.atleast_1d`... depending on  
> the values of `usemask` and `ndmin`...
> If you have any question about numpy.ma, don't hesitate to contact  
> me directly.

Thanks for the directions! I was not sure about the usemask case  
because it presently does not invoke .squeeze() either... On a  
possibly related note, genfromtxt also treats the 'unpack'ing of  
structured arrays differently from loadtxt (which returns a list of  
arrays in that case) - do you know if this is on purpose, or also  
rather missing functionality (I guess it might break recfromtxt()...)?

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-31 Thread Pierre GM

On May 31, 2011, at 5:52 PM, Derek Homeier wrote:
> 
> I think stuff like multiple delimiters should have been dealt with  
> before, as the right place to insert the ndmin code (which includes  
> the decision to squeeze or not to squeeze as well as to add additional  
> dimensions, if required) would be right at the end before the 'unpack'  
> switch, or  rather replacing the bit:
> 
> if usemask:
> output = output.view(MaskedArray)
> output._mask = outputmask
> if unpack:
> return output.squeeze().T
> return output.squeeze()
> 
> But there it's already not clear to me how to deal with the  
> MaskedArray case...

Oh, easy.
You need to replace only the last three lines of genfromtxt with the ones from 
loadtxt  (808-833). Then, if usemask is True, you need to use ma.atleast_Xd 
instead of np.atleast_Xd. Et voilà.
Comments:
* I would raise an exception if ndmin isn't correct *before* trying to read the 
file... 
* You could define a `collapse_function` that would be `np.atleast_1d`, 
`np.atleast_2d`, `ma.atleast_1d`... depending on the values of `usemask` and 
`ndmin`...
If you have any question about numpy.ma, don't hesitate to contact me directly.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-31 Thread Derek Homeier
On 31 May 2011, at 17:45, Benjamin Root wrote:
>
> At this point, I wonder if it might be smarter to create  
> a .atleast_Nd() function and use that everywhere it is needed.   
> Having similar logic tailored for each loading function might be a  
> little dangerous if bug fixes are made to one, but not the others.

Like a generalised version of .atleast_1d / .atleast_2d?
It would also have to include an .atmost_Nd functionality of some  
sorts, to replace the .squeeze(), generally a good idea (e.g.  
something like np.atleast_Nd(X, ndmin=0, ndmax=-1), where the default  
is not to reduce the maximum number of dimensions...).
But for the io routines the situation is a bit more complex, since  
different shapes are expected to be returned  depending on the text  
input (e.g. (1, N) for a single row vs. (N, 1) for a single column of  
data).

Cheers,
Derek
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-31 Thread Derek Homeier
On 31 May 2011, at 17:33, Bruce Southey wrote:

>>> It certainly would make sense to provide the same functionality for
>>> genfromtxt (which should then be inherited by [nd,ma,rec]fromtxt),  
>>> so
>>> I'd go ahead and file a feature (enhancement) request. I can't  
>>> promise
>>> I can take care of it myself, as I am less familiar with genfromtxt,
>>> but I'd certainly have a look at it.
>> Oh, that shouldn't be too difficult: 'ndmin' tells whether the  
>> array must be squeezed before being returned, right ? You can add  
>> some test at the very end of genfromtxt to check what to do with  
>> the output (whether to squeeze it or not, whether to transpose it  
>> or not)... If you don't mind doing it, I'd be quite grateful (I  
>> don't have time to work on numpy these days, much to my regret).  
>> Don't forget to change the user manual as well...
>> ___
>> NumPy-Discussion mailing list
>> NumPy-Discussion@scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
> (Different function so different ticket.)
>
> Sure you can change the end of the code but that may hide various
> problem. Unlike loadtxt, genfromtxt has a lot of flexibility  
> especially
> handling missing values and using converter functions. So I think that
> some examples must be provided that can not be handled by providing a
> suitable converter or that require multiple assumptions about input  
> file
> (such as having more than one delimiter).

I think stuff like multiple delimiters should have been dealt with  
before, as the right place to insert the ndmin code (which includes  
the decision to squeeze or not to squeeze as well as to add additional  
dimensions, if required) would be right at the end before the 'unpack'  
switch, or  rather replacing the bit:

 if usemask:
 output = output.view(MaskedArray)
 output._mask = outputmask
 if unpack:
 return output.squeeze().T
 return output.squeeze()

But there it's already not clear to me how to deal with the  
MaskedArray case...

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-31 Thread Benjamin Root
On Tue, May 31, 2011 at 10:33 AM, Bruce Southey  wrote:

> On 05/31/2011 10:18 AM, Pierre GM wrote:
> > On May 31, 2011, at 4:53 PM, Derek Homeier wrote:
> >
> >> Hi Chris,
> >>
> >> On 31 May 2011, at 13:56, cgraves wrote:
> >>
> >>> I've downloaded the latest numpy (1.6.0) and loadtxt has the ndmin
> >>> option,
> >>> however neither genfromtxt nor recfromtxt, which use loadtxt, have it.
> >>> Should they have inherited the option? Who can make it happen?
> >> you are mistaken, genfromtxt is not using loadtxt (and could not
> >> possibly, since it has the more complex parser to handle missing
> >> data); thus ndmin could not be inherited automatically.
> >> It certainly would make sense to provide the same functionality for
> >> genfromtxt (which should then be inherited by [nd,ma,rec]fromtxt), so
> >> I'd go ahead and file a feature (enhancement) request. I can't promise
> >> I can take care of it myself, as I am less familiar with genfromtxt,
> >> but I'd certainly have a look at it.
> > Oh, that shouldn't be too difficult: 'ndmin' tells whether the array must
> be squeezed before being returned, right ? You can add some test at the very
> end of genfromtxt to check what to do with the output (whether to squeeze it
> or not, whether to transpose it or not)... If you don't mind doing it, I'd
> be quite grateful (I don't have time to work on numpy these days, much to my
> regret). Don't forget to change the user manual as well...
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> (Different function so different ticket.)
>
> Sure you can change the end of the code but that may hide various
> problem. Unlike loadtxt, genfromtxt has a lot of flexibility especially
> handling missing values and using converter functions. So I think that
> some examples must be provided that can not be handled by providing a
> suitable converter or that require multiple assumptions about input file
> (such as having more than one delimiter).
>
> Bruce
>

At this point, I wonder if it might be smarter to create a .atleast_Nd()
function and use that everywhere it is needed.  Having similar logic
tailored for each loading function might be a little dangerous if bug fixes
are made to one, but not the others.

Ben Root
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-31 Thread Bruce Southey
On 05/31/2011 10:18 AM, Pierre GM wrote:
> On May 31, 2011, at 4:53 PM, Derek Homeier wrote:
>
>> Hi Chris,
>>
>> On 31 May 2011, at 13:56, cgraves wrote:
>>
>>> I've downloaded the latest numpy (1.6.0) and loadtxt has the ndmin
>>> option,
>>> however neither genfromtxt nor recfromtxt, which use loadtxt, have it.
>>> Should they have inherited the option? Who can make it happen?
>> you are mistaken, genfromtxt is not using loadtxt (and could not
>> possibly, since it has the more complex parser to handle missing
>> data); thus ndmin could not be inherited automatically.
>> It certainly would make sense to provide the same functionality for
>> genfromtxt (which should then be inherited by [nd,ma,rec]fromtxt), so
>> I'd go ahead and file a feature (enhancement) request. I can't promise
>> I can take care of it myself, as I am less familiar with genfromtxt,
>> but I'd certainly have a look at it.
> Oh, that shouldn't be too difficult: 'ndmin' tells whether the array must be 
> squeezed before being returned, right ? You can add some test at the very end 
> of genfromtxt to check what to do with the output (whether to squeeze it or 
> not, whether to transpose it or not)... If you don't mind doing it, I'd be 
> quite grateful (I don't have time to work on numpy these days, much to my 
> regret). Don't forget to change the user manual as well...
> ___
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
(Different function so different ticket.)

Sure you can change the end of the code but that may hide various 
problem. Unlike loadtxt, genfromtxt has a lot of flexibility especially 
handling missing values and using converter functions. So I think that 
some examples must be provided that can not be handled by providing a 
suitable converter or that require multiple assumptions about input file 
(such as having more than one delimiter).

Bruce
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-31 Thread Pierre GM

On May 31, 2011, at 4:53 PM, Derek Homeier wrote:

> Hi Chris,
> 
> On 31 May 2011, at 13:56, cgraves wrote:
> 
>> I've downloaded the latest numpy (1.6.0) and loadtxt has the ndmin  
>> option,
>> however neither genfromtxt nor recfromtxt, which use loadtxt, have it.
>> Should they have inherited the option? Who can make it happen?
> 
> you are mistaken, genfromtxt is not using loadtxt (and could not  
> possibly, since it has the more complex parser to handle missing  
> data); thus ndmin could not be inherited automatically.
> It certainly would make sense to provide the same functionality for  
> genfromtxt (which should then be inherited by [nd,ma,rec]fromtxt), so  
> I'd go ahead and file a feature (enhancement) request. I can't promise  
> I can take care of it myself, as I am less familiar with genfromtxt,  
> but I'd certainly have a look at it.

Oh, that shouldn't be too difficult: 'ndmin' tells whether the array must be 
squeezed before being returned, right ? You can add some test at the very end 
of genfromtxt to check what to do with the output (whether to squeeze it or 
not, whether to transpose it or not)... If you don't mind doing it, I'd be 
quite grateful (I don't have time to work on numpy these days, much to my 
regret). Don't forget to change the user manual as well...
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-31 Thread Derek Homeier
Hi Chris,

On 31 May 2011, at 13:56, cgraves wrote:

> I've downloaded the latest numpy (1.6.0) and loadtxt has the ndmin  
> option,
> however neither genfromtxt nor recfromtxt, which use loadtxt, have it.
> Should they have inherited the option? Who can make it happen?

you are mistaken, genfromtxt is not using loadtxt (and could not  
possibly, since it has the more complex parser to handle missing  
data); thus ndmin could not be inherited automatically.
It certainly would make sense to provide the same functionality for  
genfromtxt (which should then be inherited by [nd,ma,rec]fromtxt), so  
I'd go ahead and file a feature (enhancement) request. I can't promise  
I can take care of it myself, as I am less familiar with genfromtxt,  
but I'd certainly have a look at it.

Does anyone have an opinion whether this is a case for reopening (yet  
again...)
http://projects.scipy.org/numpy/ticket/1562
or create a new ticket?

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-31 Thread cgraves


Ralf Gommers-2 wrote:
> 
> On Fri, May 6, 2011 at 12:57 PM, Derek Homeier <
> de...@astro.physik.uni-goettingen.de> wrote:
> 
>>
>> On 6 May 2011, at 07:53, Ralf Gommers wrote:
>>
>> >
>> > >> Looks okay, and I agree that it's better to fix it now. The timing
>> > >> is a bit unfortunate though, just after RC2. I'll have closer look
>> > >> tomorrow and if it can go in, probably tag RC3.
>> > >>
>> > >> If in the meantime a few more people could test this, that would be
>> > >> helpful.
>> > >>
>> > >> Ralf
>> > >
>> > > I agree, wish I had time to push this before rc2. I could add the
>> > > explanatory comments
>> > > mentioned above and switch to use the atleast_[12]d() solution, test
>> > > that and push it
>> > > in a couple of minutes, or should I better leave it as is now for
>> > > testing?
>> >
>> > Quick follow-up: I just applied the above changes, added some tests to
>> > cover Ben's test cases and tested this with 1.6.0rc2 on OS X 10.5
>> > i386+ppc
>> > + 10.6 x86_64 (Python2.7+3.2). So I'd be ready to push it to my repo
>> > and do
>> > my (first) pull request...
>> >
>> > Go ahead, I'll have a look at it tonight. Thanks for testing on
>> > several Pythons, that definitely helps.
>>
>>
>> Done, the request only appears on my repo
>> https://github.com/dhomeier/numpy/
>>
>> is that correct? If someone could test it on Linux and Windows as
>> well...
>>
> 
> Committed, thanks for all the work.
> 
> The pull request was in the wrong place, that's a minor flaw in the github
> UI. After you press "Pull Request" you need to read the small print to see
> where it's going.
> 
> Cheers,
> Ralf
> 
> 


Dear all,

I've downloaded the latest numpy (1.6.0) and loadtxt has the ndmin option,
however neither genfromtxt nor recfromtxt, which use loadtxt, have it.
Should they have inherited the option? Who can make it happen?

Best,
Chris 

-- 
View this message in context: 
http://old.nabble.com/loadtxt-savetxt-tickets-tp31238871p31740152.html
Sent from the Numpy-discussion mailing list archive at Nabble.com.

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-07 Thread Ralf Gommers
On Fri, May 6, 2011 at 12:57 PM, Derek Homeier <
de...@astro.physik.uni-goettingen.de> wrote:

>
> On 6 May 2011, at 07:53, Ralf Gommers wrote:
>
> >
> > >> Looks okay, and I agree that it's better to fix it now. The timing
> > >> is a bit unfortunate though, just after RC2. I'll have closer look
> > >> tomorrow and if it can go in, probably tag RC3.
> > >>
> > >> If in the meantime a few more people could test this, that would be
> > >> helpful.
> > >>
> > >> Ralf
> > >
> > > I agree, wish I had time to push this before rc2. I could add the
> > > explanatory comments
> > > mentioned above and switch to use the atleast_[12]d() solution, test
> > > that and push it
> > > in a couple of minutes, or should I better leave it as is now for
> > > testing?
> >
> > Quick follow-up: I just applied the above changes, added some tests to
> > cover Ben's test cases and tested this with 1.6.0rc2 on OS X 10.5
> > i386+ppc
> > + 10.6 x86_64 (Python2.7+3.2). So I'd be ready to push it to my repo
> > and do
> > my (first) pull request...
> >
> > Go ahead, I'll have a look at it tonight. Thanks for testing on
> > several Pythons, that definitely helps.
>
>
> Done, the request only appears on my repo
> https://github.com/dhomeier/numpy/
>
> is that correct? If someone could test it on Linux and Windows as
> well...
>

Committed, thanks for all the work.

The pull request was in the wrong place, that's a minor flaw in the github
UI. After you press "Pull Request" you need to read the small print to see
where it's going.

Cheers,
Ralf
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-06 Thread Derek Homeier

On 6 May 2011, at 07:53, Ralf Gommers wrote:

>
> >> Looks okay, and I agree that it's better to fix it now. The timing
> >> is a bit unfortunate though, just after RC2. I'll have closer look
> >> tomorrow and if it can go in, probably tag RC3.
> >>
> >> If in the meantime a few more people could test this, that would be
> >> helpful.
> >>
> >> Ralf
> >
> > I agree, wish I had time to push this before rc2. I could add the
> > explanatory comments
> > mentioned above and switch to use the atleast_[12]d() solution, test
> > that and push it
> > in a couple of minutes, or should I better leave it as is now for
> > testing?
>
> Quick follow-up: I just applied the above changes, added some tests to
> cover Ben's test cases and tested this with 1.6.0rc2 on OS X 10.5
> i386+ppc
> + 10.6 x86_64 (Python2.7+3.2). So I'd be ready to push it to my repo
> and do
> my (first) pull request...
>
> Go ahead, I'll have a look at it tonight. Thanks for testing on  
> several Pythons, that definitely helps.


Done, the request only appears on my repo
https://github.com/dhomeier/numpy/

is that correct? If someone could test it on Linux and Windows as  
well...

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-05 Thread Ralf Gommers
On Fri, May 6, 2011 at 12:12 AM, Derek Homeier <
de...@astro.physik.uni-goettingen.de> wrote:

>
> On 5 May 2011, at 22:53, Derek Homeier wrote:
>
> >>
> >> However, the problem that ndmin is supposed to address is not fixed
> >> by the current implementation for the rc.  Essentially, a single-
> >> row, multi-column file with ndmin=2 comes out as a Nx1 array which
> >> is the same result for a multi-row, single-column file.  My feeling
> >> is that if we let the current implementation stand as is, and
> >> developers use it in their code, then fixing it in a later release
> >> would introduce more problems (maybe the devels would transpose the
> >> result themselves or something).  Better to fix it now in rc with
> >> the two lines of code (and the correction to the tests), then to
> >> introduce a buggy feature that will be hard to fix in future
> >> releases, IMHO.
> >>
> >> Looks okay, and I agree that it's better to fix it now. The timing
> >> is a bit unfortunate though, just after RC2. I'll have closer look
> >> tomorrow and if it can go in, probably tag RC3.
> >>
> >> If in the meantime a few more people could test this, that would be
> >> helpful.
> >>
> >> Ralf
> >
> > I agree, wish I had time to push this before rc2. I could add the
> > explanatory comments
> > mentioned above and switch to use the atleast_[12]d() solution, test
> > that and push it
> > in a couple of minutes, or should I better leave it as is now for
> > testing?
>
> Quick follow-up: I just applied the above changes, added some tests to
> cover Ben's test cases and tested this with 1.6.0rc2 on OS X 10.5
> i386+ppc
> + 10.6 x86_64 (Python2.7+3.2). So I'd be ready to push it to my repo
> and do
> my (first) pull request...
>

Go ahead, I'll have a look at it tonight. Thanks for testing on several
Pythons, that definitely helps.

Ralf
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-05 Thread Derek Homeier

On 5 May 2011, at 22:53, Derek Homeier wrote:

>>
>> However, the problem that ndmin is supposed to address is not fixed
>> by the current implementation for the rc.  Essentially, a single-
>> row, multi-column file with ndmin=2 comes out as a Nx1 array which
>> is the same result for a multi-row, single-column file.  My feeling
>> is that if we let the current implementation stand as is, and
>> developers use it in their code, then fixing it in a later release
>> would introduce more problems (maybe the devels would transpose the
>> result themselves or something).  Better to fix it now in rc with
>> the two lines of code (and the correction to the tests), then to
>> introduce a buggy feature that will be hard to fix in future
>> releases, IMHO.
>>
>> Looks okay, and I agree that it's better to fix it now. The timing
>> is a bit unfortunate though, just after RC2. I'll have closer look
>> tomorrow and if it can go in, probably tag RC3.
>>
>> If in the meantime a few more people could test this, that would be
>> helpful.
>>
>> Ralf
>
> I agree, wish I had time to push this before rc2. I could add the
> explanatory comments
> mentioned above and switch to use the atleast_[12]d() solution, test
> that and push it
> in a couple of minutes, or should I better leave it as is now for
> testing?

Quick follow-up: I just applied the above changes, added some tests to
cover Ben's test cases and tested this with 1.6.0rc2 on OS X 10.5  
i386+ppc
+ 10.6 x86_64 (Python2.7+3.2). So I'd be ready to push it to my repo  
and do
my (first) pull request...

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-05 Thread Derek Homeier
On 5 May 2011, at 22:31, Ralf Gommers wrote:

>
> On Thu, May 5, 2011 at 9:46 PM, Benjamin Root  wrote:
>
>
> On Thu, May 5, 2011 at 2:33 PM, Ralf Gommers  > wrote:
>
>
> On Thu, May 5, 2011 at 9:18 PM, Benjamin Root  wrote:
>
>
> On Thu, May 5, 2011 at 1:08 PM, Paul Anton Letnes 
>  > wrote:
>
> On 5. mai 2011, at 08.49, Benjamin Root wrote:
>
> >
> >
> > On Wed, May 4, 2011 at 11:08 PM, Paul Anton Letnes 
> >  > wrote:
> >
> > On 4. mai 2011, at 20.33, Benjamin Root wrote:
> >
> > > On Wed, May 4, 2011 at 7:54 PM, Derek Homeier 
> > >  > wrote:
> > > On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote:
> > >
> > > > But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions  
> written for this? Shouldn't we reuse them? Perhaps it's overkill,  
> and perhaps it will reintroduce the 'transposed' problem?
> > >
> > > Yes, good point, one could replace the
> > > X.shape = (X.size, ) with X = np.atleast_1d(X),
> > > but for the ndmin=2 case, we'd need to replace
> > > X.shape = (X.size, 1) with X = np.atleast_2d(X).T -
> > > not sure which solution is more efficient in terms of memory  
> access etc...
> > >
> > > Cheers,
> > >Derek
> > >
> > >
> > > I can confirm that the current behavior is not sufficient for  
> all of the original corner cases that ndmin was supposed to  
> address.  Keep in mind that np.loadtxt takes a one-column data file  
> and a one-row data file down to the same shape.  I don't see how the  
> current code is able to produce the correct array shape when  
> ndmin=2.  Do we have some sort of counter in loadtxt for counting  
> the number of rows and columns read?  Could we use those to help  
> guide the ndmin=2 case?
> > >
> > > I think that using atleast_1d(X) might be a bit overkill, but it  
> would be very clear as to the code's intent.  I don't think we have  
> to worry about memory usage if we limit its use to only situations  
> where ndmin is greater than the number of dimensions of the array.   
> In those cases, the array is either an empty result, a scalar value  
> (in which memory access is trivial), or 1-d (in which a transpose is  
> cheap).
> >
> > What if one does things the other way around - avoid calling  
> squeeze until _after_ doing the atleast_Nd() magic? That way the row/ 
> column information should be conserved, right? Also, we avoid  
> transposing, memory use, ...
> >
> > Oh, and someone could conceivably have a _looong_ 1D file, but  
> would want it read as a 2D array.
> >
> > Paul
> >
> >
> >
> > @Derek, good catch with noticing the error in the tests. We do  
> still need to handle the case I mentioned, however.  I have attached  
> an example script to demonstrate the issue.  In this script, I would  
> expect the second-to-last array to be a shape of (1, 5).  I believe  
> that the single-row, multi-column case would actually be the more  
> common type of edge-case encountered by users than the others.   
> Therefore, I believe that this ndmin fix is not adequate until this  
> is addressed.
> >
> > @Paul, we can't call squeeze after doing the atleast_Nd() magic.   
> That would just undo whatever we had just done.  Also, wrt the  
> transpose, a (1, 10) array looks the same in memory as a  
> (10, 1) array, right?
> Agree. I thought more along the lines of (pseudocode-ish)
> if ndmin == 0:
>squeeze()
> if ndmin == 1:
>atleast_1D()
> elif ndmin == 2:
>atleast_2D()
> else:
>I don't rightly know what would go here, maybe raise  
> ValueError?
>
> That would avoid the squeeze call before the atleast_Nd magic. But  
> the code was changed, so I think my comment doesn't make sense  
> anymore. It's probably fine the way it is!
>
> Paul
>
>
> I have thought of that too, but the problem with that approach is  
> that after reading the file, X will have 2 or 3 dimensions,  
> regardless of how many singleton dims were in the file.  A squeeze  
> will always be needed.  Also, the purpose of squeeze is opposite  
> that of the atleast_*d() functions:  squeeze reduces dimensions,  
> while atleast_*d will add dimensions.
>
> Therefore, I re-iterate... the patch by Derek gets the job done.  I  
> have tested it for a wide variety of inputs for both regular arrays  
> and record arrays.  Is there room for improvements?  Yes, but I  
> think that can wait for later.  Derek's patch however fixes an  
> important bug in the ndmin implementation and should be included for  
> the release.
>
Thanks for the additional tests and discussion, Paul and Ben! Yes, I  
think the
main issue is that multi-column input first comes back as 3- 
dimensional, with
a "singleton" dimension in front, so I was getting rid of that first -  
maybe could
add a comment on that. Single-column, multi-row apparently already come
back as 1-dimensional, so we only need the expansion there.

> Two questions: can you point me to the patch/ticket, and is this a  
> regression?
>
> Thanks,
> Ralf
>
This wa

Re: [Numpy-discussion] loadtxt ndmin option

2011-05-05 Thread Ralf Gommers
On Thu, May 5, 2011 at 9:46 PM, Benjamin Root  wrote:

>
>
> On Thu, May 5, 2011 at 2:33 PM, Ralf Gommers 
> wrote:
>
>>
>>
>> On Thu, May 5, 2011 at 9:18 PM, Benjamin Root  wrote:
>>
>>>
>>>
>>> On Thu, May 5, 2011 at 1:08 PM, Paul Anton Letnes <
>>> paul.anton.let...@gmail.com> wrote:
>>>

 On 5. mai 2011, at 08.49, Benjamin Root wrote:

 >
 >
 > On Wed, May 4, 2011 at 11:08 PM, Paul Anton Letnes <
 paul.anton.let...@gmail.com> wrote:
 >
 > On 4. mai 2011, at 20.33, Benjamin Root wrote:
 >
 > > On Wed, May 4, 2011 at 7:54 PM, Derek Homeier <
 de...@astro.physik.uni-goettingen.de> wrote:
 > > On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote:
 > >
 > > > But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions
 written for this? Shouldn't we reuse them? Perhaps it's overkill, and
 perhaps it will reintroduce the 'transposed' problem?
 > >
 > > Yes, good point, one could replace the
 > > X.shape = (X.size, ) with X = np.atleast_1d(X),
 > > but for the ndmin=2 case, we'd need to replace
 > > X.shape = (X.size, 1) with X = np.atleast_2d(X).T -
 > > not sure which solution is more efficient in terms of memory access
 etc...
 > >
 > > Cheers,
 > >Derek
 > >
 > >
 > > I can confirm that the current behavior is not sufficient for all of
 the original corner cases that ndmin was supposed to address.  Keep in mind
 that np.loadtxt takes a one-column data file and a one-row data file down 
 to
 the same shape.  I don't see how the current code is able to produce the
 correct array shape when ndmin=2.  Do we have some sort of counter in
 loadtxt for counting the number of rows and columns read?  Could we use
 those to help guide the ndmin=2 case?
 > >
 > > I think that using atleast_1d(X) might be a bit overkill, but it
 would be very clear as to the code's intent.  I don't think we have to 
 worry
 about memory usage if we limit its use to only situations where ndmin is
 greater than the number of dimensions of the array.  In those cases, the
 array is either an empty result, a scalar value (in which memory access is
 trivial), or 1-d (in which a transpose is cheap).
 >
 > What if one does things the other way around - avoid calling squeeze
 until _after_ doing the atleast_Nd() magic? That way the row/column
 information should be conserved, right? Also, we avoid transposing, memory
 use, ...
 >
 > Oh, and someone could conceivably have a _looong_ 1D file, but would
 want it read as a 2D array.
 >
 > Paul
 >
 >
 >
 > @Derek, good catch with noticing the error in the tests. We do still
 need to handle the case I mentioned, however.  I have attached an example
 script to demonstrate the issue.  In this script, I would expect the
 second-to-last array to be a shape of (1, 5).  I believe that the
 single-row, multi-column case would actually be the more common type of
 edge-case encountered by users than the others.  Therefore, I believe that
 this ndmin fix is not adequate until this is addressed.
 >
 > @Paul, we can't call squeeze after doing the atleast_Nd() magic.  That
 would just undo whatever we had just done.  Also, wrt the transpose, a (1,
 10) array looks the same in memory as a (10, 1) array, right?
 Agree. I thought more along the lines of (pseudocode-ish)
 if ndmin == 0:
squeeze()
 if ndmin == 1:
atleast_1D()
 elif ndmin == 2:
atleast_2D()
 else:
I don't rightly know what would go here, maybe raise ValueError?

 That would avoid the squeeze call before the atleast_Nd magic. But the
 code was changed, so I think my comment doesn't make sense anymore. It's
 probably fine the way it is!

 Paul


>>> I have thought of that too, but the problem with that approach is that
>>> after reading the file, X will have 2 or 3 dimensions, regardless of how
>>> many singleton dims were in the file.  A squeeze will always be needed.
>>> Also, the purpose of squeeze is opposite that of the atleast_*d()
>>> functions:  squeeze reduces dimensions, while atleast_*d will add
>>> dimensions.
>>>
>>> Therefore, I re-iterate... the patch by Derek gets the job done.  I have
>>> tested it for a wide variety of inputs for both regular arrays and record
>>> arrays.  Is there room for improvements?  Yes, but I think that can wait for
>>> later.  Derek's patch however fixes an important bug in the ndmin
>>> implementation and should be included for the release.
>>>
>>> Two questions: can you point me to the patch/ticket, and is this a
>> regression?
>>
>> Thanks,
>> Ralf
>>
>>
>>
> I don't know if he did a pull-request or not, but here is the link he
> provided earlier in the thread.
>
> https://github.com/

Re: [Numpy-discussion] loadtxt ndmin option

2011-05-05 Thread Benjamin Root
On Thu, May 5, 2011 at 2:33 PM, Ralf Gommers wrote:

>
>
> On Thu, May 5, 2011 at 9:18 PM, Benjamin Root  wrote:
>
>>
>>
>> On Thu, May 5, 2011 at 1:08 PM, Paul Anton Letnes <
>> paul.anton.let...@gmail.com> wrote:
>>
>>>
>>> On 5. mai 2011, at 08.49, Benjamin Root wrote:
>>>
>>> >
>>> >
>>> > On Wed, May 4, 2011 at 11:08 PM, Paul Anton Letnes <
>>> paul.anton.let...@gmail.com> wrote:
>>> >
>>> > On 4. mai 2011, at 20.33, Benjamin Root wrote:
>>> >
>>> > > On Wed, May 4, 2011 at 7:54 PM, Derek Homeier <
>>> de...@astro.physik.uni-goettingen.de> wrote:
>>> > > On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote:
>>> > >
>>> > > > But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions
>>> written for this? Shouldn't we reuse them? Perhaps it's overkill, and
>>> perhaps it will reintroduce the 'transposed' problem?
>>> > >
>>> > > Yes, good point, one could replace the
>>> > > X.shape = (X.size, ) with X = np.atleast_1d(X),
>>> > > but for the ndmin=2 case, we'd need to replace
>>> > > X.shape = (X.size, 1) with X = np.atleast_2d(X).T -
>>> > > not sure which solution is more efficient in terms of memory access
>>> etc...
>>> > >
>>> > > Cheers,
>>> > >Derek
>>> > >
>>> > >
>>> > > I can confirm that the current behavior is not sufficient for all of
>>> the original corner cases that ndmin was supposed to address.  Keep in mind
>>> that np.loadtxt takes a one-column data file and a one-row data file down to
>>> the same shape.  I don't see how the current code is able to produce the
>>> correct array shape when ndmin=2.  Do we have some sort of counter in
>>> loadtxt for counting the number of rows and columns read?  Could we use
>>> those to help guide the ndmin=2 case?
>>> > >
>>> > > I think that using atleast_1d(X) might be a bit overkill, but it
>>> would be very clear as to the code's intent.  I don't think we have to worry
>>> about memory usage if we limit its use to only situations where ndmin is
>>> greater than the number of dimensions of the array.  In those cases, the
>>> array is either an empty result, a scalar value (in which memory access is
>>> trivial), or 1-d (in which a transpose is cheap).
>>> >
>>> > What if one does things the other way around - avoid calling squeeze
>>> until _after_ doing the atleast_Nd() magic? That way the row/column
>>> information should be conserved, right? Also, we avoid transposing, memory
>>> use, ...
>>> >
>>> > Oh, and someone could conceivably have a _looong_ 1D file, but would
>>> want it read as a 2D array.
>>> >
>>> > Paul
>>> >
>>> >
>>> >
>>> > @Derek, good catch with noticing the error in the tests. We do still
>>> need to handle the case I mentioned, however.  I have attached an example
>>> script to demonstrate the issue.  In this script, I would expect the
>>> second-to-last array to be a shape of (1, 5).  I believe that the
>>> single-row, multi-column case would actually be the more common type of
>>> edge-case encountered by users than the others.  Therefore, I believe that
>>> this ndmin fix is not adequate until this is addressed.
>>> >
>>> > @Paul, we can't call squeeze after doing the atleast_Nd() magic.  That
>>> would just undo whatever we had just done.  Also, wrt the transpose, a (1,
>>> 10) array looks the same in memory as a (10, 1) array, right?
>>> Agree. I thought more along the lines of (pseudocode-ish)
>>> if ndmin == 0:
>>>squeeze()
>>> if ndmin == 1:
>>>atleast_1D()
>>> elif ndmin == 2:
>>>atleast_2D()
>>> else:
>>>I don't rightly know what would go here, maybe raise ValueError?
>>>
>>> That would avoid the squeeze call before the atleast_Nd magic. But the
>>> code was changed, so I think my comment doesn't make sense anymore. It's
>>> probably fine the way it is!
>>>
>>> Paul
>>>
>>>
>> I have thought of that too, but the problem with that approach is that
>> after reading the file, X will have 2 or 3 dimensions, regardless of how
>> many singleton dims were in the file.  A squeeze will always be needed.
>> Also, the purpose of squeeze is opposite that of the atleast_*d()
>> functions:  squeeze reduces dimensions, while atleast_*d will add
>> dimensions.
>>
>> Therefore, I re-iterate... the patch by Derek gets the job done.  I have
>> tested it for a wide variety of inputs for both regular arrays and record
>> arrays.  Is there room for improvements?  Yes, but I think that can wait for
>> later.  Derek's patch however fixes an important bug in the ndmin
>> implementation and should be included for the release.
>>
>> Two questions: can you point me to the patch/ticket, and is this a
> regression?
>
> Thanks,
> Ralf
>
>
>
I don't know if he did a pull-request or not, but here is the link he
provided earlier in the thread.

https://github.com/dhomeier/numpy/compare/master...ndmin-cols

Technically, this is not a "regression" as the ndmin feature is new in this
release.  However, the problem that ndmin is supposed to address is not
fixed 

Re: [Numpy-discussion] loadtxt ndmin option

2011-05-05 Thread Ralf Gommers
On Thu, May 5, 2011 at 9:18 PM, Benjamin Root  wrote:

>
>
> On Thu, May 5, 2011 at 1:08 PM, Paul Anton Letnes <
> paul.anton.let...@gmail.com> wrote:
>
>>
>> On 5. mai 2011, at 08.49, Benjamin Root wrote:
>>
>> >
>> >
>> > On Wed, May 4, 2011 at 11:08 PM, Paul Anton Letnes <
>> paul.anton.let...@gmail.com> wrote:
>> >
>> > On 4. mai 2011, at 20.33, Benjamin Root wrote:
>> >
>> > > On Wed, May 4, 2011 at 7:54 PM, Derek Homeier <
>> de...@astro.physik.uni-goettingen.de> wrote:
>> > > On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote:
>> > >
>> > > > But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions
>> written for this? Shouldn't we reuse them? Perhaps it's overkill, and
>> perhaps it will reintroduce the 'transposed' problem?
>> > >
>> > > Yes, good point, one could replace the
>> > > X.shape = (X.size, ) with X = np.atleast_1d(X),
>> > > but for the ndmin=2 case, we'd need to replace
>> > > X.shape = (X.size, 1) with X = np.atleast_2d(X).T -
>> > > not sure which solution is more efficient in terms of memory access
>> etc...
>> > >
>> > > Cheers,
>> > >Derek
>> > >
>> > >
>> > > I can confirm that the current behavior is not sufficient for all of
>> the original corner cases that ndmin was supposed to address.  Keep in mind
>> that np.loadtxt takes a one-column data file and a one-row data file down to
>> the same shape.  I don't see how the current code is able to produce the
>> correct array shape when ndmin=2.  Do we have some sort of counter in
>> loadtxt for counting the number of rows and columns read?  Could we use
>> those to help guide the ndmin=2 case?
>> > >
>> > > I think that using atleast_1d(X) might be a bit overkill, but it would
>> be very clear as to the code's intent.  I don't think we have to worry about
>> memory usage if we limit its use to only situations where ndmin is greater
>> than the number of dimensions of the array.  In those cases, the array is
>> either an empty result, a scalar value (in which memory access is trivial),
>> or 1-d (in which a transpose is cheap).
>> >
>> > What if one does things the other way around - avoid calling squeeze
>> until _after_ doing the atleast_Nd() magic? That way the row/column
>> information should be conserved, right? Also, we avoid transposing, memory
>> use, ...
>> >
>> > Oh, and someone could conceivably have a _looong_ 1D file, but would
>> want it read as a 2D array.
>> >
>> > Paul
>> >
>> >
>> >
>> > @Derek, good catch with noticing the error in the tests. We do still
>> need to handle the case I mentioned, however.  I have attached an example
>> script to demonstrate the issue.  In this script, I would expect the
>> second-to-last array to be a shape of (1, 5).  I believe that the
>> single-row, multi-column case would actually be the more common type of
>> edge-case encountered by users than the others.  Therefore, I believe that
>> this ndmin fix is not adequate until this is addressed.
>> >
>> > @Paul, we can't call squeeze after doing the atleast_Nd() magic.  That
>> would just undo whatever we had just done.  Also, wrt the transpose, a (1,
>> 10) array looks the same in memory as a (10, 1) array, right?
>> Agree. I thought more along the lines of (pseudocode-ish)
>> if ndmin == 0:
>>squeeze()
>> if ndmin == 1:
>>atleast_1D()
>> elif ndmin == 2:
>>atleast_2D()
>> else:
>>I don't rightly know what would go here, maybe raise ValueError?
>>
>> That would avoid the squeeze call before the atleast_Nd magic. But the
>> code was changed, so I think my comment doesn't make sense anymore. It's
>> probably fine the way it is!
>>
>> Paul
>>
>>
> I have thought of that too, but the problem with that approach is that
> after reading the file, X will have 2 or 3 dimensions, regardless of how
> many singleton dims were in the file.  A squeeze will always be needed.
> Also, the purpose of squeeze is opposite that of the atleast_*d()
> functions:  squeeze reduces dimensions, while atleast_*d will add
> dimensions.
>
> Therefore, I re-iterate... the patch by Derek gets the job done.  I have
> tested it for a wide variety of inputs for both regular arrays and record
> arrays.  Is there room for improvements?  Yes, but I think that can wait for
> later.  Derek's patch however fixes an important bug in the ndmin
> implementation and should be included for the release.
>
> Two questions: can you point me to the patch/ticket, and is this a
regression?

Thanks,
Ralf
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-05 Thread Benjamin Root
On Thu, May 5, 2011 at 1:08 PM, Paul Anton Letnes <
paul.anton.let...@gmail.com> wrote:

>
> On 5. mai 2011, at 08.49, Benjamin Root wrote:
>
> >
> >
> > On Wed, May 4, 2011 at 11:08 PM, Paul Anton Letnes <
> paul.anton.let...@gmail.com> wrote:
> >
> > On 4. mai 2011, at 20.33, Benjamin Root wrote:
> >
> > > On Wed, May 4, 2011 at 7:54 PM, Derek Homeier <
> de...@astro.physik.uni-goettingen.de> wrote:
> > > On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote:
> > >
> > > > But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions
> written for this? Shouldn't we reuse them? Perhaps it's overkill, and
> perhaps it will reintroduce the 'transposed' problem?
> > >
> > > Yes, good point, one could replace the
> > > X.shape = (X.size, ) with X = np.atleast_1d(X),
> > > but for the ndmin=2 case, we'd need to replace
> > > X.shape = (X.size, 1) with X = np.atleast_2d(X).T -
> > > not sure which solution is more efficient in terms of memory access
> etc...
> > >
> > > Cheers,
> > >Derek
> > >
> > >
> > > I can confirm that the current behavior is not sufficient for all of
> the original corner cases that ndmin was supposed to address.  Keep in mind
> that np.loadtxt takes a one-column data file and a one-row data file down to
> the same shape.  I don't see how the current code is able to produce the
> correct array shape when ndmin=2.  Do we have some sort of counter in
> loadtxt for counting the number of rows and columns read?  Could we use
> those to help guide the ndmin=2 case?
> > >
> > > I think that using atleast_1d(X) might be a bit overkill, but it would
> be very clear as to the code's intent.  I don't think we have to worry about
> memory usage if we limit its use to only situations where ndmin is greater
> than the number of dimensions of the array.  In those cases, the array is
> either an empty result, a scalar value (in which memory access is trivial),
> or 1-d (in which a transpose is cheap).
> >
> > What if one does things the other way around - avoid calling squeeze
> until _after_ doing the atleast_Nd() magic? That way the row/column
> information should be conserved, right? Also, we avoid transposing, memory
> use, ...
> >
> > Oh, and someone could conceivably have a _looong_ 1D file, but would want
> it read as a 2D array.
> >
> > Paul
> >
> >
> >
> > @Derek, good catch with noticing the error in the tests. We do still need
> to handle the case I mentioned, however.  I have attached an example script
> to demonstrate the issue.  In this script, I would expect the second-to-last
> array to be a shape of (1, 5).  I believe that the single-row, multi-column
> case would actually be the more common type of edge-case encountered by
> users than the others.  Therefore, I believe that this ndmin fix is not
> adequate until this is addressed.
> >
> > @Paul, we can't call squeeze after doing the atleast_Nd() magic.  That
> would just undo whatever we had just done.  Also, wrt the transpose, a (1,
> 10) array looks the same in memory as a (10, 1) array, right?
> Agree. I thought more along the lines of (pseudocode-ish)
> if ndmin == 0:
>squeeze()
> if ndmin == 1:
>atleast_1D()
> elif ndmin == 2:
>atleast_2D()
> else:
>I don't rightly know what would go here, maybe raise ValueError?
>
> That would avoid the squeeze call before the atleast_Nd magic. But the code
> was changed, so I think my comment doesn't make sense anymore. It's probably
> fine the way it is!
>
> Paul
>
>
I have thought of that too, but the problem with that approach is that after
reading the file, X will have 2 or 3 dimensions, regardless of how many
singleton dims were in the file.  A squeeze will always be needed.  Also,
the purpose of squeeze is opposite that of the atleast_*d() functions:
squeeze reduces dimensions, while atleast_*d will add dimensions.

Therefore, I re-iterate... the patch by Derek gets the job done.  I have
tested it for a wide variety of inputs for both regular arrays and record
arrays.  Is there room for improvements?  Yes, but I think that can wait for
later.  Derek's patch however fixes an important bug in the ndmin
implementation and should be included for the release.

Ben Root
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-05 Thread Paul Anton Letnes

On 5. mai 2011, at 08.49, Benjamin Root wrote:

> 
> 
> On Wed, May 4, 2011 at 11:08 PM, Paul Anton Letnes 
>  wrote:
> 
> On 4. mai 2011, at 20.33, Benjamin Root wrote:
> 
> > On Wed, May 4, 2011 at 7:54 PM, Derek Homeier 
> >  wrote:
> > On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote:
> >
> > > But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions written 
> > > for this? Shouldn't we reuse them? Perhaps it's overkill, and perhaps it 
> > > will reintroduce the 'transposed' problem?
> >
> > Yes, good point, one could replace the
> > X.shape = (X.size, ) with X = np.atleast_1d(X),
> > but for the ndmin=2 case, we'd need to replace
> > X.shape = (X.size, 1) with X = np.atleast_2d(X).T -
> > not sure which solution is more efficient in terms of memory access etc...
> >
> > Cheers,
> >Derek
> >
> >
> > I can confirm that the current behavior is not sufficient for all of the 
> > original corner cases that ndmin was supposed to address.  Keep in mind 
> > that np.loadtxt takes a one-column data file and a one-row data file down 
> > to the same shape.  I don't see how the current code is able to produce the 
> > correct array shape when ndmin=2.  Do we have some sort of counter in 
> > loadtxt for counting the number of rows and columns read?  Could we use 
> > those to help guide the ndmin=2 case?
> >
> > I think that using atleast_1d(X) might be a bit overkill, but it would be 
> > very clear as to the code's intent.  I don't think we have to worry about 
> > memory usage if we limit its use to only situations where ndmin is greater 
> > than the number of dimensions of the array.  In those cases, the array is 
> > either an empty result, a scalar value (in which memory access is trivial), 
> > or 1-d (in which a transpose is cheap).
> 
> What if one does things the other way around - avoid calling squeeze until 
> _after_ doing the atleast_Nd() magic? That way the row/column information 
> should be conserved, right? Also, we avoid transposing, memory use, ...
> 
> Oh, and someone could conceivably have a _looong_ 1D file, but would want it 
> read as a 2D array.
> 
> Paul
> 
> 
> 
> @Derek, good catch with noticing the error in the tests. We do still need to 
> handle the case I mentioned, however.  I have attached an example script to 
> demonstrate the issue.  In this script, I would expect the second-to-last 
> array to be a shape of (1, 5).  I believe that the single-row, multi-column 
> case would actually be the more common type of edge-case encountered by users 
> than the others.  Therefore, I believe that this ndmin fix is not adequate 
> until this is addressed.
> 
> @Paul, we can't call squeeze after doing the atleast_Nd() magic.  That would 
> just undo whatever we had just done.  Also, wrt the transpose, a (1, 10) 
> array looks the same in memory as a (10, 1) array, right?
Agree. I thought more along the lines of (pseudocode-ish)
if ndmin == 0:
squeeze()
if ndmin == 1:
atleast_1D()
elif ndmin == 2:
atleast_2D()
else:
I don't rightly know what would go here, maybe raise ValueError?

That would avoid the squeeze call before the atleast_Nd magic. But the code was 
changed, so I think my comment doesn't make sense anymore. It's probably fine 
the way it is!

Paul

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-05 Thread Benjamin Root
On Thu, May 5, 2011 at 10:49 AM, Benjamin Root  wrote:

>
>
> On Wed, May 4, 2011 at 11:08 PM, Paul Anton Letnes <
> paul.anton.let...@gmail.com> wrote:
>
>>
>> On 4. mai 2011, at 20.33, Benjamin Root wrote:
>>
>> > On Wed, May 4, 2011 at 7:54 PM, Derek Homeier <
>> de...@astro.physik.uni-goettingen.de> wrote:
>> > On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote:
>> >
>> > > But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions written
>> for this? Shouldn't we reuse them? Perhaps it's overkill, and perhaps it
>> will reintroduce the 'transposed' problem?
>> >
>> > Yes, good point, one could replace the
>> > X.shape = (X.size, ) with X = np.atleast_1d(X),
>> > but for the ndmin=2 case, we'd need to replace
>> > X.shape = (X.size, 1) with X = np.atleast_2d(X).T -
>> > not sure which solution is more efficient in terms of memory access
>> etc...
>> >
>> > Cheers,
>> >Derek
>> >
>> >
>> > I can confirm that the current behavior is not sufficient for all of the
>> original corner cases that ndmin was supposed to address.  Keep in mind that
>> np.loadtxt takes a one-column data file and a one-row data file down to the
>> same shape.  I don't see how the current code is able to produce the correct
>> array shape when ndmin=2.  Do we have some sort of counter in loadtxt for
>> counting the number of rows and columns read?  Could we use those to help
>> guide the ndmin=2 case?
>> >
>> > I think that using atleast_1d(X) might be a bit overkill, but it would
>> be very clear as to the code's intent.  I don't think we have to worry about
>> memory usage if we limit its use to only situations where ndmin is greater
>> than the number of dimensions of the array.  In those cases, the array is
>> either an empty result, a scalar value (in which memory access is trivial),
>> or 1-d (in which a transpose is cheap).
>>
>> What if one does things the other way around - avoid calling squeeze until
>> _after_ doing the atleast_Nd() magic? That way the row/column information
>> should be conserved, right? Also, we avoid transposing, memory use, ...
>>
>> Oh, and someone could conceivably have a _looong_ 1D file, but would want
>> it read as a 2D array.
>>
>> Paul
>>
>>
>>
> @Derek, good catch with noticing the error in the tests. We do still need
> to handle the case I mentioned, however.  I have attached an example script
> to demonstrate the issue.  In this script, I would expect the second-to-last
> array to be a shape of (1, 5).  I believe that the single-row, multi-column
> case would actually be the more common type of edge-case encountered by
> users than the others.  Therefore, I believe that this ndmin fix is not
> adequate until this is addressed.
>
>
Apologies Derek, your patch does address the issue I raised.

Ben Root
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-05 Thread Benjamin Root
On Wed, May 4, 2011 at 11:08 PM, Paul Anton Letnes <
paul.anton.let...@gmail.com> wrote:

>
> On 4. mai 2011, at 20.33, Benjamin Root wrote:
>
> > On Wed, May 4, 2011 at 7:54 PM, Derek Homeier <
> de...@astro.physik.uni-goettingen.de> wrote:
> > On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote:
> >
> > > But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions written
> for this? Shouldn't we reuse them? Perhaps it's overkill, and perhaps it
> will reintroduce the 'transposed' problem?
> >
> > Yes, good point, one could replace the
> > X.shape = (X.size, ) with X = np.atleast_1d(X),
> > but for the ndmin=2 case, we'd need to replace
> > X.shape = (X.size, 1) with X = np.atleast_2d(X).T -
> > not sure which solution is more efficient in terms of memory access
> etc...
> >
> > Cheers,
> >Derek
> >
> >
> > I can confirm that the current behavior is not sufficient for all of the
> original corner cases that ndmin was supposed to address.  Keep in mind that
> np.loadtxt takes a one-column data file and a one-row data file down to the
> same shape.  I don't see how the current code is able to produce the correct
> array shape when ndmin=2.  Do we have some sort of counter in loadtxt for
> counting the number of rows and columns read?  Could we use those to help
> guide the ndmin=2 case?
> >
> > I think that using atleast_1d(X) might be a bit overkill, but it would be
> very clear as to the code's intent.  I don't think we have to worry about
> memory usage if we limit its use to only situations where ndmin is greater
> than the number of dimensions of the array.  In those cases, the array is
> either an empty result, a scalar value (in which memory access is trivial),
> or 1-d (in which a transpose is cheap).
>
> What if one does things the other way around - avoid calling squeeze until
> _after_ doing the atleast_Nd() magic? That way the row/column information
> should be conserved, right? Also, we avoid transposing, memory use, ...
>
> Oh, and someone could conceivably have a _looong_ 1D file, but would want
> it read as a 2D array.
>
> Paul
>
>
>
@Derek, good catch with noticing the error in the tests. We do still need to
handle the case I mentioned, however.  I have attached an example script to
demonstrate the issue.  In this script, I would expect the second-to-last
array to be a shape of (1, 5).  I believe that the single-row, multi-column
case would actually be the more common type of edge-case encountered by
users than the others.  Therefore, I believe that this ndmin fix is not
adequate until this is addressed.

@Paul, we can't call squeeze after doing the atleast_Nd() magic.  That would
just undo whatever we had just done.  Also, wrt the transpose, a (1, 10)
array looks the same in memory as a (10, 1) array, right?

Ben Root


loadtest.py
Description: Binary data
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-04 Thread Paul Anton Letnes

On 4. mai 2011, at 20.33, Benjamin Root wrote:

> On Wed, May 4, 2011 at 7:54 PM, Derek Homeier 
>  wrote:
> On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote:
> 
> > But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions written for 
> > this? Shouldn't we reuse them? Perhaps it's overkill, and perhaps it will 
> > reintroduce the 'transposed' problem?
> 
> Yes, good point, one could replace the
> X.shape = (X.size, ) with X = np.atleast_1d(X),
> but for the ndmin=2 case, we'd need to replace
> X.shape = (X.size, 1) with X = np.atleast_2d(X).T -
> not sure which solution is more efficient in terms of memory access etc...
> 
> Cheers,
>Derek
> 
> 
> I can confirm that the current behavior is not sufficient for all of the 
> original corner cases that ndmin was supposed to address.  Keep in mind that 
> np.loadtxt takes a one-column data file and a one-row data file down to the 
> same shape.  I don't see how the current code is able to produce the correct 
> array shape when ndmin=2.  Do we have some sort of counter in loadtxt for 
> counting the number of rows and columns read?  Could we use those to help 
> guide the ndmin=2 case?
> 
> I think that using atleast_1d(X) might be a bit overkill, but it would be 
> very clear as to the code's intent.  I don't think we have to worry about 
> memory usage if we limit its use to only situations where ndmin is greater 
> than the number of dimensions of the array.  In those cases, the array is 
> either an empty result, a scalar value (in which memory access is trivial), 
> or 1-d (in which a transpose is cheap).

What if one does things the other way around - avoid calling squeeze until 
_after_ doing the atleast_Nd() magic? That way the row/column information 
should be conserved, right? Also, we avoid transposing, memory use, ...

Oh, and someone could conceivably have a _looong_ 1D file, but would want it 
read as a 2D array.

Paul


___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-04 Thread Benjamin Root
On Wed, May 4, 2011 at 7:54 PM, Derek Homeier <
de...@astro.physik.uni-goettingen.de> wrote:

> On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote:
>
> > But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions written
> for this? Shouldn't we reuse them? Perhaps it's overkill, and perhaps it
> will reintroduce the 'transposed' problem?
>
> Yes, good point, one could replace the
> X.shape = (X.size, ) with X = np.atleast_1d(X),
> but for the ndmin=2 case, we'd need to replace
> X.shape = (X.size, 1) with X = np.atleast_2d(X).T -
> not sure which solution is more efficient in terms of memory access etc...
>
> Cheers,
> Derek
>
>
I can confirm that the current behavior is not sufficient for all of the
original corner cases that ndmin was supposed to address.  Keep in mind that
np.loadtxt takes a one-column data file and a one-row data file down to the
same shape.  I don't see how the current code is able to produce the correct
array shape when ndmin=2.  Do we have some sort of counter in loadtxt for
counting the number of rows and columns read?  Could we use those to help
guide the ndmin=2 case?

I think that using atleast_1d(X) might be a bit overkill, but it would be
very clear as to the code's intent.  I don't think we have to worry about
memory usage if we limit its use to only situations where ndmin is greater
than the number of dimensions of the array.  In those cases, the array is
either an empty result, a scalar value (in which memory access is trivial),
or 1-d (in which a transpose is cheap).

Ben Root
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-04 Thread Derek Homeier
On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote:

> But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions written for 
> this? Shouldn't we reuse them? Perhaps it's overkill, and perhaps it will 
> reintroduce the 'transposed' problem?

Yes, good point, one could replace the 
X.shape = (X.size, ) with X = np.atleast_1d(X), 
but for the ndmin=2 case, we'd need to replace 
X.shape = (X.size, 1) with X = np.atleast_2d(X).T - 
not sure which solution is more efficient in terms of memory access etc...

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-04 Thread Paul Anton Letnes

On 4. mai 2011, at 17.34, Derek Homeier wrote:

> Hi Paul,
> 
> I've got back to your suggestion re. the ndmin flag for loadtxt from a few 
> weeks ago...
> 
> On 27.03.2011, at 12:09PM, Paul Anton Letnes wrote:
> 
 1562:
 I attach a possible patch. This could also be the default  
 behavior to my mind, since the function caller can simply call  
 numpy.squeeze if needed. Changing default behavior would probably  
 break old code, however.
>>> 
>>> See comments on Trac as well.
>> 
>> Your patch is better, but there is one thing I disagree with.
>> 808if X.ndim < ndmin:
>> 809if ndmin == 1:
>> 810X.shape = (X.size, )
>> 811elif ndmin == 2:
>> 812X.shape = (X.size, 1) 
>> The last line should be:
>> 812X.shape = (1, X.size) 
>> If someone wants a 2D array out, they would most likely expect a one-row 
>> file to come out as a one-row array, not the other way around. IMHO.
> 
> I think you are completely right for the test case with one row. More 
> generally though, 
> since a file of N rows and M columns is read into an array of shape (N, M), 
> ndmin=2 
> should enforce X.shape = (1, X.size) for single-row input, and X.shape = 
> (X.size, 1) 
> for single-column input.
> I thought this would be handled automatically by preserving the original 2 
> dimensions, 
> but apparently with single-row/multi-column input an extra dimension 1 is 
> prepended 
> when the array is returned from the parser. I've put up a fix for this at 
> 
> https://github.com/dhomeier/numpy/compare/master...ndmin-cols
> 
> and also tested the patch against 1.6.0.rc2.
> 
> Cheers,
>   Derek


Looks sensible to me at least!

But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions written for 
this? Shouldn't we reuse them? Perhaps it's overkill, and perhaps it will 
reintroduce the 'transposed' problem?

Paul

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-04 Thread Derek Homeier
Hi Paul,

I've got back to your suggestion re. the ndmin flag for loadtxt from a few 
weeks ago...

On 27.03.2011, at 12:09PM, Paul Anton Letnes wrote:

>>> 1562:
>>>  I attach a possible patch. This could also be the default  
>>> behavior to my mind, since the function caller can simply call  
>>> numpy.squeeze if needed. Changing default behavior would probably  
>>> break old code, however.
>> 
>> See comments on Trac as well.
> 
> Your patch is better, but there is one thing I disagree with.
> 808if X.ndim < ndmin:
> 809if ndmin == 1:
> 810X.shape = (X.size, )
> 811elif ndmin == 2:
> 812X.shape = (X.size, 1) 
> The last line should be:
> 812X.shape = (1, X.size) 
> If someone wants a 2D array out, they would most likely expect a one-row file 
> to come out as a one-row array, not the other way around. IMHO.

I think you are completely right for the test case with one row. More generally 
though, 
since a file of N rows and M columns is read into an array of shape (N, M), 
ndmin=2 
should enforce X.shape = (1, X.size) for single-row input, and X.shape = 
(X.size, 1) 
for single-column input.
I thought this would be handled automatically by preserving the original 2 
dimensions, 
but apparently with single-row/multi-column input an extra dimension 1 is 
prepended 
when the array is returned from the parser. I've put up a fix for this at 

https://github.com/dhomeier/numpy/compare/master...ndmin-cols

and also tested the patch against 1.6.0.rc2.

Cheers,
Derek

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion