Re: [Numpy-discussion] loadtxt ndmin option
On 31 May 2011, at 21:28, Pierre GM wrote: > On May 31, 2011, at 6:37 PM, Derek Homeier wrote: > >> On 31 May 2011, at 18:25, Pierre GM wrote: >> >>> On May 31, 2011, at 5:52 PM, Derek Homeier wrote: I think stuff like multiple delimiters should have been dealt with before, as the right place to insert the ndmin code (which includes the decision to squeeze or not to squeeze as well as to add additional dimensions, if required) would be right at the end before the 'unpack' switch, or rather replacing the bit: if usemask: output = output.view(MaskedArray) output._mask = outputmask if unpack: return output.squeeze().T return output.squeeze() But there it's already not clear to me how to deal with the MaskedArray case... >>> >>> Oh, easy. >>> You need to replace only the last three lines of genfromtxt with the >>> ones from loadtxt (808-833). Then, if usemask is True, you need to >>> use ma.atleast_Xd instead of np.atleast_Xd. Et voilà. >>> Comments: >>> * I would raise an exception if ndmin isn't correct *before* trying >>> to read the file... >>> * You could define a `collapse_function` that would be >>> `np.atleast_1d`, `np.atleast_2d`, `ma.atleast_1d`... depending on >>> the values of `usemask` and `ndmin`... Thanks, that helped to clean up a little bit. >>> If you have any question about numpy.ma, don't hesitate to contact >>> me directly. >> >> Thanks for the directions! I was not sure about the usemask case >> because it presently does not invoke .squeeze() either... > > The idea is that if `usemask` is True, you build a second array (the > mask), that you attach to your main array at the very end (in the > `output=output.view(MaskedArray), output._mask = mask` combo...). > Afterwards, it's a regular MaskedArray that supports the .squeeze() > method... > OK, in both cases output.squeeze() is now used if ndim>ndmin and usemask is False - at least it does not break any tests, so it seems to work with MaskedArrays as well. > >> On a >> possibly related note, genfromtxt also treats the 'unpack'ing of >> structured arrays differently from loadtxt (which returns a list of >> arrays in that case) - do you know if this is on purpose, or also >> rather missing functionality (I guess it might break >> recfromtxt()...)? > > Keep in mind that I haven't touched genfromtxt since 8-10 months or > so. I wouldn't be surprised that it were lagging a bit behind > loadtxt in terms of development. Yes, there'll be some tweaking to > do for recfromtxt (it's OK for now if `ndmin` and `unpack` are the > defaults) and others, but nothing major. Well, at long last I got to implement the above and added the corresponding tests for genfromtxt - with the exception of the dimension-0 cases, since genfromtxt raises an error on empty files. There already is a comment it should perhaps better return an empty array, so I am putting that idea up for discussion here again. I tried to devise a very basic test with masked arrays, just added to test_withmissing now. I also implemented the same unpacking behaviour for structured arrays and just made recfromtxt set unpack=False to work (or should it issue a warning?). The patches are up for review as commit 8ac01636 in my iocnv-wildcard branch: https://github.com/dhomeier/numpy/compare/master...iocnv-wildcard Cheers, Derek ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] loadtxt ndmin option
Derek Homeier wrote: > > Hi Chris, > > On 31 May 2011, at 13:56, cgraves wrote: > >> I've downloaded the latest numpy (1.6.0) and loadtxt has the ndmin >> option, >> however neither genfromtxt nor recfromtxt, which use loadtxt, have it. >> Should they have inherited the option? Who can make it happen? > > you are mistaken, genfromtxt is not using loadtxt (and could not > possibly, since it has the more complex parser to handle missing > data); thus ndmin could not be inherited automatically. > It certainly would make sense to provide the same functionality for > genfromtxt (which should then be inherited by [nd,ma,rec]fromtxt), so > I'd go ahead and file a feature (enhancement) request. I can't promise > I can take care of it myself, as I am less familiar with genfromtxt, > but I'd certainly have a look at it. > > Does anyone have an opinion whether this is a case for reopening (yet > again...) > http://projects.scipy.org/numpy/ticket/1562 > or create a new ticket? > Thanks Derek. That would be greatly appreciated! Based on the follow-up messages in this thread, it looks like (hopefully) there will not be too much additional work in implementing it. For now I'll just use the temporary fix, a .reshape(-1), on any recfromtxt's that might read in a single row of data.. Kind regards, Chris -- View this message in context: http://old.nabble.com/loadtxt-savetxt-tickets-tp31238871p31769169.html Sent from the Numpy-discussion mailing list archive at Nabble.com. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] loadtxt ndmin option
On May 31, 2011, at 6:37 PM, Derek Homeier wrote: > On 31 May 2011, at 18:25, Pierre GM wrote: > >> On May 31, 2011, at 5:52 PM, Derek Homeier wrote: >>> >>> I think stuff like multiple delimiters should have been dealt with >>> before, as the right place to insert the ndmin code (which includes >>> the decision to squeeze or not to squeeze as well as to add >>> additional >>> dimensions, if required) would be right at the end before the >>> 'unpack' >>> switch, or rather replacing the bit: >>> >>> if usemask: >>> output = output.view(MaskedArray) >>> output._mask = outputmask >>> if unpack: >>> return output.squeeze().T >>> return output.squeeze() >>> >>> But there it's already not clear to me how to deal with the >>> MaskedArray case... >> >> Oh, easy. >> You need to replace only the last three lines of genfromtxt with the >> ones from loadtxt (808-833). Then, if usemask is True, you need to >> use ma.atleast_Xd instead of np.atleast_Xd. Et voilà. >> Comments: >> * I would raise an exception if ndmin isn't correct *before* trying >> to read the file... >> * You could define a `collapse_function` that would be >> `np.atleast_1d`, `np.atleast_2d`, `ma.atleast_1d`... depending on >> the values of `usemask` and `ndmin`... >> If you have any question about numpy.ma, don't hesitate to contact >> me directly. > > Thanks for the directions! I was not sure about the usemask case > because it presently does not invoke .squeeze() either... The idea is that if `usemask` is True, you build a second array (the mask), that you attach to your main array at the very end (in the `output=output.view(MaskedArray), output._mask = mask` combo...). Afterwards, it's a regular MaskedArray that supports the .squeeze() method... > On a > possibly related note, genfromtxt also treats the 'unpack'ing of > structured arrays differently from loadtxt (which returns a list of > arrays in that case) - do you know if this is on purpose, or also > rather missing functionality (I guess it might break recfromtxt()...)? Keep in mind that I haven't touched genfromtxt since 8-10 months or so. I wouldn't be surprised that it were lagging a bit behind loadtxt in terms of development. Yes, there'll be some tweaking to do for recfromtxt (it's OK for now if `ndmin` and `unpack` are the defaults) and others, but nothing major. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] loadtxt ndmin option
On 31 May 2011, at 18:25, Pierre GM wrote: > On May 31, 2011, at 5:52 PM, Derek Homeier wrote: >> >> I think stuff like multiple delimiters should have been dealt with >> before, as the right place to insert the ndmin code (which includes >> the decision to squeeze or not to squeeze as well as to add >> additional >> dimensions, if required) would be right at the end before the >> 'unpack' >> switch, or rather replacing the bit: >> >>if usemask: >>output = output.view(MaskedArray) >>output._mask = outputmask >>if unpack: >>return output.squeeze().T >>return output.squeeze() >> >> But there it's already not clear to me how to deal with the >> MaskedArray case... > > Oh, easy. > You need to replace only the last three lines of genfromtxt with the > ones from loadtxt (808-833). Then, if usemask is True, you need to > use ma.atleast_Xd instead of np.atleast_Xd. Et voilà. > Comments: > * I would raise an exception if ndmin isn't correct *before* trying > to read the file... > * You could define a `collapse_function` that would be > `np.atleast_1d`, `np.atleast_2d`, `ma.atleast_1d`... depending on > the values of `usemask` and `ndmin`... > If you have any question about numpy.ma, don't hesitate to contact > me directly. Thanks for the directions! I was not sure about the usemask case because it presently does not invoke .squeeze() either... On a possibly related note, genfromtxt also treats the 'unpack'ing of structured arrays differently from loadtxt (which returns a list of arrays in that case) - do you know if this is on purpose, or also rather missing functionality (I guess it might break recfromtxt()...)? Cheers, Derek ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] loadtxt ndmin option
On May 31, 2011, at 5:52 PM, Derek Homeier wrote: > > I think stuff like multiple delimiters should have been dealt with > before, as the right place to insert the ndmin code (which includes > the decision to squeeze or not to squeeze as well as to add additional > dimensions, if required) would be right at the end before the 'unpack' > switch, or rather replacing the bit: > > if usemask: > output = output.view(MaskedArray) > output._mask = outputmask > if unpack: > return output.squeeze().T > return output.squeeze() > > But there it's already not clear to me how to deal with the > MaskedArray case... Oh, easy. You need to replace only the last three lines of genfromtxt with the ones from loadtxt (808-833). Then, if usemask is True, you need to use ma.atleast_Xd instead of np.atleast_Xd. Et voilà. Comments: * I would raise an exception if ndmin isn't correct *before* trying to read the file... * You could define a `collapse_function` that would be `np.atleast_1d`, `np.atleast_2d`, `ma.atleast_1d`... depending on the values of `usemask` and `ndmin`... If you have any question about numpy.ma, don't hesitate to contact me directly. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] loadtxt ndmin option
On 31 May 2011, at 17:45, Benjamin Root wrote: > > At this point, I wonder if it might be smarter to create > a .atleast_Nd() function and use that everywhere it is needed. > Having similar logic tailored for each loading function might be a > little dangerous if bug fixes are made to one, but not the others. Like a generalised version of .atleast_1d / .atleast_2d? It would also have to include an .atmost_Nd functionality of some sorts, to replace the .squeeze(), generally a good idea (e.g. something like np.atleast_Nd(X, ndmin=0, ndmax=-1), where the default is not to reduce the maximum number of dimensions...). But for the io routines the situation is a bit more complex, since different shapes are expected to be returned depending on the text input (e.g. (1, N) for a single row vs. (N, 1) for a single column of data). Cheers, Derek ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] loadtxt ndmin option
On 31 May 2011, at 17:33, Bruce Southey wrote: >>> It certainly would make sense to provide the same functionality for >>> genfromtxt (which should then be inherited by [nd,ma,rec]fromtxt), >>> so >>> I'd go ahead and file a feature (enhancement) request. I can't >>> promise >>> I can take care of it myself, as I am less familiar with genfromtxt, >>> but I'd certainly have a look at it. >> Oh, that shouldn't be too difficult: 'ndmin' tells whether the >> array must be squeezed before being returned, right ? You can add >> some test at the very end of genfromtxt to check what to do with >> the output (whether to squeeze it or not, whether to transpose it >> or not)... If you don't mind doing it, I'd be quite grateful (I >> don't have time to work on numpy these days, much to my regret). >> Don't forget to change the user manual as well... >> ___ >> NumPy-Discussion mailing list >> NumPy-Discussion@scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > (Different function so different ticket.) > > Sure you can change the end of the code but that may hide various > problem. Unlike loadtxt, genfromtxt has a lot of flexibility > especially > handling missing values and using converter functions. So I think that > some examples must be provided that can not be handled by providing a > suitable converter or that require multiple assumptions about input > file > (such as having more than one delimiter). I think stuff like multiple delimiters should have been dealt with before, as the right place to insert the ndmin code (which includes the decision to squeeze or not to squeeze as well as to add additional dimensions, if required) would be right at the end before the 'unpack' switch, or rather replacing the bit: if usemask: output = output.view(MaskedArray) output._mask = outputmask if unpack: return output.squeeze().T return output.squeeze() But there it's already not clear to me how to deal with the MaskedArray case... Cheers, Derek ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] loadtxt ndmin option
On Tue, May 31, 2011 at 10:33 AM, Bruce Southey wrote: > On 05/31/2011 10:18 AM, Pierre GM wrote: > > On May 31, 2011, at 4:53 PM, Derek Homeier wrote: > > > >> Hi Chris, > >> > >> On 31 May 2011, at 13:56, cgraves wrote: > >> > >>> I've downloaded the latest numpy (1.6.0) and loadtxt has the ndmin > >>> option, > >>> however neither genfromtxt nor recfromtxt, which use loadtxt, have it. > >>> Should they have inherited the option? Who can make it happen? > >> you are mistaken, genfromtxt is not using loadtxt (and could not > >> possibly, since it has the more complex parser to handle missing > >> data); thus ndmin could not be inherited automatically. > >> It certainly would make sense to provide the same functionality for > >> genfromtxt (which should then be inherited by [nd,ma,rec]fromtxt), so > >> I'd go ahead and file a feature (enhancement) request. I can't promise > >> I can take care of it myself, as I am less familiar with genfromtxt, > >> but I'd certainly have a look at it. > > Oh, that shouldn't be too difficult: 'ndmin' tells whether the array must > be squeezed before being returned, right ? You can add some test at the very > end of genfromtxt to check what to do with the output (whether to squeeze it > or not, whether to transpose it or not)... If you don't mind doing it, I'd > be quite grateful (I don't have time to work on numpy these days, much to my > regret). Don't forget to change the user manual as well... > > ___ > > NumPy-Discussion mailing list > > NumPy-Discussion@scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > (Different function so different ticket.) > > Sure you can change the end of the code but that may hide various > problem. Unlike loadtxt, genfromtxt has a lot of flexibility especially > handling missing values and using converter functions. So I think that > some examples must be provided that can not be handled by providing a > suitable converter or that require multiple assumptions about input file > (such as having more than one delimiter). > > Bruce > At this point, I wonder if it might be smarter to create a .atleast_Nd() function and use that everywhere it is needed. Having similar logic tailored for each loading function might be a little dangerous if bug fixes are made to one, but not the others. Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] loadtxt ndmin option
On 05/31/2011 10:18 AM, Pierre GM wrote: > On May 31, 2011, at 4:53 PM, Derek Homeier wrote: > >> Hi Chris, >> >> On 31 May 2011, at 13:56, cgraves wrote: >> >>> I've downloaded the latest numpy (1.6.0) and loadtxt has the ndmin >>> option, >>> however neither genfromtxt nor recfromtxt, which use loadtxt, have it. >>> Should they have inherited the option? Who can make it happen? >> you are mistaken, genfromtxt is not using loadtxt (and could not >> possibly, since it has the more complex parser to handle missing >> data); thus ndmin could not be inherited automatically. >> It certainly would make sense to provide the same functionality for >> genfromtxt (which should then be inherited by [nd,ma,rec]fromtxt), so >> I'd go ahead and file a feature (enhancement) request. I can't promise >> I can take care of it myself, as I am less familiar with genfromtxt, >> but I'd certainly have a look at it. > Oh, that shouldn't be too difficult: 'ndmin' tells whether the array must be > squeezed before being returned, right ? You can add some test at the very end > of genfromtxt to check what to do with the output (whether to squeeze it or > not, whether to transpose it or not)... If you don't mind doing it, I'd be > quite grateful (I don't have time to work on numpy these days, much to my > regret). Don't forget to change the user manual as well... > ___ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion (Different function so different ticket.) Sure you can change the end of the code but that may hide various problem. Unlike loadtxt, genfromtxt has a lot of flexibility especially handling missing values and using converter functions. So I think that some examples must be provided that can not be handled by providing a suitable converter or that require multiple assumptions about input file (such as having more than one delimiter). Bruce ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] loadtxt ndmin option
On May 31, 2011, at 4:53 PM, Derek Homeier wrote: > Hi Chris, > > On 31 May 2011, at 13:56, cgraves wrote: > >> I've downloaded the latest numpy (1.6.0) and loadtxt has the ndmin >> option, >> however neither genfromtxt nor recfromtxt, which use loadtxt, have it. >> Should they have inherited the option? Who can make it happen? > > you are mistaken, genfromtxt is not using loadtxt (and could not > possibly, since it has the more complex parser to handle missing > data); thus ndmin could not be inherited automatically. > It certainly would make sense to provide the same functionality for > genfromtxt (which should then be inherited by [nd,ma,rec]fromtxt), so > I'd go ahead and file a feature (enhancement) request. I can't promise > I can take care of it myself, as I am less familiar with genfromtxt, > but I'd certainly have a look at it. Oh, that shouldn't be too difficult: 'ndmin' tells whether the array must be squeezed before being returned, right ? You can add some test at the very end of genfromtxt to check what to do with the output (whether to squeeze it or not, whether to transpose it or not)... If you don't mind doing it, I'd be quite grateful (I don't have time to work on numpy these days, much to my regret). Don't forget to change the user manual as well... ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] loadtxt ndmin option
Hi Chris, On 31 May 2011, at 13:56, cgraves wrote: > I've downloaded the latest numpy (1.6.0) and loadtxt has the ndmin > option, > however neither genfromtxt nor recfromtxt, which use loadtxt, have it. > Should they have inherited the option? Who can make it happen? you are mistaken, genfromtxt is not using loadtxt (and could not possibly, since it has the more complex parser to handle missing data); thus ndmin could not be inherited automatically. It certainly would make sense to provide the same functionality for genfromtxt (which should then be inherited by [nd,ma,rec]fromtxt), so I'd go ahead and file a feature (enhancement) request. I can't promise I can take care of it myself, as I am less familiar with genfromtxt, but I'd certainly have a look at it. Does anyone have an opinion whether this is a case for reopening (yet again...) http://projects.scipy.org/numpy/ticket/1562 or create a new ticket? Cheers, Derek ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] loadtxt ndmin option
Ralf Gommers-2 wrote: > > On Fri, May 6, 2011 at 12:57 PM, Derek Homeier < > de...@astro.physik.uni-goettingen.de> wrote: > >> >> On 6 May 2011, at 07:53, Ralf Gommers wrote: >> >> > >> > >> Looks okay, and I agree that it's better to fix it now. The timing >> > >> is a bit unfortunate though, just after RC2. I'll have closer look >> > >> tomorrow and if it can go in, probably tag RC3. >> > >> >> > >> If in the meantime a few more people could test this, that would be >> > >> helpful. >> > >> >> > >> Ralf >> > > >> > > I agree, wish I had time to push this before rc2. I could add the >> > > explanatory comments >> > > mentioned above and switch to use the atleast_[12]d() solution, test >> > > that and push it >> > > in a couple of minutes, or should I better leave it as is now for >> > > testing? >> > >> > Quick follow-up: I just applied the above changes, added some tests to >> > cover Ben's test cases and tested this with 1.6.0rc2 on OS X 10.5 >> > i386+ppc >> > + 10.6 x86_64 (Python2.7+3.2). So I'd be ready to push it to my repo >> > and do >> > my (first) pull request... >> > >> > Go ahead, I'll have a look at it tonight. Thanks for testing on >> > several Pythons, that definitely helps. >> >> >> Done, the request only appears on my repo >> https://github.com/dhomeier/numpy/ >> >> is that correct? If someone could test it on Linux and Windows as >> well... >> > > Committed, thanks for all the work. > > The pull request was in the wrong place, that's a minor flaw in the github > UI. After you press "Pull Request" you need to read the small print to see > where it's going. > > Cheers, > Ralf > > Dear all, I've downloaded the latest numpy (1.6.0) and loadtxt has the ndmin option, however neither genfromtxt nor recfromtxt, which use loadtxt, have it. Should they have inherited the option? Who can make it happen? Best, Chris -- View this message in context: http://old.nabble.com/loadtxt-savetxt-tickets-tp31238871p31740152.html Sent from the Numpy-discussion mailing list archive at Nabble.com. ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] loadtxt ndmin option
On Fri, May 6, 2011 at 12:57 PM, Derek Homeier < de...@astro.physik.uni-goettingen.de> wrote: > > On 6 May 2011, at 07:53, Ralf Gommers wrote: > > > > > >> Looks okay, and I agree that it's better to fix it now. The timing > > >> is a bit unfortunate though, just after RC2. I'll have closer look > > >> tomorrow and if it can go in, probably tag RC3. > > >> > > >> If in the meantime a few more people could test this, that would be > > >> helpful. > > >> > > >> Ralf > > > > > > I agree, wish I had time to push this before rc2. I could add the > > > explanatory comments > > > mentioned above and switch to use the atleast_[12]d() solution, test > > > that and push it > > > in a couple of minutes, or should I better leave it as is now for > > > testing? > > > > Quick follow-up: I just applied the above changes, added some tests to > > cover Ben's test cases and tested this with 1.6.0rc2 on OS X 10.5 > > i386+ppc > > + 10.6 x86_64 (Python2.7+3.2). So I'd be ready to push it to my repo > > and do > > my (first) pull request... > > > > Go ahead, I'll have a look at it tonight. Thanks for testing on > > several Pythons, that definitely helps. > > > Done, the request only appears on my repo > https://github.com/dhomeier/numpy/ > > is that correct? If someone could test it on Linux and Windows as > well... > Committed, thanks for all the work. The pull request was in the wrong place, that's a minor flaw in the github UI. After you press "Pull Request" you need to read the small print to see where it's going. Cheers, Ralf ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] loadtxt ndmin option
On 6 May 2011, at 07:53, Ralf Gommers wrote: > > >> Looks okay, and I agree that it's better to fix it now. The timing > >> is a bit unfortunate though, just after RC2. I'll have closer look > >> tomorrow and if it can go in, probably tag RC3. > >> > >> If in the meantime a few more people could test this, that would be > >> helpful. > >> > >> Ralf > > > > I agree, wish I had time to push this before rc2. I could add the > > explanatory comments > > mentioned above and switch to use the atleast_[12]d() solution, test > > that and push it > > in a couple of minutes, or should I better leave it as is now for > > testing? > > Quick follow-up: I just applied the above changes, added some tests to > cover Ben's test cases and tested this with 1.6.0rc2 on OS X 10.5 > i386+ppc > + 10.6 x86_64 (Python2.7+3.2). So I'd be ready to push it to my repo > and do > my (first) pull request... > > Go ahead, I'll have a look at it tonight. Thanks for testing on > several Pythons, that definitely helps. Done, the request only appears on my repo https://github.com/dhomeier/numpy/ is that correct? If someone could test it on Linux and Windows as well... Cheers, Derek ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] loadtxt ndmin option
On Fri, May 6, 2011 at 12:12 AM, Derek Homeier < de...@astro.physik.uni-goettingen.de> wrote: > > On 5 May 2011, at 22:53, Derek Homeier wrote: > > >> > >> However, the problem that ndmin is supposed to address is not fixed > >> by the current implementation for the rc. Essentially, a single- > >> row, multi-column file with ndmin=2 comes out as a Nx1 array which > >> is the same result for a multi-row, single-column file. My feeling > >> is that if we let the current implementation stand as is, and > >> developers use it in their code, then fixing it in a later release > >> would introduce more problems (maybe the devels would transpose the > >> result themselves or something). Better to fix it now in rc with > >> the two lines of code (and the correction to the tests), then to > >> introduce a buggy feature that will be hard to fix in future > >> releases, IMHO. > >> > >> Looks okay, and I agree that it's better to fix it now. The timing > >> is a bit unfortunate though, just after RC2. I'll have closer look > >> tomorrow and if it can go in, probably tag RC3. > >> > >> If in the meantime a few more people could test this, that would be > >> helpful. > >> > >> Ralf > > > > I agree, wish I had time to push this before rc2. I could add the > > explanatory comments > > mentioned above and switch to use the atleast_[12]d() solution, test > > that and push it > > in a couple of minutes, or should I better leave it as is now for > > testing? > > Quick follow-up: I just applied the above changes, added some tests to > cover Ben's test cases and tested this with 1.6.0rc2 on OS X 10.5 > i386+ppc > + 10.6 x86_64 (Python2.7+3.2). So I'd be ready to push it to my repo > and do > my (first) pull request... > Go ahead, I'll have a look at it tonight. Thanks for testing on several Pythons, that definitely helps. Ralf ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] loadtxt ndmin option
On 5 May 2011, at 22:53, Derek Homeier wrote: >> >> However, the problem that ndmin is supposed to address is not fixed >> by the current implementation for the rc. Essentially, a single- >> row, multi-column file with ndmin=2 comes out as a Nx1 array which >> is the same result for a multi-row, single-column file. My feeling >> is that if we let the current implementation stand as is, and >> developers use it in their code, then fixing it in a later release >> would introduce more problems (maybe the devels would transpose the >> result themselves or something). Better to fix it now in rc with >> the two lines of code (and the correction to the tests), then to >> introduce a buggy feature that will be hard to fix in future >> releases, IMHO. >> >> Looks okay, and I agree that it's better to fix it now. The timing >> is a bit unfortunate though, just after RC2. I'll have closer look >> tomorrow and if it can go in, probably tag RC3. >> >> If in the meantime a few more people could test this, that would be >> helpful. >> >> Ralf > > I agree, wish I had time to push this before rc2. I could add the > explanatory comments > mentioned above and switch to use the atleast_[12]d() solution, test > that and push it > in a couple of minutes, or should I better leave it as is now for > testing? Quick follow-up: I just applied the above changes, added some tests to cover Ben's test cases and tested this with 1.6.0rc2 on OS X 10.5 i386+ppc + 10.6 x86_64 (Python2.7+3.2). So I'd be ready to push it to my repo and do my (first) pull request... Cheers, Derek ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] loadtxt ndmin option
On 5 May 2011, at 22:31, Ralf Gommers wrote: > > On Thu, May 5, 2011 at 9:46 PM, Benjamin Root wrote: > > > On Thu, May 5, 2011 at 2:33 PM, Ralf Gommers > wrote: > > > On Thu, May 5, 2011 at 9:18 PM, Benjamin Root wrote: > > > On Thu, May 5, 2011 at 1:08 PM, Paul Anton Letnes > > wrote: > > On 5. mai 2011, at 08.49, Benjamin Root wrote: > > > > > > > On Wed, May 4, 2011 at 11:08 PM, Paul Anton Letnes > > > wrote: > > > > On 4. mai 2011, at 20.33, Benjamin Root wrote: > > > > > On Wed, May 4, 2011 at 7:54 PM, Derek Homeier > > > > wrote: > > > On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote: > > > > > > > But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions > written for this? Shouldn't we reuse them? Perhaps it's overkill, > and perhaps it will reintroduce the 'transposed' problem? > > > > > > Yes, good point, one could replace the > > > X.shape = (X.size, ) with X = np.atleast_1d(X), > > > but for the ndmin=2 case, we'd need to replace > > > X.shape = (X.size, 1) with X = np.atleast_2d(X).T - > > > not sure which solution is more efficient in terms of memory > access etc... > > > > > > Cheers, > > >Derek > > > > > > > > > I can confirm that the current behavior is not sufficient for > all of the original corner cases that ndmin was supposed to > address. Keep in mind that np.loadtxt takes a one-column data file > and a one-row data file down to the same shape. I don't see how the > current code is able to produce the correct array shape when > ndmin=2. Do we have some sort of counter in loadtxt for counting > the number of rows and columns read? Could we use those to help > guide the ndmin=2 case? > > > > > > I think that using atleast_1d(X) might be a bit overkill, but it > would be very clear as to the code's intent. I don't think we have > to worry about memory usage if we limit its use to only situations > where ndmin is greater than the number of dimensions of the array. > In those cases, the array is either an empty result, a scalar value > (in which memory access is trivial), or 1-d (in which a transpose is > cheap). > > > > What if one does things the other way around - avoid calling > squeeze until _after_ doing the atleast_Nd() magic? That way the row/ > column information should be conserved, right? Also, we avoid > transposing, memory use, ... > > > > Oh, and someone could conceivably have a _looong_ 1D file, but > would want it read as a 2D array. > > > > Paul > > > > > > > > @Derek, good catch with noticing the error in the tests. We do > still need to handle the case I mentioned, however. I have attached > an example script to demonstrate the issue. In this script, I would > expect the second-to-last array to be a shape of (1, 5). I believe > that the single-row, multi-column case would actually be the more > common type of edge-case encountered by users than the others. > Therefore, I believe that this ndmin fix is not adequate until this > is addressed. > > > > @Paul, we can't call squeeze after doing the atleast_Nd() magic. > That would just undo whatever we had just done. Also, wrt the > transpose, a (1, 10) array looks the same in memory as a > (10, 1) array, right? > Agree. I thought more along the lines of (pseudocode-ish) > if ndmin == 0: >squeeze() > if ndmin == 1: >atleast_1D() > elif ndmin == 2: >atleast_2D() > else: >I don't rightly know what would go here, maybe raise > ValueError? > > That would avoid the squeeze call before the atleast_Nd magic. But > the code was changed, so I think my comment doesn't make sense > anymore. It's probably fine the way it is! > > Paul > > > I have thought of that too, but the problem with that approach is > that after reading the file, X will have 2 or 3 dimensions, > regardless of how many singleton dims were in the file. A squeeze > will always be needed. Also, the purpose of squeeze is opposite > that of the atleast_*d() functions: squeeze reduces dimensions, > while atleast_*d will add dimensions. > > Therefore, I re-iterate... the patch by Derek gets the job done. I > have tested it for a wide variety of inputs for both regular arrays > and record arrays. Is there room for improvements? Yes, but I > think that can wait for later. Derek's patch however fixes an > important bug in the ndmin implementation and should be included for > the release. > Thanks for the additional tests and discussion, Paul and Ben! Yes, I think the main issue is that multi-column input first comes back as 3- dimensional, with a "singleton" dimension in front, so I was getting rid of that first - maybe could add a comment on that. Single-column, multi-row apparently already come back as 1-dimensional, so we only need the expansion there. > Two questions: can you point me to the patch/ticket, and is this a > regression? > > Thanks, > Ralf > This wa
Re: [Numpy-discussion] loadtxt ndmin option
On Thu, May 5, 2011 at 9:46 PM, Benjamin Root wrote: > > > On Thu, May 5, 2011 at 2:33 PM, Ralf Gommers > wrote: > >> >> >> On Thu, May 5, 2011 at 9:18 PM, Benjamin Root wrote: >> >>> >>> >>> On Thu, May 5, 2011 at 1:08 PM, Paul Anton Letnes < >>> paul.anton.let...@gmail.com> wrote: >>> On 5. mai 2011, at 08.49, Benjamin Root wrote: > > > On Wed, May 4, 2011 at 11:08 PM, Paul Anton Letnes < paul.anton.let...@gmail.com> wrote: > > On 4. mai 2011, at 20.33, Benjamin Root wrote: > > > On Wed, May 4, 2011 at 7:54 PM, Derek Homeier < de...@astro.physik.uni-goettingen.de> wrote: > > On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote: > > > > > But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions written for this? Shouldn't we reuse them? Perhaps it's overkill, and perhaps it will reintroduce the 'transposed' problem? > > > > Yes, good point, one could replace the > > X.shape = (X.size, ) with X = np.atleast_1d(X), > > but for the ndmin=2 case, we'd need to replace > > X.shape = (X.size, 1) with X = np.atleast_2d(X).T - > > not sure which solution is more efficient in terms of memory access etc... > > > > Cheers, > >Derek > > > > > > I can confirm that the current behavior is not sufficient for all of the original corner cases that ndmin was supposed to address. Keep in mind that np.loadtxt takes a one-column data file and a one-row data file down to the same shape. I don't see how the current code is able to produce the correct array shape when ndmin=2. Do we have some sort of counter in loadtxt for counting the number of rows and columns read? Could we use those to help guide the ndmin=2 case? > > > > I think that using atleast_1d(X) might be a bit overkill, but it would be very clear as to the code's intent. I don't think we have to worry about memory usage if we limit its use to only situations where ndmin is greater than the number of dimensions of the array. In those cases, the array is either an empty result, a scalar value (in which memory access is trivial), or 1-d (in which a transpose is cheap). > > What if one does things the other way around - avoid calling squeeze until _after_ doing the atleast_Nd() magic? That way the row/column information should be conserved, right? Also, we avoid transposing, memory use, ... > > Oh, and someone could conceivably have a _looong_ 1D file, but would want it read as a 2D array. > > Paul > > > > @Derek, good catch with noticing the error in the tests. We do still need to handle the case I mentioned, however. I have attached an example script to demonstrate the issue. In this script, I would expect the second-to-last array to be a shape of (1, 5). I believe that the single-row, multi-column case would actually be the more common type of edge-case encountered by users than the others. Therefore, I believe that this ndmin fix is not adequate until this is addressed. > > @Paul, we can't call squeeze after doing the atleast_Nd() magic. That would just undo whatever we had just done. Also, wrt the transpose, a (1, 10) array looks the same in memory as a (10, 1) array, right? Agree. I thought more along the lines of (pseudocode-ish) if ndmin == 0: squeeze() if ndmin == 1: atleast_1D() elif ndmin == 2: atleast_2D() else: I don't rightly know what would go here, maybe raise ValueError? That would avoid the squeeze call before the atleast_Nd magic. But the code was changed, so I think my comment doesn't make sense anymore. It's probably fine the way it is! Paul >>> I have thought of that too, but the problem with that approach is that >>> after reading the file, X will have 2 or 3 dimensions, regardless of how >>> many singleton dims were in the file. A squeeze will always be needed. >>> Also, the purpose of squeeze is opposite that of the atleast_*d() >>> functions: squeeze reduces dimensions, while atleast_*d will add >>> dimensions. >>> >>> Therefore, I re-iterate... the patch by Derek gets the job done. I have >>> tested it for a wide variety of inputs for both regular arrays and record >>> arrays. Is there room for improvements? Yes, but I think that can wait for >>> later. Derek's patch however fixes an important bug in the ndmin >>> implementation and should be included for the release. >>> >>> Two questions: can you point me to the patch/ticket, and is this a >> regression? >> >> Thanks, >> Ralf >> >> >> > I don't know if he did a pull-request or not, but here is the link he > provided earlier in the thread. > > https://github.com/
Re: [Numpy-discussion] loadtxt ndmin option
On Thu, May 5, 2011 at 2:33 PM, Ralf Gommers wrote: > > > On Thu, May 5, 2011 at 9:18 PM, Benjamin Root wrote: > >> >> >> On Thu, May 5, 2011 at 1:08 PM, Paul Anton Letnes < >> paul.anton.let...@gmail.com> wrote: >> >>> >>> On 5. mai 2011, at 08.49, Benjamin Root wrote: >>> >>> > >>> > >>> > On Wed, May 4, 2011 at 11:08 PM, Paul Anton Letnes < >>> paul.anton.let...@gmail.com> wrote: >>> > >>> > On 4. mai 2011, at 20.33, Benjamin Root wrote: >>> > >>> > > On Wed, May 4, 2011 at 7:54 PM, Derek Homeier < >>> de...@astro.physik.uni-goettingen.de> wrote: >>> > > On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote: >>> > > >>> > > > But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions >>> written for this? Shouldn't we reuse them? Perhaps it's overkill, and >>> perhaps it will reintroduce the 'transposed' problem? >>> > > >>> > > Yes, good point, one could replace the >>> > > X.shape = (X.size, ) with X = np.atleast_1d(X), >>> > > but for the ndmin=2 case, we'd need to replace >>> > > X.shape = (X.size, 1) with X = np.atleast_2d(X).T - >>> > > not sure which solution is more efficient in terms of memory access >>> etc... >>> > > >>> > > Cheers, >>> > >Derek >>> > > >>> > > >>> > > I can confirm that the current behavior is not sufficient for all of >>> the original corner cases that ndmin was supposed to address. Keep in mind >>> that np.loadtxt takes a one-column data file and a one-row data file down to >>> the same shape. I don't see how the current code is able to produce the >>> correct array shape when ndmin=2. Do we have some sort of counter in >>> loadtxt for counting the number of rows and columns read? Could we use >>> those to help guide the ndmin=2 case? >>> > > >>> > > I think that using atleast_1d(X) might be a bit overkill, but it >>> would be very clear as to the code's intent. I don't think we have to worry >>> about memory usage if we limit its use to only situations where ndmin is >>> greater than the number of dimensions of the array. In those cases, the >>> array is either an empty result, a scalar value (in which memory access is >>> trivial), or 1-d (in which a transpose is cheap). >>> > >>> > What if one does things the other way around - avoid calling squeeze >>> until _after_ doing the atleast_Nd() magic? That way the row/column >>> information should be conserved, right? Also, we avoid transposing, memory >>> use, ... >>> > >>> > Oh, and someone could conceivably have a _looong_ 1D file, but would >>> want it read as a 2D array. >>> > >>> > Paul >>> > >>> > >>> > >>> > @Derek, good catch with noticing the error in the tests. We do still >>> need to handle the case I mentioned, however. I have attached an example >>> script to demonstrate the issue. In this script, I would expect the >>> second-to-last array to be a shape of (1, 5). I believe that the >>> single-row, multi-column case would actually be the more common type of >>> edge-case encountered by users than the others. Therefore, I believe that >>> this ndmin fix is not adequate until this is addressed. >>> > >>> > @Paul, we can't call squeeze after doing the atleast_Nd() magic. That >>> would just undo whatever we had just done. Also, wrt the transpose, a (1, >>> 10) array looks the same in memory as a (10, 1) array, right? >>> Agree. I thought more along the lines of (pseudocode-ish) >>> if ndmin == 0: >>>squeeze() >>> if ndmin == 1: >>>atleast_1D() >>> elif ndmin == 2: >>>atleast_2D() >>> else: >>>I don't rightly know what would go here, maybe raise ValueError? >>> >>> That would avoid the squeeze call before the atleast_Nd magic. But the >>> code was changed, so I think my comment doesn't make sense anymore. It's >>> probably fine the way it is! >>> >>> Paul >>> >>> >> I have thought of that too, but the problem with that approach is that >> after reading the file, X will have 2 or 3 dimensions, regardless of how >> many singleton dims were in the file. A squeeze will always be needed. >> Also, the purpose of squeeze is opposite that of the atleast_*d() >> functions: squeeze reduces dimensions, while atleast_*d will add >> dimensions. >> >> Therefore, I re-iterate... the patch by Derek gets the job done. I have >> tested it for a wide variety of inputs for both regular arrays and record >> arrays. Is there room for improvements? Yes, but I think that can wait for >> later. Derek's patch however fixes an important bug in the ndmin >> implementation and should be included for the release. >> >> Two questions: can you point me to the patch/ticket, and is this a > regression? > > Thanks, > Ralf > > > I don't know if he did a pull-request or not, but here is the link he provided earlier in the thread. https://github.com/dhomeier/numpy/compare/master...ndmin-cols Technically, this is not a "regression" as the ndmin feature is new in this release. However, the problem that ndmin is supposed to address is not fixed
Re: [Numpy-discussion] loadtxt ndmin option
On Thu, May 5, 2011 at 9:18 PM, Benjamin Root wrote: > > > On Thu, May 5, 2011 at 1:08 PM, Paul Anton Letnes < > paul.anton.let...@gmail.com> wrote: > >> >> On 5. mai 2011, at 08.49, Benjamin Root wrote: >> >> > >> > >> > On Wed, May 4, 2011 at 11:08 PM, Paul Anton Letnes < >> paul.anton.let...@gmail.com> wrote: >> > >> > On 4. mai 2011, at 20.33, Benjamin Root wrote: >> > >> > > On Wed, May 4, 2011 at 7:54 PM, Derek Homeier < >> de...@astro.physik.uni-goettingen.de> wrote: >> > > On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote: >> > > >> > > > But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions >> written for this? Shouldn't we reuse them? Perhaps it's overkill, and >> perhaps it will reintroduce the 'transposed' problem? >> > > >> > > Yes, good point, one could replace the >> > > X.shape = (X.size, ) with X = np.atleast_1d(X), >> > > but for the ndmin=2 case, we'd need to replace >> > > X.shape = (X.size, 1) with X = np.atleast_2d(X).T - >> > > not sure which solution is more efficient in terms of memory access >> etc... >> > > >> > > Cheers, >> > >Derek >> > > >> > > >> > > I can confirm that the current behavior is not sufficient for all of >> the original corner cases that ndmin was supposed to address. Keep in mind >> that np.loadtxt takes a one-column data file and a one-row data file down to >> the same shape. I don't see how the current code is able to produce the >> correct array shape when ndmin=2. Do we have some sort of counter in >> loadtxt for counting the number of rows and columns read? Could we use >> those to help guide the ndmin=2 case? >> > > >> > > I think that using atleast_1d(X) might be a bit overkill, but it would >> be very clear as to the code's intent. I don't think we have to worry about >> memory usage if we limit its use to only situations where ndmin is greater >> than the number of dimensions of the array. In those cases, the array is >> either an empty result, a scalar value (in which memory access is trivial), >> or 1-d (in which a transpose is cheap). >> > >> > What if one does things the other way around - avoid calling squeeze >> until _after_ doing the atleast_Nd() magic? That way the row/column >> information should be conserved, right? Also, we avoid transposing, memory >> use, ... >> > >> > Oh, and someone could conceivably have a _looong_ 1D file, but would >> want it read as a 2D array. >> > >> > Paul >> > >> > >> > >> > @Derek, good catch with noticing the error in the tests. We do still >> need to handle the case I mentioned, however. I have attached an example >> script to demonstrate the issue. In this script, I would expect the >> second-to-last array to be a shape of (1, 5). I believe that the >> single-row, multi-column case would actually be the more common type of >> edge-case encountered by users than the others. Therefore, I believe that >> this ndmin fix is not adequate until this is addressed. >> > >> > @Paul, we can't call squeeze after doing the atleast_Nd() magic. That >> would just undo whatever we had just done. Also, wrt the transpose, a (1, >> 10) array looks the same in memory as a (10, 1) array, right? >> Agree. I thought more along the lines of (pseudocode-ish) >> if ndmin == 0: >>squeeze() >> if ndmin == 1: >>atleast_1D() >> elif ndmin == 2: >>atleast_2D() >> else: >>I don't rightly know what would go here, maybe raise ValueError? >> >> That would avoid the squeeze call before the atleast_Nd magic. But the >> code was changed, so I think my comment doesn't make sense anymore. It's >> probably fine the way it is! >> >> Paul >> >> > I have thought of that too, but the problem with that approach is that > after reading the file, X will have 2 or 3 dimensions, regardless of how > many singleton dims were in the file. A squeeze will always be needed. > Also, the purpose of squeeze is opposite that of the atleast_*d() > functions: squeeze reduces dimensions, while atleast_*d will add > dimensions. > > Therefore, I re-iterate... the patch by Derek gets the job done. I have > tested it for a wide variety of inputs for both regular arrays and record > arrays. Is there room for improvements? Yes, but I think that can wait for > later. Derek's patch however fixes an important bug in the ndmin > implementation and should be included for the release. > > Two questions: can you point me to the patch/ticket, and is this a regression? Thanks, Ralf ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] loadtxt ndmin option
On Thu, May 5, 2011 at 1:08 PM, Paul Anton Letnes < paul.anton.let...@gmail.com> wrote: > > On 5. mai 2011, at 08.49, Benjamin Root wrote: > > > > > > > On Wed, May 4, 2011 at 11:08 PM, Paul Anton Letnes < > paul.anton.let...@gmail.com> wrote: > > > > On 4. mai 2011, at 20.33, Benjamin Root wrote: > > > > > On Wed, May 4, 2011 at 7:54 PM, Derek Homeier < > de...@astro.physik.uni-goettingen.de> wrote: > > > On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote: > > > > > > > But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions > written for this? Shouldn't we reuse them? Perhaps it's overkill, and > perhaps it will reintroduce the 'transposed' problem? > > > > > > Yes, good point, one could replace the > > > X.shape = (X.size, ) with X = np.atleast_1d(X), > > > but for the ndmin=2 case, we'd need to replace > > > X.shape = (X.size, 1) with X = np.atleast_2d(X).T - > > > not sure which solution is more efficient in terms of memory access > etc... > > > > > > Cheers, > > >Derek > > > > > > > > > I can confirm that the current behavior is not sufficient for all of > the original corner cases that ndmin was supposed to address. Keep in mind > that np.loadtxt takes a one-column data file and a one-row data file down to > the same shape. I don't see how the current code is able to produce the > correct array shape when ndmin=2. Do we have some sort of counter in > loadtxt for counting the number of rows and columns read? Could we use > those to help guide the ndmin=2 case? > > > > > > I think that using atleast_1d(X) might be a bit overkill, but it would > be very clear as to the code's intent. I don't think we have to worry about > memory usage if we limit its use to only situations where ndmin is greater > than the number of dimensions of the array. In those cases, the array is > either an empty result, a scalar value (in which memory access is trivial), > or 1-d (in which a transpose is cheap). > > > > What if one does things the other way around - avoid calling squeeze > until _after_ doing the atleast_Nd() magic? That way the row/column > information should be conserved, right? Also, we avoid transposing, memory > use, ... > > > > Oh, and someone could conceivably have a _looong_ 1D file, but would want > it read as a 2D array. > > > > Paul > > > > > > > > @Derek, good catch with noticing the error in the tests. We do still need > to handle the case I mentioned, however. I have attached an example script > to demonstrate the issue. In this script, I would expect the second-to-last > array to be a shape of (1, 5). I believe that the single-row, multi-column > case would actually be the more common type of edge-case encountered by > users than the others. Therefore, I believe that this ndmin fix is not > adequate until this is addressed. > > > > @Paul, we can't call squeeze after doing the atleast_Nd() magic. That > would just undo whatever we had just done. Also, wrt the transpose, a (1, > 10) array looks the same in memory as a (10, 1) array, right? > Agree. I thought more along the lines of (pseudocode-ish) > if ndmin == 0: >squeeze() > if ndmin == 1: >atleast_1D() > elif ndmin == 2: >atleast_2D() > else: >I don't rightly know what would go here, maybe raise ValueError? > > That would avoid the squeeze call before the atleast_Nd magic. But the code > was changed, so I think my comment doesn't make sense anymore. It's probably > fine the way it is! > > Paul > > I have thought of that too, but the problem with that approach is that after reading the file, X will have 2 or 3 dimensions, regardless of how many singleton dims were in the file. A squeeze will always be needed. Also, the purpose of squeeze is opposite that of the atleast_*d() functions: squeeze reduces dimensions, while atleast_*d will add dimensions. Therefore, I re-iterate... the patch by Derek gets the job done. I have tested it for a wide variety of inputs for both regular arrays and record arrays. Is there room for improvements? Yes, but I think that can wait for later. Derek's patch however fixes an important bug in the ndmin implementation and should be included for the release. Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] loadtxt ndmin option
On 5. mai 2011, at 08.49, Benjamin Root wrote: > > > On Wed, May 4, 2011 at 11:08 PM, Paul Anton Letnes > wrote: > > On 4. mai 2011, at 20.33, Benjamin Root wrote: > > > On Wed, May 4, 2011 at 7:54 PM, Derek Homeier > > wrote: > > On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote: > > > > > But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions written > > > for this? Shouldn't we reuse them? Perhaps it's overkill, and perhaps it > > > will reintroduce the 'transposed' problem? > > > > Yes, good point, one could replace the > > X.shape = (X.size, ) with X = np.atleast_1d(X), > > but for the ndmin=2 case, we'd need to replace > > X.shape = (X.size, 1) with X = np.atleast_2d(X).T - > > not sure which solution is more efficient in terms of memory access etc... > > > > Cheers, > >Derek > > > > > > I can confirm that the current behavior is not sufficient for all of the > > original corner cases that ndmin was supposed to address. Keep in mind > > that np.loadtxt takes a one-column data file and a one-row data file down > > to the same shape. I don't see how the current code is able to produce the > > correct array shape when ndmin=2. Do we have some sort of counter in > > loadtxt for counting the number of rows and columns read? Could we use > > those to help guide the ndmin=2 case? > > > > I think that using atleast_1d(X) might be a bit overkill, but it would be > > very clear as to the code's intent. I don't think we have to worry about > > memory usage if we limit its use to only situations where ndmin is greater > > than the number of dimensions of the array. In those cases, the array is > > either an empty result, a scalar value (in which memory access is trivial), > > or 1-d (in which a transpose is cheap). > > What if one does things the other way around - avoid calling squeeze until > _after_ doing the atleast_Nd() magic? That way the row/column information > should be conserved, right? Also, we avoid transposing, memory use, ... > > Oh, and someone could conceivably have a _looong_ 1D file, but would want it > read as a 2D array. > > Paul > > > > @Derek, good catch with noticing the error in the tests. We do still need to > handle the case I mentioned, however. I have attached an example script to > demonstrate the issue. In this script, I would expect the second-to-last > array to be a shape of (1, 5). I believe that the single-row, multi-column > case would actually be the more common type of edge-case encountered by users > than the others. Therefore, I believe that this ndmin fix is not adequate > until this is addressed. > > @Paul, we can't call squeeze after doing the atleast_Nd() magic. That would > just undo whatever we had just done. Also, wrt the transpose, a (1, 10) > array looks the same in memory as a (10, 1) array, right? Agree. I thought more along the lines of (pseudocode-ish) if ndmin == 0: squeeze() if ndmin == 1: atleast_1D() elif ndmin == 2: atleast_2D() else: I don't rightly know what would go here, maybe raise ValueError? That would avoid the squeeze call before the atleast_Nd magic. But the code was changed, so I think my comment doesn't make sense anymore. It's probably fine the way it is! Paul ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] loadtxt ndmin option
On Thu, May 5, 2011 at 10:49 AM, Benjamin Root wrote: > > > On Wed, May 4, 2011 at 11:08 PM, Paul Anton Letnes < > paul.anton.let...@gmail.com> wrote: > >> >> On 4. mai 2011, at 20.33, Benjamin Root wrote: >> >> > On Wed, May 4, 2011 at 7:54 PM, Derek Homeier < >> de...@astro.physik.uni-goettingen.de> wrote: >> > On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote: >> > >> > > But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions written >> for this? Shouldn't we reuse them? Perhaps it's overkill, and perhaps it >> will reintroduce the 'transposed' problem? >> > >> > Yes, good point, one could replace the >> > X.shape = (X.size, ) with X = np.atleast_1d(X), >> > but for the ndmin=2 case, we'd need to replace >> > X.shape = (X.size, 1) with X = np.atleast_2d(X).T - >> > not sure which solution is more efficient in terms of memory access >> etc... >> > >> > Cheers, >> >Derek >> > >> > >> > I can confirm that the current behavior is not sufficient for all of the >> original corner cases that ndmin was supposed to address. Keep in mind that >> np.loadtxt takes a one-column data file and a one-row data file down to the >> same shape. I don't see how the current code is able to produce the correct >> array shape when ndmin=2. Do we have some sort of counter in loadtxt for >> counting the number of rows and columns read? Could we use those to help >> guide the ndmin=2 case? >> > >> > I think that using atleast_1d(X) might be a bit overkill, but it would >> be very clear as to the code's intent. I don't think we have to worry about >> memory usage if we limit its use to only situations where ndmin is greater >> than the number of dimensions of the array. In those cases, the array is >> either an empty result, a scalar value (in which memory access is trivial), >> or 1-d (in which a transpose is cheap). >> >> What if one does things the other way around - avoid calling squeeze until >> _after_ doing the atleast_Nd() magic? That way the row/column information >> should be conserved, right? Also, we avoid transposing, memory use, ... >> >> Oh, and someone could conceivably have a _looong_ 1D file, but would want >> it read as a 2D array. >> >> Paul >> >> >> > @Derek, good catch with noticing the error in the tests. We do still need > to handle the case I mentioned, however. I have attached an example script > to demonstrate the issue. In this script, I would expect the second-to-last > array to be a shape of (1, 5). I believe that the single-row, multi-column > case would actually be the more common type of edge-case encountered by > users than the others. Therefore, I believe that this ndmin fix is not > adequate until this is addressed. > > Apologies Derek, your patch does address the issue I raised. Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] loadtxt ndmin option
On Wed, May 4, 2011 at 11:08 PM, Paul Anton Letnes < paul.anton.let...@gmail.com> wrote: > > On 4. mai 2011, at 20.33, Benjamin Root wrote: > > > On Wed, May 4, 2011 at 7:54 PM, Derek Homeier < > de...@astro.physik.uni-goettingen.de> wrote: > > On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote: > > > > > But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions written > for this? Shouldn't we reuse them? Perhaps it's overkill, and perhaps it > will reintroduce the 'transposed' problem? > > > > Yes, good point, one could replace the > > X.shape = (X.size, ) with X = np.atleast_1d(X), > > but for the ndmin=2 case, we'd need to replace > > X.shape = (X.size, 1) with X = np.atleast_2d(X).T - > > not sure which solution is more efficient in terms of memory access > etc... > > > > Cheers, > >Derek > > > > > > I can confirm that the current behavior is not sufficient for all of the > original corner cases that ndmin was supposed to address. Keep in mind that > np.loadtxt takes a one-column data file and a one-row data file down to the > same shape. I don't see how the current code is able to produce the correct > array shape when ndmin=2. Do we have some sort of counter in loadtxt for > counting the number of rows and columns read? Could we use those to help > guide the ndmin=2 case? > > > > I think that using atleast_1d(X) might be a bit overkill, but it would be > very clear as to the code's intent. I don't think we have to worry about > memory usage if we limit its use to only situations where ndmin is greater > than the number of dimensions of the array. In those cases, the array is > either an empty result, a scalar value (in which memory access is trivial), > or 1-d (in which a transpose is cheap). > > What if one does things the other way around - avoid calling squeeze until > _after_ doing the atleast_Nd() magic? That way the row/column information > should be conserved, right? Also, we avoid transposing, memory use, ... > > Oh, and someone could conceivably have a _looong_ 1D file, but would want > it read as a 2D array. > > Paul > > > @Derek, good catch with noticing the error in the tests. We do still need to handle the case I mentioned, however. I have attached an example script to demonstrate the issue. In this script, I would expect the second-to-last array to be a shape of (1, 5). I believe that the single-row, multi-column case would actually be the more common type of edge-case encountered by users than the others. Therefore, I believe that this ndmin fix is not adequate until this is addressed. @Paul, we can't call squeeze after doing the atleast_Nd() magic. That would just undo whatever we had just done. Also, wrt the transpose, a (1, 10) array looks the same in memory as a (10, 1) array, right? Ben Root loadtest.py Description: Binary data ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] loadtxt ndmin option
On 4. mai 2011, at 20.33, Benjamin Root wrote: > On Wed, May 4, 2011 at 7:54 PM, Derek Homeier > wrote: > On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote: > > > But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions written for > > this? Shouldn't we reuse them? Perhaps it's overkill, and perhaps it will > > reintroduce the 'transposed' problem? > > Yes, good point, one could replace the > X.shape = (X.size, ) with X = np.atleast_1d(X), > but for the ndmin=2 case, we'd need to replace > X.shape = (X.size, 1) with X = np.atleast_2d(X).T - > not sure which solution is more efficient in terms of memory access etc... > > Cheers, >Derek > > > I can confirm that the current behavior is not sufficient for all of the > original corner cases that ndmin was supposed to address. Keep in mind that > np.loadtxt takes a one-column data file and a one-row data file down to the > same shape. I don't see how the current code is able to produce the correct > array shape when ndmin=2. Do we have some sort of counter in loadtxt for > counting the number of rows and columns read? Could we use those to help > guide the ndmin=2 case? > > I think that using atleast_1d(X) might be a bit overkill, but it would be > very clear as to the code's intent. I don't think we have to worry about > memory usage if we limit its use to only situations where ndmin is greater > than the number of dimensions of the array. In those cases, the array is > either an empty result, a scalar value (in which memory access is trivial), > or 1-d (in which a transpose is cheap). What if one does things the other way around - avoid calling squeeze until _after_ doing the atleast_Nd() magic? That way the row/column information should be conserved, right? Also, we avoid transposing, memory use, ... Oh, and someone could conceivably have a _looong_ 1D file, but would want it read as a 2D array. Paul ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] loadtxt ndmin option
On Wed, May 4, 2011 at 7:54 PM, Derek Homeier < de...@astro.physik.uni-goettingen.de> wrote: > On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote: > > > But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions written > for this? Shouldn't we reuse them? Perhaps it's overkill, and perhaps it > will reintroduce the 'transposed' problem? > > Yes, good point, one could replace the > X.shape = (X.size, ) with X = np.atleast_1d(X), > but for the ndmin=2 case, we'd need to replace > X.shape = (X.size, 1) with X = np.atleast_2d(X).T - > not sure which solution is more efficient in terms of memory access etc... > > Cheers, > Derek > > I can confirm that the current behavior is not sufficient for all of the original corner cases that ndmin was supposed to address. Keep in mind that np.loadtxt takes a one-column data file and a one-row data file down to the same shape. I don't see how the current code is able to produce the correct array shape when ndmin=2. Do we have some sort of counter in loadtxt for counting the number of rows and columns read? Could we use those to help guide the ndmin=2 case? I think that using atleast_1d(X) might be a bit overkill, but it would be very clear as to the code's intent. I don't think we have to worry about memory usage if we limit its use to only situations where ndmin is greater than the number of dimensions of the array. In those cases, the array is either an empty result, a scalar value (in which memory access is trivial), or 1-d (in which a transpose is cheap). Ben Root ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] loadtxt ndmin option
On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote: > But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions written for > this? Shouldn't we reuse them? Perhaps it's overkill, and perhaps it will > reintroduce the 'transposed' problem? Yes, good point, one could replace the X.shape = (X.size, ) with X = np.atleast_1d(X), but for the ndmin=2 case, we'd need to replace X.shape = (X.size, 1) with X = np.atleast_2d(X).T - not sure which solution is more efficient in terms of memory access etc... Cheers, Derek ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] loadtxt ndmin option
On 4. mai 2011, at 17.34, Derek Homeier wrote: > Hi Paul, > > I've got back to your suggestion re. the ndmin flag for loadtxt from a few > weeks ago... > > On 27.03.2011, at 12:09PM, Paul Anton Letnes wrote: > 1562: I attach a possible patch. This could also be the default behavior to my mind, since the function caller can simply call numpy.squeeze if needed. Changing default behavior would probably break old code, however. >>> >>> See comments on Trac as well. >> >> Your patch is better, but there is one thing I disagree with. >> 808if X.ndim < ndmin: >> 809if ndmin == 1: >> 810X.shape = (X.size, ) >> 811elif ndmin == 2: >> 812X.shape = (X.size, 1) >> The last line should be: >> 812X.shape = (1, X.size) >> If someone wants a 2D array out, they would most likely expect a one-row >> file to come out as a one-row array, not the other way around. IMHO. > > I think you are completely right for the test case with one row. More > generally though, > since a file of N rows and M columns is read into an array of shape (N, M), > ndmin=2 > should enforce X.shape = (1, X.size) for single-row input, and X.shape = > (X.size, 1) > for single-column input. > I thought this would be handled automatically by preserving the original 2 > dimensions, > but apparently with single-row/multi-column input an extra dimension 1 is > prepended > when the array is returned from the parser. I've put up a fix for this at > > https://github.com/dhomeier/numpy/compare/master...ndmin-cols > > and also tested the patch against 1.6.0.rc2. > > Cheers, > Derek Looks sensible to me at least! But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions written for this? Shouldn't we reuse them? Perhaps it's overkill, and perhaps it will reintroduce the 'transposed' problem? Paul ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Re: [Numpy-discussion] loadtxt ndmin option
Hi Paul, I've got back to your suggestion re. the ndmin flag for loadtxt from a few weeks ago... On 27.03.2011, at 12:09PM, Paul Anton Letnes wrote: >>> 1562: >>> I attach a possible patch. This could also be the default >>> behavior to my mind, since the function caller can simply call >>> numpy.squeeze if needed. Changing default behavior would probably >>> break old code, however. >> >> See comments on Trac as well. > > Your patch is better, but there is one thing I disagree with. > 808if X.ndim < ndmin: > 809if ndmin == 1: > 810X.shape = (X.size, ) > 811elif ndmin == 2: > 812X.shape = (X.size, 1) > The last line should be: > 812X.shape = (1, X.size) > If someone wants a 2D array out, they would most likely expect a one-row file > to come out as a one-row array, not the other way around. IMHO. I think you are completely right for the test case with one row. More generally though, since a file of N rows and M columns is read into an array of shape (N, M), ndmin=2 should enforce X.shape = (1, X.size) for single-row input, and X.shape = (X.size, 1) for single-column input. I thought this would be handled automatically by preserving the original 2 dimensions, but apparently with single-row/multi-column input an extra dimension 1 is prepended when the array is returned from the parser. I've put up a fix for this at https://github.com/dhomeier/numpy/compare/master...ndmin-cols and also tested the patch against 1.6.0.rc2. Cheers, Derek ___ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion