Re: [Numpy-discussion] optimizing ndarray.__setitem__
On Wed, May 4, 2011 at 6:19 AM, Christoph Groth c...@falma.de wrote:
>> Dear numpy experts, I have noticed that with Numpy 1.5.1 the operation
>>
>>     m[::2] += 1.0
>>
>> takes twice as long as
>>
>>     t = m[::2]
>>     t += 1.0

Mark Wiebe mwwi...@gmail.com writes:
> You'd better time this in 1.6 too. ;)
> https://github.com/numpy/numpy/commit/f60797ba64ccf33597225d23b893b6eb11149860

This seems to be exactly what I had in mind. Thanks for finding this.

> The case of boolean mask indexing can't benefit so easily from this
> optimization, but I think it could see a big performance benefit if
> combined __index__ + __iop__ operators were added to Python. Something
> to consider, anyway.

Has something like __index_iadd__ ever been considered seriously? Not to
my (limited) knowledge.

Indeed, the second loop executes twice as fast as the first in the
following example (again with Numpy 1.5.1).

    import numpy
    m = numpy.zeros((1000, 1000))
    mask = numpy.arange(0, 1000, 2, dtype=int)

    for i in xrange(40):
        m[mask] += 1.0

    for i in xrange(40):
        t = m[mask]
        t += 1.0

But wouldn't it be easy to optimize this as well, by not executing
assignments where the source and the destination are indexed by the same
mask object? This would be a bit weaker, as it would work only for
identical masks (a is b), not for merely equal ones, but it should still
cover the most common case.

Christoph

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
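[A small sketch of the slicing case under discussion. Under the hood,
`m[::2] += 1.0` performs a `__getitem__` (returning a view), an in-place
add on that view, and then a redundant `__setitem__` writing the view
back over itself; the commit linked above teaches `__setitem__` to skip
that final copy when the source is the identical view. The timing
harness below is illustrative only; absolute and relative numbers depend
on the NumPy version and machine, which is why no expected output is
given.]

```python
import timeit

setup = "import numpy as np; m = np.zeros((1000, 1000))"

# One statement: getitem -> in-place add on the view -> setitem
# writing the view back onto itself (the redundant step that NumPy 1.6
# optimizes away).
t_inplace = timeit.timeit("m[::2] += 1.0", setup=setup, number=100)

# Bind the view first: the in-place add modifies m directly through the
# view, and no write-back assignment happens at all.
t_view = timeit.timeit("t = m[::2]; t += 1.0", setup=setup, number=100)

print("m[::2] += 1.0        : %.4f s" % t_inplace)
print("t = m[::2]; t += 1.0 : %.4f s" % t_view)
```

Both forms leave `m` in the same final state for slice indexing, since
basic slices return views; the difference is purely overhead.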
Re: [Numpy-discussion] loadtxt ndmin option
On Wed, May 4, 2011 at 11:08 PM, Paul Anton Letnes
paul.anton.let...@gmail.com wrote:
> On 4. mai 2011, at 20.33, Benjamin Root wrote:
>> On Wed, May 4, 2011 at 7:54 PM, Derek Homeier
>> de...@astro.physik.uni-goettingen.de wrote:
>>> On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote:
>>>> But: Aren't the numpy.atleast_2d and numpy.atleast_1d functions
>>>> written for this? Shouldn't we reuse them? Perhaps it's overkill,
>>>> and perhaps it will reintroduce the 'transposed' problem?
>>>
>>> Yes, good point, one could replace the X.shape = (X.size, ) with
>>> X = np.atleast_1d(X), but for the ndmin=2 case, we'd need to replace
>>> X.shape = (X.size, 1) with X = np.atleast_2d(X).T - not sure which
>>> solution is more efficient in terms of memory access etc...
>>>
>>> Cheers, Derek
>>
>> I can confirm that the current behavior is not sufficient for all of
>> the original corner cases that ndmin was supposed to address. Keep in
>> mind that np.loadtxt takes a one-column data file and a one-row data
>> file down to the same shape. I don't see how the current code is able
>> to produce the correct array shape when ndmin=2. Do we have some sort
>> of counter in loadtxt for counting the number of rows and columns
>> read? Could we use those to help guide the ndmin=2 case?
>>
>> I think that using atleast_1d(X) might be a bit of overkill, but it
>> would make the code's intent very clear. I don't think we have to
>> worry about memory usage if we limit its use to only situations where
>> ndmin is greater than the number of dimensions of the array. In those
>> cases, the array is either an empty result, a scalar value (in which
>> case memory access is trivial), or 1-d (in which case a transpose is
>> cheap).
>
> What if one does things the other way around - avoid calling squeeze
> until _after_ doing the atleast_Nd() magic? That way the row/column
> information should be conserved, right? Also, we avoid transposing,
> memory use, ...
>
> Oh, and someone could conceivably have a _looong_ 1D file, but would
> want it read as a 2D array.
>
> Paul

@Derek, good catch with noticing the error in the tests.

We do still need to handle the case I mentioned, however. I have
attached an example script to demonstrate the issue. In this script, I
would expect the second-to-last array to have a shape of (1, 5). I
believe that the single-row, multi-column case is a more common type of
edge case for users than the others. Therefore, I believe that this
ndmin fix is not adequate until this is addressed.

@Paul, we can't call squeeze after doing the atleast_Nd() magic. That
would just undo whatever we had just done. Also, with regard to the
transpose, a (1, 10) array looks the same in memory as a (10, 1) array,
right?

Ben Root

loadtest.py
Description: Binary data
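[The shape distinction Ben is asking for can be stated concretely. The
snippet below shows the behavior the thread converges on, i.e. what
Derek's patch is meant to produce (and what released NumPy does today):
with ndmin=2, a one-row file and a one-column file must come out with
different shapes, not be collapsed to the same one.]

```python
from io import StringIO

import numpy as np

one_row = StringIO("1 2 3 4 5\n")        # single row, five columns
one_col = StringIO("1\n2\n3\n4\n5\n")    # five rows, single column

# With ndmin=2 the two files should produce *different* shapes:
# (1, 5) for the one-row file and (5, 1) for the one-column file.
# Collapsing both to the same Nx1 shape is exactly the bug under
# discussion in this thread.
a = np.loadtxt(one_row, ndmin=2)
b = np.loadtxt(one_col, ndmin=2)

print(a.shape)  # (1, 5)
print(b.shape)  # (5, 1)
```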
Re: [Numpy-discussion] loadtxt ndmin option
On Thu, May 5, 2011 at 10:49 AM, Benjamin Root ben.r...@ou.edu wrote:
> We do still need to handle the case I mentioned, however. I have
> attached an example script to demonstrate the issue. In this script, I
> would expect the second-to-last array to have a shape of (1, 5). I
> believe that the single-row, multi-column case is a more common type
> of edge case for users than the others. Therefore, I believe that this
> ndmin fix is not adequate until this is addressed.

Apologies Derek, your patch does address the issue I raised.

Ben Root
Re: [Numpy-discussion] optimizing ndarray.__setitem__
On Thu, May 5, 2011 at 02:29, Christoph Groth c...@falma.de wrote:
> Has something like __index_iadd__ ever been considered seriously? Not
> to my (limited) knowledge.

Only on this list, I think. :-)

I don't think it will ever happen. Only numpy really cares about it, and
adding another __special__ method for each __iop__ is a lot of
additional methods that need to be supported.

> Indeed, the second loop executes twice as fast as the first in the
> following example (again with Numpy 1.5.1).
>
>     import numpy
>     m = numpy.zeros((1000, 1000))
>     mask = numpy.arange(0, 1000, 2, dtype=int)
>
>     for i in xrange(40):
>         m[mask] += 1.0
>
>     for i in xrange(40):
>         t = m[mask]
>         t += 1.0
>
> But wouldn't it be easy to optimize this as well, by not executing
> assignments where the source and the destination are indexed by the
> same mask object?

No. These two are not semantically equivalent. Your second example does
not actually modify m. For integer and bool mask arrays, m[mask]
necessarily makes a copy, so when you modify t via in-place addition,
you have only modified t and not m. The assignment back to m[mask] is
necessary.

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco
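[Robert's distinction between slice views and fancy-index copies can be
demonstrated in a few lines; this small example is not from the thread,
but it shows the semantics he describes.]

```python
import numpy as np

m = np.zeros(6)

# Basic slicing returns a *view*: an in-place addition through it is
# visible in m.
t = m[::2]
t += 1.0
print(m)  # [1. 0. 1. 0. 1. 0.]

# Fancy indexing (an integer or boolean mask) returns a *copy*: the
# in-place addition only touches the copy, and m is unchanged.
mask = np.array([0, 2, 4])
u = m[mask]
u += 1.0
print(m)  # still [1. 0. 1. 0. 1. 0.]

# To modify m through a mask, the write-back assignment is required:
m[mask] += 1.0  # expands to m[mask] = (m[mask] copy) += 1.0
print(m)  # [2. 0. 2. 0. 2. 0.]
```

This is why the mask case cannot simply reuse the view-identity
optimization applied to slices: dropping the assignment would silently
change the result.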
Re: [Numpy-discussion] optimizing ndarray.__setitem__
On Thu, May 5, 2011 at 9:54 AM, Robert Kern robert.k...@gmail.com wrote:
> On Thu, May 5, 2011 at 02:29, Christoph Groth c...@falma.de wrote:
>> Has something like __index_iadd__ ever been considered seriously?
>> Not to my (limited) knowledge.
>
> Only on this list, I think. :-)
>
> I don't think it will ever happen. Only numpy really cares about it,
> and adding another __special__ method for each __iop__ is a lot of
> additional methods that need to be supported.

Maybe in the context of PyPy someone will come up with a clever way to
implement expression-template-style operator fusion for numpy. That'd be
kinda neat.

-- Nathaniel
Re: [Numpy-discussion] loadtxt ndmin option
On 5. mai 2011, at 08.49, Benjamin Root wrote:
> @Paul, we can't call squeeze after doing the atleast_Nd() magic. That
> would just undo whatever we had just done. Also, with regard to the
> transpose, a (1, 10) array looks the same in memory as a (10, 1)
> array, right?

Agree. I thought more along the lines of (pseudocode-ish):

    if ndmin == 0:
        squeeze()
    elif ndmin == 1:
        atleast_1d()
    elif ndmin == 2:
        atleast_2d()
    else:
        # I don't rightly know what would go here,
        # maybe raise ValueError?

That would avoid the squeeze call before the atleast_Nd magic. But the
code was changed, so I think my comment doesn't make sense anymore. It's
probably fine the way it is!

Paul
Re: [Numpy-discussion] ANN: Numpy 1.6.0 release candidate 2
On Thu, May 5, 2011 at 1:10 AM, Benjamin Root ben.r...@ou.edu wrote:
> On Tue, May 3, 2011 at 1:18 PM, Ralf Gommers
> ralf.gomm...@googlemail.com wrote:
>> Hi, I am pleased to announce the availability of the second release
>> candidate of NumPy 1.6.0. Compared to the first release candidate,
>> one segfault on (32-bit Windows + MSVC) and several memory leaks were
>> fixed. If no new problems are reported, the final release will be in
>> one week. Sources and binaries can be found at
>> http://sourceforge.net/projects/numpy/files/NumPy/1.6.0rc2/
>> For (preliminary) release notes see below.
>>
>> Enjoy, Ralf
>
> Minor issue I just noticed on my recently installed Ubuntu 11.04
> machine. The setup script is making a call to 'svnversion'. It doesn't
> impact the build or anything, but I only noticed it because svn hasn't
> been installed yet on that machine. Don't know if it is something that
> ought to be cleaned up or not.

That's harmless and better left alone for backwards compatibility, I
think.

Ralf
Re: [Numpy-discussion] loadtxt ndmin option
On Thu, May 5, 2011 at 1:08 PM, Paul Anton Letnes
paul.anton.let...@gmail.com wrote:
> Agree. I thought more along the lines of (pseudocode-ish):
>
>     if ndmin == 0:
>         squeeze()
>     elif ndmin == 1:
>         atleast_1d()
>     elif ndmin == 2:
>         atleast_2d()
>     else:
>         # I don't rightly know what would go here,
>         # maybe raise ValueError?
>
> That would avoid the squeeze call before the atleast_Nd magic. But the
> code was changed, so I think my comment doesn't make sense anymore.
> It's probably fine the way it is!
>
> Paul

I have thought of that too, but the problem with that approach is that
after reading the file, X will have 2 or 3 dimensions, regardless of how
many singleton dims were in the file. A squeeze will always be needed.
Also, the purpose of squeeze is the opposite of the atleast_*d()
functions: squeeze removes dimensions, while atleast_*d adds dimensions.

Therefore, I re-iterate... the patch by Derek gets the job done. I have
tested it for a wide variety of inputs, for both regular arrays and
record arrays. Is there room for improvement? Yes, but I think that can
wait for later. Derek's patch, however, fixes an important bug in the
ndmin implementation and should be included in the release.

Ben Root
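[Why the order of squeeze() and atleast_*d() matters, in executable
form. This example is not from the thread, but it illustrates both
points made above: squeeze() discards the row/column distinction, and
atleast_2d() cannot recover it afterwards, which is why the ndmin=2 code
path needs the transpose Derek mentions.]

```python
import numpy as np

row = np.ones((1, 5))   # result of parsing a one-row file
col = np.ones((5, 1))   # result of parsing a one-column file

# squeeze() strips *all* singleton dimensions, so both collapse to the
# same 1-d shape; the row/column information is gone.
print(row.squeeze().shape)  # (5,)
print(col.squeeze().shape)  # (5,)

# atleast_2d() always prepends the new axis, so after a squeeze it
# cannot tell the two cases apart; the column comes back in the wrong
# orientation:
print(np.atleast_2d(row.squeeze()).shape)  # (1, 5)
print(np.atleast_2d(col.squeeze()).shape)  # (1, 5) -- wrong for a column

# ...hence the explicit transpose for the 1-d column case:
print(np.atleast_2d(col.squeeze()).T.shape)  # (5, 1)
```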
Re: [Numpy-discussion] loadtxt ndmin option
On Thu, May 5, 2011 at 9:18 PM, Benjamin Root ben.r...@ou.edu wrote:
> Therefore, I re-iterate... the patch by Derek gets the job done. I
> have tested it for a wide variety of inputs, for both regular arrays
> and record arrays. Is there room for improvement? Yes, but I think
> that can wait for later. Derek's patch, however, fixes an important
> bug in the ndmin implementation and should be included in the release.

Two questions: can you point me to the patch/ticket, and is this a
regression?

Thanks,
Ralf
Re: [Numpy-discussion] loadtxt ndmin option
On Thu, May 5, 2011 at 2:33 PM, Ralf Gommers ralf.gomm...@googlemail.com
wrote:
> Two questions: can you point me to the patch/ticket, and is this a
> regression?

I don't know if he did a pull request or not, but here is the link he
provided earlier in the thread:

https://github.com/dhomeier/numpy/compare/master...ndmin-cols

Technically, this is not a regression, as the ndmin feature is new in
this release. However, the problem that ndmin is supposed to address is
not fixed by the current implementation in the rc. Essentially, a
single-row, multi-column file with ndmin=2 comes out as an Nx1 array,
which is the same result as for a multi-row, single-column file. My
feeling is that if we let the current implementation stand as is, and
developers use it in their code, then fixing it in a later release would
introduce more problems (maybe the devels would transpose the result
themselves or something). Better to fix it now in the rc with the two
lines of code (and the correction to the tests), than to introduce a
buggy feature that will be hard to fix in future releases, IMHO.

Ben Root
Re: [Numpy-discussion] loadtxt ndmin option
On Thu, May 5, 2011 at 9:46 PM, Benjamin Root ben.r...@ou.edu wrote:
> Technically, this is not a regression, as the ndmin feature is new in
> this release.

Yes right, I forgot this was a recent change.

> However, the problem that ndmin is supposed to address is not fixed by
> the current implementation in the rc. Essentially, a single-row,
> multi-column file with ndmin=2 comes out as an Nx1 array, which is the
> same result as for a multi-row, single-column file.

Looks okay, and I agree that it's better to fix it now. The timing is a
bit unfortunate though, just after RC2. I'll have a closer look tomorrow
and if it can go in, probably tag RC3. If in the meantime a few more
people could test this, that would be helpful.

Ralf
Re: [Numpy-discussion] loadtxt ndmin option
On 5 May 2011, at 22:53, Derek Homeier wrote: However, the problem that ndmin is supposed to address is not fixed by the current implementation for the rc. Essentially, a single-row, multi-column file with ndmin=2 comes out as an Nx1 array, which is the same result as for a multi-row, single-column file. My feeling is that if we let the current implementation stand as is, and developers use it in their code, then fixing it in a later release would introduce more problems (maybe the devels would transpose the result themselves or something). Better to fix it now in rc with the two lines of code (and the correction to the tests), than to introduce a buggy feature that will be hard to fix in future releases, IMHO. Looks okay, and I agree that it's better to fix it now. The timing is a bit unfortunate though, just after RC2. I'll have a closer look tomorrow and if it can go in, probably tag RC3. If in the meantime a few more people could test this, that would be helpful. Ralf I agree, wish I had time to push this before rc2. I could add the explanatory comments mentioned above and switch to use the atleast_[12]d() solution, test that and push it in a couple of minutes, or should I better leave it as is now for testing? Quick follow-up: I just applied the above changes, added some tests to cover Ben's test cases and tested this with 1.6.0rc2 on OS X 10.5 i386+ppc + 10.6 x86_64 (Python 2.7+3.2). So I'd be ready to push it to my repo and do my (first) pull request... Cheers, Derek
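[Editorial aside: the behaviour the fix is aiming for can be checked directly; the sketch below uses io.StringIO in place of a file, and the shapes shown are those of a current NumPy, in which this fix has long since been merged:]

```python
import io
import numpy as np

# one row, five columns -> should keep its row shape under ndmin=2
one_row = np.loadtxt(io.StringIO("1 2 3 4 5"), ndmin=2)
print(one_row.shape)  # (1, 5)

# five rows, one column -> should keep its column shape under ndmin=2
one_col = np.loadtxt(io.StringIO("1\n2\n3\n4\n5"), ndmin=2)
print(one_col.shape)  # (5, 1)
```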
Re: [Numpy-discussion] ANN: Numpy 1.6.0 release candidate 2
On Thu, May 5, 2011 at 11:51 PM, DJ Luscher d...@lanl.gov wrote: Ralf Gommers ralf.gommers at googlemail.com writes: Hi, I am pleased to announce the availability of the second release candidate of NumPy 1.6.0. Compared to the first release candidate, one segfault (on 32-bit Windows + MSVC) and several memory leaks were fixed. If no new problems are reported, the final release will be in one week. Sources and binaries can be found at http://sourceforge.net/projects/numpy/files/NumPy/1.6.0rc2/ For (preliminary) release notes see below. Enjoy, Ralf = NumPy 1.6.0 Release Notes = Fortran assumed shape array and size function support in ``numpy.f2py`` --- F2py now supports wrapping Fortran 90 routines that use assumed shape arrays. Previously, such routines could be called from Python, but the corresponding Fortran routines received the assumed shape arrays as zero-length arrays, which caused unpredictable results. Thanks to Lorenz Hüdepohl for pointing out the correct way to interface routines with assumed shape arrays. In addition, f2py interprets the Fortran expression ``size(array, dim)`` as ``shape(array, dim-1)``, which makes it possible to automatically wrap Fortran routines that use the two-argument ``size`` function in dimension specifications. Previously, users were forced to apply this mapping manually. Regarding the f2py support for assumed shape arrays: I'm just struggling along trying to learn how to use f2py to interface with Fortran source, so please be patient if I am missing something obvious. That said, in the test cases I've run, the new f2py assumed-shape-array support in NumPy 1.6.0rc2 seems to conflict with the support for f90-style modules. For example: foo_mod.f90 !
-*- fix -*-
module easy
  real, parameter :: anx(4) = (/1.,2.,3.,4./)
contains
  subroutine sum(x, res)
    implicit none
    real, intent(in) :: x(:)
    real, intent(out) :: res
    integer :: i
    !print *, 'sum: size(x) = ', size(x)
    res = 0.0
    do i = 1, size(x)
      res = res + x(i)
    enddo
  end subroutine sum
end module easy

when compiled with: f2py -c --fcompiler=intelem foo_mod.f90 -m e

then:

    python
    >>> import e
    >>> print e.easy.sum(e.easy.anx)

returns: 0.0

Also (and I believe related) f2py can no longer compile source with assumed-shape-array-valued functions within a module. Even though the Python-wrapped code did not function properly when called from Python, it did work when called from other Fortran code. It seems that the interface has been broken. The previous version of NumPy I was using was 1.3.0, all on Ubuntu 10.04, Python 2.6, and using the Intel Fortran compiler. Thanks for your consideration and feedback. Thanks for the bug report! These issues are now fixed in: https://github.com/numpy/numpy/commit/f393b604 Ralf, feel free to apply this changeset to the 1.6.x branch if appropriate. Regards, Pearu
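[Editorial aside: the ``size(array, dim)`` to ``shape(array, dim-1)`` mapping from the release notes is simply the 1-based vs. 0-based dimension numbering between Fortran and NumPy; a minimal sketch of the correspondence on the NumPy side:]

```python
import numpy as np

a = np.zeros((3, 5))

# Fortran dimensions are numbered from 1, NumPy axes from 0, so
# Fortran's size(a, 1) corresponds to a.shape[0], size(a, 2) to a.shape[1]:
for fortran_dim in (1, 2):
    print(a.shape[fortran_dim - 1])  # 3, then 5

# Fortran's size(a) with no dim argument is the total element count:
print(a.size)  # 15
```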
[Numpy-discussion] FYI -- numpy.linalg.lstsq as distributed by Ubuntu is crashing on some inputs
Probably just another standard "your BLAS is compiled wrong!" bug, but in this case I'm seeing it with the stock versions of ATLAS, numpy, etc. included in the latest Ubuntu release (11.04 Natty Narwhal): https://bugs.launchpad.net/ubuntu/+source/atlas/+bug/778217 So I thought people might like a heads-up in case they run into it as well. -- Nathaniel
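[Editorial aside: a healthy BLAS/LAPACK build should handle a simple overdetermined system without crashing; a trivial smoke test, unrelated to the specific inputs in the Launchpad report:]

```python
import numpy as np

# fit y = 2x + 1 through exact points; lstsq should recover the coefficients
x = np.arange(4.0)
A = np.column_stack([x, np.ones_like(x)])
y = 2.0 * x + 1.0
coef, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)
print(np.round(coef, 6))  # [2. 1.]
```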
Re: [Numpy-discussion] loadtxt ndmin option
On Fri, May 6, 2011 at 12:12 AM, Derek Homeier de...@astro.physik.uni-goettingen.de wrote: On 5 May 2011, at 22:53, Derek Homeier wrote: However, the problem that ndmin is supposed to address is not fixed by the current implementation for the rc. Essentially, a single-row, multi-column file with ndmin=2 comes out as an Nx1 array, which is the same result as for a multi-row, single-column file. My feeling is that if we let the current implementation stand as is, and developers use it in their code, then fixing it in a later release would introduce more problems (maybe the devels would transpose the result themselves or something). Better to fix it now in rc with the two lines of code (and the correction to the tests), than to introduce a buggy feature that will be hard to fix in future releases, IMHO. Looks okay, and I agree that it's better to fix it now. The timing is a bit unfortunate though, just after RC2. I'll have a closer look tomorrow and if it can go in, probably tag RC3. If in the meantime a few more people could test this, that would be helpful. Ralf I agree, wish I had time to push this before rc2. I could add the explanatory comments mentioned above and switch to use the atleast_[12]d() solution, test that and push it in a couple of minutes, or should I better leave it as is now for testing? Quick follow-up: I just applied the above changes, added some tests to cover Ben's test cases and tested this with 1.6.0rc2 on OS X 10.5 i386+ppc + 10.6 x86_64 (Python 2.7+3.2). So I'd be ready to push it to my repo and do my (first) pull request... Go ahead, I'll have a look at it tonight. Thanks for testing on several Pythons, that definitely helps. Ralf