Re: [Numpy-discussion] optimizing ndarray.__setitem__

2011-05-05 Thread Christoph Groth
 On Wed, May 4, 2011 at 6:19 AM, Christoph Groth c...@falma.de wrote:

 Dear numpy experts,

 I have noticed that with Numpy 1.5.1 the operation

 m[::2] += 1.0

 takes twice as long as

 t = m[::2]
 t += 1.0

Mark Wiebe mwwi...@gmail.com writes:

 You'd better time this in 1.6 too. ;)

 https://github.com/numpy/numpy/commit/f60797ba64ccf33597225d23b893b6eb11149860

This seems to be exactly what I had in mind.  Thanks for finding this.

 The case of boolean mask indexing can't benefit so easily from this
 optimization, but I think it could see a big performance benefit if
 combined __index__ + __iop__ operators were added to
 Python. Something to consider, anyway.

Has something like __index_iadd__ ever been considered seriously?  Not
to my (limited) knowledge.
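
For reference, m[mask] += 1.0 desugars into three separate special-method
calls, which is why a fused hook could help.  A rough, illustration-only
sketch of the equivalent steps:

import operator
import numpy

m = numpy.zeros((1000, 1000))
mask = numpy.arange(0, 1000, 2, dtype=int)

# what the interpreter effectively does for m[mask] += 1.0
tmp = m[mask]                   # __getitem__  -- a copy, for fancy indexing
tmp = operator.iadd(tmp, 1.0)   # __iadd__     -- modifies that copy in place
m[mask] = tmp                   # __setitem__  -- writes the result back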

Indeed, the second loop executes twice as fast as the first in the
following example (again with Numpy 1.5.1).

import numpy
m = numpy.zeros((1000, 1000))
mask = numpy.arange(0, 1000, 2, dtype=int)

for i in xrange(40):
    m[mask] += 1.0

for i in xrange(40):
    t = m[mask]
    t += 1.0

But wouldn't it be easy to optimize this as well, by not executing
assignments where the source and the destination are indexed by the same
mask object?  This would be a bit weaker, as it would only apply when the
masks are the same object (a is b) rather than merely equal, but it should
still cover the most common case.
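
For anyone who wants to reproduce the numbers, a rough timing sketch along
these lines should do (array size and repeat count are arbitrary):

import timeit

setup = ("import numpy; m = numpy.zeros((1000, 1000)); "
         "mask = numpy.arange(0, 1000, 2, dtype=int)")

variants = [
    "m[::2] += 1.0",          # slice, in-place
    "t = m[::2]; t += 1.0",   # slice via a temporary (a view, so m is still updated)
    "m[mask] += 1.0",         # fancy indexing, in-place
    "t = m[mask]; t += 1.0",  # fancy indexing via a temporary (a copy!)
]

for stmt in variants:
    print("%-24s %.3f s" % (stmt, timeit.timeit(stmt, setup=setup, number=100)))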

Christoph



Re: [Numpy-discussion] loadtxt ndmin option

2011-05-05 Thread Benjamin Root
On Wed, May 4, 2011 at 11:08 PM, Paul Anton Letnes 
paul.anton.let...@gmail.com wrote:


 On 4. mai 2011, at 20.33, Benjamin Root wrote:

  On Wed, May 4, 2011 at 7:54 PM, Derek Homeier 
 de...@astro.physik.uni-goettingen.de wrote:
  On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote:
 
   But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions written
 for this? Shouldn't we reuse them? Perhaps it's overkill, and perhaps it
 will reintroduce the 'transposed' problem?
 
  Yes, good point, one could replace the
  X.shape = (X.size, ) with X = np.atleast_1d(X),
  but for the ndmin=2 case, we'd need to replace
  X.shape = (X.size, 1) with X = np.atleast_2d(X).T -
  not sure which solution is more efficient in terms of memory access
 etc...
 
  Cheers,
 Derek
 
 
  I can confirm that the current behavior is not sufficient for all of the
 original corner cases that ndmin was supposed to address.  Keep in mind that
 np.loadtxt takes a one-column data file and a one-row data file down to the
 same shape.  I don't see how the current code is able to produce the correct
 array shape when ndmin=2.  Do we have some sort of counter in loadtxt for
 counting the number of rows and columns read?  Could we use those to help
 guide the ndmin=2 case?
 
  I think that using atleast_1d(X) might be a bit overkill, but it would be
 very clear as to the code's intent.  I don't think we have to worry about
 memory usage if we limit its use to only situations where ndmin is greater
 than the number of dimensions of the array.  In those cases, the array is
 either an empty result, a scalar value (in which memory access is trivial),
 or 1-d (in which a transpose is cheap).

 What if one does things the other way around - avoid calling squeeze until
 _after_ doing the atleast_Nd() magic? That way the row/column information
 should be conserved, right? Also, we avoid transposing, memory use, ...

 Oh, and someone could conceivably have a _looong_ 1D file, but would want
 it read as a 2D array.

 Paul



@Derek, good catch noticing the error in the tests. We do still need to
handle the case I mentioned, however.  I have attached an example script to
demonstrate the issue.  In this script, I would expect the second-to-last
array to have a shape of (1, 5).  I believe that the single-row, multi-column
case is the edge case users will encounter most often.  Therefore, I believe
that this ndmin fix is not adequate until that case is addressed.

@Paul, we can't call squeeze after doing the atleast_Nd() magic.  That would
just undo whatever we had just done.  Also, wrt the transpose, a (1, 10)
array looks the same in memory as a (10, 1) array, right?
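
A quick way to see that (toy data, nothing loadtxt-specific):

import numpy

a = numpy.arange(10.0).reshape(1, 10)   # one row, ten columns
b = a.reshape(10, 1)                    # ten rows, one column -- a view, no copy

b[3, 0] = 99.0
print(a[0, 3])                          # 99.0 -- same buffer, only shape/strides differ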

Ben Root


[Attachment: loadtest.py]


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-05 Thread Benjamin Root
On Thu, May 5, 2011 at 10:49 AM, Benjamin Root ben.r...@ou.edu wrote:



 On Wed, May 4, 2011 at 11:08 PM, Paul Anton Letnes 
 paul.anton.let...@gmail.com wrote:


 On 4. mai 2011, at 20.33, Benjamin Root wrote:

  On Wed, May 4, 2011 at 7:54 PM, Derek Homeier 
 de...@astro.physik.uni-goettingen.de wrote:
  On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote:
 
   But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions written
 for this? Shouldn't we reuse them? Perhaps it's overkill, and perhaps it
 will reintroduce the 'transposed' problem?
 
  Yes, good point, one could replace the
  X.shape = (X.size, ) with X = np.atleast_1d(X),
  but for the ndmin=2 case, we'd need to replace
  X.shape = (X.size, 1) with X = np.atleast_2d(X).T -
  not sure which solution is more efficient in terms of memory access
 etc...
 
  Cheers,
 Derek
 
 
  I can confirm that the current behavior is not sufficient for all of the
 original corner cases that ndmin was supposed to address.  Keep in mind that
 np.loadtxt takes a one-column data file and a one-row data file down to the
 same shape.  I don't see how the current code is able to produce the correct
 array shape when ndmin=2.  Do we have some sort of counter in loadtxt for
 counting the number of rows and columns read?  Could we use those to help
 guide the ndmin=2 case?
 
  I think that using atleast_1d(X) might be a bit overkill, but it would
 be very clear as to the code's intent.  I don't think we have to worry about
 memory usage if we limit its use to only situations where ndmin is greater
 than the number of dimensions of the array.  In those cases, the array is
 either an empty result, a scalar value (in which memory access is trivial),
 or 1-d (in which a transpose is cheap).

 What if one does things the other way around - avoid calling squeeze until
 _after_ doing the atleast_Nd() magic? That way the row/column information
 should be conserved, right? Also, we avoid transposing, memory use, ...

 Oh, and someone could conceivably have a _looong_ 1D file, but would want
 it read as a 2D array.

 Paul



 @Derek, good catch with noticing the error in the tests. We do still need
 to handle the case I mentioned, however.  I have attached an example script
 to demonstrate the issue.  In this script, I would expect the second-to-last
 array to be a shape of (1, 5).  I believe that the single-row, multi-column
 case would actually be the more common type of edge-case encountered by
 users than the others.  Therefore, I believe that this ndmin fix is not
 adequate until this is addressed.


Apologies Derek, your patch does address the issue I raised.

Ben Root


Re: [Numpy-discussion] optimizing ndarray.__setitem__

2011-05-05 Thread Robert Kern
On Thu, May 5, 2011 at 02:29, Christoph Groth c...@falma.de wrote:
 On Wed, May 4, 2011 at 6:19 AM, Christoph Groth c...@falma.de wrote:

     Dear numpy experts,

     I have noticed that with Numpy 1.5.1 the operation

     m[::2] += 1.0

     takes twice as long as

     t = m[::2]
     t += 1.0

 Mark Wiebe mwwi...@gmail.com writes:

 You'd better time this in 1.6 too. ;)

 https://github.com/numpy/numpy/commit/f60797ba64ccf33597225d23b893b6eb11149860

 This seems to be exactly what I had in mind.  Thanks for finding this.

 The case of boolean mask indexing can't benefit so easily from this
 optimization, but I think it could see a big performance benefit if
 combined __index__ + __iop__ operators were added to
 Python. Something to consider, anyway.

 Has something like __index_iadd__ ever been considered seriously?  Not
 to my (limited) knowledge.

Only on this list, I think. :-)

I don't think it will ever happen. Only numpy really cares about it,
and adding another __special__ method for each __iop__ is a lot of
additional methods that need to be supported.

 Indeed, the second loop executes twice as fast as the first in the
 following example (again with Numpy 1.5.1).

 import numpy
 m = numpy.zeros((1000, 1000))
 mask = numpy.arange(0, 1000, 2, dtype=int)

 for i in xrange(40):
    m[mask] += 1.0

 for i in xrange(40):
    t = m[mask]
    t += 1.0

 But wouldn't it be easy to optimize this as well, by not executing
 assignments where the source and the destination are indexed by the same
 mask object?

No. These two are not semantically equivalent. Your second example
does not actually modify m. For integer and bool mask arrays, m[mask]
necessarily makes a copy, so when you modify t via inplace addition,
you have only modified t and not m. The assignment back to m[mask] is
necessary.
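
A small illustration of the copy-vs-view distinction (toy 4x4 shapes, just
for demonstration):

import numpy

m = numpy.zeros((4, 4))

# Slicing returns a view: in-place addition through it modifies m.
v = m[::2]
v += 1.0
print(m.sum())      # 8.0 -- rows 0 and 2 were updated through the view

m = numpy.zeros((4, 4))
mask = numpy.array([0, 2])

# Fancy indexing returns a copy: the in-place add touches only the copy.
t = m[mask]
t += 1.0
print(m.sum())      # 0.0 -- m is unchanged, only t was modified

m[mask] += 1.0      # the assignment back (__setitem__) is what updates m
print(m.sum())      # 8.0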

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco


Re: [Numpy-discussion] optimizing ndarray.__setitem__

2011-05-05 Thread Nathaniel Smith
On Thu, May 5, 2011 at 9:54 AM, Robert Kern robert.k...@gmail.com wrote:
 On Thu, May 5, 2011 at 02:29, Christoph Groth c...@falma.de wrote:
 Has something like __index_iadd__ ever been considered seriously?  Not
 to my (limited) knowledge.

 Only on this list, I think. :-)

 I don't think it will ever happen. Only numpy really cares about it,
 and adding another __special__ method for each __iop__ is a lot of
 additional methods that need to be supported.

Maybe in the context of PyPy someone will come up with a clever way to
implement template-expression style operator fusion for numpy. That'd
be kinda neat.

-- Nathaniel


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-05 Thread Paul Anton Letnes

On 5. mai 2011, at 08.49, Benjamin Root wrote:

 
 
 On Wed, May 4, 2011 at 11:08 PM, Paul Anton Letnes 
 paul.anton.let...@gmail.com wrote:
 
 On 4. mai 2011, at 20.33, Benjamin Root wrote:
 
  On Wed, May 4, 2011 at 7:54 PM, Derek Homeier 
  de...@astro.physik.uni-goettingen.de wrote:
  On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote:
 
   But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions written 
   for this? Shouldn't we reuse them? Perhaps it's overkill, and perhaps it 
   will reintroduce the 'transposed' problem?
 
  Yes, good point, one could replace the
  X.shape = (X.size, ) with X = np.atleast_1d(X),
  but for the ndmin=2 case, we'd need to replace
  X.shape = (X.size, 1) with X = np.atleast_2d(X).T -
  not sure which solution is more efficient in terms of memory access etc...
 
  Cheers,
 Derek
 
 
  I can confirm that the current behavior is not sufficient for all of the 
  original corner cases that ndmin was supposed to address.  Keep in mind 
  that np.loadtxt takes a one-column data file and a one-row data file down 
  to the same shape.  I don't see how the current code is able to produce the 
  correct array shape when ndmin=2.  Do we have some sort of counter in 
  loadtxt for counting the number of rows and columns read?  Could we use 
  those to help guide the ndmin=2 case?
 
  I think that using atleast_1d(X) might be a bit overkill, but it would be 
  very clear as to the code's intent.  I don't think we have to worry about 
  memory usage if we limit its use to only situations where ndmin is greater 
  than the number of dimensions of the array.  In those cases, the array is 
  either an empty result, a scalar value (in which memory access is trivial), 
  or 1-d (in which a transpose is cheap).
 
 What if one does things the other way around - avoid calling squeeze until 
 _after_ doing the atleast_Nd() magic? That way the row/column information 
 should be conserved, right? Also, we avoid transposing, memory use, ...
 
 Oh, and someone could conceivably have a _looong_ 1D file, but would want it 
 read as a 2D array.
 
 Paul
 
 
 
 @Derek, good catch with noticing the error in the tests. We do still need to 
 handle the case I mentioned, however.  I have attached an example script to 
 demonstrate the issue.  In this script, I would expect the second-to-last 
 array to be a shape of (1, 5).  I believe that the single-row, multi-column 
 case would actually be the more common type of edge-case encountered by users 
 than the others.  Therefore, I believe that this ndmin fix is not adequate 
 until this is addressed.
 
 @Paul, we can't call squeeze after doing the atleast_Nd() magic.  That would 
 just undo whatever we had just done.  Also, wrt the transpose, a (1, 10) 
 array looks the same in memory as a (10, 1) array, right?
Agree. I thought more along the lines of (pseudocode-ish)

if ndmin == 0:
    squeeze()
if ndmin == 1:
    atleast_1D()
elif ndmin == 2:
    atleast_2D()
else:
    I don't rightly know what would go here, maybe raise ValueError?

That would avoid the squeeze call before the atleast_Nd magic. But the code was 
changed, so I think my comment doesn't make sense anymore. It's probably fine 
the way it is!

Paul



Re: [Numpy-discussion] ANN: Numpy 1.6.0 release candidate 2

2011-05-05 Thread Ralf Gommers
On Thu, May 5, 2011 at 1:10 AM, Benjamin Root ben.r...@ou.edu wrote:



 On Tue, May 3, 2011 at 1:18 PM, Ralf Gommers 
 ralf.gomm...@googlemail.comwrote:

 Hi,

 I am pleased to announce the availability of the second release
 candidate of NumPy 1.6.0.

 Compared to the first release candidate, one segfault on (32-bit
 Windows + MSVC) and several memory leaks were fixed. If no new
 problems are reported, the final release will be in one week.

 Sources and binaries can be found at
 http://sourceforge.net/projects/numpy/files/NumPy/1.6.0rc2/
 For (preliminary) release notes see below.

 Enjoy,
 Ralf



 Minor issue I just noticed on my recently installed Ubuntu 11.04 machine.
 The setup script is making a call to 'svnversion'.  Doesn't impact the build
 or anything, but I only noticed it because svn hasn't been installed yet on
 that machine.  Don't know if it is something that ought to be cleaned up or
 not.

That's harmless and better left alone for backwards compatibility I think.

Ralf


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-05 Thread Benjamin Root
On Thu, May 5, 2011 at 1:08 PM, Paul Anton Letnes 
paul.anton.let...@gmail.com wrote:


 On 5. mai 2011, at 08.49, Benjamin Root wrote:

 
 
  On Wed, May 4, 2011 at 11:08 PM, Paul Anton Letnes 
 paul.anton.let...@gmail.com wrote:
 
  On 4. mai 2011, at 20.33, Benjamin Root wrote:
 
   On Wed, May 4, 2011 at 7:54 PM, Derek Homeier 
 de...@astro.physik.uni-goettingen.de wrote:
   On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote:
  
But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions
 written for this? Shouldn't we reuse them? Perhaps it's overkill, and
 perhaps it will reintroduce the 'transposed' problem?
  
   Yes, good point, one could replace the
   X.shape = (X.size, ) with X = np.atleast_1d(X),
   but for the ndmin=2 case, we'd need to replace
   X.shape = (X.size, 1) with X = np.atleast_2d(X).T -
   not sure which solution is more efficient in terms of memory access
 etc...
  
   Cheers,
  Derek
  
  
   I can confirm that the current behavior is not sufficient for all of
 the original corner cases that ndmin was supposed to address.  Keep in mind
 that np.loadtxt takes a one-column data file and a one-row data file down to
 the same shape.  I don't see how the current code is able to produce the
 correct array shape when ndmin=2.  Do we have some sort of counter in
 loadtxt for counting the number of rows and columns read?  Could we use
 those to help guide the ndmin=2 case?
  
   I think that using atleast_1d(X) might be a bit overkill, but it would
 be very clear as to the code's intent.  I don't think we have to worry about
 memory usage if we limit its use to only situations where ndmin is greater
 than the number of dimensions of the array.  In those cases, the array is
 either an empty result, a scalar value (in which memory access is trivial),
 or 1-d (in which a transpose is cheap).
 
  What if one does things the other way around - avoid calling squeeze
 until _after_ doing the atleast_Nd() magic? That way the row/column
 information should be conserved, right? Also, we avoid transposing, memory
 use, ...
 
  Oh, and someone could conceivably have a _looong_ 1D file, but would want
 it read as a 2D array.
 
  Paul
 
 
 
  @Derek, good catch with noticing the error in the tests. We do still need
 to handle the case I mentioned, however.  I have attached an example script
 to demonstrate the issue.  In this script, I would expect the second-to-last
 array to be a shape of (1, 5).  I believe that the single-row, multi-column
 case would actually be the more common type of edge-case encountered by
 users than the others.  Therefore, I believe that this ndmin fix is not
 adequate until this is addressed.
 
  @Paul, we can't call squeeze after doing the atleast_Nd() magic.  That
 would just undo whatever we had just done.  Also, wrt the transpose, a (1,
 10) array looks the same in memory as a (10, 1) array, right?
 Agree. I thought more along the lines of (pseudocode-ish)
 if ndmin == 0:
     squeeze()
 if ndmin == 1:
     atleast_1D()
 elif ndmin == 2:
     atleast_2D()
 else:
     I don't rightly know what would go here, maybe raise ValueError?

 That would avoid the squeeze call before the atleast_Nd magic. But the code
 was changed, so I think my comment doesn't make sense anymore. It's probably
 fine the way it is!

 Paul


I have thought of that too, but the problem with that approach is that after
reading the file, X will have 2 or 3 dimensions, regardless of how many
singleton dims were in the file.  A squeeze will always be needed.  Also,
the purpose of squeeze is the opposite of that of the atleast_*d() functions:
squeeze reduces dimensions, while atleast_*d adds dimensions.
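
A toy illustration of that point, separate from the loadtxt code itself:

import numpy

x = numpy.atleast_2d(numpy.arange(5.0))
print(x.shape)                  # (1, 5)
print(numpy.squeeze(x).shape)   # (5,) -- squeezing afterwards undoes what atleast_2d added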

Therefore, I reiterate... the patch by Derek gets the job done.  I have
tested it for a wide variety of inputs, for both regular arrays and record
arrays.  Is there room for improvement?  Yes, but I think that can wait for
later.  Derek's patch, however, fixes an important bug in the ndmin
implementation and should be included for the release.

Ben Root


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-05 Thread Ralf Gommers
On Thu, May 5, 2011 at 9:18 PM, Benjamin Root ben.r...@ou.edu wrote:



 On Thu, May 5, 2011 at 1:08 PM, Paul Anton Letnes 
 paul.anton.let...@gmail.com wrote:


 On 5. mai 2011, at 08.49, Benjamin Root wrote:

 
 
  On Wed, May 4, 2011 at 11:08 PM, Paul Anton Letnes 
 paul.anton.let...@gmail.com wrote:
 
  On 4. mai 2011, at 20.33, Benjamin Root wrote:
 
   On Wed, May 4, 2011 at 7:54 PM, Derek Homeier 
 de...@astro.physik.uni-goettingen.de wrote:
   On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote:
  
But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions
 written for this? Shouldn't we reuse them? Perhaps it's overkill, and
 perhaps it will reintroduce the 'transposed' problem?
  
   Yes, good point, one could replace the
   X.shape = (X.size, ) with X = np.atleast_1d(X),
   but for the ndmin=2 case, we'd need to replace
   X.shape = (X.size, 1) with X = np.atleast_2d(X).T -
   not sure which solution is more efficient in terms of memory access
 etc...
  
   Cheers,
  Derek
  
  
   I can confirm that the current behavior is not sufficient for all of
 the original corner cases that ndmin was supposed to address.  Keep in mind
 that np.loadtxt takes a one-column data file and a one-row data file down to
 the same shape.  I don't see how the current code is able to produce the
 correct array shape when ndmin=2.  Do we have some sort of counter in
 loadtxt for counting the number of rows and columns read?  Could we use
 those to help guide the ndmin=2 case?
  
   I think that using atleast_1d(X) might be a bit overkill, but it would
 be very clear as to the code's intent.  I don't think we have to worry about
 memory usage if we limit its use to only situations where ndmin is greater
 than the number of dimensions of the array.  In those cases, the array is
 either an empty result, a scalar value (in which memory access is trivial),
 or 1-d (in which a transpose is cheap).
 
  What if one does things the other way around - avoid calling squeeze
 until _after_ doing the atleast_Nd() magic? That way the row/column
 information should be conserved, right? Also, we avoid transposing, memory
 use, ...
 
  Oh, and someone could conceivably have a _looong_ 1D file, but would
 want it read as a 2D array.
 
  Paul
 
 
 
  @Derek, good catch with noticing the error in the tests. We do still
 need to handle the case I mentioned, however.  I have attached an example
 script to demonstrate the issue.  In this script, I would expect the
 second-to-last array to be a shape of (1, 5).  I believe that the
 single-row, multi-column case would actually be the more common type of
 edge-case encountered by users than the others.  Therefore, I believe that
 this ndmin fix is not adequate until this is addressed.
 
  @Paul, we can't call squeeze after doing the atleast_Nd() magic.  That
 would just undo whatever we had just done.  Also, wrt the transpose, a (1,
 10) array looks the same in memory as a (10, 1) array, right?
 Agree. I thought more along the lines of (pseudocode-ish)
 if ndmin == 0:
     squeeze()
 if ndmin == 1:
     atleast_1D()
 elif ndmin == 2:
     atleast_2D()
 else:
     I don't rightly know what would go here, maybe raise ValueError?

 That would avoid the squeeze call before the atleast_Nd magic. But the
 code was changed, so I think my comment doesn't make sense anymore. It's
 probably fine the way it is!

 Paul


 I have thought of that too, but the problem with that approach is that
 after reading the file, X will have 2 or 3 dimensions, regardless of how
 many singleton dims were in the file.  A squeeze will always be needed.
 Also, the purpose of squeeze is opposite that of the atleast_*d()
 functions:  squeeze reduces dimensions, while atleast_*d will add
 dimensions.

 Therefore, I re-iterate... the patch by Derek gets the job done.  I have
 tested it for a wide variety of inputs for both regular arrays and record
 arrays.  Is there room for improvements?  Yes, but I think that can wait for
 later.  Derek's patch however fixes an important bug in the ndmin
 implementation and should be included for the release.

 Two questions: can you point me to the patch/ticket, and is this a
regression?

Thanks,
Ralf


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-05 Thread Benjamin Root
On Thu, May 5, 2011 at 2:33 PM, Ralf Gommers ralf.gomm...@googlemail.comwrote:



 On Thu, May 5, 2011 at 9:18 PM, Benjamin Root ben.r...@ou.edu wrote:



 On Thu, May 5, 2011 at 1:08 PM, Paul Anton Letnes 
 paul.anton.let...@gmail.com wrote:


 On 5. mai 2011, at 08.49, Benjamin Root wrote:

 
 
  On Wed, May 4, 2011 at 11:08 PM, Paul Anton Letnes 
 paul.anton.let...@gmail.com wrote:
 
  On 4. mai 2011, at 20.33, Benjamin Root wrote:
 
   On Wed, May 4, 2011 at 7:54 PM, Derek Homeier 
 de...@astro.physik.uni-goettingen.de wrote:
   On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote:
  
But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions
 written for this? Shouldn't we reuse them? Perhaps it's overkill, and
 perhaps it will reintroduce the 'transposed' problem?
  
   Yes, good point, one could replace the
   X.shape = (X.size, ) with X = np.atleast_1d(X),
   but for the ndmin=2 case, we'd need to replace
   X.shape = (X.size, 1) with X = np.atleast_2d(X).T -
   not sure which solution is more efficient in terms of memory access
 etc...
  
   Cheers,
  Derek
  
  
   I can confirm that the current behavior is not sufficient for all of
 the original corner cases that ndmin was supposed to address.  Keep in mind
 that np.loadtxt takes a one-column data file and a one-row data file down to
 the same shape.  I don't see how the current code is able to produce the
 correct array shape when ndmin=2.  Do we have some sort of counter in
 loadtxt for counting the number of rows and columns read?  Could we use
 those to help guide the ndmin=2 case?
  
   I think that using atleast_1d(X) might be a bit overkill, but it
 would be very clear as to the code's intent.  I don't think we have to worry
 about memory usage if we limit its use to only situations where ndmin is
 greater than the number of dimensions of the array.  In those cases, the
 array is either an empty result, a scalar value (in which memory access is
 trivial), or 1-d (in which a transpose is cheap).
 
  What if one does things the other way around - avoid calling squeeze
 until _after_ doing the atleast_Nd() magic? That way the row/column
 information should be conserved, right? Also, we avoid transposing, memory
 use, ...
 
  Oh, and someone could conceivably have a _looong_ 1D file, but would
 want it read as a 2D array.
 
  Paul
 
 
 
  @Derek, good catch with noticing the error in the tests. We do still
 need to handle the case I mentioned, however.  I have attached an example
 script to demonstrate the issue.  In this script, I would expect the
 second-to-last array to be a shape of (1, 5).  I believe that the
 single-row, multi-column case would actually be the more common type of
 edge-case encountered by users than the others.  Therefore, I believe that
 this ndmin fix is not adequate until this is addressed.
 
  @Paul, we can't call squeeze after doing the atleast_Nd() magic.  That
 would just undo whatever we had just done.  Also, wrt the transpose, a (1,
 10) array looks the same in memory as a (10, 1) array, right?
 Agree. I thought more along the lines of (pseudocode-ish)
 if ndmin == 0:
     squeeze()
 if ndmin == 1:
     atleast_1D()
 elif ndmin == 2:
     atleast_2D()
 else:
     I don't rightly know what would go here, maybe raise ValueError?

 That would avoid the squeeze call before the atleast_Nd magic. But the
 code was changed, so I think my comment doesn't make sense anymore. It's
 probably fine the way it is!

 Paul


 I have thought of that too, but the problem with that approach is that
 after reading the file, X will have 2 or 3 dimensions, regardless of how
 many singleton dims were in the file.  A squeeze will always be needed.
 Also, the purpose of squeeze is opposite that of the atleast_*d()
 functions:  squeeze reduces dimensions, while atleast_*d will add
 dimensions.

 Therefore, I re-iterate... the patch by Derek gets the job done.  I have
 tested it for a wide variety of inputs for both regular arrays and record
 arrays.  Is there room for improvements?  Yes, but I think that can wait for
 later.  Derek's patch however fixes an important bug in the ndmin
 implementation and should be included for the release.

 Two questions: can you point me to the patch/ticket, and is this a
 regression?

 Thanks,
 Ralf



I don't know if he did a pull-request or not, but here is the link he
provided earlier in the thread.

https://github.com/dhomeier/numpy/compare/master...ndmin-cols

Technically, this is not a regression as the ndmin feature is new in this
release.  However, the problem that ndmin is supposed to address is not
fixed by the current implementation for the rc.  Essentially, a single-row,
multi-column file with ndmin=2 comes out as an Nx1 array, which is the same
result as for a multi-row, single-column file.  My feeling is that if we let
the current implementation stand as is, and developers use it in their code,
then fixing it in a later release would introduce more problems (maybe the
devels would transpose the result themselves or something).  Better to fix
it now in the rc with the two lines of code (and the correction to the
tests), than to introduce a buggy feature that will be hard to fix in future
releases, IMHO.
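
For concreteness, a toy sketch of the shapes in question (not the loadtxt
internals, just the two reshaping strategies discussed in this thread):

import numpy

row = numpy.arange(5.0)               # what a one-row, five-column file reduces to

print(numpy.atleast_2d(row).shape)    # (1, 5) -- what ndmin=2 should give for it

col = row.copy()
col.shape = (col.size, 1)             # the unconditional X.shape = (X.size, 1) reshape
print(col.shape)                      # (5, 1) -- indistinguishable from a one-column file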

Re: [Numpy-discussion] loadtxt ndmin option

2011-05-05 Thread Ralf Gommers
On Thu, May 5, 2011 at 9:46 PM, Benjamin Root ben.r...@ou.edu wrote:



 On Thu, May 5, 2011 at 2:33 PM, Ralf Gommers 
 ralf.gomm...@googlemail.comwrote:



 On Thu, May 5, 2011 at 9:18 PM, Benjamin Root ben.r...@ou.edu wrote:



 On Thu, May 5, 2011 at 1:08 PM, Paul Anton Letnes 
 paul.anton.let...@gmail.com wrote:


 On 5. mai 2011, at 08.49, Benjamin Root wrote:

 
 
  On Wed, May 4, 2011 at 11:08 PM, Paul Anton Letnes 
 paul.anton.let...@gmail.com wrote:
 
  On 4. mai 2011, at 20.33, Benjamin Root wrote:
 
   On Wed, May 4, 2011 at 7:54 PM, Derek Homeier 
 de...@astro.physik.uni-goettingen.de wrote:
   On 05.05.2011, at 2:40AM, Paul Anton Letnes wrote:
  
But: Isn't the numpy.atleast_2d and numpy.atleast_1d functions
 written for this? Shouldn't we reuse them? Perhaps it's overkill, and
 perhaps it will reintroduce the 'transposed' problem?
  
   Yes, good point, one could replace the
   X.shape = (X.size, ) with X = np.atleast_1d(X),
   but for the ndmin=2 case, we'd need to replace
   X.shape = (X.size, 1) with X = np.atleast_2d(X).T -
   not sure which solution is more efficient in terms of memory access
 etc...
  
   Cheers,
  Derek
  
  
   I can confirm that the current behavior is not sufficient for all of
 the original corner cases that ndmin was supposed to address.  Keep in mind
 that np.loadtxt takes a one-column data file and a one-row data file down 
 to
 the same shape.  I don't see how the current code is able to produce the
 correct array shape when ndmin=2.  Do we have some sort of counter in
 loadtxt for counting the number of rows and columns read?  Could we use
 those to help guide the ndmin=2 case?
  
   I think that using atleast_1d(X) might be a bit overkill, but it
 would be very clear as to the code's intent.  I don't think we have to 
 worry
 about memory usage if we limit its use to only situations where ndmin is
 greater than the number of dimensions of the array.  In those cases, the
 array is either an empty result, a scalar value (in which memory access is
 trivial), or 1-d (in which a transpose is cheap).
 
  What if one does things the other way around - avoid calling squeeze
 until _after_ doing the atleast_Nd() magic? That way the row/column
 information should be conserved, right? Also, we avoid transposing, memory
 use, ...
 
  Oh, and someone could conceivably have a _looong_ 1D file, but would
 want it read as a 2D array.
 
  Paul
 
 
 
  @Derek, good catch with noticing the error in the tests. We do still
 need to handle the case I mentioned, however.  I have attached an example
 script to demonstrate the issue.  In this script, I would expect the
 second-to-last array to be a shape of (1, 5).  I believe that the
 single-row, multi-column case would actually be the more common type of
 edge-case encountered by users than the others.  Therefore, I believe that
 this ndmin fix is not adequate until this is addressed.
 
  @Paul, we can't call squeeze after doing the atleast_Nd() magic.  That
 would just undo whatever we had just done.  Also, wrt the transpose, a (1,
 10) array looks the same in memory as a (10, 1) array, right?
 Agree. I thought more along the lines of (pseudocode-ish)
 if ndmin == 0:
     squeeze()
 if ndmin == 1:
     atleast_1D()
 elif ndmin == 2:
     atleast_2D()
 else:
     I don't rightly know what would go here, maybe raise ValueError?

 That would avoid the squeeze call before the atleast_Nd magic. But the
 code was changed, so I think my comment doesn't make sense anymore. It's
 probably fine the way it is!

 Paul


 I have thought of that too, but the problem with that approach is that
 after reading the file, X will have 2 or 3 dimensions, regardless of how
 many singleton dims were in the file.  A squeeze will always be needed.
 Also, the purpose of squeeze is opposite that of the atleast_*d()
 functions:  squeeze reduces dimensions, while atleast_*d will add
 dimensions.

 Therefore, I re-iterate... the patch by Derek gets the job done.  I have
 tested it for a wide variety of inputs for both regular arrays and record
 arrays.  Is there room for improvements?  Yes, but I think that can wait for
 later.  Derek's patch however fixes an important bug in the ndmin
 implementation and should be included for the release.

 Two questions: can you point me to the patch/ticket, and is this a
 regression?

 Thanks,
 Ralf



 I don't know if he did a pull-request or not, but here is the link he
 provided earlier in the thread.

 https://github.com/dhomeier/numpy/compare/master...ndmin-cols

 Technically, this is not a regression as the ndmin feature is new in this
 release.


Yes right, I forgot this was a recent change.


 However, the problem that ndmin is supposed to address is not fixed by the
 current implementation for the rc.  Essentially, a single-row, multi-column
 file with ndmin=2 comes out as a Nx1 array which is the same result for a
 multi-row, single-column file.  My feeling is that if we let the current
 implementation stand as is, and developers use it in their code, then
 fixing it in a later release would introduce more problems (maybe the
 devels would transpose the result themselves or something).  Better to fix
 it now in rc with the two lines of code (and the correction to the tests),
 than to introduce a buggy feature that will be hard to fix in future
 releases, IMHO.

Looks okay, and I agree that it's better to fix it now. The timing is a bit
unfortunate though, just after RC2. I'll have a closer look tomorrow and if
it can go in, probably tag RC3.

If in the meantime a few more people could test this, that would be helpful.

Ralf

Re: [Numpy-discussion] loadtxt ndmin option

2011-05-05 Thread Derek Homeier

On 5 May 2011, at 22:53, Derek Homeier wrote:


 However, the problem that ndmin is supposed to address is not fixed
 by the current implementation for the rc.  Essentially, a single-
 row, multi-column file with ndmin=2 comes out as a Nx1 array which
 is the same result for a multi-row, single-column file.  My feeling
 is that if we let the current implementation stand as is, and
 developers use it in their code, then fixing it in a later release
 would introduce more problems (maybe the devels would transpose the
 result themselves or something).  Better to fix it now in rc with
 the two lines of code (and the correction to the tests), than to
 introduce a buggy feature that will be hard to fix in future
 releases, IMHO.

 Looks okay, and I agree that it's better to fix it now. The timing
 is a bit unfortunate though, just after RC2. I'll have a closer look
 tomorrow and if it can go in, probably tag RC3.

 If in the meantime a few more people could test this, that would be
 helpful.

 Ralf

 I agree, wish I had time to push this before rc2. I could add the
 explanatory comments
 mentioned above and switch to use the atleast_[12]d() solution, test
 that and push it
 in a couple of minutes, or should I better leave it as is now for
 testing?

Quick follow-up: I just applied the above changes, added some tests to
cover Ben's test cases and tested this with 1.6.0rc2 on OS X 10.5 i386+ppc
+ 10.6 x86_64 (Python 2.7+3.2). So I'd be ready to push it to my repo and
do my (first) pull request...

Cheers,
Derek



Re: [Numpy-discussion] ANN: Numpy 1.6.0 release candidate 2

2011-05-05 Thread Pearu Peterson
On Thu, May 5, 2011 at 11:51 PM, DJ Luscher d...@lanl.gov wrote:


 Ralf Gommers ralf.gommers at googlemail.com writes:

 
  Hi,
 
  I am pleased to announce the availability of the second release
  candidate of NumPy 1.6.0.
 
  Compared to the first release candidate, one segfault on (32-bit
  Windows + MSVC) and several memory leaks were fixed. If no new
  problems are reported, the final release will be in one week.
 
  Sources and binaries can be found at
  http://sourceforge.net/projects/numpy/files/NumPy/1.6.0rc2/
  For (preliminary) release notes see below.
 
  Enjoy,
  Ralf
 
  =========================
  NumPy 1.6.0 Release Notes
  =========================
 
  Fortran assumed shape array and size function support in ``numpy.f2py``
  ------------------------------------------------------------------------
 
  F2py now supports wrapping Fortran 90 routines that use assumed shape
  arrays.  Before such routines could be called from Python but the
  corresponding Fortran routines received assumed shape arrays as zero
  length arrays which caused unpredicted results. Thanks to Lorenz
  Hüdepohl for pointing out the correct way to interface routines with
  assumed shape arrays.
 
  In addition, f2py interprets Fortran expression ``size(array, dim)``
  as ``shape(array, dim-1)`` which makes it possible to automatically
  wrap Fortran routines that use two argument ``size`` function in
  dimension specifications. Before users were forced to apply this
  mapping manually.
 


 Regarding the f2py support for assumed shape arrays:

 I'm just struggling along trying to learn how to use f2py to interface with
 Fortran source, so please be patient if I am missing something obvious.  That
 said, in the test cases I've run, the new f2py assumed-shape-array support in
 Numpy 1.6.0rc2 seems to conflict with the support for f90-style modules.  For
 example:

 foo_mod.f90

   ! -*- fix -*-

   module easy

     real, parameter :: anx(4) = (/1.,2.,3.,4./)

   contains

     subroutine sum(x, res)
       implicit none
       real, intent(in) :: x(:)
       real, intent(out) :: res

       integer :: i

       !print *, "sum: size(x) = ", size(x)

       res = 0.0

       do i = 1, size(x)
         res = res + x(i)
       enddo

     end subroutine sum

   end module easy


 when compiled with:
 f2py -c --fcompiler=intelem foo_mod.f90  -m e

 then:

 python
 import e
 print e.easy.sum(e.easy.anx)

 returns: 0.0  (expected 10.0, the sum of the elements of anx)

 Also (and I believe this is related) f2py can no longer compile source with
 assumed-shape array-valued functions within a module.  Even though the
 Python-wrapped code did not function properly when called from Python, it
 did work when called from other Fortran code.  It seems that the interface
 has been broken.  The previous version of Numpy I was using was 1.3.0, all
 on Ubuntu 10.04 with Python 2.6 and the Intel Fortran compiler.

 thanks for your consideration and feedback.


Thanks for the bug report!

These issues are now fixed in:

  https://github.com/numpy/numpy/commit/f393b604

Ralf, feel free to apply this changeset to the 1.6.x branch if appropriate.

Regards,
Pearu


[Numpy-discussion] FYI -- numpy.linalg.lstsq as distributed by Ubuntu is crashing on some inputs

2011-05-05 Thread Nathaniel Smith
Probably just another standard "your BLAS is compiled wrong!" bug, but
in this case I'm seeing it with the stock versions of ATLAS, numpy,
etc. included in the latest Ubuntu release (11.04 Natty Narwhal):
  https://bugs.launchpad.net/ubuntu/+source/atlas/+bug/778217

So I thought people might like a heads up in case they run into it as well.

-- Nathaniel


Re: [Numpy-discussion] loadtxt ndmin option

2011-05-05 Thread Ralf Gommers
On Fri, May 6, 2011 at 12:12 AM, Derek Homeier 
de...@astro.physik.uni-goettingen.de wrote:


 On 5 May 2011, at 22:53, Derek Homeier wrote:

 
  However, the problem that ndmin is supposed to address is not fixed
  by the current implementation for the rc.  Essentially, a single-
  row, multi-column file with ndmin=2 comes out as a Nx1 array which
  is the same result for a multi-row, single-column file.  My feeling
  is that if we let the current implementation stand as is, and
  developers use it in their code, then fixing it in a later release
  would introduce more problems (maybe the devels would transpose the
  result themselves or something).  Better to fix it now in rc with
  the two lines of code (and the correction to the tests), than to
  introduce a buggy feature that will be hard to fix in future
  releases, IMHO.
 
  Looks okay, and I agree that it's better to fix it now. The timing
  is a bit unfortunate though, just after RC2. I'll have a closer look
  tomorrow and if it can go in, probably tag RC3.
 
  If in the meantime a few more people could test this, that would be
  helpful.
 
  Ralf
 
  I agree, wish I had time to push this before rc2. I could add the
  explanatory comments
  mentioned above and switch to use the atleast_[12]d() solution, test
  that and push it
  in a couple of minutes, or should I better leave it as is now for
  testing?

 Quick follow-up: I just applied the above changes, added some tests to
 cover Ben's test cases and tested this with 1.6.0rc2 on OS X 10.5 i386+ppc
 + 10.6 x86_64 (Python 2.7+3.2). So I'd be ready to push it to my repo and
 do my (first) pull request...


Go ahead, I'll have a look at it tonight. Thanks for testing on several
Pythons, that definitely helps.

Ralf