Re: [Numpy-discussion] nan_to_num and bool arrays

2009-12-11 Thread Keith Goodman
On Fri, Dec 11, 2009 at 6:38 PM, Robert Kern  wrote:
> On Fri, Dec 11, 2009 at 18:38, Keith Goodman  wrote:
>
>> That seems to work. To avoid changing the input
>>
>>>> x = np.array(1)
>>>> x.shape
>>   ()
>>>> y = nan_to_num(x)
>>>> x.shape
>>   (1,)
>>
>> I moved y = x.copy() further up and switched x's to y's. Here's what
>> it looks like:
>>
>> def nan_to_num(x):
>>    is_scalar = False
>>    if not isinstance(x, _nx.ndarray):
>>       x = asarray(x)
>>       if x.shape == ():
>>           # Must return this as a scalar later.
>>           is_scalar = True
>>    y = x.copy()
>>    old_shape = y.shape
>>    if y.shape == ():
>>       # We need element access.
>>       y.shape = (1,)
>>    t = y.dtype.type
>>    if issubclass(t, _nx.complexfloating):
>>        return nan_to_num(y.real) + 1j * nan_to_num(y.imag)
>
> Almost! You need to handle the shape restoration in this branch, too.
>
> In [9]: nan_to_num(array(1+1j))
> Out[9]: array([ 1.+1.j])

Taking care of my imaginary bug has the nice side effect of leaving us
with only one return statement. I changed

return nan_to_num(y.real) + 1j * nan_to_num(y.imag)

to

y = nan_to_num(y.real) + 1j * nan_to_num(y.imag)

And changed the if on the next line to elif.

def nan_to_num(x):
    is_scalar = False
    if not isinstance(x, _nx.ndarray):
        x = asarray(x)
        if x.shape == ():
            # Must return this as a scalar later.
            is_scalar = True
    y = x.copy()
    old_shape = y.shape
    if y.shape == ():
        # We need element access.
        y.shape = (1,)
    t = y.dtype.type
    if issubclass(t, _nx.complexfloating):
        y = nan_to_num(y.real) + 1j * nan_to_num(y.imag)
    elif issubclass(t, _nx.inexact):
        are_inf = isposinf(y)
        are_neg_inf = isneginf(y)
        are_nan = isnan(y)
        maxf, minf = _getmaxmin(y.dtype.type)
        y[are_nan] = 0
        y[are_inf] = maxf
        y[are_neg_inf] = minf
    if is_scalar:
        y = y[0]
    else:
        y.shape = old_shape
    return y
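
A few checks against the version above (a hypothetical session; the exact
large float printed for inf depends on finfo for the dtype):

>> nan_to_num(np.array(1+1j))              # 0-d complex input keeps its shape
   array(1.+1.j)
>> nan_to_num(np.array([np.nan, np.inf]))
   array([  0.00000000e+000,   1.79769313e+308])
>> nan_to_num(np.array([True, False]))     # non-inexact dtypes pass through
   array([ True, False], dtype=bool)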


Re: [Numpy-discussion] Slicing slower than matrix multiplication?

2009-12-11 Thread Hoyt Koepke
> One thing to note is that dot uses the optimized ATLAS library if
> available, which makes it quite a bit faster than the equivalent
> operations done in pure numpy. I doubt that's the reason here, since
> the arrays are small, but it's something to keep in mind when
> performance matters: use dot wherever possible, as it is generally
> faster than prod/sum.

This is quite true; I once had a very large matrix (600 x 200,000)
that I needed to normalize. Using .sum() and /= took about 30
minutes. When I switched to using dot() to do the same operation
(matrix multiplication with a vector of 1's, then turning that into a
diagonal matrix and using dot() again to normalize it), it dropped the
computation time down to about 2 minutes. Most of the gain was
likely due to ATLAS using all the cores while numpy used only one, but
I was still impressed.
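
For context, a minimal sketch of that trick (sizes shrunk from the
600 x 200,000 in the anecdote so it runs anywhere):

import numpy as np

X = np.random.rand(600, 2000)   # stand-in for the 600 x 200,000 matrix

# Pure-numpy row normalization: reduce with .sum(), then divide in place.
row_sums = X.sum(axis=1)
X /= row_sums[:, np.newaxis]

# BLAS-backed variant: the same row sums fall out of a matrix-vector
# product with a vector of ones, which np.dot hands off to ATLAS/BLAS.
Y = np.random.rand(600, 2000)
Y /= np.dot(Y, np.ones(Y.shape[1]))[:, np.newaxis]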

--Hoyt


+ Hoyt Koepke
+ University of Washington Department of Statistics
+ http://www.stat.washington.edu/~hoytak/
+ hoy...@gmail.com
++


Re: [Numpy-discussion] Slicing slower than matrix multiplication?

2009-12-11 Thread David Cournapeau
On Fri, Dec 11, 2009 at 10:06 PM, Bruce Southey  wrote:

>
> Having said that, the more you can vectorize your function, the more
> efficient it will likely be, especially with ATLAS etc.

One thing to note is that dot uses the optimized ATLAS library if
available, which makes it quite a bit faster than the equivalent
operations done in pure numpy. I doubt that's the reason here, since
the arrays are small, but it's something to keep in mind when
performance matters: use dot wherever possible, as it is generally
faster than prod/sum.

cheers,

David


Re: [Numpy-discussion] non-standard standard deviation

2009-12-11 Thread Dr. Phillip M. Feldman



Anne Archibald wrote:
> 
> 2009/11/29 Dr. Phillip M. Feldman :
> 
>> All of the statistical packages that I am currently using and have used
>> in
>> the past (Matlab, Minitab, R, S-plus) calculate standard deviation using
>> the
>> sqrt(1/(n-1)) normalization, which gives a result that is unbiased when
>> sampling from a normally-distributed population.  NumPy uses the
>> sqrt(1/n)
>> normalization.  I'm currently using the following code to calculate
>> standard
>> deviations, but would much prefer if this could be fixed in NumPy itself:
> 
> This issue was the subject of lengthy discussions on the mailing list,
> the upshot of which is that in current versions of scipy, std and var
> take an optional argument "ddof", into which you can supply 1 to get
> the normalization you want.
> 
> Anne
> 

You are right that I can get the result that I want by setting ddof. 
Thanks!

I still feel that the default value for ddof should be 1 rather than 0; new
users are unlikely to read the documentation for a function like std, because
it is reasonable to expect consistent behavior across statistical
packages.
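
For the record, the difference in a single snippet:

>>> import numpy as np
>>> x = np.array([1.0, 2.0, 3.0, 4.0])
>>> np.std(x)            # sqrt(1/n) normalization (ddof=0, the default)
1.1180339887498949
>>> np.std(x, ddof=1)    # sqrt(1/(n-1)), matching Matlab/Minitab/R/S-plus
1.2909944487358056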

Phillip



Re: [Numpy-discussion] nan_to_num and bool arrays

2009-12-11 Thread Robert Kern
On Fri, Dec 11, 2009 at 18:38, Keith Goodman  wrote:

> That seems to work. To avoid changing the input
>
>>> x = np.array(1)
>>> x.shape
>   ()
>>> y = nan_to_num(x)
>>> x.shape
>   (1,)
>
> I moved y = x.copy() further up and switched x's to y's. Here's what
> it looks like:
>
> def nan_to_num(x):
>    is_scalar = False
>    if not isinstance(x, _nx.ndarray):
>       x = asarray(x)
>       if x.shape == ():
>           # Must return this as a scalar later.
>           is_scalar = True
>    y = x.copy()
>    old_shape = y.shape
>    if y.shape == ():
>       # We need element access.
>       y.shape = (1,)
>    t = y.dtype.type
>    if issubclass(t, _nx.complexfloating):
>        return nan_to_num(y.real) + 1j * nan_to_num(y.imag)

Almost! You need to handle the shape restoration in this branch, too.

In [9]: nan_to_num(array(1+1j))
Out[9]: array([ 1.+1.j])

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


Re: [Numpy-discussion] nan_to_num and bool arrays

2009-12-11 Thread Keith Goodman
On Fri, Dec 11, 2009 at 4:06 PM, Robert Kern  wrote:
> On Fri, Dec 11, 2009 at 17:44, Keith Goodman  wrote:
>> On Fri, Dec 11, 2009 at 2:22 PM, Robert Kern  wrote:
>>> On Fri, Dec 11, 2009 at 16:09, Keith Goodman  wrote:
>>>> On Fri, Dec 11, 2009 at 1:14 PM, Robert Kern  wrote:
>>>>> On Fri, Dec 11, 2009 at 14:41, Keith Goodman  wrote:
>>>>>> On Fri, Dec 11, 2009 at 12:08 PM, Bruce Southey  wrote:
>>>>>
>>>>>>> So I agree that it should leave the input untouched when a non-float
>>>>>>> dtype is used for some array-like input.
>>>>>>
>>>>>> Would only one line need to be changed? Would changing
>>>>>>
>>>>>> if not issubclass(t, _nx.integer):
>>>>>>
>>>>>> to
>>>>>>
>>>>>> if not issubclass(t, _nx.integer) and not issubclass(t, _nx.bool_):
>>>>>>
>>>>>> do the trick?
>>>>>
>>>>> That still leaves strings, voids, and objects. I recommend:
>>>>>
>>>>>  if issubclass(t, _nx.inexact):
>>>>>
>>>>> Arguably, one should handle nan float objects in object arrays and
>>>>> float columns in structured arrays, but the current code does not
>>>>> handle either of those anyways.
>>>>
>>>> Without your change both
>>>>
>>>> >> np.nan_to_num(np.array([True, False]))
>>>> >> np.nan_to_num([1])
>>>>
>>>> raise exceptions. With your change:
>>>>
>>>> >> np.nan_to_num(np.array([True, False]))
>>>>    array([ True, False], dtype=bool)
>>>> >> np.nan_to_num([1])
>>>>    array([1])
>>>
>>> I think this is correct, though the latter one happens by accident.
>>> Lists don't have a .dtype attribute so obj2sctype(type([1])) is
>>> checked and happens to be object_. The latter line is intended to
>>> handle scalars, not sequences. I think that sequences should be
>>> coerced to arrays for output and this check should be more explicit
>>> about what it handles. [1.0] will have a problem if you don't.
>>
>> That makes sense. But I'm not smart enough to implement it.
>
> Something like the following at the top should help distinguish the
> various cases:
>
> is_scalar = False
> if not isinstance(x, _nx.ndarray):
>    x = np.asarray(x)
>    if x.shape == ():
>        # Must return this as a scalar later.
>        is_scalar = True
> old_shape = x.shape
> if x.shape == ():
>    # We need element access.
>    x.shape = (1,)
> t = x.dtype.type
>
> This should allow one to pass in [np.inf] and have it correctly get
> interpreted as a float array rather than an object scalar.

That seems to work. To avoid changing the input

>> x = np.array(1)
>> x.shape
   ()
>> y = nan_to_num(x)
>> x.shape
   (1,)

I moved y = x.copy() further up and switched x's to y's. Here's what
it looks like:

def nan_to_num(x):
    is_scalar = False
    if not isinstance(x, _nx.ndarray):
        x = asarray(x)
        if x.shape == ():
            # Must return this as a scalar later.
            is_scalar = True
    y = x.copy()
    old_shape = y.shape
    if y.shape == ():
        # We need element access.
        y.shape = (1,)
    t = y.dtype.type
    if issubclass(t, _nx.complexfloating):
        return nan_to_num(y.real) + 1j * nan_to_num(y.imag)
    if issubclass(t, _nx.inexact):
        are_inf = isposinf(y)
        are_neg_inf = isneginf(y)
        are_nan = isnan(y)
        maxf, minf = _getmaxmin(y.dtype.type)
        y[are_nan] = 0
        y[are_inf] = maxf
        y[are_neg_inf] = minf
    if is_scalar:
        y = y[0]
    else:
        y.shape = old_shape
    return y


Re: [Numpy-discussion] nan_to_num and bool arrays

2009-12-11 Thread Robert Kern
On Fri, Dec 11, 2009 at 18:03, Keith Goodman  wrote:

> Ack! The "if issubclass(t, _nx.inexact)" fix doesn't work. It solves
> the bool problem but it introduces its own problem since numpy.object_
> is not a subclass of inexact:
>
>>> nan_to_num([np.inf])
>   array([ Inf])

Right. This is the problem I was referring to: "I think that sequences should be
coerced to arrays for output and this check should be more explicit
about what it handles. [1.0] will have a problem if you don't."

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


Re: [Numpy-discussion] nan_to_num and bool arrays

2009-12-11 Thread Robert Kern
On Fri, Dec 11, 2009 at 17:44, Keith Goodman  wrote:
> On Fri, Dec 11, 2009 at 2:22 PM, Robert Kern  wrote:
>> On Fri, Dec 11, 2009 at 16:09, Keith Goodman  wrote:
>>> On Fri, Dec 11, 2009 at 1:14 PM, Robert Kern  wrote:
>>>> On Fri, Dec 11, 2009 at 14:41, Keith Goodman  wrote:
>>>>> On Fri, Dec 11, 2009 at 12:08 PM, Bruce Southey  wrote:
>>>>
>>>>>> So I agree that it should leave the input untouched when a non-float
>>>>>> dtype is used for some array-like input.
>>>>>
>>>>> Would only one line need to be changed? Would changing
>>>>>
>>>>> if not issubclass(t, _nx.integer):
>>>>>
>>>>> to
>>>>>
>>>>> if not issubclass(t, _nx.integer) and not issubclass(t, _nx.bool_):
>>>>>
>>>>> do the trick?
>>>>
>>>> That still leaves strings, voids, and objects. I recommend:
>>>>
>>>>  if issubclass(t, _nx.inexact):
>>>>
>>>> Arguably, one should handle nan float objects in object arrays and
>>>> float columns in structured arrays, but the current code does not
>>>> handle either of those anyways.
>>>
>>> Without your change both
>>>
> np.nan_to_num(np.array([True, False]))
> np.nan_to_num([1])
>>>
>>> raise exceptions. With your change:
>>>
> np.nan_to_num(np.array([True, False]))
>>>   array([ True, False], dtype=bool)
> np.nan_to_num([1])
>>>   array([1])
>>
>> I think this is correct, though the latter one happens by accident.
>> Lists don't have a .dtype attribute so obj2sctype(type([1])) is
>> checked and happens to be object_. The latter line is intended to
>> handle scalars, not sequences. I think that sequences should be
>> coerced to arrays for output and this check should be more explicit
>> about what it handles. [1.0] will have a problem if you don't.
>
> That makes sense. But I'm not smart enough to implement it.

Something like the following at the top should help distinguish the
various cases:

is_scalar = False
if not isinstance(x, _nx.ndarray):
    x = np.asarray(x)
    if x.shape == ():
        # Must return this as a scalar later.
        is_scalar = True
old_shape = x.shape
if x.shape == ():
    # We need element access.
    x.shape = (1,)
t = x.dtype.type

This should allow one to pass in [np.inf] and have it correctly get
interpreted as a float array rather than an object scalar.
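
Concretely, the early coercion is what fixes the dtype check (a quick
session):

>>> np.asarray([np.inf]).dtype        # the list becomes a float array, so
dtype('float64')
>>> np.asarray([np.inf]).dtype.type   # t is float64, a subclass of inexact
<type 'numpy.float64'>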

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


Re: [Numpy-discussion] nan_to_num and bool arrays

2009-12-11 Thread Keith Goodman
On Fri, Dec 11, 2009 at 3:44 PM, Keith Goodman  wrote:
> On Fri, Dec 11, 2009 at 2:22 PM, Robert Kern  wrote:
>> On Fri, Dec 11, 2009 at 16:09, Keith Goodman  wrote:
>>> On Fri, Dec 11, 2009 at 1:14 PM, Robert Kern  wrote:
>>>> On Fri, Dec 11, 2009 at 14:41, Keith Goodman  wrote:
>>>>> On Fri, Dec 11, 2009 at 12:08 PM, Bruce Southey  wrote:
>>>>
>>>>>> So I agree that it should leave the input untouched when a non-float
>>>>>> dtype is used for some array-like input.
>>>>>
>>>>> Would only one line need to be changed? Would changing
>>>>>
>>>>> if not issubclass(t, _nx.integer):
>>>>>
>>>>> to
>>>>>
>>>>> if not issubclass(t, _nx.integer) and not issubclass(t, _nx.bool_):
>>>>>
>>>>> do the trick?
>>>>
>>>> That still leaves strings, voids, and objects. I recommend:
>>>>
>>>>  if issubclass(t, _nx.inexact):
>>>>
>>>> Arguably, one should handle nan float objects in object arrays and
>>>> float columns in structured arrays, but the current code does not
>>>> handle either of those anyways.
>>>
>>> Without your change both
>>>
> np.nan_to_num(np.array([True, False]))
> np.nan_to_num([1])
>>>
>>> raise exceptions. With your change:
>>>
> np.nan_to_num(np.array([True, False]))
>>>   array([ True, False], dtype=bool)
> np.nan_to_num([1])
>>>   array([1])
>>
>> I think this is correct, though the latter one happens by accident.
>> Lists don't have a .dtype attribute so obj2sctype(type([1])) is
>> checked and happens to be object_. The latter line is intended to
>> handle scalars, not sequences. I think that sequences should be
>> coerced to arrays for output and this check should be more explicit
>> about what it handles. [1.0] will have a problem if you don't.
>
> That makes sense. But I'm not smart enough to implement it.
>
>>> On a separate note, this seems a little awkward:
>>>
> np.nan_to_num(1.0)
>>>   1.0
> np.nan_to_num(1)
>>>   array(1)
> x = np.ones(1, dtype=np.int)
> np.nan_to_num(x[0])
>>>   1
>>
>> Worth fixing.
>
> Would this work?
>
> def nan_to_num(x):
>    try:
>        t = x.dtype.type
>    except AttributeError:
>        t = obj2sctype(type(x))
>    if issubclass(t, _nx.complexfloating):
>        return nan_to_num(x.real) + 1j * nan_to_num(x.imag)
>    else:
>        try:
>            y = x.copy()
>        except AttributeError:
>            y = array(x)
>    if not y.shape:
>        y = array([x])
>        scalar = True
>    else:
>        scalar = False
>    if issubclass(t, _nx.inexact):
>        are_inf = isposinf(y)
>        are_neg_inf = isneginf(y)
>        are_nan = isnan(y)
>        maxf, minf = _getmaxmin(y.dtype.type)
>        y[are_nan] = 0
>        y[are_inf] = maxf
>        y[are_neg_inf] = minf
>    if scalar:
>        y = y[0]
>    return y
>
> Instead of
>
>>> nan_to_num(1.0)
>   1.0
>>> nan_to_num(1)
>   array(1)
>>> nan_to_num(np.array(1.0))
>   1.0
>>> nan_to_num(np.array(1))
>   array(1)
>
> it gives
>
>>> nan_to_num(1.0)
>   1.0
>>> nan_to_num(1)
>   1
>>> nan_to_num(np.array(1.0))
>   1.0
>>> nan_to_num(np.array(1))
>   1
>
> I guess a lot of unit tests need to be written before nan_to_num can
> be fixed. But for now, your bool fix is an improvement.

Ack! The "if issubclass(t, _nx.inexact)" fix doesn't work. It solves
the bool problem but it introduces its own problem since numpy.object_
is not a subclass of inexact:

>> nan_to_num([np.inf])
   array([ Inf])
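
For reference, the failure path in the current code (a quick session;
obj2sctype is the same fallback nan_to_num uses when the input has no
.dtype attribute):

>> t = np.obj2sctype(type([np.inf]))
>> t
<type 'numpy.object_'>
>> issubclass(t, np.inexact)   # object_ is not inexact, so the inf is never replaced
False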

Yeah, way too many special cases here to do this without full unit
test coverage.


Re: [Numpy-discussion] nan_to_num and bool arrays

2009-12-11 Thread Keith Goodman
On Fri, Dec 11, 2009 at 2:22 PM, Robert Kern  wrote:
> On Fri, Dec 11, 2009 at 16:09, Keith Goodman  wrote:
>> On Fri, Dec 11, 2009 at 1:14 PM, Robert Kern  wrote:
>>> On Fri, Dec 11, 2009 at 14:41, Keith Goodman  wrote:
>>>> On Fri, Dec 11, 2009 at 12:08 PM, Bruce Southey  wrote:
>>>
>>>>> So I agree that it should leave the input untouched when a non-float
>>>>> dtype is used for some array-like input.
>>>>
>>>> Would only one line need to be changed? Would changing
>>>>
>>>> if not issubclass(t, _nx.integer):
>>>>
>>>> to
>>>>
>>>> if not issubclass(t, _nx.integer) and not issubclass(t, _nx.bool_):
>>>>
>>>> do the trick?
>>>
>>> That still leaves strings, voids, and objects. I recommend:
>>>
>>>  if issubclass(t, _nx.inexact):
>>>
>>> Arguably, one should handle nan float objects in object arrays and
>>> float columns in structured arrays, but the current code does not
>>> handle either of those anyways.
>>
>> Without your change both
>>
 np.nan_to_num(np.array([True, False]))
 np.nan_to_num([1])
>>
>> raise exceptions. With your change:
>>
 np.nan_to_num(np.array([True, False]))
>>   array([ True, False], dtype=bool)
 np.nan_to_num([1])
>>   array([1])
>
> I think this is correct, though the latter one happens by accident.
> Lists don't have a .dtype attribute so obj2sctype(type([1])) is
> checked and happens to be object_. The latter line is intended to
> handle scalars, not sequences. I think that sequences should be
> coerced to arrays for output and this check should be more explicit
> about what it handles. [1.0] will have a problem if you don't.

That makes sense. But I'm not smart enough to implement it.

>> On a separate note, this seems a little awkward:
>>
 np.nan_to_num(1.0)
>>   1.0
 np.nan_to_num(1)
>>   array(1)
 x = np.ones(1, dtype=np.int)
 np.nan_to_num(x[0])
>>   1
>
> Worth fixing.

Would this work?

def nan_to_num(x):
    try:
        t = x.dtype.type
    except AttributeError:
        t = obj2sctype(type(x))
    if issubclass(t, _nx.complexfloating):
        return nan_to_num(x.real) + 1j * nan_to_num(x.imag)
    else:
        try:
            y = x.copy()
        except AttributeError:
            y = array(x)
    if not y.shape:
        y = array([x])
        scalar = True
    else:
        scalar = False
    if issubclass(t, _nx.inexact):
        are_inf = isposinf(y)
        are_neg_inf = isneginf(y)
        are_nan = isnan(y)
        maxf, minf = _getmaxmin(y.dtype.type)
        y[are_nan] = 0
        y[are_inf] = maxf
        y[are_neg_inf] = minf
    if scalar:
        y = y[0]
    return y

Instead of

>> nan_to_num(1.0)
   1.0
>> nan_to_num(1)
   array(1)
>> nan_to_num(np.array(1.0))
   1.0
>> nan_to_num(np.array(1))
   array(1)

it gives

>> nan_to_num(1.0)
   1.0
>> nan_to_num(1)
   1
>> nan_to_num(np.array(1.0))
   1.0
>> nan_to_num(np.array(1))
   1

I guess a lot of unit tests need to be written before nan_to_num can
be fixed. But for now, your bool fix is an improvement.


Re: [Numpy-discussion] nan_to_num and bool arrays

2009-12-11 Thread Robert Kern
On Fri, Dec 11, 2009 at 16:09, Keith Goodman  wrote:
> On Fri, Dec 11, 2009 at 1:14 PM, Robert Kern  wrote:
>> On Fri, Dec 11, 2009 at 14:41, Keith Goodman  wrote:
>>> On Fri, Dec 11, 2009 at 12:08 PM, Bruce Southey  wrote:
>>
>>>> So I agree that it should leave the input untouched when a non-float
>>>> dtype is used for some array-like input.
>>>
>>> Would only one line need to be changed? Would changing
>>>
>>> if not issubclass(t, _nx.integer):
>>>
>>> to
>>>
>>> if not issubclass(t, _nx.integer) and not issubclass(t, _nx.bool_):
>>>
>>> do the trick?
>>
>> That still leaves strings, voids, and objects. I recommend:
>>
>>  if issubclass(t, _nx.inexact):
>>
>> Arguably, one should handle nan float objects in object arrays and
>> float columns in structured arrays, but the current code does not
>> handle either of those anyways.
>
> Without your change both
>
>>> np.nan_to_num(np.array([True, False]))
>>> np.nan_to_num([1])
>
> raise exceptions. With your change:
>
>>> np.nan_to_num(np.array([True, False]))
>   array([ True, False], dtype=bool)
>>> np.nan_to_num([1])
>   array([1])

I think this is correct, though the latter one happens by accident.
Lists don't have a .dtype attribute so obj2sctype(type([1])) is
checked and happens to be object_. The latter line is intended to
handle scalars, not sequences. I think that sequences should be
coerced to arrays for output and this check should be more explicit
about what it handles. [1.0] will have a problem if you don't.

> On a separate note, this seems a little awkward:
>
>>> np.nan_to_num(1.0)
>   1.0
>>> np.nan_to_num(1)
>   array(1)
>>> x = np.ones(1, dtype=np.int)
>>> np.nan_to_num(x[0])
>   1

Worth fixing.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


Re: [Numpy-discussion] nan_to_num and bool arrays

2009-12-11 Thread Keith Goodman
On Fri, Dec 11, 2009 at 1:14 PM, Robert Kern  wrote:
> On Fri, Dec 11, 2009 at 14:41, Keith Goodman  wrote:
>> On Fri, Dec 11, 2009 at 12:08 PM, Bruce Southey  wrote:
>
>>> So I agree that it should leave the input untouched when a non-float
>>> dtype is used for some array-like input.
>>
>> Would only one line need to be changed? Would changing
>>
>> if not issubclass(t, _nx.integer):
>>
>> to
>>
>> if not issubclass(t, _nx.integer) and not issubclass(t, _nx.bool_):
>>
>> do the trick?
>
> That still leaves strings, voids, and objects. I recommend:
>
>  if issubclass(t, _nx.inexact):
>
> Arguably, one should handle nan float objects in object arrays and
> float columns in structured arrays, but the current code does not
> handle either of those anyways.

Without your change both

>> np.nan_to_num(np.array([True, False]))
>> np.nan_to_num([1])

raise exceptions. With your change:

>> np.nan_to_num(np.array([True, False]))
   array([ True, False], dtype=bool)
>> np.nan_to_num([1])
   array([1])

On a separate note, this seems a little awkward:

>> np.nan_to_num(1.0)
   1.0
>> np.nan_to_num(1)
   array(1)
>> x = np.ones(1, dtype=np.int)
>> np.nan_to_num(x[0])
   1


Re: [Numpy-discussion] nan_to_num and bool arrays

2009-12-11 Thread Robert Kern
On Fri, Dec 11, 2009 at 14:41, Keith Goodman  wrote:
> On Fri, Dec 11, 2009 at 12:08 PM, Bruce Southey  wrote:

>> So I agree that it should leave the input untouched when a non-float
>> dtype is used for some array-like input.
>
> Would only one line need to be changed? Would changing
>
> if not issubclass(t, _nx.integer):
>
> to
>
> if not issubclass(t, _nx.integer) and not issubclass(t, _nx.bool_):
>
> do the trick?

That still leaves strings, voids, and objects. I recommend:

  if issubclass(t, _nx.inexact):

Arguably, one should handle nan float objects in object arrays and
float columns in structured arrays, but the current code does not
handle either of those anyways.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


Re: [Numpy-discussion] nan_to_num and bool arrays

2009-12-11 Thread Keith Goodman
On Fri, Dec 11, 2009 at 12:08 PM, Bruce Southey  wrote:
> On 12/11/2009 01:33 PM, Robert Kern wrote:
>> On Fri, Dec 11, 2009 at 13:11, Bruce Southey  wrote:
>>
>>
>>> As documented, nan_to_num returns a float so it does not return the
>>> input unchanged.
>>>
> Sorry for my mistake:
> Given an int input, np.nan_to_num returns an int dtype
>  >>> np.nan_to_num(np.zeros((3,3), dtype=np.int)).dtype
> dtype('int64')
>
>> I think that is describing the current behavior rather than
>> documenting the intent of the function. Given the high level purpose
>> of the function, to "[r]eplace nan with zero and inf with finite
>> numbers," I think it is fairly reasonable to implement it as a no-op
>> for integers and related dtypes. There are no nans or infs for those
>> dtypes so the input can be passed back unchanged.
>>
>
> So I agree that it should leave the input untouched when a non-float
> dtype is used for some array-like input.

Would only one line need to be changed? Would changing

if not issubclass(t, _nx.integer):

to

if not issubclass(t, _nx.integer) and not issubclass(t, _nx.bool_):

do the trick?

Here's nan_to_num for reference:

def nan_to_num(x):
    try:
        t = x.dtype.type
    except AttributeError:
        t = obj2sctype(type(x))
    if issubclass(t, _nx.complexfloating):
        return nan_to_num(x.real) + 1j * nan_to_num(x.imag)
    else:
        try:
            y = x.copy()
        except AttributeError:
            y = array(x)
    if not issubclass(t, _nx.integer):
        if not y.shape:
            y = array([x])
            scalar = True
        else:
            scalar = False
        are_inf = isposinf(y)
        are_neg_inf = isneginf(y)
        are_nan = isnan(y)
        maxf, minf = _getmaxmin(y.dtype.type)
        y[are_nan] = 0
        y[are_inf] = maxf
        y[are_neg_inf] = minf
        if scalar:
            y = y[0]
    return y


Re: [Numpy-discussion] nan_to_num and bool arrays

2009-12-11 Thread Bruce Southey
On 12/11/2009 01:33 PM, Robert Kern wrote:
> On Fri, Dec 11, 2009 at 13:11, Bruce Southey  wrote:
>
>
>> As documented, nan_to_num returns a float so it does not return the
>> input unchanged.
>>  
Sorry for my mistake:
Given an int input, np.nan_to_num returns an int dtype
 >>> np.nan_to_num(np.zeros((3,3), dtype=np.int)).dtype
dtype('int64')

> I think that is describing the current behavior rather than
> documenting the intent of the function. Given the high level purpose
> of the function, to "[r]eplace nan with zero and inf with finite
> numbers," I think it is fairly reasonable to implement it as a no-op
> for integers and related dtypes. There are no nans or infs for those
> dtypes so the input can be passed back unchanged.
>

So I agree that it should leave the input untouched when a non-float 
dtype is used for some array-like input.

Bruce




Re: [Numpy-discussion] nan_to_num and bool arrays

2009-12-11 Thread Robert Kern
On Fri, Dec 11, 2009 at 13:11, Bruce Southey  wrote:

> As documented, nan_to_num returns a float so it does not return the
> input unchanged.

I think that is describing the current behavior rather than
documenting the intent of the function. Given the high level purpose
of the function, to "[r]eplace nan with zero and inf with finite
numbers," I think it is fairly reasonable to implement it as a no-op
for integers and related dtypes. There are no nans or infs for those
dtypes so the input can be passed back unchanged.
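
(A minimal illustration of the intended no-op, matching the session Bruce
posted elsewhere in the thread:

>>> np.nan_to_num(np.zeros((3,3), dtype=np.int)).dtype
dtype('int64')

i.e. there is nothing to replace, so the dtype, and ideally the array
itself, should come back untouched.)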

> That is, the output of np.nan_to_num(np.zeros((3,3))) is a float array,
> not an int array. This is also why np.finfo() fails: it is not
> given a float (that is, it also gives the same output if the argument to
> np.finfo() is an int rather than a boolean type).
>
> I am curious why you expect this conversion to work given how Python
> defines boolean types
> (http://docs.python.org/library/stdtypes.html#boolean-values).
>
> It is ambiguous to convert from boolean to float since anything that is
> not zero is 'True', and NaN is not zero:
>  >>> bool(np.PINF)
> True
>  >>> bool(np.NINF)
> True
>  >>> bool(np.NaN)
> True
>  >>> bool(np.PZERO)
> False
>  >>> bool(np.NZERO)
> False

No, that's the other way around, converting floats to bools.
Converting bools to floats is trivial: True->1.0, False->0.0.
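
The two directions side by side (a quick session):

>>> np.array([True, False]).astype(float)   # bool -> float: well defined
array([ 1.,  0.])
>>> bool(np.nan), bool(np.inf)              # float -> bool: the lossy direction
(True, True)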

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth."
  -- Umberto Eco


Re: [Numpy-discussion] nan_to_num and bool arrays

2009-12-11 Thread Bruce Southey
On 12/11/2009 10:21 AM, Keith Goodman wrote:
> On Fri, Dec 11, 2009 at 12:50 AM, Nicolas Rougier
>   wrote:
>
>> Hello,
>>
>> Using both numpy 1.3.0 and 1.4.0rc1 I got the following exception using
>> nan_to_num on a bool array, is that the expected behavior ?
>>
>>
>>  
> >>>> import numpy
> >>>> Z = numpy.zeros((3,3),dtype=bool)
> >>>> numpy.nan_to_num(Z)
>
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "/usr/lib/python2.6/dist-packages/numpy/lib/type_check.py", line
>> 374, in nan_to_num
>> maxf, minf = _getmaxmin(y.dtype.type)
>>   File "/usr/lib/python2.6/dist-packages/numpy/lib/type_check.py", line
>> 307, in _getmaxmin
>> f = getlimits.finfo(t)
>>   File "/usr/lib/python2.6/dist-packages/numpy/core/getlimits.py", line
>> 103, in __new__
>> raise ValueError, "data type %r not inexact" % (dtype)
>> ValueError: data type <type 'numpy.bool_'> not inexact
>>  
> I guess a check for bool could be added at the top of nan_to_num. If
> the input x is a bool then nan_to_num would just return x unchanged.
> Or perhaps
>
> maxf, minf = _getmaxmin(y.dtype.type)
>
> could return False, True.
>
> Best bet is probably to file a ticket. And then pray.

As documented, nan_to_num returns a float so it does not return the 
input unchanged.
That is, the output of np.nan_to_num(np.zeros((3,3))) is a float array,
not an int array. This is also why np.finfo() fails: it is not
given a float (that is, it also gives the same output if the argument to
np.finfo() is an int rather than a boolean type).

I am curious why you expect this conversion to work given how Python
defines boolean types 
(http://docs.python.org/library/stdtypes.html#boolean-values).

It is ambiguous to convert from boolean to float since anything that is
not zero is 'True', and NaN is not zero:
 >>> bool(np.PINF)
True
 >>> bool(np.NINF)
True
 >>> bool(np.NaN)
True
 >>> bool(np.PZERO)
False
 >>> bool(np.NZERO)
False

So what behavior do you expect to see?

Bruce


[Numpy-discussion] December Webinar: SciPy India with Travis Oliphant

2009-12-11 Thread Amenity Applewhite
Next Friday Enthought will be hosting our monthly Scientific Computing  
with Python Webinar:

Summary of SciPy India
Friday December 18
1pm CST/ 7pm UTC
Register at GoToMeeting
Enthought President Travis Oliphant is currently in Kerala, India as  
the keynote speaker at SciPy India 2009. Due to a training engagement,  
Travis missed SciPy for the first time this summer, so he’s excited  
for this additional opportunity to meet and collaborate with the  
scientific Python community. Speakers at the event include Jarrod  
Millman, David Cournapeau, Christopher Burns, Prabhu Ramachandran, and  
Asokan Pichai — a great group. We’re looking forward to hearing  
Travis’ review of the proceedings.


Hope to see you there!
Enthought Media


Re: [Numpy-discussion] Slicing slower than matrix multiplication?

2009-12-11 Thread Francesc Alted
On Friday 11 December 2009 17:36:54 Bruce Southey wrote:
> On 12/11/2009 10:03 AM, Francesc Alted wrote:
> > On Friday 11 December 2009 16:44:29 Dag Sverre Seljebotn wrote:
> >> Jasper van de Gronde wrote:
> >>> Dag Sverre Seljebotn wrote:
> >>>> Jasper van de Gronde wrote:
> >>>>> I've attached a test file which shows the problem. It also tries
> >>>>> adding columns instead of rows (in case the memory layout is playing
> >>>>> tricks), but this seems to make no difference. This is the output I
> >>>>> got:
> >>>>>
> >>>>>  Dot product: 5.188786
> >>>>>  Add a row: 8.032767
> >>>>>  Add a column: 8.070953
> >>>>>
> >>>>> Any ideas on why adding a row (or column) of a matrix is slower than
> >>>>> computing a matrix product with a similarly sized matrix... (Xi has
> >>>>> fewer columns than Xi2, but just as many rows.)
> >>>>
> >>>> I think we need some numbers to put this into context -- how big are
> >>>> the vectors/matrices? How many iterations was the loop run? If the
> >>>> vectors are small and the loop is run many times, how fast the
> >>>> operation "ought" to be is irrelevant as it would drown in Python
> >>>> overhead.
> >>>
> >>> Originally I had attached a Python file demonstrating the problem, but
> >>> apparently this wasn't accepted by the list. In any case, the matrices
> >>> and vectors weren't too big (60x20), so I tried making them bigger and
> >>> indeed the "fast" version was now considerably faster.
> >>
> >> 60x20 is "nothing", so a full matrix multiplication or a single
> >> matrix-vector probably takes the same time (that is, the difference
> >> between them in itself is likely smaller than the error you make during
> >> measuring).
> >>
> >> In this context, the benchmarks will be completely dominated by the
> >> number of Python calls you make (each, especially taking the slice,
> >> means allocating Python objects, calling a bunch of functions in C, etc.
> >> etc). So it's not that strange, taking a slice isn't free, some Python
> >> objects must be created etc. etc.
> >
> > Yeah, I think taking slices here is taking quite a lot of time:
> >
> > In [58]: timeit E + Xi2[P/2,:]
> > 10 loops, best of 3: 3.95 µs per loop
> >
> > In [59]: timeit E + Xi2[P/2]
> > 10 loops, best of 3: 2.17 µs per loop
> >
> > I don't know why the additional ',:' in the slice is taking so much time,
> > but my guess is that passing & analyzing the second argument
> > (slice(None,None,None)) could be responsible for the slowdown (though
> > that seems like too much time for that). Mmh, perhaps it would be worth
> > studying this more carefully so that an optimization could be done in NumPy.
> >
> >> I think the lesson mostly should be that with so little data,
> >> benchmarking becomes a very difficult art.
> >
> > Well, I think it is not difficult, it is just that you are perhaps
> > benchmarking Python/NumPy machinery instead ;-)  I'm curious whether
> > Matlab can do slicing much faster than NumPy.  Jasper?
> 
> What are you actually trying to test here?

Good question.  I don't know for sure :-)

> I do not see any equivalence in the operations or output here.
> -With your slices you need two dot products but ultimately you are only
> using one for your dot product.
> -There are addition operations on the slices that are not present in the
> dot product.
> -The final E arrays are not the same for all three operations.

I don't understand the ultimate goal of the OP either, but what caught my 
attention was that:

In [74]: timeit Xi2[P/2]
100 loops, best of 3: 278 ns per loop

In [75]: timeit Xi2[P/2,:]
100 loops, best of 3: 1.04 µs per loop

i.e. adding an additional parameter (the ':') to the slice makes it run
almost 4x slower.  And this *partially* explains the problem
exposed by the OP, i.e.:

In [77]: timeit np.dot(Xi,w)
10 loops, best of 3: 2.91 µs per loop

In [78]: timeit E + Xi2[P/2]
10 loops, best of 3: 2.05 µs per loop

In [79]: timeit E + Xi2[P/2,:]
10 loops, best of 3: 3.81 µs per loop

But again, don't ask me whether the results are okay or not.  I'm playing here 
the role of a pure computational scientist on a very concrete problem ;-)

> Having said that, the more you can vectorize your function, the more
> efficient it will likely be, especially with ATLAS etc.

Except if your arrays are small enough, which is the underlying issue here 
IMO.

-- 
Francesc Alted


Re: [Numpy-discussion] nan_to_num and bool arrays

2009-12-11 Thread Nicolas Rougier

I've created a ticket (#1327).

Nicolas

On Dec 11, 2009, at 17:21 , Keith Goodman wrote:

> On Fri, Dec 11, 2009 at 12:50 AM, Nicolas Rougier
>  wrote:
>> 
>> Hello,
>> 
>> Using both numpy 1.3.0 and 1.4.0rc1 I got the following exception using
>> nan_to_num on a bool array, is that the expected behavior ?
>> 
>> 
> >>>> import numpy
> >>>> Z = numpy.zeros((3,3),dtype=bool)
> >>>> numpy.nan_to_num(Z)
>> Traceback (most recent call last):
>>  File "<stdin>", line 1, in <module>
>>  File "/usr/lib/python2.6/dist-packages/numpy/lib/type_check.py", line
>> 374, in nan_to_num
>>maxf, minf = _getmaxmin(y.dtype.type)
>>  File "/usr/lib/python2.6/dist-packages/numpy/lib/type_check.py", line
>> 307, in _getmaxmin
>>f = getlimits.finfo(t)
>>  File "/usr/lib/python2.6/dist-packages/numpy/core/getlimits.py", line
>> 103, in __new__
>>raise ValueError, "data type %r not inexact" % (dtype)
>> ValueError: data type <type 'numpy.bool_'> not inexact
> 
> I guess a check for bool could be added at the top of nan_to_num. If
> the input x is a bool then nan_to_num would just return x unchanged.
> Or perhaps
> 
> maxf, minf = _getmaxmin(y.dtype.type)
> 
> could return False, True.
> 
> Best bet is probably to file a ticket. And then pray.



Re: [Numpy-discussion] Slicing slower than matrix multiplication?

2009-12-11 Thread Bruce Southey
On 12/11/2009 10:03 AM, Francesc Alted wrote:
> On Friday 11 December 2009 16:44:29 Dag Sverre Seljebotn wrote:
>
>> Jasper van de Gronde wrote:
>>  
>>> Dag Sverre Seljebotn wrote:
>>>
>>>> Jasper van de Gronde wrote:
>>>>
>>>>> I've attached a test file which shows the problem. It also tries adding
>>>>> columns instead of rows (in case the memory layout is playing tricks),
>>>>> but this seems to make no difference. This is the output I got:
>>>>>
>>>>>  Dot product: 5.188786
>>>>>  Add a row: 8.032767
>>>>>  Add a column: 8.070953
>>>>>
>>>>> Any ideas on why adding a row (or column) of a matrix is slower than
>>>>> computing a matrix product with a similarly sized matrix... (Xi has
>>>>> fewer columns than Xi2, but just as many rows.)
>>>>
>>>> I think we need some numbers to put this into context -- how big are the
>>>> vectors/matrices? How many iterations was the loop run? If the vectors
>>>> are small and the loop is run many times, how fast the operation "ought"
>>>> to be is irrelevant as it would drown in Python overhead.
  
>>> Originally I had attached a Python file demonstrating the problem, but
>>> apparently this wasn't accepted by the list. In any case, the matrices
>>> and vectors weren't too big (60x20), so I tried making them bigger and
>>> indeed the "fast" version was now considerably faster.
>>>
>> 60x20 is "nothing", so a full matrix multiplication or a single
>> matrix-vector probably takes the same time (that is, the difference
>> between them in itself is likely smaller than the error you make during
>> measuring).
>>
>> In this context, the benchmarks will be completely dominated by the
>> number of Python calls you make (each, especially taking the slice,
>> means allocating Python objects, calling a bunch of functions in C, etc.
>> etc). So it's not that strange, taking a slice isn't free, some Python
>> objects must be created etc. etc.
>>  
> Yeah, I think taking slices here is taking quite a lot of time:
>
> In [58]: timeit E + Xi2[P/2,:]
> 10 loops, best of 3: 3.95 µs per loop
>
> In [59]: timeit E + Xi2[P/2]
> 10 loops, best of 3: 2.17 µs per loop
>
> I don't know why the additional ',:' in the slice is taking so much time, but my
> guess is that passing & analyzing the second argument (slice(None,None,None))
> could be responsible for the slowdown (though that seems like too much time
> for that). Mmh, perhaps it would be worth studying this more carefully so that
> an optimization could be done in NumPy.
>
>
>> I think the lesson mostly should be that with so little data,
>> benchmarking becomes a very difficult art.
>>  
> Well, I think it is not difficult, it is just that you are perhaps
> benchmarking Python/NumPy machinery instead ;-)  I'm curious whether Matlab
> can do slicing much faster than NumPy.  Jasper?
>
>
What are you actually trying to test here?
I do not see any equivalence in the operations or output here.
-With your slices you need two dot products but ultimately you are only 
using one for your dot product.
-There are addition operations on the slices that are not present in the 
dot product.
-The final E arrays are not the same for all three operations.

Having said that, the more you can vectorize your function, the more
efficient it will likely be, especially with ATLAS etc.

Bruce


Re: [Numpy-discussion] nan_to_num and bool arrays

2009-12-11 Thread Keith Goodman
On Fri, Dec 11, 2009 at 12:50 AM, Nicolas Rougier
 wrote:
>
> Hello,
>
> Using both numpy 1.3.0 and 1.4.0rc1 I got the following exception using
> nan_to_num on a bool array, is that the expected behavior ?
>
>
>>>> import numpy
>>>> Z = numpy.zeros((3,3),dtype=bool)
>>>> numpy.nan_to_num(Z)
> Traceback (most recent call last):
>  File "<stdin>", line 1, in <module>
>  File "/usr/lib/python2.6/dist-packages/numpy/lib/type_check.py", line
> 374, in nan_to_num
>    maxf, minf = _getmaxmin(y.dtype.type)
>  File "/usr/lib/python2.6/dist-packages/numpy/lib/type_check.py", line
> 307, in _getmaxmin
>    f = getlimits.finfo(t)
>  File "/usr/lib/python2.6/dist-packages/numpy/core/getlimits.py", line
> 103, in __new__
>    raise ValueError, "data type %r not inexact" % (dtype)
> ValueError: data type <type 'numpy.bool_'> not inexact

I guess a check for bool could be added at the top of nan_to_num. If
the input x is a bool then nan_to_num would just return x unchanged.
Or perhaps

maxf, minf = _getmaxmin(y.dtype.type)

could return False, True.

Best bet is probably to file a ticket. And then pray.


Re: [Numpy-discussion] [Nipy-devel] Impossibility to build nipy on recent numpy?

2009-12-11 Thread Gael Varoquaux
On Thu, Dec 10, 2009 at 05:19:24PM +0100, Gael Varoquaux wrote:
> On Thu, Dec 10, 2009 at 10:17:43AM -0600, Robert Kern wrote:
> > > OK, so we need to bug report to ubuntu. Anybody feels like doing it, or
> > > do I need to go ahead :).

> > It's your problem. :-)

> That's kinda what I thought. I was just try to dump work on someone else
> :). I'll do it.

OK, done:

https://bugs.launchpad.net/ubuntu/+source/python-numpy/+bug/495537



Re: [Numpy-discussion] Slicing slower than matrix multiplication?

2009-12-11 Thread Francesc Alted
On Friday 11 December 2009 16:44:29 Dag Sverre Seljebotn wrote:
> Jasper van de Gronde wrote:
> > Dag Sverre Seljebotn wrote:
> >> Jasper van de Gronde wrote:
> >>> I've attached a test file which shows the problem. It also tries adding
> >>> columns instead of rows (in case the memory layout is playing tricks),
> >>> but this seems to make no difference. This is the output I got:
> >>>
> >>> Dot product: 5.188786
> >>> Add a row: 8.032767
> >>> Add a column: 8.070953
> >>>
> >>> Any ideas on why adding a row (or column) of a matrix is slower than
> >>> computing a matrix product with a similarly sized matrix... (Xi has
> >>> less columns than Xi2, but just as many rows.)
> >>
> >> I think we need some numbers to put this into context -- how big are the
> >> vectors/matrices? How many iterations was the loop run? If the vectors
> >> are small and the loop is run many times, how fast the operation "ought"
> >> to be is irrelevant as it would drown in Python overhead.
> >
> > Originally I had attached a Python file demonstrating the problem, but
> > apparently this wasn't accepted by the list. In any case, the matrices
> > and vectors weren't too big (60x20), so I tried making them bigger and
> > indeed the "fast" version was now considerably faster.
> 
> 60x20 is "nothing", so a full matrix multiplication or a single
> matrix-vector probably takes the same time (that is, the difference
> between them in itself is likely smaller than the error you make during
> measuring).
> 
> In this context, the benchmarks will be completely dominated by the
> number of Python calls you make (each, especially taking the slice,
> means allocating Python objects, calling a bunch of functions in C, etc.
> etc). So it's not that strange, taking a slice isn't free, some Python
> objects must be created etc. etc.

Yeah, I think taking slices here is taking quite a lot of time:

In [58]: timeit E + Xi2[P/2,:]
10 loops, best of 3: 3.95 µs per loop

In [59]: timeit E + Xi2[P/2]
10 loops, best of 3: 2.17 µs per loop

I don't know why the additional ',:' in the slice is taking so much time, but my
guess is that passing & analyzing the second argument (slice(None,None,None))
could be responsible for the slowdown (though that seems like too much time
for that). Mmh, perhaps it would be worth studying this more carefully so that
an optimization could be done in NumPy.

> I think the lesson mostly should be that with so little data,
> benchmarking becomes a very difficult art.

Well, I think it is not difficult, it is just that you are perhaps 
benchmarking Python/NumPy machinery instead ;-)  I'm curious whether Matlab 
can do slicing much faster than NumPy.  Jasper?
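
(For anyone who wants to reproduce the comparison outside IPython, a
minimal timeit sketch along the lines of the test script Jasper posted
elsewhere in the thread; absolute numbers will vary by machine:)

import timeit

setup = """
import numpy as np
P, N = 60, 20
Xi = np.random.standard_normal((P,N))
Xi2 = np.dot(Xi, Xi.T)
E = np.dot(Xi, np.random.standard_normal(N))
"""

# Basic integer indexing vs. the equivalent (row, full-slice) tuple.
for stmt in ('E + Xi2[P/2]', 'E + Xi2[P/2,:]'):
    print "%-16s %f s" % (stmt, timeit.Timer(stmt, setup).timeit(100000))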

-- 
Francesc Alted


Re: [Numpy-discussion] Slicing slower than matrix multiplication?

2009-12-11 Thread Dag Sverre Seljebotn
Jasper van de Gronde wrote:
> Dag Sverre Seljebotn wrote:
>   
>> Jasper van de Gronde wrote:
>> 
>>> I've attached a test file which shows the problem. It also tries adding
>>> columns instead of rows (in case the memory layout is playing tricks),
>>> but this seems to make no difference. This is the output I got:
>>>
>>> Dot product: 5.188786
>>> Add a row: 8.032767
>>> Add a column: 8.070953
>>>
>>> Any ideas on why adding a row (or column) of a matrix is slower than
>>> computing a matrix product with a similarly sized matrix... (Xi has fewer
>>> columns than Xi2, but just as many rows.)
>>>   
>>>   
>> I think we need some numbers to put this into context -- how big are the 
>> vectors/matrices? How many iterations was the loop run? If the vectors 
>> are small and the loop is run many times, how fast the operation "ought" 
>> to be is irrelevant as it would drown in Python overhead.
>> 
>
> Originally I had attached a Python file demonstrating the problem, but 
> apparently this wasn't accepted by the list. In any case, the matrices 
> and vectors weren't too big (60x20), so I tried making them bigger and 
> indeed the "fast" version was now considerably faster.
>   
60x20 is "nothing", so a full matrix multiplication or a single 
matrix-vector probably takes the same time (that is, the difference 
between them in itself is likely smaller than the error you make during 
measuring).

In this context, the benchmarks will be completely dominated by the 
number of Python calls you make (each, especially taking the slice, 
means allocating Python objects, calling a bunch of functions in C, etc. 
etc). So it's not that strange, taking a slice isn't free, some Python 
objects must be created etc. etc.

I think the lesson mostly should be that with so little data, 
benchmarking becomes a very difficult art.

Dag Sverre

> But still, this seems like a very odd difference. I know Python is an 
> interpreted language and has a lot of overhead, but still, selecting a 
> row/column shouldn't be THAT slow, should it? To be clear, this is the 
> code I used for testing:
> 
> import timeit
>
> setupCode = """
> import numpy as np
>
> P = 60
> N = 20
>
> Xi = np.random.standard_normal((P,N))
> w = np.random.standard_normal((N))
> Xi2 = np.dot(Xi,Xi.T)
> E = np.dot(Xi,w)
> """
>
> N = 1
>
> dotProduct = timeit.Timer('E = np.dot(Xi,w)',setupCode)
> additionRow = timeit.Timer('E += Xi2[P/2,:]',setupCode)
> additionCol = timeit.Timer('E += Xi2[:,P/2]',setupCode)
> print "Dot product: %f" % dotProduct.timeit(N)
> print "Add a row: %f" % additionRow.timeit(N)
> print "Add a column: %f" % additionCol.timeit(N)
> 



Re: [Numpy-discussion] Slicing slower than matrix multiplication?

2009-12-11 Thread Jasper van de Gronde
Dag Sverre Seljebotn wrote:
> Jasper van de Gronde wrote:
>> I've attached a test file which shows the problem. It also tries adding
>> columns instead of rows (in case the memory layout is playing tricks),
>> but this seems to make no difference. This is the output I got:
>>
>> Dot product: 5.188786
>> Add a row: 8.032767
>> Add a column: 8.070953
>>
>> Any ideas on why adding a row (or column) of a matrix is slower than
>> computing a matrix product with a similarly sized matrix... (Xi has fewer
>> columns than Xi2, but just as many rows.)
>>   
> I think we need some numbers to put this into context -- how big are the 
> vectors/matrices? How many iterations was the loop run? If the vectors 
> are small and the loop is run many times, how fast the operation "ought" 
> to be is irrelevant as it would drown in Python overhead.

Originally I had attached a Python file demonstrating the problem, but 
apparently this wasn't accepted by the list. In any case, the matrices 
and vectors weren't too big (60x20), so I tried making them bigger and 
indeed the "fast" version was now considerably faster.

But still, this seems like a very odd difference. I know Python is an 
interpreted language and has a lot of overhead, but still, selecting a 
row/column shouldn't be THAT slow, should it? To be clear, this is the 
code I used for testing:

import timeit

setupCode = """
import numpy as np

P = 60
N = 20

Xi = np.random.standard_normal((P,N))
w = np.random.standard_normal((N))
Xi2 = np.dot(Xi,Xi.T)
E = np.dot(Xi,w)
"""

N = 1

dotProduct = timeit.Timer('E = np.dot(Xi,w)',setupCode)
additionRow = timeit.Timer('E += Xi2[P/2,:]',setupCode)
additionCol = timeit.Timer('E += Xi2[:,P/2]',setupCode)
print "Dot product: %f" % dotProduct.timeit(N)
print "Add a row: %f" % additionRow.timeit(N)
print "Add a column: %f" % additionCol.timeit(N)



Re: [Numpy-discussion] Slicing slower than matrix multiplication?

2009-12-11 Thread Dag Sverre Seljebotn
Jasper van de Gronde wrote:
> (Resending without attachment as I don't think my previous message arrived.)
>
> I just started using numpy and am very, very pleased with the
> functionality and cleanness so far. However, I tried what I thought would
> be a simple optimization and found that the opposite was true.
> Specifically, I had a loop where something like this was done:
>
> w += Xi[mu,:]
> E = np.dot(Xi,w)
>
> Instead of repeatedly doing the matrix product I thought I'd do the
> matrix product just once, before the loop, compute the product
> np.dot(Xi,Xi.T) and then do:
>
> w += Xi[mu,:]
> E += Xi2[mu,:]
>
> Seems like a clear winner, instead of doing a matrix multiplication it
> simply has to sum two vectors (in-place). However, it turned out to be
> 1.5 times SLOWER...
>
> I've attached a test file which shows the problem. It also tries adding
> columns instead of rows (in case the memory layout is playing tricks),
> but this seems to make no difference. This is the output I got:
>
> Dot product: 5.188786
> Add a row: 8.032767
> Add a column: 8.070953
>
> Any ideas on why adding a row (or column) of a matrix is slower than
> computing a matrix product with a similarly sized matrix... (Xi has fewer
> columns than Xi2, but just as many rows.)
>   
I think we need some numbers to put this into context -- how big are the 
vectors/matrices? How many iterations was the loop run? If the vectors 
are small and the loop is run many times, how fast the operation "ought" 
to be is irrelevant as it would drown in Python overhead.

Dag Sverre


[Numpy-discussion] Slicing slower than matrix multiplication?

2009-12-11 Thread Jasper van de Gronde
(Resending without attachment as I don't think my previous message arrived.)

I just started using numpy and am very, very pleased with the
functionality and cleanness so far. However, I tried what I thought would
be a simple optimization and found that the opposite was true.
Specifically, I had a loop where something like this was done:

w += Xi[mu,:]
E = np.dot(Xi,w)

Instead of repeatedly doing the matrix product I thought I'd do the
matrix product just once, before the loop, compute the product
np.dot(Xi,Xi.T) and then do:

w += Xi[mu,:]
E += Xi2[mu,:]

Seems like a clear winner, instead of doing a matrix multiplication it
simply has to sum two vectors (in-place). However, it turned out to be
1.5 times SLOWER...

I've attached a test file which shows the problem. It also tries adding
columns instead of rows (in case the memory layout is playing tricks),
but this seems to make no difference. This is the output I got:

Dot product: 5.188786
Add a row: 8.032767
Add a column: 8.070953

Any ideas on why adding a row (or column) of a matrix is slower than
computing a matrix product with a similarly sized matrix... (Xi has fewer
columns than Xi2, but just as many rows.)
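
(A minimal sketch of the identity being exploited, assuming
Xi2 = np.dot(Xi, Xi.T) as above; mu is an arbitrary row index:)

import numpy as np

P, N = 60, 20
Xi = np.random.standard_normal((P, N))
w = np.random.standard_normal(N)
Xi2 = np.dot(Xi, Xi.T)

mu = 7
E_slow = np.dot(Xi, w + Xi[mu,:])     # recompute the product every step
E_fast = np.dot(Xi, w) + Xi2[mu,:]    # reuse the precomputed Gram matrix
print np.allclose(E_slow, E_fast)     # True: the two updates agree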



[Numpy-discussion] nan_to_num and bool arrays

2009-12-11 Thread Nicolas Rougier

Hello,

Using both numpy 1.3.0 and 1.4.0rc1 I got the following exception using
nan_to_num on a bool array. Is that the expected behavior?


>>> import numpy
>>> Z = numpy.zeros((3,3),dtype=bool)
>>> numpy.nan_to_num(Z)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.6/dist-packages/numpy/lib/type_check.py", line
374, in nan_to_num
maxf, minf = _getmaxmin(y.dtype.type)
  File "/usr/lib/python2.6/dist-packages/numpy/lib/type_check.py", line
307, in _getmaxmin
f = getlimits.finfo(t)
  File "/usr/lib/python2.6/dist-packages/numpy/core/getlimits.py", line
103, in __new__
raise ValueError, "data type %r not inexact" % (dtype)
ValueError: data type <type 'numpy.bool_'> not inexact



Nicolas
