Re: [Numpy-discussion] Odd numerical difference between Numpy 1.5.1 and Numpy > 1.5.1

2011-04-13 Thread Mark Wiebe
On Tue, Apr 12, 2011 at 11:51 AM, Mark Wiebe mwwi...@gmail.com wrote:

 snip

here's the rule for a set of arbitrary arrays (not necessarily just 2):

 - if all the arrays are scalars, do type promotion on the types as is
 - otherwise, do type promotion on min_scalar_type(a) of each array a

 The function min_scalar_type returns the array type if a has >= 1
 dimensions, or the smallest type of the same kind (allowing int->uint in the
 case of positive-valued signed integers) to which the value can be cast
 without overflow if a has 0 dimensions.

 The promote_types function used for the type promotion is symmetric and
 associative, so the result won't change when shuffling the inputs. There's a
 bit of a wrinkle in the implementation to handle the fact that the uint type
 values aren't a strict subset of the same-sized int type values, but
 otherwise this is what happens.


 https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/convert_datatype.c#L1075

 The change I'm proposing is to modify this as follows:

 - if all the arrays are scalars, do type promotion on the types as is
 - if the maximum kind of all the scalars is > the maximum kind of all the
 arrays, do type promotion on the types as is
 - otherwise, do type promotion on min_scalar_type(a) of each array a

 One case where this may not capture a possible desired semantics is
 [complex128 scalar] * [float32 array] -> [complex128]. In this case
 [complex64] may be desired. This is directly analogous to the original
 [float64 scalar] * [int8 array], however, and in the latter case it's clear
 a float64 should result.


I've implemented what I suggested, and improved the documentation to better
explain what's going on. One thing I adjusted slightly is instead of using
the existing kinds, I used three categories: boolean, integer (int/uint),
and floating point (float/complex). This way, we get [float32 array] + 0j
producing a [complex64 array] instead of a [complex128 array], while still
fixing the original regression.
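
Concretely, the intended results with the patch applied look like this (a
hypothetical session, assuming the pull request below):

>>> (np.ones(2, dtype=np.float32) + 0j).dtype   # same category: floating point
dtype('complex64')
>>> (np.float64(1.345) * np.ones(2, dtype=np.int8)).dtype   # the original regression
dtype('float64')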

Please review my patch, thanks in advance!

https://github.com/numpy/numpy/pull/73

Cheers,
Mark


Re: [Numpy-discussion] Odd numerical difference between Numpy 1.5.1 and Numpy > 1.5.1

2011-04-13 Thread Mark Wiebe
On Wed, Apr 13, 2011 at 3:34 PM, Mark Wiebe mwwi...@gmail.com wrote:

 On Tue, Apr 12, 2011 at 11:51 AM, Mark Wiebe mwwi...@gmail.com wrote:

 snip

  here's the rule for a set of arbitrary arrays (not necessarily just 2):

 - if all the arrays are scalars, do type promotion on the types as is
 - otherwise, do type promotion on min_scalar_type(a) of each array a

 The function min_scalar_type returns the array type if a has >= 1
 dimensions, or the smallest type of the same kind (allowing int->uint in the
 case of positive-valued signed integers) to which the value can be cast
 without overflow if a has 0 dimensions.

 The promote_types function used for the type promotion is symmetric and
 associative, so the result won't change when shuffling the inputs. There's a
 bit of a wrinkle in the implementation to handle the fact that the uint type
 values aren't a strict subset of the same-sized int type values, but
 otherwise this is what happens.


 https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/convert_datatype.c#L1075

 The change I'm proposing is to modify this as follows:

 - if all the arrays are scalars, do type promotion on the types as is
 - if the maximum kind of all the scalars is > the maximum kind of all the
 arrays, do type promotion on the types as is
 - otherwise, do type promotion on min_scalar_type(a) of each array a

  One case where this may not capture a possible desired semantics is
 [complex128 scalar] * [float32 array] -> [complex128]. In this case
 [complex64] may be desired. This is directly analogous to the original
 [float64 scalar] * [int8 array], however, and in the latter case it's clear
 a float64 should result.


 I've implemented what I suggested, and improved the documentation to better
 explain what's going on. One thing I adjusted slightly is instead of using
 the existing kinds, I used three categories: boolean, integer (int/uint),
 and floating point (float/complex). This way, we get [float32 array] + 0j
 producing a [complex64 array] instead of a [complex128 array], while still
 fixing the original regression.

 Please review my patch, thanks in advance!

 https://github.com/numpy/numpy/pull/73

 Cheers,
 Mark


I forgot to mention that Travis was right, and it wasn't necessary to detect
that the ufunc is a binary operator. I think at some point splitting out
binary operators as a special case of ufuncs is desirable for performance
reasons, but for now this patch is much simpler.

-Mark


Re: [Numpy-discussion] Odd numerical difference between Numpy 1.5.1 and Numpy > 1.5.1

2011-04-12 Thread Robert Kern
On Mon, Apr 11, 2011 at 23:43, Mark Wiebe mwwi...@gmail.com wrote:
 On Mon, Apr 11, 2011 at 8:48 PM, Travis Oliphant oliph...@enthought.com
 wrote:

 It would be good to see a simple test case and understand why the boolean
 multiplied by the scalar double is becoming a float16.     In other words,
  why does
 (1-test)*t
 return a float16 array
 This does not sound right at all and it would be good to understand why
 this occurs, now.   How are you handling scalars multiplied by arrays in
 general?

 The reason it's float16 is that the first function in the multiply function
 list for which both types can be safely cast to the output type,

Except that float64 cannot be safely cast to float16.

 after
 applying the min_scalar_type function to the scalars, is float16.

This is implemented incorrectly, then. It makes no sense for floats,
for which the limiting attribute is precision, not range. For floats,
the result of min_scalar_type should be the type of the object itself,
nothing else. E.g. min_scalar_type(x)==float64 if type(x) is float no
matter what value it has.
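
The precision point is easy to check directly (a hypothetical session):

>>> float(np.float16(1.345))   # in float16's range, but not its precision
1.3447265625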

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco


Re: [Numpy-discussion] Odd numerical difference between Numpy 1.5.1 and Numpy > 1.5.1

2011-04-12 Thread Robert Kern
On Tue, Apr 12, 2011 at 11:20, Mark Wiebe mwwi...@gmail.com wrote:
 On Tue, Apr 12, 2011 at 8:24 AM, Robert Kern robert.k...@gmail.com wrote:

 On Mon, Apr 11, 2011 at 23:43, Mark Wiebe mwwi...@gmail.com wrote:
  On Mon, Apr 11, 2011 at 8:48 PM, Travis Oliphant
  oliph...@enthought.com
  wrote:

  It would be good to see a simple test case and understand why the
  boolean
  multiplied by the scalar double is becoming a float16.     In other
  words,
   why does
  (1-test)*t
  return a float16 array
  This does not sound right at all and it would be good to understand why
  this occurs, now.   How are you handling scalars multiplied by arrays
  in
  general?
 
  The reason it's float16 is that the first function in the multiply
  function
  list for which both types can be safely cast to the output type,

 Except that float64 cannot be safely cast to float16.

 That's true, but it was already being done this way with respect to float32.
 Rereading the documentation for min_scalar_type, I see the explanation could
 elaborate on the purpose of the function further. Float64 cannot be safely
 cast to float32, but this is what NumPy does:
 >>> import numpy as np
 >>> np.__version__
 '1.4.1'
 >>> np.float64(3.5) * np.ones(2,dtype=np.float32)
 array([ 3.5,  3.5], dtype=float32)


You're missing the key part of the rule that numpy uses: for
array*scalar cases, when both array and scalar are the same kind (both
floating point or both integers), then the array dtype always wins.
Only when they are different kinds do you try to negotiate a common
safe type between the scalar and the array.
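
For example (a hypothetical session showing the pre-1.6 behavior described
above):

>>> (np.float64(3.5) * np.ones(2, dtype=np.float32)).dtype   # same kind: array wins
dtype('float32')
>>> (np.float64(3.5) * np.ones(2, dtype=np.int8)).dtype      # cross kind: negotiate
dtype('float64')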

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco


Re: [Numpy-discussion] Odd numerical difference between Numpy 1.5.1 and Numpy > 1.5.1

2011-04-12 Thread Mark Wiebe
On Tue, Apr 12, 2011 at 9:30 AM, Robert Kern robert.k...@gmail.com wrote:

 On Tue, Apr 12, 2011 at 11:20, Mark Wiebe mwwi...@gmail.com wrote:
  On Tue, Apr 12, 2011 at 8:24 AM, Robert Kern robert.k...@gmail.com
 wrote:
 
  On Mon, Apr 11, 2011 at 23:43, Mark Wiebe mwwi...@gmail.com wrote:
   On Mon, Apr 11, 2011 at 8:48 PM, Travis Oliphant
   oliph...@enthought.com
   wrote:
 
   It would be good to see a simple test case and understand why the
   boolean
   multiplied by the scalar double is becoming a float16. In other
   words,
why does
   (1-test)*t
   return a float16 array
   This does not sound right at all and it would be good to understand
 why
   this occurs, now.   How are you handling scalars multiplied by arrays
   in
   general?
  
   The reason it's float16 is that the first function in the multiply
   function
   list for which both types can be safely cast to the output type,
 
  Except that float64 cannot be safely cast to float16.
 
  That's true, but it was already being done this way with respect to
 float32.
  Rereading the documentation for min_scalar_type, I see the explanation
 could
  elaborate on the purpose of the function further. Float64 cannot be
 safely
  cast to float32, but this is what NumPy does:
  >>> import numpy as np
  >>> np.__version__
  '1.4.1'
  >>> np.float64(3.5) * np.ones(2,dtype=np.float32)
  array([ 3.5,  3.5], dtype=float32)
 

 You're missing the key part of the rule that numpy uses: for
 array*scalar cases, when both array and scalar are the same kind (both
 floating point or both integers), then the array dtype always wins.
 Only when they are different kinds do you try to negotiate a common
 safe type between the scalar and the array.


I'm afraid I'm not seeing the point you're driving at, can you provide some
examples which tease apart these issues? Here's the same example but with
different kinds, and to me it seems to have the same character as the case
with float32/float64:

>>> np.__version__
'1.4.1'
>>> 1e60*np.ones(2,dtype=np.complex64)
array([ Inf+NaNj,  Inf+NaNj], dtype=complex64)

>>> np.__version__
'2.0.0.dev-4cb2eb4'
>>> 1e60*np.ones(2,dtype=np.complex64)
array([  1.e+60+0.j,   1.e+60+0.j])

 -Mark


Re: [Numpy-discussion] Odd numerical difference between Numpy 1.5.1 and Numpy > 1.5.1

2011-04-12 Thread Charles R Harris
On Tue, Apr 12, 2011 at 10:49 AM, Mark Wiebe mwwi...@gmail.com wrote:

 On Tue, Apr 12, 2011 at 9:30 AM, Robert Kern robert.k...@gmail.comwrote:

 On Tue, Apr 12, 2011 at 11:20, Mark Wiebe mwwi...@gmail.com wrote:
  On Tue, Apr 12, 2011 at 8:24 AM, Robert Kern robert.k...@gmail.com
 wrote:
 
  On Mon, Apr 11, 2011 at 23:43, Mark Wiebe mwwi...@gmail.com wrote:
   On Mon, Apr 11, 2011 at 8:48 PM, Travis Oliphant
   oliph...@enthought.com
   wrote:
 
   It would be good to see a simple test case and understand why the
   boolean
   multiplied by the scalar double is becoming a float16. In other
   words,
why does
   (1-test)*t
   return a float16 array
   This does not sound right at all and it would be good to understand
 why
   this occurs, now.   How are you handling scalars multiplied by
 arrays
   in
   general?
  
   The reason it's float16 is that the first function in the multiply
   function
   list for which both types can be safely cast to the output type,
 
  Except that float64 cannot be safely cast to float16.
 
  That's true, but it was already being done this way with respect to
 float32.
  Rereading the documentation for min_scalar_type, I see the explanation
 could
  elaborate on the purpose of the function further. Float64 cannot be
 safely
  cast to float32, but this is what NumPy does:
  >>> import numpy as np
  >>> np.__version__
  '1.4.1'
  >>> np.float64(3.5) * np.ones(2,dtype=np.float32)
  array([ 3.5,  3.5], dtype=float32)
 

 You're missing the key part of the rule that numpy uses: for
 array*scalar cases, when both array and scalar are the same kind (both
 floating point or both integers), then the array dtype always wins.
 Only when they are different kinds do you try to negotiate a common
 safe type between the scalar and the array.


 I'm afraid I'm not seeing the point you're driving at, can you provide some
 examples which tease apart these issues? Here's the same example but with
 different kinds, and to me it seems to have the same character as the case
 with float32/float64:

  >>> np.__version__
 '1.4.1'
  >>> 1e60*np.ones(2,dtype=np.complex64)
 array([ Inf+NaNj,  Inf+NaNj], dtype=complex64)

  >>> np.__version__
 '2.0.0.dev-4cb2eb4'
  >>> 1e60*np.ones(2,dtype=np.complex64)
 array([  1.e+60+0.j,   1.e+60+0.j])


IIRC, the behavior with respect to scalars sort of happened in the code on
the fly, so this is a good discussion to have. We should end up with
documented rules and tests to enforce them. I agree with Mark that the tests
have been deficient up to this point.

Chuck


Re: [Numpy-discussion] Odd numerical difference between Numpy 1.5.1 and Numpy > 1.5.1

2011-04-12 Thread Charles R Harris
On Tue, Apr 12, 2011 at 10:20 AM, Mark Wiebe mwwi...@gmail.com wrote:

 On Tue, Apr 12, 2011 at 8:24 AM, Robert Kern robert.k...@gmail.comwrote:

 On Mon, Apr 11, 2011 at 23:43, Mark Wiebe mwwi...@gmail.com wrote:
  On Mon, Apr 11, 2011 at 8:48 PM, Travis Oliphant 
 oliph...@enthought.com
  wrote:

  It would be good to see a simple test case and understand why the
 boolean
  multiplied by the scalar double is becoming a float16. In other
 words,
   why does
  (1-test)*t
  return a float16 array
  This does not sound right at all and it would be good to understand why
  this occurs, now.   How are you handling scalars multiplied by arrays
 in
  general?
 
  The reason it's float16 is that the first function in the multiply
 function
  list for which both types can be safely cast to the output type,

 Except that float64 cannot be safely cast to float16.


 That's true, but it was already being done this way with respect to
 float32. Rereading the documentation for min_scalar_type, I see the
 explanation could elaborate on the purpose of the function further. Float64
 cannot be safely cast to float32, but this is what NumPy does:


Yep, I remember noticing that on occasion. I didn't think it was really the
right thing to do...

snip

Chuck





Re: [Numpy-discussion] Odd numerical difference between Numpy 1.5.1 and Numpy > 1.5.1

2011-04-12 Thread Robert Kern
On Tue, Apr 12, 2011 at 12:27, Charles R Harris
charlesr.har...@gmail.com wrote:

 IIRC, the behavior with respect to scalars sort of happened in the code on
 the fly, so this is a good discussion to have. We should end up with
 documented rules and tests to enforce them. I agree with Mark that the tests
 have been deficient up to this point.

It's been documented for a long time now.

http://docs.scipy.org/doc/numpy/reference/ufuncs.html#casting-rules

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco


Re: [Numpy-discussion] Odd numerical difference between Numpy 1.5.1 and Numpy > 1.5.1

2011-04-12 Thread Charles R Harris
On Tue, Apr 12, 2011 at 11:56 AM, Robert Kern robert.k...@gmail.com wrote:

 On Tue, Apr 12, 2011 at 12:27, Charles R Harris
 charlesr.har...@gmail.com wrote:

  IIRC, the behavior with respect to scalars sort of happened in the code
 on
  the fly, so this is a good discussion to have. We should end up with
  documented rules and tests to enforce them. I agree with Mark that the
 tests
  have been deficient up to this point.

 It's been documented for a long time now.

 http://docs.scipy.org/doc/numpy/reference/ufuncs.html#casting-rules


Nope, the kind stuff is missing. Note the cast to float32 that Mark pointed
out. Also that the casting of python integers depends on their sign and
magnitude.

In [1]: ones(3, '?') + 0
Out[1]: array([1, 1, 1], dtype=int8)

In [2]: ones(3, '?') + 1000
Out[2]: array([1001, 1001, 1001], dtype=int16)
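
The value-based part of that can be reproduced with the new promotion
functions (a hypothetical session; min_scalar_type and promote_types are the
1.6 additions):

In [3]: np.min_scalar_type(1000)
Out[3]: dtype('int16')

In [4]: np.promote_types(np.bool_, np.min_scalar_type(1000))
Out[4]: dtype('int16')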


Chuck


Re: [Numpy-discussion] Odd numerical difference between Numpy 1.5.1 and Numpy > 1.5.1

2011-04-12 Thread Mark Wiebe
On Tue, Apr 12, 2011 at 11:17 AM, Charles R Harris 
charlesr.har...@gmail.com wrote:



 On Tue, Apr 12, 2011 at 11:56 AM, Robert Kern robert.k...@gmail.comwrote:

 On Tue, Apr 12, 2011 at 12:27, Charles R Harris
 charlesr.har...@gmail.com wrote:

  IIRC, the behavior with respect to scalars sort of happened in the code
 on
  the fly, so this is a good discussion to have. We should end up with
  documented rules and tests to enforce them. I agree with Mark that the
 tests
  have been deficient up to this point.

 It's been documented for a long time now.

 http://docs.scipy.org/doc/numpy/reference/ufuncs.html#casting-rules


 Nope, the kind stuff is missing. Note the cast to float32 that Mark pointed
 out. Also that the casting of python integers depends on their sign and
 magnitude.

 In [1]: ones(3, '?') + 0
 Out[1]: array([1, 1, 1], dtype=int8)

 In [2]: ones(3, '?') + 1000
 Out[2]: array([1001, 1001, 1001], dtype=int16)


This is the behaviour with master - it's a good idea to cross-check with an
older NumPy. I think we're discussing 3 things here, what NumPy 1.5 and
earlier did, what NumPy 1.6 beta currently does, and what people think NumPy
did. The old implementation had a bit of a spaghetti-factor to it, and had
problems like asymmetry and silent overflows. The new implementation is in
my opinion cleaner and follows well-defined semantics while trying to stay
true to the old implementation. I admit the documentation I wrote doesn't
fully explain them, but here's the rule for a set of arbitrary arrays (not
necessarily just 2):

- if all the arrays are scalars, do type promotion on the types as is
- otherwise, do type promotion on min_scalar_type(a) of each array a

The function min_scalar_type returns the array type if a has >= 1
dimensions, or the smallest type of the same kind (allowing int->uint in the
case of positive-valued signed integers) to which the value can be cast
without overflow if a has 0 dimensions.
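
For concreteness, here's how min_scalar_type behaves (a hypothetical session
using the new 1.6 functions):

>>> np.min_scalar_type(1000)   # 0 dimensions: smallest same-kind type holding the value
dtype('int16')
>>> np.min_scalar_type(-1)
dtype('int8')
>>> np.min_scalar_type(1e60)   # overflows float32
dtype('float64')
>>> np.min_scalar_type(np.zeros(3, dtype=np.int32))   # >= 1 dimensions: the array type
dtype('int32')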

The promote_types function used for the type promotion is symmetric and
associative, so the result won't change when shuffling the inputs. There's a
bit of a wrinkle in the implementation to handle the fact that the uint type
values aren't a strict subset of the same-sized int type values, but
otherwise this is what happens.
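
The uint wrinkle shows up like this (a hypothetical session):

>>> np.promote_types(np.int8, np.uint8)     # no same-size int holds both ranges
dtype('int16')
>>> np.promote_types(np.int64, np.uint64)   # no bigger int exists, so fall back to float
dtype('float64')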

https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/convert_datatype.c#L1075

The change I'm proposing is to modify this as follows:

- if all the arrays are scalars, do type promotion on the types as is
- if the maximum kind of all the scalars is > the maximum kind of all the
arrays, do type promotion on the types as is
- otherwise, do type promotion on min_scalar_type(a) of each array a

One case where this may not capture a possible desired semantics is
[complex128 scalar] * [float32 array] -> [complex128]. In this case
[complex64] may be desired. This is directly analogous to the original
[float64 scalar] * [int8 array], however, and in the latter case it's clear
a float64 should result.
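
Under the proposed rule, the analogous cases would come out as (a
hypothetical session showing the intended behavior):

>>> (np.float64(1.345) * np.ones(3, dtype=np.int8)).dtype     # scalar kind > array kind
dtype('float64')
>>> (np.float64(1.345) * np.ones(3, dtype=np.float32)).dtype  # same kind: min_scalar_type applies
dtype('float32')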

-Mark


Re: [Numpy-discussion] Odd numerical difference between Numpy 1.5.1 and Numpy > 1.5.1

2011-04-12 Thread Robert Kern
On Tue, Apr 12, 2011 at 11:49, Mark Wiebe mwwi...@gmail.com wrote:
 On Tue, Apr 12, 2011 at 9:30 AM, Robert Kern robert.k...@gmail.com wrote:

 You're missing the key part of the rule that numpy uses: for
 array*scalar cases, when both array and scalar are the same kind (both
 floating point or both integers), then the array dtype always wins.
 Only when they are different kinds do you try to negotiate a common
 safe type between the scalar and the array.

 I'm afraid I'm not seeing the point you're driving at, can you provide some
 examples which tease apart these issues? Here's the same example but with
 different kinds, and to me it seems to have the same character as the case
 with float32/float64:
 >>> np.__version__
 '1.4.1'
 >>> 1e60*np.ones(2,dtype=np.complex64)
 array([ Inf+NaNj,  Inf+NaNj], dtype=complex64)
 >>> np.__version__
 '2.0.0.dev-4cb2eb4'
 >>> 1e60*np.ones(2,dtype=np.complex64)
 array([  1.e+60+0.j,   1.e+60+0.j])

The point is that when you multiply an array by a scalar, and the
array-dtype is the same kind as the scalar-dtype, the output dtype is
the array-dtype. That's what gets you the behavior of the
float32-array staying the same when you multiply it with a Python
float(64). min_scalar_type should never be consulted in this case, so
you don't need to try to account for this case in its rules. This
cross-kind example is irrelevant to the point I'm trying to make.

For cross-kind operations, then you do need to find a common output
type that is safe for both array and scalar. However, please keep in
mind that for floating point types, keeping precision is more
important than range!

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco


Re: [Numpy-discussion] Odd numerical difference between Numpy 1.5.1 and Numpy > 1.5.1

2011-04-12 Thread Robert Kern
On Tue, Apr 12, 2011 at 13:17, Charles R Harris
charlesr.har...@gmail.com wrote:


 On Tue, Apr 12, 2011 at 11:56 AM, Robert Kern robert.k...@gmail.com wrote:

 On Tue, Apr 12, 2011 at 12:27, Charles R Harris
 charlesr.har...@gmail.com wrote:

  IIRC, the behavior with respect to scalars sort of happened in the code
  on
  the fly, so this is a good discussion to have. We should end up with
  documented rules and tests to enforce them. I agree with Mark that the
  tests
  have been deficient up to this point.

 It's been documented for a long time now.

 http://docs.scipy.org/doc/numpy/reference/ufuncs.html#casting-rules


 Nope, the kind stuff is missing. Note the cast to float32 that Mark pointed
 out.

The float32-array*float64-scalar case? That's covered in the last
paragraph; they are the same kind so array dtype wins.

 Also that the casting of python integers depends on their sign and
 magnitude.

 In [1]: ones(3, '?') + 0
 Out[1]: array([1, 1, 1], dtype=int8)

 In [2]: ones(3, '?') + 1000
 Out[2]: array([1001, 1001, 1001], dtype=int16)

bool and int are different kinds, so that's not a counterexample. I'm not saying that
the size of the values should be ignored for cross-kind upcasting. I'm
saying that you don't need value-size calculations to preserve the
float32-array*float64-scalar behavior.
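
A minimal sketch of the rule as described, for concreteness (a hypothetical
helper, not NumPy's implementation; the kind grouping is deliberately coarse):

import numpy as np

def scalar_array_result(array_dtype, scalar):
    # Hypothetical: same-kind scalars never upcast the array; cross-kind
    # integer scalars promote by value, inexact scalars by their full type.
    kind_group = {'b': 0, 'u': 1, 'i': 1, 'f': 2, 'c': 3}  # int/uint grouped
    sdt = np.asarray(scalar).dtype
    if kind_group[sdt.kind] == kind_group[array_dtype.kind]:
        return array_dtype                    # same kind: the array dtype wins
    if kind_group[sdt.kind] <= 1:
        sdt = np.min_scalar_type(scalar)      # value size matters across kinds
    return np.promote_types(array_dtype, sdt)

>>> scalar_array_result(np.dtype(np.float32), 3.5)   # float64 scalar, float32 array
dtype('float32')
>>> scalar_array_result(np.dtype(np.int8), 1.345)    # precision kept across kinds
dtype('float64')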

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco


Re: [Numpy-discussion] Odd numerical difference between Numpy 1.5.1 and Numpy > 1.5.1

2011-04-11 Thread Robert Kern
On Mon, Apr 11, 2011 at 13:54, Skipper Seabold jsseab...@gmail.com wrote:
 All,

 We noticed some failing tests for statsmodels between numpy 1.5.1 and
 numpy >= 1.6.0. These are the versions where I noticed the change. It
 seems that when you divide a float array and multiply by a boolean
 array the answers are different (unless the others are also off by
 some floating point). Can anyone replicate the following using this
 script or point out what I'm missing?

 import numpy as np
 print np.__version__
 np.random.seed(12345)

 test = np.random.randint(0,2,size=10).astype(bool)
 t = 1.345
 arr = np.random.random(size=10)

 arr # okay
 t/arr # okay
 (1-test)*arr # okay
 (1-test)*t/arr # huh?

[~]
|12> 1-test
array([1, 0, 0, 0, 1, 0, 1, 1, 0, 1], dtype=int8)

[~]
|13> (1-test)*t
array([ 1.34472656,  0.        ,  0.        ,  0.        ,  1.34472656,
        0.        ,  1.34472656,  1.34472656,  0.        ,  1.34472656], dtype=float16)

Some of the recent ufunc changes or the introduction of the float16
type must have changed the way the dtype is chosen for the
int8-array*float64-scalar case. Previously, the result was a float64
array, as expected.

Mark, I expect this is a result of one of your changes. Can you take a
look at this?

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco


Re: [Numpy-discussion] Odd numerical difference between Numpy 1.5.1 and Numpy > 1.5.1

2011-04-11 Thread Charles R Harris
On Mon, Apr 11, 2011 at 12:54 PM, Skipper Seabold jsseab...@gmail.comwrote:

 All,

 We noticed some failing tests for statsmodels between numpy 1.5.1 and
 numpy >= 1.6.0. These are the versions where I noticed the change. It
 seems that when you divide a float array and multiply by a boolean
 array the answers are different (unless the others are also off by
 some floating point). Can anyone replicate the following using this
 script or point out what I'm missing?

 import numpy as np
 print np.__version__
 np.random.seed(12345)

 test = np.random.randint(0,2,size=10).astype(bool)
 t = 1.345
 arr = np.random.random(size=10)

 arr # okay
 t/arr # okay
 (1-test)*arr # okay
 (1-test)*t/arr # huh?

 Python 2.6.5 (r265:79063, Apr 16 2010, 13:57:41)
 [GCC 4.4.3] on linux2
 Type "help", "copyright", "credits" or "license" for more information.
 >>> import numpy as np
 >>> print np.__version__
 2.0.0.dev-fe3852f
 >>> np.random.seed(12345)
 >>>
 >>> test = np.random.randint(0,2,size=10).astype(bool)
 >>> t = 1.345
 >>> arr = np.random.random(size=10)
 >>>
 >>> arr # okay
 array([ 0.5955447 ,  0.96451452,  0.6531771 ,  0.74890664,  0.65356987,
         0.74771481,  0.96130674,  0.0083883 ,  0.10644438,  0.29870371])
 >>> t/arr # okay
 array([   2.25843668,    1.39448393,    2.05916589,    1.7959515 ,
           2.0579284 ,    1.79881418,    1.39913718,  160.342421  ,
          12.63570742,    4.50278968])
 >>> (1-test)*arr # okay
 array([ 0.5955447 ,  0.        ,  0.        ,  0.        ,  0.65356987,
         0.        ,  0.96130674,  0.0083883 ,  0.        ,  0.29870371])
 >>> (1-test)*t/arr # huh?
 array([   2.25797754,    0.        ,    0.        ,    0.        ,
           2.05751003,    0.        ,    1.39885274,  160.3098235 ,
           0.        ,    4.50187427])


Looks like it is going through float16:

In [2]: float16(1.345)/0.5955447
Out[2]: 2.25797754979601

The scalar 1.345 should be converted to double and that isn't happening.

snip

Chuck


Re: [Numpy-discussion] Odd numerical difference between Numpy 1.5.1 and Numpy > 1.5.1

2011-04-11 Thread Mark Wiebe
On Mon, Apr 11, 2011 at 12:37 PM, Robert Kern robert.k...@gmail.com wrote:

 On Mon, Apr 11, 2011 at 13:54, Skipper Seabold jsseab...@gmail.com
 wrote:
  All,
 
  We noticed some failing tests for statsmodels between numpy 1.5.1 and
  numpy >= 1.6.0. These are the versions where I noticed the change. It
  seems that when you divide a float array and multiply by a boolean
  array the answers are different (unless the others are also off by
  some floating point). Can anyone replicate the following using this
  script or point out what I'm missing?
 
  import numpy as np
  print np.__version__
  np.random.seed(12345)
 
  test = np.random.randint(0,2,size=10).astype(bool)
  t = 1.345
  arr = np.random.random(size=10)
 
  arr # okay
  t/arr # okay
  (1-test)*arr # okay
  (1-test)*t/arr # huh?

 [~]
 |12> 1-test
 array([1, 0, 0, 0, 1, 0, 1, 1, 0, 1], dtype=int8)

 [~]
 |13> (1-test)*t
 array([ 1.34472656,  0.        ,  0.        ,  0.        ,  1.34472656,
         0.        ,  1.34472656,  1.34472656,  0.        ,  1.34472656], dtype=float16)

 Some of the recent ufunc changes or the introduction of the float16
 type must have changed the way the dtype is chosen for the
 int8-array*float64-scalar case. Previously, the result was a float64
 array, as expected.

 Mark, I expect this is a result of one of your changes. Can you take a
 look at this?


It's the type promotion rules change that is causing this, and it's
definitely a 1.6.0 release blocker. Good catch!

I can think of an easy, reasonable way to fix it in the result_type
function, but since the ufuncs themselves use a poor method of resolving the
types, it will take some thought to apply the same idea there. Maybe
detecting that the ufunc is a binary op at its creation time, setting a
flag, and using the result_type function in that case. This would have the
added bonus of being faster, too.

Going back to the 1.5 code isn't an option, since adding the new data types
while maintaining ABI compatibility required major surgery to this part of
the system.

-Mark


Re: [Numpy-discussion] Odd numerical difference between Numpy 1.5.1 and Numpy > 1.5.1

2011-04-11 Thread Charles R Harris
On Mon, Apr 11, 2011 at 2:31 PM, Mark Wiebe mwwi...@gmail.com wrote:

 On Mon, Apr 11, 2011 at 12:37 PM, Robert Kern robert.k...@gmail.comwrote:

 On Mon, Apr 11, 2011 at 13:54, Skipper Seabold jsseab...@gmail.com
 wrote:
  All,
 
  We noticed some failing tests for statsmodels between numpy 1.5.1 and
  numpy >= 1.6.0. These are the versions where I noticed the change. It
  seems that when you divide a float array and multiply by a boolean
  array the answers are different (unless the others are also off by
  some floating point). Can anyone replicate the following using this
  script or point out what I'm missing?
 
  import numpy as np
  print np.__version__
  np.random.seed(12345)
 
  test = np.random.randint(0,2,size=10).astype(bool)
  t = 1.345
  arr = np.random.random(size=10)
 
  arr # okay
  t/arr # okay
  (1-test)*arr # okay
  (1-test)*t/arr # huh?

 [~]
 |12> 1-test
 array([1, 0, 0, 0, 1, 0, 1, 1, 0, 1], dtype=int8)

 [~]
 |13> (1-test)*t
 array([ 1.34472656,  0.        ,  0.        ,  0.        ,  1.34472656,
         0.        ,  1.34472656,  1.34472656,  0.        ,  1.34472656], dtype=float16)

 Some of the recent ufunc changes or the introduction of the float16
 type must have changed the way the dtype is chosen for the
 int8-array*float64-scalar case. Previously, the result was a float64
 array, as expected.

 Mark, I expect this is a result of one of your changes. Can you take a
 look at this?


 It's the type promotion rules change that is causing this, and it's
 definitely a 1.6.0 release blocker. Good catch!

 I can think of an easy, reasonable way to fix it in the result_type
 function, but since the ufuncs themselves use a poor method of resolving the
 types, it will take some thought to apply the same idea there. Maybe
 detecting that the ufunc is a binary op at its creation time, setting a
 flag, and using the result_type function in that case. This would have the
 added bonus of being faster, too.

 Going back to the 1.5 code isn't an option, since adding the new data types
 while maintaining ABI compatibility required major surgery to this part of
 the system.


There isn't a big rush here, so take the time to work it through.

Chuck


Re: [Numpy-discussion] Odd numerical difference between Numpy 1.5.1 and Numpy > 1.5.1

2011-04-11 Thread Travis Oliphant

On Apr 11, 2011, at 3:55 PM, Charles R Harris wrote:

 
 
 On Mon, Apr 11, 2011 at 2:31 PM, Mark Wiebe mwwi...@gmail.com wrote:
 On Mon, Apr 11, 2011 at 12:37 PM, Robert Kern robert.k...@gmail.com wrote:
 On Mon, Apr 11, 2011 at 13:54, Skipper Seabold jsseab...@gmail.com wrote:
  All,
 
  We noticed some failing tests for statsmodels between numpy 1.5.1 and
  numpy >= 1.6.0. These are the versions where I noticed the change. It
  seems that when you divide a float array and multiply by a boolean
  array the answers are different (unless the others are also off by
  some floating point). Can anyone replicate the following using this
  script or point out what I'm missing?
 
  import numpy as np
  print np.__version__
  np.random.seed(12345)
 
  test = np.random.randint(0,2,size=10).astype(bool)
  t = 1.345
  arr = np.random.random(size=10)
 
  arr # okay
  t/arr # okay
  (1-test)*arr # okay
  (1-test)*t/arr # huh?
 
 [~]
 |12> 1-test
 array([1, 0, 0, 0, 1, 0, 1, 1, 0, 1], dtype=int8)

 [~]
 |13> (1-test)*t
 array([ 1.34472656,  0.        ,  0.        ,  0.        ,  1.34472656,
         0.        ,  1.34472656,  1.34472656,  0.        ,  1.34472656], dtype=float16)
 
 Some of the recent ufunc changes or the introduction of the float16
 type must have changed the way the dtype is chosen for the
 int8-array*float64-scalar case. Previously, the result was a float64
 array, as expected.
 
 Mark, I expect this is a result of one of your changes. Can you take a
 look at this?
 
 It's the type promotion rules change that is causing this, and it's 
 definitely a 1.6.0 release blocker. Good catch!
 
 I can think of an easy, reasonable way to fix it in the result_type function, 
 but since the ufuncs themselves use a poor method of resolving the types, it 
 will take some thought to apply the same idea there. Maybe detecting that the 
 ufunc is a binary op at its creation time, setting a flag, and using the 
 result_type function in that case. This would have the added bonus of being 
 faster, too.
 
 Going back to the 1.5 code isn't an option, since adding the new data types 
 while maintaining ABI compatibility required major surgery to this part of 
 the system.
 
 
 There isn't a big rush here, so take the time to work it through.

I agree with Charles.   Let's take the needed time and work this through.   
This is the sort of thing I was a bit nervous about with the changes made to 
the casting rules.Right now, I'm not confident that the scalar-array 
casting rules are being handled correctly. 

From your explanation, I don't understand how you intend to solve the problem.  
 Surely there is a better approach than special casing the binary op and this 
work-around flag (and why is the result_type even part of the discussion)? 

It would be good to see a simple test case and understand why the boolean 
multiplied by the scalar double is becoming a float16. In other words,  why 
does

(1-test)*t 

return a float16 array

This does not sound right at all and it would be good to understand why this 
occurs, now.   How are you handling scalars multiplied by arrays in general?  


-Travis




Re: [Numpy-discussion] Odd numerical difference between Numpy 1.5.1 and Numpy > 1.5.1

2011-04-11 Thread Mark Wiebe
On Mon, Apr 11, 2011 at 8:48 PM, Travis Oliphant oliph...@enthought.comwrote:


 On Apr 11, 2011, at 3:55 PM, Charles R Harris wrote:
 snip
 I agree with Charles.   Let's take the needed time and work this through.
 This is the sort of thing I was a bit nervous about with the changes made to
 the casting rules.Right now, I'm not confident that the scalar-array
 casting rules are being handled correctly.


I would suggest viewing this as having exposed a problem in the NumPy test
suite. All necessary type promotion behaviors should have tests for them,
and I added tests since before 1.6 there were none. This case, where the
scalar type has a greater kind than the array type, was missed and had no
preexisting tests. If the test suite is thorough, you can be confident that
the scalar-array casting rules are being handled right.

Here's where to add more tests. Both np.add and np.result_type are validated
with the same set of tests.

https://github.com/numpy/numpy/blob/master/numpy/core/tests/test_numeric.py#L345


 From your explanation, I don't understand how you intend to solve the
 problem.   Surely there is a better approach than special casing the binary
 op and this work-around flag (and why is the result_type even part of the
 discussion)?


Yes, there's definitely a better approach, but I don't see one without
changing the ufunc API. In other libraries and programming languages, type
promotion is defined with a set of explicit promotion rules, but since the
current ufuncs have a list of functions and do linear searches in those
lists to find a matching loop, defining and implementing such an explicit
rule set is more difficult.

The reason to bring the numpy.result_type function into it is that it
cleanly encapsulates the NumPy type-promotion rules. A good design needs a
single, consistent way to handle type promotion, so that it can be validated
to be correct in a single place. I created the numpy.promote_types,
numpy.min_scalar_type, and numpy.result_type functions for this purpose, and
the numpy.nditer uses this API to determine the output type when it
allocates a requested array for output. By fixing result_type, then having
the ufunc binary operations use it, we can be confident that the type
promotion will be consistent.
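
The invariant being aimed for is that these two always agree (a hypothetical
session; the values shown are the intended post-fix results):

>>> np.result_type(np.ones(3, dtype=np.int8), 1.345)
dtype('float64')
>>> (np.ones(3, dtype=np.int8) * 1.345).dtype
dtype('float64')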

It would be good to see a simple test case and understand why the boolean
 multiplied by the scalar double is becoming a float16. In other words,
  why does

 (1-test)*t

 return a float16 array

 This does not sound right at all and it would be good to understand why
 this occurs, now.   How are you handling scalars multiplied by arrays in
 general?


The reason it's float16 is that the first function in the multiply function
list for which both types can be safely cast to the output type, after
applying the min_scalar_type function to the scalars, is float16. The change
I'm suggesting is to remove the min_scalar_type part if the kind of the
scalar type is higher than the kind of the array type. Maybe it can be
done without switching the ufunc to use result_type in 1.6, but result_type
needs to be fixed as well to make the nditer and anything else that wants to
follow the ufunc rules be consistent.
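
For reference, the lookup chain that currently yields float16 can be seen
directly (a hypothetical session reproducing the 1.6 beta behavior):

>>> np.min_scalar_type(1.345)         # 1.345 fits in float16's range
dtype('float16')
>>> np.can_cast(np.int8, np.float16)  # so float16 is the first loop that's "safe"
True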

I created a ticket for the issue here:
http://projects.scipy.org/numpy/ticket/1798

Cheers,
Mark