Re: [Numpy-discussion] np.nan and ``is``

2008-09-19 Thread Lisandro Dalcin
You know, floats are immutable objects, so 'float(f)' just
returns a new reference to 'f' if 'f' is (exactly) of type 'float':

In [1]: f = 1.234
In [2]: f is float(f)
Out[2]: True
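
A small sketch of the same point, assuming CPython (the identity result is an implementation detail, not a language promise). MyFloat is a hypothetical subclass introduced only to show the contrast: an exact float is handed back as-is, while a subclass instance is converted to a new plain float.

```python
class MyFloat(float):
    """Hypothetical float subclass, used only for this illustration."""


f = 1.234
# On CPython, float() of an exact float returns the very same object.
same_object = f is float(f)

m = MyFloat(1.234)
converted = float(m)
# A subclass instance is converted to a plain float, so it is a new object.
exact_type = type(converted) is float
new_object = converted is not m

print(same_object, exact_type, new_object)
```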

I do not remember right now how comparisons are implemented in core
Python, but I believe the 'in' operator tests for object identity
first, so 'np.nan in [np.nan]' returns True and the fact that
'np.nan == np.nan' returns False is never considered.
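
The identity-before-equality behaviour can be sketched with plain Python floats, no numpy needed. The variable names here are only illustrative; the point is that list membership finds the same NaN object but not a different one.

```python
import math

nan = float("nan")         # one particular NaN object
other_nan = float("nan")   # a second, distinct NaN object

# NaN never compares equal, not even to itself.
self_equal = (nan == nan)

# List membership checks identity before equality, so the same object is found...
same_object_found = nan in [1.0, nan]
# ...but a different NaN object is not, because both 'is' and '==' fail.
other_object_found = other_nan in [1.0, nan]

# A membership test that does not depend on object identity.
has_nan = any(math.isnan(v) for v in [1.0, other_nan])

print(self_equal, same_object_found, other_object_found, has_nan)
```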

On Fri, Sep 19, 2008 at 1:59 PM, Alan G Isaac [EMAIL PROTECTED] wrote:
 Might someone explain this to me?

  x = [1.,np.nan]
  np.nan in x
 True
  np.nan in np.array(x)
 False
  np.nan in np.array(x).tolist()
 False
  np.nan is float(np.nan)
 True

 Thank you,
 Alan Isaac


 ___
 Numpy-discussion mailing list
 Numpy-discussion@scipy.org
 http://projects.scipy.org/mailman/listinfo/numpy-discussion




-- 
Lisandro Dalcín
---
Centro Internacional de Métodos Computacionales en Ingeniería (CIMEC)
Instituto de Desarrollo Tecnológico para la Industria Química (INTEC)
Consejo Nacional de Investigaciones Científicas y Técnicas (CONICET)
PTLC - Güemes 3450, (3000) Santa Fe, Argentina
Tel/Fax: +54-(0)342-451.1594


Re: [Numpy-discussion] np.nan and ``is``

2008-09-19 Thread Alan G Isaac
  On Fri, Sep 19, 2008 at 1:59 PM, Alan G Isaac [EMAIL PROTECTED] wrote:
  Might someone explain this to me?
 
   x = [1.,np.nan]
   np.nan in x
  True
   np.nan in np.array(x)
  False
   np.nan in np.array(x).tolist()
  False
   np.nan is float(np.nan)
  True


On 9/19/2008 1:15 PM Lisandro Dalcin apparently wrote:
 I do not remember right now how comparisons are implemented in core
 Python, but I believe the 'in' operator tests for object identity
 first, so 'np.nan in [np.nan]' returns True and the fact that
 'np.nan == np.nan' returns False is never considered.


Sure.  All evaluations to True make sense to me.
I am asking about the ones that evaluate to False.
Thanks,
Alan



Re: [Numpy-discussion] np.nan and ``is``

2008-09-19 Thread Christopher Barker
Alan G Isaac wrote:
 Might someone explain this to me?
 
   x = [1.,np.nan]
   np.nan in x
  True
   np.nan in np.array(x)
  False
   np.nan in np.array(x).tolist()
  False
   np.nan is float(np.nan)
  True

Not quite -- but I do know that 'is' is tricky: it tests object
identity. I think it actually compares the pointers to the objects. What
makes this tricky is that Python interns some objects, so that when you
create two that have the same value, they may actually be the same object:

  s1 = "this"
  s2 = "this"
  s1 is s2

True

So short strings are interned, as are small integers and maybe floats? 
However, longer strings are not:

  s1 = "A much longer string"
  s2 = "A much longer string"
  s1 is s2
False

I don't know the interning rules, but I do know that you should never
count on them; they may not be consistent between implementations, or
even between different runs.

NaN is a floating point number with a specific value. np.nan is a
particular instance of that, but not all NaNs will be the same instance:

  np.array(0.0) / 0
nan
  np.array(0.0) / 0 is np.nan
False

So you can't use 'is' to check.

  np.array(0.0) / 0 == np.nan
False

and you can't use '==' either.

The only way to do it reliably is:

  np.isnan(np.array(0.0) / 0)
True
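
Putting these pieces together gives one possible reading of the results in the original question. This is a sketch, assuming a reasonably recent numpy in which ndarray membership is evaluated through elementwise '==' and tolist() builds fresh Python float objects.

```python
import numpy as np

x = [1.0, np.nan]
arr = np.array(x)

# The list holds the np.nan object itself, so the identity check succeeds.
in_list = np.nan in x
# The array stores raw doubles; membership relies on '==', which fails for NaN.
in_array = np.nan in arr
# tolist() creates new float objects, so neither 'is' nor '==' matches.
in_tolist = np.nan in arr.tolist()

# Value-based checks work regardless of which NaN object is involved.
nan_mask = np.isnan(arr)
any_nan = bool(nan_mask.any())

print(in_list, in_array, in_tolist, any_nan)
```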


So, the short answer is that the only way to deal with NaNs properly is 
to have NaN-aware functions, like nanmin() and friends.


Regardless of how many nan* functions get written, or what exactly they
do, we really do need to make sure that no numpy function gives bogus
results in the presence of NaNs, which doesn't appear to be the case now.

I also think I see a consensus building that non-nan-specific numpy 
functions should either preserve NaN's or raise exceptions, rather than 
ignoring them.
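
To make the "bogus results" concern concrete without relying on any particular numpy version, here is a sketch using Python's builtin min(). The helpers nan_min and strict_min are hypothetical names, not numpy functions; they illustrate the two policies mentioned above (ignore NaNs explicitly, or preserve them).

```python
import math

nan = float("nan")

# Every ordering comparison with NaN is False, so the builtin min()
# silently depends on where the NaN sits in the input.
nan_first = min([nan, 1.0, 2.0])   # stays NaN: nothing is "less than" it
nan_last = min([1.0, 2.0, nan])    # the NaN is silently skipped


def nan_min(values):
    """Hypothetical NaN-aware helper: minimum of the non-NaN values."""
    kept = [v for v in values if not math.isnan(v)]
    if not kept:
        raise ValueError("no non-NaN values")
    return min(kept)


def strict_min(values):
    """Hypothetical NaN-preserving helper: NaN in, NaN out."""
    values = list(values)
    if any(math.isnan(v) for v in values):
        return float("nan")
    return min(values)


print(nan_first, nan_last, nan_min([nan, 3.0, 1.0]), strict_min([1.0, nan]))
```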

-Chris








-- 
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/ORR(206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115   (206) 526-6317   main reception

[EMAIL PROTECTED]


Re: [Numpy-discussion] np.nan and ``is``

2008-09-19 Thread Andrew Dalke
On Sep 19, 2008, at 7:52 PM, Christopher Barker wrote:
 I don't know the interning rules, but I do know that you should never
 count on them; they may not be consistent between implementations, or
 even between different runs.

There are a few things that Python-the-language guarantees are singleton
objects which can be compared correctly with is.  Those are:

   True, False, None

Otherwise there is no guarantee that two objects of a given type
which are equal in some sense of the word, are actually the same  
object.

As Chris pointed out, the C implementation does (as a performance
matter) have additional singletons.  For example, the integers from
-5 through 256 are also singletons:


#ifndef NSMALLPOSINTS
#define NSMALLPOSINTS   257
#endif
#ifndef NSMALLNEGINTS
#define NSMALLNEGINTS   5
#endif
/* References to small integers are saved in this array so that they
can be shared.
The integers that are saved are those in the range
-NSMALLNEGINTS (inclusive) to NSMALLPOSINTS (not inclusive).
*/
static PyIntObject *small_ints[NSMALLNEGINTS + NSMALLPOSINTS];


This used to be -1 to 100 but some testing showed it was better
to extend the range somewhat.
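
A quick way to see the small-integer cache from Python, as a sketch only. The identity results are printed rather than relied upon, since they are a CPython implementation detail and may differ elsewhere; equality is the only comparison that is safe everywhere.

```python
# Build the integers at run time so the compiler cannot fold them
# into a single shared constant.
a, b = int("256"), int("256")
c, d = int("257"), int("257")

# Equality holds on any implementation.
equal_small = (a == b)
equal_large = (c == d)

# Identity is an implementation detail: CPython typically shares the
# object for 256 but not for 257, and nothing in the language promises either.
print("256 is 256:", a is b)
print("257 is 257:", c is d)
```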

There was also some performance testing about special-casing 0.0
and +/- 1.0 but I think it showed the results weren't worthwhile.


So, back to NaN.  There's no guarantee NaN is a singleton
object, so testing with 'is' is almost certainly wrong.
In fact, at the bit-level there are multiple NaNs.  A
NaN (according to Wikipedia) fits the following bit pattern.

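   NaN: x11111111axxxxxxxxxxxxxxxxxxxxxx. x = undefined. If a = 1,
   it is a quiet NaN, otherwise it is a signalling NaN.


So  11111111111111111111111111111111
and 11111111111111111111111111111110
and 11111111111111111111111111111100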

are all NaN values.
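
This can be checked from Python with the struct module. The sketch below assumes 32-bit single-precision patterns; float32_from_bits is a helper name made up for the example. It tries several patterns with every exponent bit set and a nonzero fraction, plus the zero-fraction case, which is infinity rather than NaN.

```python
import math
import struct


def float32_from_bits(bits):
    """Interpret a 32-bit integer as an IEEE 754 single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


# Exponent bits all ones, fraction nonzero: every one of these is a NaN.
nan_patterns = [0xFFFFFFFF, 0xFFFFFFFE, 0xFFFFFFFC, 0x7FC00000]
values = [float32_from_bits(bits) for bits in nan_patterns]
all_nan = all(math.isnan(v) for v in values)

# Exponent bits all ones with a zero fraction is infinity, not NaN.
inf_value = float32_from_bits(0x7F800000)

print(all_nan, inf_value)
```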



Andrew
[EMAIL PROTECTED]




Re: [Numpy-discussion] np.nan and ``is``

2008-09-19 Thread Christian Heimes
Andrew Dalke wrote:
 There are a few things that Python-the-language guarantees are singleton
 objects which can be compared correctly with is.  Those are:
 
True, False, None

The empty tuple () and all interned strings are also guaranteed to be
singletons. String interning is used to optimize code at the C level:
it's much faster to compare memory addresses than objects. Any string
can be interned through the builtin function intern, as in s = intern(s).
For Python 3.x the function was moved into the sys module and changed to
support str, which are PyUnicode objects.
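
A minimal sketch of explicit interning with the Python 3 spelling, sys.intern. It assumes CPython; whether two equal strings share an object before interning is not something to count on, so only the result after interning is checked.

```python
import sys

# Build two equal strings at run time so they start out as separate objects.
parts = ["a much ", "longer ", "string"]
s1 = "".join(parts)
s2 = "".join(parts)

equal_before = (s1 == s2)

# sys.intern returns the interned copy, so both names end up
# referring to one and the same object.
s1 = sys.intern(s1)
s2 = sys.intern(s2)
identical_after = (s1 is s2)

print(equal_before, identical_after)
```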


 So, back to NaN.  There's no guarantee NaN is a singleton
 object, so testing with 'is' is almost certainly wrong.
 In fact, at the bit-level there are multiple NaNs.  A
 NaN (according to Wikipedia) fits the following bit pattern.
 
    NaN: x11111111axxxxxxxxxxxxxxxxxxxxxx. x = undefined. If a = 1,
    it is a quiet NaN, otherwise it is a signalling NaN.

The definition is correct for all doubles on IEEE 754 aware platforms. 
Python's float type uses the double C type. Almost all modern computers 
have either hardware IEEE 754 support or software support for embedded 
devices (some mobile phones and PDAs). 
http://en.wikipedia.org/wiki/IEEE_754-1985

The Python core makes no distinction between quiet NaNs and signaling
NaNs. Only errno, input and output values are checked to raise an
exception. We discussed the possibility of a NaN singleton during
our revamp of Python's IEEE 754 and math support for Python 2.6 and 3.0,
but we decided against it because the extra code and cost weren't worth
the risks. Instead I added isnan() and isinf() to the math module.

All checks for NaN, inf and the sign bit of a float must be made through 
the appropriate APIs - either the NumPy API or the new APIs for floats.
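
For plain Python floats, those checks might look like the sketch below. math.isnan() and math.isinf() are the functions mentioned above; using math.copysign() to read the sign bit is my own choice of API for the illustration.

```python
import math

nan = float("nan")
inf = float("inf")
neg_zero = -0.0

is_nan = math.isnan(nan)
is_inf = math.isinf(-inf)          # true for either sign of infinity

# -0.0 compares equal to 0.0, so '==' cannot reveal the sign bit...
zero_equal = (neg_zero == 0.0)
# ...but copysign() transfers the sign bit onto 1.0, which can be inspected.
sign_of_neg_zero = math.copysign(1.0, neg_zero)

print(is_nan, is_inf, zero_equal, sign_of_neg_zero)
```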

Hope to shed some light on things
Christian



Re: [Numpy-discussion] np.nan and ``is``

2008-09-19 Thread Andrew Dalke
On Sep 19, 2008, at 10:04 PM, Christian Heimes wrote:
 Andrew Dalke wrote:
 There are a few things that Python-the-language guarantees are singleton
 objects which can be compared correctly with is.

 The empty tuple () and all interned strings are also guaranteed to be
 singletons.

Where's the guarantee?  As far as I know it's not part of
Python-the-language, and I thought it was only an implementation
detail of CPython.

tupleobject.c says:

PyTuple_Fini(void)
{
#if PyTuple_MAXSAVESIZE > 0
 /* empty tuples are used all over the place and applications may
  * rely on the fact that an empty tuple is a singleton. */
 Py_XDECREF(free_list[0]);
 free_list[0] = NULL;

 (void)PyTuple_ClearFreeList();
#endif
}

but that doesn't hold under Jython 2.2a1:


Jython 2.2a1 on java1.4.2_16 (JIT: null)
Type copyright, credits or license for more information.
  () is ()
0
  1 is 1
1



 String interning is used to optimize code on C level. It's
 much faster to compare memory addresses than objects. All strings
 can be interned through the builtin function intern like s = intern(s).
 For Python 3.x the function was moved into the sys module and changed to
 support str which are PyUnicode objects.

intern is listed in the documentation under
 http://docs.python.org/lib/non-essential-built-in-funcs.html

 2.2 Non-essential Built-in Functions

 There are several built-in functions that are no longer
 essential to learn, know or use in modern Python programming.
 They have been kept here to maintain backwards compatibility
 with programs written for older versions of Python.




Again, I think this is only an aspect of the CPython implementation.



 The Python core makes no difference between quiet NaNs and signaling
 NaNs.

Based on my limited readings just now, it seems that that's the general
consensus:

   http://www.open-std.org/jtc1/sc22/wg14/www/docs/n965.htm
   Standard C only adopted Quiet NaNs. It did not adopt Signaling
   NaNs because it was believed that they are of too limited
   utility for the amount of work required.

   http://www.digitalmars.com/d/archives/digitalmars/D/signaling_NaNs_and_quiet_NaNs_75844.html
   Signaling NaNs have fallen out of favor. No exceptions get raised
   for them.

   http://en.wikipedia.org/wiki/NaN
   There were questions about if signalling NaNs should continue to be
   required in the revised standard. In the end it appears they will
   be left in.



 We discussed the possibility of a NaN singleton during
 our revamp of Python's IEEE 754 and math support for Python 2.6 and 3.0.
 But we decided against it because the extra code and cost wasn't worth
 the risks. Instead I added isnan() and isinf() to the math module.

I couldn't find that thread.  What are the advantages of converting
all NaNs to a singleton?  All I can come up with are disadvantages.

BTW, another place to look is the Decimal module

  import decimal
  decimal.Decimal("nan")
Decimal("NaN")
 

Looking at the decimal docs now I see a canonical() method which

The result has the same value as the operand but always
uses a canonical encoding. The definition of canonical
is implementation-defined; if more than one internal
encoding for a given NaN, Infinity, or finite number
is possible then one ‘preferred’ encoding is deemed
canonical. This operation then returns the value using
that preferred encoding.
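
A short sketch of how a Decimal NaN behaves, using only methods from the decimal module (is_nan() and canonical()). As with floats, a quiet Decimal NaN does not compare equal to itself, so the value-based is_nan() test is the reliable check.

```python
from decimal import Decimal

d = Decimal("NaN")

is_nan = d.is_nan()
# '==' on a quiet NaN is False, even when comparing the object with itself.
equals_itself = (d == d)

# canonical() returns the value in its canonical encoding; it is still a NaN.
canonical = d.canonical()
still_nan = canonical.is_nan()

print(is_nan, equals_itself, still_nan)
```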



Andrew
[EMAIL PROTECTED]

