On Thu, 03 Apr 2008 19:27:47 -0300, <[EMAIL PROTECTED]> wrote:

> Hi all,
>
> I've been playing around with the identity function id() for different
> types of objects, and I think I understand its behaviour when it comes
> to objects like lists and tuples in which case an assignment r2 = r1
> (r1 refers to an existing object) creates an alias r2 that refers to
> the same object as r1. In this case id(r1) == id(r2) (or, if you
> like: r1 is r2). However for r1, r2 assigned as follows: r1 = [1, 2,
> 3] and r2 = [1, 2, 3], (r1 is r2) is False, even if r1==r2,
> etc. ...this is all very well. Therefore, it seems that id(r) can be
> interpreted as the address of the object that 'r' refers to.
>
> My observations of its behaviour when comparing ints, floats and
> strings have raised some questions in my mind, though. Consider the
> following examples:
>
> #########################################################################
>
> # (1) turns out to be true
> a = 10
> b = 10
> print a is b

...only because CPython happens to cache small integers and always returns
the same object. Try again with 10000. This is just an optimization, and
the actual range of cached integers, or whether they are cached at all, is
implementation (and version) dependent.
(As integers are immutable, the optimization *can* be done, but that
doesn't mean that all immutable objects are always shared.)

> # (2) turns out to be false
> f = 10.0
> g = 10.0
> print f is g

Because the above optimization isn't used for floats.
The `is` operator checks object identity: whether both operands are the
very same object (*not* a copy, and not merely equal: the *same* object).
("Identity" is a primitive concept.)
The only way to guarantee that you are talking about the same object is to
use a reference to a previously created object. That is:

    a = some_arbitrary_object
    b = a
    assert a is b

The name `b` now refers to the same object as the name `a`; the assertion
holds for whatever object it is.

In other cases, like (1) and (2) above, the literals are just handy
constructors for int and float objects. You have two objects constructed
(a and b, f and g). Whether they are identical or not is not defined; they
might be the same, or not, depending on unknown factors that might include
the moon phase; both alternatives are valid Python.

> # (3) checking if ids of all list elements are the same for different
> cases:
>
> a = 3*[1]; areAllElementsEqual([id(i) for i in a]) # True
> b = [1, 1, 1]; areAllElementsEqual([id(i) for i in b]) # True
> f = 3*[1.0]; areAllElementsEqual([id(i) for i in f]) # True
> g = [1.0, 1.0, 1.0]; areAllElementsEqual([id(i) for i in g]) # True
> g1 = [1.0, 1.0, 0.5+0.5]; areAllElementsEqual([id(i) for i in g1]) #
> False

Again, this is implementation dependent. If you try with a different
Python version or a different implementation you may get other results -
and that doesn't mean that any of them is broken.
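
A minimal sketch of the points above, written with Python 3 `print()` syntax (the quoted examples use the Python 2 form). The helper name `same_object` is hypothetical and introduced only for illustration. Only the alias case and the two-separate-lists case are guaranteed by the language; the float comparison is computed but deliberately not printed, since either result would be valid.

```python
# Identity (`is`) versus equality (`==`).

def same_object(x, y):
    """Hypothetical helper: True when both arguments are one and the same object."""
    return x is y

r1 = [1, 2, 3]
r2 = r1            # alias: no new object is created
r3 = [1, 2, 3]     # a second list display always builds a new list object

alias_is_same = same_object(r1, r2)       # guaranteed True
equal_lists_same = same_object(r1, r3)    # guaranteed False (two distinct lists)
equal_lists_equal = (r1 == r3)            # True: same contents

# Build two equal floats at run time, so no compile-time constant sharing applies.
f = float("10.0")
g = float("10.0")
floats_equal = (f == g)                   # True
floats_same = same_object(f, g)           # unspecified: do not rely on either answer

print(alias_is_same, equal_lists_same, equal_lists_equal, floats_equal)
# prints: True False True True
```

The practical rule that follows: compare values with `==`, and keep `is` for cases where identity itself is the question (for example `x is None`).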

> # (4) two equal floats defined inside a function body behave
> differently than case (1):
>
> def func():
>     f = 10.0
>     g = 10.0
>     return f is g
>
> print func() # True

Another implementation detail, related to co_consts: CPython's compiler
stores equal constants of a single code object only once, so both names
end up bound to the same float. You shouldn't rely on it.

> I didn't mention any examples with strings; they behaved like ints
> with respect to their id properties for all the cases I tried.

You didn't try hard enough :)

py> x = "abc"
py> y = ''.join(x)
py> x == y
True
py> x is y
False

Long strings behave like big integers: they aren't cached:

py> x = "a rather long string, full of garbage. No, this isn't garbage, just non sense text to fill space."
py> y = "a rather long string, full of garbage. No, this isn't garbage, just non sense text to fill space."
py> x == y
True
py> x is y
False

As always: you have two statements constructing two objects. Whether they
return the same object or not is not defined.

> While I have no particular qualms about the behaviour, I have the
> following questions:
>
> 1) Which of the above behaviours are reliable? For example, does a1 =
> a2 for ints and strings always imply that a1 is a2?

If you mean:

    a1 = something
    a2 = a1
    a1 is a2

then, from my comments above, you should be able to answer: yes, always,
not restricted to ints and strings.

If you mean:

    a1 = someliteral
    a2 = someliteral
    a1 is a2

then: no, it isn't guaranteed at all, not even for small integers or
strings.

> 2) From the programmer's perspective, are ids of ints, floats and
> string of any practical significance at all (since these types are
> immutable)?

The same significance as id() of any other object... mostly, none, except
for debugging purposes.

> 3) Does the behaviour of ids for lists and tuples of the same element
> (of type int, string and sometimes even float), imply that the tuple a
> = (1,) takes (nearly) the same storage space as a = 10000*(1,)? (What
> about a list, where elements can be changed at will?)

That's a different thing.
A tuple maintains only references to its elements (as does any other
container in Python). The memory required for a tuple (I'm talking about
CPython exclusively) is: (a small header) + n * sizeof(pointer). So the
expression 10000*(anything,) will take more space than the singleton
(anything,) because the former requires space for 10000 pointers and the
latter for just one.

You also have to take into account the memory for the elements themselves;
but in both cases there is a *single* object referenced, so it doesn't
matter. Note that it doesn't matter whether that single element is an
integer, a string, a mutable or an immutable object: it's always the same
object, already existing, and creating that 10000-tuple just increments
its reference count by 10000.

The situation is similar for lists, except that, being mutable containers,
they're over-allocated (to have room for future expansion). So the list
[anything]*10000 has a size somewhat larger than 10000*sizeof(pointer);
its (only) element increments its reference count by 10000.

-- 
Gabriel Genellina

-- 
http://mail.python.org/mailman/listinfo/python-list
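
The tuple and list sizes described in the reply can be checked roughly with `sys.getsizeof`, which reports the size of the container alone (header plus pointers), not of the elements it refers to. This is a sketch that assumes CPython; the exact numbers vary with version and platform, so none are shown. The names `anything`, `one`, `many` and `per_slot` are illustrative only.

```python
import sys

anything = object()              # placeholder element; its type does not matter

refs_before = sys.getrefcount(anything)

one = (anything,)
many = 10000 * (anything,)

# Container sizes only: a small header plus one pointer per slot.
size_one = sys.getsizeof(one)
size_many = sys.getsizeof(many)
per_slot = (size_many - size_one) / 9999.0   # roughly sizeof(pointer) on CPython

# Every slot refers to the very same, already existing object.
all_same = all(item is anything for item in many)

# Building the tuples added references to that one object (CPython detail).
refs_after = sys.getrefcount(anything)

# A list of the same length, for comparison with a one-element list.
size_list_one = sys.getsizeof([anything])
size_list_many = sys.getsizeof([anything] * 10000)

print(size_one, size_many, per_slot, all_same)
print(refs_before, refs_after, size_list_one, size_list_many)
```

On a typical 64-bit CPython build `per_slot` should come out close to 8 bytes, but that figure is an expectation about one implementation, not something the language promises.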