On 07/11/2011 04:21 PM William ML Leslie wrote:
On 11 July 2011 23:21, Bengt Richter<b...@oz.net>  wrote:
On 07/11/2011 01:36 PM William ML Leslie wrote:

On 11 July 2011 20:29, Bengt Richter<b...@oz.net>    wrote:

On 07/10/2011 09:13 PM Laura Creighton wrote:

What do we want to happen when somebody -- say in a C extension -- takes
the id of an object
that is scheduled to be removed when the gc next runs?

IMO taking the id should increment the object ref counter
and prevent the garbage collection, until the id value itself is garbage
collected.

This significantly changes the meaning of id() in a way that will
break existing code.

Do you have an example of existing code that depends on the integer-cast
value of a dangling pointer??

I mean that id creating a reference will break existing code.  id()
has always returned an integer, and the existence of some integer in
some python code has never prevented some otherwise unrelated object
from being collected.  Existing code will not make sure that it cleans
up the return value of id(), as nowhere has id() ever kept a reference
to the object passed in.
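To make that point concrete, here is a minimal sketch showing that the integer returned by id() does not keep its object alive. A weak reference observes the object and shows it disappearing once the last real reference is dropped. The `Node` class is a hypothetical stand-in for any weak-referenceable object.

```python
import gc
import weakref


class Node(object):
    """Hypothetical stand-in; any weak-referenceable object will do."""
    pass


obj = Node()
watcher = weakref.ref(obj)   # observes obj without keeping it alive
ident = id(obj)              # just an int; it holds no reference to obj

del obj                      # drop the only strong reference
gc.collect()                 # CPython frees obj at once; PyPy needs a collection

still_alive = watcher() is not None
print(isinstance(ident, int), still_alive)   # True False
```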
Ok, d'oh ;-/

I was focused on making sure the id value "referred" to an existing live object
*when returned from id* (it is of course live when passed to id, while bound in
id's argument). But if that is the *only* binding, then the object is *guaranteed*
to be garbage when id returns the integer, and thus that integer is IMO meaningless
except as a debugging peek at the implementation, and it would be an *error* for a
program to depend on its value.

[10:12 ~]$ python -c 'import this'|grep -A1 Errors
Errors should never pass silently.
Unless explicitly silenced.

You are right that existing code could, and some probably would, break if id
guaranteed the validity of the integer by holding the object, so I will go with the
first alternative I mentioned in my reply to Armin and focus on preventing the return
of the id of garbage, rather than the "or else..." option, which is impractical and
likely to break code, as you say.

<excerpt pasted as quote>
Letting the expression result die and returning a kind of pointer
to where the result object *was* seems like a dangling pointer problem,
except I guess you can't dereference an id value (without hackery).

Maybe id should raise an exception if the object referenced by the argument has
a ref count of only 1 (i.e., just the reference from the argument list)?

Or else let id be a class and return a minimal instance only binding
the passed object, and customize the compare ops to take into account
type diffs etc.? Then there would be no id values without corresponding
objects, and id values used in expressions would die a natural death,
along with their references to their objects -- whether "variables"
or expressions.

Sorry to belabor the obvious ;-)
</excerpt>
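The id-as-object idea in the excerpt could be sketched roughly as below. `IdToken` is a hypothetical name, not an existing API. It uses rich comparison methods rather than `__cmp__`, and a token is never equal to a plain integer, which is exactly the language change objected to in the reply.

```python
class IdToken(object):
    """Hypothetical id-as-object: holds the object it identifies, so it can
    never describe garbage."""
    __slots__ = ("_obj",)

    def __init__(self, obj):
        self._obj = obj              # keeps obj alive as long as the token lives

    def __eq__(self, other):
        if not isinstance(other, IdToken):
            return NotImplemented    # type differences: Python falls back to "unequal"
        return self._obj is other._obj

    def __ne__(self, other):         # needed on Python 2; harmless on Python 3
        result = self.__eq__(other)
        if result is NotImplemented:
            return result
        return not result

    def __hash__(self):
        return hash(id(self._obj))   # stable while the token (and so obj) lives

    def __repr__(self):
        return "IdToken(0x%x)" % id(self._obj)


shared = []
print(IdToken([]) == IdToken([]))           # False: two distinct lists, both alive
print(IdToken(shared) == IdToken(shared))   # True: same object
print(IdToken(shared) == id(shared))        # False: a token is not an int
```

Because each token keeps its object alive, a comparison of two tokens can never be fooled by an address that was freed and reused in between.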

Rather than an exception, perhaps returning None would suffice, analogous
to a null pointer where no valid pointer can be returned. That should be cheap.

It could also be used in answer to Laura's question, to which I only proposed
the impractical id object.


I know that you are suggesting that id returns something that is /not/
an integer, but that is also a language change.  People have always
been able to assume that they can % format ids as decimals or
hexadecimals.
I thought of subclassing int, but I was reaching for an id abstraction more
than a practical thing, sorry. But never mind the id-as-object idea for current
Python ;-)

Or do you mean that id's must be allowed to be compared == to integers,
which my example prohibits? (I didn't define __cmp__, BTW, just lazy ;-)

Good, __cmp__ has been deprecated for over 10 years now.

The only sensible sort on id's I can think of off hand would be if id's carried
a time stamp.

If you want an object reference, just use one.  If you want them to be
persistent, build a dictionary from id to object.

Yes, dictionary is one way to bind an object and thus make sure its id is
valid.
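A minimal sketch of the dictionary approach follows. `IdRegistry` and its methods are hypothetical names. Because the dict holds a strong reference, each recorded id keeps naming a live object until it is released.

```python
class IdRegistry(object):
    """Hypothetical registry: map id(obj) -> obj so recorded ids stay valid."""

    def __init__(self):
        self._objects = {}

    def register(self, obj):
        key = id(obj)
        self._objects[key] = obj     # strong reference: obj cannot be collected
        return key

    def lookup(self, key):
        return self._objects[key]    # KeyError if key was never registered

    def release(self, key):
        del self._objects[key]       # obj may now be collected; key may be reused


registry = IdRegistry()
key = registry.register([1, 2, 3])   # the temporary list is kept alive
print(registry.lookup(key))          # [1, 2, 3]
registry.release(key)
```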

But it would be overkill to use a dictionary to guarantee object id persistence
just for the duration of an expression such as id(x.a) == id(y.a).
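When x.a and y.a are stored attributes, x and y keep the values alive and `id(x.a) == id(y.a)` is meaningful. It only misbehaves when the expression yields temporaries. The sketch below assumes a hypothetical `Box` class whose attribute is computed afresh on every access.

```python
class Box(object):
    """Hypothetical class whose attribute is a new list on every access."""
    @property
    def a(self):
        return [1, 2, 3]


x, y = Box(), Box()

# Each x.a / y.a here is a temporary. The first list can be freed before the
# second is created, so the two ids may coincide (this often happens in CPython).
by_id = id(x.a) == id(y.a)

# 'is' keeps both operands alive for the comparison, so this is reliably False.
by_is = x.a is y.a

# Binding the values first also keeps both alive, making the ids comparable.
ax, ay = x.a, y.a
print(by_is, id(ax) == id(ay))   # False False
```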

But id is not about persistence. The lack of persistence is one of its
key features.

That said, I do think id()'s current behaviour is overkill.  I just
don't think we can change it in a way that will fit existing usage.
And cleaning it up properly is far too much work.

How about just returning None when id sees an object which no other
code will be able to see when id returns (hence making the integer
the id of garbage)?

<snip>

The definition of id(), according to docs.python.org, is:

Return the “identity” of an object. This is an integer (or long
integer) which is guaranteed to be unique and constant for this object
during its lifetime. Two objects with non-overlapping lifetimes may
have the same id() value.
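A small check of the two guarantees in that definition follows: uniqueness among simultaneously live objects and constancy over one object's lifetime. The definition promises nothing about objects whose lifetimes do not overlap, so that case is not tested.

```python
# While every object is kept alive (here, by the list), their ids must differ.
live = [object() for _ in range(1000)]
ids = [id(o) for o in live]
print(len(set(ids)) == len(ids))   # True: overlapping lifetimes, unique ids

# An object's id is constant for its lifetime.
first = id(live[0])
live.append(object())
print(id(live[0]) == first)        # True
```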

Hm, I couldn't find that, googling <a few strings from the above> site:python.org,
nor at site:docs.python.org. Maybe it is from a non-current version of the docs? But
never mind.

Also, a new id could live alongside the old ;-)

It's just that the problems you are attempting to fix are already
solved, and they are only vaguely related to what a python programmer
understands id() to mean.  If, according to cpython, "1003 is not 1000
+ 3", then programmers can't rely on any excellent new behaviour for
id() *anyway*.
My question to Armin was whether doing what cpython 2.7 does meant following
the vagaries of possible optimizations. E.g., if the storage of constants were
slightly modified, cpython would return False for "1003 is not 1000 + 3".
1000+3 is apparently already folded to a constant 1003, but local constants are
apparently allowed to be duplicated at present, as you can see in the
disassembly of your example:

>>> from ut.miscutil import disev
>>> 1003 is not 1000 + 3
True
>>> disev("1003 is not 1000 + 3")
  1           0 LOAD_CONST               0 (1003)
              3 LOAD_CONST               3 (1003)
              6 COMPARE_OP               9 (is not)
              9 RETURN_VALUE
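The `disev` helper from ut.miscutil is a private utility; the standard `dis` module gives a similar view. The sketch below also counts how many separate 1003 constants the compiler kept. Per the disassembly, CPython 2.7 keeps two. Newer CPython releases may merge equal constants, so the count and the final result can differ by implementation and version.

```python
import dis
import warnings

source = "1003 is not 1000 + 3"

# Python 3.8+ emits a SyntaxWarning for 'is' with a literal; silence it here.
with warnings.catch_warnings():
    warnings.simplefilter("ignore", SyntaxWarning)
    code = compile(source, "<expr>", "eval")

dis.dis(code)                        # bytecode listing, comparable to disev's output
copies = [c for c in code.co_consts if c == 1003]
print(len(copies))                   # separate 1003 constants kept by the compiler
print(eval(code))                    # True or False, depending on implementation/version
```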

It would seem you could generate quite a few equivalent constants:
>>> disev('[1000+3,1000+3,1000+3,1000+3,1000+3]')
  1           0 LOAD_CONST               2 (1003)
              3 LOAD_CONST               3 (1003)
              6 LOAD_CONST               4 (1003)
              9 LOAD_CONST               5 (1003)
             12 LOAD_CONST               6 (1003)
             15 BUILD_LIST               5
             18 RETURN_VALUE
which sooner or later someone will probably find a reason to optimize for space,
and what does that mean for the *"language"* definition of id?
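One way to see what such a space optimization would change: count the distinct objects behind five equal values held in one list. The list keeps all five alive, so their ids are valid. Per the disassembly, CPython 2.7 should give 5. An implementation that shares constants may give fewer. Code that compares values rather than identities is unaffected either way.

```python
values = eval("[1000+3, 1000+3, 1000+3, 1000+3, 1000+3]")

# The list keeps all five objects alive, so these ids are all valid.
distinct = len(set(id(v) for v in values))
print(distinct)                           # implementation/version dependent, 1..5

# Equality does not depend on how the compiler shares constants.
print(all(v == 1003 for v in values))     # True
```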


OTOH, the "identity may not even be preserved for primitive types"
issue is an observable difference to cpython and is fixable, even if
it is a silly thing to rely on.

Apparently the folding of expressions yielding e.g. small integers involves
generating a reference to the single instance.
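The sharing of small integers can be observed without any compile-time folding by building the values at run time with int(). CPython returns a cached object for small values (currently -5..256, an implementation detail) and a fresh object otherwise. Other implementations, PyPy included, may answer differently, so the expected output below applies to CPython only.

```python
import platform

is_cpython = platform.python_implementation() == "CPython"

# int() of a string builds its result at run time. Both operands of 'is' are
# alive during the comparison, so only a cache can make them the same object.
small_shared = int("100") is int("100")
large_shared = int("1003") is int("1003")

print(is_cpython, small_shared, large_shared)   # on CPython: True True False
```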

Hm. I downloaded pypy and it does optimize constant storage for 1003 is 1000+3

[11:03 ~]$ pypy

pypy: /usr/lib/libcrypto.so.0.9.8: no version information available (required by pypy)
pypy: /usr/lib/libssl.so.0.9.8: no version information available (required by pypy)
Python 2.7.1 (b590cf6de419, Apr 30 2011, 02:00:38)
[PyPy 1.5.0-alpha0 with GCC 4.4.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
And now for something completely different: ``psyco eats one brain per inch of progress''
>>>>
>>>> 1003 is 1000+3
True
>>>> from ut.miscutil import disev
>>>> disev('1003 is 1000+3')
  1           0 LOAD_CONST               0 (1003)
              3 LOAD_CONST               0 (1003)
              6 COMPARE_OP               8 (is)
              9 RETURN_VALUE
>>>>

Let's see what the id values are:

>>>> id(1003), id(1000+3)
(-1216202084, -1216202084)
>>>> disev('id(1003), id(1000+3)')
  1           0 LOAD_NAME                0 (id)
              3 LOAD_CONST               0 (1003)
              6 CALL_FUNCTION            1
              9 LOAD_NAME                0 (id)
             12 LOAD_CONST               0 (1003)
             15 CALL_FUNCTION            1
             18 BUILD_TUPLE              2
             21 RETURN_VALUE
>>>>

Vs cpython 2.7.2:

>>> id(1003), id(1000+3)  # different garbage ;-)
(136814932, 136814848)
>>> disev('id(1003), id(1000+3)  # different garbage ;-)')
  1           0 LOAD_NAME                0 (id)
              3 LOAD_CONST               0 (1003)
              6 CALL_FUNCTION            1
              9 LOAD_NAME                0 (id)
             12 LOAD_CONST               3 (1003)
             15 CALL_FUNCTION            1
             18 BUILD_TUPLE              2
             21 RETURN_VALUE
>>>

Of course, the id's are all still id's of garbage locations
once returned from id ;-)

So how about returning None instead of id's of garbage,
or raising an exception? Would that not be pythonic?

Regards,
Bengt Richter

_______________________________________________
pypy-dev mailing list
pypy-dev@python.org
http://mail.python.org/mailman/listinfo/pypy-dev
