Re: [pypy-dev] Object identity and dict strategies

Bengt Richter Sun, 10 Jul 2011 06:50:32 -0700

On 07/09/2011 10:17 AM Armin Rigo wrote:

Hi,


On Sat, Jul 9, 2011 at 5:20 AM, William ML Leslie
<william.leslie....@gmail.com>  wrote:

My point about small integers (...)


I think that your point about small integers is broken (even assuming
that smallints are enabled by default, which is not the case).  It
means that we'd get an id() function which "behaves" as long as taking
the id() of a 31-bit signed integer, and then doesn't behave as
expected neither for full 32-bit integers, nor for longs, nor for
floats.

This happens to work for larger integers and floats, because the id
will be preserved as long as a reference to them exists by their
boxedness.
Floats could use a similar mechanism to integers, eg, their bit
representation shifted left two and bitwise-ored with 2.


I don't understand these two sentences, because they seem to say the
exact opposite of each other...

  That does mean that id() is no longer word-sized, but it does not make it
unbounded.


The "unbounded" part in my e-mail was about longs.  Obviously if you
are computing id(x) where x is in some finite set (like ints or
floats), then the results are in some finite set too.

What is keeping us from using the underlying rpython string as the
basis for id?


This is probably a good enough workaround for strings and unicodes.

Speaking of, maybe it'd be easier to attempt to get the identity dict
into the language proper.


We tried at some point, but python-dev refused the idea.  Maybe the
idea has more chances for approval now that we can really show with
performance numbers that it's a deep issue, as opposed to just wave
our hands.  Feel free to try again.  In the meantime I've decided, at
least myself, to stick with the approach that Python is whatever is in
2.7, and that we have to work around such issues instead of fixing
them properly in the language.

Does that mean you have to follow the vagaries of the 2.7 compiler optimizations
(or lack of them, as the case may be) that make for results like these?:
(note fresh 2.7.2 build below ;-)
[& BTW kudos to those who made it easy with the tarball and config, make]

[10:30 ~]$ python
Python 2.7.2 (default, Jul  8 2011, 23:38:53)
[GCC 4.1.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
  >>> from ut.miscutil import disev,disex
  >>> id(2000) == id(2000)
True
  >>> id(2000) == id(20*100)
False
  >>> disev('id(2000) == id(2000)')
    1           0 LOAD_NAME                0 (id)
                3 LOAD_CONST               0 (2000)
                6 CALL_FUNCTION            1
                9 LOAD_NAME                0 (id)
               12 LOAD_CONST               0 (2000)
               15 CALL_FUNCTION            1
               18 COMPARE_OP               2 (==)
               21 RETURN_VALUE
  >>> disev('id(2000) == id(20*100)')
    1           0 LOAD_NAME                0 (id)
                3 LOAD_CONST               0 (2000)
                6 CALL_FUNCTION            1
                9 LOAD_NAME                0 (id)
               12 LOAD_CONST               3 (2000)
               15 CALL_FUNCTION            1
               18 COMPARE_OP               2 (==)
               21 RETURN_VALUE
  >>>

Notice that 20*100 was folded to 2000, but a different constant was generated,
hence, presumably, the different id. Will the next update of 2.7 do the same?

BTW, disev above is just from a grabbag module of mine, amounting to

def disev(s):
      import dis
      return dis.dis(compile(s,'disev','eval'))

It doesn't have to be about integers:

  >>> id([]) == id([])
True
  >>> id(list()) == id(list())
False
  >>> disev('id([]) == id([])')
    1           0 LOAD_NAME                0 (id)
                3 BUILD_LIST               0
                6 CALL_FUNCTION            1
                9 LOAD_NAME                0 (id)
               12 BUILD_LIST               0
               15 CALL_FUNCTION            1
               18 COMPARE_OP               2 (==)
               21 RETURN_VALUE
  >>> disev('id(list()) == id(list())')
    1           0 LOAD_NAME                0 (id)
                3 LOAD_NAME                1 (list)
                6 CALL_FUNCTION            0
                9 CALL_FUNCTION            1
               12 LOAD_NAME                0 (id)
               15 LOAD_NAME                1 (list)
               18 CALL_FUNCTION            0
               21 CALL_FUNCTION            1
               24 COMPARE_OP               2 (==)
               27 RETURN_VALUE
  >>>

Of course, the is operator call will keep both references alive
on the stack, so whereas you get

 >>> id([]) == id([])
True
 >>> id([]), id([])
(3082823052L, 3082823052L)

Keeping the arguments alive simultaneously gives

 >>> (lambda x,y:(id(x),id(y)))([],[])
(3082823052L, 3082881516L)
 >>> (lambda x,y:(id(x)==id(y)))([],[])
False
 >>>

... assuming the lambda above approximates what `is' does.

Using id comparisons instead of `is' can look superficially a bit weird:

 >>> id([1]) == id([2])
True
 >>>

Bottom line: id does not seem to have a complete abstract definition[1][2],
so how may one define a 'correct' implementation?

ISTM id now (CPython 2.7.2) is more like a version-dependent debugger function
that can be useful in app hacks, but with caveats ;-)

Regards,
Bengt Richter
[1] http://docs.python.org/reference/datamodel.html#index-824
[2] http://docs.python.org/library/functions.html#id

PS. How does the following cachingid.py compare
to your idea of what id should ideally do? (not tested except example)
(Obviously it is just a concept implementation, not resource efficient)
________________________________________

refval = id # for using old id
def id(x, objcache = {}):
    if type(x) in (int,float,bool,tuple,str,unicode): # etc
        t = (type(x), x)
        return refval(objcache.setdefault(t, t)[1])
    else:
        return refval(x) # as now? XXX what is abstract meaning of 
id(some_mutable)?
________________________________________

It tries to return the ordinary id of the first instance of equivalent 
immutable objects
passed to it, and otherwise uses the old id, which I'm not finished thinking 
about ;-)

At least it fixes the difference between id(2000) and id(20*100):

>>> oldid=id
>>> from ut.cachingid import id # ut is just my utility collection
>>> oldid(2000)==oldid(2000)
True
>>> oldid(2000)==oldid(20*100)
False
>>> id(2000)==id(2000)
True
>>> id(2000)==id(20*100)
True
>>> id.func_defaults
({(<type 'int'>, 2000): (<type 'int'>, 2000)},)
>>>


A bientôt,

Armin.



_______________________________________________
pypy-dev mailing list
pypy-dev@python.org
http://mail.python.org/mailman/listinfo/pypy-dev

Re: [pypy-dev] Object identity and dict strategies

Reply via email to