On 8 July 2011 19:52, Armin Rigo <ar...@tunes.org> wrote: > Hi William,
Hi Armin, everybody, > On Fri, Jul 8, 2011 at 10:31 AM, William ML Leslie > <william.leslie....@gmail.com> wrote: >> On another note: what Alex talks about as being two different cases >> are just one with the small int optimisation - all references can be >> compared by value in the C backend with small ints enabled, if the >> object space doesn't provide alternative behaviour. > > No, because e.g. of longs. You don't have the same issue if you use > longs instead of ints in this particular example, but more generally > the issue exists too. > > A first note is that it's impossible to satisfy all three of Alex's > criteria in general: we would like id(x) to be a unique word-sized > number determined only by the value of 'x', and different for every > value of 'x' and for every object of a different type too; but if the > possible 'x'es are all possible word-sized integers, then it's > impossible to satisfy this, just because there are too many of them. > The problem only gets "more impossible" if we include all long objects > in 'x'. That id(x) be a word-sized value uniquely determined by the value of x is impossible, yes, as the following program should demonstrate: while True: id([]) We create an infinite number of objects here, and if each one had to have a unique word-sized id, we'd exhaust that space pretty quickly. What id() does provide is that as long as there is a *reference* to the object, it returns a constant and unique integer (which we are assuming should be a word-sized integer). As later emails on this list suggested, it would be better if the semantics were relaxed to "the id should be preserved", meaning that placement of an integer or string into a container should allow it to have the same id on retrieval. New objects resulting from operations such as multiplication or string concatenation need not have the same id as other objects with the same value - if we did that for strings, it would have serious consequences for parallel code. What I was suggesting is that, since every live object must be encoded in an object reference somewhere, that object references should be at least good enough to suggest an id. My point about small integers wasn't really about word-sized ints, I was talking about the smallint optimisation, which as I understood it, boxes small app-level integers in the C backend. It does this by shifting integers right and bitwise-oring with 1; encoding the integer into the reference. "By definition", as Carl put it, there are never more objects represented in this way than we can fit in a reasonable, bounded id size. The suggestion then is to use the value of the object reference cast as a word-sized integer as the id of the object for integers so encoded, and calculate the id of other objects in the usual fashion. This happens to work for larger integers and floats, because the id will be preserved as long as a reference to them exists by their boxedness. Floats could use a similar mechanism to integers, eg, their bit representation shifted left two and bitwise-ored with 2. That does mean that id() is no longer word-sized, but it does not make it unbounded. > The problem is not new, it is just a bit more apparent. For example, > already in pypy 1.5 we have: > >>>>> a = A() >>>>> d = a.__dict__ >>>>> s = 'foobar' >>>>> d[s] = 5 >>>>> id(s) > 163588812 >>>>> id(d.keys()[0]) > 163609508 >>>>> id(d.keys()[0]) > 163609520 What is keeping us from using the underlying rpython string as the basis for id? I guess there is nowhere obvious to store the id or the need to store the id in the gc copy phase. In that way, it'd make for an ugly special case. On 9 July 2011 00:07, Armin Rigo <ar...@tunes.org> wrote: > After some discussion with Carl Friedrich, here is the best we could > think of. Say that "a is b" is redefined as being True for two equal > immutable objects of the same type. Then we want the following > property to always hold: "a is b" if and only if "id(a) == id(b)". I would prefer to weaken this just a little: x is y iff id(x) == id(y) WHEN x and y are live for the duration of the equality. This counters cases such as: id([]) == id([]) which can certainly happen under any reasonable definition of id. > We > can do that by having id(x) return either a regular integer or a long. > Let's call for the rest of this discussion an "immutable object" an > object of type int, long or float. If 'x' is not an immutable object, > then id(x) can return the same value as it does now. If 'x' is an > immutable object, then we compute from it a long value that does *not* > fit in 32- or 64-bit, and that includes some tagging to make sure that > immutable objects of different types have different id's. For > example, id(7) would be (2**32 + 7<<3 + 1). This makes the range of id() unbounded, where its domain is bounded. I don't feel comfortable with it, although I know that isn't much of an objection. > Such a solution should make two common use cases of id() work: > > 1. as keys in a dictionary, to implement an identity dict, like in > copy.py: in this case we take the id() of random objects including > immutable ones, but only expect the result to work as keys in the > dictionary. Getting arbitrarily-sized longs is not a problem here. Speaking of, maybe it'd be easier to attempt to get the identity dict into the language proper. William Leslie _______________________________________________ pypy-dev mailing list pypy-dev@python.org http://mail.python.org/mailman/listinfo/pypy-dev