Steven D'Aprano <ste...@remove.this.cybersource.com.au> wrote:

> If I wanted a reference to a list, I'd expect to *dereference* the
> reference to get to the list. That's not what Python forces you do to:
> you just use the list as the list object itself.

That's odd. No, you give a reference to the list to a function, and the
function messes with the list for you.

> This isn't hard people. Stop confusing the implementation details of
> how CPython works under the hood with Python level code.

I'm not confusing anything. This is conceptual-model stuff, not
implementation details. (Further discussion below.)

> In Python, [1, 2, 3] is a list, not a reference to a list.

    In [1]: l = [1, 2, 3]

    In [2]: l[1] = l

    In [3]: l
    Out[3]: [1, <Recursion on list with id=3079286092>, 3]

Now, if you're right, then l directly contains itself and some other
stuff, which is obviously absurd: it's a list, not a Tardis. If I'm
right, then l is bound to a reference to a list which contains three
references, two of them to integers and a third to the list itself.
This doesn't seem absurd at all any more.

Variables are not the only places where sharing can occur! Explaining
this is much harder if you don't start from the idea that all you're
doing is carting uniformly shaped references about.

Another example:

    In [1]: a = [1, 2, 3]

    In [2]: b = (a, a)

    In [3]: b
    Out[3]: ([1, 2, 3], [1, 2, 3])

    In [4]: a[1] = 0

    In [5]: b
    Out[5]: ([1, 0, 3], [1, 0, 3])

If b is a tuple containing two copies of a, then this shouldn't have
happened. The only satisfactory explanation is that the tuple that b
refers to actually contains two references to the same list, so when I
mutate that list, the change shows up twice.

> When you pass [1, 2, 3] to a function, the function sees the list you
> passed it. The function doesn't see "a reference to a list", it sees a
> list:
>
> >>> def func(x):
> ...     print type(x)
> ...
> >>> func([1, 2, 3])
> <type 'list'>

It sees the reference. `type' sees the reference. `type' digs the type
of the object out of the reference, and returns you a reference to the
type.

> It's so easy, some people refuse to believe it could be that easy, and
> insist on complicating matters by bring the implementation details into
> the discussion.
> Just stop, please. References belong in the
> *implementation*, nothing to do with Python level code.

I'm not getting into implementation details. I'm presenting a mental
model. The `they're objects: they contain other objects' model is
invalidated when you create circular or shared structures, as I've shown
above.

There are other epicyclic explanations you could invent to explain
sharing, maybe -- like keeping lists of clones, and magically updating
all the clones whenever one is mutated. That may even be a valid
implementation for a distributed Python (with cached copies of objects
and a cache-coherency protocol and all that), but it makes a rotten
mental model. And it still doesn't explain circularity.

Internally, Tcl uses pointers to values in its implementation. The
common currency inside the Tcl interpreter is a Tcl_Obj *. (It used to
be a char *, before Tcl 8.) So Tcl could easily offer the same
semantics as Python and friends.

But there's a twist. Tcl does copy-on-write. It's just impossible to
make a circular value in Tcl, and sharing doesn't happen. For example,
here's a snippet of a tclsh session.

    % set l {a b c}
    a b c
    % lreplace $l 1 1 $l
    a {a b c} c

Tcl really /can/ be explained without talking about references. The
existence of Tcl_Obj, and its strange dual-ported nature (it contains a
string and an internal representation, and lazily updates one from the
other, and uses the string in order to allow changes of internal
representation as necessary) really is an implementation detail, and
it's possible to have a full understanding of the behaviour of Tcl
programs without knowing about it.

This is just impossible with Python. Reference semantics pervade the
language.

> In Python code, there are no references and no dereferencing.

You're right! But the concept is essential in understanding the
semantics of the language.
Even though no references are explicitly made, and no dereferencing
explicitly performed, these things are done repeatedly under the
covers -- and failure to understand that will lead to confusion.

> > Python decided that all values are passed around as and manipulated
> > through references.
>
> Oh really? Then why can't I write a Python function that does this?
>
> x = 1
> y = 2
> swap(x, y)
> assert x == 2
> assert y == 1

Because the function is given references to the objects. It's not given
references to /your/ references to those objects. Therefore it can't
modify /your/ references, only its own.

Pedantic answer:

    def swap(hunoz, hukairz):
        global x, y
        x, y = y, x

(Maybe nonlocal for Python 3.)

> You can't, because Python doesn't have references. In a language with
> references, that's easy. Here's an untested Pascal version for swap:
[snip]

That's call-by-reference, which is a different thing. Python, like
Lisp, Scheme, Javascript, ML, Haskell, Lua, Smalltalk, Erlang, Prolog,
Java, and indeed C, does call-by-value exclusively.

This deserves to be called out as a display:

    Python passes references by value.

By contrast, Pascal (sometimes) passes values by reference!

The terminology is admittedly confusing, because it comes from different
places. If you don't like me talking about Python having references,
then pretend I've been saying `pointer' instead, all the way through. I
think `pointer' and `reference' are synonymous in this context, attempts
by Stroustrup to confuse everybody notwithstanding. But `pointer' has
unhelpful connotations:

  * pointers are more usually values rather than strange
    behind-the-scenes things, e.g., in C;

  * pointers, probably because of C, are associated with scariness and
    non-safe-ness; and

  * talking of pointers does seem like it's getting towards
    implementation details, and I wanted to avoid that.

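For what it's worth, `passes references by value' can be poked at
directly. What follows is only a minimal sketch, assuming Python 3; the
names swap, rebind and mutate are invented for illustration and don't
come from anywhere in this thread.

```python
def swap(p, q):
    # Swaps only the function's own two references; the caller's
    # variables are never touched.
    p, q = q, p


def rebind(lst):
    # Rebinding the parameter points the function's reference at a new
    # list. The caller's reference still points at the old one.
    lst = [9, 9, 9]
    return lst


def mutate(lst):
    # Mutating through the reference changes the single shared list
    # object, so every holder of a reference to it sees the change.
    lst[1] = 0


x, y = 1, 2
swap(x, y)
print(x, y)                  # 1 2

a = [1, 2, 3]
b = (a, a)                   # two references to the same list

new = rebind(a)
after_rebind = list(a)       # snapshot of a after rebind()
print(after_rebind)          # [1, 2, 3]

mutate(a)
same_object = b[0] is b[1] is a
print(a)                     # [1, 0, 3]
print(b)                     # ([1, 0, 3], [1, 0, 3])
print(same_object)           # True
```

The rebinding inside swap and rebind never reaches the caller, while the
mutation in mutate shows up through every reference to the list,
including both slots of b.
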
Anyway, call-by-value means that the function gets given copies of the
caller's arguments; call-by-reference means that the function gets told
where the caller's arguments are. In other words, call-by-reference
introduces /another/ level of indirection between variables and values.

> > (There are no locatives: references are not values,
>
> But references *are* locatives.

No! A locative is a /reified/ reference -- a /value/ that /is/ a
reference. That is, a locative is a reference, but a reference need not
be a locative. (A sheep is a mammal, but not all mammals are sheep.)

The Lisp Machine had real locatives. You dereferenced them using CAR
and RPLACA -- most unpleasant. In modern Lisp systems they seem to have
died a death, probably because making them work with fancy things like
resizing arrays when you don't have invisible pointers is too painful.

> No no no, lists and tuples store *objects*. Such storage happens to be
> implemented as pointers in CPython, but that's an irrelevant detail at
> the level of Python.

Uh-uh. There's no way that an object can store itself and still have
room left over. The idea is just crazy. It can't possibly fit. (Axiom
of foundation, if you're into that funky stuff.) If you say `no, but it
can store a reference to itself', then everything makes sense.

It's completely uniform, and not very scary. Lisp people draw these
box-and-pointer diagrams all over the place

    +-----+-----+
    |  *  |  *------> NIL
    +--|--+-----+
       |
       v
       3

to show how the data model works. Of course, the Lisp data model is
/exactly the same as Python's/.

After a while, the diagrams get less pedantic, and tend to show values
stashed /in/ the cons cells.
In fact, this is actually closer to most implementations for small
things like fixnums, characters and flonums, because they're represented
by stashing the actual value with some special not-really-a-pointer tag
bits -- but the program can't tell, so this doesn't need to be part of
your mental model until you start worrying about performance hacking and
why your program is consing so much.

-- [mdw]
--
http://mail.python.org/mailman/listinfo/python-list