On Sun, Nov 10, 2013 at 1:37 AM, Roy Smith <r...@panix.com> wrote:
> In article <mailman.2283.1383985583.18130.python-l...@python.org>,
> Chris Angelico <ros...@gmail.com> wrote:
>
>> Some languages [intern] automatically for all strings, others
>> (like Python) only when you ask for it.
>
> What does "only when you ask for it" mean?

You can explicitly intern a Python string with the sys.intern()
function, which returns either the string itself or an
indistinguishable "interned" string. Interning two equal strings
gives you the same object:

>>> import sys
>>> foo = "asdf"
>>> bar = "as"
>>> bar += "df"
>>> foo is bar
False

Note that the Python interpreter is free to answer True there, but
there's no mandate for it.

>>> foo = sys.intern(foo)
>>> bar = sys.intern(bar)
>>> foo is bar
True

Now it's mandated. The two strings must be the same object.

When you know both strings are interned, string equality comes down
to an 'is' check, which is potentially a lot faster than an actual
string comparison. Some languages (e.g. Pike) do this automatically
with all strings: the construction of any string includes checking to
see if it's a duplicate of any other string. This adds cost to string
manipulation and speeds up string comparisons; since the engine knows
that all strings are interned, it can do the equivalent of an 'is'
check for any string equality.

So what I meant, in terms of storage/representation efficiency, is
that you can store duplicate strings very efficiently if you simply
increment the reference counts of the same few objects. Python won't
necessarily do that for you; check the memory usage of something like
this:

strings = [open("some_big_file").read() for _ in range(10000)]

And compare against this:

strings = [sys.intern(open("some_big_file").read()) for _ in range(10000)]

In a language that guarantees string interning, the first form would
have the memory consumption of the second. Whether that memory saving
and the faster equality comparison are worth the effort of
dictionarification is one of those eternally-debatable points.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list
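
The sharing effect described above can also be seen on a small scale without a big file. What follows is only a minimal sketch, assuming CPython 3: build_strings() and the sample text are made up for illustration, and the count of distinct objects in the non-interned case is an implementation detail rather than a guarantee.

```python
import sys


def build_strings(count, intern):
    """Build `count` equal strings at runtime, optionally interning each.

    Hypothetical helper for illustration only.
    """
    result = []
    for _ in range(count):
        # Join at runtime so the string is constructed afresh each time,
        # rather than reusing a single compile-time constant.
        s = "".join(["as", "df", " and some more text"])
        result.append(sys.intern(s) if intern else s)
    return result


plain = build_strings(1000, intern=False)
interned = build_strings(1000, intern=True)

# Count how many separate string objects each list refers to.
distinct_plain = len({id(s) for s in plain})
distinct_interned = len({id(s) for s in interned})

print("distinct objects without interning:", distinct_plain)
print("distinct objects with interning:   ", distinct_interned)
```

With interning, every element of the list is a reference to one shared object, so the list costs one string plus the references. Without it, the interpreter is allowed to keep a separate copy per element, which is where the extra memory goes.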