On Tue, Sep 11, 2012 at 12:43 AM, Oscar Benjamin <oscar.j.benja...@gmail.com> wrote:
> On 2012-09-10, Oscar Benjamin <oscar.j.benja...@gmail.com> wrote:
>> I haven't looked at the source but my understanding was precisely that there
>> is an intern() bit and that not only the builtins module but all the literals
>> in any byte-compiled module are interned.
>>
>
> s/literals/identifiers/
>
> You can see the interned flag in the PyUnicodeObject struct here:
> http://hg.python.org/cpython/file/3ffd6ad93fe4/Include/unicodeobject.h#l303

Ah, yep, so that's there. In that case, it's possible to have that
optimization. However, I may be misreading this, but it seems the only
Unicode comparison function is a rich compare, which is unable to take
advantage of a known difference:

http://hg.python.org/cpython/file/b48ef168d8c5/Objects/unicodeobject.c#l6114

If both strings are interned, different pointers prove the strings
differ, but don't tell you which one sorts earlier. You could use this if
you roll your own comparison in C; or, if you already know the strings
are interned, you can use 'is' / 'is not'. But that seems to be the
extent of it.

ChrisA
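
A minimal Python sketch of the 'is' / 'is not' point above, assuming
CPython 3, where the built-in is spelled sys.intern. The helper name
interned_equal is made up for illustration and is not part of any
library. It only answers "same or different"; ordering still has to look
at the characters.

```python
import sys


def interned_equal(a: str, b: str) -> bool:
    """Equality test for two strings the caller knows are both interned.

    Hypothetical helper: for interned strings, identity and equality
    coincide, so comparing object identity is enough.
    """
    return a is b


# Build the strings at run time so they are not simply the same
# compile-time constant.
x = sys.intern("".join(["py", "thon"]))
y = sys.intern("".join(["py", "thon"]))
z = sys.intern("".join(["py", "pi"]))

print(interned_equal(x, y))  # both names refer to the one interned object
print(interned_equal(x, z))  # different objects, hence different strings
print(sorted([x, z]))        # ordering still needs a character comparison
```

If either string might not be interned, the identity test can report a
difference for equal strings, so fall back to == in that case.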