On Thu, 2009-08-06 at 01:31 +0000, John Machin wrote:
> Faster by an enormous margin; attributing this to the cost of attribute lookup
> seems implausible.

Ok, fair point. I don't think the time difference fully registered when I
composed that message.

Testing a global access (LOAD_GLOBAL) versus an attribute access on a
global object (LOAD_GLOBAL + LOAD_ATTR) shows that the latter is about 40%
slower than the former. So that certainly doesn't account for the
difference.

> Suggested further avenues of investigation:
>
> (1) Try the timing again with "cp1252" and "utf8" and "utf_8"
>
> (2) grep "utf-8" <Python2.X_source_code>/Objects/unicodeobject.c

Very pedagogical of you. :)

Indeed, it looks like the bigger player in the performance difference is
that the code path for unicode(s, enc) short-circuits the codec registry
for common encodings (including 'utf-8' specifically), whereas
s.decode('utf-8') necessarily consults the codec registry.

Cheers,
Jason.

-- 
http://mail.python.org/mailman/listinfo/python-list
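The global-versus-attribute comparison mentioned in the reply can be reproduced with a minimal sketch like the one below. It assumes Python 3 (the thread was about Python 2), and the names `Namespace`, `ns` and `value` are hypothetical stand-ins. It first disassembles two tiny functions to confirm the LOAD_GLOBAL versus LOAD_GLOBAL + LOAD_ATTR difference, then times the bare expressions with `timeit` so that function-call overhead does not dilute the comparison. The actual ratio depends on interpreter version and machine, so no figures are claimed here.

```python
import dis
import timeit


class Namespace:
    """Hypothetical stand-in for 'a global object' whose attribute is read."""
    value = 1


ns = Namespace()
value = 1


def read_global():
    return value          # compiles to LOAD_GLOBAL value


def read_attr():
    return ns.value       # compiles to LOAD_GLOBAL ns, then LOAD_ATTR value


def opcodes(func):
    """Names of the opcodes CPython compiled func into."""
    return [ins.opname for ins in dis.get_instructions(func)]


def time_lookups(number=1_000_000):
    """Time the bare expressions, so function-call overhead is excluded."""
    env = {"value": value, "ns": ns}
    t_global = timeit.timeit("value", globals=env, number=number)
    t_attr = timeit.timeit("ns.value", globals=env, number=number)
    return t_global, t_attr


if __name__ == "__main__":
    print(opcodes(read_global))
    print(opcodes(read_attr))
    t_global, t_attr = time_lookups()
    print(f"global: {t_global:.3f}s  attribute: {t_attr:.3f}s  "
          f"ratio: {t_attr / t_global:.2f}")
```

Timing the expression strings rather than the two functions keeps the measurement focused on the name and attribute loads themselves.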
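The codec-registry point, and Machin's suggestion to retry with "cp1252", "utf8" and "utf_8", can be explored with a rough Python 3 sketch. One assumption to flag: as far as I know, in Python 3 `bytes.decode()` and `str(b, enc)` share the same C decoding path, so the Python 2 gap between unicode(s, enc) and s.decode(enc) does not reproduce directly. This sketch therefore uses `codecs.decode()`, which always looks the codec up in the registry, as a stand-in for the registry path. `DATA` and the helper names are hypothetical. `codecs.lookup()` also shows how each alias resolves to a canonical codec name.

```python
import codecs
import timeit

# Hypothetical ASCII payload, so it decodes cleanly under every tested encoding.
DATA = b"hello world " * 100

ENCODINGS = ["utf-8", "utf8", "utf_8", "cp1252"]


def canonical_name(encoding):
    """Name the codec registry resolves an encoding alias to."""
    return codecs.lookup(encoding).name


def time_decodes(encoding, number=100_000):
    """Time the bytes.decode method against an explicit codec-registry call."""
    env = {"DATA": DATA, "codecs": codecs, "enc": encoding}
    t_method = timeit.timeit("DATA.decode(enc)", globals=env, number=number)
    t_registry = timeit.timeit("codecs.decode(DATA, enc)", globals=env,
                               number=number)
    return t_method, t_registry


if __name__ == "__main__":
    for enc in ENCODINGS:
        t_method, t_registry = time_decodes(enc)
        print(f"{enc:8} -> {canonical_name(enc):8} "
              f"bytes.decode: {t_method:.3f}s  "
              f"codecs.decode: {t_registry:.3f}s")
```

Comparing the rows for "utf-8", "utf8" and "utf_8" shows whether the spelling of the encoding name affects which path is taken on a given interpreter.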