Thorsten Kampe wrote:
> * Michael Ströder (Wed, 05 Aug 2009 16:43:09 +0200)
>> These both expressions are equivalent but which is faster or should be
>> used for any reason?
>>
>> u = unicode(s,'utf-8')
>>
>> u = s.decode('utf-8') # looks nicer
>
> "decode" was added in Python 2.2 for the sake of symmetry to encode().

Yes, and I like the style. But...

> It's essentially the same as unicode() and I wouldn't be surprised if it
> is exactly the same.

Did you try?

> I don't think any measurable speed increase will be noticeable between
> those two.

Well, that seems not to be true. Try it yourself. I did (my console uses
UTF-8 as its charset):

Python 2.6 (r26:66714, Feb 3 2009, 20:52:03)
[GCC 4.3.2 [gcc-4_3-branch revision 141291]] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> timeit.Timer("'äöüÄÖÜß'.decode('utf-8')").timeit(1000000)
7.2721178531646729
>>> timeit.Timer("'äöüÄÖÜß'.decode('utf8')").timeit(1000000)
7.1302499771118164
>>> timeit.Timer("unicode('äöüÄÖÜß','utf8')").timeit(1000000)
8.3726329803466797
>>> timeit.Timer("unicode('äöüÄÖÜß','utf-8')").timeit(1000000)
1.8622009754180908
>>> timeit.Timer("unicode('äöüÄÖÜß','utf8')").timeit(1000000)
8.651669979095459
>>>

Comparing the fastest variant of each approach again, this time with ten
times as many iterations:

>>> timeit.Timer("unicode('äöüÄÖÜß','utf-8')").timeit(10000000)
17.23644495010376
>>> timeit.Timer("'äöüÄÖÜß'.decode('utf8')").timeit(10000000)
72.087096929550171

That is significant: roughly a factor of four! So the winner is:

unicode('äöüÄÖÜß','utf-8')

Note that this holds only with the encoding spelled 'utf-8'; with 'utf8',
unicode() was the slowest variant in the runs above.

Ciao, Michael.
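
For anyone who wants to repeat the measurement, here is a minimal sketch of the same comparison as a small script. The names compare_decoders, SETUP, TEXT and STATEMENTS are made up for illustration and are not from the session above. It assumes the input is written as explicit UTF-8 bytes so that the console charset does not matter, and it falls back to str() on Python 3, where unicode() no longer exists. Timings on other versions and machines will likely differ from the numbers quoted above.

```python
import sys
import timeit

# The seven characters from the post, written as explicit UTF-8 bytes so the
# result does not depend on the console or source-file encoding.
SETUP = r"s = b'\xc3\xa4\xc3\xb6\xc3\xbc\xc3\x84\xc3\x96\xc3\x9c\xc3\x9f'"

# Python 2 names the text constructor unicode(); Python 3 names it str().
TEXT = "unicode" if sys.version_info[0] == 2 else "str"

# The four spellings compared in the post.
STATEMENTS = [
    "s.decode('utf-8')",
    "s.decode('utf8')",
    TEXT + "(s, 'utf-8')",
    TEXT + "(s, 'utf8')",
]


def compare_decoders(number=100000, repeat=5):
    """Return {statement: best time in seconds} for each spelling."""
    results = {}
    for stmt in STATEMENTS:
        timer = timeit.Timer(stmt, setup=SETUP)
        # The minimum of several runs is the one least disturbed by other load.
        results[stmt] = min(timer.repeat(repeat=repeat, number=number))
    return results


if __name__ == "__main__":
    # Print the variants from fastest to slowest.
    ranked = sorted(compare_decoders().items(), key=lambda kv: kv[1])
    for stmt, seconds in ranked:
        print("%-22s %.4f s" % (stmt, seconds))
```

Taking the minimum over several repeats, rather than a single timeit() call, makes the comparison less sensitive to whatever else the machine is doing.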