On Thu, Mar 14, 2013 at 3:05 PM, Steven D'Aprano
<steve+comp.lang.pyt...@pearwood.info> wrote:
> That depends on how you use the strings. Because strings are immutable,
> there isn't really anything like "switching between widths" -- the width
> is set when the string is created, and then remains fixed.

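A quick way to see the "width is set at creation" point is to ask how much storage a string takes per character. This is only a sketch and assumes CPython 3.3 or later, where the flexible string representation applies; the helper name bytes_per_char is made up for this example, and sys.getsizeof reports implementation-specific sizes.

```python
import sys

def bytes_per_char(ch, n=1000):
    """Estimate storage per code point from the size difference between
    two strings of different lengths, so fixed header overhead cancels.
    (Hypothetical helper; relies on CPython's sys.getsizeof.)"""
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n

# One ASCII, one BMP and one SMP character, as in the tests in this thread.
for label, ch in [("ASCII", "a"), ("BMP", "\u1234"), ("SMP", "\U00012345")]:
    print(label, bytes_per_char(ch))

# Replacing the last character with a wider one yields a NEW string that is
# stored at the wider width throughout; the original object is unchanged.
ascii_only = "asdf" * 10000
widened = ascii_only[:-1] + "\u1234"
print(sys.getsizeof(ascii_only), sys.getsizeof(widened))
```

On CPython 3.3+ I would expect the three cases to come out at 1, 2 and 4 bytes per character, and the widened string to take roughly twice the memory of the ASCII-only one.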
The nearest thing to "switching" is where you repeatedly replace() or
append/slice to add/remove the one non-ASCII character that your
contrived test is using. Let's see...

Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600 32 bit (Intel)] on win32

ASCII -> ASCII:
>>> timeit.timeit("s=s[:-1]+'\u0034'","s='asdf'*10000",number=10000)
0.14999895238081962

ASCII -> BMP:
>>> timeit.timeit("s=s[:-1]+'\u1234'","s='asdf'*10000",number=10000)
1.7513426985832012

BMP -> BMP:
>>> timeit.timeit("s=s[:-1]+'\u1234'","s='\u1234sdf'*10000",number=10000)
0.22562895563542895

ASCII -> SMP:
>>> timeit.timeit("s=s[:-1]+'\U00012345'","s='asdf'*10000",number=10000)
1.9037101084076369

BMP -> SMP:
>>> timeit.timeit("s=s[:-1]+'\U00012345'","s='\u1234sdf'*10000",number=10000)
1.9659967956821163

SMP -> SMP:
>>> timeit.timeit("s=s[:-1]+'\U00012345'","s='\U00012345sdf'*10000",number=10000)
0.7214749360603037

So there *is* cost to "changing size". Trying them again in Python 2.6
Narrow:

Python 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit (Intel)] on win32

ASCII -> ASCII:
>>> timeit.timeit("s=s[:-1]+u'\u0034'","s=u'asdf'*10000",number=10000)
0.53506213778566547

ASCII -> BMP:
>>> timeit.timeit("s=s[:-1]+u'\u1234'","s=u'asdf'*10000",number=10000)
0.57752172412974268

BMP -> BMP:
>>> timeit.timeit("s=s[:-1]+u'\u1234'","s=u'\u1234sdf'*10000",number=10000)
0.53309121690045913

ASCII -> SMP:
>>> timeit.timeit("s=s[:-1]+u'\U00012345'","s=u'asdf'*10000",number=10000)
0.55128347317885584

BMP -> SMP:
>>> timeit.timeit("s=s[:-1]+u'\U00012345'","s=u'\u1234sdf'*10000",number=10000)
0.55610140394938412

SMP -> SMP:
>>> timeit.timeit("s=s[:-1]+u'\U00012345'","s=u'\U00012345sdf'*10000",number=10000)
0.6599570615818493

Much more consistent. (Note that the SMP timings are quite probably a
bit off as the string will continue to grow - I'm taking off one 16-bit
character and putting on two.)

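For anyone who wants to rerun the six cases on another Python 3 build, here is a minimal sketch that wraps them in one function. It is not the exact session above: the names CASES and run_cases and the repeat/number parameters are invented for this example, and it builds the statement with ascii() so it is Python 3 only (the 2.x runs need the u'' prefixes shown in the transcripts).

```python
import timeit

# (label, character appended each iteration, four-character unit that is
# repeated to build the initial string). These mirror the six tests above.
CASES = [
    ("ASCII -> ASCII", "\u0034", "asdf"),
    ("ASCII -> BMP", "\u1234", "asdf"),
    ("BMP -> BMP", "\u1234", "\u1234sdf"),
    ("ASCII -> SMP", "\U00012345", "asdf"),
    ("BMP -> SMP", "\U00012345", "\u1234sdf"),
    ("SMP -> SMP", "\U00012345", "\U00012345sdf"),
]

def run_cases(repeat=10000, number=10000):
    """Time s = s[:-1] + <char> for each case; returns {label: seconds}.
    (Hypothetical helper; defaults match the numbers used in the thread.)"""
    results = {}
    for label, tail, unit in CASES:
        # ascii() yields an escaped literal, e.g. '\u1234', as in the originals.
        stmt = "s = s[:-1] + %s" % ascii(tail)
        setup = "s = %s * %d" % (ascii(unit), repeat)
        results[label] = timeit.timeit(stmt, setup, number=number)
    return results

# Small run so it finishes quickly; call run_cases() for the full-size test.
for label, seconds in sorted(run_cases(repeat=100, number=100).items()):
    print("%-16s %.6f" % (label + ":", seconds))
```

Absolute times will of course depend on hardware and build, so only the ratios between cases on the same machine are worth comparing.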
I don't have a 2.6 wide build on the same hardware, so these times
aren't directly comparable to the ones above; this is slower hardware.
Python 2.6 Wide, same six tests in the same order:

Python 2.6.6 (r266:84292, Sep 15 2010, 15:52:39) [GCC 4.4.5] on linux2

ASCII -> ASCII:
>>> timeit.timeit("s=s[:-1]+u'\u0034'","s=u'asdf'*10000",number=10000)
1.5774970054626465

ASCII -> BMP:
>>> timeit.timeit("s=s[:-1]+u'\u1234'","s=u'asdf'*10000",number=10000)
1.5743560791015625

BMP -> BMP:
>>> timeit.timeit("s=s[:-1]+u'\u1234'","s=u'\u1234sdf'*10000",number=10000)
1.6072981357574463

ASCII -> SMP:
>>> timeit.timeit("s=s[:-1]+u'\U00012345'","s=u'asdf'*10000",number=10000)
1.6745591163635254

BMP -> SMP:
>>> timeit.timeit("s=s[:-1]+u'\U00012345'","s=u'\u1234sdf'*10000",number=10000)
1.6705770492553711

SMP -> SMP:
>>> timeit.timeit("s=s[:-1]+u'\U00012345'","s=u'\U00012345sdf'*10000",number=10000)
1.7078530788421631

Here's my reading of all these stats. Python 3.3's str is faster than
2.6's unicode when the copy can be done directly (i.e. when the size
isn't changing), but converting between sizes costs a lot: roughly 8x
to 12x the same-width time in the 3.3 ASCII and BMP tests above. (My
guess: the same-width case is a straight memcpy, and memcpy is
blazingly fast, no surprise there.) Since MOST string operations won't
change the size, this is a benefit to most programs.

I expect that Python 3.2 will behave comparably to the 2.6 stats, but I
don't have a 3.2 handy - can someone confirm please?

ChrisA
-- 
http://mail.python.org/mailman/listinfo/python-list