About the examples contested by Steven, e.g.:

    timeit.timeit("('ab…' * 10).replace('…', 'œ…')")

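For anyone who wants to rerun this kind of measurement, here is a minimal sketch, assuming CPython 3.3 or later. The ASCII-only statement, the helper names (time_stmt, bytes_per_char) and the repeat count are illustrative assumptions and are not part of the original benchmark. Absolute timings vary by machine and Python version, so none are quoted here.

```python
import sys
import timeit

# Statement from the post: it uses characters outside Latin-1.
NON_ASCII_STMT = "('ab…' * 10).replace('…', 'œ…')"
# Assumed ASCII-only counterpart, added here purely for comparison.
ASCII_STMT = "('abc' * 10).replace('c', 'de')"


def time_stmt(stmt, number=10000):
    """Return the total seconds timeit needs to run stmt `number` times."""
    return timeit.timeit(stmt, number=number)


def bytes_per_char(ch, n=1000):
    """Estimate the per-character storage cost of a string made only of ch.

    Taking the size difference between two lengths cancels the fixed
    object header, leaving the cost of n extra characters.
    This relies on CPython's sys.getsizeof and is implementation-specific.
    """
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


if __name__ == "__main__":
    print("ascii     :", time_stmt(ASCII_STMT))
    print("non-ascii :", time_stmt(NON_ASCII_STMT))
    for ch in ("a", "…", "\U0001F600"):
        print(repr(ch), bytes_per_char(ch), "byte(s) per character")
```

On CPython builds that use the flexible string representation, bytes_per_char should report a width that depends on the widest character in the string, which is the memory side of the trade-off discussed in this thread.
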
The contested example is good enough to show the problem. Period. The rest (you have to do this, you should not do this, why are you using these characters, which is an amazing and stupid question) does not count.

The real problem is elsewhere. *Americans* do not want a character to occupy 4 bytes in *their* memory. The rest of the world does not count.

The same thing happens with the utf-8 coding scheme. Technically, it is fine. But after n years of usage, one should recognize that it has just become an ascii2, especially for those who understand nothing in that field and are not even aware that characters are "coded".

I'm the first to think this is legitimate. Memory, or the "ability to treat all text in the same and equal way"?

End note. This kind of discussion is not specific to Python; it always happens when there is some kind of conflict between ascii and non-ascii users.

Have a nice day.

jmf