On Saturday 25 August 2012 11:46:34 UTC+2, Frank Millman wrote:
> On 25/08/2012 10:58, Mark Lawrence wrote:
> > On 25/08/2012 08:27, wxjmfa...@gmail.com wrote:
> >>
> >> Unicode design: a flat table of code points, where all code
> >> points are "equals".
> >> As soon as one attempts to escape from this rule, one has to
> >> "pay" for it.
> >> The creator of this machinery (flexible string representation)
> >> can not even benefit from it in his native language (I think
> >> I'm correctly informed).
> >>
> >> Hint: Google -> "Das grosse Eszett"
> >>
> >> jmf
> >>
> >
> > It's Saturday morning, I'm stone cold sober, had a good sleep and I'm
> > still baffled as to the point if any. Could someone please enlighten me?
> >
>
> Here's what I think he is saying. I am posting this to test the water. I
> am also confused, and if I have got it wrong hopefully someone will
> correct me.
>
> In python 3.3, unicode strings are now stored as follows -
> if all characters can be represented by 1 byte, the entire string is
> composed of 1-byte characters
> else if all characters can be represented by 1 or 2 bytes, the entire
> string is composed of 2-byte characters
> else the entire string is composed of 4-byte characters
>
> There is an overhead in making this choice, to detect the lowest number
> of bytes required.
>
> jmfauth believes that this only benefits 'english-speaking' users, as
> the rest of the world will tend to have strings where at least one
> character requires 2 or 4 bytes. So they incur the overhead, without
> getting any benefit.
>
> Therefore, I think he is saying that he would have preferred that python
> standardise on 4-byte characters, on the grounds that the saving in
> memory does not justify the performance overhead.
>
> Frank Millman
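
As a rough illustration of the storage rule Frank describes above, here is a minimal sketch, assuming CPython 3.3 or later (other implementations may store strings differently). The helper name bytes_per_char is made up for this example. It estimates the per-character width by comparing the size of a string with the size of the same string doubled, so that the fixed object header cancels out.

```python
import sys


def bytes_per_char(sample: str) -> int:
    """Estimate how many bytes CPython uses per character of `sample`.

    Hypothetical helper for illustration only. The object header is the
    same for `sample` and `sample * 2` (same internal width), so the size
    difference is just the storage for len(sample) extra characters.
    """
    extra = sys.getsizeof(sample * 2) - sys.getsizeof(sample)
    return extra // len(sample)


# Each sample is 100 characters long.
samples = {
    "ASCII only": "a" * 100,
    "Latin-1 (e acute)": "\u00e9" * 100,
    "Euro sign": "\u20ac" * 100,
    "Capital sharp s (grosses Eszett)": "\u1e9e" * 100,
    "Musical G clef (outside the BMP)": "\U0001d11e" * 100,
    "99 ASCII + 1 euro sign": "a" * 99 + "\u20ac",
}

for label, text in samples.items():
    print(f"{label}: {bytes_per_char(text)} byte(s) per character")
```

On CPython this should report 1 byte for the ASCII and Latin-1 samples, 2 bytes for the euro sign and the capital sharp s, and 4 bytes for the character outside the Basic Multilingual Plane. The last sample shows the point under discussion: a single wider character makes the whole string use the wider storage.
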
Very well explained. Thanks.

More precisely, it is not just a matter of 'english-speaking' users:
everyone who uses non-Latin-1 characters is affected.
(See the title of this topic, ... typography).

Being Latin-1 compliant and Unicode compliant at the same time
is a plain absurdity in the mathematical sense.

---

For those who do not know, the Go language has introduced
the rune type. As far as I know, nobody is complaining; I
have not even seen a discussion related to this subject.

100% Unicode compliant from day 0. Congratulations.

jmf

--
http://mail.python.org/mailman/listinfo/python-list