On 07/27/2013 12:21 PM, wxjmfa...@gmail.com wrote:
> Good point. FSR, nice tool for those who wish to teach
> Unicode. It is not every day, one has such an opportunity.

I had a long e-mail composed and chopped it down, but it is still too long. I also ditched a lot of the context, which jmf seems to do as well. Apologies.

1. FSR *is*, in effect, UTF-32, so it is as Unicode compliant as UTF-32, which is an official encoding. FSR differs from UTF-32 only in that the padding zeros are stripped off: each string is stored in the narrowest fixed width (1, 2, or 4 bytes per character) that can hold every character in it, and that width is always known at string creation time. You can argue many things, but to say FSR is not Unicode compliant is quite a stretch! What Unicode entities or characters cannot be stored in strings using FSR? What sequences of bytes in FSR result in invalid Unicode entities?

2. Strings in Python *never change*. They are immutable. The + operator produces a new string object by copying the characters of both operands, and that would be just as true if Python used UTF-8 internally. If you're doing a lot of string concatenations, perhaps you're using the wrong data type. A byte buffer might be better for you, where you can stuff UTF-8 sequences into it to your heart's content.

3. UTF-8 and UTF-16 are variable-width encodings, so finding the Nth character means scanning from the start of the string. That makes indexing and slicing very slow, which is unacceptable for the use cases of Python strings. I'm assuming you understand big O notation, as you talk of experience in many languages over the years. With FSR and UTF-32, looking up a character or locating the start of a slice is O(1). With UTF-8, UTF-16, or any other variable-width encoding, the same operation is O(n) in general. A lot slower!

4. Unicode is, well, Unicode. You seem to hop all over the place from talking about code points to bytes to bits, using them all interchangeably. And now you seem to be claiming that one particular byte encoding (UTF-8) is by definition Unicode. Or at least that's how it sounds. You also claim FSR is not compliant with Unicode standards, which appears to me to be completely false.

Is my understanding of these things wrong?
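
P.S. For anyone who wants to check points 1 to 3 for themselves, here is a minimal sketch. It assumes CPython 3.3 or later, and the helper name bytes_per_char is my own invention for illustration, not anything from the standard library. It estimates how many bytes each character costs by comparing the sizes of two strings of different lengths, so the fixed per-object overhead cancels out.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate the storage cost per character for a string made only of `ch`.

    Hypothetical helper for illustration. Relies on CPython's
    sys.getsizeof, so results on other implementations may differ.
    """
    # The fixed object header is the same for both strings, so the
    # difference in size is due to the n extra characters alone.
    return (sys.getsizeof(ch * (2 * n)) - sys.getsizeof(ch * n)) // n


for label, ch in [
    ("ASCII 'a'", "a"),
    ("Latin-1 U+00E9", "\xe9"),
    ("BMP U+20AC", "\u20ac"),
    ("astral U+1F600", "\U0001F600"),
]:
    print(label, bytes_per_char(ch), "byte(s) per character")

# Indexing is by code point, whatever the internal width.
s = "a\U0001F600b"
print(len(s), s[1] == "\U0001F600")

# Strings are immutable: item assignment is rejected.
try:
    s[0] = "x"
except TypeError as exc:
    print("TypeError:", exc)
```

On the CPython builds I would expect this to report 1, 1, 2 and 4 bytes per character respectively, which is the "padding zeros stripped off" behaviour described in point 1. The numbers are an implementation detail, so treat them as an observation rather than a guarantee.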