Re: [Python-Dev] bytes / unicode

Glyph Lefkowitz Mon, 21 Jun 2010 22:26:16 -0700

On Jun 21, 2010, at 2:17 PM, P.J. Eby wrote:

> One issue I remember from my "enterprise" days is some of the Asian-language 
> developers at NTT/Verio explaining to me that unicode doesn't actually solve 
> certain issues -- that there are use cases where you really *do* need "bytes 
> plus encoding" in order to properly express something.


What I have heard in passing from a couple of people with experience in this 
area is that some older software in Asia would present characters differently 
depending on whether they were originally encoded in a "Japanese" encoding or 
a "Chinese" encoding, even though they were really "the same" characters.

I do know that Han unification is a giant political mess 
(<http://en.wikipedia.org/wiki/Han_unification> makes for some interesting 
reading), but my understanding is that it has handled enough of the cases by 
now that one can write software to display Asian languages and it will 
basically work with a modern version of Unicode.  (And of course, there's 
always the private use area, as Stephen Turnbull pointed out.)

Regardless, this is another example where keeping around a string isn't really 
enough.  If you need to display a Japanese character in a distinct way because 
you are operating in the Japanese *script*, you need a tag surrounding your 
data that is a hint to its presentation.  The fact that these presentation 
hints were sometimes determined by their encoding is an unfortunate historical 
accident.
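
To make that concrete, here is a rough sketch of what "a tag surrounding your 
data" might look like in Python.  Everything in it is an assumption made for 
illustration: TaggedText and to_html are hypothetical names rather than an 
existing API, the language tags follow the BCP 47 style, and U+76F4 is used 
because it is commonly cited as a unified Han character that is drawn 
differently in Japanese and Chinese typography.

```python
from dataclasses import dataclass
from html import escape


@dataclass(frozen=True)
class TaggedText:
    """Hypothetical wrapper: Unicode text plus a presentation hint."""
    text: str
    lang: str  # BCP 47 style language tag, e.g. "ja" or "zh-Hans"

    def to_html(self) -> str:
        # Carry the hint out-of-band as markup; a renderer can use the
        # lang attribute to choose a font with the expected glyph shapes.
        return '<span lang="{}">{}</span>'.format(
            escape(self.lang, quote=True), escape(self.text))


# Assumption: U+76F4 is a unified Han character with regional glyph variants.
han = "\u76f4"

# Legacy encodings produce different bytes for the character...
sjis_bytes = han.encode("shift_jis")
gb_bytes = han.encode("gb2312")

# ...but both decode to the same code point, so the "which encoding did
# this come from" hint is lost once you only keep the str.
same_after_decode = sjis_bytes.decode("shift_jis") == gb_bytes.decode("gb2312")

# Keeping an explicit tag next to the text preserves the hint.
ja = TaggedText(han, "ja")
zh = TaggedText(han, "zh-Hans")
```

Here ja.text == zh.text, yet ja != zh, which is the point: the string alone 
cannot tell you how it was meant to be presented, and the tag can.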

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev