...
>> IOW, if you're producing output that has to go into another system
>> that doesn't take unicode, it doesn't matter how
>> theoretically-correct it would be for your app to process the data in
>> unicode form.  In that case, unicode is not a feature: it's a bug.
>>
> This is not always true.  Suppose you read a webpage, chop it up to get
> a list of words, create a histogram of word lengths, and then write the
> output as utf8 to a database.  Should you do all your intermediate string
> operations on utf8-encoded byte strings?  No, you should do them on
> unicode strings, since otherwise you need to know the details of how
> utf8 encodes characters.
> 
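
The scenario above can be sketched as follows (the page content and words here are made up for illustration; the point is that decoding happens once at the boundary and all intermediate work uses unicode strings):

```python
from collections import Counter

# Hypothetical page content, arriving as raw bytes (assumed UTF-8).
raw = "naïve café résumé café".encode("utf-8")

# Decode once at the input boundary.
text = raw.decode("utf-8")

# All intermediate operations run on the unicode string.
histogram = Counter(len(word) for word in text.split())

# "naïve" is 5 characters but 6 UTF-8 bytes -- counting lengths on the
# raw byte string would put it in the wrong bucket.
assert len("naïve") == 5
assert len("naïve".encode("utf-8")) == 6

# Encode again only when handing the result to the database.
out = repr(dict(histogram)).encode("utf-8")
```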

You'd still have problems in Unicode, given that å and å can compare
unequal: u'\xe5' and u'a\u030a' are distinct strings, even though they
may render identically depending on your Unicode system. (IDLE shows
them pretty much the same; T-Bird on Windows with my current font shows
the second as 2 characters.)
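
Spelled out in (modern) Python, the two spellings compare unequal until both sides are normalized to the same form -- NFC here, though which form is "right" is itself a policy decision:

```python
import unicodedata

precomposed = "\u00e5"   # LATIN SMALL LETTER A WITH RING ABOVE, one code point
decomposed = "a\u030a"   # 'a' followed by COMBINING RING ABOVE, two code points

# They usually render identically, yet naive equality says they differ.
assert precomposed != decomposed
assert len(precomposed) == 1 and len(decomposed) == 2

# Only after normalizing to a common form do they compare equal.
assert unicodedata.normalize("NFC", decomposed) == precomposed
assert unicodedata.normalize("NFD", precomposed) == decomposed
```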

I realize this was a toy example, but it does point out that Unicode
complicates the idea of 'equality' as well as the idea of 'what is a
character'. And just saying "decode it to Unicode" isn't really sufficient.

John
=:->

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev