Re: unicode encoding usability problem

aurora Fri, 18 Feb 2005 16:25:04 -0800

On Fri, 18 Feb 2005 21:16:01 +0100, Martin v. Löwis <[EMAIL PROTECTED]> wrote:

I'd like to point out the
historical reason: Python predates Unicode, so the byte string type
has many convenience operations that you would only expect of
a character string.

We have come up with a transition strategy, allowing existing
libraries to widen their support from byte strings to character
strings. This isn't a simple task, so many libraries still expect
and return byte strings, when they should process character strings.
Instead of breaking the libraries right away, we have defined
a transitional mechanism, which allows Unicode support to be added
to libraries as the need arises. This transition is still in
progress.

I understand, and I wasn't yelling "why can't Python be more like Java". On the other hand, I also want to point out that making an individual decision for each string isn't practical and is very error-prone. The fact that unicode and 8-bit strings look alike and work alike in common situations, and only run into problems with non-ASCII data, is very confusing for most people.
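To make the trap concrete, here is a small sketch. It is written so that it runs on a Python 3 interpreter, so the silent ASCII decoding that happens when an 8-bit string meets a unicode string is only imitated by a helper, implicit_concat. That name is hypothetical and the helper is an assumption about the behaviour under discussion, not anything from the standard library.

```python
def implicit_concat(byte_str, text):
    """Imitate mixing an 8-bit string with a unicode string.

    Assumption: the 8-bit side is silently decoded as ASCII first,
    which is the automatic conversion discussed in this thread.
    """
    return byte_str.decode("ascii") + text


# Pure ASCII data: the mix works, so the problem stays hidden.
ascii_result = implicit_concat(b"cafe", " au lait")

# Non-ASCII data (UTF-8 bytes for "cafe" with an accented e):
# the very same code path now fails.
try:
    implicit_concat(b"caf\xc3\xa9", " au lait")
    failed = False
except UnicodeDecodeError:
    failed = True

print(ascii_result)
print(failed)
```

The code is identical in both calls; only the data differs, which is why this kind of bug tends to show up late, once real non-ASCII input arrives.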

Eventually, the primary string type should be the Unicode
string. If you are curious how far we are still off that goal,
just try running your program with the -U option.


Lots of errors. Among them are gzip (binary?!) and strftime??

I actually quite appreciate Python's power in processing binary data as 8-bit strings. But perhaps we should transition to using unicode as the text string and treating the binary string as the exception. Right now we have

  '' - 8-bit string; u'' - unicode string

How about

  b'' - 8-bit string; '' - unicode string

and no automatic conversion. Perhaps this could be activated by something like the encoding declarations, so that the transition can happen module by module.
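For what it's worth, this is the split that Python 3 later adopted, so the proposal can be tried directly on a Python 3 interpreter. A minimal sketch follows; the variable names are only for illustration.

```python
raw = b"caf\xc3\xa9"   # 8-bit string: five UTF-8 bytes
text = "caf\u00e9"     # text string: four Unicode code points

# No automatic conversion: mixing the two types is refused outright,
# even when the data is pure ASCII.
try:
    b"abc" + "abc"
    mixed_ok = True
except TypeError:
    mixed_ok = False

# Conversion has to be spelled out, with the encoding named.
decoded = raw.decode("utf-8")
encoded = text.encode("utf-8")

print(mixed_ok)
print(decoded == text, encoded == raw)
print(len(raw), len(text))
```

The point of the design is that the failure no longer depends on the data. Mixing the types fails on the first run with ASCII input, not months later with non-ASCII input.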

Regards,
Martin


