aurora wrote:
Java has a much more usable model, with Unicode used internally and encoding/decoding decisions needed only twice: when dealing with input and with output.

In addition to Fredrik's comment (that you should use the same model in Python) and Walter's comment (that you can enforce it by setting the default encoding to "undefined"), I'd like to point out the historical reason: Python predates Unicode, so the byte string type has many convenience operations that you would only expect of a character string.
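
As a rough illustration of that model (decode once on input, work on character strings internally, encode once on output), here is a minimal sketch. It is written for a current Python, where the byte string type is called bytes and the character string type is called str; in the Python 2 releases this thread is about, the corresponding types are str and unicode. The function names, the sample text, and the choice of Latin-1 and UTF-8 are assumptions made for the example only.

```python
import io


def shout(text):
    """Pure text processing: operates on characters, knows nothing about encodings."""
    return text.upper()


def process_stream(raw_in, raw_out, in_encoding, out_encoding):
    """Hypothetical helper: decode at the input edge, encode at the output edge."""
    # First (and only) decoding decision: bytes -> character string.
    text = raw_in.read().decode(in_encoding)

    # Everything in between handles character strings only.
    result = shout(text)

    # Second (and only) encoding decision: character string -> bytes.
    raw_out.write(result.encode(out_encoding))


# Example input: Latin-1 bytes in, UTF-8 bytes out (both encodings are assumptions).
source = io.BytesIO("caf\u00e9\n".encode("latin-1"))
sink = io.BytesIO()
process_stream(source, sink, in_encoding="latin-1", out_encoding="utf-8")
```

The point of the sketch is that shout() never sees an encoding; only the two edges of the program do.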

We have come up with a transition strategy, allowing existing
libraries to widen their support from byte strings to character
strings. This isn't a simple task, so many libraries still expect
and return byte strings when they should process character strings.
Instead of breaking the libraries right away, we have defined
a transitional mechanism, which allows Unicode support to be
added to libraries as the need arises. This transition is still in
progress.
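
To make "widening" a little more concrete, here is one way a library function might accept both kinds of string while doing its real work on character strings. This is only a sketch under assumptions, not a description of how any particular library does it: word_count and its encoding parameter are hypothetical, and the code uses the type names of a current Python (bytes and str).

```python
def word_count(data, encoding="utf-8"):
    """Hypothetical library function that accepts byte or character strings."""
    # Legacy callers may still pass byte strings; decode them at the boundary.
    # The default encoding here is an assumption made for this example.
    if isinstance(data, bytes):
        data = data.decode(encoding)

    # From here on the function works on a character string only.
    return len(data.split())


# Both call styles give the same answer.
count_from_bytes = word_count("na\u00efve caf\u00e9".encode("utf-8"))
count_from_text = word_count("na\u00efve caf\u00e9")
```

Old callers keep working, while new callers can pass character strings directly.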

Eventually, the primary string type should be the Unicode
string. If you are curious how far we still are from that goal,
just try running your program with the -U option.

Regards,
Martin
--
http://mail.python.org/mailman/listinfo/python-list
