On Thu, 25 Apr 2013, Lennart Regebro wrote:

On Thu, Apr 25, 2013 at 4:22 PM, MRAB <pyt...@mrabarnett.plus.com> wrote:
The JSON specification says that it's text. Its string literals can
contain Unicode codepoints. It needs to be encoded to bytes for
transmission and storage, but JSON itself is not a bytestring format.
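
A minimal sketch of that distinction in Python 3, assuming only the standard `json` module; the `record` value is made-up sample data. `json.dumps` produces a `str`, and the bytes appear only once an encoding is chosen for transmission or storage:

```python
import json

record = {"name": "Zoë"}  # hypothetical sample data, not from the thread

# json.dumps returns text (str); ensure_ascii=False keeps the non-ASCII
# code point in the string instead of escaping it.
text = json.dumps(record, ensure_ascii=False)

# Bytes exist only after an encoding is picked. Two different encodings
# give two different byte sequences for the same JSON text.
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16")

# Decoding each with its own encoding recovers the same data.
from_utf8 = json.loads(utf8_bytes.decode("utf-8"))
from_utf16 = json.loads(utf16_bytes.decode("utf-16"))
```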

OK, fair enough.

base64 is a way of encoding binary data as text.

It's a way of encoding binary data using ASCII. There is a subtle but
important difference.

It is a way of encoding arrays of 8-bit bytes as arrays of characters that are part of the printable, non-whitespace subset of the ASCII repertoire. Since the ASCII repertoire is now simply the first 128 code points in the Unicode repertoire, it is equally correct to say that base64 is a way of encoding binary data as Unicode text.
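
As a rough illustration of the point about the character repertoire, here is a sketch using the standard `base64` module in Python 3. `base64.b64encode` returns `bytes` there, so the sketch decodes the result as ASCII to obtain text; the `payload` value is just an arbitrary test input:

```python
import base64

payload = bytes(range(256))  # arbitrary test input covering every byte value

encoded = base64.b64encode(payload)  # bytes in Python 3
text = encoded.decode("ascii")       # the same characters as a str

# Every character of the output is printable, non-whitespace ASCII,
# i.e. a code point between 33 ('!') and 126 ('~') inclusive.
all_printable = all(33 <= ord(ch) <= 126 for ch in text)

# Decoding the text gives back the original byte array.
round_trip = base64.b64decode(text)
```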

In Python 3 we're trying to stop mixing binary data (bytestrings) with
text (Unicode strings).

Yup. And that's why a base64 encoding shouldn't return Unicode strings.

That is exactly why it should return Unicode strings. What bytes should get sent if base64 is used to send a byte array over an EBCDIC link? [*]
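
To make the EBCDIC question concrete, here is a sketch under the assumption that the link uses code page 037, which Python ships as the `cp037` codec. The same base64 text maps to different bytes depending on the link's character encoding, and the receiver recovers the data by decoding to text first:

```python
import base64

data = b"Hello"  # arbitrary sample payload

# The base64 result viewed as text, independent of any wire encoding.
text = base64.b64encode(data).decode("ascii")

# What actually gets sent depends on the link's character encoding.
wire_ascii = text.encode("ascii")
wire_ebcdic = text.encode("cp037")  # assumption: EBCDIC code page 037

# The receiver on the EBCDIC side decodes the bytes to text, then
# base64-decodes the text to get the original byte array back.
received = base64.b64decode(wire_ebcdic.decode("cp037"))
```

If the encoder handed back ASCII bytes instead, the sender would have to treat them as text anyway before they could be re-encoded for such a link.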

Having said that, there may be other reasons for base64 encoding to return bytes - I can conceive of arguments involving efficiency, or practicality, or the most common use cases. So I can't say for sure what base64 encoding actually ought to return in Python. But the purist stance should be that base64 encoding should return text, i.e. a string, i.e. unicode.

[*] I apologize to anybody who just ate.

Isaac Morland                   CSCF Web Guru
DC 2554C, x36650                WWW Software Specialist