Stephen J. Turnbull wrote:
> it does refer to *encoded* characters as the output of
> the encoding process:
> > The encoding process represents 24-bit groups of input bits
> > as output strings of 4 encoded characters.
The "encoding" being referred to there is the encoding
from input bytes to output characters, not an encoding
of the output characters as bytes.
Nowhere in RFC 4648 does it refer to the output as
being made up of "bytes" or "octets". It's always
described in terms of "characters".
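To make the distinction concrete, here is a rough sketch using Python's
standard base64 module. The helper name encode_to_text is made up for
illustration. It treats the RFC's encoding (24-bit groups in, 4 characters
out) as one step, and the choice of how to represent those characters as
bytes as a separate one.

```python
import base64

def encode_to_text(data: bytes) -> str:
    # Hypothetical helper, not part of the base64 module.
    # base64.b64encode returns a bytes object holding the ASCII codes of
    # the base64 characters; decoding as ASCII recovers the characters.
    return base64.b64encode(data).decode("ascii")

text = encode_to_text(b"Man")   # exactly one 24-bit group of input
print(text, len(text))          # TWFu 4
```

The result is a str, i.e. a sequence of characters with no particular
byte representation attached yet.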
> As I understand it, the intention of the standard
> in using "character" to denote the code unit is similar to that of RFC
> 3986: BASE encodings are intended to be printable and recognizable to
> humans.
Hmmm... so why then does it say, in section 4:
   The Base 64 encoding is designed to represent arbitrary sequences of
   octets in a form that ... need not be human readable.
> If you're using a non-ASCII-superset encoding such as EBCDIC
> for text I/O, then you should translate from ASCII to that encoding
> for display,
What about the channel you're sending the encoded data over?
Suppose I'm on Windows and I'm embedding the base64 encoded
data in a text message that I'm sending through a mail client
that accepts text in utf-16.
I hope you would agree that, in that situation, encoding the
base64 output in ASCII and giving those bytes directly to
the mail client would be very much the wrong thing to do?
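A small sketch of what I mean, assuming the channel wants little-endian
UTF-16 without a BOM (that detail, and the function name
base64_for_utf16_channel, are my assumptions for the example, not
anything a particular mail client specifies):

```python
import base64

def base64_for_utf16_channel(data: bytes) -> bytes:
    # Hypothetical helper: produce the base64 *characters* first, then
    # encode them in whatever encoding the channel expects.
    chars = base64.b64encode(data).decode("ascii")
    return chars.encode("utf-16-le")

ascii_bytes = base64.b64encode(b"Man")          # b'TWFu'
utf16_bytes = base64_for_utf16_channel(b"Man")  # b'T\x00W\x00F\x00u\x00'

# Handing the ASCII bytes to a UTF-16 consumer pairs them up into two
# unrelated code units instead of the four base64 characters.
misread = ascii_bytes.decode("utf-16-le")
print(len(misread), misread == "TWFu")          # 2 False
print(utf16_bytes.decode("utf-16-le"))          # TWFu
```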
--
Greg