Bill Janssen wrote:
> Greg Ewing wrote:
>> Bill Janssen wrote:
>>
>>> bytes -> base64 -> text
>>> text -> de-base64 -> bytes
>> It's nice to hear I'm not out of step with
>> the entire world on this. :-)
>
> Well, I can certainly understand the bytes->base64->bytes side of
> thing too. The "text" produced is specified as using "a 65-character
> subset of US-ASCII", so that's really bytes.

If the base64 codec were a text<->bytes codec, and bytes did not have an
encode method, then to convert your original bytes to ascii bytes, you
would do:

    ascii_bytes = orig_bytes.decode("base64").encode("ascii")

"Use base64 to convert my byte sequence to characters, then give me the
corresponding ascii byte sequence"

To reverse the process:

    orig_bytes = ascii_bytes.decode("ascii").encode("base64")

"Use ascii to convert my byte sequence to characters, then use base64 to
convert those characters back to the original byte sequence"

The only slightly odd aspect is that this inverts the conventional
meaning of base64 encoding and decoding, where you expect to encode from
bytes to characters and decode from characters to bytes.

As strings currently have both methods, the existing codec is able to use
the conventional sense for base64: encode goes from "str-as-bytes" to
"str-as-text" (giving a longer string with characters that fit in the
base64 subset) and decode goes from "str-as-text" to "str-as-bytes"
(giving back the original string).

All the unicode codecs, on the other hand, use encode to get from
characters to bytes and decode to get from bytes to characters.

So if bytes objects *did* have an encode method, it should still result
in a unicode object, just the same as a decode method does (because you
are encoding bytes as characters), and unicode objects would acquire a
corresponding decode method (that decodes from a character format such as
base64 to the original byte sequence).

In the name of TOOWTDI, I'd suggest that we just eat the slight
terminology glitch in the rare cases like base64, hex and oct (where the
character format is technically the encoded format), and leave it so that
there is a single method pair (bytes.decode to go from bytes to
characters, and text.encode to go from characters to bytes).

Cheers,
Nick.
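
P.S. A minimal sketch of the round trip described above, assuming a
Python where bytes and text are distinct types. The helper names
base64_bytes_to_text and base64_text_to_bytes are made up for
illustration: they stand in for the suggested bytes.decode("base64") and
text.encode("base64"), which are not claimed to exist in that form. The
standard base64 module does the actual work.

```python
import base64


def base64_bytes_to_text(data: bytes) -> str:
    """Hypothetical stand-in for the suggested bytes.decode("base64")."""
    # b64encode returns ASCII-only bytes; decoding them yields real text.
    return base64.b64encode(data).decode("ascii")


def base64_text_to_bytes(text: str) -> bytes:
    """Hypothetical stand-in for the suggested text.encode("base64")."""
    return base64.b64decode(text)


orig_bytes = b"\x00\xffbinary payload"

# bytes -> characters (via base64), then characters -> ASCII bytes
ascii_bytes = base64_bytes_to_text(orig_bytes).encode("ascii")

# ASCII bytes -> characters, then characters (via base64) -> original bytes
round_tripped = base64_text_to_bytes(ascii_bytes.decode("ascii"))

assert round_tripped == orig_bytes
```

Note that the helper going from bytes to characters plays the "decode"
role here, which is exactly the inverted terminology mentioned above.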

-- 
Nick Coghlan   |   [EMAIL PROTECTED]   |   Brisbane, Australia
---------------------------------------------------------------
            http://www.boredomandlaziness.org