Chris Angelico writes:

 > Older versions of Python had text and bytes be the same things. That
 > means that, for backward compatibility, they have some common methods.
 > But does that really mean that bytes can be uppercased? Or is it that
 > we allow bytes to be treated as ASCII-encoded text, which is then
 > uppercased, and then returned to being bytes?

Not just older versions. There have been several, more or less hotly
contested, changes post-2/3 fork that basically come down to "bytes
are frequently the wire format of ASCII-compatibly-encoded text, so
we're going to add text methods for the convenience of people who work
with those wire formats but do not need to (and sometimes cannot)
decode to Unicode."

For example, RFC 5322 header field tags are defined to be
case-insensitive ASCII, and therefore it's useful to match them by
upper- or lowercasing the tag, then matching against fixed strings.
Could you convert to text and do the work? Not usefully: you need to
parse the bytes to determine which text encoding is in use. (And
ironically enough, if the message is RFC 5322 + RFC 2045-conformant,
the hacky iso-8859-1 "conversion" will be allocation of a str object
and then a memcpy of the bytes. I don't think that's a rebuttal to
your argument, of course, it's just amusing.)

That doesn't mean that bytes ARE text that happens to fit in 8-bit
code units (PEP 393). It does mean that the similarities of the APIs
are neither random accidents nor historical artifacts. They're
intentional.

I don't think this has anything whatsoever to do with whether the
"custom string prefix" proposal is a good idea or not.

(other) Steve
_______________________________________________
Python-ideas mailing list -- python-ideas@python.org
Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/3OZLBLIIRGH2YSAOTMGZD3E7MKLO6KOF/
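
To make the header-matching case above concrete, here is a minimal
sketch of matching a field tag without decoding. It is only an
illustration, not how the email package does it: header_tag and
is_content_type are hypothetical helpers, and the sample header line
is made up. It relies only on bytes.partition, bytes.strip and
bytes.lower, which operate on ASCII and leave all other bytes alone.

```python
def header_tag(line: bytes) -> bytes:
    """Return the lowercased field name of one raw header line.

    Hypothetical helper for illustration; it assumes an unfolded
    "Name: value" line and does no other validation.
    """
    # Split on the first colon only; the value may contain colons too.
    name, sep, _value = line.partition(b":")
    if not sep:
        raise ValueError("not a header field line")
    # bytes.lower() maps only ASCII A-Z; every other byte is untouched.
    return name.strip().lower()


def is_content_type(line: bytes) -> bool:
    # Compare against a fixed bytes literal; no decoding is needed.
    return header_tag(line) == b"content-type"


# The field name can be matched before we know the body's charset.
raw = b"CONTENT-Type: text/plain; charset=koi8-r"
print(header_tag(raw))       # b'content-type'
print(is_content_type(raw))  # True

# The iso-8859-1 "conversion" mentioned above: latin-1 maps each byte
# 0-255 to the code point with the same number, so it round-trips.
every_byte = bytes(range(256))
print(every_byte.decode("latin-1").encode("latin-1") == every_byte)  # True
```

Only after the Content-Type field has been found this way, and its
charset parameter read, is there enough information to decode the rest
of the message to str.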