>>>>> "Greg" == Greg Ewing <[EMAIL PROTECTED]> writes:

Greg> Stephen J. Turnbull wrote:

>> No, base64 isn't a wire protocol.  It's a family[...].

Greg> Yes, and it's up to the programmer to choose those code
Greg> units (i.e. pick an encoding for the characters) that will,
Greg> in fact, pass through the channel he is using without
Greg> corruption.  I don't see how any of this is inconsistent with
Greg> what I've said.

It's not.  It just shows that there are other "correct" ways to think
about the issue.

>> Only if you do no transformations that will harm the
>> base64-encoding.  ...  It doesn't allow any of the usual
>> transformations on characters that might be applied globally to
>> a mail composition buffer, for example.

Greg> I don't understand that.  Obviously if you rot13 your mail
Greg> message or turn it into pig latin or something, it's going
Greg> to mess up any base64 it might contain.  But that would be a
Greg> silly thing to do to a message containing base64.

What "message containing base64"?  "Any base64 in there?"  "Nope,
nobody here but us Unicode characters!"  I certainly hope that in Py3k
bytes objects will have neither ROT13 nor case-changing methods, but
str objects certainly will.  Why give up the safety of that
distinction?

Greg> Given any piece of text, there are things it makes sense to
Greg> do with it and things it doesn't, depending entirely on the
Greg> use to which the text will eventually be put.  I don't see
Greg> how base64 is any different in this regard.

If you're going to be binary about it, it's not different.  However,
the kind of "text" for which Unicode was designed is normally produced
and consumed by people, who wll pt up w/ ll knds f nnsns.  Base64
decoders will not put up with the same kinds of nonsense that people
will.

You're basically assuming that the person who implements the code that
processes a Unicode string is the same person who implemented the code
that converts a binary object into base64 and inserts it into a string.
I think that's a dangerous (and certainly invalid) assumption.  I know
I've lost time and data to applications that make assumptions like
that.  In fact, that's why "MULE" is a four-letter word in Emacs
channels.<wink>

>> So then you bring it right back in with base64.  Now they need
>> to know about bytes<->unicode codecs.

Greg> No, they need to know about the characteristics of the
Greg> channel over which they're sending the data.

I meant it in a trivial sense: "How do you use a bytes<->unicode codec
properly without knowing that it's a bytes<->unicode codec?"

In most environments, it should be possible to hide bytes<->unicode
codecs almost all the time, and I think that's a very good thing.  I
don't think it's a good idea to gratuitously introduce wire protocols
as unicode codecs, even if a class of bit patterns which represent the
integer 65 are denoted "A" in various sources.  Practicality beats
purity (especially when you're talking about the purity of a pregnant
virgin).

Greg> It might be appropriate to use base64 followed by some
Greg> encoding, but the programmer needs to be aware of that and
Greg> choose the encoding wisely.  It's not possible to shield him
Greg> from having to know about encodings in that situation, even
Greg> if the encoding is just ascii.

What do you think the email module does?  Assuming conforming MIME
messages and receivers capable of handling UTF-8, the user of the
email module does not need to know anything about any encodings at
all.  With a little more smarts, the email module could even make a
good choice of output encoding based on the _language_ of the text,
removing the restriction to UTF-8 on the output side, too.  With the
aid of file(1), it can make excellent guesses about attachments.
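
To make that concrete, here is a minimal sketch of what "the user does
not need to know" can look like.  It assumes the present-day email
package (email.message.EmailMessage and policy.default), which is
newer than this thread; the roundtrip() helper, the addresses and the
file name are all made up for illustration.  The caller hands over one
str and one bytes object and never names a charset or a transfer
encoding.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def roundtrip(text, blob):
    """Push a str and a bytes object through a MIME message and back.

    Hypothetical helper for illustration; not part of the email package.
    """
    msg = EmailMessage()
    msg["From"] = "sender@example.org"      # placeholder address
    msg["To"] = "receiver@example.org"      # placeholder address
    msg["Subject"] = "mixed bag"

    # Text goes in as str, binary goes in as bytes; no codec is named.
    msg.set_content(text)
    msg.add_attachment(blob, maintype="application",
                       subtype="octet-stream", filename="blob.bin")

    # At this level mail is a binary channel: serialize to bytes.
    wire = msg.as_bytes()

    # The receiving side gets str and bytes back, again without codecs.
    received = message_from_bytes(wire, policy=policy.default)
    body = received.get_body(preferencelist=("plain",))
    part = next(received.iter_attachments())
    cte = str(part["Content-Transfer-Encoding"])
    return body.get_content(), part.get_content(), cte


if __name__ == "__main__":
    text_out, blob_out, cte = roundtrip("Grüße, 日本語\n", bytes(range(256)))
    print(repr(text_out))
    print(blob_out == bytes(range(256)))
    print(cte)
```

The only place base64 shows up is in the Content-Transfer-Encoding
header of the attachment, which the package chose by itself; the text
part stays a str on both ends and the attachment stays bytes.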

Sure, the email module programmer needs to know, but the email module
programmer needs to know an awful lot about codecs anyway, since mail
at that level is a binary channel, while users will be throwing a
mixed bag of binary and textual objects at it.

--
School of Systems and Information Engineering http://turnbull.sk.tsukuba.ac.jp
University of Tsukuba                    Tennodai 1-1-1 Tsukuba 305-8573 JAPAN
        Ask not how you can "do" free software business;
              ask what your business can "do for" free software.
_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev