R. David Murray writes:

> version of headers to the email5 API, but since any such data would
> be non-RFC compliant anyway, [access to non-conforming headers by
> reparsing the bytes] will just have to be good enough for now.

But that's potentially unpleasant for, say, Mailman. AFAICS, what you're saying is that Mailman will have to implement a full header parser and repair module, or shunt (and wait for administrator intervention on) any mail that happens to contain even one byte of non-RFC-conforming content in a header it cares about. (Note that we're not talking about moderator-level admins here; we're talking about the Big Cheese with access to the command line on the list host.) That's substantially worse than the current system, where (in theory, and in actual practice where it distributes its own version of email) it can trap the Unicode exception on a per-header basis.

I also worry about the implications for backwards compatibility. Eventually email-N needs to handle non-conforming mail in a sensible way, or anybody who gets spam (ie, everybody) and wants a reliable email system will need to implement their own. If you punt completely on handling non-conforming mail now, when is it going to be done? And when it is done, will the backward-compatible interface be able to access the robust implementation, or will people who want robust APIs have to use rather different ones? The way you're going right now, I have to worry about the answer to the second question, at least.

> [*] Why '?' and not the unicode invalid character character? Well, the
> email5 Generate.flatten can be used to generate data for transmission over
> the wire *if* the source is RFC compliant and 7bit-only, and this would
> be a normal email5 usage pattern (that is, smtplib.SMTP.sendmail expects
> ASCII-only strings as input!). So the data generated by Generator.flatten
> should not include unicode...

I don't understand this at all. Of course the byte stream generated by Generator.flatten won't contain Unicode (in the headers, anyway); it will contain only ASCII (that happens to conform to QP or Base64 encoding of Unicode in some appropriate UTF in many cases).

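To make that concrete, here is a minimal sketch using the stdlib email.header module as it exists in current Python 3 (not the email5 API under discussion). The helper names to_wire and from_wire and the sample strings are mine, purely for illustration. It RFC 2047-encodes one value containing U+FFFD and one containing an ordinary accented character, and checks that both come out as pure ASCII and round-trip:

```python
from email.header import Header, decode_header, make_header

# Sample values (hypothetical): one with U+FFFD REPLACEMENT CHARACTER,
# one with an ordinary non-ASCII character (e with acute accent).
with_replacement = "caf\ufffd"
with_accent = "caf\u00e9"


def to_wire(value):
    """Return the RFC 2047 encoded-word form of value as a str."""
    return Header(value, charset="utf-8").encode()


def from_wire(encoded):
    """Decode an RFC 2047 header value back to a Unicode str."""
    return str(make_header(decode_header(encoded)))


for value in (with_replacement, with_accent):
    wire = to_wire(value)
    wire.encode("ascii")  # raises UnicodeEncodeError if not pure ASCII
    assert from_wire(wire) == value
    print(repr(value), "->", wire)
```

Both values take exactly the same path: the encoded form is ASCII either way, so nothing on the wire distinguishes U+FFFD from any other non-ASCII character.
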
Why is U+FFFD REPLACEMENT CHARACTER any different from any other non-ASCII character in this respect? (Surely you are not saying that Generator.flatten can't DTRT with non-ASCII content *at all*?)

The only thing I can think of is that you might not want to introduce non-ASCII characters into a string that looks like it might simply be corrupted in transmission (eg, it contains only one non-ASCII byte). That's reasonable; there are a lot of people who don't have to deal with anything but ASCII and occasionally Latin-1, and they don't like having Unicode crammed down their throats.

> which raises a problem for CTE 8bit sections
> that the patch doesn't currently address.

AFAIK, there's no requirement, implied or otherwise, that a conforming implementation *produce* CTE 8bit. So just don't do that; that will keep smtplib happy, no?

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev