Ross Ridge (Sat, 21 Feb 2009 18:06:35 -0500)
> I understand what Unicode and MIME are for and why they exist. Neither
> their merits nor your insults change the fact that the only current
> standard governing the content of Usenet posts doesn't require their
> use.
Thorsten Kampe <thors...@thorstenkampe.de> wrote:
>That's right. As long as you use pure ASCII you can skip this nasty step
>of informing other people which charset you are using. If you do use non
>ASCII then you have to do that. That's the way virtually all newsreaders
>work. It has nothing to do with some 21+ year old RFC. Even your Google
>Groups "newsreader" does that ('content="text/html; charset=UTF-8"').

No, the original post demonstrates that you don't have to include MIME headers for ISO 8859-1 text to be displayed properly by many newsreaders. The fact that your obscure newsreader didn't display it properly doesn't mean that the original poster's newsreader is broken.

>Being explicit about your encoding is 99% of the whole Unicode magic in
>Python and in any communication across the Internet (may it be NNTP,
>SMTP or HTTP).

HTTP requires the assumption of ISO 8859-1 in the absence of any specified encoding.

>Your Google Groups simply uses heuristics to guess the
>encoding the OP probably used. Windows newsreaders simply use the locale
>of the local host. That's guessing. You can call it assuming but it's
>still guessing. There is no way you can be sure without any declaration.

Newsreaders assuming ISO 8859-1 instead of ASCII doesn't make it a guess; it's just a different assumption. Nor does making an assumption, whether ASCII or ISO 8859-1, give you any certainty.

>And it's unpythonic. Python "assumes" ASCII and if the decodes/encoded
>text doesn't fit that encoding it refuses to guess.

That's reasonable, given that Python is a programming language, where it's better to make more conservative assumptions about encodings so that errors can be diagnosed more quickly. A newsreader, however, is a different beast: there it's better to make a less conservative assumption that's more likely to display messages correctly to the user. Assuming ISO 8859-1 in the absence of any specified encoding allows the message to be displayed correctly if the character set is either ISO 8859-1 or ASCII.
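The difference between the two assumptions can be illustrated with a minimal sketch. This is not from the original post: the helper names `declared_charset` and `decode_body` are hypothetical, and the example uses only the standard library's `email` package and `bytes.decode`. It models a post with no MIME headers, so no charset is declared, and then decodes the same 8-bit body under a strict ASCII assumption and under a lenient ISO 8859-1 assumption.

```python
from email import message_from_string
from typing import Optional


def declared_charset(headers: str) -> Optional[str]:
    """Return the charset declared in a MIME Content-Type header, or None."""
    msg = message_from_string(headers)
    # get_content_charset() lower-cases the value and returns None when
    # there is no Content-Type header or it has no charset parameter.
    return msg.get_content_charset()


def decode_body(body: bytes, charset: Optional[str], fallback: str) -> str:
    """Decode with the declared charset if there is one, else the fallback.

    fallback="ascii" models Python's strict default; fallback="iso-8859-1"
    models the lenient newsreader assumption (hypothetical helper).
    """
    return body.decode(charset or fallback)  # errors="strict" by default


# An 8-bit post with no MIME headers at all, so nothing is declared.
headers = "Subject: test\n\n"
body = "café".encode("iso-8859-1")  # b'caf\xe9'
charset = declared_charset(headers)  # None

try:
    decode_body(body, charset, fallback="ascii")
except UnicodeDecodeError:
    print("ascii assumption: refuses to decode")

print(decode_body(body, charset, fallback="iso-8859-1"))  # café

# Pure ASCII decodes to the same text under either assumption.
print(decode_body(b"hello", None, "ascii") == decode_body(b"hello", None, "iso-8859-1"))  # True

# With a declared charset, no assumption is needed at all.
print(declared_charset("Content-Type: text/plain; charset=ISO-8859-1\n\n"))  # iso-8859-1
```

Decoding as ISO 8859-1 never raises, because every byte value maps to a character. That is why it displays both ASCII and ISO 8859-1 posts correctly, but it also means a post in some other 8-bit charset would be shown with the wrong characters rather than rejected.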
Doing things the "pythonic" way and assuming ASCII only allows such messages to be displayed if ASCII is used.

Ross Ridge

-- 
 l/  //   Ross Ridge -- The Great HTMU
[oo][oo]  rri...@csclub.uwaterloo.ca
-()-/()/  http://www.csclub.uwaterloo.ca/~rridge/
 db  //