Tomohiro KUBOTA <[EMAIL PROTECTED]>:

> The key point is that when we receive a mail with raw 8bit characters,
> we don't have an easy and relyable method to tell the characters are
> from ISO-8859-1 or KOI8-R or other character sets.

If the headers contain 8-bit octets and are valid as UTF-8, it's fairly
safe to assume that they really are UTF-8. Otherwise, you could look for
a Content-Type field or make it depend on the mailing list.

> An easy way is to assume *all* raw 8bit characters to be KOI8-R and
> convert into SGML entity. However, I don't know whether there are
> some other languages where a certain amount of non-spammer people
> use raw 8bit characters. If they exist, they will complain on this
> idea.

I thought some Japanese non-spammers use iso-2022-jp in headers, which
isn't 8-bit, but it isn't us-ascii, either. Am I out of date?

Edmund
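
P.S. A rough Python sketch of the heuristic I have in mind, in case it
helps. The function name, its parameters, and the iso-8859-1 default
are all made up for illustration; this is not how any existing archive
software does it. It only deals with raw 8-bit headers, so 7-bit
encodings such as iso-2022-jp are deliberately left alone.

```python
from typing import Optional


def guess_header_charset(
    raw: bytes,
    declared: Optional[str] = None,      # charset from Content-Type, if any (assumed input)
    list_default: str = "iso-8859-1",    # hypothetical per-mailing-list fallback
) -> Optional[str]:
    """Guess the charset of a raw header value that may contain 8-bit octets."""
    # No 8-bit octets at all: nothing to guess here. The header may still be
    # something like iso-2022-jp, which this sketch does not try to detect.
    if all(b < 0x80 for b in raw):
        return None

    # 8-bit octets that form valid UTF-8 are very unlikely to be anything else.
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # Otherwise trust the Content-Type charset if the message declared one,
    # and fall back on the per-list default if it did not.
    return declared or list_default
```

So a list that is mostly Russian could pass list_default="koi8-r",
while other lists keep a different default.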