2010.10.08 17:17 Kamenik, Aleksander wrote:
>> You don't have to assume anything. The character set name is written in
>> the first section of the B|Q encoding. If no character set name is
>> written and the subject is not encoded, it must be in US-ASCII.
>>
>> utf7 is rarely used for Subject. You have utf-8 (=?utf-8?b? or =?utf-8?q?)
>> or some unicode variant (unicode-#-# or unicode-#-#-some-text), or you
>> confuse Unicode with broken 8bit headers.
>
> These are from Outlook 2007 as far as I can tell. Not everybody follows
> standards. For example:
>
> # grep 'Subject: palun juur' mbox
> Subject: palun juurdepääsu
> # grep 'Subject: palun juur' mbox | hexdump -C
> 00000000  53 75 62 6a 65 63 74 3a  20 70 61 6c 75 6e 20 6a  |Subject: palun j|
> 00000010  75 75 72 64 65 70 c3 a4  c3 a4 73 75 0a           |uurdep....su.|
> 0000001d
> #
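The point about charset names can be illustrated with Python's standard-library `email.header` module (a sketch only, not DBMail or libpst code). In an RFC 2047 encoded word, `=?charset?B|Q?text?=`, the charset is the first field, so `decode_header` returns it next to each decoded chunk and nothing has to be guessed. The helper name `decode_subject` and the sample header values are hypothetical; the Estonian subject is the one from the grep output above.

```python
from email.header import Header, decode_header


def decode_subject(raw: str) -> str:
    """Decode an RFC 2047 header value such as a Subject line.

    Each encoded word (=?charset?B|Q?text?=) names its own charset, so the
    decoder reads it instead of guessing. Text outside encoded words comes
    back with charset None and is decoded as US-ASCII; stray 8-bit bytes
    there raise UnicodeDecodeError rather than being guessed at.
    """
    parts = []
    for chunk, charset in decode_header(raw):
        if isinstance(chunk, str):
            # decode_header hands the value back unchanged as str when it
            # contains no encoded words at all.
            parts.append(chunk)
        else:
            parts.append(chunk.decode(charset or "us-ascii"))
    return "".join(parts)


# Hypothetical samples: the same subject, correctly encoded in two charsets.
utf8_q = "=?utf-8?q?palun_juurdep=C3=A4=C3=A4su?="
latin13_q = "=?iso-8859-13?q?palun_juurdep=E4=E4su?="

print(decode_subject(utf8_q))     # palun juurdepääsu
print(decode_subject(latin13_q))  # palun juurdepääsu

# The other direction: Header builds a standards-compliant encoded word.
print(Header("palun juurdepääsu", "utf-8").encode())
```

The two encoded words carry different bytes for "ä" (C3 A4 versus E4), yet both decode to the same text because each one declares its charset.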
c3 a4 c3 a4 is "ää" in UTF-8, so this subject is in UTF-8, but it is also a violation of RFC 822/RFC 2047: non-ASCII text in headers must be encoded. A computer program can't detect which character set was used if the sender does not specify it.

It is highly unlikely that all your malformed emails are in UTF-8. You can have a mix of utf-8, iso-8859-1, iso-8859-13, iso-8859-15, windows-1252, windows-1257 and other character sets. Older Estonian emails are probably not in UTF-8. If you try to fix all 8-bit subjects, you will break malformed iso-8859-x Estonian texts that look OK in Outlook now.

If those UTF-8 emails looked OK in Outlook, maybe the problem is in libpst.

--
Tomas
_______________________________________________
DBmail mailing list
[email protected]
http://mailman.fastxs.nl/cgi-bin/mailman/listinfo/dbmail
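The claim that a program cannot recover an undeclared charset can be checked directly. This sketch (plain Python; the helper `readings` is hypothetical) tries the charsets listed in the message on two raw 8-bit subjects: the UTF-8 bytes from the hexdump earlier in the thread, and an assumed legacy encoding with one 0xE4 byte per "ä". Decoding without an error does not identify the charset; it only shows which charsets happen to accept the bytes.

```python
# Charsets named in the thread as plausible for Estonian mail.
CANDIDATES = ["utf-8", "iso-8859-1", "iso-8859-13", "iso-8859-15",
              "windows-1252", "windows-1257"]


def readings(raw: bytes) -> dict:
    """Map each candidate charset to its decoding of `raw`, if it decodes.

    This is NOT charset detection: it only shows how many charsets accept
    the same bytes. Without a declared charset there is nothing to choose by.
    """
    result = {}
    for cs in CANDIDATES:
        try:
            result[cs] = raw.decode(cs)
        except UnicodeDecodeError:
            pass  # these bytes are not valid in this charset
    return result


# Raw 8-bit subject as in the hexdump: "ää" sent as UTF-8 (c3 a4 c3 a4).
utf8_subject = b"palun juurdep\xc3\xa4\xc3\xa4su"
# Assumed legacy form of the same word: one byte (0xe4) per "ä".
legacy_subject = b"palun juurdep\xe4\xe4su"

for label, raw in [("utf-8 bytes", utf8_subject), ("8-bit bytes", legacy_subject)]:
    print(label)
    for cs, text in readings(raw).items():
        print(f"  {cs:13} {text}")

# Forcing everything through UTF-8 damages the legacy subject.
print(legacy_subject.decode("utf-8", errors="replace"))
```

The UTF-8 subject decodes without error under all six charsets, giving mojibake such as "Ã¤" under iso-8859-1, so "it decodes" proves nothing. The legacy subject fails strict UTF-8 yet reads correctly under every single-byte candidate; forcing it through UTF-8 with `errors="replace"` turns each "ä" into U+FFFD, which is the breakage described for malformed iso-8859-x messages.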
