A NOTE has been added to this issue.
======================================================================
http://dbmail.org/mantis/view.php?id=655
======================================================================
Reported By:                idk
Assigned To:
======================================================================
Project:                    DBMail
Issue ID:                   655
Category:                   Database layer
Reproducibility:            random
Severity:                   minor
Priority:                   normal
Status:                     new
target:
======================================================================
Date Submitted:             15-Nov-07 01:32 CET
Last Modified:              15-Nov-07 10:28 CET
======================================================================
Summary:                    MIME headers are incorrectly parsed into cached tables
Description:
Some messages with MIME-encoded headers are inserted incorrectly into dbmail_*field and dbmail_headervalue. The values appear to be double-encoded as UTF-8.
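The double encoding described above can be reproduced with the two HEX(fromname) values quoted in this report. The following is a minimal Python sketch, assuming the corruption is exactly one extra round of "decode the UTF-8 bytes as a single-byte charset, then encode to UTF-8 again". The helper names double_encode and repair are hypothetical and are not part of DBMail. Trying several candidate charsets shows that the stored bytes match a Latin-1 (ISO-8859-1) reinterpretation for all three accented characters; ISO-8859-2 and windows-1250 both map byte C3 to Ă (UTF-8 C482), not the Ã (C383) that appears in the corrupted row.

```python
# Sketch: reproduce the DBMail issue 655 corruption, assuming UTF-8 bytes were
# decoded as a single-byte charset and then encoded to UTF-8 a second time.

# HEX(fromname) for physmessage_id 399826 (correct) and 399827 (corrupted),
# copied from the dbmail_fromfield query in this report.
CORRECT_HEX = "4F6E6C696E652052657A65727661C48D6EC3AD2053797374C3A96D20534D4F534B"
CORRUPT_HEX = ("4F6E6C696E652052657A65727661C384C28D6EC383C2AD"
               "2053797374C383C2A96D20534D4F534B")

correct = bytes.fromhex(CORRECT_HEX)
corrupt = bytes.fromhex(CORRUPT_HEX)


def double_encode(raw: bytes, wrong_charset: str) -> bytes:
    """Hypothetical helper: misread UTF-8 bytes as wrong_charset, re-encode as UTF-8."""
    return raw.decode(wrong_charset).encode("utf-8")


def repair(raw: bytes) -> bytes:
    """Hypothetical helper: undo one Latin-1 round of double encoding.

    Decoding as UTF-8 yields code points U+0000..U+00FF; encoding those as
    Latin-1 maps each one back to the original single byte.
    """
    return raw.decode("utf-8").encode("latin-1")


if __name__ == "__main__":
    for charset in ("latin-1", "iso-8859-2", "cp1250"):
        # Only latin-1 reproduces the stored 399827 row byte for byte.
        print(charset, double_encode(correct, charset) == corrupt)
    # Undoing one Latin-1 round restores the 399826 row.
    print("repaired:", repair(corrupt) == correct)
```

Run as a script, this prints True for latin-1, False for iso-8859-2 and cp1250, and True for the repair check. Note that repair() assumes the value really was double-encoded: on a correctly stored value containing characters above U+00FF (such as č) it raises UnicodeEncodeError, and on one containing only Latin-1-range characters it silently returns invalid UTF-8, so it should not be run blindly over every row.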
For example, the From field of one message (see Additional Information) has these two instances:

SELECT physmessage_id, HEX(fromname) FROM dbmail_fromfield
WHERE physmessage_id BETWEEN 399826 AND 399827;

399826  4F6E6C696E652052657A65727661C48D6EC3AD2053797374C3A96D20534D4F534B
399827  4F6E6C696E652052657A65727661C384C28D6EC383C2AD2053797374C383C2A96D20534D4F534B

Compare them. The first one is correct; without accents it reads "Online Rezervacni System SMOSK". The second one goes wrong at the 15th character, LATIN SMALL LETTER C WITH CARON (U+010D), whose UTF-8 encoding is C48D; here it is stored as C384C28D. Every non-US-ASCII character is stored as 4 bytes instead of 2. Converting the first byte, C4, from ISO-8859-2 to UTF-8 gives C384, and converting 8D gives C28D. So the corrupted UTF-8 string may have been produced by something like iconv -f iso-8859-2 -t utf8 (or perhaps windows-1250 rather than ISO-8859-2). Yet encoding = utf8, default_msg_encoding = utf8, the database is UTF-8 as well, and the environment is en_US.UTF-8; no ISO or Windows charset is configured anywhere.
======================================================================

----------------------------------------------------------------------
 idk - 15-Nov-07 10:28
----------------------------------------------------------------------
Ah, don't mangle my characters! :o) This is discrimination; my nation is a valid EU member! Shame! Czech characters are valid Unicode characters, and Mantis breaks them. :o)

So, here is the Additional Information again (accented letters are marked with *). The decoded strings, with Czech characters, are:

Subject: Zrus*eni* objedna*vky www.smosk.cz
From: "Online Rezervac*ni* Syste*m SMOSK" <[EMAIL PROTECTED]>

But roughly half of the headers are inserted as:

Subject: ZruA*i*enA***

Issue History
Date Modified     Username       Field                      Change
======================================================================
15-Nov-07 01:32   idk            New Issue
15-Nov-07 10:28   idk            Note Added: 0002410
======================================================================

_______________________________________________
Dbmail-dev mailing list
Dbmail-dev@dbmail.org
http://twister.fastxs.net/mailman/listinfo/dbmail-dev