Devs,

I recently received a bulk e-mail from an event organizer that RoundCube
(viewed in Firefox 3) displayed with little square hex-code glyphs in place
of some of the punctuation marks. I looked into why this was happening and
tracked it down to an encoding issue.

The text/html message part in the e-mail source declared the ISO-8859-1
charset. After RoundCube converted the message part to UTF-8, the resulting
text still contained unprintable control characters. One of them was 0x92,
which is not a printable ISO-8859-1 character at all (the whole 0x80-0x9F
range is reserved for C1 control codes). It turns out that the message
originator must have been using Windows-1252 encoding, in which 0x92 is a
right single quotation mark (exactly what fit the context in which it
appeared), but incorrectly declared ISO-8859-1 in the MIME message.
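
To illustrate the difference, here is a minimal PHP sketch (assuming the
iconv extension is available and recognizes both charset labels) that
decodes the same 0x92 byte under each label and prints the resulting UTF-8
bytes:

```php
<?php
// The offending byte from the message body
$byte = "\x92";

// Read as ISO-8859-1, 0x92 becomes U+0092, an unprintable C1 control code
$asLatin1 = iconv('ISO-8859-1', 'UTF-8', $byte);

// Read as Windows-1252, 0x92 becomes U+2019 RIGHT SINGLE QUOTATION MARK
$asCp1252 = iconv('WINDOWS-1252', 'UTF-8', $byte);

echo bin2hex($asLatin1), "\n"; // c292
echo bin2hex($asCp1252), "\n"; // e28099
```

The first result is what a browser shows as a hex-code box; the second is
the apostrophe the sender intended.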

The Windows-1252 character set is effectively a superset of ISO-8859-1: it
agrees with ISO-8859-1 everywhere outside the 0x80-0x9F range, but replaces
most of the seldom-used C1 control codes in that range with additional
punctuation (curly quotes, dashes, the euro sign) and accented letters. Some
mail agents blur the line between the two encodings and send Windows-1252
characters in messages labelled ISO-8859-1.
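
A rough PHP sketch to check that claim, again assuming iconv recognizes both
labels: it decodes every single-byte value under both charsets and reports
where the results disagree. Windows-1252 leaves five byte values (0x81,
0x8D, 0x8F, 0x90, 0x9D) undefined, and iconv builds differ on whether those
fail or map to C1 codes, so only the range is printed, not an exact count.

```php
<?php
// Collect every byte value that decodes differently under the two charsets
$differing = [];
for ($i = 0; $i < 256; $i++) {
    $b = chr($i);
    $latin1 = iconv('ISO-8859-1', 'UTF-8', $b);
    // @ hides the notice iconv raises for bytes Windows-1252 leaves undefined;
    // a failed conversion returns false, which also counts as a difference
    $cp1252 = @iconv('WINDOWS-1252', 'UTF-8', $b);
    if ($latin1 !== $cp1252) {
        $differing[] = $i;
    }
}

// Lowest and highest differing byte values
printf("0x%02X-0x%02X\n", min($differing), max($differing)); // 0x80-0x9F
```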

The following workaround (in rcube_charset_convert()) corrects the issue
(at least for my one test case):

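// Workaround for mail agents that include Windows-1252 characters
// in text advertised as ISO-8859-1: bytes 0x80-0x9F are C1 control
// codes in ISO-8859-1 but printable characters in Windows-1252
if (strtoupper($from) == 'ISO-8859-1' && preg_match('/[\x80-\x9F]/', $str)) {
    $from = 'WINDOWS-1252';
}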
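
For anyone who wants to try the heuristic outside RoundCube, here is a
self-contained sketch. The helper name lenient_to_utf8() is hypothetical,
and it calls iconv() directly rather than going through
rcube_charset_convert(); the fallback to the declared charset is my own
addition for input the Windows-1252 guess cannot decode.

```php
<?php
// Hypothetical helper: convert $str from charset $from to UTF-8, treating
// ISO-8859-1 input that contains bytes 0x80-0x9F as Windows-1252 instead
function lenient_to_utf8(string $str, string $from): string
{
    $declared = $from;
    // Genuine ISO-8859-1 text almost never contains C1 control bytes
    if (strtoupper($from) === 'ISO-8859-1' && preg_match('/[\x80-\x9F]/', $str)) {
        $from = 'WINDOWS-1252';
    }
    // @ hides the notice iconv raises on bytes the charset does not define
    $out = @iconv($from, 'UTF-8', $str);
    if ($out === false) {
        // The guess did not work out, so convert as declared
        $out = iconv($declared, 'UTF-8', $str);
    }
    // Give up and return the input untouched if both conversions failed
    return $out === false ? $str : $out;
}

// Mislabelled Windows-1252: the 0x92 apostrophe is recovered
echo lenient_to_utf8("It\x92s here", 'iso-8859-1'), "\n"; // It’s here
// Genuine ISO-8859-1: no 0x80-0x9F bytes, so it is converted as declared
echo lenient_to_utf8("caf\xE9", 'ISO-8859-1'), "\n";      // café
```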

What does everyone think of including a workaround like this? I'm generally
reluctant to work around improper behavior from other software, but this
particular kind of relaxed interpretation seems common (check out the
ISO-8859-1 page on Wikipedia).


-- 
Eric Stadtherr

_______________________________________________
List info: http://lists.roundcube.net/dev/