Jeff Breidenbach writes:

> I've seen a small but not tiny number of messages where the
> Mail User Agent is sticking raw iso-8859-1 characters (outside
> the ASCII range) inside the Subject: header.

It's invalid, but it's not uncommon.  It's getting rarer, though, as
more and more legacy mail user agents end up on the scrap heap.
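For contrast, a minimal sketch of what "invalid" means here, assuming Python's standard `email.header` module: a raw Subject value is suspect if any byte is above 0x7F, whereas a compliant MUA wraps non-ASCII text in RFC 2047 encoded-words. The helper names `has_raw_8bit` and `rfc2047_subject` are hypothetical, and iso-8859-1 is just an illustrative default.

```python
from email.header import Header

def has_raw_8bit(header_value: bytes) -> bool:
    """True if a raw header value contains bytes outside 7-bit ASCII."""
    return any(b > 0x7F for b in header_value)

def rfc2047_subject(text: str, charset: str = "iso-8859-1") -> str:
    """Encode text as RFC 2047 encoded-words, the way a compliant MUA would."""
    return Header(text, charset).encode()
```

So `rfc2047_subject("café")` yields an ASCII-only string of the `=?iso-8859-1?q?...?=` form, while the raw bytes `b"caf\xe9"` that a legacy MUA might emit trip `has_raw_8bit`.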

One puzzling thing I've seen, though, is that it's quite common for
Chinese-language and Russian-language messages to carry unencoded
characters (Big5 and KOI8, respectively) in the Subject header.  I
haven't really looked into why, but I've had to add default charset
handling to Gmane to make certain lists legible at all.  That is, when
doing the conversion to UTF-8, I have a per-list default charset, which
I feed to the converter.
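A rough sketch of that kind of per-list fallback, assuming Python's `email.header.decode_header`; this is not Gmane's actual code. The table `LIST_DEFAULT_CHARSET`, its list names, and the function names are hypothetical. Pure-ASCII subjects go through normal RFC 2047 decoding; subjects containing raw 8-bit bytes are decoded with the list's default charset.

```python
from email.header import decode_header

# Hypothetical per-list defaults; the list names are made up for illustration.
LIST_DEFAULT_CHARSET = {
    "example.chinese": "big5",
    "example.russian": "koi8-r",
}

def _decode(data: bytes, charset: str) -> str:
    """Decode bytes, falling back to iso-8859-1 if the charset name is unknown."""
    try:
        return data.decode(charset, errors="replace")
    except LookupError:
        return data.decode("iso-8859-1")

def subject_to_unicode(raw_subject: bytes, list_name: str) -> str:
    """Turn a raw Subject value into text; call .encode("utf-8") for UTF-8 output."""
    fallback = LIST_DEFAULT_CHARSET.get(list_name, "iso-8859-1")
    try:
        text = raw_subject.decode("ascii")
    except UnicodeDecodeError:
        # Raw 8-bit bytes in the header (not RFC 2047): apply the per-list default.
        return _decode(raw_subject, fallback)
    # Pure ASCII: decode any RFC 2047 encoded-words it contains.
    return "".join(
        chunk if isinstance(chunk, str) else _decode(chunk, charset or "ascii")
        for chunk, charset in decode_header(text)
    )
```

Properly encoded subjects decode the same way regardless of list, so the per-list default only affects the invalid messages.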

An alternative approach is to use charset-guessing software (which is
supposed to be pretty good these days), or to look for clues elsewhere
in the message about what the charset most likely is -- there may be a
Content-Type with a charset parameter, even though the headers aren't
RFC2047 encoded.  Etc.

Or you can just ignore the problem with these invalid email
messages.  :-)

-- 
(domestic pets only, the antidote for overdose, milk.)
  [EMAIL PROTECTED] * Lars Magne Ingebrigtsen

_______________________________________________
Discussion list for The Mail Archive
Gossip@jab.org
http://jab.org/cgi-bin/mailman/listinfo/gossip
