Re: Filtering windows-1252 charset
Kai Schaetzl wrote:
> Philip Prindeville wrote on Thu, 18 May 2006 08:47:48 -0600:
>> How legitimate is email sent as windows-1252?
>
> Very, because broken Windows clients use it.
>
> Kai

Ah, the Strong Arm school of standards enforcement. ;-)

-Philip
Re: Filtering windows-1252 charset
Jonathan Armitage wrote:
> I see some spam with windows-1252 or other unwanted character sets at the start of the subject. I reject them via an Exim ACL, so SA doesn't even have to scan them.

Which brings up the subject... How legitimate is email sent as windows-1252?

I see absolutely no reason to send it, since it offers no advantage over iso-8859-1 or utf-8, and the RFCs are pretty clear about using the smallest encoding that will fit a message, i.e. us-ascii, then iso-8859-1, then utf-8 (in that order).

Further, if you're in the Unix world (or more broadly, not in the Windows world), why would you want to use vendor-specific encodings for no reason other than they're the broken defaults Microsoft chose to use?

-Philip
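Philip's "smallest encoding that will fit" ordering can be sketched in Python as a first-fit search over his three charsets. This is a minimal illustration that assumes Python's codec names for those charsets. `smallest_charset` and `PREFERRED_CHARSETS` are hypothetical names. The code makes no claim about what any RFC or mail client actually mandates.

```python
# Hypothetical helper (not from the thread): try charsets in the order Philip
# lists and return the first one that can represent the whole text.
PREFERRED_CHARSETS = ("us-ascii", "iso-8859-1", "utf-8")


def smallest_charset(text: str) -> str:
    """Return the first charset in PREFERRED_CHARSETS that can encode text."""
    for charset in PREFERRED_CHARSETS:
        try:
            text.encode(charset)
        except UnicodeEncodeError:
            continue  # some character is outside this charset; try the next one
        return charset
    # Only reachable for str values utf-8 rejects, such as lone surrogates.
    raise ValueError("text cannot be encoded in any preferred charset")


for sample in ("I have corrected your document.", "Schätzl", "\u65e5\u672c"):
    print(repr(sample), "->", smallest_charset(sample))
```

Here the plain ASCII sentence comes back as us-ascii and "Schätzl" as iso-8859-1. The Japanese sample falls through to utf-8.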
RE: Filtering windows-1252 charset
> Which brings up the subject... How legitimate is email sent as windows-1252?
>
> I see absolutely no reason to send it, since it offers no advantage over iso-8859-1 or utf-8, and the RFCs are pretty clear about using the smallest encoding that will fit a message, i.e. us-ascii, then iso-8859-1, then utf-8 (in that order).
>
> Further, if you're in the Unix world (or more broadly, not in the Windows world), why would you want to use vendor-specific encodings for no reason other than they're the broken defaults Microsoft chose to use?

I don't think sending a specific character set is a choice most users make. I have 84 messages in my inbox with the windows-1252 character set. A lot of those are personal messages sent by friends who are clueless as far as their computers are concerned. So, unless you can get Microsoft to configure their clients so they don't send that character set by default, or unless you don't have any friends with Windows, you might research it a bit more before you block.

Bret
Re: Filtering windows-1252 charset
Philip Prindeville wrote:
> Jonathan Armitage wrote:
>> I see some spam with windows-1252 or other unwanted character sets at the start of the subject. I reject them via an Exim ACL, so SA doesn't even have to scan them.
>
> Which brings up the subject... How legitimate is email sent as windows-1252?

I have a bunch of stuff from PayPal and eBay, and much more, which includes this charset. I'm not attempting to answer the philosophical question, just the statistical one.

C.
--
Craig McLean http://fukka.co.uk [EMAIL PROTECTED]
Where the fun never starts
Powered by FreeBSD, and GIN!
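Bret's and Craig's point is statistical, so it may help to count how much of your own legitimate mail declares windows-1252 before blocking it. The sketch below is minimal and assumes the messages are already parsed into `email.message.Message` objects. `declares_charset` and `count_charset` are hypothetical helpers, not part of any SpamAssassin tooling.

```python
import email.message
from typing import Iterable


def declares_charset(msg: email.message.Message, charset: str) -> bool:
    """True if the message or any nested MIME part declares the given charset."""
    wanted = charset.lower()
    # walk() yields the message itself, then every nested part;
    # get_content_charset() returns the charset lower-cased, or None if absent.
    return any(part.get_content_charset() == wanted for part in msg.walk())


def count_charset(messages: Iterable[email.message.Message], charset: str) -> int:
    """Number of messages in which at least one part declares the charset."""
    return sum(1 for msg in messages if declares_charset(msg, charset))
```

For a local mbox file, something like `count_charset(mailbox.mbox("/path/to/inbox"), "windows-1252")` should work, because `mailbox.mbox` yields `Message` subclasses. The path is a placeholder.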
Re: Filtering windows-1252 charset
Philip Prindeville wrote on Thu, 18 May 2006 08:47:48 -0600:
> How legitimate is email sent as windows-1252?

Very, because broken Windows clients use it.

Kai
--
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
Filtering windows-1252 charset
I was trying to filter messages like:

Return-Path: [EMAIL PROTECTED]
Received: from redfish-solutions.com (ppp125-53.dsl-coc.eth.net [61.11.125.53] (may be forged)) by mail.redfish-solutions.com (8.13.1/8.13.1) with ESMTP id k1SGqvTs021448 for [EMAIL PROTECTED]; Tue, 28 Feb 2006 09:53:01 -0700
Message-Id: [EMAIL PROTECTED]
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: Re: Sample
Date: Tue, 28 Feb 2006 22:23:05 +0530
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=_NextPart_000_0016=_NextPart_000_0016"
X-Priority: 3
X-MSMail-Priority: Normal
X-Scanned-By: MIMEDefang 2.56 on 192.168.1.2

This is a multi-part message in MIME format.

--=_NextPart_000_0016=_NextPart_000_0016
Content-Type: text/plain; charset=Windows-1252
Content-Transfer-Encoding: 7bit

I have corrected your document.

--=_NextPart_000_0016=_NextPart_000_0016
Content-Type: application/octet-stream; name=document04.zip
Content-Transfer-Encoding: base64
Content-Disposition: attachment; filename=document04.zip

[snip]

Using:

# don't allow windows-1252 text attachments...
header __CTYPE_WIN_1252 Content-Type =~ /charset=\"windows-1252\"/i
meta L_WIN_CHARSET ((__CTYPE_TEXT_PLAIN || __CTYPE_HTML) && __CTYPE_WIN_1252)
describe L_WIN_CHARSET Content-Type is Windows-specific text
score L_WIN_CHARSET 0.1

but after saving the email to a file and running spamassassin over it by hand, I'm not seeing __CTYPE_WIN_1252 in the rules that matched:

[1769] dbg: check: subtests=__CT,__CTYPE_HAS_BOUNDARY,__ENV_AND_HDR_FROM_MATCH,__FROM_YAHOO_COM,__HAS_MSGID,__HAS_MSMAIL_PRI,__HAS_RCVD,__HAS_SUBJECT,__HAS_X_PRIORITY,__MIME_ATTACHMENT,__MIME_BASE64,__MIME_VERSION,__MSGID_OK_DIGITS,__MSGID_OK_HOST,__NEXTPART_ALL,__NONEMPTY_BODY,__SANE_MSGID,__TOCC_EXISTS

What am I missing?

-Philip
Re: Filtering windows-1252 charset
On Mon, Apr 03, 2006 at 12:07:00PM -0600, Philip Prindeville wrote:
> --=_NextPart_000_0016=_NextPart_000_0016
> Content-Type: text/plain; charset=Windows-1252
> Content-Transfer-Encoding: 7bit
>
> Using:
> # don't allow windows-1252 text attachments...
> header __CTYPE_WIN_1252 Content-Type =~ /charset=\"windows-1252\"/i
>
> What am I missing?

The charset isn't in the message header, it's in the MIME header. You can use the MIMEHeader plugin if you want to.

--
Randomly Generated Tagline:
640K ought to be enough for anybody. - Bill Gates, 1981
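Theo's point is that the charset sits in a body part's MIME header rather than in the message's top-level header. Python's standard `email` package makes this easy to see. The message below is a trimmed, hypothetical stand-in for Philip's sample, with a simplified boundary and a placeholder attachment body. It is not the original message.

```python
import email

# Hypothetical, trimmed stand-in for Philip's sample (simplified boundary).
RAW = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="BOUNDARY"\n'
    "\n"
    "This is a multi-part message in MIME format.\n"
    "--BOUNDARY\n"
    'Content-Type: text/plain; charset="Windows-1252"\n'
    "Content-Transfer-Encoding: 7bit\n"
    "\n"
    "I have corrected your document.\n"
    "--BOUNDARY\n"
    "Content-Type: application/octet-stream; name=document04.zip\n"
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "UEsDBA==\n"
    "--BOUNDARY--\n"
)

msg = email.message_from_string(RAW)

# The top-level Content-Type, which is where a plain "header" rule looks,
# carries only the multipart type and the boundary: no charset parameter.
top_level = msg["Content-Type"]

# The charset lives in the MIME header of a leaf body part.
part_charsets = [
    (part.get_content_type(), part.get_content_charset())
    for part in msg.walk()
    if not part.is_multipart()
]

print(top_level)
print(part_charsets)
```

For this input, `top_level` holds only `multipart/mixed` and its boundary. `part_charsets` pairs text/plain with windows-1252 and gives the zip attachment no charset at all. That split is why a top-level `header Content-Type` rule cannot see the charset in a message shaped like this.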
Re: Filtering windows-1252 charset
Theo Van Dinter wrote:
> On Mon, Apr 03, 2006 at 12:07:00PM -0600, Philip Prindeville wrote:
>> --=_NextPart_000_0016=_NextPart_000_0016
>> Content-Type: text/plain; charset=Windows-1252
>> Content-Transfer-Encoding: 7bit
>>
>> Using:
>> # don't allow windows-1252 text attachments...
>> header __CTYPE_WIN_1252 Content-Type =~ /charset=\"windows-1252\"/i
>>
>> What am I missing?
>
> The charset isn't in the message header, it's in the MIME header. You can use the MIMEHeader plugin if you want to.

OK, so I have to use:

mimeheader __CTYPE_WIN_1252 Content-Type =~ /charset=\"windows-1252\"/i

instead.

As for the rest... Are there equivalent subtests of __CTYPE_TEXT_PLAIN and __CTYPE_HTML for the MIME header portions?

-Philip
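The `__CTYPE_WIN_1252` pattern has a detail worth checking. Read with the double quotes that the list archive appears to have stripped, it only matches a quoted parameter such as `charset="Windows-1252"`. MIME (RFC 2045) also allows the value to be an unquoted token, so a bare `charset=Windows-1252` would slip past. The Python sketch below compares the quoted-only form with a hypothetical quotes-optional variant. Python's `re` handles these simple constructs the same way Perl does, so the SpamAssassin equivalent would presumably be `/charset=\"?windows-1252\"?/i`. That pattern is an untested suggestion, not something from the thread.

```python
import re

# Philip's pattern, read with the double quotes restored: quotes are required.
QUOTED_ONLY = re.compile(r'charset="windows-1252"', re.IGNORECASE)
# Hypothetical variant: quotes optional, since RFC 2045 lets a parameter value
# be either a bare token or a quoted string.
QUOTES_OPTIONAL = re.compile(r'charset="?windows-1252"?', re.IGNORECASE)

samples = [
    'text/plain; charset="Windows-1252"',  # quoted parameter value
    "text/plain; charset=Windows-1252",    # unquoted token, also valid MIME
    'text/html; charset="iso-8859-1"',     # different charset, should not match
]
for ctype in samples:
    print(f"{ctype!r}: quoted-only={bool(QUOTED_ONLY.search(ctype))}, "
          f"quotes-optional={bool(QUOTES_OPTIONAL.search(ctype))}")
```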
Re: Filtering windows-1252 charset
Theo Van Dinter wrote:
> On Mon, Apr 03, 2006 at 12:07:00PM -0600, Philip Prindeville wrote:
>> --=_NextPart_000_0016=_NextPart_000_0016
>> Content-Type: text/plain; charset=Windows-1252
>> Content-Transfer-Encoding: 7bit
>>
>> Using:
>> # don't allow windows-1252 text attachments...
>> header __CTYPE_WIN_1252 Content-Type =~ /charset=\"windows-1252\"/i
>>
>> What am I missing?
>
> The charset isn't in the message header, it's in the MIME header. You can use the MIMEHeader plugin if you want to.

I see some spam with windows-1252 or other unwanted character sets at the start of the subject. I reject them via an Exim ACL, so SA doesn't even have to scan them.
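Jonathan's Exim ACL isn't shown in the thread. The test he describes (an unwanted charset at the very start of the Subject) amounts to looking at the first RFC 2047 encoded-word. Here is a minimal Python sketch using the standard `email.header.decode_header`. The `UNWANTED` set and the helper names are hypothetical, and this is not Exim ACL syntax.

```python
from email.header import decode_header
from typing import Optional

# Hypothetical block list; Jonathan's actual list of unwanted charsets isn't given.
UNWANTED = {"windows-1252"}


def subject_charset(raw_subject: str) -> Optional[str]:
    """Charset of the first chunk of a raw (still encoded) Subject, or None."""
    # decode_header returns [(text_or_bytes, charset_or_None), ...] and
    # lower-cases any charset it finds in an encoded-word.
    chunks = decode_header(raw_subject)
    return chunks[0][1] if chunks else None


def starts_with_unwanted_charset(raw_subject: str) -> bool:
    """True if the Subject opens with an encoded-word in an unwanted charset."""
    return subject_charset(raw_subject) in UNWANTED


for subj in ("=?Windows-1252?B?SGVsbG8=?=", "Re: Sample"):
    print(repr(subj), starts_with_unwanted_charset(subj))
```

A plain subject such as "Re: Sample" has no encoded-word, so its charset comes back as `None` and it is left alone. Checking only the first chunk mirrors the "start of the subject" wording.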
Re: Filtering windows-1252 charset
If anyone would like to make use of it, I ended up using:

# for mime headers...
mimeheader __CTYPE_MH_TEXT_PLAIN Content-Type =~ /text\/plain/i
mimeheader __CTYPE_MH_HTML Content-Type =~ /text\/html/i

# don't allow windows-1252 text attachments...
mimeheader __CTYPE_MH_WIN1252 Content-Type =~ /charset=\"windows-1252\"/i
meta L_WIN_CHARSET ((__CTYPE_MH_HTML || __CTYPE_MH_TEXT_PLAIN) && __CTYPE_MH_WIN1252)
describe L_WIN_CHARSET Content-Type is Windows-specific text
score L_WIN_CHARSET 0.1

and it works fine (or at least, it did against my test set of data). If you're in certain parts of the world, it might be worth matching against:

/charset=\"windows-125[1-9]\"/i

instead.

-Philip
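A rough Python emulation helps in reasoning about what the final meta rule does. It assumes each `mimeheader` subrule hits when any MIME part's Content-Type matches; via `walk()`, the code also checks the top-level header. It reads the meta as `((__CTYPE_MH_HTML || __CTYPE_MH_TEXT_PLAIN) && __CTYPE_MH_WIN1252)`. The regex constants and `l_win_charset` are hypothetical stand-ins, not SpamAssassin code.

```python
import email.message
import re

# Stand-ins for Philip's mimeheader subrules (charset patterns read with the
# double quotes restored).
MH_TEXT_PLAIN = re.compile(r"text/plain", re.IGNORECASE)
MH_HTML = re.compile(r"text/html", re.IGNORECASE)
MH_WIN1252 = re.compile(r'charset="windows-1252"', re.IGNORECASE)
# The wider variant Philip suggests for other parts of the world.
MH_WIN125X = re.compile(r'charset="windows-125[1-9]"', re.IGNORECASE)


def any_part_matches(msg: email.message.Message, pattern: re.Pattern) -> bool:
    """Assumed mimeheader semantics: hit if any part's Content-Type matches."""
    return any(pattern.search(part.get("Content-Type", "")) for part in msg.walk())


def l_win_charset(msg: email.message.Message,
                  charset_rule: re.Pattern = MH_WIN1252) -> bool:
    # meta L_WIN_CHARSET ((__CTYPE_MH_HTML || __CTYPE_MH_TEXT_PLAIN) && __CTYPE_MH_WIN1252)
    is_text = any_part_matches(msg, MH_HTML) or any_part_matches(msg, MH_TEXT_PLAIN)
    return is_text and any_part_matches(msg, charset_rule)
```

A meta rule only sees which subrules hit, not which part each hit came from. This emulation therefore also fires on a us-ascii text/plain part sitting next to an attachment labelled windows-1252, and under the assumption above so would the real rule. Note too that `125[1-9]` covers windows-1251 through windows-1259 but leaves out windows-1250 (Central European); `125[0-9]` would include it.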