Re: Filtering windows-1252 charset

2006-05-22 Thread Philip Prindeville

Kai Schaetzl wrote:

 Philip Prindeville wrote on Thu, 18 May 2006 08:47:48 -0600:

  How legitimate is email sent as windows-1252?

 Very, because broken Windows clients use it.

 Kai


Ah, the Strong Arm school of standards enforcement.  ;-)

-Philip



Re: Filtering windows-1252 charset

2006-05-18 Thread Philip Prindeville
Jonathan Armitage wrote:

 I see some spam with windows-1252 or other unwanted character sets at
 the start of the subject. I reject them via an Exim ACL, so SA doesn't
 even have to scan them.


Which brings up the subject...  How legitimate is email sent as
windows-1252?

I see absolutely no reason to send it, since it offers no advantage over
iso-8859-1 or utf-8, and the RFCs are pretty clear about using the smallest
encoding that will fit a message, i.e. us-ascii, iso-8859-1, utf-8 (in that
order).

Further, if you're in the Unix world (or more broadly, not in the Windows
world), why would you want to use vendor-specific encodings for no reason
other than that they're the broken defaults Microsoft chose to use?

-Philip



RE: Filtering windows-1252 charset

2006-05-18 Thread Bret Miller
 Which brings up the subject...  How legitimate is email sent as
 windows-1252?

 I see absolutely no reason to send it, since it offers no advantage over
 iso-8859-1 or utf-8, and the RFCs are pretty clear about using the smallest
 encoding that will fit a message, i.e. us-ascii, iso-8859-1, utf-8 (in that
 order).

 Further, if you're in the Unix world (or more broadly, not in the Windows
 world), why would you want to use vendor-specific encodings for no reason
 other than that they're the broken defaults Microsoft chose to use?

I don't think sending a specific character set is a choice most users make. I
have 84 messages in my inbox with windows-1252 character set. A lot of
those are personal messages sent by friends that are clueless as far as
their computers are concerned. So, unless you can get Microsoft to
configure their clients so they don't send that character set by
default, or unless you don't have any friends with Windows, you might
research it a bit more before you block.
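
For anyone who wants to do that research on their own mail first, here is a minimal sketch in Python (standard library only) that counts how many messages in an mbox file declare each charset on a text part. The function name `charset_counts` and the mbox path are hypothetical, and it assumes your mail is stored in mbox format; it is not something SpamAssassin provides.

```python
import mailbox
from collections import Counter


def charset_counts(mbox_path):
    """Count messages per charset declared on their text/* MIME parts.

    Hypothetical helper for illustration; assumes an mbox-format mailbox.
    """
    counts = Counter()
    box = mailbox.mbox(mbox_path)
    try:
        for msg in box:
            # Collect each charset once per message, however many parts use it.
            seen = {
                part.get_content_charset()
                for part in msg.walk()
                if part.get_content_maintype() == "text"
            }
            for charset in seen:
                counts[charset or "(none declared)"] += 1
    finally:
        box.close()
    return counts


if __name__ == "__main__":
    import sys

    for charset, n in charset_counts(sys.argv[1]).most_common():
        print(n, charset)
```

Running it over a folder of known-good mail gives a rough idea of how much legitimate traffic a windows-1252 rule would touch before you attach a score to it.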

Bret





Re: Filtering windows-1252 charset

2006-05-18 Thread Craig McLean
Philip Prindeville wrote:
 Jonathan Armitage wrote:
 
 I see some spam with windows-1252 or other unwanted character sets at 
 the start of the subject. I reject them via an Exim ACL, so SA doesn't 
 even have to scan them.
  

 
 Which brings up the subject...  How legitimate is email sent as
 windows-1252?

I have a bunch of stuff from paypal and ebay, and much more, which
include this charset.
I'm not attempting to answer the philosophical question, just the
statistical one.

C.

-- 
Craig McLean        http://fukka.co.uk
[EMAIL PROTECTED]   Where the fun never starts
Powered by FreeBSD, and GIN!


Re: Filtering windows-1252 charset

2006-05-18 Thread Kai Schaetzl
Philip Prindeville wrote on Thu, 18 May 2006 08:47:48 -0600:

 How legitimate is email sent as 
 windows-1252?

Very, because broken Windows clients use it.

Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com





Filtering windows-1252 charset

2006-04-03 Thread Philip Prindeville
I was trying to filter messages like:

 Return-Path: [EMAIL PROTECTED]
 Received: from redfish-solutions.com (ppp125-53.dsl-coc.eth.net
 [61.11.125.53] (may be forged))
 by mail.redfish-solutions.com (8.13.1/8.13.1) with ESMTP id
 k1SGqvTs021448
 for [EMAIL PROTECTED]; Tue, 28 Feb 2006
 09:53:01 -0700
 Message-Id: [EMAIL PROTECTED]
 From: [EMAIL PROTECTED]
 To: [EMAIL PROTECTED]
 Subject: Re: Sample
 Date: Tue, 28 Feb 2006 22:23:05 +0530
 MIME-Version: 1.0
 Content-Type: multipart/mixed;
 boundary==_NextPart_000_0016=_NextPart_000_0016
 X-Priority: 3
 X-MSMail-Priority: Normal
 X-Scanned-By: MIMEDefang 2.56 on 192.168.1.2

 This is a multi-part message in MIME format.

 --=_NextPart_000_0016=_NextPart_000_0016
 Content-Type: text/plain;
 charset=Windows-1252
 Content-Transfer-Encoding: 7bit

 I have corrected your document.


 --=_NextPart_000_0016=_NextPart_000_0016
 Content-Type: application/octet-stream;
 name=document04.zip
 Content-Transfer-Encoding: base64
 Content-Disposition: attachment;
 filename=document04.zip

 [snip]


Using:

# don't allow windows-1252 text attachments...
header __CTYPE_WIN_1252 Content-Type =~ /charset=\"windows-1252\"/i
meta L_WIN_CHARSET  ((__CTYPE_TEXT_PLAIN || __CTYPE_HTML) && __CTYPE_WIN_1252)
describe L_WIN_CHARSET  Content-Type is Windows-specific text
score L_WIN_CHARSET 0.1


but after saving the email to a file and running spamassassin over it by
hand, I'm not seeing __CTYPE_WIN_1252 in the rules that matched:

 [1769] dbg: check:
 subtests=__CT,__CTYPE_HAS_BOUNDARY,__ENV_AND_HDR_FROM_MATCH,__FROM_YAHOO_COM,__HAS_MSGID,__HAS_MSMAIL_PRI,__HAS_RCVD,__HAS_SUBJECT,__HAS_X_PRIORITY,__MIME_ATTACHMENT,__MIME_BASE64,__MIME_VERSION,__MSGID_OK_DIGITS,__MSGID_OK_HOST,__NEXTPART_ALL,__NONEMPTY_BODY,__SANE_MSGID,__TOCC_EXISTS



What am I missing?

-Philip



Re: Filtering windows-1252 charset

2006-04-03 Thread Theo Van Dinter
On Mon, Apr 03, 2006 at 12:07:00PM -0600, Philip Prindeville wrote:
  --=_NextPart_000_0016=_NextPart_000_0016
  Content-Type: text/plain;
  charset=Windows-1252
  Content-Transfer-Encoding: 7bit
 
 Using:
 # don't allow windows-1252 text attachments...
 header __CTYPE_WIN_1252 Content-Type =~ /charset=\"windows-1252\"/i
 What am I missing?

the charset isn't in the message header, it's in the mime header.  you can use
the MIMEHeader plugin if you want to.
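
To make the distinction concrete, here is a minimal sketch using Python's standard `email` parser (not SpamAssassin itself) on a cut-down version of the sample message. The boundary string `XYZ` and the sender address are placeholders I made up. The top-level Content-Type, which is what a `header` rule looks at, carries only the multipart type and boundary, while the charset appears on the part's own Content-Type.

```python
from email import message_from_string

# Cut-down version of the sample; "XYZ" and the address are placeholders.
RAW = """\
From: sender@example.com
Subject: Re: Sample
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="XYZ"

This is a multi-part message in MIME format.

--XYZ
Content-Type: text/plain; charset="Windows-1252"
Content-Transfer-Encoding: 7bit

I have corrected your document.

--XYZ--
"""

msg = message_from_string(RAW)

# Message-level header: no charset parameter here at all.
top_level = msg["Content-Type"]

# MIME part headers: this is where the charset is declared.
part_types = [
    part["Content-Type"] for part in msg.walk() if not part.is_multipart()
]

print("message header:", top_level)
print("part headers:  ", part_types)
```

A regex for `charset=` therefore has nothing to match in the message header of a multipart mail, whatever the pattern looks like.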

-- 
Randomly Generated Tagline:
640K ought to be enough for anybody. - Bill Gates, 1981




Re: Filtering windows-1252 charset

2006-04-03 Thread Philip Prindeville
Theo Van Dinter wrote:

 On Mon, Apr 03, 2006 at 12:07:00PM -0600, Philip Prindeville wrote:

  --=_NextPart_000_0016=_NextPart_000_0016
  Content-Type: text/plain;
  charset=Windows-1252
  Content-Transfer-Encoding: 7bit

  Using:
  # don't allow windows-1252 text attachments...
  header __CTYPE_WIN_1252 Content-Type =~ /charset=\"windows-1252\"/i
  What am I missing?

 the charset isn't in the message header, it's in the mime header.  you can use
 the MIMEHeader plugin if you want to.


Ok, so I have to use:

mimeheader   __CTYPE_WIN_1252 Content-Type =~ /charset=\"windows-1252\"/i

instead.  As for the rest...  Are there equivalents of the
__CTYPE_TEXT_PLAIN and __CTYPE_HTML subtests for the MIME header portions?

-Philip



Re: Filtering windows-1252 charset

2006-04-03 Thread Jonathan Armitage

Theo Van Dinter wrote:

 On Mon, Apr 03, 2006 at 12:07:00PM -0600, Philip Prindeville wrote:

  --=_NextPart_000_0016=_NextPart_000_0016
  Content-Type: text/plain;
  charset=Windows-1252
  Content-Transfer-Encoding: 7bit

  Using:
  # don't allow windows-1252 text attachments...
  header __CTYPE_WIN_1252 Content-Type =~ /charset=\"windows-1252\"/i
  What am I missing?

 the charset isn't in the message header, it's in the mime header.  you can use
 the MIMEHeader plugin if you want to.

I see some spam with windows-1252 or other unwanted character sets at 
the start of the subject. I reject them via an Exim ACL, so SA doesn't 
even have to scan them.
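
For reference, a charset "at the start of the subject" means an RFC 2047 encoded-word such as `=?windows-1252?Q?...?=`. The sketch below is Python, not the Exim ACL itself, and only illustrates what such a check keys on; the helper name `subject_charsets` and the sample subjects are made up.

```python
from email.header import decode_header


def subject_charsets(subject):
    """Return the charsets named in RFC 2047 encoded-words of a Subject.

    Hypothetical helper for illustration only.
    """
    # decode_header yields (text, charset) pairs; charset is None for
    # plain, unencoded runs of the header.
    return [charset.lower() for _, charset in decode_header(subject) if charset]


print(subject_charsets("=?Windows-1252?Q?Cheap_meds?="))
print(subject_charsets("Re: Sample"))
```

An unencoded subject like "Re: Sample" names no charset at all, so a check of this kind only ever sees mail whose sender chose to encode the subject.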


Re: Filtering windows-1252 charset

2006-04-03 Thread Philip Prindeville
If anyone would like to make use of it, I ended up using:

# for mime headers...
mimeheader __CTYPE_MH_TEXT_PLAIN Content-Type =~ /text\/plain/i
mimeheader __CTYPE_MH_HTML  Content-Type =~ /text\/html/i

# don't allow windows-1252 text attachments...
mimeheader __CTYPE_MH_WIN1252   Content-Type =~ /charset=\"windows-1252\"/i
meta L_WIN_CHARSET  ((__CTYPE_MH_HTML || __CTYPE_MH_TEXT_PLAIN) && __CTYPE_MH_WIN1252)
describe L_WIN_CHARSET  Content-Type is Windows-specific text
score L_WIN_CHARSET 0.1


and it works fine (or at least, it did against my test set of data).  If
you're in certain parts of the world, it might be worth matching against:

/charset=\"windows-125[1-9]\"/i

instead.
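
A quick way to sanity-check the broader windows-125x pattern before deploying it is to run it over a few Content-Type values. This is a minimal sketch using Python's `re` module rather than SpamAssassin's Perl regex engine (the two agree for a pattern this simple). The optional quotes (`"?`) are my own assumption, added so that unquoted charset parameters also match, and the sample strings are made up.

```python
import re

# Assumption: quotes made optional so charset=windows-1252 (unquoted) matches too.
PATTERN = re.compile(r'charset="?windows-125[1-9]"?', re.IGNORECASE)

samples = [
    'text/plain; charset="Windows-1252"',
    "text/html; charset=windows-1251",
    'text/plain; charset="iso-8859-1"',
    "text/plain; charset=utf-8",
    "text/plain; charset=windows-1250",
]

matches = {s: bool(PATTERN.search(s)) for s in samples}
for sample, hit in matches.items():
    print("match   " if hit else "no match", sample)
```

Note that the `[1-9]` class as written does not cover windows-1250 (Central European), so that charset would need `[0-9]` if you want it included.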

-Philip