Re: Rule to scan for .html attachments?

2013-05-31 Thread Martin Gregorie
On Fri, 2013-05-31 at 17:48 -0400, Andrew Talbot wrote:
> Thank you for your response. The original test was using a file
> arbitrarily named aa.html .. It still doesn't work with the rewrite
> you provided :/ 
> 
I did wonder. It's absolutely essential to have at least one genuine
message to test your rule against. This would preferably be a spam
message you've recently received. Failing that it would be a message
you've written and then sent to yourself. Either way *you must not
modify the format of the message*[1] that arrived in your mailbox
because that's the only way to guarantee that your test data is as close
as possible to what will be passing through your production SA
installation.

If you merely lash up a fake message with a text editor, any resemblance
between it and genuine spam is purely coincidental because it's been
subject to your (mis)understandings and prejudices about the formatting
of a real spam message.

[1] anonymising e-mail addresses, etc. is OK provided the format of the
message isn't changed.

If you can post an anonymised message we may be able to provide more
help. However, without sight of the relevant parts of the messages that
you're trying to recognise with the rule, it's impossible to know why the
rule doesn't match the MIME header and very difficult to reliably
diagnose your problem.
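
To see exactly which MIME part headers a saved message contains before
writing a rule against them, something like the following can help. It is
only a sketch in Python using the standard email module, not part of SA;
the function name mime_part_headers and the script name show_parts.py are
made up for illustration.

```python
import sys
from email import message_from_bytes


def mime_part_headers(raw: bytes):
    """Return (content_type, content_disposition) for every MIME part.

    The Content-Disposition value is unfolded to a single line so it is
    easy to compare with the regex in a rule; None means no such header.
    """
    msg = message_from_bytes(raw)
    rows = []
    for part in msg.walk():  # walk() yields the message itself first
        value = part.get("Content-Disposition")
        if value is not None:
            value = " ".join(str(value).split())  # collapse folding whitespace
        rows.append((part.get_content_type(), value))
    return rows


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python show_parts.py saved-message.eml
    with open(sys.argv[1], "rb") as fh:
        for ctype, disposition in mime_part_headers(fh.read()):
            print(f"{ctype:30} {disposition}")
```

Run it against the unmodified message as it arrived in the mailbox, and
post the (anonymised) output along with the rule.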


Martin





RE: Rule to scan for .html attachments?

2013-05-31 Thread Andrew Talbot
Hi, Martin -

Thank you for your response. The original test was using a file arbitrarily 
named aa.html .. It still doesn't work with the rewrite you provided :/ 




Re: Rule to scan for .html attachments?

2013-05-31 Thread Karsten Bräckelmann
On Fri, 2013-05-31 at 11:51 -0400, Andrew Talbot wrote:
> header HTML_ATTACH_RULE_2

You will need a mimeheader [1] rule. A header rule matches the mail
headers only.

>  Content-Disposition =~ /^filename\=\"[a-z]{2}\.html\"/i

That is not matching an arbitrary HTML filename. That's exactly 2
characters A-Z, case-insensitive. Also, the Content-Disposition header
commonly starts with "inline" or "attachment" before the filename.

  /filename="[^"]+\.html"/i

Inside the double-quotes, a filename with an html extension, and a
basename of any chars but the double-quote.
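
A quick way to compare the two patterns outside SA is to run them over a
few header values. This is only a sketch: Python's re module stands in
for Perl here (these simple patterns behave the same way in both), and
the sample values are invented for illustration.

```python
import re

# The rule as originally posted (anchored, two-letter basename) ...
ORIGINAL = re.compile(r'^filename="[a-z]{2}\.html"', re.IGNORECASE)
# ... and the unanchored pattern with an arbitrary basename.
SUGGESTED = re.compile(r'filename="[^"]+\.html"', re.IGNORECASE)

# Invented Content-Disposition values; real ones start with the
# disposition type, so "filename" is never at the start of the value.
samples = [
    'attachment; filename="aa.html"',
    'inline; filename="Invoice 2013.HTML"',
    'attachment; filename="report.pdf"',
]

for value in samples:
    print(
        f"{value!r:45}"
        f" original={bool(ORIGINAL.search(value))}"
        f" suggested={bool(SUGGESTED.search(value))}"
    )
```

Note that the suggested pattern as written does not cover a bare .htm
extension.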

HTH


[1] 
http://spamassassin.apache.org/full/3.3.x/doc/Mail_SpamAssassin_Plugin_MIMEHeader.html




RE: Rule to scan for .html attachments?

2013-05-31 Thread John Hardin

On Fri, 31 May 2013, Andrew Talbot wrote:

> I need it to fire on any HTML attachment. The modules are enabled. I can 
> get it to pick up text/html, remember, but the problem is that it 
> detects messages sent as HTML when it's set up like that.


Meta it with a negated subrule that hits on regular header Content-Type: 
multipart/alternative; ?
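
The boolean logic of that idea, sketched in Python rather than as SA
rules (the function name and both sample messages are invented for
illustration; this only models the "has a text/html part AND top-level
type is not multipart/alternative" combination, not how SA evaluates it):

```python
from email import message_from_string


def html_part_without_alternative(raw: str) -> bool:
    """True if some part is text/html and the top level is not
    multipart/alternative."""
    msg = message_from_string(raw)
    has_html_part = any(p.get_content_type() == "text/html" for p in msg.walk())
    top_is_alternative = msg.get_content_type() == "multipart/alternative"
    return has_html_part and not top_is_alternative


# Invented sample: an ordinary HTML mail with a plain-text alternative.
ALTERNATIVE_MAIL = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/alternative; boundary="a"\n'
    "\n"
    "--a\nContent-Type: text/plain\n\nhi\n"
    "--a\nContent-Type: text/html\n\n<p>hi</p>\n"
    "--a--\n"
)

# Invented sample: a plain-text mail with an HTML file attached.
ATTACHMENT_MAIL = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="m"\n'
    "\n"
    "--m\nContent-Type: text/plain\n\nsee attached\n"
    "--m\nContent-Type: text/html\n"
    'Content-Disposition: attachment; filename="aa.html"\n\n<p>x</p>\n'
    "--m--\n"
)

print(html_part_without_alternative(ALTERNATIVE_MAIL))
print(html_part_without_alternative(ATTACHMENT_MAIL))
```

In this sketch a message whose only body is text/html (no plain-text
alternative) still hits, as does a multipart/mixed message that wraps a
multipart/alternative body, so the idea narrows the false positives
rather than removing them.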


> It doesn't detect plain-text messages, but it will flag plain-text 
> messages with HTML files attached.






--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Our government wants to do everything it can "for the children,"
  except sparing them crushing tax burdens.
---
 6 days until the 69th anniversary of D-Day


Re: Rule to scan for .html attachments?

2013-05-31 Thread Martin Gregorie
On Fri, 2013-05-31 at 14:45 -0400, Andrew Talbot wrote:
> I need it to fire on any HTML attachment. The modules are enabled. I
> can get it to pick up text/html, remember, but the problem is that it
> detects messages sent as HTML when it's set up like that. It doesn't
> detect plain-text messages, but it will flag plain-text messages with
> HTML files attached. 
> 
Well, that's exactly what your second rule won't do: it will only fire
on the header of an html attachment for a file that has one of a very
restricted set of filenames. As you haven't posted any example MIME
header sets I can only guess, but my guess is that none of the messages
you've tried it against have attachments with names that match the
restriction.

As I said before, the rule can't work with the '^' in place, because that
says that the 'filename=' string must be at the beginning of a line
and NOT preceded by any white space. That's a harmful restriction because
you never see MIME headers like that. With the '^' removed the rule
becomes:

header HTML_ATTACH_RULE_2 Content-Disposition =~
 /filename\=\"[a-z]{2}\.html\"/i

which has a better chance of working. This version will only fire if the
filename associated with the attachment has precisely two alphabetic
characters plus a .html extension, i.e. it will fire on
filename="aa.html" or filename="ZZ.HTML" because the trailing 'i' makes
it a caseless match, but it won't fire on filename="cat.html" or
filename="x.html" because these don't have two-character names, and it
won't fire if the attachment follows the common Windows convention of
using a .htm extension.

If you want the rule to fire on *any* HTML attachment it should be:

header HTML_ATTACH_RULE_2 Content-Disposition =~
 /filename\=\".{0,30}\.html{0,1}\"/i

which will match any filename with a .html or .htm extension (including
".html" and ".htm").
 
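
The difference between the two patterns can be checked against the
filenames mentioned above. A minimal sketch, using Python's re module as
a stand-in for Perl (the constant names and the "attachment;" prefix on
each value are assumptions made for the demonstration):

```python
import re

# Two-letter basename, .html only.
TWO_LETTER = re.compile(r'filename="[a-z]{2}\.html"', re.IGNORECASE)
# Up to 30 characters of basename, then .htm with an optional trailing "l".
ANY_NAME = re.compile(r'filename=".{0,30}\.html{0,1}"', re.IGNORECASE)

names = ["aa.html", "ZZ.HTML", "cat.html", "x.html", "aa.htm", ".htm", "notes.txt"]

for name in names:
    value = f'attachment; filename="{name}"'
    print(
        f"{name:10}"
        f" two_letter={bool(TWO_LETTER.search(value))}"
        f" any_name={bool(ANY_NAME.search(value))}"
    )
```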
Could I respectfully suggest that you learn about Perl regular
expressions before you try writing any more SA rules? SA rules are all
based on using the Perl flavour of regular expressions to match
character strings in headers and the message body. 

You could do a lot worse than getting a copy of "Programming Perl" by
Larry Wall, Tom Christiansen & Jon Orwant, published by O'Reilly. If
there isn't one in the firm's technical library, they should be willing
to buy a copy. It's a brick of a book, but you only need to read "Chapter
5: Pattern Matching" to write SA rules and in any case the rest of its
contents will come in handy in future if anybody needs to write Perl
programs or SA extension modules.


Martin







Re: Rule to scan for .html attachments?

2013-05-31 Thread David F. Skoll
On Fri, 31 May 2013 14:43:27 -0400
"Andrew Talbot"  wrote:

> That's what I was afraid of. We generally avoid those kinds of rules
> since we are scanning millions of messages a day. 

Well, a few rules won't hurt.  We peak at around 6 million messages/day,
though we do have quite beefy servers.

A few hundred "full" rules will start hurting, though...

Regards,

David.


RE: Rule to scan for .html attachments?

2013-05-31 Thread Andrew Talbot
I need it to fire on any HTML attachment. The modules are enabled. I can get it 
to pick up text/html, remember, but the problem is that it detects messages 
sent as HTML when it's set up like that. It doesn't detect plain-text messages, 
but it will flag plain-text messages with HTML files attached. 




RE: Rule to scan for .html attachments?

2013-05-31 Thread Andrew Talbot
That's what I was afraid of. We generally avoid those kinds of rules since
we are scanning millions of messages a day. 



Re: Rule to scan for .html attachments?

2013-05-31 Thread Martin Gregorie
On Fri, 2013-05-31 at 14:10 -0400, Andrew Talbot wrote:
> That didn't work :(
> 
Can you post one or two examples of actual MIME attachment headers that
you're trying to get the rule to fire on?

Obvious question, but have you enabled the MIME header module? 
I'm using MimeMagic and enabling it requires that MimeMagic.pm and
MimeMagic.cf be included in /etc/mail/spamassassin (or wherever you have
told SA to look for its configuration etc.)
 

Martin





Re: Rule to scan for .html attachments?

2013-05-31 Thread David F. Skoll
On Fri, 31 May 2013 14:10:36 -0400
Andrew Talbot  wrote:

> That didn't work :(

What didn't work?  Oh... you top-posted.

Anyway... you might need a "full" rule, which can be expensive.
Something like:

full HTML_RULE 
/Content-Disposition:.{0,50}name\s{0,2}=\s{0,2}\"?.{0,50}\.html?/i

Completely untested, of course! :)
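
One thing worth checking with a pattern like this is folded headers.
The sketch below uses Python's re as a stand-in for Perl; in both, "."
does not match a newline unless the dot-matches-newline modifier is set.
The two sample header strings are invented, and how the raw text reaches
a "full" rule in SA is assumed rather than verified here.

```python
import re

PATTERN = r'Content-Disposition:.{0,50}name\s{0,2}=\s{0,2}"?.{0,50}\.html?'
FULL_RULE = re.compile(PATTERN, re.IGNORECASE)

# Invented raw header text: once on a single line, once folded after the ";".
one_line = 'Content-Disposition: attachment; filename="aa.html"\r\n'
folded = 'Content-Disposition: attachment;\r\n filename="aa.html"\r\n'

print("one line:", bool(FULL_RULE.search(one_line)))
print("folded:  ", bool(FULL_RULE.search(folded)))

# With DOTALL (Perl's /s) the ".{0,50}" may cross the line break.
FULL_RULE_DOTALL = re.compile(PATTERN, re.IGNORECASE | re.DOTALL)
print("folded, dot matches newline:", bool(FULL_RULE_DOTALL.search(folded)))
```

If the real messages fold Content-Disposition like the second sample,
that may be one reason a raw-text match misses them.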

Regards,

David.


Re: Rule to scan for .html attachments?

2013-05-31 Thread Andrew Talbot
Didn't work with mime_header (or mimeheader) with either rule.


On Fri, May 31, 2013 at 12:23 PM, Axb  wrote:

> On 05/31/2013 05:51 PM, Andrew Talbot wrote:
>
>> Hey all -
>>
>> I'm trying to set up a custom rule that scores HTML attachments.
>>
>> The problem I'm running across is that using a rule like this one:
>> mimeheader HTML_ATTACH Content-Type =~ /^text\/html/i
>>
>> Will flag all messages that come in as HTML (vs. plain text).
>>
>> I found this :
>> header HTML_ATTACH_RULE_2 Content-Disposition =~
>> /^filename\=\"[a-z]{2}\.html\"/i
>>
>> But that doesn't ... Work ... At all.
>>
>>
>> Any suggestions? Is this even possible?
>>
>>
> use mime_header instead of header
>


Re: Rule to scan for .html attachments?

2013-05-31 Thread Andrew Talbot
That didn't work :(



On Fri, May 31, 2013 at 12:40 PM, Martin Gregorie wrote:

> On Fri, 2013-05-31 at 11:51 -0400, Andrew Talbot wrote:
> > I'm trying to set up a custom rule that scores HTML attachments.
> >
> ..snippage..
>
> > I found this :
> > header HTML_ATTACH_RULE_2 Content-Disposition =~
> > /^filename\=\"[a-z]{2}\.html\"/i
> >
> Don't anchor it to the start of the line, i.e. try this:
>
> header HTML_RULE Content-Disposition =~ /filename\=\"[a-z]{2}\.html\"/i
>
> I have a very similar rule for matching ZIP file attachments whose name
> is xx.zip which works as expected. The only significant difference from
> your rule is that it doesn't use the '^' BOL anchor symbol. My guess is
> that SA's body text parser converts the MIME header into one line, so
> requiring 'filename' to be at the start of the line will always fail.
>
>
> Martin
>


Re: Rule to scan for .html attachments?

2013-05-31 Thread Martin Gregorie
On Fri, 2013-05-31 at 11:51 -0400, Andrew Talbot wrote:
> I'm trying to set up a custom rule that scores HTML attachments.
> 
..snippage..

> I found this :
> header HTML_ATTACH_RULE_2 Content-Disposition =~
> /^filename\=\"[a-z]{2}\.html\"/i
> 
Don't anchor it to the start of the line, i.e. try this:

header HTML_RULE Content-Disposition =~ /filename\=\"[a-z]{2}\.html\"/i

I have a very similar rule for matching ZIP file attachments whose name
is xx.zip which works as expected. The only significant difference from
your rule is that it doesn't use the '^' BOL anchor symbol. My guess is
that SA's body text parser converts the MIME header into one line, so
requiring 'filename' to be at the start of the line will always fail.
 

Martin





Re: Rule to scan for .html attachments?

2013-05-31 Thread Axb

On 05/31/2013 05:51 PM, Andrew Talbot wrote:

> Hey all -
>
> I'm trying to set up a custom rule that scores HTML attachments.
>
> The problem I'm running across is that using a rule like this one:
> mimeheader HTML_ATTACH Content-Type =~ /^text\/html/i
>
> Will flag all messages that come in as HTML (vs. plain text).
>
> I found this :
> header HTML_ATTACH_RULE_2 Content-Disposition =~
> /^filename\=\"[a-z]{2}\.html\"/i
>
> But that doesn't ... Work ... At all.
>
>
> Any suggestions? Is this even possible?



use mime_header instead of header