RE: Naming conventions for tests

2006-05-24 Thread Ben Kreunen
  The main problem with this approach is that it requires monitoring
  of the SpamAssassin tests being applied as the software is
  updated...
 
 Well, I'd say this is a problem chiefly because whoever _is_
 administering the server -- not spamassassin.apache.org -- is clearly
 not encouraging the use of granular client-side filtering.

While true, it is not necessarily the cause of the problem. I've been
using small text strings for my filters to cover a number of SpamAssassin
tests, mainly for the convenience of not having to include a separate filter
for each individual test. The problem I've had is that while tests that have
the same text strings are often similar in nature/function, there are often
additional similar tests that have slight variations of these text strings.
Including these would require complicating the filter(s) and also require
revising the filter with each upgrade. Finding the initial text string to use
is also a very unscientific process of collecting test results and examining
them manually... rather more of a hack than a procedure (albeit a very
effective one).

 If filtering on more than the Spam Score were an expectation from
 end-to-end, you would have a consistently updated list provided to you
 by your mail admin, through an intranet portal or whatever. It's a
 virtual certainty that your mail admin is using rules and metas that
 don't ship with SA. What would you do about those?

In theory, also true, but that's a whole different can of worms. My aim has
been to stay out of the can of worms and simply make better use of what we
have. In our case, responsibility for determining what is and isn't spam
rests entirely with the user (apart from a conservative server-side
rejection of score 15). I'm not going to argue whether that's right or
wrong... but it's the situation that we're in, and there is some scope for
making things more efficient for the end user.

A consistent naming convention would make it easier/more efficient for end
users to filter out certain groups of messages regardless of what the server
admins were or weren't doing. Server admins could also use these conventions
for any custom filters they created to provide additional improvements.
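
As a rough illustration of what such a convention would buy the end user, here is a minimal Python sketch. It assumes the two conventions proposed in the original message (blacklist tests contain _BL_, porn tests start with PORN_); the function names and the sample test names are hypothetical and do not correspond to rules that ship with SpamAssassin.

```python
def classify(test_name: str) -> str:
    """Map a test name to a group using the proposed (hypothetical) conventions."""
    if "_BL_" in test_name:
        return "blacklist"
    if test_name.startswith("PORN_"):
        return "porn"
    return "other"


def group_tests(test_names: list[str]) -> dict[str, list[str]]:
    """Bucket test names by group, preserving the order in which they were seen."""
    groups: dict[str, list[str]] = {}
    for name in test_names:
        groups.setdefault(classify(name), []).append(name)
    return groups
```

With names that follow the convention, one short check per group would be enough, and a new test added in an upgrade would land in the right group without the filter being revised.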

Cheers

Ben Kreunen

Imaging and IT Coordinator
Department of Pathology
The University of Melbourne



RE: Naming conventions for tests

2006-05-23 Thread Chris Santerre

 -----Original Message-----
 From: Ben Kreunen [mailto:[EMAIL PROTECTED]]
 Sent: Monday, May 22, 2006 8:07 PM
 To: SPAMAssassin email list
 Subject: Naming conventions for tests
 
 
 Hi All
 
 I've been approaching the problem of filtering spam at the email client end
 using the SpamAssassin (3.x) header. Our email server (over which I have no
 control) has a couple of server-side filters that reject emails with
 infected attachments and messages with a spam score > 15. This leaves me
 with about 100 spam messages per day.
 
 Rather than rely on the numerical value of the X-Spam-Score header I've been
 looking at client side filters using text strings to pick out groups of
 SpamAssassin tests. Many tests that are similar in nature have common text
 strings, allowing you to create a filter for a single term that includes a
 wide range of tests. The effectiveness of this approach could be improved
 with a better naming scheme for the tests.
 
 The first filter I trialled picks up many tests for blacklisted domains/URLs
 using two text strings:
 X-Spam-Score contains RCVD_IN OR contains BL_
 
 Unfortunately RCVD_IN also includes some good tests so I had to split
 this into two filters:
 X-Spam-Score contains RCVD_IN AND does not contain _IADB_ AND does not contain _BSP_
 X-Spam-Score contains BL_
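
For readers who want to try the same logic outside a mail client, the two filters can be sketched in Python roughly as follows. This is only an illustration: the function name is hypothetical, and it assumes the header value is available as a plain string containing the names of the tests that fired, as in the setup described above.

```python
def is_blacklist_hit(header_value: str) -> bool:
    """Return True when either of the two substring filters matches."""
    # Filter 1: contains RCVD_IN, but none of the "good" tests
    # identified in the message (_IADB_, _BSP_).
    rcvd_hit = (
        "RCVD_IN" in header_value
        and "_IADB_" not in header_value
        and "_BSP_" not in header_value
    )
    # Filter 2: contains BL_ anywhere in the header value.
    bl_hit = "BL_" in header_value
    return rcvd_hit or bl_hit
```

Like the client filters, this matches against the whole header value rather than individual test names, so one _IADB_ or _BSP_ hit switches off the first filter for the entire message.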
 
 While these two filters do not cover all blacklist tests (and include other
 types of tests) they do pick up 90% of spam (for me), with numerical scores
 down to 0.35. The main problem with this approach is that it requires
 monitoring of the SpamAssassin tests being applied as the software is
 updated to ensure that it doesn't pick up additional tests for good email.
 On the positive side, the learning aspect of this filter is done by the
 various blacklists.
 
 If the SpamAssassin tests could be named with more consistent text strings it
 would be simpler to set up client side filters.
 E.g.
 All tests for blacklists contain _BL_
 All possible porn to start with PORN_
 
 Cheers
 
 Ben Kreunen
 
 Imaging and IT Coordinator
 Department of Pathology
 The University of Melbourne


Would it not be easier to create meta rules for the rules you are looking for, then simply add more points for those? That's what most of us do. Otherwise you are probably fighting a losing battle trying to get a standard naming scheme. It's a great idea that simply won't get followed.

And it might FP less. I can get lots of Ham that hits PORN_ rules. I have lots of friends with potty mouths :) 


Chris Santerre
SysAdmin and SARE/URIBL ninja
http://www.uribl.com
http://www.rulesemporium.com








RE: Naming conventions for tests

2006-05-23 Thread Ben Kreunen
 
 Would it not be easier to create meta rules for the rules you
 are looking for, then simply add more points for those? That's
 what most of us do. Otherwise you are probably fighting a losing
 battle trying to get a standard naming scheme. It's a great
 idea that simply won't get followed.

It would, except that I am working solely at the client end, i.e. I have no
direct (or indirect) influence on what happens on the server. From where I
stand it's a toss-up as to which organisational change is easier to effect
;-)
 
 And it might FP less. I can get lots of Ham that hits PORN_ 
 rules. I have lots of friends with potty mouths :) 

And that's where working at the client end has its benefits. When
incorporating spam filters into standard email filters, users have greater
flexibility as to when a filter is applied. They can filter out ham first
and then apply a filter to treat the remainder as spam.
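
The ordering idea can be sketched in Python as a first-match-wins list of rules. Everything named here (the address book, the folder names, the sample rules) is hypothetical and only meant to show why checking ham first protects mail from known senders that happens to hit a PORN_ test.

```python
from typing import Callable

# Hypothetical address book; a real client would use its own contact list.
KNOWN_SENDERS = {"colleague@example.org"}


def from_known_sender(headers: dict) -> bool:
    """Ham rule: the sender is someone already known to the user."""
    return headers.get("From", "").lower() in KNOWN_SENDERS


def porn_test_hit(headers: dict) -> bool:
    """Spam rule: a test name containing PORN_ appears in the spam header."""
    return "PORN_" in headers.get("X-Spam-Score", "")


# Order matters: the ham rule is checked before the spam rule.
RULES: list[tuple[str, Callable[[dict], bool]]] = [
    ("Friends", from_known_sender),
    ("Junk", porn_test_hit),
]


def route(headers: dict, rules=RULES, default: str = "Inbox") -> str:
    """Return the folder of the first rule that matches, else the default."""
    for folder, matches in rules:
        if matches(headers):
            return folder
    return default
```

A message from a known sender is filed under "Friends" even if it trips a PORN_ test, because the spam rule is never reached for it.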

Having looked through the emails on this list it seems that most of the
focus is on removing spam at the server, but SpamAssassin also provides
users with a useful tool to exercise their own control over what they decide
is spam.

Cheers

Ben Kreunen

Imaging and IT Coordinator
Department of Pathology
The University of Melbourne



Re: Naming conventions for tests

2006-05-23 Thread Sanford Whiteman
 The main problem with this approach is that it requires monitoring
 of the SpamAssassin tests being applied as the software is
 updated...

Well, I'd say this is a problem chiefly because whoever _is_
administering the server -- not spamassassin.apache.org -- is clearly
not encouraging the use of granular client-side filtering.

If filtering on more than the Spam Score were an expectation from
end-to-end, you would have a consistently updated list provided to you
by your mail admin, through an intranet portal or whatever. It's a
virtual certainty that your mail admin is using rules and metas that
don't ship with SA. What would you do about those?

--Sandy