Re: real-time network results

Justin Mason 19 Jan 2005 03:01:00 -0000

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1


Daniel Quinlan writes:
> So, we have the _RBL_ header tag and now the NetCache plugin to attempt
> some sort of off-in-the-future real-time network results thing, but
> those aren't going to work unless all the corpus mail has those header
> modifications.
> 
> I'm just wondering if those aren't overkill (and it seems to be too much
> work for us to implement) and whether we should just modify our code to
> suck out rule hits from the X-Spam-Status: header.  Most of our DNSBL
> checks are fairly stable in name and we can map the ones that aren't.
> If X-Spam-Status: doesn't exist, then do the test at mass-check time.
> 
> That part should be easy, the harder part is then not running those
> DNS/network queries for weekly checks if not needed, but I'm more
> concerned about accuracy than performance at the moment.  I think the
> 20/20 hindsight of the network tests is the primary reason why the BAYES
> scores are so low in the bayes+net score set.

yep, agreed.   and I think we can definitely reuse the X-Spam-Status
stuff.
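
As a rough illustration of "sucking out" rule hits, here is a minimal Python
sketch (mass-check itself is Perl, so this only shows the idea).  It assumes
the usual header layout, where the hits appear as a comma-separated
"tests=..." field that may be folded onto continuation lines after a comma.
The function name rule_hits is hypothetical.

```python
import re

def rule_hits(header_value):
    """Return the set of rule names in the tests= field of an
    X-Spam-Status value, or None if there is no tests= field
    (meaning: run the network tests at mass-check time instead)."""
    # Assumption: folding only happens after a comma inside the list,
    # so dropping whitespace after commas rejoins the folded list.
    unfolded = re.sub(r",\s+", ",", header_value)
    match = re.search(r"\btests=(\S*)", unfolded)
    if match is None:
        return None
    names = match.group(1).split(",")
    # "none" is what the header shows when no rule hit.
    return {n for n in names if n and n != "none"}

header = ("Yes, score=12.3 required=5.0 tests=BAYES_99,RCVD_IN_XBL,\n"
          "\tRAZOR2_CHECK autolearn=no version=3.0.2")
hits = rule_hits(header)
```

Returning None rather than an empty set keeps "no header, so we know nothing"
distinct from "header present, no rules hit".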

However, I'm not sure a simple "reuse" is sufficient.  Here's what has
happened in the past to net rules that we might want to reuse:

  - names have been changed (we can reuse as long as we track the
    name change)
  - logic changed (i.e. we cannot reuse those results)
  - rule added (i.e. the *lack* of a hit before this date is not
    necessarily indicative of a miss!)

All these things happen at specific dates, too.   I'd suggest that just
having "reuse" isn't enough; whatever we do should be able to represent an
assertion like the following and reuse hits correctly: "RCVD_IN_SOME_BL was
added under the name RCVD_IN_BL_SOME_BLAH on 2004-11-13, then had its name
changed to RCVD_IN_SOME_BL on 2005-01-02".
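
One way such an assertion could be encoded is as a dated history per rule.
The following is only a sketch in Python, not a proposal for the actual
mass-check code; the HISTORY table layout and the reused_result function are
hypothetical, and the rule names and dates are the made-up ones from the
assertion above.

```python
from datetime import date

# Hypothetical table: current rule name -> list of
# (name in use at the time, first valid date, end date or None).
# The start date is inclusive and the end date is exclusive.
HISTORY = {
    "RCVD_IN_SOME_BL": [
        ("RCVD_IN_BL_SOME_BLAH", date(2004, 11, 13), date(2005, 1, 2)),
        ("RCVD_IN_SOME_BL", date(2005, 1, 2), None),
    ],
}

def reused_result(current_name, scan_date, hits):
    """Return True or False when the stored result can be reused,
    or None when the test has to be run at mass-check time."""
    for old_name, start, end in HISTORY.get(current_name, []):
        if start <= scan_date and (end is None or scan_date < end):
            return old_name in hits
    # Scanned before the rule existed (or rule unknown): a missing
    # hit says nothing, so the stored result cannot be reused.
    return None
```

A logic change could be covered the same way by starting the history at the
date of the change, so that anything scanned earlier falls through to None.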

the good news: this only needs to be coded within mass-check, not as
part of the main Mail::SpamAssassin lib. ;)

but, meh, "reuse" as proposed is a good first step anyway ;)

- --j.
    

> Here's a possible option to add to the rules files:
> 
> to reuse:
> 
>   reuse <current rule name> [<old rule name>]
> 
> to not reuse (for masses/spamassassin/user_prefs)
> 
>   reuse <current rule name> 0
> 
> For example:
> 
> reuse DCC_CHECK
> reuse DIGEST_MULTIPLE
> reuse DNS_FROM_AHBL_RHSBL
> reuse DNS_FROM_RFC_ABUSE
> reuse DNS_FROM_RFC_BOGUSMX
> reuse DNS_FROM_RFC_DSN
> reuse DNS_FROM_RFC_POST
> reuse DNS_FROM_RFC_WHOIS
> reuse DNS_FROM_SECURITYSAGE
> reuse HABEAS_INFRINGER
> reuse HABEAS_USER
> reuse NO_DNS_FOR_FROM
> reuse PYZOR_CHECK
> reuse RAZOR2_CF_RANGE_51_100
> reuse RAZOR2_CHECK
> reuse RCVD_IN_BL_SPAMCOP_NET
> reuse RCVD_IN_BSP_OTHER
> reuse RCVD_IN_BSP_TRUSTED
> reuse RCVD_IN_DSBL
> reuse RCVD_IN_NJABL_CGI
> reuse RCVD_IN_NJABL_DUL
> reuse RCVD_IN_NJABL_MULTI
> reuse RCVD_IN_NJABL_PROXY
> reuse RCVD_IN_NJABL_RELAY
> reuse RCVD_IN_NJABL_SPAM
> reuse RCVD_IN_SBL
> reuse RCVD_IN_SORBS_BLOCK
> reuse RCVD_IN_SORBS_DUL
> reuse RCVD_IN_SORBS_HTTP
> reuse RCVD_IN_SORBS_MISC
> reuse RCVD_IN_SORBS_SMTP
> reuse RCVD_IN_SORBS_SOCKS
> reuse RCVD_IN_SORBS_SPAM
> reuse RCVD_IN_SORBS_WEB
> reuse RCVD_IN_SORBS_ZOMBIE
> reuse RCVD_IN_WHOIS_INVALID   RCVD_IN_RFC_IPWHOIS
> reuse RCVD_IN_XBL
> reuse ROUND_THE_WORLD
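
For illustration, the quoted "reuse" lines could be read into a lookup table
roughly like this.  This is a Python sketch under assumptions: the
parse_reuse name is hypothetical, "#" comments are assumed to be allowed as
in other rules-file lines, and more than one old name per line is accepted
even though the proposal above only shows one.

```python
def parse_reuse(lines):
    """Map each reusable rule to the names its hits may appear under
    in an old X-Spam-Status header (current name first)."""
    reuse = {}
    for raw in lines:
        # Drop trailing comments, then split on any whitespace.
        parts = raw.split("#", 1)[0].split()
        if len(parts) < 2 or parts[0] != "reuse":
            continue
        current, old_names = parts[1], parts[2:]
        if old_names == ["0"]:
            # "reuse RULE 0" switches reuse off again for that rule.
            reuse.pop(current, None)
            continue
        reuse[current] = [current] + old_names
    return reuse

table = parse_reuse([
    "reuse RCVD_IN_XBL",
    "reuse RCVD_IN_WHOIS_INVALID   RCVD_IN_RFC_IPWHOIS",
    "reuse DCC_CHECK",
    "reuse DCC_CHECK 0",
])
```

Processing the lines in order lets a later "0" line, such as one in
user_prefs, override an earlier "reuse" from the rules files.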
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.5 (GNU/Linux)
Comment: Exmh CVS

iD8DBQFB7c1jMJF5cimLx9ARAnMoAKC2DQWZT69sgPtBxFnQUpHLGR2FqgCeIP8t
pRvqxWN3v1AxaFDVMEmK9aQ=
=wQQE
-----END PGP SIGNATURE-----
