On Wed, Feb 07, 2007 at 01:18:13PM +0100, Marcin Krol wrote:
>
> Automatic whitelisting, definitions:
>
> 1. DSPAM: anybody who sends ${whitelistThreshold} mails is whitelisted
>
> 2. ASSP: anybody who is authenticated over SASL and their
> correspondents from that mail is whitelisted for up to 90 days
>
> Frankly, implementation of definition 1 is not very good really.
> Implementation according to definition 2 is right on target.

In that case, you want to do something dspam isn't designed for. It
will also require additional information which dspam isn't able to
get (it's not privy to details of how messages were sent, for
example). Therefore, you're going to have to do some hacking
yourself.

> From tokenizer.c:
>
> This could be hackable to get token whitelisted on demand
> actually, but I'm afraid of screwing something up in another
> place.

If you really want to implement this sort of functionality in
dspam, you'll have to take the risk. It doesn't look like it would
be terribly difficult; the trickiest part will probably be dealing
with all the different ways of writing email addresses in From:
lines. On the other hand, I'd also be worried that my changes would
have some strange effect on the statistical filtering, leading to a
gradual decrease in accuracy.

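
To give an idea of what "different ways of writing email addresses"
means in practice, here is a rough Python sketch (not dspam code, and
not how dspam parses headers) that reduces a From: header value to bare
addresses using the standard library. The function name normalize_from
is made up for illustration, and lowercasing the local part is an
assumption that suits most sites but is not strictly required by the
RFCs.

```python
from email.utils import getaddresses


def normalize_from(header_value):
    """Return the lowercased bare addresses found in a From: header value."""
    addresses = []
    # getaddresses() copes with display names, quoted commas and
    # the old "addr (Comment)" style.
    for _display_name, addr in getaddresses([header_value]):
        # Assumption: treat the whole address as case-insensitive.
        addr = addr.strip().lower()
        if "@" in addr:  # skip anything that did not parse to user@domain
            addresses.append(addr)
    return addresses
```

All of 'Marcin Krol <MK@Example.com>', '"Krol, Marcin" <mk@example.com>'
and 'mk@example.com (Marcin Krol)' should come out as the same key, which
is what a whitelist lookup needs.
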
> [...] but there's nothing MTA can do if antispam filter decides
> it's spam, can it?

Well, the MTA is responsible for invoking the spam filter, and it
can simply choose not to do so if it already knows the message is
not spam. This has the added benefit of not wasting system resources
trying to come up with an answer when the answer is already known.

> I don't think this actually falls into realm of MTA. You don't say
> to people "if you want whitelist smth in SpamAssassin, make MTA do
> whitelisting instead of SA", do you?

It depends; if you wanted to use a spam detection or whitelisting
method that wasn't supported by SpamAssassin and was difficult to
add, then a suggestion to implement it at the MTA level would be
reasonable.

Also keep in mind that SpamAssassin uses a modular design; it's
intended that you can plug in additional functions that use any
number of different methods for determining whether a message is
spam. dspam is more single-minded, and adding any sort of extra
functionality requires modifying the program itself. Furthermore,
the program is built almost entirely around statistical filtering,
so any modifications outside of this scope tend to be even more
difficult or invasive.

For that reason, it may be easier and safer to implement the
whitelist you want at the MTA level, although that probably depends
on how familiar you are with Exim. It shouldn't be too difficult to
get it to do a quick lookup for "pre-authorised" sender/recipient
pairs and choose a different delivery strategy based on the outcome.

Since you already require information which dspam doesn't have, you
will need to create a system which records sender/recipient pairs as
well as some sort of whitelist management system, regardless of
whether you get the MTA or your spam filter to utilise the
information.
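
As a rough idea of how small such a system could be, here is a Python
sketch of a pair store with expiry. Everything in it is an assumption
for illustration: the PairWhitelist class, the table layout and the
choice of SQLite are mine, and the 90-day lifetime is just the ASSP
figure quoted above. It is not tied to dspam or Exim in any way.

```python
import sqlite3
import time

TTL_SECONDS = 90 * 24 * 3600  # 90 days, the ASSP figure quoted above


class PairWhitelist:
    """Records (local user, correspondent) pairs that expire after a TTL."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS pairs ("
            " local_user TEXT NOT NULL,"
            " correspondent TEXT NOT NULL,"
            " expires REAL NOT NULL,"
            " PRIMARY KEY (local_user, correspondent))"
        )

    def record(self, local_user, correspondent, now=None):
        """Call when an authenticated local user sends mail to someone."""
        now = time.time() if now is None else now
        # Re-sending to the same correspondent simply renews the entry.
        self.db.execute(
            "INSERT OR REPLACE INTO pairs VALUES (?, ?, ?)",
            (local_user.lower(), correspondent.lower(), now + TTL_SECONDS),
        )
        self.db.commit()

    def is_whitelisted(self, sender, recipient, now=None):
        """True if recipient has written to sender within the TTL."""
        now = time.time() if now is None else now
        row = self.db.execute(
            "SELECT 1 FROM pairs"
            " WHERE local_user = ? AND correspondent = ? AND expires > ?",
            (recipient.lower(), sender.lower(), now),
        ).fetchone()
        return row is not None

    def purge(self, now=None):
        """Delete expired entries and return how many were removed."""
        now = time.time() if now is None else now
        cur = self.db.execute("DELETE FROM pairs WHERE expires <= ?", (now,))
        self.db.commit()
        return cur.rowcount
```

The idea would be to call record() whenever an authenticated user sends
a message, and is_whitelisted() on incoming mail before deciding whether
to hand it to the filter. How you hook either call into your MTA is a
separate question that this sketch doesn't answer.
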

Doing it in the MTA also lets you pick off the low-hanging fruit
before messages ever reach dspam, which will scale better.

Bear in mind, too, that if you make changes to dspam, you might need
to maintain your own patches unless you can get Jon to merge them
into the official tree. To be merged, a change would probably need
to be supported by all backends, etc.; a site-specific solution
isn't likely to be incorporated into dspam.

I'm not trying to dissuade you from implementing this in dspam; I'm
just trying to expand on why some people think it's better done at
the MTA level.