On Jun 2, 2012, at 11:15 AM, Jared Johnson wrote:

>> Yup. Part of the motivation for this plugin was to short circuit all the
>> intermediate plugins and handlers so I can feed the message to sa-learn
>> and dspam. Until dspam is trained, that's a very important step in
>> training it. But there's no gain in validating the HELO name, SPF,  or
>> DomainKeys. This plugin and associated changes adds that flexibility while
>> reducing the code and complexity of the plugins.
> 
> It might not be fair to say there's *no* gain.  Our HELO validation and
> SPF plugins (we don't have a DKIM plugin at the moment, for shame) now do
> their lookups unconditionally and add headers to the message so that our
> bayes engine can tokenize the headers themselves.

Wait until you actually run DomainKeys before you decide whether it's a gain.
It requires more resources than I'd have guessed. And surprisingly (to me), the
most reliably signed messages are spam, or come from very big "mostly good"
senders. I've seen enough ham senders with broken DomainKeys that I don't
consider it reliable enough to reject or train on. The same goes for SPF.
Spammers are far more likely to have good SPF than legit mailers. Spammers
automate their SPF records, so they don't make typos like "ip:127..." (should
be "ip4:127...") or omit the space between the declarations and the ~all. The
errors are common enough, and affect ham often enough, that I'm tempted to fix
them up in the SPF plugin before validation.
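
To make the two typos concrete, here is a rough sketch of that kind of fix-up.
It is in Python purely for illustration and is not code from the SPF plugin;
the function name fix_spf_typos is hypothetical, and it assumes the record
arrives as a single TXT string.

```python
import re

def fix_spf_typos(record: str) -> str:
    """Repair two common typos in an SPF TXT record (hypothetical helper)."""
    # "ip:" is not a valid SPF mechanism. When it is followed by something
    # that looks like a dotted IPv4 address, assume "ip4:" was intended.
    fixed = re.sub(r'(?<!\w)ip:(?=\d{1,3}\.)', 'ip4:', record)
    # Insert the missing space when "~all" is glued to the previous term,
    # e.g. "mx~all" becomes "mx ~all". Only "~all" is handled here, because
    # "-" can legitimately appear inside a domain name.
    fixed = re.sub(r'(?<=\S)~all\b', ' ~all', fixed)
    return fixed
```

A record that is already valid, such as "v=spf1 ip4:10.0.0.1 ~all", passes
through unchanged, so running this unconditionally before validation should be
harmless for well-formed records.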

SPF also breaks legit forwarding servers that don't implement SRS, so I don't
reject or train based on SPF alone.

I too have a custom HELO validation plugin (it needs more work, but I'll 
contribute it eventually), and it may actually provide some gain, but I think 
it's safe to say the one presently in plugins is not a gain.

How do you measure whether the resources expended are worth the (likely small)
benefit you would get from the additional bayes tokens? That will determine
whether it's a gain or not. I've placed my bet on the table, and I'd be pleased
to be proven wrong.

> Bayes is a little bit of a black box to me, so I can't really quantify
> just how useful this is, but I'd say it's greater than zero. Dspam even
> treats headers in a special way to ensure that their usefulness is
> maximized.

Usefulness != gain. There may be some gain, but I'm not familiar enough with
bayes either. But I know someone who is. The dspam author (Stevan Bajić)
noticed my plugin, contacted me, and will be submitting some improvements, like
talking directly to the dspam server. I'm BCC'ing him on this message, and
hopefully we'll get a more informed opinion.

Matt
