On Jun 2, 2012, at 11:15 AM, Jared Johnson wrote:

>> Yup. Part of the motivation for this plugin was to short circuit all the
>> intermediate plugins and handlers so I can feed the message to sa-learn
>> and dspam. Until dspam is trained, that's a very important step in
>> training it. But there's no gain in validating the HELO name, SPF, or
>> DomainKeys. This plugin and associated changes adds that flexibility while
>> reducing the code and complexity of the plugins.
>
> It might not be fair to say there's *no* gain. Our HELO validation and
> SPF plugins (we don't have a DKIM plugin at the moment, for shame) now do
> their lookups unconditionally and add headers to the message so that our
> bayes engine can tokenize the headers themselves.
Wait until you actually run DomainKeys before you decide if it's a gain. It requires more resources than I'd have guessed. And surprisingly (to me), the most reliably signed messages are spam, or come from very big "mostly good" senders. I've seen enough ham senders with broken DomainKeys that I don't consider it reliable enough to reject or train based on.

Same goes for SPF. Spammers are far more likely to have good SPF than legit mailers. Spammers automate their SPF records, so they don't make typos like "ip:127..." (should be "ip4:127...") or omit the spaces between the declarations and the ~all. The errors are common enough, and affect ham often enough, that I'm tempted to fix them up in the SPF plugin before validation. And SPF breaks legit forwarding servers that don't implement SRS. So I don't reject or train based on SPF alone.

I too have a custom HELO validation plugin (it needs more work, but I'll contribute it eventually), and it may actually provide some gain, but I think it's safe to say the one presently in plugins is not a gain.

How do you measure whether the resources expended are worth the (likely small) benefit you would get from the additional bayes tokens? That will determine if it's a gain or not. I've placed my bet on the table, and I'd be pleased to be proven wrong.

> Bayes is a little bit of a black box to me, so I can't really quantify
> just how useful this is, but I'd say it's greater than zero. Dspam even
> treats headers in a special way to ensure that their usefulness is
> maximized.

Usefulness != gain. There may be some gain, but I'm not familiar enough with bayes either. But I know someone who is. The dspam author (Stevan Bajić) noticed my plugin, contacted me, and will be submitting some improvements, like talking directly to the dspam server. I'm BCC'ing him on this message, and hopefully we'll get a more informed opinion.

Matt