Re: Quinlan interviewed about SA
On Sunday, March 6, 2005, 7:45:36 AM, Eric Hall wrote: On 3/6/2005 3:25 AM, Matt Kettler wrote: These days spamming is done via botnets That's already trapped by sbl+xbl. sbl-xbl is very good, but it has not and cannot solve the zombie problem entirely. There's always a lag between zombies being detected and their being listed in RBLs. That delay can be exploited by spammers to do a lot of sending. Jeff C. -- Jeff Chan mailto:[EMAIL PROTECTED] http://www.surbl.org/
Re: Quinlan interviewed about SA
On Saturday, March 5, 2005, 11:24:25 AM, Eric Hall wrote: On 3/4/2005 1:57 PM, Rob McEwen (PowerView Systems) wrote: Quinlan: Any technique that tries to identify good mail without authentication backing it up, or some form of personalized training. It worked well for a while, but it's definitely not an effective technique today. I kind of disagree with this, but only partly. Generally speaking, you want as many good indicators as you have bad indicators. If you have hundreds of indicators that flag every possible spam-sign, then sooner or later every piece of good mail will also get flagged by one rule or another. In order to offset this, you want to have a collection of good indicators, so that you can cancel out the everything-looks-like-spam effect. Unfortunately, these rules will also hit some kinds of spam, so sooner or later a large enough set of good rules will just make everything some shade of grey, or worse will make marginal spam appear to be good. Now then, in order to avoid that, you really should limit the positive indicators to stuff that you can verify (which is only slightly different than authenticate). All the rules are verified by testing against spam and ham corpora before being deployed. Ones that have high false positives are given a low score or not used at all. Folks don't just make up rules and deploy them. The usefulness of the official rules is checked before they're released. YMMV on homemade rules. That said, as the Internet moves towards more usable identification and authentication schemes for mail, they will probably get positive rules in SA. SPF or DomainKeys may (or may not) be examples, but the nice thing is that SA lets us give them relative goodness scores and not an outright pass or fail, so they don't need to be perfect out of the box. That may actually help their adoption, as it arguably has with SURBLs. Jeff C. -- Jeff Chan mailto:[EMAIL PROTECTED] http://www.surbl.org/
Re: Quinlan interviewed about SA
On 3/5/2005 9:00 PM, Jeff Chan wrote: On Saturday, March 5, 2005, 11:24:25 AM, Eric Hall wrote: On 3/4/2005 1:57 PM, Rob McEwen (PowerView Systems) wrote: Quinlan: Any technique that tries to identify good mail without authentication backing it up, or some form of personalized training. It worked well for a while, but it's definitely not an effective technique today. Ones that have high false positives are given a low score or not used at all. Folks don't just make up rules and deploy them. The usefulness of the official rules is checked before they're released. Yes, but we don't have very many of them. I don't mean validate by passing it through pre-release testing either (although that's certainly important), but instead mean that the message itself has to contain enough data for the marker to be validated. Whether this is an external agent that will validate some hash (as in the probable case of DK), or something in the message itself (a trusted relay says that a cert is good), or whatever, the important thing is the verification part (this is still different from authentication). nice thing is that SA lets us give them relative goodness scores and not an outright pass or fail, so they don't need to be perfect out of the box. Yes, my point being that rather than saying they are not useful we really ought to be working hard on finding ways to add more of them, because it is their volume that makes them useful (otoh, having too many of them, such that the bar is lowered, is indeed bad). -- Eric A. Hall    http://www.ehsco.com/    Internet Core Protocols    http://www.oreilly.com/catalog/coreprot/
Re: Quinlan interviewed about SA
On Saturday 05 March 2005 9:54 pm, Eric A. Hall wrote: Yes, my point being that rather than saying they are not useful we really ought to be working hard on finding ways to add more of them, because it is their volume that makes them useful (otoh, having too many of them, such that the bar is lowered, is indeed bad). Ah, but from experience, they *haven't* been useful. SA used to have quite a few negative-scoring rules, and as a result spammers started tailoring their spam to hit them. A rather extreme example would be the series of rules that targeted mail programs that spammers rarely used -- things like Pine, Mutt, Mozilla, etc. The result: spam came through with headers for all three and got a base score of -10. This particular case could be mitigated by adding meta-rules (if it hits more than one UA test, it's obviously forged), but as this sort of thing started happening regularly, the devs began taking out any negative-scoring rules that could be gamed like this. That left the default whitelist, Habeas (after some refinements), Bonded Sender, Hashcash, and Bayes (since it's different for each target system). From what I hear a DomainKeys plugin is in the works. It's not that the SpamAssassin team hasn't thought of the idea, it's that they tried it and, for the most part, it didn't work. -- Kelson Vibber SpeedGate Communications www.speed.net
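The meta-rule mitigation Kelson describes could be sketched like this in SpamAssassin configuration. The sub-rule names, header patterns, and the meta rule name are all illustrative, not actual stock SA rules:

```
# Hypothetical sketch: flag mail that claims to come from more than one
# mail client at once (an obvious forgery pattern).
header   __MUA_PINE        X-Mailer =~ /\bpine\b/i
header   __MUA_MUTT        User-Agent =~ /\bmutt\b/i
header   __MUA_MOZILLA     User-Agent =~ /\bmozilla\b/i
meta     FORGED_MULTI_MUA  (__MUA_PINE + __MUA_MUTT + __MUA_MOZILLA > 1)
describe FORGED_MULTI_MUA  Claims to be from more than one mail client
score    FORGED_MULTI_MUA  3.0
```

Since the `__`-prefixed sub-rules score nothing on their own, mail from any single real client is unaffected; only the combination is penalized.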
Re: Quinlan interviewed about SA
At 02:58 AM 3/6/2005, Kelson Vibber wrote: Yes, my point being that rather than saying they are not useful we really ought to be working hard on finding ways to add more of them, because it is their volume that makes them useful (otoh, having too many of them, such that the bar is lowered, is indeed bad). Ah, but from experience, they *haven't* been useful. SA used to have quite a few negative-scoring rules, and as a result spammers started tailoring their spam to hit them. I agree entirely; Kelson speaks true here. Any rule based on simple message content alone can be forged trivially and abused by spammers. I think the big point to get across is that we aren't just saying they aren't useful; it's "We've been there, done that, and got screwed by the spammers for it." 2.50 shipped with a bunch of negative-scoring rules, and it resulted in the completely infamous bug 1589 breaking out: http://bugzilla.spamassassin.org/show_bug.cgi?id=1589 That said, I do personally favor having lots of very small-scoring negative rules (i.e., -0.01 each) and setting the ham autolearn threshold to -0.01. This prevents a lot of "low-scoring spam learned as ham" problems, since in order to be learned as ham a message must now hit at least one of the ham rules. Learning as ham any message with a small positive score, as per the default, is just asking for trouble. Keeping the scores of the rules small means they are too trivial to be abused for any significant gain by spammers.
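Matt's scheme could be written as local configuration along these lines. The rule names and patterns are made up for illustration, and `bayes_auto_learn_threshold_nonspam` is the option spelling used by SA 3.x, so treat this as a hedged sketch rather than drop-in config:

```
# Lots of tiny negative "ham-indicator" rules, each worth only -0.01,
# too small for spammers to gain anything by gaming them.
header LOCAL_HAS_REFS   exists:References
score  LOCAL_HAS_REFS   -0.01

header LOCAL_HAS_UA     exists:User-Agent
score  LOCAL_HAS_UA     -0.01

# Only autolearn a message as ham if its total score is -0.01 or lower,
# i.e. it hit at least one of the tiny ham rules above.
bayes_auto_learn_threshold_nonspam -0.01
```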
Re: Quinlan interviewed about SA
At 03:16 AM 3/6/2005, Eric A. Hall wrote: But, compare this to something like scoring against TLS encryption strength. Spammers are motivated to send as fast as possible, and strong encryption is counter-productive to that mission (increasingly so), and they can't fake it because it can be validated by a trusted relay. Bah, spammers may be motivated by speed, but they are also opportunists and abuse the resources of others. These days spamming is done via botnets and is almost entirely limited by the bandwidth of the node, not its CPU time. Adding TLS shouldn't slow them down much, as it's mostly a CPU hit to do so... besides, they can always make up for it by grabbing more infected hosts.
Re: Quinlan interviewed about SA
On Sunday, March 6, 2005, 12:16:50 AM, Eric Hall wrote: But, compare this to something like scoring against TLS encryption strength. Spammers are motivated to send as fast as possible, and strong encryption is counter-productive to that mission (increasingly so), and they can't fake it because it can be validated by a trusted relay. Spammers have access to hundreds of thousands of zombies. They probably have all the computing power they need to calculate a few hashes. Jeff C. -- Jeff Chan mailto:[EMAIL PROTECTED] http://www.surbl.org/
Re: Quinlan interviewed about SA
On 3/6/2005 3:25 AM, Matt Kettler wrote: These days spamming is done via botnets That's already trapped by sbl+xbl. Adding TLS shouldn't slow them down much, as it's mostly a CPU hit to do so... There's a lot of stuff involved, and there's lots of things to score on. Here's a couple of samples from DNSOps and Namedroppers:

Received: from darkwing.uoregon.edu (darkwing.uoregon.edu [128.223.142.13])
        (using TLSv1 with cipher EDH-RSA-DES-CBC3-SHA (168/168 bits))
        (Client CN darkwing.uoregon.edu, Issuer Thawte Server CA (verified OK))
        by goose.ehsco.com (Postfix) with ESMTP
        for [EMAIL PROTECTED]; Fri, 4 Mar 2005 02:18:17 -0600 (CST)

Received: from psg.com (psg.com [147.28.0.62])
        (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
        (Client did not present a certificate)
        by goose.ehsco.com (Postfix) with ESMTP
        for [EMAIL PROTECTED]; Sat, 5 Mar 2005 16:16:15 -0600 (CST)

Did the client present its own cert? Is it in a trusted path (not self-signed)? Was there a revocation lookup? How tough was the key, and how many bits were used (and scale the score accordingly)? So getting to the higher cumulative scores wouldn't be very simple, and it would also provide a clear path of responsibility, etc. The same thing could be done with user certs too, if a plug-in to SA wants to do the verification testing. There's also the possibility of having a generic GOOD_BOY set of meta SMTP tests that give a bonus score if they all successfully match against good administrative practices (such as HELO=rDNS). I'm still thinking about this one; the 'professional marketers' would hit this a lot, and there's too many poorly-run networks, so it might be counter-productive. -- Eric A. Hall    http://www.ehsco.com/    Internet Core Protocols    http://www.oreilly.com/catalog/coreprot/
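A plug-in along the lines Eric describes might start by pulling the scoreable TLS details out of a Postfix-style Received header like the samples above. The thresholds below are arbitrary illustrations of "scale the score accordingly", not anything SA actually ships:

```python
import re

# Postfix records TLS details as: (using PROTO with cipher NAME (used/total bits))
TLS_RE = re.compile(r"using (\S+) with cipher (\S+) \((\d+)/(\d+) bits\)")

def tls_score(received_header):
    """Return an illustrative negative (hammy) score based on cipher strength."""
    m = TLS_RE.search(received_header)
    if m is None:
        return 0.0               # no TLS info at all: no bonus
    used_bits = int(m.group(3))  # bits actually negotiated for the session
    if used_bits >= 256:
        return -0.5              # strong cipher, e.g. AES256
    if used_bits >= 128:
        return -0.3              # decent cipher, e.g. 3DES at 168 bits
    return -0.1                  # weak cipher: token bonus only

header = ("from psg.com (psg.com [147.28.0.62]) "
          "(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) "
          "by goose.ehsco.com (Postfix) with ESMTP")
print(tls_score(header))  # strong cipher -> -0.5
```

Certificate presence, trust path, and revocation status would each be further inputs; the point is that each verified property can contribute a small slice of the cumulative negative score.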
Re: Quinlan interviewed about SA
jdow wrote: Methinks there is a candidate meta rule here. SPF passes and it's in certain of the BLs leads to a higher score than merely being in the BL. In particular, an SPF (or similar) pass will make RHSBLs (right-hand-side blacklists, for those following along) more useful. I mean, if someone forges a mail from [EMAIL PROTECTED], that's dumb. They get a point or two for stupidity. But if knownspammer.biz goes to the effort to set up an SPF policy and sticks to it, the combination SPF pass and RHSBL hit is pretty conclusive. (Me, I'm just looking forward to the day that people stop bouncing mail sent using forged addresses. I checked the number of User unknown hits we handle per day, and it's more than 10 times the number of messages that make it through to an actual mailbox.) -- Kelson Vibber SpeedGate Communications www.speed.net
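jdow's candidate meta rule could look something like this in local SpamAssassin config. The meta rule name is made up, and `URIBL_SBL` here stands in for whichever RHSBL test you actually use:

```
# Hypothetical: boost the score when the sender passes SPF *and* the
# sender's domain hits a right-hand-side blacklist -- the SPF pass
# confirms the blacklisted domain really sent it.
meta     SPF_PASS_AND_RHSBL  (SPF_PASS && URIBL_SBL)
describe SPF_PASS_AND_RHSBL  Verified sender from a blacklisted domain
score    SPF_PASS_AND_RHSBL  2.0
```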
Re: Quinlan interviewed about SA
Kelson wrote: jdow wrote: Methinks there is a candidate meta rule here. SPF passes and it's in certain of the BLs leads to a higher score than merely being in the BL. In particular, an SPF (or similar) pass will make RHSBLs (right-hand-side blacklists, for those following along) more useful. I mean, if someone forges a mail from [EMAIL PROTECTED], that's dumb. They get a point or two for stupidity. But if knownspammer.biz goes to the effort to set up an SPF policy and sticks to it, the combination SPF pass and RHSBL hit is pretty conclusive. I'm confused. Are you actually seeing legitimate mail that has a forged address from a blacklisted domain? If not, I don't see the need for a penalty difference between mail from blacklisted domains and mail from blacklisted domains that also pass an SPF check. On the other hand, combining whitelists with SPF checks (instead of using whitelist_from_rcvd) makes a lot of sense to me. Daryl
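The SPF-backed whitelisting Daryl mentions exists in SpamAssassin 3.x as `whitelist_from_spf` in the SPF plugin; a sketch, with a hypothetical sender address:

```
# Whitelist this sender only when the message also passes an SPF check,
# so a forged From: address alone doesn't earn the whitelist bonus.
loadplugin Mail::SpamAssassin::Plugin::SPF
whitelist_from_spf  newsletter@example.com
```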
Re: Quinlan interviewed about SA
From: Daryl C. W. O'Shea [EMAIL PROTECTED] Kelson wrote: jdow wrote: Methinks there is a candidate meta rule here. SPF passes and it's in certain of the BLs leads to a higher score than merely being in the BL. In particular, an SPF (or similar) pass will make RHSBLs (right-hand-side blacklists, for those following along) more useful. I mean, if someone forges a mail from [EMAIL PROTECTED], that's dumb. They get a point or two for stupidity. But if knownspammer.biz goes to the effort to set up an SPF policy and sticks to it, the combination SPF pass and RHSBL hit is pretty conclusive. I'm confused. Are you actually seeing legitimate mail that has a forged address from a blacklisted domain? If not, I don't see the need for a penalty difference between mail from blacklisted domains and mail from blacklisted domains that also pass an SPF check. On the other hand, combining whitelists with SPF checks (instead of using whitelist_from_rcvd) makes a lot of sense to me. If some mentally deficient spammer has the stupidity to maintain an SPF record for his spam site that is identified in blacklists he probably should get some additional Brownie Points for his stupidity, eh? {^_-}
Re: Quinlan interviewed about SA
using whitelist_from_rcvd), make a lot of sense to me. If some mentally deficient spammer has the stupidity to maintain an SPF record for his spam site that is identified in black lists he probably should get some additional Brownie Points for his stupidity, eh? {^_-} Just came across someone using this domain for contact addresses earlier this week (stupid, or just forthright): iamaspammer-munged.com (registered in Thailand by Manila Industries) Paul Shupak [EMAIL PROTECTED]
Re: Quinlan interviewed about SA
On 3/4/2005 1:57 PM, Rob McEwen (PowerView Systems) wrote: Quinlan: Any technique that tries to identify good mail without authentication backing it up, or some form of personalized training. It worked well for a while, but it's definitely not an effective technique today. I kind of disagree with this, but only partly. Generally speaking, you want as many good indicators as you have bad indicators. If you have hundreds of indicators that flag every possible spam-sign, then sooner or later every piece of good mail will also get flagged by one rule or another. In order to offset this, you want to have a collection of good indicators, so that you can cancel out the everything-looks-like-spam effect. Unfortunately, these rules will also hit some kinds of spam, so sooner or later a large enough set of good rules will just make everything some shade of grey, or worse will make marginal spam appear to be good. Now then, in order to avoid that, you really should limit the positive indicators to stuff that you can verify (which is only slightly different than authenticate). E.g., don't trust a received header two hops back in the transfer path, because it can be forged, and you can't verify it. Don't trust the AUTH= field in Received headers, because that can be forged very easily. And so forth. Personally I suspect that a 10-to-1 ratio of bad-to-good indicators would probably get 99% accuracy by itself. Getting to that level with fully verifiable positive indicators is proving to be extremely difficult, however. The reason that I ask is because I'm wondering whether whitelisting is really a good idea. Whitelisting by address is a good example of a positive indicator that cannot be trusted. Given that many of the client-side malware tools out there will read the address book, or read through the local mailstore, or use some other technique to send junk from a sender that is likely to be trusted, well, there's just not a lot of trust in blanket whitelists. 
The whitelist-by-received (and some greylist) functionality is somewhat more useful, however, since it adds an unrelated identifier, which effectively acts like a hash or a PIN. This still falls apart when malware sends from the local user through a normal delivery path, but that is still an improvement. -- Eric A. Hall    http://www.ehsco.com/    Internet Core Protocols    http://www.oreilly.com/catalog/coreprot/
RE: Quinlan interviewed about SA
Good interview with Daniel Quinlan about SA: http://www.osdir.com/Article4419.phtml Especially: OSDir.com: What's the most effective anti-spam technology that SpamAssassin uses right now? Quinlan: I think network rules are the most effective single technology, in particular, the URI rules that use SURBL, looking for spammer domains in Web links. Thanks DQ! ;-) Jeff C. Hey, that was a good interview. And I completely agree with his closing. Antispam has captured my attention because of its ability to change and challenge me. As for the future antispam techniques... I'm not saying anything ;) But we got quite a few ideas lined up. Hopefully they pan out well enough to make it into SA. D.Q. handled himself better than I would have. Had I been given the soap box, I'd have torn into ISPs, registrars, etc. --Chris (Proud to wear the SURBL badge!)
RE: Quinlan interviewed about SA
Quinlan: Any technique that tries to identify good mail without authentication backing it up, or some form of personalized training. It worked well for a while, but it's definitely not an effective technique today. Is he referring to a system which might assume all mail is spam unless proven good? Or is he referring to whitelisting senders? Or something else? The reason that I ask is because I'm wondering whether whitelisting is really a good idea. It seems like every article in the world on spam filters says a product MUST allow for whitelisting senders or it is no good. However: (1) I suspect that the ability to whitelist senders is more of a way for poor spam filters to hide their poor quality from those situations where their blocking of legit messages would be most noticed. Often, blocked legit messages go unnoticed... until someone you know personally says, "did you get my message about...?" Whitelisting senders minimizes such situations... but, ideally, a filter shouldn't block legit messages to begin with. (2) A second problem with whitelisting senders is the potential to whitelist spam that is being sent by a virus which simply played musical chairs with someone's address book. Theoretically, a spam virus could go to town if the recipient had whitelisted the same sender that the virus randomly picked to place in the FROM of that spam. But am I being paranoid? Does anyone know of this happening? Also, maybe a good compromise is to simply lower the score if the sender is on a trusted sender list. Personally, the biggest problem I have with blocking legit messages is when a client might tease his friend about his friend having a small member. It is easy for this to be caught by rules, so I do see the need for trusted senders... But I just feel a need to rethink the way that this should be implemented. Any suggestions? Rob McEwen PowerView Systems
Re: Quinlan interviewed about SA
Rob McEwen (PowerView Systems) wrote: Quinlan: Any technique that tries to identify good mail without authentication backing it up, or some form of personalized training. It worked well for a while, but it's definitely not an effective technique today. Is he referring to a system which might assume all mail is spam unless proven good? I can't be certain of Daniel's intended meaning here, but he's likely referring to any negative-scoring rule that relies entirely on message content (headers + body) and does no outside checks for some form of validation. Many (not sure how many, really) such rules have been put into SA in the past - and have been exploited by spammers to reduce the score their message gets. (In 2.53 IIRC a spammer managed to hit 7 or 8 negative-scoring rules for different non-spam MUAs - all at once - gaining ~10 points.) The only (default) negative rules remaining are for Bayes (varies per-system, and often per-user), BondedSender/Habeas/HashCash (sender posts a bond with $company, and if they're found to have spammed, they lose that bond - details vary), ALL_TRUSTED (for mail that only passes through mail systems you trust) and the whitelist* rules. SPF pass is also assigned a negative score, but not much: -0.001. <g> Some have matching positive-scoring tests (Habeas, Bayes). The reason that I ask is because I'm wondering whether whitelisting is really a good idea. It seems like every article in the world on spam filters says a product MUST allow for whitelisting senders or it is no good. If $user specifically wants $newsletter, despite the fact that it's spammier than a message containing GTUBE, ANY filter MUST allow that user to make that choice. More critically, in an ISP environment, I as a system administrator can NOT tell customers that they can't have that because it's spam - they'll cancel service and switch to a provider that will let $spammy_message through. 
Per-account whitelisting allows me to let some types of mail through without effectively forcing that mail to be *black*listed for everyone else. (FlowGo, anyone? :P ) However: (1) I suspect that the ability to whitelist senders is more of a way for poor spam filters to hide their poor quality from those situations where their blocking of legit messages would be most noticed. Often, blocked legit messages go unnoticed... until someone you know personally says, "did you get my message about...?" Whitelisting senders minimizes such situations... but, ideally, a filter shouldn't block legit messages to begin with. Mmmh. It's more of a way to make (almost) absolutely CERTAIN that mail claiming to be from a certain sender will not get tagged. (Or scanned, depending on where you implement your whitelist.) There *is* website-information-listmail, Joke-Of-The-Day mail, and a collection of other stuff that is really some pretty horribly formatted email - but it's legit. On my personal email account, for instance (running 2.64), I receive a pile of website-info email that is thoroughly gooped up clicky-flashy-click-and-drool HTML, and runs ~30K for a mostly-graphical message. Even with a BAYES_00 hit (score -5.4), some of these messages get tagged. But I still want those emails in my inbox rather than having to dig them out of my spam folder. (I'd prefer these messages in plaintext, but that's not an option, and I *do* want the message.) (2) A second problem with whitelisting senders is the potential to whitelist spam that is being sent by a virus which simply played musical chairs with someone's address book. Theoretically, a spam virus could go to town if the recipient had whitelisted the same sender that the virus randomly picked to place in the FROM of that spam. Thus the whitelist_from_rcvd test, which requires that the relay that passed the message into your system have a certain rDNS. This doesn't always help; some large lists change mailhosts periodically. 
This also (obviously) doesn't help much with viruses... but that's what a virus scanner is for. <g> SA (almost) never sees viruses on my systems; ClamAV gets 'em first. But am I being paranoid? Does anyone know of this happening? I have seen troubles with whitelisting, yes. The reasons have been many, and the solution has always been different. :/ Also, maybe a good compromise is to simply lower the score if the sender is on a trusted sender list. By all means. whitelist_from is just another rule in SA, with a *default* score of -100. There's nothing stopping you from creating local rules that are much less vigorous, or changing that score to -5. I've added local rules on one system that reduce the score by anywhere from 1 to 3 points for local ISPs. I see *maybe* one FN per month due to those rules, although I can't speak for the customers that don't give me feedback on how the filter is working for them. :P Personally, the biggest problem I have with blocking legit messages is when a client might tease his
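Kris's gentler whitelist could be written as local config along these lines; the sender address and rDNS domain are hypothetical, and `USER_IN_WHITELIST` is the stock rule the `whitelist_from*` options feed into:

```
# Only whitelist mail claiming to be from this sender when it arrived
# via a relay whose reverse DNS matches example.com.
whitelist_from_rcvd  newsletter@example.com  example.com

# Soften the whitelist bonus from the default -100 to -5, so a
# compromised or forged whitelisted sender can't blow past every
# other rule.
score USER_IN_WHITELIST -5
```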
Re: Quinlan interviewed about SA
Rob McEwen (PowerView Systems) [EMAIL PROTECTED] writes: Quinlan: Any technique that tries to identify good mail without authentication backing it up, or some form of personalized training. It worked well for a while, but it's definitely not an effective technique today. Let me rephrase that to be clearer. Someone (the author or some editor) added that comma to the sentence. My original email had no comma there. A clearer phrasing that would not tempt someone into adding punctuation would be: [The least effective technique is] Any technique that tries to identify good mail with neither authentication backing it up nor some form of personalized training. In SpamAssassin parlance, I was referring to negative scoring rules that can be easily fooled. They also removed the name of the company where I work (IronPort), which struck me as a bit odd considering that "how my job allows me to do open source" was part of the article. I think my employer deserves some kudos for that. Not to mention implying that I'm more than just one of the developers. There are eight committers, six of them on the Project Management Committee, and two of them (Justin Mason and Theo Van Dinter) write at least as much code as me. (And Michael Parker is catching up.) Daniel -- Daniel Quinlan http://www.pathname.com/~quinlan/
Re: Quinlan interviewed about SA
On Friday, March 4, 2005, 2:05:52 PM, Daniel Quinlan wrote: They also removed the name of the company where I work (IronPort), which struck me as a bit odd considering how my job allows me to do open source was part of the article. I think my employer deserves some kudos for that. Probably that's exceptional, so they assumed you did open source on your own time, or they weren't sure, so they didn't mention it. Not to mention implying that I'm more than just one of the developers. There are eight committers, six of them on the Project Management Committee and two of them (Justin Mason and Theo Van Dinter) write at least as much code as me. (And Michael Parker is catching up.) Perhaps a follow-up letter from you to them might be appropriate? :-) Jeff C. -- Jeff Chan mailto:[EMAIL PROTECTED] http://www.surbl.org/
Re: Quinlan interviewed about SA
From: Rob McEwen (PowerView Systems) [EMAIL PROTECTED] The reason that I ask is because I'm wondering whether whitelisting is really a good idea. It seems like every article in the world on spam filters says a product MUST allow for whitelisting senders or it is no good. However: (1) I suspect that the ability to whitelist senders is more of a way for poor spam filters to hide their poor quality from those situations where their blocking of legit messages would be most noticed. Often, blocked legit messages go unnoticed... until someone you know personally says, "did you get my message about...?" Whitelisting senders minimizes such situations... but, ideally, a filter shouldn't block legit messages to begin with. Positively not true, Rob. Whitelisting allows you to junk the persistent mortgage spam and still get email from specific sources about mortgages. If the LKML was spam free I'd simply whitelist it since it tends to hit some of the chickenpox rules rather badly. (I think there is one chickenpox rule I need to disable, anyway. It seems to trigger on messages with a lot of quotes.) (2) A second problem with whitelisting senders is the potential to whitelist spam that is being sent by a virus which simply played musical chairs with someone's address book. Theoretically, a spam virus could go to town if the recipient had whitelisted the same sender that the virus randomly picked to place in the FROM of that spam. This is indeed a problem. Whitelists must be used very judiciously. (I use an extreme form of whitelist on this list, for example. I have procmail completely bypass SpamAssassin for this list.) What I cannot do is wrap my head around automated whitelisting. That concept seems to be remarkably prone to falsely whitelisting spammers. {^_^}
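jdow's procmail bypass could look like the recipe below, placed in `.procmailrc` *before* the recipe that pipes mail through spamc/SpamAssassin. The List-Id value and folder name are illustrative:

```
# Deliver SpamAssassin-list mail straight to its folder, skipping the
# SpamAssassin recipe further down entirely.
:0:
* ^List-Id:.*spamassassin\.apache\.org
lists/spamassassin
```

Because procmail stops at the first matching delivering recipe, list traffic never reaches the filter at all, which is exactly the "extreme whitelist" described above.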
Re: Quinlan interviewed about SA
From: Kris Deugau [EMAIL PROTECTED] The only (default) negative rules remaining are for Bayes (varies per-system, and often per-user), BondedSender/Habeas/HashCash (sender posts a bond with $company, and if they're found to have spammed, they lose that bond - details vary), ALL_TRUSTED (for mail that only passes through mail systems you trust) and the whitelist* rules. SPF pass is also assigned a negative score, but not much: -0.001. g Methinks there is a candidate meta rule here. SPF passes and it's in certain of the BLs leads to a higher score than merely being in the BL. {^_-}