New DNS white/blacklist + spamassassin rules Re: Please report IPs delivering ham and spam with this script
While I still plan for this to be used primarily via rsync and a spamassassin plugin, I've loaded the data into DNS records and created spamassassin rules so it can easily be tested now. It's updating automatically once a day.

I'm hoping this will encourage people to contribute data, because now you should get an immediate improvement in your spam filtering, based on data you've provided on which IPs send you ham and spam.

More info, including the script to submit data (either from spam/ham folders, or individual emails piped to standard input) here:
http://www.chaosreigns.com/iprep/

The spamassassin rules:

ifplugin Mail::SpamAssassin::Plugin::DNSEval
header __RCVD_IN_IPREP eval:check_rbl('iprep-firsttrusted', 'iprep.chaosreigns.com.')
tflags __RCVD_IN_IPREP nice net

header RCVD_IN_IPREPDNS_100 eval:check_rbl_sub('iprep-firsttrusted', '127.\d+.\d+.100')
describe RCVD_IN_IPREPDNS_100 Sender listed at http://www.chaosreigns.com/iprep/, 100% ham
tflags RCVD_IN_IPREPDNS_100 nice net

header RCVD_IN_IPREPDNS_50 eval:check_rbl_sub('iprep-firsttrusted', '127.\d+.\d+.50')
describe RCVD_IN_IPREPDNS_50 Sender listed at http://www.chaosreigns.com/iprep/, 50% ham
tflags RCVD_IN_IPREPDNS_50 nice net

header RCVD_IN_IPREPDNS_0 eval:check_rbl_sub('iprep-firsttrusted', '127.\d+.\d+.0')
describe RCVD_IN_IPREPDNS_0 Sender listed at http://www.chaosreigns.com/iprep/, 0% ham
tflags RCVD_IN_IPREPDNS_0 net

meta RCVD_NOT_IN_IPREPDNS ( ! RCVD_IN_IPREPDNS_100 && ! RCVD_IN_IPREPDNS_50 && ! RCVD_IN_IPREPDNS_0 && ! NO_RELAYS )
describe RCVD_NOT_IN_IPREPDNS Sender not listed at http://www.chaosreigns.com/iprep/
tflags RCVD_NOT_IN_IPREPDNS net

score RCVD_IN_IPREPDNS_100 -0.1
score RCVD_IN_IPREPDNS_50 -0.0001
score RCVD_IN_IPREPDNS_0 0.1
score RCVD_NOT_IN_IPREPDNS 0.0001
endif

For people not contributing data, this is not likely to be useful yet.

Out of the 86,899 IPs I have data for, all but 38 are either 100% spam or 100% ham, so an IP's history is a great predictor of what the next email from a known IP will be.
This is why blacklists and whitelists, including spamassassin's AWL (which is another combination of both), are nothing new.

The advantages I'm providing over SA's AWL are:
1) It's based on human-verified ham and spam, not SA's previous opinions of emails.
2) Shared knowledge from other people's email.

What I hope to be an advantage over dnswl.org, which I've been involved in, is increased automation.

Here's a test I ran using only the last 500 of my own emails, all hand-categorized as spam or ham and sorted by received date. One by one it learns each IP as a ham source, spammer, or mix, and using what it has learned so far, guesses what the next email is. Every 100 emails it reports its success rate for the last 100 emails:

$ ./progress.pl
Rank 100, hit 51.7647058823529% of ham, hit 0% of spam.
Rank 50, hit 0% of ham, hit 0% of spam.
Rank 0, hit 0% of ham, hit 0% of spam.
Rank none, hit 48.2352941176471% of ham, hit 100% of spam.

Rank 100, hit 76% of ham, hit 0% of spam.
Rank 50, hit 0% of ham, hit 0% of spam.
Rank 0, hit 0% of ham, hit 28% of spam.
Rank none, hit 24% of ham, hit 72% of spam.

Rank 100, hit 72.3684210526316% of ham, hit 0% of spam.
Rank 50, hit 0% of ham, hit 0% of spam.
Rank 0, hit 0% of ham, hit 4.17% of spam.
Rank none, hit 27.6315789473684% of ham, hit 95.8% of spam.

Rank 100, hit 79.4520547945205% of ham, hit 0% of spam.
Rank 50, hit 0% of ham, hit 0% of spam.
Rank 0, hit 0% of ham, hit 48.1481481481481% of spam.
Rank none, hit 20.5479452054795% of ham, hit 51.8518518518519% of spam.

Rank 100, hit 79.2682926829268% of ham, hit 0% of spam.
Rank 50, hit 0% of ham, hit 0% of spam.
Rank 0, hit 0% of ham, hit 27.8% of spam.
Rank none, hit 20.7317073170732% of ham, hit 72.2% of spam.

So after 400 emails, RCVD_IN_IPREPDNS_100 is hitting 79% of ham and no spam. I don't think anything else spamassassin uses can do this well.

But I have data from 184,335 emails.
Using all that data, results for the last 10,000 emails were:

Rank 100, hit 94.1176470588235% of ham, hit 0.0101553772722657% of spam.
Rank 50, hit 1.30718954248366% of ham, hit 0.0101553772722657% of spam.
Rank 0, hit 0% of ham, hit 64.2022951152635% of spam.
Rank none, hit 4.57516339869281% of ham, hit 35.7773941301919% of spam.

RCVD_IN_IPREPDNS_100 hits 94% of ham and 0.01% of spam.
RCVD_IN_IPREPDNS_0 hits 64% of spam and no ham.

Again, I don't think anything else spamassassin uses can do this well. But results this good can only be expected for people contributing data, at least until we get more people contributing data.

-- 
The price of freedom is the willingness to do sudden battle, anywhere, at any time, and with utter recklessness. - Robert A. Heinlein
http://www.ChaosReigns.com
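The learn-then-guess test described in this message can be sketched roughly as follows. progress.pl itself is not shown in the thread, so this is only an illustration under stated assumptions: the function names (rank, summarize, progress) are hypothetical, and treating any IP with a mixed history as "Rank 50" is a guess at how the buckets are assigned.

```python
from collections import defaultdict

RANKS = (100, 50, 0, None)  # None corresponds to "Rank none" (IP never seen)

def rank(ham, spam):
    """Bucket an IP's history so far into 100, 50, 0, or None."""
    if ham + spam == 0:
        return None
    if spam == 0:
        return 100
    if ham == 0:
        return 0
    return 50  # assumption: every mixed history lands in the middle bucket

def summarize(hits):
    """Turn per-rank [ham, spam] hit counts into percentages of all ham/spam."""
    total_ham = sum(h for h, _ in hits.values())
    total_spam = sum(s for _, s in hits.values())
    report = {}
    for r in RANKS:
        h, s = hits[r] if r in hits else (0, 0)
        report[r] = (
            100.0 * h / total_ham if total_ham else 0.0,
            100.0 * s / total_spam if total_spam else 0.0,
        )
    return report

def progress(emails, window=100):
    """emails: (ip, is_ham) pairs in arrival order; one report per window."""
    counts = defaultdict(lambda: [0, 0])  # ip -> [ham seen, spam seen]
    hits = defaultdict(lambda: [0, 0])    # rank -> [ham hits, spam hits]
    reports = []
    for seen, (ip, is_ham) in enumerate(emails, start=1):
        column = 0 if is_ham else 1
        hits[rank(*counts[ip])][column] += 1  # guess first, from history only
        counts[ip][column] += 1               # then learn from this email
        if seen % window == 0:
            reports.append(summarize(hits))
            hits = defaultdict(lambda: [0, 0])
    return reports
```

The point of guessing before learning is that each email is scored only with what was known before it arrived, which is why the first window is dominated by "Rank none".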
Re: New DNS white/blacklist + spamassassin rules Re: Please report IPs delivering ham and spam with this script
> eval:check_rbl_sub('iprep-firsttrusted', '127.\d+.\d+.100')
> describe

Do not forget to backslash-quote dots in a regular expression if you mean a literal dot instead of 'any character'.

Mark
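A small illustration of why the unescaped dots matter. This is a sketch only: Python's re.search stands in for however check_rbl_sub applies the pattern (an assumption), the DNS answer "127.0.100.5" is made up for the example, and the stricter pattern is simply the same rule with escaped dots and anchors added.

```python
import re

LOOSE_0 = r"127.\d+.\d+.0"        # the 0% ham pattern as posted
STRICT_0 = r"^127\.\d+\.\d+\.0$"  # dots escaped, both ends anchored

def matches(pattern, answer):
    """True if the pattern matches anywhere in the DNS answer string."""
    return re.search(pattern, answer) is not None

# Hypothetical answer whose last octet is 5, not 0.
answer = "127.0.100.5"

print(matches(LOOSE_0, answer))        # True: '.' matched a digit inside "100"
print(matches(STRICT_0, answer))       # False
print(matches(STRICT_0, "127.0.3.0"))  # True
```

With the loose pattern, the third `\d+` takes only the "1" of "100", the following `.` swallows a "0", and the final literal 0 matches the remaining "0", so the rule fires on an answer that does not end in .0 at all.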
Re: New DNS white/blacklist + spamassassin rules Re: Please report IPs delivering ham and spam with this script
On 4/1/11 2:34 PM, dar...@chaosreigns.com wrote:
> header RCVD_IN_IPREPDNS_0 eval:check_rbl_sub('iprep-firsttrusted', '127.\d+.\d+.0')
> describe RCVD_IN_IPREPDNS_0 Sender listed at http://www.chaosreigns.com/iprep/, 0% ham
> tflags RCVD_IN_IPREPDNS_0 net

This might actually need a quantity qualifier. (If this IP is 0% ham, does that actually mean it is 100% spam?) Or does that mean that I (so far) only saw one email hit it, and it was spam?

Other than this marking 'spam rates' while DCC commercial does the same thing for 'bulk' rates, what is the difference between this and DCC?

Note: DCC uses (for large installs) a local VLDB that they 'sync' ('flood', they call it) in real time. It not only tells you the bulk rate of the sender's IP, but also the 'bulk hit rate' for the email you just got. Sounds similar, but bulk vs. spam (and its inverse: you collect percentages of ham, they collect percentages of bulk).

Maybe the 2nd or 3rd octet could contain a 'confidence factor', e.g. some sliding scale of how many actual emails you have seen?

-- 
Michael Scheidell, CTO
SECNAP Network Security Corporation
Re: New DNS white/blacklist + spamassassin rules Re: Please report IPs delivering ham and spam with this script
On 04/01, Mark Martinec wrote:
> > eval:check_rbl_sub('iprep-firsttrusted', '127.\d+.\d+.100')
> > describe
>
> Do not forget to backslash-quote dots in a regular expression if you mean a literal dot instead of 'any character'.

Eep. That was copied from existing rules. I believe you're right, and there are a bunch of rules that need more escaping. Thanks.

-- 
Will I ever learn? I hope not, I'm having too much fun. - Brent Minime Avis, motorcycle.com
http://www.ChaosReigns.com
Re: New DNS white/blacklist + spamassassin rules Re: Please report IPs delivering ham and spam with this script
On 04/01, Michael Scheidell wrote:
> On 4/1/11 2:34 PM, dar...@chaosreigns.com wrote:
> > header RCVD_IN_IPREPDNS_0 eval:check_rbl_sub('iprep-firsttrusted', '127.\d+.\d+.0')
> > describe RCVD_IN_IPREPDNS_0 Sender listed at http://www.chaosreigns.com/iprep/, 0% ham
> > tflags RCVD_IN_IPREPDNS_0 net
>
> might actually need a quantity qualifier. (if this ip is 0 % ham... does that actually mean it is 100% spam?) or does that mean that I (so far) only saw one email hit it, and it is spam?

It means that all of the email seen from that IP so far has been spam, which may only have been one email.

> other than this is marking 'spam rates' and DCC commercial does the same thing for 'bulk' rates, what is the difference between this and DCC?

The commercial part.

> maybe 2nd or 3rd octet could contain 'confidence factor'.. eg:

It does, actually: a logarithm of the count of emails seen from that IP (newer emails weighted more than old emails, and scaled up so small old counts are greater than 0). I haven't studied the data enough to figure out what threshold is best for what, and I don't think the existing rule definition language provides a good way to specify a range. Also, ignoring it is working quite well.

-- 
I refuse to tip toe through life only to arrive safely at death.
http://www.ChaosReigns.com
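Outside the rule language, a range check on the confidence value is easy to express. The sketch below is illustrative only: the thread says the last octet carries the ham percentage and that one of the middle octets carries the confidence, but not which one, so the default of the third octet (index 2) is an assumption, as are the function names and the threshold values.

```python
import ipaddress

def parse_answer(answer, confidence_octet=2):
    """Split a 127.x.y.z DNS answer into (ham_percent, confidence).

    confidence_octet is a zero-based index; 2 (the third octet) is a guess.
    """
    octets = ipaddress.IPv4Address(answer).packed  # four bytes, indexable as ints
    if octets[0] != 127:
        raise ValueError("not a 127.0.0.0/8 reputation answer: %s" % answer)
    return octets[3], octets[confidence_octet]

def confident_ham(answer, min_confidence, confidence_octet=2):
    """True only for 100% ham answers at or above the chosen confidence."""
    ham_percent, confidence = parse_answer(answer, confidence_octet)
    return ham_percent == 100 and confidence >= min_confidence
```

For example, confident_ham("127.0.7.100", 5) accepts the listing while confident_ham("127.0.2.100", 5) rejects it; which threshold is actually useful would have to come from the data.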
Re: New DNS white/blacklist + spamassassin rules Re: Please report IPs delivering ham and spam with this script
> > Do not forget to backslash-quote dots in a regular expression if you mean a literal dot instead of 'any character'.
>
> Eep. That was copied from existing rules. I believe you're right, and there are a bunch of rules that need more escaping. Thanks.

True, there are a bunch of rules that need more escaping. It is noted somewhere in the bug tracker (but not as a standalone ticket), and needs a volunteer to do the cleaning :)

Mark
Re: New DNS white/blacklist + spamassassin rules Re: Please report IPs delivering ham and spam with this script
On 04/01, Mark Martinec wrote:
> > eval:check_rbl_sub('iprep-firsttrusted', '127.\d+.\d+.100')
> > describe
>
> Do not forget to backslash-quote dots in a regular expression if you mean a literal dot instead of 'any character'.

Updated rules (thanks again):

ifplugin Mail::SpamAssassin::Plugin::DNSEval
header __RCVD_IN_IPREPDNS eval:check_rbl('iprep-firsttrusted', 'iprep.chaosreigns.com.')
tflags __RCVD_IN_IPREPDNS nice net

header RCVD_IN_IPREPDNS_100 eval:check_rbl_sub('iprep-firsttrusted', '^127\.\d+\.\d+\.100$')
describe RCVD_IN_IPREPDNS_100 Sender listed at http://www.chaosreigns.com/iprep/, 100% ham
tflags RCVD_IN_IPREPDNS_100 nice net

header RCVD_IN_IPREPDNS_50 eval:check_rbl_sub('iprep-firsttrusted', '^127\.\d+\.\d+\.50$')
describe RCVD_IN_IPREPDNS_50 Sender listed at http://www.chaosreigns.com/iprep/, 50% ham
tflags RCVD_IN_IPREPDNS_50 nice net

header RCVD_IN_IPREPDNS_0 eval:check_rbl_sub('iprep-firsttrusted', '^127\.\d+\.\d+\.0$')
describe RCVD_IN_IPREPDNS_0 Sender listed at http://www.chaosreigns.com/iprep/, 0% ham
tflags RCVD_IN_IPREPDNS_0 net

meta RCVD_NOT_IN_IPREPDNS ( ! RCVD_IN_IPREPDNS_100 && ! RCVD_IN_IPREPDNS_50 && ! RCVD_IN_IPREPDNS_0 && ! NO_RELAYS )
describe RCVD_NOT_IN_IPREPDNS Sender not listed at http://www.chaosreigns.com/iprep/
tflags RCVD_NOT_IN_IPREPDNS net

score RCVD_IN_IPREPDNS_100 -0.1
score RCVD_IN_IPREPDNS_50 -0.0001
score RCVD_IN_IPREPDNS_0 0.1
score RCVD_NOT_IN_IPREPDNS 0.0001
endif

-- 
Go forth, and be excellent to one another. - http://www.jhuger.com/fredski.php
http://www.ChaosReigns.com
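To try the list by hand, it may help to see what these rules amount to. The sketch below assumes the zone follows the usual DNSBL convention of reversed IPv4 octets prepended to the zone name, and it ignores the NO_RELAYS term of the meta rule. The function names are hypothetical, and no DNS query is performed; the answer string is passed in.

```python
import ipaddress
import re

ZONE = "iprep.chaosreigns.com."

# Sub-rule patterns copied from the updated rules in this message.
RULES = [
    ("RCVD_IN_IPREPDNS_100", r"^127\.\d+\.\d+\.100$"),
    ("RCVD_IN_IPREPDNS_50", r"^127\.\d+\.\d+\.50$"),
    ("RCVD_IN_IPREPDNS_0", r"^127\.\d+\.\d+\.0$"),
]

def query_name(ip, zone=ZONE):
    """Build the DNSBL-style lookup name: reversed octets, then the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone

def rule_for(answer):
    """Map a DNS A-record answer (or None for no record) to the rule that fires."""
    if answer is not None:
        for name, pattern in RULES:
            if re.search(pattern, answer):
                return name
    # Mirrors the meta rule: none of the three sub-rules matched.
    return "RCVD_NOT_IN_IPREPDNS"

print(query_name("192.0.2.1"))  # 1.2.0.192.iprep.chaosreigns.com.
print(rule_for("127.0.3.100"))  # RCVD_IN_IPREPDNS_100
print(rule_for(None))           # RCVD_NOT_IN_IPREPDNS
```

Note that an answer ending in some other percentage, such as .73, matches none of the three patterns and so falls through to RCVD_NOT_IN_IPREPDNS, just as the meta rule is written.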