Re: add_header all Date of Scan _DATE_
On 09.06.14 05:49, Karsten Bräckelmann wrote:
> Found the culprit after some digging. Bug 6915 [1], revision 1453407.
>
> As a band-aid, the following trivial one-line patch fixes it. Can easily be applied manually.

Can that by any chance fix the problem with Date: in mail received over SSL? That one behaves similarly...
http://mail-archives.apache.org/mod_mbox/spamassassin-users/201401.mbox/20140131144406.GA28818%40fantomas.sk

> Since it is kind of way past getting late here, and there may be other Template Tags affected, I'll defer proper bug handling and committing code changes for tomorrow.
>
> --- lib/Mail/SpamAssassin/Util.pm (revision 1601300)
> +++ lib/Mail/SpamAssassin/Util.pm (working copy)
> @@ -582,6 +582,7 @@
>  }
>
>  sub time_to_rfc822_date {
> +  my $pms = shift;
>    my($time) = @_;
>
>    my @days = qw/Sun Mon Tue Wed Thu Fri Sat/;

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Despite the cost of living, have you noticed how popular it remains?
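The patch adds `my $pms = shift;`, which suggests the caller now passes an extra object as the first argument, so the unpatched `my($time) = @_;` would pick up that object instead of the timestamp. In Perl a reference used as a number evaluates to its memory address, which would explain a bogus date such as 1976 (an inference, not something confirmed in the thread). The Python sketch below is a hypothetical analogue, not SpamAssassin code: it formats epoch seconds as an RFC 822 style date in UTC and accepts the extra leading argument explicitly, the way the patched Perl does. The `pms` parameter is a stand-in and is ignored.

```python
import time

# Python's tm_wday counts from Monday == 0; Perl's localtime counts from
# Sunday == 0, which is why Util.pm lists the days as Sun..Sat.
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def time_to_rfc822_date(pms, epoch_seconds=None):
    """Hypothetical analogue of the patched Perl sub.

    `pms` stands in for the object the caller passes first; it is taken
    off the argument list and ignored, mirroring `my $pms = shift;`.
    The result is always in UTC (+0000) to keep the example deterministic.
    """
    if epoch_seconds is None:
        epoch_seconds = time.time()
    t = time.gmtime(epoch_seconds)
    return "%s, %02d %s %04d %02d:%02d:%02d +0000" % (
        DAYS[t.tm_wday], t.tm_mday, MONTHS[t.tm_mon - 1], t.tm_year,
        t.tm_hour, t.tm_min, t.tm_sec)
```

For example, `time_to_rfc822_date(None, 193777048)` returns `Sat, 21 Feb 1976 18:57:28 +0000`, which is the UTC form of the bogus `Sat, 21 Feb 1976 13:57:28 -0500` reported in this thread.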
Spam score range and distribution statistics?
As far as I found out, SpamAssassin calculates the spam score and puts the value into the email header. What is the maximum range of the score? -10..+10 or something else?

Is there a statistic for an average email account of how many emails get which score? In other words, is there something like a graph of a Gaussian distribution?

Ben
Re: Forged yahoo and mass mailers
> I have a few messages that have been incorrectly tagged because the sender used their yahoo address as the sender, but used a mass mailer (contactbeacon.com) to send their newsletter for them. Apparently this is enough for it to hit FORGED_YAHOO_RCVD and L_UNVERIFIED_YAHOO, causing it to be marked as spam.
>
> Is there something I'm missing, or is there a better way to do this to avoid the FPs in the future?

The problem probably has something to do with Yahoo! (and AOL) publishing strict DMARC records. So anything From: a @yahoo.com (or @aol.com) address that isn't coming from a Yahoo! (or AOL) mail server is required to be blocked according to DMARC.

The mass mailer needs to change the From: address to something @contactbeacon.com and use Reply-To: for the address they want replies to go to.

Certainly anything sent From: a @yahoo.com address but from a contactbeacon.com server will be rejected by mail systems that implement DMARC checking, such as Yahoo!, AOL, and more.

Anthony
-- 
www.fonant.com - Quality web sites
Tel. 01903 867 810
Fonant Ltd is registered in England and Wales, company No. 7006596
Registered office: Amelia House, Crescent Road, Worthing, West Sussex, BN11 1QR
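The mismatch Anthony describes is a DMARC alignment failure: the domain in From: has to match a domain authenticated by SPF or DKIM. A minimal Python sketch of relaxed alignment, assuming the organizational domain can be approximated by the last two labels (a simplification; real DMARC implementations use the Public Suffix List). `org_domain` and `is_aligned` are hypothetical helpers, and `mail.contactbeacon.com` is an illustrative host name.

```python
def org_domain(domain):
    """Approximate the organizational domain as the last two labels.

    Simplification: real DMARC uses the Public Suffix List, so names
    such as example.co.uk would need three labels.
    """
    labels = domain.lower().rstrip(".").split(".")
    return ".".join(labels[-2:])


def is_aligned(from_domain, authenticated_domain):
    """Relaxed alignment: the organizational domains must match."""
    return org_domain(from_domain) == org_domain(authenticated_domain)


# From: a yahoo.com address, but SPF/DKIM pass for the mass mailer's host.
yahoo_case = is_aligned("yahoo.com", "mail.contactbeacon.com")
# After the mailer puts its own domain in From: and uses Reply-To: instead.
fixed_case = is_aligned("news.contactbeacon.com", "contactbeacon.com")
```

With these inputs `yahoo_case` is `False` and `fixed_case` is `True`, matching the advice to move the Yahoo! address into Reply-To:.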
Re: Spam score range and distribution statistics?
On 09.06.14 09:47, Ben Stover wrote:
> As far as I found out SpamAssassin calculates the spam score and puts the value into the email header. What is the maximum range of the score? -10..+10

I don't think it has limits. Maybe just the limits of the integer type.
Re: Spam score range and distribution statistics?
On Monday 09 June 2014 at 09:50, Matus UHLAR - fantomas wrote:
> On 09.06.14 09:47, Ben Stover wrote:
>> As far as I found out SpamAssassin calculates the spam score and puts the value into the email header. What is the maximum range of the score? -10..+10
>
> I don't think it has limits. Maybe just the limits of the integer type.

http://spamassassin.apache.org/gtube for example has a default score of 1000.

Antony.

-- 
In fact I wanted to be John Cleese and it took me some time to realise that the job was already taken.
 - Douglas Adams

Please reply to the list; please don't CC me.
Re: add_header all Date of Scan _DATE_
On Mon, 2014-06-09 at 05:49 +0200, Karsten Bräckelmann wrote:
> On Sun, 2014-06-08 at 20:56 -0500, Chris wrote:
>> In my etc/mail/spamassassin/local.cf I have the above line. I just
>
> For completeness: That add_header option does work, although there are actually exactly 3 arguments.
>
>   add_header { spam | ham | all } header_name string
>
> Just like stock configuration shows, the string argument should be enclosed by double quotes.
>
>   add_header all Date of Scan _DATE_
>
>> upgraded to 3.4.0 today and I notice that the 'date of scan' is showing something like this:
>
> Sic, it's the (X-Spam-) Date header, not Date of Scan header. ;)
>
>> X-spam-date: of Scan Sat, 21 Feb 1976 13:57:28 -0500
>>
>> Does this add header line not work anymore? Previous to the upgrade it was working correctly:
>>
>> X-spam-date: of Scan Sun, 08 Jun 2014 12:35:11 -0500
>
> Interesting. Unrelated to the number of arguments, though...
>
> Found the culprit after some digging. Bug 6915 [1], revision 1453407.
>
> As a band-aid, the following trivial one-line patch fixes it. Can easily be applied manually.
>
> Since it is kind of way past getting late here, and there may be other Template Tags affected, I'll defer proper bug handling and committing code changes for tomorrow.
>
> --- lib/Mail/SpamAssassin/Util.pm (revision 1601300)
> +++ lib/Mail/SpamAssassin/Util.pm (working copy)
> @@ -582,6 +582,7 @@
>  }
>
>  sub time_to_rfc822_date {
> +  my $pms = shift;
>    my($time) = @_;
>
>    my @days = qw/Sun Mon Tue Wed Thu Fri Sat/;
>
> [1] https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6915

Thanks Karsten, that did the trick. Much appreciated.

Chris
-- 
Chris KeyID 0xE372A7DA98E6705C
31.11°N 97.89°W (Elev. 1092 ft)
08:31:32 up 6 days, 17:01, 2 users, load average: 0.28, 0.30, 0.26
Mandriva Linux 2010.2, kernel 2.6.33.7-desktop586-2mnb
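Karsten's point about the argument count can be illustrated with a rough model of how an add_header line splits into its three arguments. This Python sketch is an assumption, not SpamAssassin's actual config parser: it takes the first two whitespace-separated tokens as the message class and header name, treats the rest as the template string, strips one pair of enclosing double quotes, and adds the `X-Spam-` prefix that SpamAssassin puts on added headers. `parse_add_header` is a hypothetical name.

```python
def parse_add_header(line):
    """Rough, hypothetical model of splitting an add_header line.

    Returns (message_class, full_header_name, template_string).
    """
    keyword, rest = line.strip().split(None, 1)
    if keyword != "add_header":
        raise ValueError("not an add_header line")
    # maxsplit=2 keeps any spaces inside the template string intact.
    msg_class, header_name, template = rest.split(None, 2)
    if msg_class not in ("spam", "ham", "all"):
        raise ValueError("first argument must be spam, ham or all")
    # Strip one pair of enclosing double quotes, as used in the stock config.
    if len(template) >= 2 and template[0] == template[-1] == '"':
        template = template[1:-1]
    return msg_class, "X-Spam-" + header_name, template
```

Under this model, `parse_add_header("add_header all Date of Scan _DATE_")` yields `('all', 'X-Spam-Date', 'of Scan _DATE_')`: "Date" is the header name and "of Scan" becomes literal text in the header value, which is exactly the `X-spam-date: of Scan ...` output quoted above.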
Re: Viagra spam not caught
On 06/07/2014 03:55 PM, Matus UHLAR - fantomas wrote:
> On 06.06.14 18:06, Daniele Paoni wrote:
>> I deleted the bayes database and trained it using real spam/ham
>
> I would not clear the BAYES DB so fast. Even BAYES_00 spam can become BAYES_99 after a few properly trained samples.

OK, I will keep it in mind for the next time :-)

>> Today I got another one of these emails. The strange thing is that if I scan it with spamassassin manually, TO_NO_BRKTS_MSFT is triggered, but it is not triggered on the original mail scanned with postfix + amavisd-new.
>
> did you reload amavis after spamassassin rule updates?

Yes, I have also rebooted the server for a kernel upgrade, so it was definitely restarted.
Re: Can't keep up with spam from SolarVPS sites
On 6/7/2014 3:31 AM, David B Funk wrote:
> This does require some baby-sitting as it will get traffic that is the result of a real human fat-fingering a legit recipient.

Perhaps just use subdomains then? Such as venusflyt...@invalid.uiowa.edu, to eliminate the risk of legit fat-fingered email.

Regards,
KAM
Re: Forged yahoo and mass mailers
On 6/8/2014 10:49 PM, Alex wrote:
> I have a few messages that have been incorrectly tagged because the sender used their yahoo address as the sender, but used a mass mailer (contactbeacon.com) to send their newsletter for them. Apparently this is enough for it to hit FORGED_YAHOO_RCVD and L_UNVERIFIED_YAHOO, causing it to be marked as spam.
>
> Is there something I'm missing, or is there a better way to do this to avoid the FPs in the future?

People with Yahoo! accounts (and AOL), and any other senders whose domain publishes a DMARC policy of reject/quarantine, need to either A) use a mailing list sender that has modified its process for DMARC or B) not use those accounts.

See http://www.pcworld.com/article/2141120/yahoo-email-antispoofing-policy-breaks-mailing-lists.html

Regards,
KAM
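Whether a sender is affected depends on the `p=` tag of the DMARC TXT record the domain publishes. A minimal Python sketch of splitting such a record into tags and checking for a reject/quarantine policy; the record string is illustrative (not a copy of Yahoo!'s actual record), fetching it from DNS is left out, and tags such as `pct=` are deliberately ignored.

```python
def parse_dmarc(record):
    """Split a DMARC TXT record like 'v=DMARC1; p=reject' into a tag dict."""
    tags = {}
    for part in record.split(";"):
        part = part.strip()
        if not part:
            continue
        key, _, value = part.partition("=")
        tags[key.strip().lower()] = value.strip()
    if tags.get("v") != "DMARC1":
        raise ValueError("not a DMARC record")
    return tags


def strict_policy(record):
    """True if unaligned mail should be rejected or quarantined.

    Ignores pct= and sp=, so it only approximates receiver behaviour.
    """
    return parse_dmarc(record).get("p", "none").lower() in ("reject", "quarantine")


example = "v=DMARC1; p=reject; pct=100; rua=mailto:dmarc@example.com"
```

For the example record, `strict_policy(example)` is `True`, while a monitoring-only record such as `v=DMARC1; p=none` returns `False`.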
Re: Spam score range and distribution statistics?
On 6/9/2014 3:47 AM, Ben Stover wrote:
> As far as I found out SpamAssassin calculates the spam score and puts the value into the email header. What is the maximum range of the score? -10..+10 or other?

There are no limits on the score. The higher the score, the more likely the email is spam; the lower the score, the more likely it is to be non-spam. Looking through the last month's worth of logs on my server, I see scores ranging from -98 to 101.

> Is there a statistic for an average email account of how many emails get which score? In other words, is there something like a graph of a Gaussian distribution?

That would be different on every server depending on what type of spam and ham you see and which rule sets you are running.

I graphed mine out of curiosity and it forms a reasonable bell curve from -14 to 40, peaking at about 9. Although there is an odd spike sticking up from -3 to 1 for some reason (and a rather large spike at 0). I'm not a statistics guy, so I can't give you all the distribution numbers -- and, as I said, it will likely differ a fair amount between installations.

Are you just looking for general information, or is there something you are trying to determine? If you tell us what you are looking for, we may be able to give you some better answers.

-- 
Bowie
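A graph like Bowie's can be approximated from your own logs. A minimal Python sketch, assuming log lines that contain a score/threshold pair such as `clean message (-1.2/5.0)` or `identified spam (15.3/5.0)`; check the exact format your spamd or amavisd-new version writes before trusting the regular expression. Scores are bucketed into bins of `bin_width` points, keyed by the lower edge of each bin.

```python
import math
import re
from collections import Counter

# Assumed log format: "... (SCORE/THRESHOLD) ...", e.g. "(15.3/5.0)".
SCORE_RE = re.compile(r"\((-?\d+(?:\.\d+)?)/(-?\d+(?:\.\d+)?)\)")


def score_histogram(lines, bin_width=1.0):
    """Count scores per bin; each key is the lower edge of its bin."""
    counts = Counter()
    for line in lines:
        m = SCORE_RE.search(line)
        if m:
            score = float(m.group(1))
            counts[math.floor(score / bin_width) * bin_width] += 1
    return counts


def print_histogram(counts, scale=1):
    """Crude text plot: one '#' per `scale` messages in each bin."""
    for edge in sorted(counts):
        print("%7.1f %s" % (edge, "#" * (counts[edge] // scale)))
```

Usage would be along the lines of opening your mail log and passing the file object to `score_histogram`, then handing the result to `print_histogram` with a `scale` large enough to keep the bars readable. Note that `math.floor` puts -1.2 into the -2.0 bin, so negative scores land one bin lower than simple truncation would place them.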
Re: Spam score range and distribution statistics?
On 6/9/2014 11:34 AM, Bowie Bailey wrote:
> I graphed mine out of curiosity and it forms a reasonable bell curve from -14 to 40, peaking at about 9. Although there is an odd spike sticking up from -3 to 1 for some reason (and a rather large spike at 0).

That spike around zero is going to be your typical boring ham. It passes SPF and some other minor ham rules, and hits very, very minor spam rules, if any.
RE: SPAM from a registrar
I have been tracking this for about 2 weeks now myself. Comparing against my list of new domains, DOB seems to pick them up once they are 2 days old.

I also tried to compare my list to fresh.spameatingmonkey.net, but none of my domains in the 0-5 days old range would get a match for com/net domains. I do get some hits for info and us, though. But it's normally com and a few us that are on my lists.

I am currently doing whois lookups for about 30 TLDs, and tracking their time and registrar. I do minimize the lookups.

I am currently seeing about 2 .asia, 2 .uk, and then around 100 .com (all the .com are ENOM) sending email to me with an age of 1 day. This is pretty consistent day to day.

> Have you looked into Day old bread?
> http://wiki.apache.org/spamassassin/Rules/URIBL_RHS_DOB
>
> ...Kevin
> -- 
> Kevin Miller
> Network/email Administrator, CBJ MIS Dept.
> 155 South Seward Street
> Juneau, Alaska 99801
> Phone: (907) 586-0242, Fax: (907) 586-4500
> Registered Linux User No: 307357
>
> -----Original Message-----
> From: James B. Byrne [mailto:byrn...@harte-lyne.ca]
> Sent: Wednesday, May 14, 2014 8:52 AM
> To: users@spamassassin.apache.org
> Subject: SPAM from a registrar
>
> This AM we received (and are continuing to receive) numerous spam messages from multiple domains that were all registered today (2014-05-14) with a company called enom, inc. This firm is also the registrar for the mail server domain BOSJAW.com that is sending some if not all of the UCEM. That server is hosted in CZ.
>
> It seems likely that this is a planned UCEM campaign designed to use disposable domains, probably registered with stolen credit cards or some other form of fraud, in order to escape blacklisting services. No doubt by tomorrow they will be abandoned.
>
> Is there any test to check how long a domain name has been in existence and set a spam score with that information?
>
> Along the same lines, is there any test to determine the country of origin of the IP address in the last hop before it connects to our servers?
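James's question (how long has a domain existed?) is what the whois tracking above answers. A minimal Python sketch of the parsing half, assuming whois output with a `Creation Date:` line carrying an ISO-8601 timestamp, which some registries print; field names and formats vary by TLD and registrar, the timestamp is treated as UTC, and performing the whois query itself is out of scope. `creation_time` and `age_in_days` are hypothetical names.

```python
import re
from datetime import datetime, timezone

# Assumed field name and format; other registries use e.g. "created:".
CREATION_RE = re.compile(
    r"^\s*Creation Date:\s*(\d{4}-\d{2}-\d{2})T(\d{2}:\d{2}:\d{2})",
    re.IGNORECASE | re.MULTILINE)


def creation_time(whois_text):
    """Return the creation time as an aware UTC datetime, or None."""
    m = CREATION_RE.search(whois_text)
    if not m:
        return None
    stamp = m.group(1) + " " + m.group(2)
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(
        tzinfo=timezone.utc)


def age_in_days(whois_text, now):
    """Domain age in (fractional) days at `now`, or None if unknown."""
    created = creation_time(whois_text)
    if created is None:
        return None
    return (now - created).total_seconds() / 86400.0
```

Passing `now` in explicitly keeps the function testable and lets a batch job compute all ages against one reference time.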
Re: Spam score range and distribution statistics?
On Mon, 2014-06-09 at 11:34 -0400, Bowie Bailey wrote:
>> In other words, is there something like a graph of a Gaussian distribution?
>
> That would be different on every server depending on what type of spam and ham you see and which rule sets you are running.
>
> I graphed mine out of curiosity and it forms a reasonable bell curve from -14 to 40, peaking at about 9. Although there is an odd spike sticking up from -3 to 1 for some reason (and a rather large spike at 0).

I don't think that second spike is odd. That's the majority of your ham.

Since the data set includes both spam and ham combined, two spikes are to be expected. A single bell curve would mean too many messages in the gray area, no clear distinction between ham and spam, and consequently lots of false positives and negatives.
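Karsten's expectation of two spikes (ham and spam) can be checked on binned counts, for example a dict mapping each bin's lower edge to a message count. A minimal Python sketch that reports local maxima; `local_peaks` and its `min_count` noise filter are hypothetical, and the bin edges must be exactly reachable by adding or subtracting `bin_width` (true for integer-valued 1-point bins).

```python
def local_peaks(counts, bin_width=1.0, min_count=1):
    """Return bin edges whose count is higher than their neighbours'.

    `counts` maps bin edge -> message count; a missing neighbouring bin
    counts as 0. Bins below `min_count` are ignored as noise.
    """
    peaks = []
    for edge in sorted(counts):
        c = counts[edge]
        left = counts.get(edge - bin_width, 0)
        right = counts.get(edge + bin_width, 0)
        # Strict ">" on the left and ">=" on the right reports a flat
        # plateau only once, at its leftmost bin.
        if c >= min_count and c > left and c >= right:
            peaks.append(edge)
    return peaks
```

On a well-separated installation one would expect one peak in the low (ham) range and one in the high (spam) range; a single peak in the middle would point to the gray-area problem Karsten describes.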
Re: SPAM from a registrar
On 6/9/2014 1:23 PM, Patrick Domack wrote:
> I have been tracking this for about 2 weeks now myself. Comparing against my list of new domains, DOB seems to pick them up once they are 2 days old.
>
> I am currently seeing about 2 .asia, 2 .uk, and then around 100 .com (all the .com are ENOM) sending email to me with an age of 1 day. This is pretty consistent day to day.

I wonder how we can use DNS, an RBL and distributed lookups to get the age of domains AND share the information so it's centrally available...

Regards,
KAM
Re: SPAM from a registrar
Quoting Kevin A. McGrail kmcgr...@pccc.com:
> On 6/9/2014 1:23 PM, Patrick Domack wrote:
>> Comparing against my list of new domains, DOB seems to pick them up once they are 2 days old.
>
> I wonder how we can use DNS, an RBL and distributed lookups to get the age of domains AND share the information so it's centrally available...

That could easily be done. The only issue is whether you trust the distributed lookups to have accurate information.

I suppose we could build in a trust system, where if enough distributed clients upload the same info, it could be trusted.

This could work out pretty well. Each DNS-RBL cluster could run with its own shared database, and you could cross-publish to other DNS-RBL clusters and set your own trust rating depending on how many copies you get, then either trust the info or do your own whois lookup for it.

The downside is, I wonder how fast these domains are hammered out, and whether the trust and replication would end up not mattering because of latency.
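Patrick's trust idea, accepting a reported creation date only once enough independent clients agree, might look like this. A minimal Python sketch of a hypothetical design: `ConsensusStore`, `quorum` and the reporter IDs are illustrative names, and a real system would also have to handle reporters that collude or lie, plus expiry of old reports.

```python
from collections import defaultdict


class ConsensusStore:
    """Accept a domain's reported creation date once `quorum` distinct
    reporters have submitted the same value (hypothetical design)."""

    def __init__(self, quorum=3):
        self.quorum = quorum
        # domain -> reported value -> set of reporter ids
        self._reports = defaultdict(lambda: defaultdict(set))

    def report(self, domain, created, reporter):
        """Record one client's claim; repeats from one reporter count once."""
        self._reports[domain.lower()][created].add(reporter)

    def trusted_value(self, domain):
        """Return the most-reported value if it meets the quorum, else None.

        Ties are broken by insertion order (max() keeps the first maximum).
        """
        votes = self._reports.get(domain.lower())
        if not votes:
            return None
        value, reporters = max(votes.items(), key=lambda kv: len(kv[1]))
        return value if len(reporters) >= self.quorum else None
```

Using a set of reporter IDs rather than a counter means one client resubmitting the same claim cannot reach the quorum alone; when `trusted_value` returns `None`, a participant could fall back to its own whois lookup, as Patrick suggests.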
Re: SPAM from a registrar
On Mon, 9 Jun 2014, Kevin A. McGrail wrote:
> On 6/9/2014 1:23 PM, Patrick Domack wrote:
>> Comparing my list of new domains, shows that DOB seems to pick them up after they are 2 days old.
>
> I wonder how we can use DNS, an RBL and distributed lookups to get the age of domains AND share the information so it's centrally available...

Perhaps we should cultivate contacts at a registrar so that the BL can be generated directly off their feed of changes? Perhaps somebody at DailyChanges.com or WhoisAPI.com? Though I agree getting the data for free will be challenging.

-- 
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
 Gun Control laws aren't enacted to control guns, they are enacted to control people: catholics (1500s), japanese peasants (1600s), blacks (1860s), italian immigrants (1911), armenians (1911), the irish (1920s), jews (1930s), blacks (1960s), the poor (always)
---
 739 days since the first successful private support mission to ISS (SpaceX)
Re: SPAM from a registrar
On 6/9/2014 2:24 PM, Patrick Domack wrote:
> That could easily be done. The only issue is whether you trust the distributed lookups to have accurate information.
>
> I suppose we could build in a trust system, where if enough distributed clients upload the same info, it could be trusted.

Thanks for weighing in. These are all issues we've solved with other RBLs via rsync of the data, and I want to keep the hurdle for implementation low, so you are right about the trust rating, etc.
Domain ages (was Re: SPAM from a registrar)
On Mon, 09 Jun 2014 14:24:19 -0400
Patrick Domack patric...@patrickdk.com wrote:

> That could easily be done. The only issue is whether you trust the distributed lookups to have accurate information.
> I suppose we could build in a trust system, where if enough distributed clients upload the same info, it could be trusted.

There's a company that offers a domain-age-like service: https://www.farsightsecurity.com/Services/NOD/

Their approach is interesting (they receive a huge volume of DNS traffic and keep track of domain lookups that are newly seen). Unfortunately, their price for practical volumes of lookups is ridiculously high, which has prevented us from pursuing this any further.

Regards,

David.
Re: SPAM from a registrar
On 6/9/2014 2:33 PM, John Hardin wrote:
> Perhaps we should cultivate contacts at a registrar so that the BL can be generated directly off their feed of changes? Perhaps somebody at DailyChanges.com or WhoisAPI.com? Though I agree getting the data for free will be challenging.

Good idea. If we can get existing data from trustable sources such as registries, we can add that to the source RBL and then only query the new ones.
Re: Domain ages (was Re: SPAM from a registrar)
On 6/9/2014 2:38 PM, David F. Skoll wrote:
> There's a company that offers a domain-age-like service: https://www.farsightsecurity.com/Services/NOD/
>
> Their approach is interesting (they receive a huge volume of DNS traffic and keep track of domain lookups that are newly seen). Unfortunately, their price for practical volumes of lookups is ridiculously high, which has prevented us from pursuing this any further.

I think the core issue is that the age of a domain is a good indicator of spam. So there is merit in building a distributed look-up system using SA.

I have more ideas than resources, of course...
Re: SPAM from a registrar
On Mon, 9 Jun 2014, Kevin A. McGrail wrote:
> On 6/9/2014 2:33 PM, John Hardin wrote:
>> Perhaps we should cultivate contacts at a registrar so that the BL can be generated directly off their feed of changes?
>
> Good idea. If we can get existing data from trustable sources such as registries, we can add that to the source RBL and then only query the new ones.

I was referring to a feed of the new ones. Inferring that is the difficult part; I was hoping there was some way to avoid the inference part.
Re: Domain ages (was Re: SPAM from a registrar)
On Mon, 9 Jun 2014, Kevin A. McGrail wrote:
> So there is merit in building a distributed look-up system using SA.

Distributed lookup of *what*, though? Can you clarify that part of your idea? Are you referring to distributed whois queries for a domain name, to determine its age?
Re: SPAM from a registrar
Quoting Kevin A. McGrail kmcgr...@pccc.com:
> On 6/9/2014 2:24 PM, Patrick Domack wrote:
>> I suppose we could build in a trust system, where if enough distributed clients upload the same info, it could be trusted.
>
> Thanks for weighing in. These are all issues we've solved with other RBLs via rsync of the data, and I want to keep the hurdle for implementation low, so you are right about the trust rating, etc.

Well, rsync works, but you need a source. If the source were a feed from the TLDs themselves, that would work just fine.

What I'm more worried about here is making sure new domains are noticed. At least in what I have seen, 1-day-old domains send a lot more spam than 2-3-day-old ones. So the new, unknown domains are going to be the most important ones to look up.
Re: SPAM from a registrar
On Mon, Jun 9, 2014 at 2:39 PM, Kevin A. McGrail kmcgr...@pccc.com wrote:
> On 6/9/2014 2:33 PM, John Hardin wrote:
>> Perhaps we should cultivate contacts at a registrar so that the BL can be generated directly off their feed of changes?
>
> Good idea. If we can get existing data from trustable sources such as registries, we can add that to the source RBL and then only query the new ones.

I haven't been following this whole thread. I always thought it odd to look for new domains. I tend to think that everything is new unless it's been seen before (and there's a bunch of data out there on existing domains).

-Jim P.
Re: SPAM from a registrar
On 06/09/2014 08:39 PM, Kevin A. McGrail wrote:
> On 6/9/2014 2:33 PM, John Hardin wrote:
>> Perhaps we should cultivate contacts at a registrar so that the BL can be generated directly off their feed of changes?
>
> Good idea. If we can get existing data from trustable sources such as registries, we can add that to the source RBL and then only query the new ones.

WHOIS age data is a good indicator with a handful of TLDs, but only in combination with their registrars and NS. Even a low score on age alone will cause lots of surprises.

What you want is something like the reputation data URIBL publishes via its datafeeds (http://www.uribl.com/datasets.shtml, domain_data.txt). And then you come across zones such as .us, which is slow in updating zone data.
Re: Domain ages (was Re: SPAM from a registrar)
Domain age is a good metric to factor in. But I'm always fascinated with some people's desire to block all messages with extremely new domains. (NOT saying that this applies to everyone who posted on this thread!)

Keep in mind that many large and famous businesses... who have fairly good mail sending practices... sometimes launch new products complete with links to very newly registered domains. The same is often true for advertisements for things like rock concerts, etc., or web sites that deal with specific events or hot-topic political issues that appeared out of nowhere. Yes, some of these are UBE. But many are NOT!

These examples are one of the largest sources of FPs for all the major domain/URI blacklists. But the better domain/URI blacklists have good mechanisms in place to (a) PREVENT... MANY of these from ever becoming FPs in the first place, and (b) where those mechanisms fail, they have good triggers/feedback to remove/whitelist such FPs VERY QUICKLY if/when they do occur.

In contrast, many who might go overboard by outright blocking on newness... and/or scoring too aggressively on newness... may find too-high FP problems kicking their butts in the long run. And when such an FP starts happening, they may not have the proper telemetry to catch/fix it until AFTER much FP damage has happened.

Personally, I think that the real problem here is that some of the most famous URI/domain blacklists are NOT catching everything and/or NOT catching everything fast enough... combined with many sysadmins failing to make use of ALL the good and low-FP URI/domain blacklists... where they'd see MUCH better results if they were using ALL of the good URI blacklists!

...but I'm a little biased on this point! :)

-- 
Rob McEwen
+1 (478) 475-9032
Re: Domain ages (was Re: SPAM from a registrar)
On Mon, 9 Jun 2014 11:51:21 -0700 (PDT) John Hardin jhar...@impsec.org wrote: So there is merit in building a distributed look-up system using SA. Distributed lookup of *what*, though? Can you clarify that part of your idea? Are you referring to distributed whois queries for a domain name, to determine its age? Well, here's how it could be done. Imagine someone runs a DNS zone for newdomain.example.net. You want to see if example.org is a new domain, so you look up a TXT record for example.org.newdomain.example.net. The DNS software that serves the zone newdomain.example.net runs the following pseudo-code when example.org is looked up:

    IF example.org is in my database THEN
        return the TXT record associated with example.org
        update the last-looked-up time for example.org
    ELSE
        generate a TXT record of the form MMDDHHMMSS corresponding to current time (UTC)
        insert it in the database
        return it
    ENDIF

A background job will periodically clean out domains that haven't been queried in a long time. The clever part is that once lots of sites begin using this in their SA setups, we'll very quickly build up quite an accurate database of newly-seen domains that's completely independent of any registrar for a data source. Yes, spammers can poison it by specifically looking up a domain, waiting a couple of days, and then spamming. But I think most won't bother (witness how effective greylisting still is.) Furthermore, you can ignore all but the first few hundred lookups before you enter the TXT record in the database; this will make it more expensive for spammers to poison the data. Or you could not enter a record in the database until it has been looked up from 100 different IP addresses... I can think of a few other countermeasures. So who's volunteering to do this? :) Regards, David.
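David's pseudo-code can be sketched in Python roughly as below. This is a minimal in-memory illustration, not a DNS server: the `FirstSeenRegistry` class, the `domain_from_qname` helper and the 90-day expiry are assumptions made for the example, and the layer that would wrap the returned string in a TXT answer is left out. Note that `lookup` refreshes the last-looked-up time on every query, which is what lets the background purge keep active domains.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical zone name taken from David's example.
ZONE = "newdomain.example.net"


def domain_from_qname(qname, zone=ZONE):
    """Strip the zone suffix: 'example.org.newdomain.example.net.' -> 'example.org'."""
    qname = qname.lower().rstrip(".")
    suffix = "." + zone
    if not qname.endswith(suffix):
        raise ValueError(f"{qname!r} is not under {zone}")
    return qname[: -len(suffix)]


class FirstSeenRegistry:
    """In-memory stand-in for the responder's database (an assumption)."""

    def __init__(self, expire_after=timedelta(days=90)):
        # domain -> (first-seen stamp as MMDDHHMMSS, last-looked-up time)
        self._db = {}
        self.expire_after = expire_after

    def lookup(self, domain, now):
        """Return the TXT value for domain; `now` must be timezone-aware."""
        domain = domain.lower().rstrip(".")
        if domain in self._db:
            stamp, _ = self._db[domain]
        else:
            # First query ever seen: stamp it with the current UTC time.
            stamp = now.astimezone(timezone.utc).strftime("%m%d%H%M%S")
        # In both branches, record when the domain was last looked up.
        self._db[domain] = (stamp, now)
        return stamp

    def purge(self, now):
        """Background job: drop domains not queried within expire_after."""
        stale = [d for d, (_, last) in self._db.items()
                 if now - last > self.expire_after]
        for d in stale:
            del self._db[d]
        return len(stale)


if __name__ == "__main__":
    reg = FirstSeenRegistry()
    t0 = datetime(2014, 6, 9, 19, 11, 0, tzinfo=timezone.utc)
    print(reg.lookup(domain_from_qname("example.org.newdomain.example.net."), t0))  # 0609191100
    print(reg.lookup("example.org", t0 + timedelta(days=2)))  # still 0609191100
```

Poisoning countermeasures such as a minimum number of distinct querying IPs would sit in front of the `else` branch; they are not modelled here.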
Re: Domain ages (was Re: SPAM from a registrar)
On 6/9/2014 2:51 PM, John Hardin wrote: On Mon, 9 Jun 2014, Kevin A. McGrail wrote: So there is merit in building a distributed look-up system using SA. Distributed lookup of *what*, though? Can you clarify that part of your idea? Are you referring to distributed whois queries for a domain name, to determine its age? Yes. Because whois data is hard to get and many whois servers limit lookups, distributing and sharing the lookup load to determine age of domains IMO has merit.
Re: Domain ages (was Re: SPAM from a registrar)
On 6/9/2014 3:02 PM, Rob McEwen wrote: Domain age is a good metric to factor in. But I'm always fascinated by some people's desire to block all messages with extremely new domains. (NOT saying that this applies to everyone who posted on this thread!) Keep in mind that many large and famous businesses... who have fairly good mail sending practices... sometimes launch new products complete with links to very newly registered domains. The same is often true for advertisements for things like rock concerts, etc. Or web sites that deal with specific events or hot-topic political issues that appeared out of nowhere. Yes, some of these are UBE. But many are NOT! These examples provide one of the largest sources of FPs for all the major domain/URI blacklists. But the better domain/URI blacklists have good mechanisms in place to (a) PREVENT... MANY of these from ever becoming FPs in the first place, and (b) where those mechanisms fail, they have good triggers/feedback to remove/whitelist such FPs VERY QUICKLY if/when they do occur. In contrast, many who might go overboard by outright blocking on newness... and/or scoring too aggressively on newness... may find too-high FP problems kicking their butts in the long run. And when such an FP starts happening, they may not have the proper telemetry to catch/fix it until AFTER much FP damage has happened. Personally, I think that the real problem here is that some of the most famous URI/domain blacklists are NOT catching everything and/or NOT catching everything fast enough... combined with many sys admins failing to make use of ALL the good and low-FP URI/domain blacklists... where they'd see MUCH better results if they were using ALL of the good URI blacklists! ...but I'm a little biased on this point! :) A great point. My goal is simply to build a system to identify the age of domains and use it as YAIOS (yet another indicator of spamminess), not as a poison pill.
Re: Domain ages (was Re: SPAM from a registrar)
Quoting David F. Skoll d...@roaringpenguin.com: On Mon, 9 Jun 2014 11:51:21 -0700 (PDT) John Hardin jhar...@impsec.org wrote: So there is merit in building a distributed look-up system using SA. Distributed lookup of *what*, though? Can you clarify that part of your idea? Are you referring to distributed whois queries for a domain name, to determine its age? Well, here's how it could be done. Imagine someone runs a DNS zone for newdomain.example.net. You want to see if example.org is a new domain, so you look up a TXT record for example.org.newdomain.example.net. The DNS software that serves the zone newdomain.example.net runs the following pseudo-code when example.org is looked up:

    IF example.org is in my database THEN
        return the TXT record associated with example.org
        update the last-looked-up time for example.org
    ELSE
        generate a TXT record of the form MMDDHHMMSS corresponding to current time (UTC)
        insert it in the database
        return it
    ENDIF

A background job will periodically clean out domains that haven't been queried in a long time. The clever part is that once lots of sites begin using this in their SA setups, we'll very quickly build up quite an accurate database of newly-seen domains that's completely independent of any registrar for a data source. Yes, spammers can poison it by specifically looking up a domain, waiting a couple of days, and then spamming. But I think most won't bother (witness how effective greylisting still is.) Furthermore, you can ignore all but the first few hundred lookups before you enter the TXT record in the database; this will make it more expensive for spammers to poison the data. Or you could not enter a record in the database until it has been looked up from 100 different IP addresses... I can think of a few other countermeasures. So who's volunteering to do this? :) Regards, David. The point was, I have already done this, and have it in production. 
I did this because this subject keeps coming up from time to time, and I was personally interested to see the results of it. And I do agree with Rob McEwen on many points, and I would be hesitant to block outright. But so far I haven't found any issues with blocking 1-day-old domains outright, and I doubt there will be many in real usage. But then, only time will tell for sure.
Re: Domain ages (was Re: SPAM from a registrar)
On Mon, 9 Jun 2014, David F. Skoll wrote: On Mon, 9 Jun 2014 11:51:21 -0700 (PDT) John Hardin jhar...@impsec.org wrote: So there is merit in building a distributed look-up system using SA. Distributed lookup of *what*, though? Can you clarify that part of your idea? Are you referring to distributed whois queries for a domain name, to determine its age? The clever part is that once lots of sites begin using this in their SA setups, we'll very quickly build up quite an accurate database of newly-seen domains that's completely independent of any registrar for a data source. Ah, ok, that's where I was confused. The proposal is for a distributed network gathering newly-SEEN domain names, rather than newly-REGISTERED domain names. Thanks for the clarification. I was focusing on the latter. -- John Hardin KA7OHZ http://www.impsec.org/~jhardin/ jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79 --- You can't reason a person out of a position if he didn't use reason to get there in the first place. -- Kristopher, at Marko's --- 739 days since the first successful private support mission to ISS (SpaceX)
Re: Domain ages (was Re: SPAM from a registrar)
On Mon, 09 Jun 2014 15:24:29 -0400 Patrick Domack patric...@patrickdk.com wrote: The point was, I have already done this, and have it in production. I did this because this subject keeps coming up from time to time, and I was personally interested to see the results of it. Interesting. If you don't mind my asking... how much data do you collect? How many lookups/day? I was thinking a system that gets lookups from thousands or more SA installations would get a pretty good overview of new domains. A local installation would necessarily see a limited subset. And I do agree with Rob McEwen on many points, and I would be hesitant to block outright. But so far I haven't found any issues with blocking 1-day-old domains outright, and I doubt there will be many in real usage. Or even just holding the mail for a day or so and then re-analyzing it. Regards, David.
Re: Domain ages (was Re: SPAM from a registrar)
On 6/9/2014 3:24 PM, Patrick Domack wrote: The point was, I have already done this, and have it in production. I did this because this subject keeps coming up from time to time, and I was personally interested to see the results of it. And I do agree with Rob McEwen on many points, and I would be hesitant to block outright. But so far I haven't found any issues with blocking 1-day-old domains outright, and I doubt there will be many in real usage. But then, only time will tell for sure. My conjecture is that many people have built this for lower volume. But you can't be doing much volume or your IP gets blocked from whois servers. The twist I want to do is bring more data back centralized from SA installations such as whois data where it can only be done in a distributed manner. regards, KAM
RE: Domain ages (was Re: SPAM from a registrar)
If SEM was able to detect newly registered domains more quickly then that would solve the problem. From: John Hardin jhar...@impsec.org Sent: Monday, June 09, 2014 2:24 PM To: users@spamassassin.apache.org Subject: Re: Domain ages (was Re: SPAM from a registrar) On Mon, 9 Jun 2014, David F. Skoll wrote: On Mon, 9 Jun 2014 11:51:21 -0700 (PDT) John Hardin jhar...@impsec.org wrote: So there is merit in building a distributed look-up system using SA. Distributed lookup of *what*, though? Can you clarify that part of your idea? Are you referring to distributed whois queries for a domain name, to determine its age? The clever part is that once lots of sites begin using this in their SA setups, we'll very quickly build up quite an accurate database of newly-seen domains that's completely independent of any registrar for a data source. Ah, ok, that's where I was confused. The proposal is for a distributed network gathering newly-SEEN domain names, rather than newly-REGISTERED domain names. Thanks for the clarification. I was focusing on the latter. -- John Hardin KA7OHZ http://www.impsec.org/~jhardin/ jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79 --- You can't reason a person out of a position if he didn't use reason to get there in the first place. -- Kristopher, at Marko's --- 739 days since the first successful private support mission to ISS (SpaceX)
Re: Domain ages (was Re: SPAM from a registrar)
On Mon, 9 Jun 2014, Kevin A. McGrail wrote: On 6/9/2014 2:51 PM, John Hardin wrote: On Mon, 9 Jun 2014, Kevin A. McGrail wrote: So there is merit in building a distributed look-up system using SA. Distributed lookup of *what*, though? Can you clarify that part of your idea? Are you referring to distributed whois queries for a domain name, to determine its age? Yes. Because whois data is hard to get and many whois servers limit lookups, distributing and sharing the lookup load to determine age of domains IMO has merit. Ah, I think there are still two different assumptions occurring in this discussion: newly-seen (David and Patrick) vs. newly-registered (me and Kevin)... Maybe we need to clarify that first. -- John Hardin KA7OHZ http://www.impsec.org/~jhardin/ jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79 --- You can't reason a person out of a position if he didn't use reason to get there in the first place. -- Kristopher, at Marko's --- 739 days since the first successful private support mission to ISS (SpaceX)
Re: Domain ages (was Re: SPAM from a registrar)
On 6/9/2014 3:33 PM, John Hardin wrote: On Mon, 9 Jun 2014, Kevin A. McGrail wrote: On 6/9/2014 2:51 PM, John Hardin wrote: On Mon, 9 Jun 2014, Kevin A. McGrail wrote: So there is merit in building a distributed look-up system using SA. Distributed lookup of *what*, though? Can you clarify that part of your idea? Are you referring to distributed whois queries for a domain name, to determine its age? Yes. Because whois data is hard to get and many whois servers limit lookups, distributing and sharing the lookup load to determine age of domains IMO has merit. Ah, I think there are still two different assumptions occurring in this discussion: newly-seen (David and Patrick) vs. newly-registered (me and Kevin)... Maybe we need to clarify that first. Good clarification. The spam I envision stopping is spammers using things like stolen credit cards or trial accounts to register domains that they then spam with before disappearing quite quickly. So this builds a database of domain whois data (initial discussions focused on the creation date) using distributed SA nodes to build the data. And I chose to discuss it here because I get more ideas than I have time and resources to implement. Regards, KAM
Re: Domain ages (was Re: SPAM from a registrar)
On 6/9/2014 3:31 PM, David Jones wrote: If SEM was able to detect newly registered domains more quickly then that would solve the problem. That is the crux of the issue, yes. So how do you identify new domains if the registrars/registries won't give you the data? That's the problem my idea solves by monitoring newly seen domains with the idea being that spammers are not going to buy domains and sit on them before using them. Regards, KAM
RE: Domain ages (was Re: SPAM from a registrar)
On Mon, 9 Jun 2014, David Jones wrote: If SEM was able to detect newly registered domains more quickly then that would solve the problem. Oh, agreed. The problem is, a registrar feed of registration changes costs a lot, and this is a free project. That's why I suggested trying to develop relationships with registrars, to maybe get them on board with providing this data for free for this purpose. It's possible that the Apache name could provide cachet to get registrars on board to provide rsync'able data feeds of domain names registered in the last N days. It might be possible/better to get them to provide the data to URIBL.org (to act as an aggregator) with a license to provide the data free via DNS (i.e. non-bulk access) and at a nominal fee for rsync access (which URIBL already charges for the data they collect). -- John Hardin KA7OHZ http://www.impsec.org/~jhardin/ jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79 --- You can't reason a person out of a position if he didn't use reason to get there in the first place. -- Kristopher, at Marko's --- 739 days since the first successful private support mission to ISS (SpaceX)
Re: Domain ages (was Re: SPAM from a registrar)
On 06/09/2014 09:38 PM, Kevin A. McGrail wrote: That is the crux of the issue, yes. So how do you identify new domains if the registrars/registries won't give you the data? That's the problem my idea solves by monitoring newly seen domains with the idea being that spammers are not going to buy domains and sit on them before using them. You get the TLD zone files... and depending on your budget you get them once/24hrs or as hourly diffs (if you can afford a house in The Hamptons, you can afford the diffs .-) Some TLDs won't hand out their zone files, period.
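With two daily snapshots of a TLD zone, "newly registered" reduces to a set difference over delegated names. The sketch below assumes a simplified master-file layout (one record per line, relative or absolute owner names, continuation lines reusing the previous owner); real zone dumps vary by registry and can be very large, so `delegated_domains` and `newly_delegated` are hypothetical helpers, not a parser for any particular TLD's files. Hourly diffs, where available, would hand you the same set directly.

```python
def delegated_domains(zone_text, origin):
    """Names with NS records in a simplified TLD zone dump.

    Assumed layout: OWNER [TTL] [CLASS] NS TARGET, one record per line;
    relative owners get `origin` appended, '@' means the origin itself,
    and a line starting with whitespace reuses the previous owner.
    """
    origin = origin.lower().rstrip(".")
    domains = set()
    owner = None
    for raw in zone_text.splitlines():
        line = raw.split(";", 1)[0]           # drop comments
        if not line.strip() or line.startswith("$"):
            continue                           # blank lines, $ORIGIN/$TTL
        fields = line.split()
        if not line[0].isspace():
            owner, fields = fields[0].lower(), fields[1:]
        if owner is None or "NS" not in (f.upper() for f in fields):
            continue                           # not an NS record
        if owner == "@":
            name = origin
        elif owner.endswith("."):
            name = owner.rstrip(".")
        else:
            name = f"{owner}.{origin}"
        if name != origin:                     # skip the TLD's own NS set
            domains.add(name)
    return domains


def newly_delegated(yesterday, today, origin):
    """Domains delegated in today's snapshot but absent from yesterday's."""
    return delegated_domains(today, origin) - delegated_domains(yesterday, origin)
```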
Re: Domain ages (was Re: SPAM from a registrar)
On Mon, Jun 9, 2014 at 8:43 PM, Kevin A. McGrail kmcgr...@pccc.com wrote: I think the core issue is that age of domains is a good indicator of spam. So there is merit in building a distributed look-up system using SA. I have more ideas than resources, of course... I repeat my question: which domain? HELO, MAIL FROM, From:, ...? -- Matthias
Re: Domain ages (was Re: SPAM from a registrar)
On 6/9/2014 4:25 PM, Matthias Leisi wrote: On Mon, Jun 9, 2014 at 8:43 PM, Kevin A. McGrail kmcgr...@pccc.com wrote: I think the core issue is that age of domains is a good indicator of spam. So there is merit in building a distributed look-up system using SA. I have more ideas than resources, of course... I repeat my question: which domain? HELO, MAIL FROM, From:, ...? I envision it for potentially any and all domains in the email.
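To make Matthias's question concrete, the sketch below collects the domains a message exposes in the three places he lists (HELO, envelope MAIL FROM, From: header) and remembers where each came from, so per-source hit rates can be compared. The function name and the decision to skip IP-literal HELOs are assumptions; URI domains from the body, which KAM would also include, are not covered here.

```python
import ipaddress
from email.utils import getaddresses


def _is_ip_literal(text):
    try:
        ipaddress.ip_address(text)
        return True
    except ValueError:
        return False


def candidate_domains(helo, mail_from, from_header):
    """Map each domain found in HELO / MAIL FROM / From: to the sources it came from."""
    found = {}

    def add(source, domain):
        domain = domain.strip().strip("[]<>").lower().rstrip(".")
        if domain and not _is_ip_literal(domain):    # skip e.g. HELO [192.0.2.1]
            found.setdefault(domain, set()).add(source)

    add("helo", helo)

    sender = mail_from.strip().strip("<>")           # "<>" is the null sender
    if "@" in sender:
        add("mail_from", sender.rsplit("@", 1)[1])

    for _name, addr in getaddresses([from_header]):  # From: may list several addresses
        if "@" in addr:
            add("from", addr.rsplit("@", 1)[1])
    return found
```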
Re: Domain ages (was Re: SPAM from a registrar)
On Mon, Jun 9, 2014 at 9:11 PM, David F. Skoll d...@roaringpenguin.com wrote: The clever part is that once lots of sites begin using this in their SA setups, we'll very quickly build up quite an accurate database of newly-seen domains that's completely independent of any registrar for a data source. dnswl.org (and many other DNSxLs) already have some of that data as part of their parsing/handling of DNS logs. Furthermore, you can ignore all but the first few hundred lookups before you enter the TXT record in the database; this will make it more expensive for spammers to poison the data. Or you could not enter a record in the database until it has been looked up from 100 different IP addresses... I can think of a few other countermeasures. So who's volunteering to do this? :) We had some plans to publish such data. However, since it is not really clear which domains to look for, we did not pursue that much further. We have at least a primary domain for each DNSWL record, but at least historically we were not strict about what type of domain to put there (we like to use the domain name that most closely links the IPs to the administratively responsible owner, which is admittedly somewhat vague). Based on the usage data we gather, we can pretty accurately extract a last-seen date for a particular domain (or its associated IPs, to be exact). *But*, again: which domains would be queried for such a list? -- Matthias
Re: Domain ages (was Re: SPAM from a registrar)
Quoting Matthias Leisi matth...@leisi.net: On Mon, Jun 9, 2014 at 8:43 PM, Kevin A. McGrail kmcgr...@pccc.com wrote: I think the core issue is that age of domains is a good indicator of spam. So there is merit in building a distributed look-up system using SA. I have more ideas than resources, of course... I repeat my question: which domain? HELO, MAIL FROM, From:, ...? -- Matthias HELO hasn't matched anything in my tests. MAIL FROM has matched many, though the HELOs are always a different domain. From: I only started checking yesterday, and I'm not sure exactly how I will track them. Likely I'll just wait a few days, check my ham/spam folders and compare which rules were hit.
Re: Domain ages (was Re: SPAM from a registrar)
On 06/09/2014 10:32 PM, Patrick Domack wrote: Quoting Matthias Leisi matth...@leisi.net: On Mon, Jun 9, 2014 at 8:43 PM, Kevin A. McGrail kmcgr...@pccc.com wrote: I think the core issue is that age of domains is a good indicator of spam. So there is merit in building a distributed look-up system using SA. I have more ideas than resources, of course... I repeat my question: which domain? HELO, MAIL FROM, From:, ...? -- Matthias HELO hasn't matched anything in my tests. MAIL FROM has matched many, though the HELOs are always a different domain. From: I only started checking yesterday, and I'm not sure exactly how I will track them. Likely I'll just wait a few days, check my ham/spam folders and compare which rules were hit. LOTS of the recent .us and .me domains will match sender/PTR/A/HELO.
Re: Domain ages (was Re: SPAM from a registrar)
On Mon, 9 Jun 2014 22:31:55 +0200 Matthias Leisi matth...@leisi.net wrote: *But*, again: which domains would be queried for such a list? I think MAIL FROM domain. Regards, David.
Re: Domain ages (was Re: SPAM from a registrar)
On Mon, June 9, 2014 15:35, Patrick Domack wrote: I guess what would need to be hammered out is the exact info wanted. We know age, and registrar. Though doing the registrar isn't so simple, as the registrar name for ENOM alone changes between TLDs, and even within a single TLD (likely from the mergers they had). My investigations of the domains used against us revealed that all of the handful checked were between 4 and 20 hours old when first encountered by our servers. I think it would suffice to have a negative-lookup RTBL service, where a domain that is not listed therein may be considered new, at least insofar as mailing traffic is concerned. The registrar and the age of the domain need not concern us overmuch at the outset of a spam attack. What is more important to know is whether the domain has been seen by others before, and how long before, so that the information in DOB and SEM can be considered in that light. Lookup domains may be added as and when they are encountered, albeit after some delay and only if some threshold of volume and distinct number of enquiring hosts is passed. A graded approach is probably called for, with one list adding a previously unseen domain only after 24 hours from the first enquiry, another only after 48, and so on. Of course, the domains in question need to be verified before being added. And other precautions are no doubt necessary to avoid poisoning or advance-loading subversion attempts. Comments? -- *** E-Mail is NOT a SECURE channel *** James B. Byrne mailto:byrn...@harte-lyne.ca Harte Lyne Limited http://www.harte-lyne.ca 9 Brockley Drive vox: +1 905 561 1241 Hamilton, Ontario fax: +1 905 561 0757 Canada L8E 3C3
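James's graded listing rule can be expressed as: a domain only counts as "seen" once enough queries from enough distinct hosts have arrived, and it then appears in each tier whose delay since the first enquiry has elapsed. In the sketch below the tier lengths (24, 48 and 168 hours) and both thresholds are placeholders, the `GradedSeenList` class is hypothetical, and the verification step he mentions is not modelled. An empty result means "not listed anywhere", i.e. treat the domain as new.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Sighting:
    first_seen: datetime
    queries: int = 0
    hosts: set = field(default_factory=set)


class GradedSeenList:
    """Hypothetical 'seen before' list with delayed, thresholded tiers."""

    def __init__(self, tier_hours=(24, 48, 168), min_queries=100, min_hosts=20):
        self.tier_hours = sorted(tier_hours)
        self.min_queries = min_queries        # placeholder volume threshold
        self.min_hosts = min_hosts            # placeholder distinct-host threshold
        self._seen = {}

    def record(self, domain, client_ip, now):
        """Count one enquiry; the first enquiry fixes first_seen."""
        key = domain.lower().rstrip(".")
        if key not in self._seen:
            self._seen[key] = Sighting(first_seen=now)
        s = self._seen[key]
        s.queries += 1
        s.hosts.add(client_ip)

    def tiers_listed(self, domain, now):
        """Tiers (in hours) this domain is published in; [] means 'treat as new'."""
        s = self._seen.get(domain.lower().rstrip("."))
        if s is None or s.queries < self.min_queries or len(s.hosts) < self.min_hosts:
            return []
        age = now - s.first_seen
        return [h for h in self.tier_hours if age >= timedelta(hours=h)]
```

Requiring distinct querying hosts as well as raw volume is what makes the "advance loading" attack he mentions more expensive: one spammer host repeating lookups never crosses `min_hosts`.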
Re: Domain ages (was Re: SPAM from a registrar)
On Mon, Jun 9, 2014 at 9:11 PM, David F. Skoll d...@roaringpenguin.com wrote: The DNS software that serves the zone newdomain.example.net runs the following pseudo-code when example.org is looked up: [..] So who's volunteering to do this? :) *raises hand* I still have an experimental DNS server (written in Perl) lying around that does more or less what is described here. The overall system would need a bit more thought, though.
* Distributed over n nodes. Given that data can have a pretty long TTL, it does not need a lot of nodes, but the distributed nature still brings some challenges.
* Definition of the granularity of the data: should a first-seen date be returned, or an age (in days?)
* Querying whois servers is not practical at that scale.
* How would the queries be sent to the nodes? Domain-based BL-type queries?
* Would the SA project take on some operational responsibilities?
* The dnswl.org project can sponsor resources and take on some operational aspects, but we would welcome some support.
-- Matthias
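On the granularity question, one concrete argument for returning an age rather than David's MMDDHHMMSS stamp is that the stamp carries no year, so every client has to guess it. The helper below converts such a stamp to an age in whole days by assuming the most recent occurrence that is not in the future; the function name and that year rule are assumptions, and stamps more than a year old cannot be told apart.

```python
from datetime import datetime, timezone


def age_days_from_stamp(stamp, now):
    """Whole days since a first-seen stamp in MMDDHHMMSS (UTC) form.

    The stamp has no year, so assume the most recent occurrence that is not
    in the future (an assumption); `now` must be timezone-aware UTC.
    """
    if len(stamp) != 10 or not stamp.isdigit():
        raise ValueError(f"bad stamp {stamp!r}")
    month, day, hour, minute, second = (int(stamp[i:i + 2]) for i in range(0, 10, 2))
    for year in (now.year, now.year - 1):
        try:
            seen = datetime(year, month, day, hour, minute, second, tzinfo=timezone.utc)
        except ValueError:        # e.g. 29 Feb in a non-leap year, or month 13
            continue
        if seen <= now:
            return (now - seen).days
    raise ValueError(f"cannot place {stamp!r} before {now.isoformat()}")
```

A server that already knows the full first-seen time could return the age directly and avoid this ambiguity altogether.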
Local BL support?
I’d like to add a plugin (and eventually share it once the bugs are out) that uses either Net::CIDR::Lite to allow manual entry of IP-based blacklists for known offending address blocks, or else using the Geo::IP module to blacklist based on the country or ISP. It would need to expose parts of the API depending on how it detects the presence of modules, I suppose. Not sure if it’s worth making run-time detection of the Geo::IP licenses and databases do the same. Is there a prototype Plugin that I could use for doing parsing/looking up the URI’s hostname? Since I’m using a local database without network access, it could happen synchronously… Thanks, -Philip
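For the Net::CIDR::Lite half of the idea, the core operation is just "is this resolved IP inside any listed netblock". Since the new examples here use Python, the sketch uses the standard `ipaddress` module in place of Net::CIDR::Lite (a real SA plugin would be Perl); `LocalCidrList` and `uri_host` are illustrative names, and resolving the hostname to IPs, synchronously or otherwise, is left to the caller.

```python
import ipaddress
from urllib.parse import urlsplit


class LocalCidrList:
    """Illustrative local IP blacklist (stand-in for Net::CIDR::Lite)."""

    def __init__(self, cidrs):
        nets = [ipaddress.ip_network(c, strict=False) for c in cidrs]
        # collapse_addresses() needs one IP version at a time; it also
        # drops ranges already covered by a larger listed range.
        self.nets = (list(ipaddress.collapse_addresses(n for n in nets if n.version == 4))
                     + list(ipaddress.collapse_addresses(n for n in nets if n.version == 6)))

    def contains(self, ip):
        addr = ipaddress.ip_address(ip)
        # An IPv4 address is never "in" an IPv6 network, and vice versa.
        return any(addr in net for net in self.nets)


def uri_host(uri):
    """Hostname part of a URI (urlsplit lower-cases it)."""
    return urlsplit(uri).hostname
```

A linear scan over collapsed networks is fine for a handful of manually entered blocks; a large list would want a prefix-indexed structure instead.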
Re: Local BL support?
On 06/09/2014 10:46 PM, Philip Prindeville wrote: I’d like to add a plugin (and eventually share it once the bugs are out) that uses either Net::CIDR::Lite to allow manual entry of IP-based blacklists for known offending address blocks, or else using the Geo::IP module to blacklist based on the country or ISP. It would need to expose parts of the API depending on how it detects the presence of modules, I suppose. Not sure if it’s worth making run-time detection of the Geo::IP licenses and databases do the same. Is there a prototype Plugin that I could use for doing parsing/looking up the URI’s hostname? Since I’m using a local database without network access, it could happen synchronously… Thanks, The standard SA URIBL.pm? Put your data in a local NS instance (rbldnsd, bind, whatever you prefer).
Re: Can't keep up with spam from SolarVPS sites
On Jun 6, 2014, at 3:50 PM, Axb axb.li...@gmail.com wrote: If you have to post a spam sample, pls use pastebin and post the full msg On 06/06/2014 11:32 PM, Philip Prindeville wrote: We’re getting a lot of spam that contains URL’s which look like (remove the ): http://mabsut.com/20220362/vuxtxumsrnsst6unlornt3umtfuwznvv~5v0nmro0ysnx_u_usqzxsrwlln_t_t_tomtdyumplnl_ts_tn_ttce/unnt7uqs_mrn_ttdfw3yuw_h_03xo_gl_67_8gw_buutxveumpomte3yuo_tlltcx3yumsrnsstziaumte3umm/lst0x0ut0xut7eunty1um_ttf1umnrt2utezdeuteutyutw2utv3utvaut0u_0czz_xz66_a298zty8ux97xvd/e_o8zetdy97utd3aut09ultcdaumtd3un_unsrrtw3utwv8utweut80utecegutfnutaeut263yutdzeumt9cul_ol Some observations… The URL’s should be fairly easy to filter against via a regex. Anyone have some working rules they could share? Pls note that any rule shared via lists usually loses its teeth within a few hours .-) Well, it depends on the nature of the rule… Some characteristics are less fungible than others. The other thing is, the URL is almost always hosted by solarvps.com, in the CIDR block 65.181.64.0/18. Is there an easy way to do a domain lookup on the host portion of the URL and then filter it if it’s in this subnet? Yes, there is: run a local A record blacklist with rbldnsd listing 65.181.64.0/18 and a rule like, for example:

    uridnssub YOUR_A_URIBL yourabl.example.net. A 127.0.0.2
    body      YOUR_A_URIBL eval:check_uridnsbl('YOUR_A_URIBL')
    describe  YOUR_A_URIBL URL domain A rec listed by YOUR_A_URIBL
    score     YOUR_A_URIBL 5.0
    tflags    YOUR_A_URIBL net a

If I used local A records, for a /18 network, I’d need all 2^14 records, right? Because a lookup is always on a full dotted-quad (in reverse order)… I tried using multi.uribl.com and couldn’t get this to work. I had:

    urirhssub L_URIBL_BLACK multi.uribl.com. A 2
    body      L_URIBL_BLACK eval:check_uridnsbl('L_URIBL_BLACK')
    describe  L_URIBL_BLACK Contains a URL listed in the URIBL blacklist
    tflags    L_URIBL_BLACK net
    score     L_URIBL_BLACK 20.0

set, and also:

    skip_rbl_checks 0

at the end of /etc/mail/spamassassin/sa-mimedefang.cf set. Running this over the message in a file:

    spamassassin -t --lint -D /tmp/cable.eml

I get:

    …
    Jun 9 14:57:13.029 [32297] dbg: rules: compiled meta tests
    Jun 9 14:57:13.032 [32297] dbg: check: is spam? score=-2.348 required=5
    Jun 9 14:57:13.032 [32297] dbg: check: tests=L_EMPTY_SENDER,MISSING_DATE,MISSING_HEADERS,NO_RECEIVED,NO_RELAYS
    Jun 9 14:57:13.032 [32297] dbg: check: subtests=__BODY_TEXT_LINE,__EMPTY_BODY,__EMPTY_SENDER,__GATED_THROUGH_RCVD_REMOVER,__HAS_FROM,__HAS_MESSAGE_ID,__HAS_MSGID,__HAS_SUBJECT,__L_UNDISCLOSED2,__MISSING_REF,__MISSING_REPLY,__MSGID_OK_DIGITS,__MSGID_OK_HOST,__MSOE_MID_WRONG_CASE,__NONEMPTY_BODY,__NOT_SPOOFED,__SANE_MSGID,__TO_NO_ARROWS_R,__UNUSABLE_MSGID
    Jun 9 14:57:13.033 [32297] dbg: timing: total 1908 ms - init: 1384 (72.5%), parse: 1.17 (0.1%), extract_message_metadata: 11 (0.6%), get_uri_detail_list: 1.06 (0.1%), tests_pri_-1000: 9 (0.5%), compile_gen: 202 (10.6%), compile_eval: 37 (1.9%), tests_pri_-950: 6 (0.3%), tests_pri_-900: 7 (0.4%), tests_pri_-400: 6 (0.3%), tests_pri_0: 404 (21.2%), tests_pri_500: 75 (3.9%)

so I’m not sure why it’s failing to find nqtel.com in the uribl.com database. What am I missing? -Philip
Re: Domain ages (was Re: SPAM from a registrar)
On 06/09/2014 10:43 PM, James B. Byrne wrote: On Mon, June 9, 2014 15:35, Patrick Domack wrote: I guess what would need to be hammered out is the exact info wanted. We know age, and registrar. Though doing the registrar isn't so simple, as the registrar name for ENOM alone changes between TLDs, and even within a single TLD (likely from the mergers they had). My investigations of the domains used against us revealed that all of the handful checked were between 4 and 20 hours old when first encountered by our servers. I think it would suffice to have a negative-lookup RTBL service, where a domain that is not listed therein may be considered new, at least insofar as mailing traffic is concerned. The registrar and the age of the domain need not concern us overmuch at the outset of a spam attack. What is more important to know is whether the domain has been seen by others before, and how long before, so that the information in DOB and SEM can be considered in that light. Lookup domains may be added as and when they are encountered, albeit after some delay and only if some threshold of volume and distinct number of enquiring hosts is passed. A graded approach is probably called for, with one list adding a previously unseen domain only after 24 hours from the first enquiry, another only after 48, and so on. Of course, the domains in question need to be verified before being added. And other precautions are no doubt necessary to avoid poisoning or advance-loading subversion attempts. Comments? You have a domain reputation method on your drawing board and, IMO, it has some flaws:
- Delayed data is good for research, not for efficiently stopping spam.
- Verifying anything that large needs 40k Indians in the basement or huge clusters of cycles doing something - neither is trivial or cheap.
- There's a bunch of Passive DNS projects which do what you're describing, and none will work as the FUSSP - they're datapoints which can be combined with other stuff to achieve something (aka research).
Re: Can't keep up with spam from SolarVPS sites
On 06/09/2014 11:03 PM, Philip Prindeville wrote: On Jun 6, 2014, at 3:50 PM, Axb axb.li...@gmail.com wrote: If you have to post a spam sample, pls use pastebin and post the full msg On 06/06/2014 11:32 PM, Philip Prindeville wrote: We’re getting a lot of spam that contains URL’s which look like (remove the ): http://mabsut.com/20220362/vuxtxumsrnsst6unlornt3umtfuwznvv~5v0nmro0ysnx_u_usqzxsrwlln_t_t_tomtdyumplnl_ts_tn_ttce/unnt7uqs_mrn_ttdfw3yuw_h_03xo_gl_67_8gw_buutxveumpomte3yuo_tlltcx3yumsrnsstziaumte3umm/lst0x0ut0xut7eunty1um_ttf1umnrt2utezdeuteutyutw2utv3utvaut0u_0czz_xz66_a298zty8ux97xvd/e_o8zetdy97utd3aut09ultcdaumtd3un_unsrrtw3utwv8utweut80utecegutfnutaeut263yutdzeumt9cul_ol Some observations… The URL’s should be fairly easy to filter against via a regex. Anyone have some working rules they could share? Pls note that any rule shared via lists usually loses its teeth within a few hours .-) Well, it depends on the nature of the rule… Some characteristics are less fungible than others. The other thing is, the URL is almost always hosted by solarvps.com, in the CIDR block 65.181.64.0/18. Is there an easy way to do a domain lookup on the host portion of the URL and then filter it if it’s in this subnet? Yes, there is: run a local A record blacklist with rbldnsd listing 65.181.64.0/18 and a rule like, for example:

    uridnssub YOUR_A_URIBL yourabl.example.net. A 127.0.0.2
    body      YOUR_A_URIBL eval:check_uridnsbl('YOUR_A_URIBL')
    describe  YOUR_A_URIBL URL domain A rec listed by YOUR_A_URIBL
    score     YOUR_A_URIBL 5.0
    tflags    YOUR_A_URIBL net a

If I used local A records, for a /18 network, I’d need all 2^14 records, right? Because a lookup is always on a full dotted-quad (in reverse order)… nope... with rbldnsd you set your BL zone to use the ip4trie dataset, which, as per http://www.corpit.ru/mjt/rbldnsd/rbldnsd.8.html, is:

    ip4trie Dataset
    Set of IP4 CIDR ranges with corresponding (A, TXT) values. This dataset is similar to ip4set, but uses a different internal representation. It accepts CIDR ranges only (not a.b.c.d−e.f.g.h), and allows for the specification of A/TXT values on a per CIDR range basis. (If multiple CIDR ranges match a query, the value for longest matching prefix is returned.) Exclusions are supported too.

I tried using multi.uribl.com and couldn’t get this to work. I had:

    urirhssub L_URIBL_BLACK multi.uribl.com. A 2
    body      L_URIBL_BLACK eval:check_uridnsbl('L_URIBL_BLACK')
    describe  L_URIBL_BLACK Contains a URL listed in the URIBL blacklist
    tflags    L_URIBL_BLACK net
    score     L_URIBL_BLACK 20.0

URIBL is enabled by default in SA - no need to add extra rules.

set, and also:

    skip_rbl_checks 0

at the end of /etc/mail/spamassassin/sa-mimedefang.cf set. Running this over the message in a file:

    spamassassin -t --lint -D /tmp/cable.eml

I get:

    …
    Jun 9 14:57:13.029 [32297] dbg: rules: compiled meta tests
    Jun 9 14:57:13.032 [32297] dbg: check: is spam? score=-2.348 required=5
    Jun 9 14:57:13.032 [32297] dbg: check: tests=L_EMPTY_SENDER,MISSING_DATE,MISSING_HEADERS,NO_RECEIVED,NO_RELAYS
    Jun 9 14:57:13.032 [32297] dbg: check: subtests=__BODY_TEXT_LINE,__EMPTY_BODY,__EMPTY_SENDER,__GATED_THROUGH_RCVD_REMOVER,__HAS_FROM,__HAS_MESSAGE_ID,__HAS_MSGID,__HAS_SUBJECT,__L_UNDISCLOSED2,__MISSING_REF,__MISSING_REPLY,__MSGID_OK_DIGITS,__MSGID_OK_HOST,__MSOE_MID_WRONG_CASE,__NONEMPTY_BODY,__NOT_SPOOFED,__SANE_MSGID,__TO_NO_ARROWS_R,__UNUSABLE_MSGID
    Jun 9 14:57:13.033 [32297] dbg: timing: total 1908 ms - init: 1384 (72.5%), parse: 1.17 (0.1%), extract_message_metadata: 11 (0.6%), get_uri_detail_list: 1.06 (0.1%), tests_pri_-1000: 9 (0.5%), compile_gen: 202 (10.6%), compile_eval: 37 (1.9%), tests_pri_-950: 6 (0.3%), tests_pri_-900: 7 (0.4%), tests_pri_-400: 6 (0.3%), tests_pri_0: 404 (21.2%), tests_pri_500: 75 (3.9%)

so I’m not sure why it’s failing to find nqtel.com in the uribl.com database. What am I missing? --lint doesn't do network tests.
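Why no per-address records are needed: a DNSBL client always asks for the reversed dotted quad under the zone, and a server like rbldnsd turns that name back into an address and matches it against stored CIDR ranges, returning the value for the longest matching prefix, as the ip4trie description above says. A /18 covers 2^14 = 16384 addresses, but only one entry is stored. The sketch below reproduces that answering behaviour with a hash table per prefix length rather than a real trie; it is not rbldnsd's code, ignores exclusions, and the function and class names are made up for the example.

```python
import ipaddress


def dnsbl_qname(ip, zone):
    """Name a client queries: IPv4 octets reversed, then the zone."""
    return ".".join(reversed(ip.split("."))) + "." + zone.rstrip(".")


def qname_to_ip(qname, zone):
    """Server side: recover the IPv4 address from a query name."""
    qname, zone = qname.lower().rstrip("."), zone.lower().rstrip(".")
    if not qname.endswith("." + zone):
        raise ValueError("query is outside the zone")
    head = qname[: -(len(zone) + 1)]
    return ipaddress.IPv4Address(".".join(reversed(head.split("."))))


class Ip4PrefixTable:
    """Longest-prefix match over CIDR ranges (ip4trie-like answers, not a trie)."""

    def __init__(self, entries):
        # prefix length -> {network address as int: A-record value}
        self._by_len = {}
        for cidr, value in entries.items():
            net = ipaddress.IPv4Network(cidr)
            self._by_len.setdefault(net.prefixlen, {})[int(net.network_address)] = value

    def lookup(self, ip):
        addr = int(ipaddress.IPv4Address(ip))
        for plen in sorted(self._by_len, reverse=True):   # longest prefix first
            mask = (0xFFFFFFFF << (32 - plen)) & 0xFFFFFFFF
            value = self._by_len[plen].get(addr & mask)
            if value is not None:
                return value
        return None  # not listed: a DNSBL would answer NXDOMAIN
```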
Re: Domain ages (was Re: SPAM from a registrar)
On 06/09/2014 12:29 PM, Kevin A. McGrail wrote: On 6/9/2014 3:24 PM, Patrick Domack wrote: The point was, I have already done this, and have it in production. I did this because this subject keeps coming up from time to time, and I was personally interested to see the results of it. And I do agree with Rob McEwen on many points. And I would be hesitant to outright block. But so far I haven't found any issues with blocking 1-day-old domains outright, and I doubt there will be many in real usage. But then the only way to be completely sure of that will be time. My conjecture is that many people have built this for lower volume. But you can't be doing much volume or your IP gets blocked from whois servers. The twist I want to do is bring more data back centralized from SA installations, such as whois data, where it can only be done in a distributed manner. regards, KAM A caching whois client (jwhois, for example) can significantly reduce the volume of queries.
Re: Local BL support?
On Mon, 9 Jun 2014, Axb wrote: On 06/09/2014 10:46 PM, Philip Prindeville wrote: I’d like to add a plugin (and eventually share it once the bugs are out) that uses either Net::CIDR::Lite to allow manual entry of IP-based blacklists for known offending address blocks, or else using the Geo::IP module to blacklist based on the country or ISP. Is there a prototype Plugin that I could use for doing parsing/looking up the URI’s hostname? Since I’m using a local database without network access, it could happen synchronously… The standard SA URIBL.pm ? put your data in a local NS instance (rbldnsd, bind, whatever you prefer) Second URIBL.pm. For small sites it would be nice if it supported specifying a netblock explicitly in the rule. If you're only doing a few that would be easier than setting up a zone or rbldnsd. You might look at extending URIBL.pm to do that. -- John Hardin KA7OHZ http://www.impsec.org/~jhardin/ jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79 --- You are in a maze of twisty little protocols, all written by Microsoft. -- 739 days since the first successful private support mission to ISS (SpaceX)
Re: Domain ages (was Re: SPAM from a registrar)
On Mon, Jun 9, 2014 at 11:31 PM, Richard Doyle lists...@islandnetworks.com wrote: A caching whois client (jwhois, for example) can significantly reduce the volume of queries. You will need to query potentially hundreds or thousands of domains *per day* - mostly throwaway domains from spammers. 1) What are the typical rate limits on public whois servers? 2) How do you protect against attackers sending random non-existent domain names your way, thus ensuring you hit rate limits early? 3) How do you parse the myriad formats sent by whois servers? 4) How do you handle TLDs which do not publish registration dates, like e.g. .de? (At least they did not last time I checked.) Whois is not a feasible data source. -- Matthias
Re: Domain ages (was Re: SPAM from a registrar)
Quoting Matthias Leisi matth...@leisi.net: On Mon, Jun 9, 2014 at 11:31 PM, Richard Doyle lists...@islandnetworks.com wrote: A caching whois client (jwhois, for example) can significantly reduce the volume of queries. You will need to query potentially hundreds or thousands of domains *per day* - mostly throwaway domains from spammers. 1) What are the typical rate limits on public whois servers? 2) How do you protect against attackers sending random non-existent domain names your way, thus ensuring you hit rate limits early? 3) How do you parse the myriad formats sent by whois servers? 4) How do you handle TLDs which do not publish registration dates, like e.g. .de? (At least they did not last time I checked.) Whois is not a feasible data source. -- Matthias 1) I dunno, but I am doing around 15k lookups a day, from a single IP, without getting limited/blocked. 2) This is hard, and I don't know. Currently Postfix's reject-unknown-sender check helps solve this for me, but it won't for DNS-based lookups. 3) This, while annoying, is solved in my code; not too hard. 4) These I just don't bother doing lookups for. There is no solution, other than to let them bypass this system, or rate them via a seen-before method.
Re: Can't keep up with spam from SolarVPS sites
On Jun 9, 2014, at 3:10 PM, Axb axb.li...@gmail.com wrote: On 06/09/2014 11:03 PM, Philip Prindeville wrote: On Jun 6, 2014, at 3:50 PM, Axb axb.li...@gmail.com wrote: If you have to post a spam sample, pls use pastebin and post the full msg On 06/06/2014 11:32 PM, Philip Prindeville wrote: We’re getting a lot of spam that contains URLs which look like (remove the ): http://mabsut.com/20220362/vuxtxumsrnsst6unlornt3umtfuwznvv~5v0nmro0ysnx_u_usqzxsrwlln_t_t_tomtdyumplnl_ts_tn_ttce/unnt7uqs_mrn_ttdfw3yuw_h_03xo_gl_67_8gw_buutxveumpomte3yuo_tlltcx3yumsrnsstziaumte3umm/lst0x0ut0xut7eunty1um_ttf1umnrt2utezdeuteutyutw2utv3utvaut0u_0czz_xz66_a298zty8ux97xvd/e_o8zetdy97utd3aut09ultcdaumtd3un_unsrrtw3utwv8utweut80utecegutfnutaeut263yutdzeumt9cul_ol Some observations… The URLs should be fairly easy to filter against via a regex. Anyone have some working rules they could share? Pls note that any rule shared via lists usually loses its teeth within a few hours .-) Well, it depends on the nature of the rule… Some characteristics are less fungible than others. BTW, I found that the last N characters of the above URLs were always the same, and tried to do a “body” rule based on those last N characters, but I couldn’t get the rule to match. Still not sure why. The entire <a ...> sequence is only 382 characters long. Any ideas? The other thing is, the URL is almost always hosted by solarvps.com, in the CIDR block 65.181.64.0/18. Is there an easy way to do a domain lookup on the host portion of the URL and then filter it if it’s in this subnet? Yes, there is: run a local A record blacklist with rbldnsd 65.181.64.0/18 and a rule like, for example: uridnssub YOUR_A_URIBL yourabl.example.net. A 127.0.0.2 body YOUR_A_URIBL eval:check_uridnsbl('YOUR_A_URIBL') describe YOUR_A_URIBL URL domain A rec listed by YOUR_A_URIBL score YOUR_A_URIBL 5.0 tflags YOUR_A_URIBL net a If I used local A records, for a /18 network, I’d need all 2^14 records, right? 
Because a lookup is always on a full dotted-quad (in reverse order)… nope... with rbldnsd you set your BL zone to use the ip4trie dataset which as per http://www.corpit.ru/mjt/rbldnsd/rbldnsd.8.html ip4trie Dataset Set of IP4 CIDR ranges with corresponding (A, TXT) values. This dataset is similar to ip4set, but uses a different internal representation. It accepts CIDR ranges only (not a.b.c.d−e.f.g.h), and allows for the specification of A/TXT values on a per CIDR range basis. (If multiple CIDR ranges match a query, the value for longest matching prefix is returned.) Exclusions are supported too. Okay, and what would 65.181.64.0/18 look like as a BIND RR? I wasn’t able to infer this from the documentation you pointed at. I tried using multi.uribl.com and couldn’t get this to work. I had: urirhssub L_URIBL_BLACK multi.uribl.com. A 2 body L_URIBL_BLACK eval:check_uridnsbl('L_URIBL_BLACK') describe L_URIBL_BLACK Contains a URL listed in the URIBL blacklist tflags L_URIBL_BLACK net score L_URIBL_BLACK 20.0 URIBL is enabled by default in SA - no need to add extra rules. set, and also: skip_rbl_checks 0 at the end of /etc/mail/spamassassin/sa-mimedefang.cf set. Running this over the message in a file: spamassassin -t --lint -D /tmp/cable.eml I get: … Jun 9 14:57:13.029 [32297] dbg: rules: compiled meta tests Jun 9 14:57:13.032 [32297] dbg: check: is spam? 
score=-2.348 required=5 Jun 9 14:57:13.032 [32297] dbg: check: tests=L_EMPTY_SENDER,MISSING_DATE,MISSING_HEADERS,NO_RECEIVED,NO_RELAYS Jun 9 14:57:13.032 [32297] dbg: check: subtests=__BODY_TEXT_LINE,__EMPTY_BODY,__EMPTY_SENDER,__GATED_THROUGH_RCVD_REMOVER,__HAS_FROM,__HAS_MESSAGE_ID,__HAS_MSGID,__HAS_SUBJECT,__L_UNDISCLOSED2,__MISSING_REF,__MISSING_REPLY,__MSGID_OK_DIGITS,__MSGID_OK_HOST,__MSOE_MID_WRONG_CASE,__NONEMPTY_BODY,__NOT_SPOOFED,__SANE_MSGID,__TO_NO_ARROWS_R,__UNUSABLE_MSGID Jun 9 14:57:13.033 [32297] dbg: timing: total 1908 ms - init: 1384 (72.5%), parse: 1.17 (0.1%), extract_message_metadata: 11 (0.6%), get_uri_detail_list: 1.06 (0.1%), tests_pri_-1000: 9 (0.5%), compile_gen: 202 (10.6%), compile_eval: 37 (1.9%), tests_pri_-950: 6 (0.3%), tests_pri_-900: 7 (0.4%), tests_pri_-400: 6 (0.3%), tests_pri_0: 404 (21.2%), tests_pri_500: 75 (3.9%) so I’m not sure why it’s failing to find nqtel.com in the uribl.com database. What am I missing? --lint doesn't do network tests Okay, taking out --lint changed the results. Thanks, -Philip
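On the BIND RR question: assuming the usual DNSBL query form, the address is looked up with its octets reversed under the zone, so 65.181.70.1 becomes 1.70.181.65.yourabl.example.net. The /18 spans third octets 64 through 127 with any fourth octet. Plain BIND therefore needs 64 wildcard records rather than 16,384 individual A records. The Python sketch below generates those records. yourabl.example.net. and 127.0.0.2 are the placeholder zone and answer from Axb's example rule. The sketch only handles prefixes from /16 to /24, which is a simplifying assumption.

```python
import ipaddress

def wildcard_records(cidr, zone="yourabl.example.net.", answer="127.0.0.2"):
    """Emit BIND wildcard A records covering a CIDR block (/16 to /24 only)."""
    net = ipaddress.ip_network(cidr)
    if not 16 <= net.prefixlen <= 24:
        raise ValueError("this sketch only handles prefixes from /16 to /24")
    octets = str(net.network_address).split(".")
    first, second = octets[0], octets[1]
    records = []
    # Each /24 inside the block becomes one wildcard covering its last octet.
    for subnet in net.subnets(new_prefix=24):
        third = str(subnet.network_address).split(".")[2]
        records.append(f"*.{third}.{second}.{first}.{zone} IN A {answer}")
    return records

recs = wildcard_records("65.181.64.0/18")
print(len(recs))   # 64
print(recs[0])     # *.64.181.65.yourabl.example.net. IN A 127.0.0.2
print(recs[-1])    # *.127.181.65.yourabl.example.net. IN A 127.0.0.2
```

rbldnsd's ip4trie avoids this expansion entirely, which is the main reason Axb suggested it over a hand-written zone.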
Re: Local BL support?
On Jun 9, 2014, at 3:36 PM, John Hardin jhar...@impsec.org wrote: On Mon, 9 Jun 2014, Axb wrote: On 06/09/2014 10:46 PM, Philip Prindeville wrote: I’d like to add a plugin (and eventually share it once the bugs are out) that uses either Net::CIDR::Lite to allow manual entry of IP-based blacklists for known offending address blocks, or else using the Geo::IP module to blacklist based on the country or ISP. Is there a prototype Plugin that I could use for doing parsing/looking up the URI’s hostname? Since I’m using a local database without network access, it could happen synchronously… The standard SA URIBL.pm ? put your data in a local NS instance (rbldnsd, bind, whatever you prefer) Second URIBL.pm. For small sites it would be nice if it supported specifying a netblock explicitly in the rule. If you're only doing a few that would be easier than setting up a zone or rbldnsd. You might look at extending URIBL.pm to do that. I’m happy to try doing that, since I know Perl and need this… I’m just lacking on the expertise about doing SA modules… Anyone want to walk me through it? -Philip
Re: Can't keep up with spam from SolarVPS sites
On Mon, 9 Jun 2014, Philip Prindeville wrote: We’re getting a lot of spam that contains URLs which look like (remove the ): http://mabsut.com/20220362/vuxtxumsrnsst6unlornt3umtfuwznvv~5v0nmro0ysnx_u_usqzxsrwlln_t_t_tomtdyumplnl_ts_tn_ttce/unnt7uqs_mrn_ttdfw3yuw_h_03xo_gl_67_8gw_buutxveumpomte3yuo_tlltcx3yumsrnsstziaumte3umm/lst0x0ut0xut7eunty1um_ttf1umnrt2utezdeuteutyutw2utv3utvaut0u_0czz_xz66_a298zty8ux97xvd/e_o8zetdy97utd3aut09ultcdaumtd3un_unsrrtw3utwv8utweut80utecegutfnutaeut263yutdzeumt9cul_ol BTW, I found that the last N characters of the above URLs were always the same, and tried to do a “body” rule based on those last N characters, but I couldn’t get the rule to match. Still not sure why. The entire <a ...> sequence is only 382 characters long. Any ideas? If it's in an HTML anchor tag the URL itself isn't in the body text, only the display label will be. Try a uri rule. -- John Hardin
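John's point about anchor tags can be illustrated with Python's standard html.parser. The rendered text, roughly what a body rule sees, and the href, roughly what a uri rule sees, come out as separate streams. This is a simplified model for illustration, not SpamAssassin's HTML renderer, and the sample markup is made up.

```python
from html.parser import HTMLParser

class LinkSplitter(HTMLParser):
    """Collect rendered text and <a href> targets separately."""

    def __init__(self):
        super().__init__()
        self.text = []    # roughly what a body rule is matched against
        self.hrefs = []   # roughly what a uri rule is matched against

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

    def handle_data(self, data):
        self.text.append(data)

markup = ('<p>Great offer: <a href="http://example.com/20220362/'
          'abc_long_path">click here</a></p>')
p = LinkSplitter()
p.feed(markup)
p.close()
body = "".join(p.text)
print(body)     # Great offer: click here
print(p.hrefs)  # ['http://example.com/20220362/abc_long_path']
```

The long path never reaches the text stream, which is why a body rule keyed on the URL's tail cannot match.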
Re: Domain ages (was Re: SPAM from a registrar)
On 06/09/2014 02:42 PM, Matthias Leisi wrote: On Mon, Jun 9, 2014 at 11:31 PM, Richard Doyle lists...@islandnetworks.com wrote: A caching whois client (jwhois, for example) can significantly reduce the volume of queries. You will need to query potentially hundreds or thousands of domains *per day* - mostly throwaway domains from spammers. 1) What are the typical rate limits on public whois servers? Apparently higher than my usage (cached names aren't rechecked) 2) How do you protect against attackers sending random non-existent domain names your way, thus ensuring you hit rate limits early? Sender verification 3) How do you parse the myriad formats sent by whois servers? Don't try (see 4) 4) How do you handle TLDs which do not publish registration dates, like e.g. .de? (At least they did not last time I checked.) I only check .com, .net and .org Whois is not a feasible data source. Whois certainly has limited usefulness, but is a feasible data source within those limits -- Matthias -Richard
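Richard's two mitigations, caching every answer and only looking up .com, .net and .org, can be combined in a small wrapper. In the Python sketch below the actual whois query is passed in as a function, because parsing whois output is the hard part Matthias points out. The class name, the one-week cache lifetime, and the fake lookup are illustrative assumptions. Richard says he never rechecks cached names at all.

```python
import time

class CachingWhois:
    """Cache whois answers per domain and skip TLDs that are not checked."""

    def __init__(self, query, tlds=("com", "net", "org"), ttl=7 * 86400, clock=time.time):
        self.query = query        # callable: domain -> creation date (or None)
        self.tlds = set(tlds)
        self.ttl = ttl            # seconds a cached answer stays valid (assumed value)
        self.clock = clock
        self.cache = {}           # domain -> (expires_at, answer)
        self.queries = 0          # number of real, uncached lookups

    def created(self, domain):
        domain = domain.lower().rstrip(".")
        if domain.rsplit(".", 1)[-1] not in self.tlds:
            return None           # e.g. .de: no registration date to use
        now = self.clock()
        hit = self.cache.get(domain)
        if hit and hit[0] > now:
            return hit[1]
        self.queries += 1
        answer = self.query(domain)
        self.cache[domain] = (now + self.ttl, answer)
        return answer

fake_whois = lambda domain: "2014-06-08"   # stand-in for a real whois lookup
w = CachingWhois(fake_whois)
w.created("example.com")
w.created("Example.com.")   # served from the cache
w.created("example.de")     # skipped: TLD not checked
print(w.queries)            # 1
```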
Re: Local BL support?
On Mon, 9 Jun 2014, Philip Prindeville wrote: On Jun 9, 2014, at 3:36 PM, John Hardin jhar...@impsec.org wrote: On Mon, 9 Jun 2014, Axb wrote: On 06/09/2014 10:46 PM, Philip Prindeville wrote: I’d like to add a plugin (and eventually share it once the bugs are out) that uses either Net::CIDR::Lite to allow manual entry of IP-based blacklists for known offending address blocks, or else using the Geo::IP module to blacklist based on the country or ISP. Is there a prototype Plugin that I could use for doing parsing/looking up the URI’s hostname? Since I’m using a local database without network access, it could happen synchronously… The standard SA URIBL.pm ? put your data in a local NS instance (rbldnsd, bind, whatever you prefer) Second URIBL.pm. For small sites it would be nice if it supported specifying a netblock explicitly in the rule. If you're only doing a few that would be easier than setting up a zone or rbldnsd. You might look at extending URIBL.pm to do that. I’m happy to try doing that, since I know Perl and need this… I’m just lacking on the expertise about doing SA modules… Anyone want to walk me through it? The URIBL module is already there. If you know Perl it should be fairly easy to look at the existing code and add a variant where it accepts a netblock spec instead of a URIBL hostname and does the IP comparison to that rather than performing a DNS query... -- John Hardin
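Here is the core comparison John describes, sketched in Python rather than Perl: resolve the URI's hostname, then test each address against netblocks given directly in the rule instead of issuing a DNSBL query. The function name is hypothetical. The resolver is passed in so the sketch runs without network access. The address used for mabsut.com is made up for the example. A real plugin would plug into URIBL.pm's own URI extraction and DNS handling.

```python
import ipaddress
from urllib.parse import urlsplit

def uri_in_netblocks(uri, netblocks, resolve):
    """True if any address of the URI's host lies inside one of the netblocks."""
    host = urlsplit(uri).hostname
    if not host:
        return False
    nets = [ipaddress.ip_network(n) for n in netblocks]
    for ip in resolve(host):
        addr = ipaddress.ip_address(ip)
        if any(addr in net for net in nets):
            return True
    return False

# Stand-in resolver with made-up answers; a real one would query DNS.
fake_dns = {"mabsut.com": ["65.181.70.1"], "example.org": ["192.0.2.10"]}
resolve = lambda host: fake_dns.get(host, [])

print(uri_in_netblocks("http://mabsut.com/20220362/x", ["65.181.64.0/18"], resolve))  # True
print(uri_in_netblocks("http://example.org/", ["65.181.64.0/18"], resolve))           # False
```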
Re: add_header all Date of Scan _DATE_
On Mon, 2014-06-09 at 05:49 +0200, Karsten Bräckelmann wrote: Found the culprit after some digging. Bug 6915 [1], revision 1453407. As a band-aid, the following trivial one-line patch fixes it. Can easily be applied manually. Since it is kind of way past getting late here, and there may be other Template Tags affected, I'll defer proper bug handling and committing code changes for tomorrow. Bug 7050 [1]. Fixed in trunk, to be committed to 3.4 branch after RTC mode review and voting. While the quick fix I posted yesterday does work, it does so only because all occurrences want the current time formatted. It will not work in general for dates other than now (which SA does not use with that function). A proper M::SA::PerMsgStatus.pm fix can be found in bug 7050 comment 1, linked to the svn revision. [1] https://issues.apache.org/SpamAssassin/show_bug.cgi?id=7050 -- char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4"; main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1: (c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
Re: add_header all Date of Scan _DATE_
On Mon, 2014-06-09 at 09:23 +0200, Matus UHLAR - fantomas wrote: On 09.06.14 05:49, Karsten Bräckelmann wrote: Found the culprit after some digging. Bug 6915 [1], revision 1453407. As a band-aid, the following trivial one-line patch fixes it. Can easily be applied manually. Can that by any chance fix the problem with Date: in mail received by SSL? That one behaves similarly... http://mail-archives.apache.org/mod_mbox/spamassassin-users/201401.mbox/20140131144406.GA28818%40fantomas.sk No, these are unrelated. The code change mentioned above affects Template Tags only. And while a date-string related function is involved in this issue, the underlying bug is calling that function with a bad argument. Besides, all instances of calling that function are now correct.
Re: Can't keep up with spam from SolarVPS sites
On Jun 9, 2014, at 4:25 PM, John Hardin jhar...@impsec.org wrote: On Mon, 9 Jun 2014, Philip Prindeville wrote: http://mabsut.com/20220362/vuxtxumsrnsst6unlornt3umtfuwznvv~5v0nmro0ysnx_u_usqzxsrwlln_t_t_tomtdyumplnl_ts_tn_ttce/unnt7uqs_mrn_ttdfw3yuw_h_03xo_gl_67_8gw_buutxveumpomte3yuo_tlltcx3yumsrnsstziaumte3umm/lst0x0ut0xut7eunty1um_ttf1umnrt2utezdeuteutyutw2utv3utvaut0u_0czz_xz66_a298zty8ux97xvd/e_o8zetdy97utd3aut09ultcdaumtd3un_unsrrtw3utwv8utweut80utecegutfnutaeut263yutdzeumt9cul_ol If it's in an HTML anchor tag the URL itself isn't in the body text, only the display label will be. Try a uri rule. This URL is already in my AC_SPAMMY_URI template group, though I don't know if this particular one has been released or not (I never sent an update since the first batch a few months ago), and even if so the current version would not have caught it due to being a bit too restrictive. Try this: uri __AC_LONGSTRS_URI /\/[0-9]{8}(?:\/[a-z0-9_~]{50,}){3}\b/ Score as desired (I assign 3 points to all AC_SPAMMY_URI templates, but the released ones score differently). --- Amir
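To try Amir's pattern before deploying it, the same regular expression can be run against Philip's sample URL. Python's re syntax is close enough to Perl's here; the escaped slashes are simply unnecessary. The second URL is a made-up negative example and says nothing about real-world false positives.

```python
import re

# Amir's uri rule pattern, with Perl's \/ written as a plain /.
LONGSTRS = re.compile(r"/[0-9]{8}(?:/[a-z0-9_~]{50,}){3}\b")

spam = ("http://mabsut.com/20220362/"
        "vuxtxumsrnsst6unlornt3umtfuwznvv~5v0nmro0ysnx_u_usqzxsrwlln_t_t_tomtdyumplnl_ts_tn_ttce/"
        "unnt7uqs_mrn_ttdfw3yuw_h_03xo_gl_67_8gw_buutxveumpomte3yuo_tlltcx3yumsrnsstziaumte3umm/"
        "lst0x0ut0xut7eunty1um_ttf1umnrt2utezdeuteutyutw2utv3utvaut0u_0czz_xz66_a298zty8ux97xvd/"
        "e_o8zetdy97utd3aut09ultcdaumtd3un_unsrrtw3utwv8utweut80utecegutfnutaeut263yutdzeumt9cul_ol")
ordinary = "http://example.com/20140609/news/article"

print(bool(LONGSTRS.search(spam)))      # True
print(bool(LONGSTRS.search(ordinary)))  # False
```

The pattern requires an 8-digit path segment followed by three segments of 50 or more characters from a small alphabet, which ordinary article paths rarely have.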
Re: add_header all Date of Scan _DATE_
On Tue, 2014-06-10 at 02:03 +0200, Karsten Bräckelmann wrote: On Mon, 2014-06-09 at 09:23 +0200, Matus UHLAR - fantomas wrote: Can that by any chance fix the problem with Date: in mail received by SSL? That one behaves similarly... http://mail-archives.apache.org/mod_mbox/spamassassin-users/201401.mbox/20140131144406.GA28818%40fantomas.sk No, these are unrelated. The code change mentioned above affects Template Tags only. [...] Moreover, that sample shows SA 3.3.2. The bad Date Template Tag is strictly 3.4 and trunk. I've run the headers (after manually fixing that horribly mis-formatted paste) through a 3.3 test environment and could not reproduce DATE_IN_FUTURE rules firing. We will need a proper sample. Since the check_for_shifted_date() eval works with the actual Date and Received headers, I suspect the glue is what makes that rule misfire.
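DATE_IN_FUTURE is about how far the Date: header lies ahead of the time recorded in the Received: headers. The Python sketch below shows only that arithmetic, using made-up timestamps and the date portion of a Received: header. SpamAssassin's check_for_shifted_date() has its own header selection and thresholds, which are not reproduced here.

```python
from email.utils import parsedate_to_datetime

def date_shift_hours(date_header, received_time):
    """Hours by which the Date: header is ahead of a Received: timestamp."""
    sent = parsedate_to_datetime(date_header)
    received = parsedate_to_datetime(received_time)
    return (sent - received).total_seconds() / 3600

print(date_shift_hours("Tue, 10 Jun 2014 08:00:00 +0000",
                       "Tue, 10 Jun 2014 02:00:00 +0000"))   # 6.0
# The same instant written with a different zone offset shows no shift.
print(date_shift_hours("Tue, 10 Jun 2014 04:00:00 +0200",
                       "Tue, 10 Jun 2014 02:00:00 +0000"))   # 0.0
```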
Re: Can't keep up with spam from SolarVPS sites
On Mon, 9 Jun 2014, Amir Caspi wrote: On Jun 9, 2014, at 4:25 PM, John Hardin jhar...@impsec.org wrote: On Mon, 9 Jun 2014, Philip Prindeville wrote: http://mabsut.com/20220362/vuxtxumsrnsst6unlornt3umtfuwznvv~5v0nmro0ysnx_u_usqzxsrwlln_t_t_tomtdyumplnl_ts_tn_ttce/unnt7uqs_mrn_ttdfw3yuw_h_03xo_gl_67_8gw_buutxveumpomte3yuo_tlltcx3yumsrnsstziaumte3umm/lst0x0ut0xut7eunty1um_ttf1umnrt2utezdeuteutyutw2utv3utvaut0u_0czz_xz66_a298zty8ux97xvd/e_o8zetdy97utd3aut09ultcdaumtd3un_unsrrtw3utwv8utweut80utecegutfnutaeut263yutdzeumt9cul_ol If it's in an HTML anchor tag the URL itself isn't in the body text, only the display label will be. Try a uri rule. This URL is already in my AC_SPAMMY_URI template group, though I don't know if this particular one has been released or not (I never sent an update since the first batch a few months ago), and even if so the current version would not have caught it due to being a bit too restrictive. Try this: uri __AC_LONGSTRS_URI /\/[0-9]{8}(?:\/[a-z0-9_~]{50,}){3}\b/ Score as desired (I assign 3 points to all AC_SPAMMY_URI templates, but the released ones score differently). --- Amir Just beware of FPs, I've seen some ugly URLs from things like airline reservation confirmations. (spammers are getting better at stealing features from legit messages to protect their garbage). Also be aware that you cannot set the score for the rule __AC_LONGSTRS_URI at all (as it's an indirect rule and thus scoreless), you'll either have to rename it or use it in a meta rule. -- Dave Funk University of Iowa dbfunk (at) engineering.uiowa.edu College of Engineering 319/335-5751 FAX: 319/384-0549 1256 Seamans Center Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527 #include <std_disclaimer.h> Better is not better, 'standard' is better. B{
Re: Can't keep up with spam from SolarVPS sites
On Jun 9, 2014, at 4:25 PM, John Hardin jhar...@impsec.org wrote: On Mon, 9 Jun 2014, Philip Prindeville wrote: We’re getting a lot of spam that contains URLs which look like (remove the ): http://mabsut.com/20220362/vuxtxumsrnsst6unlornt3umtfuwznvv~5v0nmro0ysnx_u_usqzxsrwlln_t_t_tomtdyumplnl_ts_tn_ttce/unnt7uqs_mrn_ttdfw3yuw_h_03xo_gl_67_8gw_buutxveumpomte3yuo_tlltcx3yumsrnsstziaumte3umm/lst0x0ut0xut7eunty1um_ttf1umnrt2utezdeuteutyutw2utv3utvaut0u_0czz_xz66_a298zty8ux97xvd/e_o8zetdy97utd3aut09ultcdaumtd3un_unsrrtw3utwv8utweut80utecegutfnutaeut263yutdzeumt9cul_ol BTW, I found that the last N characters of the above URLs were always the same, and tried to do a “body” rule based on those last N characters, but I couldn’t get the rule to match. Still not sure why. The entire <a ...> sequence is only 382 characters long. Any ideas? If it's in an HTML anchor tag the URL itself isn't in the body text, only the display label will be. Try a uri rule. Thanks, that did it. -Philip
Re: Forged yahoo and mass mailers
Hi, is enough for it to hit FORGED_YAHOO_RCVD and L_UNVERIFIED_YAHOO, causing it to be marked as spam. Scores of 1.63 and 2.5 respectively, according to your sample. With a total score of 6.995, it is the latter one pushing it over the 5.0 threshold, not the first one. Moreover, the responsible rule is NOT stock SA. The obvious L local prefix should be a clear hint. You defined it as from yahoo, but not DKIM valid. For amusement, search Google for UNVERIFIED_YAHOO (and insist you really mean it literally with the underscore rather than two words). Yahoo uses DKIM and this wasn't signed. Funnily enough, that's a quote from a bug report from back in April 2007. Actually, it's the OP closing their own report as not a bug. This was a set of rules created by Mark back in 2011. Thanks for not flaming me. Is there something I'm missing, or is there a better way to do this to avoid the FPs in the future? If by doing this you mean writing a safer variant of your local rule, you should have (a) clearly stated it's a local rule, and (b) pasted the complete current version of that local rule. By making us chase your local rules in archives, all you'll get is fingers pointing at your own, local rule. I never intended to do that. I completely forgot this was a local rule. I've disabled it for now, pending any words of wisdom on improving it from those more knowledgeable than myself. 
header __L_ML1 Precedence =~ m{\b(list|bulk)\b}i
header __L_ML2 exists:List-Id
header __L_ML3 exists:List-Post
header __L_ML4 exists:Mailing-List
header __L_HAS_SNDR exists:Sender
meta __L_VIA_ML __L_ML1 || __L_ML2 || __L_ML3 || __L_ML4 || __L_HAS_SNDR
header __L_FROM_Y1 From:addr =~ m{[@.]yahoo\.com$}i
header __L_FROM_Y2 From:addr =~ m{\@yahoo\.com\.(ar|br|cn|hk|mx|my|ph|sg)$}i
header __L_FROM_Y3 From:addr =~ m{\@yahoo\.co\.(id|in|jp|nz|th|uk)$}i
header __L_FROM_Y4 From:addr =~ m{\@yahoo\.(ca|cn|de|dk|es|fr|gr|ie|it|pl|ru|se)$}i
meta __L_FROM_YAHOO __L_FROM_Y1 || __L_FROM_Y2 || __L_FROM_Y3 || __L_FROM_Y4
header __L_FROM_GMAIL From:addr =~ m{\@gmail\.com$}i
meta L_UNVERIFIED_YAHOO !DKIM_VALID && !DKIM_VALID_AU && __L_FROM_YAHOO && !__L_VIA_ML
priority L_UNVERIFIED_YAHOO 500
score L_UNVERIFIED_YAHOO 2.5
meta L_UNVERIFIED_GMAIL !DKIM_VALID && !DKIM_VALID_AU && __L_FROM_GMAIL && !__L_VIA_ML
priority L_UNVERIFIED_GMAIL 500
score L_UNVERIFIED_GMAIL 2.5
Thanks, Alex
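To see which senders the local rule set above targets, its logic can be mirrored in Python. This is only a model that reads each meta expression as a conjunction (&&) of its terms. The DKIM and mailing-list results are passed in as booleans rather than computed, and the function names are hypothetical.

```python
import re

# The four __L_FROM_Y* patterns, matched case-insensitively against From:addr.
YAHOO_FROM = [
    re.compile(r"[@.]yahoo\.com$", re.I),
    re.compile(r"@yahoo\.com\.(ar|br|cn|hk|mx|my|ph|sg)$", re.I),
    re.compile(r"@yahoo\.co\.(id|in|jp|nz|th|uk)$", re.I),
    re.compile(r"@yahoo\.(ca|cn|de|dk|es|fr|gr|ie|it|pl|ru|se)$", re.I),
]

def from_yahoo(addr):
    """Model of __L_FROM_YAHOO."""
    return any(p.search(addr) for p in YAHOO_FROM)

def unverified_yahoo(addr, dkim_valid, dkim_valid_au, via_mailing_list):
    """Model of: !DKIM_VALID && !DKIM_VALID_AU && __L_FROM_YAHOO && !__L_VIA_ML"""
    return (not dkim_valid and not dkim_valid_au
            and from_yahoo(addr) and not via_mailing_list)

# A newsletter sent From: a Yahoo address through a third-party mass mailer:
print(unverified_yahoo("someone@yahoo.com", False, False, False))  # True
print(unverified_yahoo("someone@yahoo.co.uk", True, True, False))  # False
print(from_yahoo("someone@notyahoo.com"))                          # False
```

The first case is exactly the mass-mailer situation from this thread. Without a valid Yahoo DKIM signature and without list headers, the rule fires regardless of the mailer's reputation.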
Re: Can't keep up with spam from SolarVPS sites
On Jun 9, 2014, at 7:11 PM, David B Funk dbf...@engineering.uiowa.edu wrote: Just beware of FPs, I've seen some ugly URLs from things like airline reservation confirmations. (spammers are getting better at stealing features from legit messages to protect their garbage). FWIW, I haven't had a single FP on that or any of my other AC rules... but, that's only been tested on ham and spam for myself and my limited user base. An FP could, in principle, happen. Also be aware that you cannot set the score for the rule __AC_LONGSTRS_URI at all (as it's an indirect rule and thus scoreless), you'll either have to rename it or use it in a meta rule. Indeed, I use this as part of a meta for AC_SPAMMY_URIs, so if you're using it standalone, remove the underscores. --- Amir
Re: Forged yahoo and mass mailers
Hi, On Mon, Jun 9, 2014 at 11:27 AM, Kevin A. McGrail kmcgr...@pccc.com wrote: On 6/8/2014 10:49 PM, Alex wrote: I have a few messages that have been incorrectly tagged because the sender used their yahoo address as the sender, but used a mass mailer ( contactbeacon.com) to send their newsletter for them. Apparently this is enough for it to hit FORGED_YAHOO_RCVD and L_UNVERIFIED_YAHOO, causing it to be marked as spam. Is there something I'm missing, or is there a better way to do this to avoid the FPs in the future? People with Yahoo! accounts (and AOL) and any other senders that have a DMARC policy of reject/quarantine need to use either A) a mailing list sender that has modified their process for DMARC or B) not use those accounts. See http://www.pcworld.com/article/2141120/yahoo-email-antispoofing-policy-breaks-mailing-lists.html Great information, thanks so much guys. It looks like it would be better to reject the p=reject DKIM at SMTP time, no? Thanks, Alex
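The p= value discussed here is a tag in the sender domain's DMARC record, which is published as a DNS TXT record at _dmarc.<domain>. The Python sketch below parses such a record's tag=value list and returns the policy. Fetching the TXT record from DNS is left out. The record string is an illustrative example, not Yahoo's actual published record.

```python
def parse_dmarc(record):
    """Split a DMARC TXT record into a tag -> value dict."""
    tags = {}
    for part in record.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    if tags.get("v") != "DMARC1":
        raise ValueError("not a DMARC record")
    return tags

def policy(record):
    """Return the p= policy (none, quarantine or reject), or None if absent."""
    p = parse_dmarc(record).get("p")
    return p.lower() if p else None

example = "v=DMARC1; p=reject; pct=100; rua=mailto:reports@example.com"
print(policy(example))   # reject
```

A receiver rejecting at SMTP time would act on "reject" only for mail that also fails DMARC's SPF/DKIM alignment checks. Those checks are not modelled here.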
auto-learn
Since having to wipe my bayes db, I've been thinking for a while about going back to an 'auto-learn' setup. It's been so long since I did this that I have a fairly dumb question. Do I need the two lines below to be set, and if so, is this the correct setting? Anything here above a score of 5 is considered spam. # bayes_auto_learn_threshold_nonspam 0.1 # bayes_auto_learn_threshold_spam 12.0 Thanks Chris -- Chris KeyID 0xE372A7DA98E6705C 31.11°N 97.89°W (Elev. 1092 ft) 21:38:18 up 7 days, 6:08, 1 user, load average: 0.53, 0.45, 0.34 Mandriva Linux 2010.2, kernel 2.6.33.7-desktop586-2mnb
Re: Forged yahoo and mass mailers
On Mon, 2014-06-09 at 21:40 -0400, Alex wrote: For amusement, search Google for UNVERIFIED_YAHOO (and insist you really mean it literally with the underscore rather than two words). This was a set of rules created by Mark back in 2011. Thanks for not flaming me. Heh. ;) Sorry, but I kind of expect some due diligence, in particular from long-time and experienced community members. Coming across blatantly obvious cases of local rules being reported as misfiring might make me snappy. Think about it this way: In order to help you, my first step is to find out details about those rules (grep stock cf files) and their respective score (your sample). You provided an exemplary, flawless sample. Why did you not have a look at the rules' sources? By making us chase your local rules in archives, all you'll get is fingers pointing at your own, local rule. I never intended to do that. I completely forgot this was a local rule. I've disabled it for now, pending any words of wisdom on improving it from those more knowledgeable than myself. The rule itself was not that bad. Actually, as Kevin and Anthony pointed out, Yahoo even expressly states in their DMARC records you should never have genuinely received those messages, nor accepted them. Yahoo classifies it as forged. It is the mass mailer's and its client's fault. (Back to the cheap part. Doing mass mailings but don't own your own domain? Accepting and actually using a free-mailer address as sender? Even worse, failing to get the note about Yahoo DMARC policy in that business?)
Re: auto-learn
On Mon, 2014-06-09 at 21:40 -0500, Chris wrote: Since having to wipe my bayes db, I've been thinking for a while about going back to an 'auto-learn' setup. It's been so long since I did this that I have a fairly dumb question. Do I need the two lines below to be set, and if so, is this the correct setting? Anything here above a score of 5 is considered spam. # bayes_auto_learn_threshold_nonspam 0.1 # bayes_auto_learn_threshold_spam 12.0 Answering the direct questions first: Yes, that is correct syntax. No, you don't need them (commented out), they are default. An auto-learning setup generally isn't a bad idea, and actually default. Depending on your amount of messages, you might want to have a look at the recent train-on-error option [1]. If (since) there was any need to wipe your old Bayes DB and start fresh, I seriously recommend continued manual training. And in any case, always (manually) train spam with low-ish Bayes probability. Likewise for ham that doesn't already have a very low Bayes probability. In non-high-volume environments, there's hardly any downside to training the extremes, too. Learning hand-confirmed non-extremes is always worth it. [1] http://spamassassin.apache.org/doc/Mail_SpamAssassin_Plugin_AutoLearnThreshold.html
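The two auto-learn thresholds are separate from the score of 5 that marks mail as spam. The Python sketch below shows the basic decision: mail scoring below the non-spam threshold is learned as ham, mail scoring above the spam threshold is learned as spam, and everything in between is not learned. Defaults are 0.1 and 12.0, as confirmed above. This is a simplification. Per the AutoLearnThreshold plugin documentation, the score used for learning is computed differently from the final score, and further conditions apply before spam is learned. Exact boundary handling is also assumed here.

```python
def autolearn_decision(score, nonspam=0.1, spam=12.0):
    """Return 'ham', 'spam' or None (not learned) for a learning score."""
    if score < nonspam:
        return "ham"
    if score > spam:
        return "spam"
    return None

for s in (-1.5, 3.0, 6.2, 14.8):
    print(s, autolearn_decision(s))
# 6.2 is tagged as spam (above required_score 5) but is not auto-learned.
```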
Re: Forged yahoo and mass mailers
Hi, This was a set of rules created by Mark back in 2011. Thanks for not flaming me. Heh. ;) Sorry, but I kind of expect some due diligence, in particular from long-time and experienced community members. Coming across blatantly obvious cases of local rules being reported as misfiring might make me snappy. Think about it this way: In order to help you, my first step is to find out details about those rules (grep stock cf files) and their respective score (your sample). You provided an exemplary, flawless sample. Why did you not have a look at the rules' sources? It really was a temporary lapse. I'm now managing so much, and thought for sure it was an SA rule since I didn't immediately recognize it. Also, my local rules all begin with LOC_, or immediately recognizable KAM_ or AXB_. The rule itself was not that bad. Actually, as Kevin and Anthony pointed out, Yahoo even expressly states in their DMARC records you should never have genuinely received those messages, nor accepted them. Yahoo classifies it as forged. It is the mass mailer's and its client's fault. (Back to the cheap part. Doing mass mailings but don't own your own domain? Accepting and actually using a free-mailer address as sender? Even worse, failing to get the note about Yahoo DMARC policy in that business?) Great points. I've found the rule hits a very large amount of ham, even some that's been whitelisted. Investigating a bit further, it appears to hit quite a few messages that indeed pass through yahoo.com. I've included one such example set of headers here: http://pastebin.com/XiHpRbJb However, it doesn't have the p=reject DKIM auth statement, so I don't yet fully understand how it all works. It hit DKIM_SIGNED but not DKIM_VALID, and in fact hit T_DKIM_INVALID. Thanks, Alex
Re: auto-learn
On Tue, 2014-06-10 at 05:13 +0200, Karsten Bräckelmann wrote: On Mon, 2014-06-09 at 21:40 -0500, Chris wrote: Since having to wipe my bayes db, I've been thinking for a while about going back to an 'auto-learn' setup. It's been so long since I did this that I have a fairly dumb question. Do I need the two lines below to be set, and if so, is this the correct setting? Anything here above a score of 5 is considered spam. # bayes_auto_learn_threshold_nonspam 0.1 # bayes_auto_learn_threshold_spam 12.0 Answering the direct questions first: Yes, that is correct syntax. No, you don't need them (commented out), they are default. An auto-learning setup generally isn't a bad idea, and actually default. Depending on your amount of messages, you might want to have a look at the recent train-on-error option [1]. If (since) there was any need to wipe your old Bayes DB and start fresh, I seriously recommend continued manual training. And in any case, always (manually) train spam with low-ish Bayes probability. Likewise for ham that doesn't already have a very low Bayes probability. In non-high-volume environments, there's hardly any downside to training the extremes, too. Learning hand-confirmed non-extremes is always worth it. [1] http://spamassassin.apache.org/doc/Mail_SpamAssassin_Plugin_AutoLearnThreshold.html Thanks very much Karsten for the quick reply. -- Chris
Re: DMARC policy check with AskDNS possible?
On Jun 7, 2014, at 9:49 PM, Christian Laußat us...@spamassassin.shambhu.info wrote: On 07.06.2014 19:55, Franck Martin wrote: As DMARC provides a feedback mechanism to the sender, it is up to the sender to deal with these issues; you are just following their policy, and you don't need to second-guess them. You can use some whitelists in OpenDMARC for the streams you really care about, like mailing lists. There are usually not too many. The default option of OpenDMARC is to not reject, so that if you forgot to set up OpenDKIM or SPF you don't start rejecting all incoming mail. Once you are happy with the config, you ought to change that option. The problem is that the sender is not the postmaster, so when e.g. yahoo.com changed its policy to p=reject, many senders were blocked without even knowing why. There are many postmasters who think they understand DMARC and set a wrong policy. For human interaction, the DMARC policy should be p=none, and p=reject should only be used for automated mailing systems, e.g. shopping systems and banks. This is not correct. I think it is strange to claim that Yahoo or AOL, being co-creators of DMARC and having outstanding engineers in the profession, do not know what they are doing. So it's your decision whether you would risk losing some e-mail, but for me it is just another indicator for SpamAssassin to rate the mail. Thanks to the monitoring mode, by the time you move to p=reject you know from all the aggregate reports exactly how much mail you will lose. As you take control of your email streams, you reach a sweet spot where stopping exact-domain spoofing matters more than losing some emails. Your mileage may vary. If you let OpenDMARC block on policy failures, why don't you let OpenDKIM block on DKIM failures and SPF-milter on SPF failures? Blocking on only one criterion leads to many false positives. That's the power of SpamAssassin: it combines many rating points and then decides whether it's spam or not.
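Christian's position, treating DMARC as one more scoring signal rather than a hard reject, can be sketched as follows. A DMARC check passes when SPF or DKIM passes and is aligned with the From: domain; the published p= tag then states what the domain owner asks receivers to do with failures. This is a simplified illustration: the function names and the per-policy weights are hypothetical (not SpamAssassin scores), and it ignores the pct, sp, adkim and aspf tags as well as the DNS lookup of the _dmarc TXT record itself.

```python
def parse_dmarc_record(txt):
    """Parse a DMARC TXT record, e.g. 'v=DMARC1; p=reject; rua=mailto:...',
    into a dict of lowercase tag -> value. Returns None if it is not DMARC1."""
    tags = {}
    for part in txt.split(";"):
        part = part.strip()
        if "=" not in part:
            continue  # skip empty or malformed fragments
        key, value = part.split("=", 1)
        tags[key.strip().lower()] = value.strip()
    if tags.get("v") != "DMARC1":
        return None
    return tags


def dmarc_passes(spf_aligned_pass, dkim_aligned_pass):
    # DMARC passes if at least one mechanism passes AND is aligned
    # with the domain in the From: header.
    return spf_aligned_pass or dkim_aligned_pass


# Hypothetical score contributions per published policy (not SA defaults).
POLICY_WEIGHTS = {"reject": 3.0, "quarantine": 1.5, "none": 0.0}


def dmarc_score(record_txt, spf_aligned_pass, dkim_aligned_pass):
    """Return a score contribution instead of rejecting outright."""
    tags = parse_dmarc_record(record_txt) if record_txt else None
    if tags is None or dmarc_passes(spf_aligned_pass, dkim_aligned_pass):
        return 0.0  # no DMARC record, or the message passed
    return POLICY_WEIGHTS.get(tags.get("p", "none").lower(), 0.0)
```

In a filter built this way, a DMARC failure under p=reject nudges the total score upward while other rules still weigh in, rather than deciding the outcome alone.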
DKIM and SPF do not provide any reporting to the sender telling them how many emails were blocked or rejected. DKIM has no policy mechanism; only SPF does. So as a sender with SPF -all, you have no idea how many emails are blocked, and very few are willing to take that risk. With DMARC, you know exactly which emails are getting blocked or rejected.
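Franck's point about feedback refers to DMARC aggregate (rua) reports: XML documents that list, per source IP, how many messages the receiver saw and what disposition it applied. A minimal sketch of tallying those counts with Python's standard library follows. It assumes the RFC 7489 report layout without an XML namespace, and `tally_dispositions` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def tally_dispositions(report_xml):
    """Sum message counts per disposition (none/quarantine/reject)
    in a DMARC aggregate report, assuming no XML namespace."""
    root = ET.fromstring(report_xml)
    totals = Counter()
    for record in root.iter("record"):
        row = record.find("row")
        if row is None:
            continue  # malformed record without a <row>
        count = int(row.findtext("count", default="0"))
        disposition = row.findtext("policy_evaluated/disposition",
                                   default="unknown")
        totals[disposition] += count
    return totals
```

Summing the reject entries across the reports received while still at p=none or p=quarantine is how a sender can estimate, before switching, how much mail p=reject would cost.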