Re: Harvested Fresh .cn URIBL
Warren Togami wrote:
> http://ruleqa.spamassassin.org/20091006-r822170-n/T_CN_URL/detail
> A very sizeable amount of spam (currently 50%) contains .cn domains that were registered very recently. They keep registering new domains in order to keep ahead of the URIBLs.

I have an account here that gets a lot of spam. Today, 263 unique .cn domain names appeared in URLs in the spam message bodies received by that account. All but 94 of them were listed in URIBL or SURBL.

If I make an HTTP request to http://thedomain/ for each of those domains, every single page returned matches one of the following two regexes:

<link [^>]*href="/themes/express/img/pharmacyexpress\.ico" [^>]*>
<title>Prestige Replicas : Luxury at affordable prices!</title>

A while ago, when the groups.yahoo.com spam was happening, I wrote a module that pulled down those pages; every single one of them contained HTML like this:

<font color="red" size="6"><b>CLICK HERE TO ENTER!</b></font></a>

I've updated it to make HTTP requests to the .cn domains now too. It uses memcache to avoid repeated requests for the same websites.

This is usually the point where someone asks for the source code, so even though it's not fully ready for other people to use, I've temporarily put it up at https://secure.grepular.com/WebsiteScanner/ in case anyone wants to pick it apart and use bits of it.

--
Mike Cardwell - IT Consultant and LAMP developer
Cardwell IT Ltd. (UK Reg'd Company #06920226) http://cardwellit.com/
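The page-fingerprint check Mike describes can be sketched in Python as follows. This is not his module (that is at the URL above). `PageScanner`, `http_fetch` and `FINGERPRINTS` are hypothetical names. A plain dict stands in for memcache. The fetcher is injectable so a different HTTP client can be used in its place. The two patterns are the ones quoted in the message, matched case-sensitively.

```python
import http.client
import re
import urllib.request
from typing import Callable, Dict, Optional

# The two page fingerprints quoted in the message (case-sensitive, as written).
FINGERPRINTS = [
    re.compile(r'<link [^>]*href="/themes/express/img/pharmacyexpress\.ico" [^>]*>'),
    re.compile(r"<title>Prestige Replicas : Luxury at affordable prices!</title>"),
]


def http_fetch(url: str, timeout: float = 10.0, limit: int = 65536) -> Optional[str]:
    """Fetch at most `limit` bytes of a page; return None on any network/HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read(limit).decode("utf-8", errors="replace")
    except (OSError, ValueError, http.client.HTTPException):
        return None


class PageScanner:
    """Fetch http://<domain>/ once per domain and test it against FINGERPRINTS.

    A plain dict stands in for memcache, so results are cached per process only.
    """

    def __init__(self, fetch: Callable[[str], Optional[str]] = http_fetch):
        self.fetch = fetch
        self.cache: Dict[str, bool] = {}

    def is_spam_site(self, domain: str) -> bool:
        domain = domain.lower().rstrip(".")
        if domain not in self.cache:
            body = self.fetch(f"http://{domain}/")
            self.cache[domain] = body is not None and any(
                rx.search(body) for rx in FINGERPRINTS
            )
        return self.cache[domain]
```

Usage would be something like `PageScanner().is_spam_site("somedomain.cn")`. Only the first call for a given domain triggers an HTTP request.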
Harvested Fresh .cn URIBL
http://ruleqa.spamassassin.org/20091006-r822170-n/T_CN_URL/detail
A very sizeable amount of spam (currently 50%) contains .cn domains that were registered very recently. They keep registering new domains in order to stay ahead of the URIBLs.

http://ruleqa.spamassassin.org/20091006-r822170-n/T_CN_EIGHT/detail
Last month, I noticed that a very sizeable percentage of the .cn spam used fresh, random \w{8}.cn domain names.

http://ruleqa.spamassassin.org/20091007-r822624-n/T_CN_SEVEN/detail
I don't know if it was due to our discussion here, but for whatever reason I began seeing new spam with \w{7}.cn domains registered since October 3rd, and \w{8}.cn seems to be tapering off now.

http://spameatingmonkey.com/lists.html#SEM-FRESH
Matching \w{8}.cn, or any other fixed length, is unsafe to use as a real rule. The only safe way to detect these fresh .cn domains would be a URIBL. But URIBLs like SEM-FRESH, described at the link above, can only know about new domains in TLDs that provide zone files that can be compared.

It seems then that the only way to feed a URIBL fresh .cn domains would be a spam trap. This proposed URIBL would be extremely easy to build on the infrastructure of existing trap-based DNSBLs like PSBL, HOSTKARMA or SEM. My own volume of spam is too small to do this.

A targeted URIBL, verified by whois for registration dates, would be nearly 100% accurate and would deserve a high score. This would hopefully break the economic feasibility of .cn URI spam by quickly rendering fresh domains useless.

This could be a new URIBL or an existing one. If it is an existing URIBL, SpamAssassin can use a meta rule to combine the URIBL hit with a .cn URI match and assign a higher score. Example:

  meta FRESHCN_7 SOME_URIBL && CN_URL
  score FRESHCN_7 0 4.0 0 4.0

Spam Trap Workflow
==================
1. Spam trap receives spam containing a .cn URI.
2. Look up locally: is this .cn domain already known?
3. If already known, stop.
4. Look up the A record of this domain. If NXDOMAIN, stop.
5. Record the domain in the database with an UNKNOWN registration date.
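Here is a rough Python version of the spam trap workflow above. This is a minimal sketch, not an existing implementation. `handle_trap_domains`, the `resolves` helper and the in-memory `db` dict are hypothetical. In `db`, `None` stands for an UNKNOWN registration date. The sketch assumes hostnames have already been extracted from the trapped spam (step 1), and it treats any resolver failure, not only NXDOMAIN, as a reason to stop.

```python
import socket
from datetime import date
from typing import Callable, Dict, Iterable, List, Optional


def resolves(domain: str) -> bool:
    """True if the system resolver returns an IPv4 (A) address for `domain`.

    Any lookup failure counts as "does not resolve", not just NXDOMAIN.
    """
    try:
        socket.getaddrinfo(domain, None, socket.AF_INET)
        return True
    except (socket.gaierror, UnicodeError):
        return False


def handle_trap_domains(
    domains: Iterable[str],
    db: Dict[str, Optional[date]],
    has_a_record: Callable[[str], bool] = resolves,
) -> List[str]:
    """Apply steps 2-5 to hostnames already pulled from a trapped spam (step 1)."""
    added: List[str] = []
    for raw in domains:
        domain = raw.lower().rstrip(".")
        if not domain.endswith(".cn"):
            continue
        if domain in db:  # steps 2-3: already known, stop
            continue
        if not has_a_record(domain):  # step 4: no A record (e.g. NXDOMAIN), stop
            continue
        db[domain] = None  # step 5: record with UNKNOWN registration date
        added.append(domain)
    return added
```

The resolver check is passed in as a parameter, so a DNS library with explicit NXDOMAIN handling could replace `resolves` without touching the workflow logic.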
URIBL Generation Workflow
=========================
1. If a domain has an UNKNOWN registration date, attempt a whois lookup. Record the registration date if found.
2. Ignore all UNKNOWN records.
3. Dump all domains registered in the last 7 days into one zone.
   score FRESHCN_7 0 4.0 0 4.0
4. Dump all domains registered in the last 14 days into another zone.
   score FRESHCN_14 0 2.0 0 2.0
5. Stop listing anything older than 14 days. By then the regular URIBLs have listed these domains.
6. Do not delete older .cn domains. Keeping them in the database prevents redundant whois lookups later.

The only challenging part here is whois rate limiting. Whois lookups are critical to populating this URIBL, but they are a resource that can only be used in small quantities. The above workflow attempts to minimize the number of whois lookups. Given that only spammers would send mail to a trap, the number of .cn domain names might be small enough to handle with whois lookups.

The goal here is to break the economic model. I'm told that .cn domains cost $3-10 each to register, and whois lookups are certainly cheaper to automate. I can't find a published whois rate limit for CNNIC. In any case, it wouldn't be difficult for us to proxy whois lookups to get around rate limits should that become necessary.

Opinions of this proposal? Is anyone from PSBL, HOSTKARMA, or SEM interested?

Warren Togami
wtog...@redhat.com
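Steps 1 through 6 of the generation workflow can be sketched as follows. As in the trap sketch, a dict serves as the database and `None` means UNKNOWN. `build_zones`, the injected `whois_lookup` callable and the per-run `whois_budget` cap are hypothetical. The budget is one simple way to respect whois rate limits and is not part of the proposal itself. Whether "the last 7 days" includes day 7 is a choice; this sketch includes it.

```python
from datetime import date
from typing import Callable, Dict, List, Optional, Tuple


def build_zones(
    db: Dict[str, Optional[date]],
    whois_lookup: Callable[[str], Optional[date]],
    today: date,
    whois_budget: int,
) -> Tuple[List[str], List[str]]:
    """Return (7-day zone, 14-day zone); `db` is updated in place, never pruned."""
    # Step 1: whois only the UNKNOWN (None) records, at most `whois_budget` per run.
    for domain in sorted(d for d, reg in db.items() if reg is None):
        if whois_budget <= 0:
            break
        whois_budget -= 1
        db[domain] = whois_lookup(domain)  # may still be None if whois had no date

    zone7: List[str] = []
    zone14: List[str] = []
    for domain, reg in sorted(db.items()):
        if reg is None:  # Step 2: ignore UNKNOWN records
            continue
        age = (today - reg).days
        if age <= 7:  # Step 3
            zone7.append(domain)
        if age <= 14:  # Step 4; anything older simply drops out (step 5)
            zone14.append(domain)
    # Step 6: nothing is deleted from db, so known domains never need whois again.
    return zone7, zone14
```

Domains that whois could not date stay UNKNOWN and are retried on later runs, within the budget.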
Re: Harvested Fresh .cn URIBL
On 10/7/2009 5:00 PM, Warren Togami wrote:
> It seems then the only way to feed a URIBL fresh .cn domains would be a spam trap. This proposed URIBL would be extremely easy to build on the infrastructure of existing trap-based DNSBLs like PSBL, HOSTKARMA or SEM. My own volume of spam is too small to do this.

You haven't really looked into the Spamhaus data and SA rules, have you? Have you looked into SURBL/URIBL's data and datafeeds? What's the deal here? Don't you represent Red Hat? Do you have no access to Red Hat's spam data? I just can't imagine that Red Hat can't provide itself or the community with plenty of spam data.

> Opinions of this proposal?

Sorry, but IMO this is reinventing the wheel, and some time too late.
Re: Harvested Fresh .cn URIBL
Hi Warren!

>> It seems then the only way to feed a URIBL fresh .cn domains would be a spam trap. This proposed URIBL would be extremely easy to build on the infrastructure of existing trap-based DNSBLs like PSBL, HOSTKARMA or SEM. My own volume of spam is too small to do this.
> You haven't really looked into the Spamhaus data and SA rules, have you? Have you looked into SURBL/URIBL's data and datafeeds? What's the deal here? Don't you represent Red Hat? Do you have no access to Red Hat's spam data? I just can't imagine that Red Hat can't provide itself or the community with plenty of spam data.
>> Opinions of this proposal?

How many whois lookups do you think you can do a day before you get blocked? If you can get a copy of the CN root zone, existing blacklists can do the work for you. If not, give the current blacklists like SURBL and URIBL a bit more credit. We are doing exactly that. What you are describing is a reinvention of the DOB list. Your method is loose and dangerous, while other blacklists already have mechanisms in place to avoid false positives.

We are working on getting .CN zone access. That's the only way to speed things up. The only challenging part is getting a copy of the CN zone, just like we get copies of other ccTLDs/gTLDs. What you describe isn't new and isn't exciting. For me it's a daily thing. It's routine.

Check SURBL and URIBL, and try to understand how those lists work. In the last 48 hours we added 1211 .CN domains to SURBL. Check, for example, how URIBL GOLD works.

Remember that a large part of filtering tips and tricks is not talking about what you are doing, just doing it. Describing in detail how you do things only gives spammers an advantage and doesn't do the community any good.

You seem to have just started working on SA and would like to help. That's good, but don't end up making a whole lot of noise. If Red Hat is serious about this, attend conferences like MAAWG and talk with people there; talk with the blacklist operators, many of whom attend those events. Please don't flood people on the users list.
There are most likely better lists for starting discussions like this.

Bye,
Raymond.
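The zone-file comparison behind lists like SEM-FRESH and the DOB list, which Raymond says blacklists want .CN zone access for, comes down to a set difference between two days' delegations. Here is a rough sketch. It assumes a simplified zone dump with one record per line and an explicit owner name on every NS record. A real zone file needs a proper parser to handle $ORIGIN, blank owner fields and multi-line records. `delegated_names` and `newly_delegated` are hypothetical names.

```python
from typing import Iterable, Set


def delegated_names(zone_lines: Iterable[str], tld: str = "cn") -> Set[str]:
    """Owner names of NS records below `tld` in a simplified zone dump."""
    names: Set[str] = set()
    suffix = "." + tld
    for line in zone_lines:
        line = line.split(";", 1)[0].strip()  # drop comments
        if not line or line.startswith("$"):  # skip blanks and $TTL/$ORIGIN
            continue
        fields = line.split()
        # Record type sits within the next three fields (TTL and class are optional).
        if "NS" not in (f.upper() for f in fields[1:4]):
            continue
        owner = fields[0].rstrip(".").lower()
        if owner.endswith(suffix):
            names.add(owner)
    return names


def newly_delegated(yesterday: Iterable[str], today: Iterable[str], tld: str = "cn") -> Set[str]:
    """Domains delegated in today's dump but absent from yesterday's."""
    return delegated_names(today, tld) - delegated_names(yesterday, tld)
```

Once the diff is available, no whois lookups are needed. The "registration date" is simply the first day a name appears in the zone.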
Re: Harvested Fresh .cn URIBL
On 10/07/2009 11:27 AM, Raymond Dijkxhoorn wrote:
> We are working on getting .CN zone access. That's the only way to speed things up. The only challenging part is getting a copy of the CN zone, just like we get copies of other ccTLDs/gTLDs.

OK, I was under the impression that it was impossible to obtain zone access. If this happens, then great!

Warren
Re: Harvested Fresh .cn URIBL
Warren Togami wrote:
> Opinions of this proposal?

I would love to have a listing of recently registered .cn domains, but until the TLD operator starts working with us that just isn't going to happen. Trying to perform a whois lookup on every domain is painfully slow. Once you detect a high enough volume of .cn domains it will become impossible, and that is assuming you are never rate limited. On top of that, most of the time when I do a whois lookup on a .cn domain, I find the destination whois server unresponsive, stuck in maintenance mode, or returning no data except the domain name and the listed nameservers.

Spam from .cn domains can be mitigated with the right rules and by querying multiple lists. I know my users never see .cn domains in their inbox, and if I didn't run a blacklist I wouldn't either.

--Blaine
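A bare-bones client shows where Blaine's problems with slow and unresponsive servers appear. The whois protocol (RFC 3912) is just a query plus CRLF sent over TCP port 43, and the answer is read until the server closes the connection. This sketch adds a timeout and returns `None` when the server does not answer. The registry host is a parameter. The date heuristic in `registration_date` is an assumption: it takes any "key: YYYY-MM-DD" line whose key mentions registration or creation. Output formats differ between registries, and some records carry no date at all.

```python
import re
import socket
from datetime import date
from typing import Optional

# Assumed heuristic: a "key: YYYY-MM-DD" line whose key mentions registration/creation.
DATE_LINE = re.compile(
    r"^[ \t]*[^:\n]*(?:regist|creat)[^:\n]*:[ \t]*(\d{4})-(\d{2})-(\d{2})",
    re.IGNORECASE | re.MULTILINE,
)


def whois_query(domain: str, host: str, port: int = 43, timeout: float = 10.0) -> Optional[str]:
    """Send `domain` + CRLF and read until EOF; None if unreachable or too slow."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(domain.encode("ascii") + b"\r\n")
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:  # server closed the connection: response complete
                    break
                chunks.append(data)
    except OSError:  # refused, unreachable, or timed out
        return None
    return b"".join(chunks).decode("utf-8", errors="replace")


def registration_date(whois_text: Optional[str]) -> Optional[date]:
    """Extract a registration/creation date, or None if the record has none."""
    if not whois_text:
        return None
    m = DATE_LINE.search(whois_text)
    if m is None:
        return None
    return date(int(m.group(1)), int(m.group(2)), int(m.group(3)))
```

Every `None` returned here becomes an UNKNOWN record in a date-based list. With servers in the state Blaine describes, that is likely to be a large share of lookups.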
Re: Harvested Fresh .cn URIBL
Blaine Fleming wrote:
> I know my users never see .cn domains in their inbox, and if I didn't run a blacklist I wouldn't either.

Which brings up an interesting idea. I wonder how many legitimate, non-spam .cn domains exist? Surely it is a fraction of a percent of the number of .cn domains used for spam purposes, correct? If that assertion is right, then maybe it makes more sense to build a really good and comprehensive .cn whitelist. Then create a rule in SA whereby .cn domains not on that whitelist would add a point or two to the score. (It shouldn't be used to block outright, because of its guilty-until-proven-innocent stance! ...and it shouldn't be a default SA rule.)

I might be interested in maintaining such a freely available list, accessible via rsync at no charge, if someone else would come up with the SA rule or plugin. My unique contribution could be using my own invaluement URI ratings engine to rate potential candidates for whitelisting. This would separate most of the wheat from the chaff with little effort, as long as the volume of submitted entries was kept reasonably low.

--
Rob McEwen
http://dnsbl.invaluement.com/
r...@invaluement.com
+1 (478) 475-9032
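Rob's whitelist rule could be prototyped outside SpamAssassin along these lines. This is a hypothetical sketch, not an SA plugin. `cn_registered_domain` only knows a handful of .cn second-level zones, and a Public Suffix List library would be more robust. The whitelist is an in-memory set rather than an rsync'd file. The 1.5-point penalty is an arbitrary value within the "point or two" suggested above. As with an SA rule, the penalty is added once per message, not once per URI.

```python
from typing import Iterable, Optional, Set
from urllib.parse import urlsplit

# Only a few well-known .cn second-level zones; the real list is longer
# (e.g. provincial zones such as bj.cn).
CN_SECOND_LEVEL = {"com", "net", "org", "gov", "edu", "ac"}


def cn_registered_domain(host: str) -> Optional[str]:
    """Map a hostname to its registered .cn domain, or None if it is not under .cn."""
    labels = host.lower().rstrip(".").split(".")
    if len(labels) < 2 or labels[-1] != "cn":
        return None
    if len(labels) >= 3 and labels[-2] in CN_SECOND_LEVEL:
        return ".".join(labels[-3:])
    return ".".join(labels[-2:])


def cn_whitelist_score(urls: Iterable[str], whitelist: Set[str], penalty: float = 1.5) -> float:
    """Return `penalty` once if any .cn domain in the message is not whitelisted."""
    for url in urls:
        try:
            host = urlsplit(url).hostname
        except ValueError:  # malformed URL, e.g. a broken IPv6 literal
            continue
        if not host:
            continue
        domain = cn_registered_domain(host)
        if domain is not None and domain not in whitelist:
            return penalty
    return 0.0
```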
Re: Harvested Fresh .cn URIBL
Blaine Fleming wrote:
> Warren Togami wrote:
>> Opinions of this proposal?
> I would love to have a listing of recently registered .cn domains, but until the TLD operator starts working with us that just isn't going to happen. Trying to perform a whois lookup on every domain is painfully slow. Once you detect a high enough volume of .cn domains it will become impossible, and that is assuming you are never rate limited. On top of that, most of the time when I do a whois lookup on a .cn domain, I find the destination whois server unresponsive, stuck in maintenance mode, or returning no data except the domain name and the listed nameservers.
>
> Spam from .cn domains can be mitigated with the right rules and by querying multiple lists. I know my users never see .cn domains in their inbox, and if I didn't run a blacklist I wouldn't either.

Instead of blacklisting new domains (which is apparently difficult to do), why not blacklist all .cn domains (or simply all domains) newer than xxx days? If they're older than xxx days and not yet on another blacklist for sending actual spam, return a neutral response.

Terry
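Terry's rule is simple once a creation date is available. The creation date is the input Blaine says whois cannot reliably provide. Here is a minimal sketch with a hypothetical `age_based_answer` function, where the unspecified "xxx days" is a parameter. Older domains get a neutral answer and are left to the existing blacklists.

```python
from datetime import date
from typing import Optional


def age_based_answer(created: Optional[date], today: date, max_age_days: int) -> str:
    """'listed' for domains younger than max_age_days, 'neutral' for older ones.

    Older domains are left to the existing blacklists; a missing creation date
    yields 'unknown', which is the hard case in practice.
    """
    if created is None:
        return "unknown"
    if (today - created).days < max_age_days:
        return "listed"
    return "neutral"
```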
Re: Harvested Fresh .cn URIBL
On Wed, 7 Oct 2009, Terry Carmen wrote:
> Instead of blacklisting new domains (which is apparently difficult to do), why not blacklist all .cn domains (or simply all domains) newer than xxx days? If they're older than xxx days and not yet on another blacklist for sending actual spam, return a neutral response.

How does that simplify the problem? The difficulty is in getting data about when a domain was created.

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
Of the twenty-two civilizations that have appeared in history, nineteen of them collapsed when they reached the moral state the United States is in now. -- Arnold Toynbee
---
6 days since a sunspot last seen - EPA blames CO2 emissions
Re: Harvested Fresh .cn URIBL
> Spam from .cn domains can be mitigated with the right rules and by querying multiple lists. I know my users never see .cn domains in their inbox, and if I didn't run a blacklist I wouldn't either.

Indeed, spam with .cn URIs really doesn't appear to be a problem at all. It is well covered by the existing URI DNSBLs (which are doing an awesome job, btw) and the rest of the SA rules. There is no value in additional rules that catch spam that already scores high. It's the low scorers that need our attention.

> Instead of blacklisting new domains (which is apparently difficult to do), why not blacklist all .cn domains (or simply all domains) newer than xxx days?

It has been pointed out before, but it is still kind of funny how this thread reinvents existing techniques and reiterates the very same, often-discussed problems. And keeps doing so. I haven't seen anything new so far. The design of a BL is pretty off-topic here anyway, even more so on the users list.

guenther

--
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
Re: Harvested Fresh .cn URIBL
On Wed, 2009-10-07 at 14:01 -0400, Rob McEwen wrote:
> I might be interested in maintaining such a freely available list, accessible via rsync at no charge, if someone else would come up with the SA rule or plugin.

I'd be interested in giving this a shot as a proof of concept. Details and further discussion off-list.
Re: Harvested Fresh .cn URIBL
John Hardin wrote:
> On Wed, 7 Oct 2009, Terry Carmen wrote:
>> Instead of blacklisting new domains (which is apparently difficult to do), why not blacklist all .cn domains (or simply all domains) newer than xxx days? If they're older than xxx days and not yet on another blacklist for sending actual spam, return a neutral response.
> How does that simplify the problem? The difficulty is in getting data about when a domain was created.

I thought the problem was in getting data for recently created domains, not all domains. If it's a problem with all domains, this won't help at all.

Terry
Re: Harvested Fresh .cn URIBL
Terry Carmen wrote:
> Instead of blacklisting new domains (which is apparently difficult to do), why not blacklist all .cn domains (or simply all domains) newer than xxx days? If they're older than xxx days and not yet on another blacklist for sending actual spam, return a neutral response.

How do you determine age? Whois queries really can't be used, for the reasons I mentioned in my previous post. I guess one option is to record the date the domain was first seen anywhere and work from that, but what about domains that are rarely used?

One of the big problems I see with trying to look at .cn domains in the wild is the lack of data. How many people here deal with a large volume of mail that would contain legitimate .cn domains? I seem to remember recently seeing that one of the big blacklists didn't have enough non-English mail to work with.

--Blaine
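Blaine's first-seen alternative could be tracked roughly like this. `FirstSeenTracker` is a hypothetical sketch backed by an in-memory dict. A domain can only be seen after it exists, so the apparent age is a lower bound on the real age. This is the rarely-used-domain problem he raises: an old domain looks brand new the first time it shows up.

```python
from datetime import date
from typing import Dict, Optional


class FirstSeenTracker:
    """Approximate domain age by the earliest date a domain was observed in mail."""

    def __init__(self) -> None:
        self.first_seen: Dict[str, date] = {}

    def observe(self, domain: str, when: date) -> None:
        """Record a sighting, keeping the earliest date (sightings may arrive out of order)."""
        key = domain.lower()
        prev = self.first_seen.get(key)
        if prev is None or when < prev:
            self.first_seen[key] = when

    def apparent_age_days(self, domain: str, today: date) -> Optional[int]:
        """Days since first sighting (a lower bound on true age), or None if never seen."""
        seen = self.first_seen.get(domain.lower())
        return None if seen is None else (today - seen).days
```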
Re: Harvested Fresh .cn URIBL
On Wed, 7 Oct 2009, Terry Carmen wrote:
> John Hardin wrote:
>> On Wed, 7 Oct 2009, Terry Carmen wrote:
>>> Instead of blacklisting new domains (which is apparently difficult to do), why not blacklist all .cn domains (or simply all domains) newer than xxx days? If they're older than xxx days and not yet on another blacklist for sending actual spam, return a neutral response.
>> How does that simplify the problem? The difficulty is in getting data about when a domain was created.
> I thought the problem was in getting data for recently created domains, not all domains. If it's a problem with all domains, this won't help at all.

The other part of the problem is determining the age of a domain. Absent a registrar feed, the only way to do that is a whois query, which may or may not return the data you need, and which is considered abusive when automated and done often.

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
We should endeavour to teach our children to be gun-proof rather than trying to design guns to be child-proof
---
6 days since a sunspot last seen - EPA blames CO2 emissions
Re: Harvested Fresh .cn URIBL
John Hardin wrote:
> The other part of the problem is determining the age of a domain. Absent a registrar feed, the only way to do that is a whois query, which may or may not return the data you need, and which is considered abusive when automated and done often.

It would be nice if Google could help out here. If anyone besides the registrar knows when a domain first shows up on the net, it's them.
Re: Harvested Fresh .cn URIBL
On 10/07/2009 03:29 PM, Jason Bertoch wrote:
> John Hardin wrote:
>> The other part of the problem is determining the age of a domain. Absent a registrar feed, the only way to do that is a whois query, which may or may not return the data you need, and which is considered abusive when automated and done often.
> It would be nice if Google could help out here. If anyone besides the registrar knows when a domain first shows up on the net, it's them.

They have no reason to help others filter spam. They have a competitive advantage. They would prefer that everyone use Gmail, or pay for their commercial spam filtering service for mail outside Gmail.

Warren
Re: Harvested Fresh .cn URIBL
On 10/7/2009 8:01 PM, Rob McEwen wrote:
> Blaine Fleming wrote:
>> I know my users never see .cn domains in their inbox, and if I didn't run a blacklist I wouldn't either.
> Which brings up an interesting idea. I wonder how many legitimate, non-spam .cn domains exist? Surely it is a fraction of a percent of the number of .cn domains used for spam purposes, correct?

Nope, there are zillions, and the number is growing. The same thought could apply to .com, .org or .net from the point of view of Chinese mom-and-pop users. And it's still pretty off-topic on the SA users list; it would be better placed on spam-l.com.
Re: Harvested Fresh .cn URIBL
Hi!

>>> The other part of the problem is determining the age of a domain. Absent a registrar feed, the only way to do that is a whois query, which may or may not return the data you need, and which is considered abusive when automated and done often.
>> It would be nice if Google could help out here. If anyone besides the registrar knows when a domain first shows up on the net, it's them.
> They have no reason to help others filter spam. They have a competitive advantage. They would prefer that everyone use Gmail, or pay for their commercial spam filtering service for mail outside Gmail.

Warren, that's a pretty silly statement. Are you aware that Google helps out a lot of blacklists with data? Apparently not.

Bye,
Raymond.
Re: Harvested Fresh .cn URIBL
Hi!

>> How does that simplify the problem? The difficulty is in getting data about when a domain was created.
> I thought the problem was in getting data for recently created domains, not all domains. If it's a problem with all domains, this won't help at all.

If you rely on whois data and do this for .nl, you will get fooled! The .nl whois only shows the date the domain was -first- registered. If a domain expires and someone registers it again a year later, it is effectively new, yet the whois still shows the date of the first registration.

If you want reliable data, especially for .cn, .hk and so on, you need to build a relationship with those registries. And so far (welcome to communism) not many people have succeeded there.

Bye,
Raymond.