Re: Harvested Fresh .cn URIBL

2009-10-08 Thread Mike Cardwell

Warren Togami wrote:


> http://ruleqa.spamassassin.org/20091006-r822170-n/T_CN_URL/detail
> A very sizeable amount of spam (currently 50%) contains .cn domains that
> were registered very recently.  They keep registering new domains in
> order to keep ahead of the URIBL's.


I have an account here that gets a lot of spam. There have been 263 
unique .cn domain names contained within urls in spam message bodies of 
that account today. All but 94 of them were listed in uribl or surbl.


If I do http requests on http://thedomain/ for each of those domains, 
every single one of the pages returned for all of those domains matches 
one of the following two regexes:


<link [^>]*href=/themes/express/img/pharmacyexpress\.ico [^>]*>
<title>Prestige Replicas : Luxury at affordable prices!</title>

I wrote a module a while ago when the groups.yahoo.com spam was 
happening which pulled down those pages and found that every single one 
of them contained html like this:


<font color=red size=6><b>CLICK HERE TO ENTER!</b></font></a>

I've updated it to do http requests on the .cn domains now too. It uses 
memcache to avoid repeated requests for the same websites.
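
A minimal sketch of that kind of check, not the actual module: the function names are hypothetical, a plain dict stands in for memcache, and the fetcher is passed in so the logic can be tried without touching the network. Making the attribute quotes optional in the first pattern is an assumption about the live markup.

```python
import re
from typing import Callable, Dict

# Patterns modelled on the two regexes quoted above. Optional attribute
# quotes are an assumption, not something confirmed by the original post.
SPAM_PAGE_PATTERNS = [
    re.compile(r"""<link [^>]*href=["']?/themes/express/img/pharmacyexpress\.ico["']? [^>]*>"""),
    re.compile(r"<title>Prestige Replicas : Luxury at affordable prices!</title>"),
]


def looks_like_spam_page(html: str) -> bool:
    """Return True if the page matches any known spam landing-page pattern."""
    return any(p.search(html) for p in SPAM_PAGE_PATTERNS)


def scan_domain(domain: str, fetch: Callable[[str], str], cache: Dict[str, bool]) -> bool:
    """Fetch http://<domain>/ at most once and remember the verdict in `cache`."""
    if domain in cache:
        return cache[domain]
    verdict = looks_like_spam_page(fetch("http://%s/" % domain))
    cache[domain] = verdict
    return verdict
```

In real use `fetch` would wrap an HTTP client with a short timeout, and `cache` would be a memcache client with an expiry so that verdicts eventually refresh.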


This is usually the point where someone asks for the source code, even 
though it's not fully ready for other people to use, so I've temporarily 
stuck it up at https://secure.grepular.com/WebsiteScanner/ in case 
anyone wants to pick it apart and use bits of it.


--
Mike Cardwell - IT Consultant and LAMP developer
Cardwell IT Ltd. (UK Reg'd Company #06920226) http://cardwellit.com/


Harvested Fresh .cn URIBL

2009-10-07 Thread Warren Togami

http://ruleqa.spamassassin.org/20091006-r822170-n/T_CN_URL/detail
A very sizeable amount of spam (currently 50%) contains .cn domains that 
were registered very recently.  They keep registering new domains in 
order to keep ahead of the URIBL's.


http://ruleqa.spamassassin.org/20091006-r822170-n/T_CN_EIGHT/detail
Last month, I noticed that a very sizeable percentage of the .cn spam 
were fresh and random \w{8}.cn domain names.


http://ruleqa.spamassassin.org/20091007-r822624-n/T_CN_SEVEN/detail
I don't know if it was due to our discussion here, but for whatever 
reason I began seeing new spam with \w{7}.cn domains registered since 
October 3rd, and \w{8}.cn seems to be tapering off now.


http://spameatingmonkey.com/lists.html#SEM-FRESH
\w{8}.cn or any length is unsafe to be used as a real rule.  The only 
safe way to detect these fresh .cn domains would be a URIBL.  But 
URIBL's like SEM-FRESH described here are only capable of knowing new 
domains of TLD's who provide zone files that can be compared.
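
To illustrate why a length-based pattern is unsafe as a real rule, here is a small sketch. The pattern is a hypothetical stand-in for the \w{8}.cn test, not the actual rule definition, and the domain names are made up for the example.

```python
import re

# Hypothetical stand-in for the eight-character test: exactly eight word
# characters followed by .cn, and nothing else.
CN_EIGHT = re.compile(r"^\w{8}\.cn$", re.IGNORECASE)


def is_eight_char_cn(domain: str) -> bool:
    """Return True if the domain is exactly <8 word chars>.cn."""
    return CN_EIGHT.match(domain) is not None


# A random-looking name and an ordinary-looking name are indistinguishable
# to the pattern: both are eight word characters long.
print(is_eight_char_cn("qzxkvbnm.cn"))   # random-looking label
print(is_eight_char_cn("shanghai.cn"))   # ordinary-looking label, same length
```

The pattern only sees length, so any ordinary eight-character name hits as well, which is why the age of the domain has to come from somewhere else.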


It seems then the only way to feed a URIBL fresh .cn domains would be a 
spam trap.  This proposed URIBL would be extremely easy to build on the 
infrastructure of existing trap-based DNSBL's like PSBL, HOSTKARMA or 
SEM.  My own volume of spam is too small to do this.


A targeted URIBL verified by whois for registration dates would be near 
100% accurate and deserving of a high score.  This would hopefully break 
the economic feasibility of .cn URI spam by rendering fresh domains 
quickly useless.  This could be a new URIBL, or an existing URIBL.  If 
this is an existing URIBL, spamassassin can use meta rules to boolean 
match .cn domains and assign a higher score.  Example:


meta FRESHCN_7 SOME_URIBL && CN_URL
score FRESHCN_7 0 4.0 0 4.0

Spam Trap Workflow
==================
1. Spam trap receives spam containing .cn URI.
2. Lookup locally, is this .cn domain already known?
3. If already known, stop.
4. Lookup A record of this domain.  If NXDOMAIN stop.
5. Record domain in database with UNKNOWN registration date.
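
The trap-side steps above could be sketched roughly as follows. This is only an illustration under stated assumptions: the database is a plain dict, `None` stands for an UNKNOWN registration date, and the function names are hypothetical. The default resolver check treats any lookup failure as "does not resolve", which is coarser than a true NXDOMAIN test.

```python
import datetime
import socket
from typing import Callable, Dict, Optional

# Maps domain -> registration date; None means UNKNOWN.
DomainDB = Dict[str, Optional[datetime.date]]


def resolves(domain: str) -> bool:
    """Rough A-record check. Any resolver failure counts as 'no record'."""
    try:
        socket.gethostbyname(domain)
        return True
    except socket.gaierror:
        return False


def record_trap_domain(domain: str, db: DomainDB,
                       has_a_record: Callable[[str], bool] = resolves) -> bool:
    """Apply steps 2-5 to one .cn domain seen in trap mail.

    Returns True only if the domain was newly recorded.
    """
    domain = domain.lower().rstrip(".")
    if domain in db:               # steps 2-3: already known, stop
        return False
    if not has_a_record(domain):   # step 4: does not resolve, stop
        return False
    db[domain] = None              # step 5: record with UNKNOWN date
    return True
```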

URIBL Generation Workflow
=========================
1. If domain has UNKNOWN registration date, attempt whois lookup.
   Record registration date if found.
2. Ignore all UNKNOWN records.
3. Dump all domains registered in the last 7 days into one zone.
   score FRESHCN_7 0 4.0 0 4.0
4. Dump all domains registered in the last 14 days into another zone.
   score FRESHCN_14 0 2.0 0 2.0
5. Stop listing anything older than 14 days.  By then the regular
   URIBL's have listed these domains.
6. Do not delete older .cn domains.  Keeping them in the database
   prevents redundant whois lookups later.
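
A rough sketch of steps 2 to 5 of the generation side, assuming the same kind of dict as a database (domain mapped to a registration date, or `None` for UNKNOWN). The function name is hypothetical. Read literally, the 14-day zone includes everything in the 7-day zone, and that is what the sketch does.

```python
import datetime
from typing import Dict, List, Optional, Tuple


def build_zones(db: Dict[str, Optional[datetime.date]],
                today: datetime.date) -> Tuple[List[str], List[str]]:
    """Return (zone_7_days, zone_14_days) as sorted lists of domains."""
    zone7: List[str] = []
    zone14: List[str] = []
    for domain, registered in sorted(db.items()):
        if registered is None:        # step 2: ignore UNKNOWN records
            continue
        age = (today - registered).days
        if 0 <= age <= 7:             # step 3: registered in the last 7 days
            zone7.append(domain)
        if 0 <= age <= 14:            # step 4: registered in the last 14 days
            zone14.append(domain)
        # step 5: anything older is simply not listed; it stays in db (step 6)
    return zone7, zone14
```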


The only challenging part here is whois lookup rate limiting.  whois 
lookups are critical to populating this URIBL, but it is a resource that 
can only be used in small quantities.  The above workflow attempts to 
minimize the number of whois lookups.
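
One simple way to keep whois usage small is to enforce a minimum gap between lookups. The sketch below is an assumption about how that could be done, not a description of any existing tool; the class name and the interval are hypothetical, and the clock and sleep functions are injectable so the behaviour can be checked without waiting.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Block so that consecutive wait() calls are at least min_interval apart."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Sleep just long enough to respect the minimum interval."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

Calling `wait()` immediately before each whois lookup spreads the queries out; combined with only looking up UNKNOWN domains, the total stays bounded by the number of new trap domains.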


Given that only spammers would send mail to a trap, the number of .cn 
domain names might be small enough to handle whois lookups.  The goal 
here is to break the economic model.  I'm told that .cn domains cost 
$3-10/each to register, and whois lookups are certainly cheaper to 
automate.  I can't find a published whois rate limit for CNNIC.  In any 
case, it wouldn't be difficult for us to proxy whois lookups to bypass 
rate limits should that become necessary.


Opinions of this proposal?

Is anyone from PSBL, HOSTKARMA, or SEM interested?

Warren Togami
wtog...@redhat.com


Re: Harvested Fresh .cn URIBL

2009-10-07 Thread Yet Another Ninja

On 10/7/2009 5:00 PM, Warren Togami wrote:
> It seems then the only way to feed a URIBL fresh .cn domains would be a
> spam trap.  This proposed URIBL would be extremely easy to build on the
> infrastructure of existing trap-based DNSBL's like PSBL, HOSTKARMA or
> SEM.  My own volume of spam is too small to do this.


you haven't really looked into Spamhaus data & SA rules, have you?
have you looked into SURBL/URIBL's data & datafeeds?

what's the deal here? do you not represent RedHat?
have no access to RH spam data?

just can't imagine RH can't provide itself or the community with plenty 
of spam data.



> Opinions of this proposal?


sorry - imo, reinventing the wheel some time too late




Re: Harvested Fresh .cn URIBL

2009-10-07 Thread Raymond Dijkxhoorn

Hi Warren!


>> It seems then the only way to feed a URIBL fresh .cn domains would be a
>> spam trap.  This proposed URIBL would be extremely easy to build on the
>> infrastructure of existing trap-based DNSBL's like PSBL, HOSTKARMA or SEM.
>> My own volume of spam is too small to do this.
>
> you haven't really looked into Spamhaus data & SA rules, have you?
> have you looked into SURBL/URIBL's data & datafeeds?
>
> what's the deal here? do you not represent RedHat?
> have no access to RH spam data?
>
> just can't imagine RH can't provide itself or the community with plenty of
> spam data.
>
>> Opinions of this proposal?


How many whois lookups do you think you can do a day before you get blocked?

If you can get a copy of the CN root zone, existing blacklists can do the
work for you. If not, give the current blacklists like SURBL and URIBL a
bit more credit. We are doing that exact same thing. You are describing,
or reinventing, the DOB list. Your method is loose and dangerous, while
other blacklists already have mechanisms in place to avoid false positives.

We are working on getting .CN zone access. That's the only way to speed
things up. The only challenging part is to get a copy of the CN zone just
like we get copies of other ccTLDs/gTLDs.


What you describe isn't new and isn't exciting. For me it's daily work. It's
routine. Check SURBL and URIBL. Try to understand how those lists work.

In the last 48 hours we added 1211 .CN domains to SURBL.
Check, for example, how URIBL GOLD works.

Remember that a large part of the filtering tips and tricks is not to talk
about what you are doing, just do it. Explaining in detail how you do things
will only give spammers an advantage and doesn't do the community any
good.

You seem to have just joined the SA work and would like to help. That's good,
but don't end up making a whole lot of noise.


If RH is serious about this, attend conferences like MAAWG and talk
with people there, talk with the blacklist guys, many are at those
events.


Don't flood people on the user list please. Most likely there are better
lists to start talks like this.


Bye,
Raymond.


Re: Harvested Fresh .cn URIBL

2009-10-07 Thread Warren Togami

On 10/07/2009 11:27 AM, Raymond Dijkxhoorn wrote:

> We are working on getting .CN zone access. That's the only way to speed
> things up. The only challenging part is to get a copy of the CN zone
> just like we get copies of other ccTLDs/gTLDs.


OK, I was under the impression that it was impossible to obtain zone 
access.  If this happens then great!


Warren


Re: Harvested Fresh .cn URIBL

2009-10-07 Thread Blaine Fleming
Warren Togami wrote:
> Opinions of this proposal?

I would love to have a listing of recently registered .cn domains but
until the TLD operator starts working with us that just isn't going to
happen.

Trying to perform a whois lookup on every domain is painfully slow.
Once you get a high enough volume of .cn domains detected it will become
impossible and that is assuming you are never rate limited.  On top of
that, most of the time when I do a whois lookup on a .cn domain I find
the destination whois server to be unresponsive, stuck in a maintenance
mode or doesn't include any data except the domain name and the listed
nameservers.

Spam from .cn domains can be mitigated with the right rules and querying
multiple lists.  I know my users never see .cn domains in their inbox
and if I didn't run a blacklist I wouldn't either.

--Blaine


Re: Harvested Fresh .cn URIBL

2009-10-07 Thread Rob McEwen
Blaine Fleming wrote:
> I know my users never see .cn domains in their inbox
> and if I didn't run a blacklist I wouldn't either.

Which brings up an interesting idea. I wonder how many legit non-spam
.cn domains exist? Surely it is a fraction of a percent of the # of .cn
domains used for spam purposes, correct?

If that assertion is right, then maybe it makes more sense to build a
really good and comprehensive .cn whitelist. Then create a rule in SA
whereby .cn domains not on that whitelist would add a point or two to
the score. (it shouldn't be used to outright block due to the guilty
until proven innocent stance!...and it shouldn't be a default SA rule)
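
As a sketch only, the scoring idea could look like this. The function name, the point value, and the assumption that URIs have already been reduced to their registered domain are all hypothetical; an actual SA rule or plugin would be written differently.

```python
from typing import Iterable, Set


def cn_whitelist_points(domains: Iterable[str], whitelist: Set[str],
                        points: float = 1.0) -> float:
    """Return `points` if any .cn domain in the message is not whitelisted.

    `domains` is assumed to hold registered domains already extracted
    from the message's URIs. The result is a small nudge to the score,
    never grounds for an outright block.
    """
    for domain in domains:
        domain = domain.lower().rstrip(".")
        if domain.endswith(".cn") and domain not in whitelist:
            return points
    return 0.0
```

Non-.cn domains are ignored entirely, so the check adds nothing for mail that does not link to .cn at all.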

I might be interested in maintaining such a freely available
list--accessible via rsync at no charge--if someone else would come up
with the SA rule or plugin. My unique contribution could be providing a
means for my own invaluement URI ratings engine to rate potential
candidates for whitelisting--this would separate most of the wheat from
the chaff with little effort--just as long as the entries submitted were
kept to a reasonably low volume.

-- 
Rob McEwen
http://dnsbl.invaluement.com/
r...@invaluement.com
+1 (478) 475-9032




Re: Harvested Fresh .cn URIBL

2009-10-07 Thread Terry Carmen

Blaine Fleming wrote:

> Warren Togami wrote:
>
>> Opinions of this proposal?
>
> I would love to have a listing of recently registered .cn domains but
> until the TLD operator starts working with us that just isn't going to
> happen.
>
> Trying to perform a whois lookup on every domain is painfully slow.
> Once you get a high enough volume of .cn domains detected it will become
> impossible and that is assuming you are never rate limited.  On top of
> that, most of the time when I do a whois lookup on a .cn domain I find
> the destination whois server to be unresponsive, stuck in a maintenance
> mode or doesn't include any data except the domain name and the listed
> nameservers.
>
> Spam from .cn domains can be mitigated with the right rules and querying
> multiple lists.  I know my users never see .cn domains in their inbox
> and if I didn't run a blacklist I wouldn't either.
  
Instead of blacklisting new domains (which is apparently difficult to 
do), why not blacklist all .cn domains (or simply all domains)  newer 
than xxx days?


If they're older than xxx days and not yet on another blacklist for 
sending actual spam, return a neutral response.


Terry






Re: Harvested Fresh .cn URIBL

2009-10-07 Thread John Hardin

On Wed, 7 Oct 2009, Terry Carmen wrote:

> Instead of blacklisting new domains (which is apparently difficult to
> do), why not blacklist all .cn domains (or simply all domains) newer
> than xxx days?
>
> If they're older than xxx days and not yet on another blacklist for
> sending actual spam, return a neutral response.


How does that simplify the problem? The difficulty is in getting data 
about when a domain was created.


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Of the twenty-two civilizations that have appeared in history,
  nineteen of them collapsed when they reached the moral state the
  United States is in now.  -- Arnold Toynbee
---
 6 days since a sunspot last seen - EPA blames CO2 emissions


Re: Harvested Fresh .cn URIBL

2009-10-07 Thread Karsten Bräckelmann
>> Spam from .cn domains can be mitigated with the right rules and querying
>> multiple lists.  I know my users never see .cn domains in their inbox
>> and if I didn't run a blacklist I wouldn't either.

Indeed, spam with .cn URIs really doesn't appear to be a problem at all.
They are well covered by the existing URI DNSBLs -- which are doing an
awesome job, btw -- and the rest of the SA rules.

There is no value in additional rules that catch spam that is already
high scoring. It's the low scorers that need our attention.


> Instead of blacklisting new domains (which is apparently difficult to
> do), why not blacklist all .cn domains (or simply all domains) newer
> than xxx days?

It has been pointed out before, but it still is kind of funny, how this
thread re-invents existing techniques and re-iterates the very same,
often discussed problems. And keeps doing so. I haven't seen anything
new so far.

The design of a BL is pretty off-topic here anyway, even more so on the
users list.

  guenther


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Harvested Fresh .cn URIBL

2009-10-07 Thread Karsten Bräckelmann
On Wed, 2009-10-07 at 14:01 -0400, Rob McEwen wrote:
> I might be interested in maintaining such a freely available
> list--accessible via rsync at no charge--if someone else would come up
> with the SA rule or plugin.

I'd be interested in giving this a shot, as a proof-of-concept. Any
details and further discussion off-list.


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Harvested Fresh .cn URIBL

2009-10-07 Thread Terry Carmen

John Hardin wrote:

> On Wed, 7 Oct 2009, Terry Carmen wrote:
>
>> Instead of blacklisting new domains (which is apparently difficult to
>> do), why not blacklist all .cn domains (or simply all domains) newer
>> than xxx days?
>>
>> If they're older than xxx days and not yet on another blacklist for
>> sending actual spam, return a neutral response.
>
> How does that simplify the problem? The difficulty is in getting data
> about when a domain was created.


I thought the problem was in getting data for recently created domains, 
not all domains.


If it's a problem with all domains, this won't help at all.

Terry



Re: Harvested Fresh .cn URIBL

2009-10-07 Thread Blaine Fleming
Terry Carmen wrote:

> Instead of blacklisting new domains (which is apparently difficult to
> do), why not blacklist all .cn domains (or simply all domains) newer
> than xxx days?
>
> If they're older than xxx days and not yet on another blacklist for
> sending actual spam, return a neutral response.


How do you determine age?  Whois queries really can't be used because of
the reasons I mentioned in my previous post.  I guess one option is to
record the date that the domain was first seen anywhere and then work
from that but what about the domains that are rarely used?
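
The first-seen idea, and its weakness, can be sketched in a few lines. The names are hypothetical and a dict stands in for whatever store would really be used.

```python
import datetime
from typing import Dict, Optional


def note_sighting(first_seen: Dict[str, datetime.date], domain: str,
                  today: datetime.date) -> None:
    """Remember the first date a domain was observed; later sightings are ignored."""
    first_seen.setdefault(domain.lower(), today)


def days_since_first_seen(first_seen: Dict[str, datetime.date], domain: str,
                          today: datetime.date) -> Optional[int]:
    """Return the observed age in days, or None if the domain was never seen."""
    seen = first_seen.get(domain.lower())
    return None if seen is None else (today - seen).days
```

The observed age is only a lower bound on the real age: an old but rarely used domain looks brand new the first time it turns up, which is exactly the case raised above.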

One of the big problems I see with trying to look at .cn domains in the
wild is the lack of data.  How many people here deal with a large volume
of mail that would have legitimate .cn domains?  I seem to remember
recently seeing one of the big blacklists not having enough non-English
mail to work with.

--Blaine


Re: Harvested Fresh .cn URIBL

2009-10-07 Thread John Hardin

On Wed, 7 Oct 2009, Terry Carmen wrote:


> John Hardin wrote:
>
>> On Wed, 7 Oct 2009, Terry Carmen wrote:
>>
>>> Instead of blacklisting new domains (which is apparently difficult to
>>> do), why not blacklist all .cn domains (or simply all domains) newer
>>> than xxx days?
>>>
>>> If they're older than xxx days and not yet on another blacklist for
>>> sending actual spam, return a neutral response.
>>
>> How does that simplify the problem? The difficulty is in getting data
>> about when a domain was created.
>
> I thought the problem was in getting data for recently created domains, not
> all domains.
>
> If it's a problem with all domains, this won't help at all.


The other part of the problem is determining the age of a domain. The only 
way to do that absent a registrar feed is to do a whois query, which may 
or may not return the data you need, and which is considered abusive when 
automated and done often.


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  We should endeavour to teach our children to be gun-proof
  rather than trying to design guns to be child-proof
---
 6 days since a sunspot last seen - EPA blames CO2 emissions


Re: Harvested Fresh .cn URIBL

2009-10-07 Thread Jason Bertoch

John Hardin wrote:


> The other part of the problem is determining the age of a domain. The
> only way to do that absent a registrar feed is to do a whois query,
> which may or may not return the data you need, and which is considered
> abusive when automated and done often.


It would be nice if Google could help out here...if anyone besides the 
registrar knows when a domain first shows up on the net, it's them.




Re: Harvested Fresh .cn URIBL

2009-10-07 Thread Warren Togami

On 10/07/2009 03:29 PM, Jason Bertoch wrote:

> John Hardin wrote:
>
>> The other part of the problem is determining the age of a domain. The
>> only way to do that absent a registrar feed is to do a whois query,
>> which may or may not return the data you need, and which is considered
>> abusive when automated and done often.
>
> It would be nice if Google could help out here...if anyone besides the
> registrar knows when a domain first shows up on the net, it's them.



They have no reason to help others filter spam.  They have a competitive 
advantage.  They would prefer everyone to use gmail or pay for their 
commercial spam filtering service for non-gmail.


Warren


Re: Harvested Fresh .cn URIBL

2009-10-07 Thread Yet Another Ninja

On 10/7/2009 8:01 PM, Rob McEwen wrote:

> Blaine Fleming wrote:
>
>> I know my users never see .cn domains in their inbox
>> and if I didn't run a blacklist I wouldn't either.
>
> Which brings up an interesting idea. I wonder how many legit non-spam
> .cn domains exist? Surely it is a fraction of a percent of the # of .cn
> domains used for spam purposes, correct?


nope.. there are zillions and growing.
same thought could apply to .com or .org or .net for Chinese mom & pop
users.


and it's still pretty off-topic in the SA users list and better placed in
spam-l.com


Re: Harvested Fresh .cn URIBL

2009-10-07 Thread Raymond Dijkxhoorn

Hi!


>>> The other part of the problem is determining the age of a domain. The
>>> only way to do that absent a registrar feed is to do a whois query,
>>> which may or may not return the data you need, and which is considered
>>> abusive when automated and done often.
>>
>> It would be nice if Google could help out here...if anyone besides the
>> registrar knows when a domain first shows up on the net, it's them.
>
> They have no reason to help others filter spam.  They have a competitive
> advantage.  They would prefer everyone to use gmail or pay for their
> commercial spam filtering service for non-gmail.


Warren, it's a pretty silly statement.

You are aware that google helps out a lot of blacklists with data?
Apparently not.

Bye,
Raymond.


Re: Harvested Fresh .cn URIBL

2009-10-07 Thread Raymond Dijkxhoorn

Hi!

>> How does that simplify the problem? The difficulty is in getting data about
>> when a domain was created.
>
> I thought the problem was in getting data for recently created domains, not
> all domains.
>
> If it's a problem with all domains, this won't help at all.


If you rely on whois data, and you do this for .nl you will get fooled!
The NL whois only shows the date that the domain was -first- registered.

If it's expired and a year later someone registers that same domain, so it
would be new for people, the whois shows the date the first registration 
was done.


If you want reliable data, especially for .cn, .hk and so on, you need to
build a relationship with those registries. And so far (welcome to communism)
not many people have succeeded there.


Bye,
Raymond.