Re: Harvested Fresh .cn URIBL

2009-10-07 Thread Raymond Dijkxhoorn

Hi!

How does that simplify the problem? The difficulty is in getting data about 
when a domain was created.


I thought the problem was in getting data for recently created domains, not 
all domains.


If it's a problem with all domains, this won't help at all.


If you rely on whois data, and you do this for .nl you will get fooled!
The NL whois only shows the date that the domain was -first- registered.

If it has expired and someone registers that same domain a year later, so that 
it looks new to people, the whois still shows the date of the first 
registration.


If you want reliable data, especially for .cn, .hk and so on, you need to 
build a relationship with those registries. And so far (welcome to communism) 
not many people have succeeded there.


Bye,
Raymond.


Re: Harvested Fresh .cn URIBL

2009-10-07 Thread Raymond Dijkxhoorn

Hi!


The other part of the problem is determining the age of a domain. The
only way to do that absent a registrar feed is to do a whois query,
which may or may not return the data you need, and which is considered
abusive when automated and done often.



It would be nice if Google could help out here...if anyone besides the
registrar knows when a domain first shows up on the net, it's them.


They have no reason to help others filter spam.  They have a competitive 
advantage.  They would prefer everyone to use gmail or pay for their 
commercial spam filtering service for non-gmail.


Warren, that's a pretty silly statement.

Are you aware that Google helps out a lot of blacklists with data?
Apparently not.

Bye,
Raymond.


Re: SpamAssassin Ruleset Generation

2009-10-07 Thread Karsten Bräckelmann
On Tue, 2009-10-06 at 13:50 -0700, an anonymous Nabble user wrote:
> Other than the sought rules, all the rules are manually generated?

Actually, as has been said, I believe all stock rules are manually
written. There are some third-party rule-sets out there that are auto
generated -- not limited to Sought.

> Are there any statistics on how frequently new rules/regexes are adopted by
> SpamAssassin? Who are the people who write them? Any details related to it?

Somehow this begs the question -- why?

Why are you asking? Why and what are you ultimately interested in?

And of course, did you even consider digging through the SVN repo, reading
some docs on the wiki, and asking Google? Most of this should be pretty easy
to find out if you're willing to read some.


-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Spam Eating Monkey?

2009-10-07 Thread Len Conrad
-- Original Message --
From: Warren Togami 
Date:  Sun, 04 Oct 2009 19:42:06 -0400

>http://spameatingmonkey.com
>
>Anyone have any experience using these DNSBL and URIBL's?

I plugged these into my main.cf just before "permit", and therefore before 
content-filter, to see what increment SEM caught that all my other envelope 
filtering missed. I haven't put SEM into SA yet.

 reject_rbl_client bl.spameatingmonkey.net,
 reject_rhsbl_sender fresh15.spameatingmonkey.net,
 reject_rhsbl_client fresh15.spameatingmonkey.net,
 reject_rhsbl_sender uribl.spameatingmonkey.net,
 reject_rhsbl_client uribl.spameatingmonkey.net,
 reject_rhsbl_sender urired.spameatingmonkey.net,
 reject_rhsbl_client urired.spameatingmonkey.net,

The total SEM rejects today, 00:00 to 14:30:

egrep -ic "reject:.*spameatingmonkey" maillog
3956

breakdown:

egrep -ic "reject:.*bl.spameatingmonkey" maillog
358

egrep -ic "reject:.*fresh*\.spameatingmonkey" maillog
96

egrep -ic "reject:.*urired*\.spameatingmonkey" maillog
3214

egrep -ic "reject:.*uribl*\.spameatingmonkey" maillog
172


That's a very significant chunk out of the rejects that would have been done by 
content-scanning.
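
As a cross-check on the egrep counts above, here is a minimal Python sketch
that tallies rejects per exact zone name. It assumes the Postfix reject lines
contain the phrase "blocked using <zone>", which is an assumption about the
log format and is not shown in this message; the function name
count_sem_rejects is made up for illustration. Matching the full zone name
avoids one pitfall of unanchored patterns: "bl.spameatingmonkey" also matches
lines that mention "uribl.spameatingmonkey".

```python
import re
from collections import Counter

# Assumed log wording: "... reject: ... blocked using <zone>; ..."
ZONE_RE = re.compile(
    r"reject:.*blocked using ((?:[\w-]+\.)*spameatingmonkey\.net)",
    re.IGNORECASE,
)


def count_sem_rejects(lines):
    """Return a Counter mapping each SEM zone name to its reject count."""
    counts = Counter()
    for line in lines:
        match = ZONE_RE.search(line)
        if match:
            # Key on the complete zone name so "bl." and "uribl." stay apart.
            counts[match.group(1).lower()] += 1
    return counts


if __name__ == "__main__":
    # "maillog" is the file name used in the commands above.
    with open("maillog", errors="replace") as log:
        for zone, hits in count_sem_rejects(log).most_common():
            print(hits, zone)
```

The per-zone totals from such a tally sum to the overall total by
construction, which makes it easier to spot double counting.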

Here are the HELOs for the urired rejects. In nearly all cases, the from and 
PTR domains were the same as the HELO domain.

egrep -i "reject:.*urired*\.spameatingmonkey" maillog | awk '{print $NF}' | 
sort -f | uniq -ic | sort -rnf | less

 [about 300 lines of per-HELO reject counts, ranging from 160 down to 3; the
 archived copy stripped nearly all of the hostnames, leaving only these]
  18 helo=<123fiesta.net>
   5 helo=<52t.problemsroederer.com>
   4 helo=<24hb.whencorrections.com>

Re: Harvested Fresh .cn URIBL

2009-10-07 Thread Yet Another Ninja

On 10/7/2009 8:01 PM, Rob McEwen wrote:

Blaine Fleming wrote:

I know my users never see .cn domains in their inbox
and if I didn't run a blacklist I wouldn't either.


Which brings up an interesting idea. I wonder how many legit non-spam
.cn domains exist? Surely it is a fraction of a percent of the # of .cn
domains used for spam purposes, correct?


nope.. there are zillions and growing.
same thought could apply to .com or .org or .net for chinese mom & pop 
users.


and it's still pretty off-topic on the SA users list and better placed on 
spam-l.com


Re: Harvested Fresh .cn URIBL

2009-10-07 Thread Warren Togami

On 10/07/2009 03:29 PM, Jason Bertoch wrote:

John Hardin wrote:


The other part of the problem is determining the age of a domain. The
only way to do that absent a registrar feed is to do a whois query,
which may or may not return the data you need, and which is considered
abusive when automated and done often.


It would be nice if Google could help out here...if anyone besides the
registrar knows when a domain first shows up on the net, it's them.



They have no reason to help others filter spam.  They have a competitive 
advantage.  They would prefer everyone to use gmail or pay for their 
commercial spam filtering service for non-gmail.


Warren


Re: Harvested Fresh .cn URIBL

2009-10-07 Thread Jason Bertoch

John Hardin wrote:


The other part of the problem is determining the age of a domain. The 
only way to do that absent a registrar feed is to do a whois query, 
which may or may not return the data you need, and which is considered 
abusive when automated and done often.


It would be nice if Google could help out here...if anyone besides the 
registrar knows when a domain first shows up on the net, it's them.




Re: Harvested Fresh .cn URIBL

2009-10-07 Thread John Hardin

On Wed, 7 Oct 2009, Terry Carmen wrote:


John Hardin wrote:

 On Wed, 7 Oct 2009, Terry Carmen wrote:

>  Instead of blacklisting new domains (which is apparently difficult to 
>  do), why not blacklist all .cn domains (or simply all domains) newer 
>  than xxx days?
> 
>  If they're older than xxx days and not yet on another blacklist for 
>  sending actual spam, return a neutral response.


 How does that simplify the problem? The difficulty is in getting data
 about when a domain was created.


I thought the problem was in getting data for recently created domains, not 
all domains.


If it's a problem with all domains, this won't help at all.


The other part of the problem is determining the age of a domain. The only 
way to do that absent a registrar feed is to do a whois query, which may 
or may not return the data you need, and which is considered abusive when 
automated and done often.


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  We should endeavour to teach our children to be gun-proof
  rather than trying to design guns to be child-proof
---
 6 days since a sunspot last seen - EPA blames CO2 emissions


Re: Harvested Fresh .cn URIBL

2009-10-07 Thread Blaine Fleming
Terry Carmen wrote:

> Instead of blacklisting new domains (which is apparently difficult to
> do), why not blacklist all .cn domains (or simply all domains)  newer
> than xxx days?
> 
> If they're older than xxx days and not yet on another blacklist for
> sending actual spam, return a neutral response.


How do you determine age?  Whois queries really can't be used because of
the reasons I mentioned in my previous post.  I guess one option is to
record the date that the domain was first seen anywhere and then work
from that but what about the domains that are rarely used?
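
The "first seen" idea can be sketched in a few lines of Python. This is only
an illustration under stated assumptions: the class name FirstSeenTracker and
the 14-day window are hypothetical, and "first seen" is a proxy for age, not
a registration date. As noted above, an old but rarely used domain would
still look fresh the first time it shows up.

```python
from datetime import date, timedelta


class FirstSeenTracker:
    """Treat a domain as fresh for max_age_days after its first sighting."""

    def __init__(self, max_age_days=14):  # hypothetical window
        self.max_age = timedelta(days=max_age_days)
        self.first_seen = {}

    def observe(self, domain, today):
        # Only the first sighting is recorded; later ones do not reset it.
        self.first_seen.setdefault(domain.lower(), today)

    def is_fresh(self, domain, today):
        seen = self.first_seen.get(domain.lower())
        if seen is None:
            return False  # never observed: neutral, no opinion
        return today - seen <= self.max_age
```

A caller would run observe() for every domain extracted from incoming mail
and list only those for which is_fresh() is true.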

One of the big problems I see with trying to look at .cn domains in the
wild is the lack of data.  How many people here deal with a large volume
of mail that would have legitimate .cn domains?  I seem to remember
recently seeing one of the big blacklists not having enough non-english
mail to work with.

--Blaine


Re: Harvested Fresh .cn URIBL

2009-10-07 Thread Terry Carmen

John Hardin wrote:

On Wed, 7 Oct 2009, Terry Carmen wrote:

Instead of blacklisting new domains (which is apparently difficult to 
do), why not blacklist all .cn domains (or simply all domains) newer 
than xxx days?


If they're older than xxx days and not yet on another blacklist for 
sending actual spam, return a neutral response.


How does that simplify the problem? The difficulty is in getting data 
about when a domain was created.


I thought the problem was in getting data for recently created domains, 
not all domains.


If it's a problem with all domains, this won't help at all.

Terry



Re: Harvested Fresh .cn URIBL

2009-10-07 Thread Karsten Bräckelmann
On Wed, 2009-10-07 at 14:01 -0400, Rob McEwen wrote:
> I might be interested in maintaining such a freely available
> list--accessible via rsync at no charge--if someone else would come up
> with the SA rule or plugin.

I'd be interested in giving this a shot, as a proof-of-concept. Any
details and further discussion off-list.





Re: Harvested Fresh .cn URIBL

2009-10-07 Thread Karsten Bräckelmann
> > Spam from .cn domains can be mitigated with the right rules and querying
> > multiple lists.  I know my users never see .cn domains in their inbox
> > and if I didn't run a blacklist I wouldn't either.

Indeed, spam with .cn URIs really doesn't appear to be a problem at all.
They are well covered by the existing URI DNSBLs -- which are doing an
awesome job, btw -- and the rest of the SA rules.

There is no value in additional rules that catch anyway high scoring
spam. It's the low scorers that need our attention.


> Instead of blacklisting new domains (which is apparently difficult to 
> do), why not blacklist all .cn domains (or simply all domains)  newer 
> than xxx days?

It has been pointed out before, but it still is kind of funny, how this
thread re-invents existing techniques and re-iterates the very same,
often discussed problems. And keeps doing so. I haven't seen anything
new so far.

The design of a BL is pretty off-topic here anyway, even more so on the
users list.

  guenther





Re: Harvested Fresh .cn URIBL

2009-10-07 Thread John Hardin

On Wed, 7 Oct 2009, Terry Carmen wrote:

Instead of blacklisting new domains (which is apparently difficult to 
do), why not blacklist all .cn domains (or simply all domains)  newer 
than xxx days?


If they're older than xxx days and not yet on another blacklist for 
sending actual spam, return a neutral response.


How does that simplify the problem? The difficulty is in getting data 
about when a domain was created.


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Of the twenty-two civilizations that have appeared in history,
  nineteen of them collapsed when they reached the moral state the
  United States is in now.  -- Arnold Toynbee
---
 6 days since a sunspot last seen - EPA blames CO2 emissions


Re: Harvested Fresh .cn URIBL

2009-10-07 Thread Terry Carmen

Blaine Fleming wrote:

Warren Togami wrote:
  

Opinions of this proposal?



I would love to have a listing of recently registered .cn domains but
until the TLD operator starts working with us that just isn't going to
happen.

Trying to perform a whois lookup on every domain is painfully slow.
Once you get a high enough volume of .cn domains detected it will become
impossible and that is assuming you are never rate limited.  On top of
that, most of the time when I do a whois lookup on a .cn domain I find
the destination whois server to be unresponsive, stuck in a "maintenance
mode" or doesn't include any data except the domain name and the listed
nameservers.

Spam from .cn domains can be mitigated with the right rules and querying
multiple lists.  I know my users never see .cn domains in their inbox
and if I didn't run a blacklist I wouldn't either.
  
Instead of blacklisting new domains (which is apparently difficult to 
do), why not blacklist all .cn domains (or simply all domains)  newer 
than xxx days?


If they're older than xxx days and not yet on another blacklist for 
sending actual spam, return a neutral response.


Terry






Re: Harvested Fresh .cn URIBL

2009-10-07 Thread Rob McEwen
Blaine Fleming wrote:
> I know my users never see .cn domains in their inbox
> and if I didn't run a blacklist I wouldn't either.

Which brings up an interesting idea. I wonder how many legit non-spam
.cn domains exist? Surely it is a fraction of a percent of the # of .cn
domains used for spam purposes, correct?

If that assertion is right, then maybe it makes more sense to build a
really good and comprehensive ".cn" whitelist. Then create a rule in SA
whereby ".cn" domains not on that whitelist would add a point or two to
the score. (it shouldn't be used to outright block due to the "guilty
until proven innocent stance!"...and it shouldn't be a default SA rule)
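
A rough Python sketch of the whitelist idea follows. It is not a SpamAssassin
rule or plugin; the function names, the 1.5-point penalty, and the whitelist
contents are all hypothetical. It extracts URL hosts from a message body and
adds points when any .cn host (or one of its parent domains) is missing from
the whitelist.

```python
import re
from urllib.parse import urlsplit

URL_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def cn_hosts(body):
    """Return the set of .cn hostnames found in URLs in the body."""
    hosts = set()
    for url in URL_RE.findall(body):
        try:
            host = urlsplit(url).hostname
        except ValueError:  # malformed URL, skip it
            continue
        if host and host.endswith(".cn"):
            hosts.add(host)
    return hosts


def is_whitelisted(host, whitelist):
    """True if the host or any parent domain (except the TLD) is listed."""
    labels = host.split(".")
    return any(".".join(labels[i:]) in whitelist
               for i in range(len(labels) - 1))


def cn_penalty(body, whitelist, points=1.5):
    """Add points (never an outright block) for non-whitelisted .cn hosts."""
    unknown = [h for h in cn_hosts(body) if not is_whitelisted(h, whitelist)]
    return points if unknown else 0.0
```

Keeping this as a small additive score matches the caveat above that such a
list should not be used to block outright.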

I might be interested in maintaining such a freely available
list--accessible via rsync at no charge--if someone else would come up
with the SA rule or plugin. My unique contribution could be providing a
means for my own "invaluement URI ratings engine" to rate potential
candidates for whitelisting--this would separate most of the wheat from
the chaff with little effort--just as long as the entries submitted were
kept to a reasonably low volume.

-- 
Rob McEwen
http://dnsbl.invaluement.com/
r...@invaluement.com
+1 (478) 475-9032




Re: Harvested Fresh .cn URIBL

2009-10-07 Thread Blaine Fleming
Warren Togami wrote:
> Opinions of this proposal?

I would love to have a listing of recently registered .cn domains but
until the TLD operator starts working with us that just isn't going to
happen.

Trying to perform a whois lookup on every domain is painfully slow.
Once you get a high enough volume of .cn domains detected it will become
impossible and that is assuming you are never rate limited.  On top of
that, most of the time when I do a whois lookup on a .cn domain I find
the destination whois server to be unresponsive, stuck in a "maintenance
mode" or doesn't include any data except the domain name and the listed
nameservers.

Spam from .cn domains can be mitigated with the right rules and querying
multiple lists.  I know my users never see .cn domains in their inbox
and if I didn't run a blacklist I wouldn't either.

--Blaine


Re: Harvested Fresh .cn URIBL

2009-10-07 Thread Warren Togami

On 10/07/2009 11:27 AM, Raymond Dijkxhoorn wrote:

We are working on getting .CN zone access. That's the only way to speed
things up. The only challenging part is to get a copy of the CN zone
just like we get copies of other ccTLDs/gTLDs.


OK, I was under the impression that it was impossible to obtain zone 
access.  If this happens then great!


Warren


Re: Harvested Fresh .cn URIBL

2009-10-07 Thread Raymond Dijkxhoorn

Hi Warren!


It seems then the only way to feed a URIBL fresh .cn domains would be a
spam trap.  This proposed URIBL would be extremely easy to build on the 
infrastructure of existing trap-based DNSBL's like PSBL, HOSTKARMA or SEM. 
My own volume of spam is too small to do this.



you haven't really looked into Spamhaus data & SA rules, have you?
have you looked into SURBL/URIBL's data & datafeeds?

what's the deal here? do you not represent RedHat?
have no access to RH spam data?

just can't imagine RH can't provide itself or the community with plenty of 
spam data.



Opinions of this proposal?


How many whois lookups do you think you can do in a day before you get blocked?

If you can get a copy of the CN root zone, existing blacklists can do the
work for you. If not, give the current blacklists like SURBL and URIBL a
bit more credit. We are doing that exact same thing. You are explaining, or
reinventing, the DOB list. Your method is loose and dangerous, while other
blacklists already have mechanisms in place to avoid false positives.

We are working on getting .CN zone access. That's the only way to speed
things up. The only challenging part is to get a copy of the CN zone just 
like we get copies of other ccTLDs/gTLDs.


What you describe isn't new and isn't exciting. For me it's daily work. It's
routine. Check SURBL and URIBL. Try to understand how those lists work.

In the last 48 hours we added 1211 .CN domains to SURBL.
Check for example how URIBL GOLD works.

Remember that a large part of the filtering tips and tricks is not to talk
about what you are doing, just do it. Telling in detail how you do things
will only give spammers an advantage and doesn't do the community any
good.

You seem to have just entered the SA world and would like to help. That's 
good, but don't end up making a whole lot of noise.


If RH is serious about this, attend conferences like MAAWG and talk 
with people there. Talk with the blacklist guys; many are at those 
events.


Don't flood people on the users list, please. Most likely there are better 
lists to start talks like this.


Bye,
Raymond.


Re: Harvested Fresh .cn URIBL

2009-10-07 Thread Yet Another Ninja

On 10/7/2009 5:00 PM, Warren Togami wrote:
 > It seems then the only way to feed a URIBL fresh .cn domains would be a
spam trap.  This proposed URIBL would be extremely easy to build on the 
infrastructure of existing trap-based DNSBL's like PSBL, HOSTKARMA or 
SEM.  My own volume of spam is too small to do this.


you haven't really looked into Spamhaus data & SA rules, have you?
have you looked into SURBL/URIBL's data & datafeeds?

what's the deal here? do you not represent RedHat?
have no access to RH spam data?

just can't imagine RH can't provide itself or the community with plenty 
of spam data.



Opinions of this proposal?


sorry - imo, reinventing the wheel some time too late




Harvested Fresh .cn URIBL

2009-10-07 Thread Warren Togami

http://ruleqa.spamassassin.org/20091006-r822170-n/T_CN_URL/detail
A very sizeable amount of spam (currently 50%) contains .cn domains that 
were registered very recently.  They keep registering new domains in 
order to keep ahead of the URIBL's.


http://ruleqa.spamassassin.org/20091006-r822170-n/T_CN_EIGHT/detail
Last month, I noticed that a very sizeable percentage of the .cn spam 
were fresh and random \w{8}.cn domain names.


http://ruleqa.spamassassin.org/20091007-r822624-n/T_CN_SEVEN/detail
I don't know if it was due to our discussion here, but for whatever 
reason I began seeing new spam with \w{7}.cn domains registered since 
October 3rd, and \w{8}.cn seems to be tapering off now.


http://spameatingmonkey.com/lists.html#SEM-FRESH
\w{8}.cn or any length is unsafe to be used as a real rule.  The only 
safe way to detect these fresh .cn domains would be a URIBL.  But 
URIBL's like SEM-FRESH described here are only capable of knowing new 
domains of TLD's who provide zone files that can be compared.


It seems then the only way to feed a URIBL fresh .cn domains would be a 
spam trap.  This proposed URIBL would be extremely easy to build on the 
infrastructure of existing trap-based DNSBL's like PSBL, HOSTKARMA or 
SEM.  My own volume of spam is too small to do this.


A targeted URIBL verified by whois for registration dates would be near 
100% accurate and deserving of a high score.  This would hopefully break 
the economic feasibility of .cn URI spam by rendering fresh domains 
quickly useless.  This could be a new URIBL, or an existing URIBL.  If 
this is an existing URIBL, spamassassin can use meta rules to boolean 
match .cn domains and assign a higher score.  Example:


meta FRESHCN_7 SOME_URIBL && CN_URL
score FRESHCN_7 0 4.0 0 4.0

Spam Trap Workflow
==================
1. Spam trap receives spam containing .cn URI.
2. Lookup locally, is this .cn domain already known?
3. If already known, stop.
4. Lookup A record of this domain.  If NXDOMAIN stop.
5. Record domain in database with UNKNOWN registration date.

URIBL Generation Workflow
=========================
1. If domain has UNKNOWN registration date, attempt whois lookup.
   Record registration date if found.
2. Ignore all UNKNOWN records.
3. Dump all domains registered in the last 7 days into one zone.
   score FRESHCN_7 0 4.0 0 4.0
4. Dump all domains registered in the last 14 days into another zone.
   score FRESHCN_14 0 2.0 0 2.0
5. Stop listing anything older than 14 days.  By then the regular 
URIBL's have listed these domains.
6. Do not delete older .cn domains.  Keeping them in the database 
prevents redundant whois lookups later.
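
The two workflows above can be sketched in Python as follows. This is a
minimal illustration, not a working URIBL: the database is a plain dict, and
the resolves and whois_lookup callables are hypothetical stand-ins that the
caller must supply (a real DNS check and a rate-limited whois client are out
of scope here). A whois_lookup that finds no date is assumed to return None.

```python
from datetime import date, timedelta

UNKNOWN = None  # registration date not yet learned


def trap_intake(db, domain, resolves):
    """Spam trap steps 2-5: record a new, resolvable .cn domain."""
    domain = domain.lower()
    if domain in db:              # steps 2-3: already known, stop
        return False
    if not resolves(domain):      # step 4: NXDOMAIN, stop
        return False
    db[domain] = UNKNOWN          # step 5
    return True


def generate_zones(db, whois_lookup, today):
    """Generation steps 1-5: return the (7-day, 14-day) domain sets."""
    for domain in list(db):
        if db[domain] is UNKNOWN:
            db[domain] = whois_lookup(domain)  # step 1; may stay UNKNOWN
    fresh_7, fresh_14 = set(), set()
    for domain, registered in db.items():
        if registered is UNKNOWN:              # step 2: ignore
            continue
        age = today - registered
        if age <= timedelta(days=7):           # step 3
            fresh_7.add(domain)
        if age <= timedelta(days=14):          # step 4
            fresh_14.add(domain)
    # Steps 5-6: older domains are simply not listed; nothing is deleted,
    # so a known registration date is never looked up twice.
    return fresh_7, fresh_14
```

Because dates are cached in the database, each domain costs at most one
successful whois lookup, which is the point of step 6.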


The only challenging part here is whois lookup rate limiting.  whois 
lookups are critical to populating this URIBL, but it is a resource that 
can only be used in small quantities.  The above workflow attempts to 
minimize the number of whois lookups.


Given that only spammers would send mail to a trap, the number of .cn 
domain names might be small enough to handle whois lookups.  The goal 
here is to break the economic model.  I'm told that .cn domains cost 
$3-10/each to register, and whois lookups are certainly cheaper to 
automate.  I can't find a published whois rate limit for CNNIC.  In any 
case, it wouldn't be difficult for us to proxy whois lookups to bypass 
rate limits should that become necessary.


Opinions of this proposal?

Is anyone from PSBL, HOSTKARMA, or SEM interested?

Warren Togami
wtog...@redhat.com


Re: SIGCHLD query

2009-10-07 Thread Martin Gregorie
On Wed, 2009-10-07 at 14:31 +0200, Per Jessen wrote:
> Okay, I ran a check on my logs since midnight - yes, I also see a lot of
> child processes running for less than 10secs, in fact slightly more
> than 50%.  Interesting issue.  
> 
Here's the results of a scan across all my mail logs:

Processing file /var/log/maillog*
 3544 Messages found
 3538 Results (99.8%)
    6 SIGCHLDs caught (0.2%)
                      min     avg     max
Message size:         353    7340  496682
Scan time (secs):     0.5     2.3    34.5

I've checked all the SIGCHLD log lines. The previous scans by those
children were all in the range 1.- to 3.1 seconds. I'm using the default
child population and the default --timeout-child of 300 secs.
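
A summary like the one above can be produced with a short script. The sketch
below is not the script used for these numbers; it assumes spamd "result:"
log lines that carry scantime=... and size=... fields, and the function name
summarize is made up for illustration.

```python
import re

# Assumed format: "... spamd: result: ... scantime=2.3,size=7340,..."
RESULT_RE = re.compile(r"spamd: result:.*\bscantime=([\d.]+),size=(\d+)")


def summarize(lines):
    """Return message count plus (min, avg, max) of size and scan time."""
    times, sizes = [], []
    for line in lines:
        match = RESULT_RE.search(line)
        if match:
            times.append(float(match.group(1)))
            sizes.append(int(match.group(2)))
    if not times:
        return None  # no result lines found

    def stats(values):
        return min(values), sum(values) / len(values), max(values)

    return {"results": len(times),
            "size": stats(sizes),
            "scantime": stats(times)}
```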


Martin




Re: consolidating DNSBLs into a single query (was Spam Eating Monkey?)

2009-10-07 Thread Rob McEwen
Mike Cardwell wrote:
> I don't understand the logic of that. Ie, why you'd need to use
> bitmasking? zen.spamhaus.org is a combination of various different
> lists and returns multiple values like this:

If every list is an "outright block" list, then you are correct. My
point applies to situations where some lists are used in scoring mode,
and where there is a desire to be able to calculate a score based on
exactly which lists hit on a particular sending IP.

But even if someone tries this with all "outright block lists", and uses
rbldnsd's built in ability to consolidate lists, then there are still
two problems:

(a) for auditing purposes, there'd be no way to tell *which* lists hit
on that IP since many use the same return codes

(b) some hundreds-of-MB-large lists which previously could have used the
lower-memory "ip4tset" would have to revert back to slower and
higher-memory-usage "ip4set", fwiw

Again, not saying these problems can't be solved, only pointing them out
so that anyone who cares to try can know what they need to do, or need
to expect.

-- 
Rob McEwen
http://dnsbl.invaluement.com/
r...@invaluement.com
+1 (478) 475-9032




Re: SIGCHLD query

2009-10-07 Thread Per Jessen
Per Jessen wrote:

> Martin Gregorie wrote:
>>> Yeah - maybe there is some indication in the log?  I think there is
>>> a switch that determines how many emails a child will process before
>>> needing restart. (just looked it up:  --max-conn-per-child)
>>> I just checked my logs, during the last 9 hours I have 6016 of
>>> these:
>>>
>>> spamd[11362]: spamd: handled cleanup of child pid 14010 due to
>>> SIGCHLD
>>>
>>> Is that the one you mean?
>>>
>> That's the only log message I've seen. Sometimes you can associate it
>> with a scan that exceeded --timeout-child seconds and sometimes, much
>> more rarely, it happens after a scan taking two or three seconds.
> 
> I don't know if that is happening on my systems too, I haven't
> checked.

Okay, I ran a check on my logs since midnight - yes, I also see a lot of
child processes running for less than 10secs, in fact slightly more
than 50%.  Interesting issue.  


/Per Jessen, Zürich



Re: SIGCHLD query

2009-10-07 Thread Martin Gregorie
> Yeah - maybe there is some indication in the log?  I think there is a 
> switch that determines how many emails a child will process before 
> needing restart. (just looked it up:  --max-conn-per-child)
> I just checked my logs, during the last 9 hours I have 6016 of these:
> 
> spamd[11362]: spamd: handled cleanup of child pid 14010 due to SIGCHLD
> 
> Is that the one you mean?
> 
That's the only log message I've seen. Sometimes you can associate it
with a scan that exceeded --timeout-child seconds and sometimes, much
more rarely, it happens after a scan taking two or three seconds. Tuning
would be easier if there was some indication about why a scan had
terminated - maybe it could be added to the statistics list in the
'results' log line.

> There are also arguments for controlling minimum/maximum number of spare 
> child processes - if your load varies, and you have a significant 
> difference between min and max, I could see that leading to more child 
> processes stopping and starting.
> 
Does the parent or the child determine whether the child stays alive
after completing a scan or whether it should terminate?


Martin





Re: SIGCHLD query

2009-10-07 Thread Per Jessen

Martin Gregorie wrote:
Yeah - maybe there is some indication in the log?  I think there is a 
switch that determines how many emails a child will process before 
needing restart. (just looked it up:  --max-conn-per-child)

I just checked my logs, during the last 9 hours I have 6016 of these:

spamd[11362]: spamd: handled cleanup of child pid 14010 due to SIGCHLD

Is that the one you mean?


That's the only log message I've seen. Sometimes you can associate it
with a scan that exceeded --timeout-child seconds and sometimes, much
more rarely, it happens after a scan taking two or three seconds. 


I don't know if that is happening on my systems too, I haven't checked. 
 I wonder if the latter could be caused by the maintenance of spare 
child processes?


There are also arguments for controlling minimum/maximum number of spare 
child processes - if your load varies, and you have a significant 
difference between min and max, I could see that leading to more child 
processes stopping and starting.



Does the parent or the child determine whether the child stays alive
after completing a scan or whether it should terminate?


It's the child that determines that "Uh, I've done X scans, all done". 
It's just a for-loop:


for( i=0; i

Re: consolidating DNSBLs into a single query (was Spam Eating Monkey?)

2009-10-07 Thread Mike Cardwell

On 07/10/2009 05:19, Rob McEwen wrote:


Also, this loses the ability to *score* on multiple lists... unless you
use a bitmasked scoring system whereby one list gets assigned ".2",
another ".4", another ".8", on to ".128". But that leaves a maximum of
only 7 lists. Sure, you can add more than 7 by employing other octets in
the "answer IP", but that only severely complicates matters.

And as it stands, you'd also have the complexity of getting the spam
filter to parse, understand, and react properly to those bitmasks.
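
To make the bitmask scheme above concrete, here is a small Python sketch of
how a filter might decode the last octet of the answer IP. The list names
and scores are hypothetical; only the bit layout (.2, .4, .8 and so on up to
.128) comes from the description above.

```python
import ipaddress

# Hypothetical assignment of bits to lists and per-list scores.
BIT_LISTS = {
    2: ("list-a", 1.0),
    4: ("list-b", 0.5),
    8: ("list-c", 2.0),
}


def decode_bitmask(answer):
    """Return (list names, total score) encoded in the last octet."""
    octet = int(ipaddress.IPv4Address(answer)) & 0xFF
    hits = [(name, score)
            for bit, (name, score) in sorted(BIT_LISTS.items())
            if octet & bit]
    return [name for name, _ in hits], sum(score for _, score in hits)
```

For example, an answer of 127.0.0.6 has bits 2 and 4 set, so it decodes to
list-a and list-b with a combined score of 1.5.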


I don't understand the logic of that. Ie, why you'd need to use 
bitmasking? zen.spamhaus.org is a combination of various different lists 
and returns multiple values like this:


m...@haven:~$ host -t a 2.0.0.127.zen.spamhaus.org
2.0.0.127.zen.spamhaus.org  A   127.0.0.4
2.0.0.127.zen.spamhaus.org  A   127.0.0.10
2.0.0.127.zen.spamhaus.org  A   127.0.0.2
m...@haven:~$

It's perfectly easy for SpamAssassin to see that three different values 
have been returned, so 127.0.0.2 is on three separate lists and that an 
extra score should be applied for each of those three.
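
As a sketch of that per-value scoring, the Python below maps each returned
address to a list name and sums a score per list. The code-to-list mapping
(127.0.0.2 for SBL, 127.0.0.4 for XBL, 127.0.0.10 for PBL) is stated here as
an assumption about zen.spamhaus.org return codes, and the scores are made
up for illustration. No DNS query is performed; the answers are passed in.

```python
# Assumed return-code meanings for zen.spamhaus.org.
ZEN_CODES = {
    "127.0.0.2": "SBL",
    "127.0.0.4": "XBL",
    "127.0.0.10": "PBL",
}

# Hypothetical per-list scores.
SCORES = {"SBL": 2.0, "XBL": 1.5, "PBL": 0.5}


def score_answers(answers):
    """Return (sorted list names, total score) for a set of A records."""
    lists = sorted({ZEN_CODES[a] for a in answers if a in ZEN_CODES})
    return lists, sum(SCORES[name] for name in lists)
```

Fed the three A records from the lookup above, this yields all three list
names from a single query.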


It's also quite easy to do it in Exim, eg if I wanted to block an email 
in Exim if the sending ip is on both sbl.spamhaus.org and 
xbl.spamhaus.org I could either do two dns lookups like this:


deny dnslists = sbl.spamhaus.org
 dnslists = xbl.spamhaus.org

Or I could do it with a single dns lookup like this:

deny dnslists = zen.spamhaus.org=127.0.0.2
 dnslists = zen.spamhaus.org=127.0.0.4

You can be 100% backwards compatible by leaving all of your lists as 
they are, but then adding another one which is a combined version of all 
of them...


--
Mike Cardwell - IT Consultant and LAMP developer
Cardwell IT Ltd. (UK Reg'd Company #06920226) http://cardwellit.com/


Re: SIGCHLD query

2009-10-07 Thread Per Jessen

Martin Gregorie wrote:

On Tue, 2009-10-06 at 23:16 +0200, Per Jessen wrote:

Martin, generally speaking, the parent can only report the signal and
that the child has gone away.  The child would have to report on why. 


OK, rephrase that to "a pity the child doesn't say why it's generating a
SIGCHLD signal".



Yeah - maybe there is some indication in the log?  I think there is a 
switch that determines how many emails a child will process before 
needing restart. (just looked it up:  --max-conn-per-child)

I just checked my logs, during the last 9 hours I have 6016 of these:

spamd[11362]: spamd: handled cleanup of child pid 14010 due to SIGCHLD

Is that the one you mean?

There are also arguments for controlling minimum/maximum number of spare 
child processes - if your load varies, and you have a significant 
difference between min and max, I could see that leading to more child 
processes stopping and starting.



/Per