Re: General assistance

2006-03-01 Thread DAve

Chris Santerre wrote:

I would like to make a quick comment to everyone who has helped in this
thread:

Great job. Seriously. Some good answers here. Can we all take a minute to
make sure these answers are posted somewhere on the SA wikis for future
reference? It's been a while since we had a push for additions.

http://wiki.apache.org/spamassassin/
and
http://www.exit0.us/

Your chance to preserve your helpful info in the anals of history. (That
almost sounds painful!)

Thanks!

Chris Santerre


Chris and all,

I apologize for being so slow in getting to this, things came up.

I found a page in the Wiki I had not seen, and could not find a link 
for, titled FasterPerformance. It gives an explanation of the DNS cache 
solution. I saw no sense in rewriting an already excellent text.


I also added a page titled ChooseYourRules with my thoughts.

Both pages are now linked under "Performance Tips" at 
http://wiki.apache.org/spamassassin/UsingSpamAssassin


DAve

--
This message was checked by forty monkeys and
found to not contain any SPAM whatsoever.

Your monkeys may vary


Re: General assistance

2006-02-14 Thread DAve

Chris Santerre wrote:



-Original Message-
From: DAve [mailto:[EMAIL PROTECTED]
Sent: Tuesday, February 14, 2006 3:14 PM
To: users@spamassassin.apache.org
Subject: Re: General assistance


Chris Santerre wrote:

I would like to make a quick comment to everyone who has helped in this
thread:

Great job. Seriously. Some good answers here. Can we all take a minute to
make sure these answers are posted somewhere on the SA wikis for future
reference? It's been a while since we had a push for additions.

http://wiki.apache.org/spamassassin/
and
http://www.exit0.us/


Cool, never saw that before.


Your chance to preserve your helpful info in the anals of history. (That
almost sounds painful!)

Thanks!



Tell me what parts should be added, and where to put them,

Tips and Tricks?
Performance Hints?
Managing High Load?

and I will add what I can.

DAve




That's the beauty of a wiki: put it anywhere you like. We can always change
it. ;) 


--Chris



Don't get me started on Wikis, I still have nightmares about 
faq-o-matics. No one is worse, or more negligent, or more lazy about 
documentation than a sysadmin. I know 'cause I am one, and I have two 
documentation projects I haven't even started yet (whoops).


Anyone who thought that sysadmins would self-document through a Wiki had 
a screw loose or a drinking problem. But I will stop crying now and 
endeavor to become part of the solution! ;^)


DAve



RE: General assistance

2006-02-14 Thread Chris Santerre

> -Original Message-
> From: DAve [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, February 14, 2006 3:14 PM
> To: users@spamassassin.apache.org
> Subject: Re: General assistance
> 
> 
> Chris Santerre wrote:
> > I would like to make a quick comment to everyone who has helped in this
> > thread:
> > 
> > Great job. Seriously. Some good answers here. Can we all take a minute to
> > make sure these answers are posted somewhere on the SA wikis for future
> > reference? It's been a while since we had a push for additions.
> > 
> > http://wiki.apache.org/spamassassin/
> > and
> > http://www.exit0.us/
> 
> Cool, never saw that before.
> 
> > 
> > Your chance to preserve your helpful info in the anals of history. (That
> > almost sounds painful!)
> > 
> > Thanks!
> > 
> 
> Tell me what parts should be added, and where to put them,
> 
> Tips and Tricks?
> Performance Hints?
> Managing High Load?
> 
> and I will add what I can.
> 
> DAve



That's the beauty of a wiki: put it anywhere you like. We can always change it. ;) 


--Chris





Re: General assistance

2006-02-14 Thread DAve

Chris Santerre wrote:

I would like to make a quick comment to everyone who has helped in this
thread:

Great job. Seriously. Some good answers here. Can we all take a minute to
make sure these answers are posted somewhere on the SA wikis for future
reference? It's been a while since we had a push for additions.

http://wiki.apache.org/spamassassin/
and
http://www.exit0.us/


Cool, never saw that before.



Your chance to preserve your helpful info in the anals of history. (That
almost sounds painful!)

Thanks!



Tell me what parts should be added, and where to put them,

Tips and Tricks?
Performance Hints?
Managing High Load?

and I will add what I can.

DAve


RE: General assistance

2006-02-14 Thread Chris Santerre

I would like to make a quick comment to everyone who has helped in this thread:


Great job. Seriously. Some good answers here. Can we all take a minute to make sure these answers are posted somewhere on the SA wikis for future reference? It's been a while since we had a push for additions.

http://wiki.apache.org/spamassassin/
and
http://www.exit0.us/


Your chance to preserve your helpful info in the anals of history. (That almost sounds painful!)


Thanks!


Chris Santerre
SysAdmin and SARE/URIBL ninja
http://www.uribl.com
http://www.rulesemporium.com




> -Original Message-
> From: Ed Russell [mailto:[EMAIL PROTECTED]]
> Sent: Friday, February 10, 2006 4:42 PM
> To: users@spamassassin.apache.org
> Subject: RE: General assistance
> 
> 
> I was doing some reading and I am beginning to look into Rules Du Jour.  I
> see there are quite a large number of rulesets to choose from when utilizing
> this.  Does anyone have any advice on what ones would be safe?
> 
> Ed
> 
> 
> ---
> 
>  Talk is cheap since supply always exceeds demand.
> 
> ---
>  
> 
> -Original Message-
> From: DAve [mailto:[EMAIL PROTECTED]] 
> Sent: Friday, February 10, 2006 4:30 PM
> To: users@spamassassin.apache.org
> Subject: Re: General assistance
> 
> Bowie Bailey wrote:
> > DAve wrote:
> > 
> >>Ed Russell wrote:
> >>
> >>>2. Once this is in place should I re-activate pyzor, dcc or razor? 
> >>>Is one better than the other?  Are there advantages to either?
> >>
> >>I use neither, though I think I am in the minority. I routinely check
> >>my spam and I have found that bayes, razor, dcc, and most of the
> >>SARE rules catch little if any spam "for me". So I don't run them and
> >>save the CPU for additional spamd processes.
> > 
> > 
> > That's odd.  Bayes, Razor2, DCC work quite well for me.  Check out my
> > stats from today:
> > 
> > TOP SPAM RULES FIRED
> > 
> > RANK    RULE NAME                  COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
> > 
> >    1    RAZOR2_CF_RANGE_51_100      1280      5.02    48.05    83.33    0.98
> >    2    RAZOR2_CHECK                1259      4.94    47.26    81.97    1.15
> >    3    RAZOR2_CF_RANGE_E8_51_100   1164      4.56    43.69    75.78    0.27
> 
> > 
> > 
> > Razor2 caught 83% of the spam, DCC caught 68%, and Bayes got 64%.
> > 
> 
> They tagged plenty of spam for me, no doubt about that. But they caught 
> only a few spam that SA wouldn't have caught without them. It is rare 
> that bayes points on top of existing points ever made the score squeak 
> over the threshold.
> 
> Not using them, however, dropped my CPU, network, and memory requirements 
> so much I could run twice as many spamd processes. Processing time went 
> from an average of 10 seconds (with all SARE rules, bayes, DCC, Razor) 
> to 2 seconds (limited SARE, no bayes, no razor, no dcc).
> 
> All the SARE rules loaded makes spamd run about 45-75mb each; with 
> selective SARE rules I can see spamd drop to 23-35mb. More spamd, 
> faster spamd.
> 
> Of course tomorrow, everything could change ;^)
> 
> DAve
> 
> 
> 





Re: General assistance

2006-02-14 Thread Daniel Cañas Montero


On Feb 14, 2006, at 10:47 AM, DAve wrote:


Daniel Cañas Montero wrote:

On Feb 11, 2006, at 3:14 PM, Ed Russell wrote:
I have to say a heartfelt THANK YOU to everyone who contributed to this
thread.  My filter is working 500% more efficiently than it ever was.  I have
done the following:

1.  Installed djbdns and I am using dnscache as I was told.  I have
increased the cache size to 100 Megabytes and completely disabled logging
after determining it was working properly.

How do you disable logging completely? I use multilog and filter out
all the lines so it logs nothing.

Is there a way to tell dnscache not to actually spit anything out?


Only by removing code from dnscache, I believe. Most people just  
limit what, if anything, is picked up by multilog. I've never tried  
it but it would be interesting to see what


#svc -d /service/dnscache/log

would do. That would remove any need to modify your log/run script.  
I know some people just redirect dnscache output to /dev/null. I  
sometimes need to see what the stats are for dnscache when checking  
SA (URIBL SURBL), so I've never done it.


DAve



OK. That is what I do currently...have '-*' in my multilog
but I thought there might be a way to avoid having a 'dummy' multilog  
process running.


I have also tried not starting the multilog (i.e. #svc -d /service/dnscache/log),
and it seems to work... but I wasn't sure if it would  
do anything funny over the long run, so I started it up again.


Maybe this is a dumb question...
But is it OK for a process monitored by supervise not to have a  
corresponding multilog running to capture the output?

Re: General assistance

2006-02-14 Thread DAve

Daniel Cañas Montero wrote:


On Feb 11, 2006, at 3:14 PM, Ed Russell wrote:


I have to say a heartfelt THANK YOU to everyone who contributed to this
thread.  My filter is working 500% more efficiently than it ever was.  I have
done the following:

1.  Installed djbdns and I am using dnscache as I was told.  I have
increased the cache size to 100 Megabytes and completely disabled logging
after determining it was working properly.



How do you disable logging completely? I use multilog and filter out  
all the lines so it logs nothing.

Is there a way to tell dnscache not to actually spit anything out?


Only by removing code from dnscache, I believe. Most people just limit 
what, if anything, is picked up by multilog. I've never tried it but it 
would be interesting to see what


#svc -d /service/dnscache/log

would do. That would remove any need to modify your log/run script. I 
know some people just redirect dnscache output to /dev/null. I sometimes 
need to see what the stats are for dnscache when checking SA (URIBL 
SURBL), so I've never done it.


DAve






2.I have implemented rbl at the MTA level, I use relays.ordb.org and
sbl-xbl.spamhaus.org.

3.I have implemented Rules Du Jour.  I selected a subset of the SARE
rules and misc others.

4.I have turned back on pyzor, razor and dcc.

Scanning times are well within tolerance with a minimal impact on delivery
time.  See below (email addresses removed for privacy):








RE: General assistance

2006-02-14 Thread Ed Russell
[EMAIL PROTECTED] log]# cat /etc/dnscache/log/run 
#!/bin/sh
#exec setuidgid gdnslog multilog t ./main
exec setuidgid gdnslog multilog -*

As you can see, instead of multilog t ./main I use multilog -*, which deselects every line so nothing is written.

That will do it.  Enjoy.

Ed


---

 Talk is cheap since supply always exceeds demand.

---
 
-Original Message-
From: Daniel Cañas Montero [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, February 14, 2006 11:14 AM
To: users@spamassassin.apache.org
Subject: Re: General assistance


On Feb 11, 2006, at 3:14 PM, Ed Russell wrote:

> I have to say a heartfelt THANK YOU to everyone who contributed to this
> thread.  My filter is working 500% more efficiently than it ever was.  I have
> done the following:
>
> 1.  Installed djbdns and I am using dnscache as I was told.  I have
> increased the cache size to 100 Megabytes and completely disabled logging
> after determining it was working properly.

How do you disable logging completely? I use multilog and filter out  
all the lines so it logs nothing.
Is there a way to tell dnscache not to actually spit anything out?


>
> 2.I have implemented rbl at the MTA level, I use relays.ordb.org and
> sbl-xbl.spamhaus.org.
>
> 3.I have implemented Rules Du Jour.  I selected a subset of the SARE
> rules and misc others.
>
> 4.I have turned back on pyzor, razor and dcc.
>
> Scanning times are well within tolerance with a minimal impact on delivery
> time.  See below (email addresses removed for privacy):
>



Re: General assistance

2006-02-14 Thread Daniel Cañas Montero


On Feb 11, 2006, at 3:14 PM, Ed Russell wrote:

I have to say a heartfelt THANK YOU to everyone who contributed to this
thread.  My filter is working 500% more efficiently than it ever was.  I have
done the following:

1.  Installed djbdns and I am using dnscache as I was told.  I have
increased the cache size to 100 Megabytes and completely disabled logging
after determining it was working properly.


How do you disable logging completely? I use multilog and filter out  
all the lines so it logs nothing.

Is there a way to tell dnscache not to actually spit anything out?




2.  I have implemented rbl at the MTA level, I use relays.ordb.org and
sbl-xbl.spamhaus.org.

3.  I have implemented Rules Du Jour.  I selected a subset of the SARE
rules and misc others.

4.  I have turned back on pyzor, razor and dcc.

Scanning times are well within tolerance with a minimal impact on delivery
time.  See below (email addresses removed for privacy):



Re: General assistance

2006-02-13 Thread DAve

Bowie Bailey wrote:

DAve wrote:


Bowie Bailey wrote:


DAve wrote:



Ed Russell wrote:



2.  Once this is in place should I re-activate pyzor, dcc or
razor? Is one better than the other?  Are there advantages to
either? 


I use neither, though I think I am in the minority. I routinely
check my spam and I have found that bayes, razor, dcc, and most
of the SARE rules catch little if any spam "for me". So I don't
run them and save the CPU for additional spamd processes.


That's odd.  Bayes, Razor2, DCC work quite well for me.  Check out
my stats from today: 


Razor2 caught 83% of the spam, DCC caught 68%, and Bayes got 64%.


They tagged plenty of spam for me, no doubt about that. But they
caught only a few spam that SA wouldn't have caught without them. It
is rare that bayes points on top of existing points ever made the
score squeak over the threshold.

Not using them however, dropped my CPU, network, and memory
requirements so much I could run twice as many spamd processes.
Processing time went from an average of 10 seconds (with all SARE
rules, bayes, DCC, Razor) to 2 seconds (limited SARE, no bayes, no
razor, no dcc). 


All the SARE rules loaded makes spamd run about 45-75mb each,
selective SARE rules and I can see spamd drop to 23-35mb. More spamd,
faster spamd. 
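
The "did it squeak over the threshold" question above can be checked against your own mail rather than argued about. What follows is only a rough sketch in Python, not anything DAve posted: the function name marginal_catches, the rule points, the 5.0 threshold and the sample messages are all made up for illustration. It also ignores the fact that SpamAssassin uses different score sets depending on whether Bayes and network tests are enabled, so treat the result as an approximation.

```python
def marginal_catches(messages, rule_points, threshold):
    """Count tagged messages, and how many of them needed the given rules.

    messages    -- iterable of (total_score, set_of_rule_names_that_fired)
    rule_points -- dict of rule name -> points, for the rules under review
    threshold   -- the required score for a message to be tagged as spam
    Returns (tagged, needed): messages at or over the threshold, and the
    subset that would fall below it without the reviewed rules' points.
    """
    tagged = 0
    needed = 0
    for score, hits in messages:
        if score < threshold:
            continue  # never tagged, so the rules made no difference
        tagged += 1
        # Subtract the points contributed by the rules being evaluated.
        without = score - sum(p for r, p in rule_points.items() if r in hits)
        if without < threshold:
            needed += 1
    return tagged, needed


# Hypothetical numbers, for illustration only.
EXAMPLE_POINTS = {"BAYES_99": 3.5, "RAZOR2_CHECK": 1.5}
EXAMPLE_MESSAGES = [
    (31.3, {"RAZOR2_CHECK", "BAYES_99"}),  # spam either way
    (5.5, {"BAYES_99"}),                   # only spam because of Bayes
    (7.0, {"RAZOR2_CHECK"}),               # spam either way
    (2.0, set()),                          # ham
]
```

If "needed" stays tiny compared to "tagged" on a week of real mail, the extra tests are mostly adding points to messages that were already caught.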



I guess this is a definite case of 'YMMV'.  With Bayes, Razor2, DCC, and
15 SARE rulesets, my average scantime is 2.5 seconds, but each process
is 46M and I usually only have 2 or 3 running of a max of 8 (although
with 1G of ram, I've got plenty of headroom if I need to add more).


I have 26 configured spamds this week. I have enough RAM to run 40, 
though I hit the wall with the CPUs at that point. I generally have 15 
to 25 running all the time, 24x7. Even late at night I can check in with 
the server and find only two or three spamd processes sleeping. Each is 
consuming 30 to 35 MB of RAM, currently.




The bottom line is that if you have a low to medium mail volume and a
decent server, you can probably turn it all on and not worry too much
about it.  If you have a high volume of mail, or a slower server, you
may need to be a bit more picky with your rulesets and features.

My advice is this:  Try it with Bayes, Razor2, DCC, and Pyzor.  Install
any of the SARE rulesets you think might be useful.  Then monitor your
server and see what happens.

You can use the 'top' command to see how much memory is in use and how
much each spamd process is using.  You should try to configure things
such that the server never uses swap.  If SA goes into swap, your
performance will drop through the floor.

To see your average scantime (assuming that SA is logging to syslog),
you can use this command string (or drop it in a script file):

grep -e 'clean message' -e 'identified spam' /var/log/maillog | perl -ne 'if (/in (\d+\.\d+) seconds/) { $time += $1; $cnt++ } END { print $time/$cnt, "\n" if $cnt }'

Note that this command string should be all on one line.  Your
mailreader will probably split it...

If everything is working well, you're good.

If you are using too much memory, remove some of the extra rules or
reduce the number of spamd children.

If your scantimes are too slow and the machine is not swapping, then you
should experiment with disabling Bayes, DCC, or Razor or removing rules.
Also, as others have pointed out, a local caching nameserver on your SA
machine can go a long way towards reducing lag from the network tests.



That is a good synopsis of what I went through in determining cost (in 
resources) vs. benefit with each SA option/plugin/ruleset. I'm sure most 
admins have done the same, and it is excellent advice for Ed. I couldn't 
have said it better myself.


DAve



RE: General assistance

2006-02-13 Thread Ed Russell
Thanks for the advice, it's well suited.  FYI, my average scan time is:

7.73104733769435

I have enabled pyzor, razor and dcc.  All looks fine for now.  Of course
this is a work in progress and I will have to keep a close eye on it.

Ed


---

 Talk is cheap since supply always exceeds demand.

---
 

-Original Message-
From: Bowie Bailey [mailto:[EMAIL PROTECTED] 
Sent: Monday, February 13, 2006 1:20 PM
To: users@spamassassin.apache.org
Subject: RE: General assistance

DAve wrote:
> Bowie Bailey wrote:
> > DAve wrote:
> > 
> > > Ed Russell wrote:
> > > 
> > > > 2.  Once this is in place should I re-activate pyzor, dcc or
> > > > razor? Is one better than the other?  Are there advantages to
> > > > either? 
> > > 
> > > I use neither, though I think I am in the minority. I routinely
> > > check my spam and I have found that bayes, razor, dcc, and most
> > > of the SARE rules catch little if any spam "for me". So I don't
> > > run them and save the CPU for additional spamd processes.
> > 
> > That's odd.  Bayes, Razor2, DCC work quite well for me.  Check out
> > my stats from today: 
> > 
> > Razor2 caught 83% of the spam, DCC caught 68%, and Bayes got 64%.
> 
> They tagged plenty of spam for me, no doubt about that. But they
> caught only a few spam that SA wouldn't have caught without them. It
> is rare that bayes points on top of existing points ever made the
> score squeak over the threshold.
> 
> Not using them however, dropped my CPU, network, and memory
> requirements so much I could run twice as many spamd processes.
> Processing time went from an average of 10 seconds (with all SARE
> rules, bayes, DCC, Razor) to 2 seconds (limited SARE, no bayes, no
> razor, no dcc). 
> 
> All the SARE rules loaded makes spamd run about 45-75mb each,
> selective SARE rules and I can see spamd drop to 23-35mb. More spamd,
> faster spamd. 

I guess this is a definite case of 'YMMV'.  With Bayes, Razor2, DCC, and
15 SARE rulesets, my average scantime is 2.5 seconds, but each process
is 46M and I usually only have 2 or 3 running of a max of 8 (although
with 1G of ram, I've got plenty of headroom if I need to add more).

The bottom line is that if you have a low to medium mail volume and a
decent server, you can probably turn it all on and not worry too much
about it.  If you have a high volume of mail, or a slower server, you
may need to be a bit more picky with your rulesets and features.

My advice is this:  Try it with Bayes, Razor2, DCC, and Pyzor.  Install
any of the SARE rulesets you think might be useful.  Then monitor your
server and see what happens.

You can use the 'top' command to see how much memory is in use and how
much each spamd process is using.  You should try to configure things
such that the server never uses swap.  If SA goes into swap, your
performance will drop through the floor.

To see your average scantime (assuming that SA is logging to syslog),
you can use this command string (or drop it in a script file):

grep -e 'clean message' -e 'identified spam' /var/log/maillog | perl -ne 'if (/in (\d+\.\d+) seconds/) { $time += $1; $cnt++ } END { print $time/$cnt, "\n" if $cnt }'

Note that this command string should be all on one line.  Your
mailreader will probably split it...

If everything is working well, you're good.

If you are using too much memory, remove some of the extra rules or
reduce the number of spamd children.

If your scantimes are too slow and the machine is not swapping, then you
should experiment with disabling Bayes, DCC, or Razor or removing rules.
Also, as others have pointed out, a local caching nameserver on your SA
machine can go a long way towards reducing lag from the network tests.

-- 
Bowie



RE: General assistance

2006-02-13 Thread Bowie Bailey
DAve wrote:
> Bowie Bailey wrote:
> > DAve wrote:
> > 
> > > Ed Russell wrote:
> > > 
> > > > 2.  Once this is in place should I re-activate pyzor, dcc or
> > > > razor? Is one better than the other?  Are there advantages to
> > > > either? 
> > > 
> > > I use neither, though I think I am in the minority. I routinely
> > > check my spam and I have found that bayes, razor, dcc, and most
> > > of the SARE rules catch little if any spam "for me". So I don't
> > > run them and save the CPU for additional spamd processes.
> > 
> > That's odd.  Bayes, Razor2, DCC work quite well for me.  Check out
> > my stats from today: 
> > 
> > Razor2 caught 83% of the spam, DCC caught 68%, and Bayes got 64%.
> 
> They tagged plenty of spam for me, no doubt about that. But they
> caught only a few spam that SA wouldn't have caught without them. It
> is rare that bayes points on top of existing points ever made the
> score squeak over the threshold.
> 
> Not using them however, dropped my CPU, network, and memory
> requirements so much I could run twice as many spamd processes.
> Processing time went from an average of 10 seconds (with all SARE
> rules, bayes, DCC, Razor) to 2 seconds (limited SARE, no bayes, no
> razor, no dcc). 
> 
> All the SARE rules loaded makes spamd run about 45-75mb each,
> selective SARE rules and I can see spamd drop to 23-35mb. More spamd,
> faster spamd. 

I guess this is a definite case of 'YMMV'.  With Bayes, Razor2, DCC, and
15 SARE rulesets, my average scantime is 2.5 seconds, but each process
is 46M and I usually only have 2 or 3 running of a max of 8 (although
with 1G of ram, I've got plenty of headroom if I need to add more).

The bottom line is that if you have a low to medium mail volume and a
decent server, you can probably turn it all on and not worry too much
about it.  If you have a high volume of mail, or a slower server, you
may need to be a bit more picky with your rulesets and features.

My advice is this:  Try it with Bayes, Razor2, DCC, and Pyzor.  Install
any of the SARE rulesets you think might be useful.  Then monitor your
server and see what happens.

You can use the 'top' command to see how much memory is in use and how
much each spamd process is using.  You should try to configure things
such that the server never uses swap.  If SA goes into swap, your
performance will drop through the floor.
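
A back-of-the-envelope way to pick a child limit that stays out of swap is sketched below in Python. It is an illustration only: the function name max_spamd_children and the 256 MB reserved for the OS and MTA are assumptions, and the per-child figure should come from what 'top' shows on your own box. Because spamd children share some memory with the parent, the result tends to be on the cautious side.

```python
def max_spamd_children(total_ram_mb, reserved_mb, per_child_mb):
    """Largest number of spamd children that fits in physical RAM.

    total_ram_mb -- physical memory in the machine
    reserved_mb  -- memory set aside for the OS, MTA, DNS cache, etc. (a guess)
    per_child_mb -- observed size of one spamd child, e.g. from 'top'
    """
    if per_child_mb <= 0:
        raise ValueError("per_child_mb must be positive")
    usable = total_ram_mb - reserved_mb
    # Integer division: a partially fitting child would push us into swap.
    return max(usable // per_child_mb, 0)


# With 1G of RAM, 46M per child, and an assumed 256 MB reserved:
# (1024 - 256) // 46 = 16 children before swap becomes a risk.
EXAMPLE_LIMIT = max_spamd_children(1024, 256, 46)
```

The number it returns is a ceiling for spamd's -m (max children) option, not a target; CPU may run out well before memory does.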

To see your average scantime (assuming that SA is logging to syslog),
you can use this command string (or drop it in a script file):

grep -e 'clean message' -e 'identified spam' /var/log/maillog | perl -ne 'if (/in (\d+\.\d+) seconds/) { $time += $1; $cnt++ } END { print $time/$cnt, "\n" if $cnt }'

Note that this command string should be all on one line.  Your
mailreader will probably split it...
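
If the line wrapping is a nuisance, the same average can be sketched as a small Python script. This is a sketch under assumptions, not a replacement for the one-liner: it relies on the spamd log format seen in this thread ("... in 4.5 seconds, ..."), and the name average_scantime is made up here.

```python
import re

# Matches spamd result lines such as
# "spamd: clean message (1.2/4.0) for user:99 in 3.1 seconds, 8939 bytes."
SCAN_RE = re.compile(
    r"(?:clean message|identified spam).* in (\d+(?:\.\d+)?) seconds"
)


def average_scantime(lines):
    """Return the mean scan time in seconds, or None if nothing matched."""
    times = [float(m.group(1)) for m in map(SCAN_RE.search, lines) if m]
    if not times:
        return None  # avoid dividing by zero on an empty or unrelated log
    return sum(times) / len(times)


def average_scantime_from_file(path="/var/log/maillog"):
    """Convenience wrapper; the default path is the one used above."""
    with open(path, errors="replace") as handle:
        return average_scantime(handle)
```

Calling average_scantime_from_file() on the mail server prints nothing by itself; wrap it in print() or feed the number to whatever graphing you use.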

If everything is working well, you're good.

If you are using too much memory, remove some of the extra rules or
reduce the number of spamd children.

If your scantimes are too slow and the machine is not swapping, then you
should experiment with disabling Bayes, DCC, or Razor or removing rules.
Also, as others have pointed out, a local caching nameserver on your SA
machine can go a long way towards reducing lag from the network tests.

-- 
Bowie


Re: General assistance

2006-02-12 Thread DAve

Ed Russell wrote:

I have to say a heartfelt THANK YOU to everyone who contributed to this
thread.  My filter is working 500% more efficiently than it ever was.  I have
done the following:

1.  Installed djbdns and I am using dnscache as I was told.  I have
increased the cache size to 100 Megabytes and completely disabled logging
after determining it was working properly.

2.  I have implemented rbl at the MTA level, I use relays.ordb.org and
sbl-xbl.spamhaus.org.

3.  I have implemented Rules Du Jour.  I selected a subset of the SARE
rules and misc others.

4.  I have turned back on pyzor, razor and dcc.

Scanning times are well within tolerance with a minimal impact on delivery
time.  See below (email addresses removed for privacy):

Feb 11 16:10:18 as spamd[4137]: spamd: identified spam (31.3/4.0) for
[EMAIL PROTECTED] :99 in 4.5 seconds, 1178 bytes. 
Feb 11 16:10:18 as spamd[363]: spamd: clean message (1.2/4.0) for
[EMAIL PROTECTED] :99 in 3.1 seconds, 8939 bytes. 
Feb 11 16:10:19 as spamd[4218]: spamd: clean message (0.0/4.0) for
[EMAIL PROTECTED] :99 in 5.4 seconds, 2245 bytes.

I have some final questions though,

a.  Can I get any statistics from rblsmtpd (I know this isn't a group
devoted to it, but I figured I would ask)?  I would like to know how many
got dropped and from where.


I don't use it anymore as my qmail toasters are not allowed traffic from 
the outside, only from my MailScanner servers. I run Sendmail and do my 
rbl checks there. But I would think this would get you a quick count,


#cd /var/log/qmail/smtpd/
#cat current [EMAIL PROTECTED] | grep rblsmtpd | wc -l
#cat current [EMAIL PROTECTED] | grep relays.ordb.org | wc -l
#cat current [EMAIL PROTECTED] | grep sbl-xbl.spamhaus.org | wc -l

Script from there forward and you can glean just about as much as you 
care to sift through. awk, sed, Ruby, or Perl are your friends there. 
You can check access times on the logs to make sure you are checking 
today's or yesterday's logs.
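
As one possible starting point for that scripting, here is a Python sketch that does the three counts in a single pass. It assumes, as the grep commands above do, that rblsmtpd lines contain the name of the list that matched; the function name count_rbl_hits is made up for illustration. Unlike the greps, it only counts list names on lines that also mention rblsmtpd.

```python
from collections import Counter


def count_rbl_hits(lines, rbl_names):
    """Count rblsmtpd log lines overall and per blocklist name.

    lines     -- iterable of log lines (e.g. an open log file)
    rbl_names -- list names to look for, such as "relays.ordb.org"
    Returns a Counter keyed by "rblsmtpd" (total) and by each list name.
    """
    counts = Counter()
    for line in lines:
        if "rblsmtpd" not in line:
            continue  # skip tcpserver status lines and the like
        counts["rblsmtpd"] += 1
        for name in rbl_names:
            if name in line:
                counts[name] += 1
    return counts
```

Feeding it open("current") from the smtpd log directory with the two list names from this thread gives the same kind of totals as the wc -l commands, and the loop is an easy place to add per-IP counting later.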




b.  Does anyone have any utilities to get statistics from SA?  Such as
what rules triggered spam etc etc.  I have seen some posts with some
interesting looking reports.  Currently I only use a hacked together script
I wrote to give me the raw amount of spam caught per day which greps
"identified spam" on maillog and then gives me a wc -l.



I see that has been answered already.


Once again, thanks so much to everyone.  This group is simply amazing.



I second that!

DAve


Ed




-Original Message-
From: DAve [mailto:[EMAIL PROTECTED] 
Sent: Friday, February 10, 2006 1:19 PM

To: users@spamassassin.apache.org
Subject: Re: General assistance

Ed Russell wrote:


User validation is going to be tough or all but impossible.  This box
forwards off the mail to an NT box running SL Mail.  There is no easy way to
get a userlist out of this product.  In addition the users change daily and
some even use multi-drops.  



You don't need to get a user list, you just need to ask the destination 
server if the user exists before accepting the message. This is what 
milter-ahead does on my MailScanner servers. I process and forward to 
servers running qmail (my toasters) and Exchange, GroupMail, Groupwise, 
Sendmail (my clients' servers). All respond correctly to milter-ahead. I 
do not know of a way to duplicate milter-ahead in qmail without 
requiring something like vpopmail or LDAP.


Did you look at using dnscache? That might buy you enough breathing room 
to shop around for a solution to user verification.


DAve





Ed


---

Talk is cheap since supply always exceeds demand.

---


-Original Message-
From: DAve [mailto:[EMAIL PROTECTED] 
Sent: Friday, February 10, 2006 12:39 PM

To: users@spamassassin.apache.org
Subject: Re: General assistance

Ed Russell wrote:



[EMAIL PROTECTED] smtpd]# spamassassin --version
SpamAssassin version 3.1.0
running on Perl version 5.8.7


Spamd running with:
  OPTIONS="-L -x -d -u nobody -m 45"

No user verification or RBL at the MTA level.



Absolutely do user verification. I can throw out from 20% to 80% of my 
traffic depending on the current level of dictionary and Joe-Job 
attacks. Since you are processing ahead of your clients' Exchange boxes 
I'm not sure how you can do that with qmail. I do it on my gateways 
running MailScanner via milter-ahead, and on my toasters via checkuser 
in vpopmail.


There might be a way to get qmail to check with an Exchange box to 
validate a user without running vpopmail, but I don't know of one.


DAve




12:20pm  up  4:05,  1 user,  load average: 9.49, 9.23, 9.23
313 processes: 300 sleeping, 12 running, 1 zombie, 0 stopped
CPU states: 18.9% user, 16.6% system,  0.0% nice, 64.4% idle
Mem:  2009856K av,  711560K used, 1298296K free,  353776K shrd,  129268K buff
Swap: 2097136K av,   0K used, 20

Re: General assistance

2006-02-11 Thread Loren Wilton
> b. Does anyone have any utilities to get statistics from SA?  Such as

Can't help you on your first question, but likely someone else can.

On the second question, there are two different stats scripts.  Confusingly
enough they are BOTH named sa_stats.pl.

One is distributed with SA itself.  I forget the directory where it ends up,
but digging for sa_stats.pl should turn it up.

The other one was written by Dallas, and is available on the rulesemporium
website.

I believe both of these just dig through the log to get their answers.
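
For the simple daily totals Ed describes (grepping "identified spam" and counting), digging through the log can be sketched in a few lines of Python. This is not either of the sa_stats.pl scripts, just an illustration that assumes the syslog-style spamd lines quoted in this thread; the name daily_totals is made up.

```python
import re
from collections import defaultdict

# Captures the syslog date ("Feb 11") and the spamd verdict.
LINE_RE = re.compile(
    r"^(\w{3}\s+\d+) .*spamd: (identified spam|clean message) "
)


def daily_totals(lines):
    """Return {date: {"spam": n, "ham": n}} for spamd result lines."""
    totals = defaultdict(lambda: {"spam": 0, "ham": 0})
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # not a spamd result line
        verdict = "spam" if match.group(2) == "identified spam" else "ham"
        totals[match.group(1)][verdict] += 1
    return dict(totals)
```

Per-rule statistics need the rule names, which these result lines do not carry, so for that the sa_stats.pl scripts remain the better route.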

Loren



RE: General assistance

2006-02-11 Thread Ed Russell
I have to say a heartfelt THANK YOU to everyone who contributed to this
thread.  My filter is working 500% more efficiently than it ever was.  I have
done the following:

1.  Installed djbdns and I am using dnscache as I was told.  I have
increased the cache size to 100 Megabytes and completely disabled logging
after determining it was working properly.

2.  I have implemented rbl at the MTA level, I use relays.ordb.org and
sbl-xbl.spamhaus.org.

3.  I have implemented Rules Du Jour.  I selected a subset of the SARE
rules and misc others.

4.  I have turned back on pyzor, razor and dcc.

Scanning times are well within tolerance with a minimal impact on delivery
time.  See below (email addresses removed for privacy):

Feb 11 16:10:18 as spamd[4137]: spamd: identified spam (31.3/4.0) for
[EMAIL PROTECTED] :99 in 4.5 seconds, 1178 bytes. 
Feb 11 16:10:18 as spamd[363]: spamd: clean message (1.2/4.0) for
[EMAIL PROTECTED] :99 in 3.1 seconds, 8939 bytes. 
Feb 11 16:10:19 as spamd[4218]: spamd: clean message (0.0/4.0) for
[EMAIL PROTECTED] :99 in 5.4 seconds, 2245 bytes.

I have some final questions though,

a.  Can I get any statistics from rblsmtpd (I know this isn't a group
devoted to it, but I figured I would ask)?  I would like to know how many
got dropped and from where.

b.  Does anyone have any utilities to get statistics from SA?  Such as
what rules triggered spam etc etc.  I have seen some posts with some
interesting looking reports.  Currently I only use a hacked together script
I wrote to give me the raw amount of spam caught per day which greps
"identified spam" on maillog and then gives me a wc -l.

Once again, thanks so much to everyone.  This group is simply amazing.

Ed




-Original Message-
From: DAve [mailto:[EMAIL PROTECTED] 
Sent: Friday, February 10, 2006 1:19 PM
To: users@spamassassin.apache.org
Subject: Re: General assistance

Ed Russell wrote:
> User validation is going to be tough or all but impossible.  This box
> forwards off the mail to an NT box running SL Mail.  There is no easy way to
> get a userlist out of this product.  In addition the users change daily and
> some even use multi-drops.  

You don't need to get a user list; you just need to ask the destination 
server if the user exists before accepting the message. This is what 
milter-ahead does on my MailScanner servers. I process and forward to 
servers running qmail (my toasters) and Exchange, GroupMail, Groupwise, and 
Sendmail (my clients' servers). All respond correctly to milter-ahead. I 
do not know of a way to duplicate milter-ahead in qmail without 
requiring something like vpopmail or LDAP.

Did you look at using dnscache? That might buy you enough breathing room 
to shop around for a solution to user verification.
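
To make the "ask the destination server" idea concrete, here is a rough
sketch of an SMTP call-ahead check. It is not how milter-ahead is actually
implemented; the function name recipient_exists, the HELO name
filter.example.com and the smtp_factory parameter are all assumptions made
for illustration. It only uses the standard smtplib calls helo, mail, rcpt
and quit.

```python
import smtplib


def recipient_exists(mx_host, recipient, helo_name="filter.example.com",
                     smtp_factory=smtplib.SMTP, timeout=30):
    """Ask mx_host whether it would accept mail for recipient.

    Returns True on a 2xx reply to RCPT TO, False on a 5xx reply, and
    None for anything else (for example a 4xx temporary failure).
    """
    conn = smtp_factory(mx_host, 25, timeout=timeout)
    try:
        conn.helo(helo_name)
        conn.mail("")                       # null sender, sent as MAIL FROM:<>
        code, _reply = conn.rcpt(recipient)
    finally:
        try:
            conn.quit()                     # no DATA is ever sent
        except smtplib.SMTPException:
            pass
    if 200 <= code < 300:
        return True
    if 500 <= code < 600:
        return False
    return None
```

A gateway would reject the message at RCPT time on False and accept or defer
on None. The check is only useful if the destination server really rejects
unknown users at RCPT time instead of accepting everything and bouncing
later.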

DAve



> 
> Ed
> 
> 
> ---
> 
>  Talk is cheap since supply always exceeds demand.
> 
> ---
>  
> 
> -Original Message-
> From: DAve [mailto:[EMAIL PROTECTED] 
> Sent: Friday, February 10, 2006 12:39 PM
> To: users@spamassassin.apache.org
> Subject: Re: General assistance
> 
> Ed Russell wrote:
> 
>>[EMAIL PROTECTED] smtpd]# spamassassin --version
>>SpamAssassin version 3.1.0
>>  running on Perl version 5.8.7
>>
>>
>>Spamd running with:
>>OPTIONS="-L -x -d -u nobody -m 45"
>>
>>No user verification or RBL at the MTA level.
> 
> 
> Absolutely do user verification. I can throw out from 20% to 80% of my 
> traffic depending on the current level of dictionary and Joe-Job 
> attacks. Since you are processing ahead of your clients' Exchange boxes 
> I'm not sure how you can do that with qmail. I do it on my gateways 
> running MailScanner via milter-ahead, and on my toasters via checkuser 
> in vpopmail.
> 
> There might be a way to get qmail to check with an Exchange box to 
> validate a user without running vpopmail, but I don't know of one.
> 
> DAve
> 
> 
>>
>>12:20pm  up  4:05,  1 user,  load average: 9.49, 9.23, 9.23
>>313 processes: 300 sleeping, 12 running, 1 zombie, 0 stopped
>>CPU states: 18.9% user, 16.6% system,  0.0% nice, 64.4% idle
>>Mem:  2009856K av,  711560K used, 1298296K free,  353776K shrd,  129268K buff
>>Swap: 2097136K av,   0K used, 2097136K free  225380K cached
>>
>>As you can see I have loads of head room as far as memory goes.  I was
>>looking into integrating RBL into Qmail, but with the very high volume I am
>>quite concerned that this will introduce a slowdown.  If I increase the
>>inbound concurrent rate I eventually run into qmail-scanner problems with
>>reformime.  Is there anything else I need to consider?
>>
>

RE: General assistance

2006-02-11 Thread Ed Russell
You are completely correct, qmail-scanner does use spamc to talk to the
already running spamd.  I just had trouble explaining what the setup was.
I may indeed look into having procmail be the agent for SpamAssassin.  As
for automatic deletion, well that's a decision we made and for the most part
it works.  We just ensure that we are not too aggressive on the rules.

Ed


-Original Message-
From: jdow [mailto:[EMAIL PROTECTED] 
Sent: Saturday, February 11, 2006 12:28 AM
To: users@spamassassin.apache.org
Subject: Re: General assistance

No, Ed, qmail-scanner should not initiate spamd. It should use spamc to
call the already running spamd. I hope that is what you mean. That is
what stood the hairs on end. It made me wonder if you really knew what
was going on. {o.o}

And seriously, if you are using procmail it's perhaps better to fire
off spamc from procmail. That way you can skip SA scanning for some
specific addresses, if you want. Or you can skip SA scanning if the
message size is too big. If you're running procmail anyway it might as
well be the agent for running SpamAssassin. That way you are SURE the
markups are there for when you delegate the spam to /dev/null.

And as a general rule I believe dumping mail to /dev/null is asking for
"I sent you the ebay notifications you needed! I can't help it if your
spam filter deleted them! Why'd you give me a bad review, you [EMAIL PROTECTED]
3-)(#$*&&&!"

{o.o}
- Original Message - 
From: "Ed Russell" <[EMAIL PROTECTED]>
To: 
Sent: 2006 February, 10, Friday 20:11
Subject: RE: General assistance


>I think you are confused as to how I have set this up.  Qmail-scanner is my
> replacement qmail queue.  Qmail simply receives mail from the outside world,
> then passes it to qmail-scanner for processing.  Qmail-scanner initiates
> spamd which scans the mail and off it goes.  From there procmail will look
> at the mail and determine if the spam status is marked in the header, if yes
> it kills the mail, if not it passes it along.  Keep in mind no users
> whatsoever live on this box.  It is as I mentioned a pass through filter.

> 
> 
> 
> -Original Message-
> From: jdow [mailto:[EMAIL PROTECTED] 
> Sent: Friday, February 10, 2006 10:55 PM
> To: users@spamassassin.apache.org
> Subject: Re: General assistance
> 
> From: "Ed Russell" <[EMAIL PROTECTED]>
> 
>> If everyone would indulge me I would like to put forth the setup I am
>> utilizing and get some feedback.   I have a box that I have been using for
>> some time which acts as a pass-through filter for many domains (currently
>> about 100) for spam, this is a fairly high traffic server processing about
>> 150,000 to 200,000 messages per day.  I use the following method.
>> 
>> Based upon a redhat 6.2 box running kernel 2.2.26, PIV with 2 Gigs of RAM.
>> 
>> Qmail runs which accepts the email from the world (with a
>> concurrencyincoming of 100) and passes it through qmail-scanner (which calls
>> spamd) and spamassassin which checks the email and writes spam status to the
>> header.  Each message gets then passed through a procmail filter which will
>> delete it if it is spam.  The procmail filter is:
> 
> I note the other answers and thought I'd comment because the above
> description of your mail topology raised the hairs on the back of my
> neck. (And that takes doing considering their length. {^_-})
> 
> First I note you say Qmail (its own punishment) feeds qmail-scanner.
> The qmail-scanner calls spamd? Naw, can't DO that. AND it calls
> spamassassin? That's even stranger. But then it goes to procmail for
> the delivery.
> 
> My topology is somewhat different but useful. If you are using qmail-scanner
> only to make the spamassassin run and the procmail run then jettison it
> and go to procmail directly. That MAY reduce the machine load a little.
> Also make sure spamd is running, exactly once, from your /etc/init.d
> files or the equivalent on BSDs. You'd then use spamc to get to the
> SpamAssassin run. You show some data below. (I am not sure what the
> EXITCODE is supposed to do for you. I never set it here. But that may
> be because I use procmail alone. It exits and mail is "delivered" either
> to a diversion directory, /dev/null, or the user's mailbox.)
> 
> Anyway, you can call spamc from inside procmail this way:
> 
> :0
> * < 50
> * !^List-Id: .*(spamassassin\.apache\.org)
> | /usr/bin/spamc -t 150 -u $USER
> 
>> :0
>> * ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
>> {
>>    EXITCODE=99
>>    :0
>>    /dev/null
>> }
>> 
>> :0
>> * ^X-Spam-Status: Yes
>>

Re: General assistance

2006-02-11 Thread Michael Monnerie
On Friday, 10 February 2006 22:42 Ed Russell wrote:
> I was doing some reading and I am beginning to look into Rules Du
> Jour.  I see there are quite a large number of rulesets to choose
> from when utilizing this.  Does anyone have any advice on what ones
> would be safe?

I use these:
SARE_ADULT
SARE_OBFU0 
SARE_OBFU1 
SARE_URI0 
SARE_REDIRECT_POST300 
SARE_HTML0 
SARE_HEADER0 
SARE_SPECIFIC 
SARE_BML 
SARE_FRAUD 
SARE_SPOOF 
SARE_GENLSUBJ0 
SARE_UNSUB 
SARE_WHITELIST_RCVD 
SARE_WHITELIST_SPF 
ZMI_GERMAN

The last one is specific to German-language spam. 

Additionally, I use the blacklist by William Stearns for postfix, updated 
by a cron job:

  rsync -qL rsync.sa-blacklist.stearns.org::wstearns/sa-blacklist/sa-blacklist.current.reject /etc/postfix/sender_blacklist ;
  postmap /etc/postfix/sender_blacklist

That's much better than the blacklist by SARE, as it uses less memory 
and is faster: a drop at the MTA is generally faster than handing 
the message over to SA.

Regards, zmi
-- 
// Michael Monnerie, Ing.BSc  ---   it-management Michael Monnerie
// http://zmi.at   Tel: 0660/4156531  Linux 2.6.11
// PGP Key:   "lynx -source http://zmi.at/zmi2.asc | gpg --import"
// Fingerprint: EB93 ED8A 1DCD BB6C F952  F7F4 3911 B933 7054 5879
// Keyserver: www.keyserver.net Key-ID: 0x70545879




Re: General assistance

2006-02-11 Thread Michael Monnerie
On Friday, 10 February 2006 19:32 Ed Russell wrote:
> 1.  Does anyone have an opinion as to what RBL to contact?  I
> know there are quite a few.

sbl-xbl.spamhaus.org, multi.surbl.org, safe.dnsbl.sorbs.net, 
dnsbl.njabl.org, bl.spamcop.net, relays.ordb.org

I use these at the MTA level. They dropped 62,000 messages, and only 378 
spams were detected by SA during that time. I guess that saved a lot of 
CPU.

Since you seem to have a problem with DNS queries ("if I disable RBL 
checks and razor, pyzor and dcc the delay goes away"), I would suggest:

- do the RBL checks at the MTA
- get permission from the RBL maintainers to make a zone transfer to your 
box, and run a local named or whatever. That way you only have local 
DNS queries, which should help a lot.
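
For anyone wondering why every RBL check costs a DNS query: a DNSBL lookup 
is an ordinary A-record query for the client's IP address with its octets 
reversed, under the list's zone. A minimal sketch, assuming IPv4 only; the 
function names dnsbl_query_name and is_listed are made up for illustration, 
and the zone is simply passed in as a string.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to query: reversed IPv4 octets followed by the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone, resolve=socket.gethostbyname):
    """Return True if the name resolves (listed), False if the lookup fails.

    Note that a temporary resolver failure also raises socket.gaierror here
    and is therefore treated as "not listed".
    """
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False
```

With a local cache or a local copy of the zone, the resolve step never 
leaves the box, which is where the speedup comes from.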

> 2.  Once this is in place should I re-activate pyzor, dcc or
> razor?  Is one better than the other?  Are there advantages to
> either?

Each of them is different; together they help a lot. I use all of 
them, but I'm not in a situation where I have problems with delay. 
First try RBL at the MTA, and possibly you will then have enough CPU 
cycles left to reactivate those checks.

Regards, zmi
-- 
// Michael Monnerie, Ing.BSc  ---   it-management Michael Monnerie
// http://zmi.at   Tel: 0660/4156531  Linux 2.6.11
// PGP Key:   "lynx -source http://zmi.at/zmi2.asc | gpg --import"
// Fingerprint: EB93 ED8A 1DCD BB6C F952  F7F4 3911 B933 7054 5879
// Keyserver: www.keyserver.net Key-ID: 0x70545879




Re: General assistance

2006-02-10 Thread jdow

No, Ed, qmail-scanner should not initiate spamd. It should use spamc to
call the already running spamd. I hope that is what you mean. That is
what stood the hairs on end. It made me wonder if you really knew what
was going on. {o.o}

And seriously, if you are using procmail it's perhaps better to fire
off spamc from procmail. That way you can skip SA scanning for some
specific addresses, if you want. Or you can skip SA scanning if the
message size is too big. If you're running procmail anyway it might as
well be the agent for running SpamAssassin. That way you are SURE the
markups are there for when you delegate the spam to /dev/null.

And as a general rule I believe dumping mail to /dev/null is asking for
"I sent you the ebay notifications you needed! I can't help it if your
spam filter deleted them! Why'd you give me a bad review, you [EMAIL PROTECTED]
3-)(#$*&&&!"

{o.o}
- Original Message - 
From: "Ed Russell" <[EMAIL PROTECTED]>

To: 
Sent: 2006 February, 10, Friday 20:11
Subject: RE: General assistance



I think you are confused as to how I have set this up.  Qmail-scanner is my
replacement qmail queue.  Qmail simply receives mail from the outside world,
then passes it to qmail-scanner for processing.  Qmail-scanner initiates
spamd which scans the mail and off it goes.  From there procmail will look
at the mail and determine if the spam status is marked in the header, if yes
it kills the mail, if not it passes it along.  Keep in mind no users
whatsoever live on this box.  It is as I mentioned a pass through filter.  




-Original Message-
From: jdow [mailto:[EMAIL PROTECTED] 
Sent: Friday, February 10, 2006 10:55 PM

To: users@spamassassin.apache.org
Subject: Re: General assistance

From: "Ed Russell" <[EMAIL PROTECTED]>


If everyone would indulge me I would like to put forth the setup I am
utilizing and get some feedback.   I have a box that I have been using for
some time which acts as a pass-through filter for many domains (currently
about 100) for spam, this is a fairly high traffic server processing about
150,000 to 200,000 messages per day.  I use the following method.

Based upon a redhat 6.2 box running kernel 2.2.26, PIV with 2 Gigs of RAM.

Qmail runs which accepts the email from the world (with a
concurrencyincoming of 100) and passes it through qmail-scanner (which calls
spamd) and spamassassin which checks the email and writes spam status to the
header.  Each message gets then passed through a procmail filter which will
delete it if it is spam.  The procmail filter is:


I note the other answers and thought I'd comment because the above
description of your mail topology raised the hairs on the back of my
neck. (And that takes doing considering their length. {^_-})

First I note you say Qmail (its own punishment) feeds qmail-scanner.
The qmail-scanner calls spamd? Naw, can't DO that. AND it calls
spamassassin? That's even stranger. But then it goes to procmail for
the delivery.

My topology is somewhat different but useful. If you are using qmail-scanner
only to make the spamassassin run and the procmail run then jettison it
and go to procmail directly. That MAY reduce the machine load a little.
Also make sure spamd is running, exactly once, from your /etc/init.d
files or the equivalent on BSDs. You'd then use spamc to get to the
SpamAssassin run. You show some data below. (I am not sure what the
EXITCODE is supposed to do for you. I never set it here. But that may
be because I use procmail alone. It exits and mail is "delivered" either
to a diversion directory, /dev/null, or the user's mailbox.)

Anyway, you can call spamc from inside procmail this way:

:0
* < 50
* !^List-Id: .*(spamassassin\.apache\.org)
| /usr/bin/spamc -t 150 -u $USER


:0
* ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
{
   EXITCODE=99
   :0
   /dev/null
}

:0
* ^X-Spam-Status: Yes
{
   EXITCODE=99
   :0
   /dev/null
}

:0
* ^^rom[ ]
{
 LOG="*** Dropped F off From_ header! Fixing up. "
 
 :0 fhw
 | sed -e '1s/^/F/'
}

:0
/dev/null

Mail that is clean gets passed off to a second qmail install which then
delivers the mail to our servers using smtproutes.


Ouch. And what is that final redirect of EVERYTHING to /dev/null? I just
let procmail deliver it.

{o.o}




RE: General assistance

2006-02-10 Thread Ed Russell
I think you are confused as to how I have set this up.  Qmail-scanner is my
replacement qmail queue.  Qmail simply receives mail from the outside world,
then passes it to qmail-scanner for processing.  Qmail-scanner initiates
spamd which scans the mail and off it goes.  From there procmail will look
at the mail and determine if the spam status is marked in the header, if yes
it kills the mail, if not it passes it along.  Keep in mind no users
whatsoever live on this box.  It is as I mentioned a pass through filter.  



-Original Message-
From: jdow [mailto:[EMAIL PROTECTED] 
Sent: Friday, February 10, 2006 10:55 PM
To: users@spamassassin.apache.org
Subject: Re: General assistance

From: "Ed Russell" <[EMAIL PROTECTED]>

> If everyone would indulge me I would like to put forth the setup I am
> utilizing and get some feedback.   I have a box that I have been using for
> some time which acts as a pass-through filter for many domains (currently
> about 100) for spam, this is a fairly high traffic server processing about
> 150,000 to 200,000 messages per day.  I use the following method.
> 
> Based upon a redhat 6.2 box running kernel 2.2.26, PIV with 2 Gigs of RAM.
> 
> Qmail runs which accepts the email from the world (with a
> concurrencyincoming of 100) and passes it through qmail-scanner (which calls
> spamd) and spamassassin which checks the email and writes spam status to the
> header.  Each message gets then passed through a procmail filter which will
> delete it if it is spam.  The procmail filter is:

I note the other answers and thought I'd comment because the above
description of your mail topology raised the hairs on the back of my
neck. (And that takes doing considering their length. {^_-})

First I note you say Qmail (its own punishment) feeds qmail-scanner.
The qmail-scanner calls spamd? Naw, can't DO that. AND it calls
spamassassin? That's even stranger. But then it goes to procmail for
the delivery.

My topology is somewhat different but useful. If you are using qmail-scanner
only to make the spamassassin run and the procmail run then jettison it
and go to procmail directly. That MAY reduce the machine load a little.
Also make sure spamd is running, exactly once, from your /etc/init.d
files or the equivalent on BSDs. You'd then use spamc to get to the
SpamAssassin run. You show some data below. (I am not sure what the
EXITCODE is supposed to do for you. I never set it here. But that may
be because I use procmail alone. It exits and mail is "delivered" either
to a diversion directory, /dev/null, or the user's mailbox.)

Anyway, you can call spamc from inside procmail this way:

:0
* < 50
* !^List-Id: .*(spamassassin\.apache\.org)
| /usr/bin/spamc -t 150 -u $USER

> :0
> * ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
> {
>    EXITCODE=99
>    :0
>    /dev/null
> }
> 
> :0
> * ^X-Spam-Status: Yes
> {
>    EXITCODE=99
>    :0
>    /dev/null
> }
> 
> :0
> * ^^rom[ ]
> {
>  LOG="*** Dropped F off From_ header! Fixing up. "
>  
>  :0 fhw
>  | sed -e '1s/^/F/'
> }
> 
> :0
> /dev/null
> 
> Mail that is clean gets passed off to a second qmail install which then
> delivers the mail to our servers using smtproutes.

Ouch. And what is that final redirect of EVERYTHING to /dev/null? I just
let procmail deliver it.

{o.o}





Re: General assistance

2006-02-10 Thread jdow

From: "Ed Russell" <[EMAIL PROTECTED]>


If everyone would indulge me I would like to put forth the setup I am
utilizing and get some feedback.   I have a box that I have been using for
some time which acts as a pass-through filter for many domains (currently
about 100) for spam, this is a fairly high traffic server processing about
150,000 to 200,000 messages per day.  I use the following method.

Based upon a redhat 6.2 box running kernel 2.2.26, PIV with 2 Gigs of RAM.

Qmail runs which accepts the email from the world (with a
concurrencyincoming of 100) and passes it through qmail-scanner (which calls
spamd) and spamassassin which checks the email and writes spam status to the
header.  Each message gets then passed through a procmail filter which will
delete it if it is spam.  The procmail filter is:


I note the other answers and thought I'd comment because the above
description of your mail topology raised the hairs on the back of my
neck. (And that takes doing considering their length. {^_-})

First I note you say Qmail (its own punishment) feeds qmail-scanner.
The qmail-scanner calls spamd? Naw, can't DO that. AND it calls
spamassassin? That's even stranger. But then it goes to procmail for
the delivery.

My topology is somewhat different but useful. If you are using qmail-scanner
only to make the spamassassin run and the procmail run then jettison it
and go to procmail directly. That MAY reduce the machine load a little.
Also make sure spamd is running, exactly once, from your /etc/init.d
files or the equivalent on BSDs. You'd then use spamc to get to the
SpamAssassin run. You show some data below. (I am not sure what the
EXITCODE is supposed to do for you. I never set it here. But that may
be because I use procmail alone. It exits and mail is "delivered" either
to a diversion directory, /dev/null, or the user's mailbox.)

Anyway, you can call spamc from inside procmail this way:

:0
* < 50
* !^List-Id: .*(spamassassin\.apache\.org)
| /usr/bin/spamc -t 150 -u $USER


:0
* ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
{
   EXITCODE=99
   :0
   /dev/null
}

:0
* ^X-Spam-Status: Yes
{
   EXITCODE=99
   :0
   /dev/null
}

:0
* ^^rom[ ]
{
 LOG="*** Dropped F off From_ header! Fixing up. "
 
 :0 fhw
 | sed -e '1s/^/F/'
}

:0
/dev/null

Mail that is clean gets passed off to a second qmail install which then
delivers the mail to our servers using smtproutes.


Ouch. And what is that final redirect of EVERYTHING to /dev/null? I just
let procmail deliver it.

{o.o}



Re: General assistance

2006-02-10 Thread DAve

Ed Russell wrote:

I was doing some reading and I am beginning to look into Rules Du Jour.  I
see there are quite a large number of rulesets to choose from when utilizing
this.  Does anyone have any advice on what ones would be safe?



My experience with SARE has been they try very hard to classify their 
rules based on their ability to hit spam correctly, and ham incorrectly. 
After the first year using SARE I now just trust in their judgment ;^).


I generally get the zero rules when there is a choice (rules that hit 
only spam in testing, named with a zero) and try them first. I choose 
the rules based on the spam I am seeing slip through. I generally never 
adjust their assigned points either.


These have always been good performers for me.
70_sare_html0.cf
70_sare_adult.cf
70_sare_oem.cf
70_sare_obfu.cf

This one has been proving useful over the past few weeks:
70_sare_stocks.cf

I almost always grab any new rules announced and give them a try for a 
few days as well.


DAve


Ed


---

 Talk is cheap since supply always exceeds demand.

---
 


-Original Message-
From: DAve [mailto:[EMAIL PROTECTED] 
Sent: Friday, February 10, 2006 4:30 PM

To: users@spamassassin.apache.org
Subject: Re: General assistance

Bowie Bailey wrote:


DAve wrote:



Ed Russell wrote:


2.	Once this is in place should I re-activate pyzor, dcc or razor? 
Is one better than the other?  Are there advantages to either?


I use neither, though I think I am in the minority. I routinely check
my spam and I have found that bayes, razor, dcc, and most of the
SARE rules catch little if any spam "for me". So I don't run them and
save the CPU for additional spamd processes.



That's odd.  Bayes, Razor2, DCC work quite well for me.  Check out my
stats from today:

TOP SPAM RULES FIRED

RANK  RULE NAME                   COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM

   1  RAZOR2_CF_RANGE_51_100       1280      5.02    48.05    83.33    0.98
   2  RAZOR2_CHECK                 1259      4.94    47.26    81.97    1.15
   3  RAZOR2_CF_RANGE_E8_51_100    1164      4.56    43.69    75.78    0.27






Razor2 caught 83% of the spam, DCC caught 68%, and Bayes got 64%.




They tagged plenty of spam for me, no doubt about that. But they caught 
only a few spams that SA wouldn't have caught without them. It is rare 
that bayes points on top of existing points ever made the score squeak 
over the threshold.


Not using them however, dropped my CPU, network, and memory requirements 
so much I could run twice as many spamd processes. Processing time went 
from an average of 10 seconds (with all SARE rules, bayes, DCC, Razor) 
to 2 seconds (limited SARE, no bayes, no razor, no dcc).


With all the SARE rules loaded, each spamd process runs at about 45-75 MB; 
with selective SARE rules I see spamd drop to 23-35 MB. More spamd, faster 
spamd.
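
As back-of-the-envelope arithmetic for the "more spamd" point, here is a 
small sketch. The function name max_spamd_children, the 2048 MB total and 
the 512 MB held back for everything else are assumptions for illustration; 
only the per-process sizes come from the figures above. It ignores memory 
shared between spamd children, so it errs on the cautious side.

```python
def max_spamd_children(ram_mb, per_child_mb, reserved_mb=0):
    """How many spamd children of a given size fit in RAM, as a rough upper bound."""
    usable_mb = ram_mb - reserved_mb
    return max(0, usable_mb // per_child_mb)


# Worst-case child size with all SARE rules versus a selective set,
# on an assumed 2048 MB box with 512 MB reserved for the rest of the system.
all_rules = max_spamd_children(2048, 75, reserved_mb=512)
selective = max_spamd_children(2048, 35, reserved_mb=512)
print(all_rules, selective)
```

Under those assumptions the selective rule set leaves room for roughly twice 
as many children.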


Of course tomorrow, everything could change ;^)

DAve









Re: General assistance

2006-02-10 Thread DAve

Joey wrote:

Dave,

What parameters are you using for logging with the caching name server?
I currently use this:

logging {
category lame-servers { null; };
};

Thanks,

Joey



I was speaking of dnscache, the program, not dnscache as in "a caching 
DNS server". See http://cr.yp.to/djbdns.html. It can log in a very 
verbose way, generating gigabytes of log files a day.


What you have above is for BIND.

DAve




-Original Message-
From: DAve [mailto:[EMAIL PROTECTED] 
Sent: Friday, February 10, 2006 12:28 PM

To: users@spamassassin.apache.org
Subject: Re: General assistance

Ed Russell wrote:


If everyone would indulge me I would like to put forth the setup I am
utilizing and get some feedback.   I have a box that I have been using for
some time which acts as a pass-through filter for many domains 
(currently about 100) for spam, this is a fairly high traffic server 
processing about 150,000 to 200,000 messages per day.  I use the following
method.


Based upon a redhat 6.2 box running kernel 2.2.26, PIV with 2 Gigs of RAM.

Qmail runs which accepts the email from the world (with a 
concurrencyincoming of 100) and passes it through qmail-scanner (which 
calls spamd) and spamassassin which checks the email and writes spam status 
to the header.  Each message gets then passed through a procmail 
filter which will delete it if it is spam.  The procmail filter is:


:0
* ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
{
   EXITCODE=99
   :0
   /dev/null
}

:0
* ^X-Spam-Status: Yes
{
   EXITCODE=99
   :0
   /dev/null
}

:0
* ^^rom[ ]
{
 LOG="*** Dropped F off From_ header! Fixing up. "
 
 :0 fhw
 | sed -e '1s/^/F/'
}

:0
/dev/null

Mail that is clean gets passed off to a second qmail install which 
then delivers the mail to our servers using smtproutes.


This has been working fine for a few years now, but recently we have 
experienced major delays in the processing of email.  Due to the very 
high volume pretty much all the time the system is handling 100 
concurrent incoming pieces of email.  Of course with everything else 
going on it is not uncommon for this system to have up to 400 
processes running.  Sometimes mail can take hours to get through to 
its destination.  What I have discovered is that if I disable RBL 
checks and razor, pyzor and dcc the delay goes away.  However, the
effectiveness of the filter reduces.

Am I completely off base in the way I have this all set up?  I have 
gone with a higher speed HD to increase the threshold on file I/O.  
Can I tune the performance of razor etc while maintaining delivery 
time?  Is there anything else I should be considering?  If I have not 
explained things well or more information is needed I will certainly
provide anything.


Thanks



Since you are running qmail, consider doing your rbl checks in qmail-smtpd.
No sense scanning a message if you can drop it at the door first.

Also, are you running dnscache? I run dnscache on all my servers, web,
webmail, toasters, etc. It can speed things up considerably as it will cache
your RBL lookups, SURBL lookups, etc. It's a nice thing to do for the URIBL
and SURBL folks too.

If you do run dnscache, consider turning logging off once you are configured
and satisfied it works as intended. dnscache can keep a disk pretty busy
with its potential to log a lot of data.

DAve








Re: General assistance

2006-02-10 Thread Mike Jackson

I was doing some reading and I am beginning to look into Rules Du Jour.  I
see there are quite a large number of rulesets to choose from when utilizing
this.  Does anyone have any advice on what ones would be safe?


I use these:

SARE_ADULT
SARE_BAYES_POISON_NXM
SARE_FRAUD
SARE_HEADER0
SARE_HEADER1
SARE_HTML0
SARE_OBFU0
SARE_OEM
SARE_RANDOM
SARE_REDIRECT_POST300
SARE_SPAMCOP_TOP200
SARE_SPECIFIC
SARE_SPOOF
SARE_STOCKS
SARE_WHITELIST_RCVD
SARE_WHITELIST_SPF

This is on a server with 165 domains and several hundred users. In this 
environment, I'd rather create false negatives than false positives, hence 
my choice of rulesets. On my box at home, where it's just me and my wife 
receiving mail, I add these as well:


BOGUSVIRUS
SARE_BML
SARE_EVILNUMBERS0
SARE_GENLSUBJ0
SARE_URI0
TRIPWIRE

On the work box, I use the SBL/XBL lists from Spamhaus and 
bogusmx.rfc-ignorant.org at the MTA level. At home, I add 
dynablock.njabl.org, dsn.rfc-ignorant.org, blackholes.mail-abuse.org, 
relays.mail-abuse.org, dialups.mail-abuse.org, and ws.surbl.org, pretty much 
in that order. The lower ranked ones rarely trigger (and they're probably 
redundant, but I don't really care). 



RE: General assistance

2006-02-10 Thread Joey
Dave,

What parameters are you using for logging with the caching name server?
I currently use this:

logging {
category lame-servers { null; };
};

Thanks,

Joey
  

-Original Message-
From: DAve [mailto:[EMAIL PROTECTED] 
Sent: Friday, February 10, 2006 12:28 PM
To: users@spamassassin.apache.org
Subject: Re: General assistance

Ed Russell wrote:
> If everyone would indulge me I would like to put forth the setup I am
> utilizing and get some feedback.   I have a box that I have been using for
> some time which acts as a pass-through filter for many domains 
> (currently about 100) for spam, this is a fairly high traffic server 
> processing about 150,000 to 200,000 messages per day.  I use the following
> method.
> 
> Based upon a redhat 6.2 box running kernel 2.2.26, PIV with 2 Gigs of RAM.
> 
> Qmail runs which accepts the email from the world (with a 
> concurrencyincoming of 100) and passes it through qmail-scanner (which 
> calls spamd) and spamassassin which checks the email and writes spam status 
> to the header.  Each message gets then passed through a procmail 
> filter which will delete it if it is spam.  The procmail filter is:
> 
> :0
> * ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
> {
>    EXITCODE=99
>    :0
>    /dev/null
> }
> 
> :0
> * ^X-Spam-Status: Yes
> {
>    EXITCODE=99
>    :0
>    /dev/null
> }
> 
> :0
> * ^^rom[ ]
> {
>   LOG="*** Dropped F off From_ header! Fixing up. "
>   
>   :0 fhw
>   | sed -e '1s/^/F/'
> }
> 
> :0
> /dev/null
> 
> Mail that is clean gets passed off to a second qmail install which 
> then delivers the mail to our servers using smtproutes.
> 
> This has been working fine for a few years now, but recently we have 
> experienced major delays in the processing of email.  Due to the very 
> high volume pretty much all the time the system is handling 100 
> concurrent incoming pieces of email.  Of course with everything else 
> going on it is not uncommon for this system to have up to 400 
> processes running.  Sometimes mail can take hours to get through to 
> its destination.  What I have discovered is that if I disable RBL 
> checks and razor, pyzor and dcc the delay goes away.  However, the
> effectiveness of the filter reduces.
> 
> Am I completely off base in the way I have this all set up?  I have 
> gone with a higher speed HD to increase the threshold on file I/O.  
> Can I tune the performance of razor etc while maintaining delivery 
> time?  Is there anything else I should be considering?  If I have not 
> explained things well or more information is needed I will certainly
> provide anything.
> 
> Thanks

Since you are running qmail, consider doing your rbl checks in qmail-smtpd.
No sense scanning a message if you can drop it at the door first.

Also, are you running dnscache? I run dnscache on all my servers, web,
webmail, toasters, etc. It can speed things up considerably as it will cache
your RBL lookups, SURBL lookups, etc. It's a nice thing to do for the URIBL
and SURBL folks too.

If you do run dnscache, consider turning logging off once you are configured
and satisfied it works as intended. dnscache can keep a disk pretty busy
with its potential to log a lot of data.

DAve




RE: General assistance

2006-02-10 Thread Ed Russell
I was doing some reading and I am beginning to look into Rules Du Jour.  I
see there are quite a large number of rulesets to choose from when utilizing
this.  Does anyone have any advice on what ones would be safe?

Ed


---

 Talk is cheap since supply always exceeds demand.

---
 

-Original Message-
From: DAve [mailto:[EMAIL PROTECTED] 
Sent: Friday, February 10, 2006 4:30 PM
To: users@spamassassin.apache.org
Subject: Re: General assistance

Bowie Bailey wrote:
> DAve wrote:
> 
>>Ed Russell wrote:
>>
>>>2.   Once this is in place should I re-activate pyzor, dcc or razor? 
>>>Is one better than the other?  Are there advantages to either?
>>
>>I use neither, though I think I am in the minority. I routinely check
>>my spam and I have found that bayes, razor, dcc, and most of the
>>SARE rules catch little if any spam "for me". So I don't run them and
>>save the CPU for additional spamd processes.
> 
> 
> That's odd.  Bayes, Razor2, DCC work quite well for me.  Check out my
> stats from today:
> 
> TOP SPAM RULES FIRED
> 
> RANK  RULE NAME                   COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
> 
>    1  RAZOR2_CF_RANGE_51_100       1280      5.02    48.05    83.33    0.98
>    2  RAZOR2_CHECK                 1259      4.94    47.26    81.97    1.15
>    3  RAZOR2_CF_RANGE_E8_51_100    1164      4.56    43.69    75.78    0.27

> 
> 
> Razor2 caught 83% of the spam, DCC caught 68%, and Bayes got 64%.
> 

They tagged plenty of spam for me, no doubt about that. But they caught 
only a few spam messages that SA wouldn't have caught without them. It is 
rare that bayes points on top of existing points ever made the score squeak 
over the threshold.

Not using them, however, dropped my CPU, network, and memory requirements 
so much I could run twice as many spamd processes. Processing time went 
from an average of 10 seconds (with all SARE rules, bayes, DCC, Razor) 
to 2 seconds (limited SARE, no bayes, no razor, no dcc).

With all the SARE rules loaded, spamd runs at about 45-75 MB per process; 
with selective SARE rules I see spamd drop to 23-35 MB. More spamd, faster spamd.

Of course tomorrow, everything could change ;^)

DAve





Re: General assistance

2006-02-10 Thread DAve

Bowie Bailey wrote:

DAve wrote:


Ed Russell wrote:

2.  Once this is in place should I re-activate pyzor, dcc or razor? 
Is one better than the other?  Are there advantages to either?


I use none of them, though I think I am in the minority. I routinely check
my spam and I have found that bayes, razor, dcc, and most of the
SARE rules catch little if any spam "for me". So I don't run them and
save the CPU for additional spamd processes.



That's odd.  Bayes, Razor2, DCC work quite well for me.  Check out my
stats from today:

TOP SPAM RULES FIRED

RANK  RULE NAME                  COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM

   1  RAZOR2_CF_RANGE_51_100      1280      5.02    48.05    83.33    0.98
   2  RAZOR2_CHECK                1259      4.94    47.26    81.97    1.15
   3  RAZOR2_CF_RANGE_E8_51_100   1164      4.56    43.69    75.78    0.27





Razor2 caught 83% of the spam, DCC caught 68%, and Bayes got 64%.



They tagged plenty of spam for me, no doubt about that. But they caught 
only a few spam messages that SA wouldn't have caught without them. It is 
rare that bayes points on top of existing points ever made the score squeak 
over the threshold.


Not using them, however, dropped my CPU, network, and memory requirements 
so much I could run twice as many spamd processes. Processing time went 
from an average of 10 seconds (with all SARE rules, bayes, DCC, Razor) 
to 2 seconds (limited SARE, no bayes, no razor, no dcc).


With all the SARE rules loaded, spamd runs at about 45-75 MB per process; 
with selective SARE rules I see spamd drop to 23-35 MB. More spamd, faster spamd.


Of course tomorrow, everything could change ;^)

DAve
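
The memory trade-off above can be put into rough numbers. This is a minimal sketch, assuming a hypothetical budget of 1024 MB set aside for spamd (not a figure from this thread) and using the upper end of each per-process range quoted above.

```python
def max_spamd_children(budget_mb: int, per_child_mb: int) -> int:
    """How many spamd children fit in a memory budget (integer division)."""
    if per_child_mb <= 0:
        raise ValueError("per_child_mb must be positive")
    return budget_mb // per_child_mb


BUDGET_MB = 1024  # hypothetical budget, not taken from the thread

all_rules = max_spamd_children(BUDGET_MB, 75)  # all SARE rules loaded
selective = max_spamd_children(BUDGET_MB, 35)  # selective SARE rules

print(all_rules, selective)  # prints: 13 29
```

Real limits also depend on how much memory the children share and on whatever else the box runs, so treat this as a back-of-envelope estimate that happens to line up with the "twice as many spamd processes" observation.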




RE: General assistance

2006-02-10 Thread Bowie Bailey
DAve wrote:
> Ed Russell wrote:
> > 
> > 2.  Once this is in place should I re-activate pyzor, dcc or razor? 
> > Is one better than the other?  Are there advantages to either?
> 
> I use none of them, though I think I am in the minority. I routinely check
>   my spam and I have found that bayes, razor, dcc, and most of the
> SARE rules catch little if any spam "for me". So I don't run them and
> save the CPU for additional spamd processes.

That's odd.  Bayes, Razor2, DCC work quite well for me.  Check out my
stats from today:

TOP SPAM RULES FIRED

RANK  RULE NAME                  COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM

   1  RAZOR2_CF_RANGE_51_100      1280      5.02    48.05    83.33    0.98
   2  RAZOR2_CHECK                1259      4.94    47.26    81.97    1.15
   3  RAZOR2_CF_RANGE_E8_51_100   1164      4.56    43.69    75.78    0.27
   4  URIBL_BLACK                 1147      4.50    43.06    74.67    0.44
   5  HTML_MESSAGE                1071      4.20    40.20    69.73   44.50
   6  DCC_CHECK                   1046      4.10    39.26    68.10    6.56
   7  BAYES_99                     985      3.86    36.97    64.13    0.44
   8  DIGEST_MULTIPLE              937      3.67    35.17    61.00    0.35
   9  URIBL_JP_SURBL               927      3.63    34.80    60.35    0.09
  10  URIBL_SBL                    903      3.54    33.90    58.79    0.35
  11  URIBL_WS_SURBL               797      3.12    29.92    51.89    0.27
  12  RCVD_IN_XBL                  719      2.82    26.99    46.81    0.00
  13  RCVD_IN_BL_SPAMCOP_NET       669      2.62    25.11    43.55    0.98
  14  URIBL_OB_SURBL               653      2.56    24.51    42.51    0.09
  15  URIBL_SC_SURBL               552      2.16    20.72    35.94    0.00
  16  RAZOR2_CF_RANGE_E4_51_100    550      2.16    20.65    35.81    0.71
  17  RCVD_IN_SORBS_DUL            448      1.76    16.82    29.17    0.27
  18  MIME_HTML_ONLY               438      1.72    16.44    28.52    7.18
  19  RCVD_IN_NJABL_DUL            348      1.36    13.06    22.66    0.27
  20  RCVD_IN_SBL                  330      1.29    12.39    21.48    0.09


Razor2 caught 83% of the spam, DCC caught 68%, and Bayes got 64%.

> Bottom line, my clients would rather have 95% of the spam stopped and
> a 20 second delivery time than 100% of spam caught and a two minute
> delivery time. As always ;^) YMMV. Set up a honeypot account and check
> its contents daily. That will tell you if the choices you make are
> correct or not.
> 
> DAve
> 
> PS. While bayes/razor/dcc don't provide a benefit for me, I find
> URIBL and SURBL are responsible for catching at the very least 70% of
> my spam and at times 90%+. I also move SARE rules and custom rules in
> and out weekly, depending on the type of traffic I see. Right now
> SARE_OEM and SARE_STOCK are helping out. Next week it might be
> SARE_ADULT. 

Agreed on URIBL and SURBL.  Both of those have good showings in my stats
as well.

I don't swap out the SARE rules.  I use most of them and just let them
run.  My server doesn't see quite enough traffic for it to create a
problem.  They don't catch as much as the net rules, but they do help
out from time to time.

-- 
Bowie
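
One way to read the stats above is to compare how often a rule hits spam against how often it hits ham. Below is a small Python sketch using five rows copied from the table. The ranking function is illustrative only and is not part of any SpamAssassin tool.

```python
# (rule name, %OFSPAM, %OFHAM) copied from the stats in this message
ROWS = [
    ("RAZOR2_CHECK", 81.97, 1.15),
    ("HTML_MESSAGE", 69.73, 44.50),
    ("DCC_CHECK", 68.10, 6.56),
    ("BAYES_99", 64.13, 0.44),
    ("URIBL_JP_SURBL", 60.35, 0.09),
]


def spam_to_ham_ratio(pct_spam: float, pct_ham: float) -> float:
    """Higher means the rule fires far more often on spam than on ham."""
    if pct_ham == 0:
        return float("inf")  # rule never fired on ham in this sample
    return pct_spam / pct_ham


ranked = sorted(ROWS, key=lambda r: spam_to_ham_ratio(r[1], r[2]), reverse=True)
for name, pct_spam, pct_ham in ranked:
    print(f"{name:16s} {spam_to_ham_ratio(pct_spam, pct_ham):8.1f}")
```

On these rows URIBL_JP_SURBL comes out on top and HTML_MESSAGE at the bottom, since HTML_MESSAGE fires on a large share of ham as well. A rule that hits most of the spam is not automatically a good discriminator.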


Re: General assistance

2006-02-10 Thread DAve

Ed Russell wrote:

My homework is:

1.  Install and configure dnscache.
2.  Look into RBL at the MTA.
3.  Begin to investigate user authentication at the MTA.

Some questions,

1.  Does anyone have an opinion as to what RBL to contact?  I know there
are quite a few.


I have tried several with different levels of success. We have clients 
who get a lot of mail from self-administered servers on DSL, Pacific 
Rim, Eastern Europe, etc. So I have to be careful what RBL I choose.


I have been using http://ordb.org and http://sbl-xbl.spamhaus.org for 
the past year and have no complaints (I should say my clients have no 
complaints). I believe RBLs are like spam rules and what works for me 
may not work for you. Your mail will determine if a particular RBL is a 
good fit or not.




2.  Once this is in place should I re-activate pyzor, dcc or razor?  Is
one better than the other?  Are there advantages to either?


I use none of them, though I think I am in the minority. I routinely check my 
spam and I have found that bayes, razor, dcc, and most of the SARE 
rules catch little if any spam "for me". So I don't run them and save 
the CPU for additional spamd processes.


Bottom line, my clients would rather have 95% of the spam stopped and a 
20 second delivery time than 100% of spam caught and a two minute 
delivery time. As always ;^) YMMV. Set up a honeypot account and check 
its contents daily. That will tell you if the choices you make are 
correct or not.


DAve

PS. While bayes/razor/dcc don't provide a benefit for me, I find URIBL 
and SURBL are responsible for catching at the very least 70% of my spam 
and at times 90%+. I also move SARE rules and custom rules in and out 
weekly, depending on the type of traffic I see. Right now SARE_OEM and 
SARE_STOCK are helping out. Next week it might be SARE_ADULT.




-Original Message-
From: DAve [mailto:[EMAIL PROTECTED] 
Sent: Friday, February 10, 2006 1:19 PM

To: users@spamassassin.apache.org
Subject: Re: General assistance

Ed Russell wrote:


User validation is going to be tough or all but impossible.  This box
forwards off the mail to an NT box running SL Mail.  There is no easy way to
get a userlist out of this product.  In addition the users change daily and
some even use multi-drops.  



You don't need to get a user list, you just need to ask the destination 
server if the user exists before accepting the message. This is what 
milter-ahead does on my MailScanner servers. I process and forward to 
servers running qmail (my toasters) and Exchange, GroupMail, Groupwise, 
Sendmail (my clients' servers). All respond correctly to milter-ahead. I 
do not know of a way to duplicate milter-ahead in qmail without 
requiring something like vpopmail or LDAP.


Did you look at using dnscache? That might buy you enough breathing room 
to shop around for a solution to user verification.


DAve





Ed


---

Talk is cheap since supply always exceeds demand.

---


-Original Message-
From: DAve [mailto:[EMAIL PROTECTED] 
Sent: Friday, February 10, 2006 12:39 PM

To: users@spamassassin.apache.org
Subject: Re: General assistance

Ed Russell wrote:



[EMAIL PROTECTED] smtpd]# spamassassin --version
SpamAssassin version 3.1.0
running on Perl version 5.8.7


Spamd running with:
  OPTIONS="-L -x -d -u nobody -m 45"

No user verification or RBL at the MTA level.



Absolutely do user verification. I can throw out from 20% to 80% of my 
traffic depending on the current level of dictionary and Joe-Job 
attacks. Since you are processing ahead of your clients' Exchange boxes 
I'm not sure how you can do that with qmail. I do it on my gateways 
running MailScanner via milter-ahead, and on my toasters via checkuser 
in vpopmail.


There might be a way to get qmail to check with an Exchange box to 
validate a user without running vpopmail, but I don't know of one.


DAve




12:20pm  up  4:05,  1 user,  load average: 9.49, 9.23, 9.23
313 processes: 300 sleeping, 12 running, 1 zombie, 0 stopped
CPU states: 18.9% user, 16.6% system,  0.0% nice, 64.4% idle
Mem:  2009856K av,  711560K used, 1298296K free,  353776K shrd,  129268K buff
Swap: 2097136K av,   0K used, 2097136K free  225380K cached

As you can see I have loads of head room as far as memory goes.  I was
looking into integrating RBL into Qmail, but with the very high volume I am
quite concerned that this will introduce a slowdown.  If I increase the
inbound concurrent rate I eventually run into qmail-scanner problems with
reformime.  Is there anything else I need consider?

Ed

---

Talk is cheap since supply always exceeds demand.

---


-Original Message-
From: Kristopher Austin [mailto:[EMAIL PROTECTED] 
Se

RE: General assistance

2006-02-10 Thread Matthew.van.Eerde
Ed Russell wrote:
> 1. Does anyone have an opinion as to what RBL to contact?  I know
> there are quite a few.

openrbl.org has a reasonably comprehensive list.

-- 
Matthew.van.Eerde (at) hbinc.com   805.964.4554 x902
Hispanic Business Inc./HireDiversity.com   Software Engineer


RE: General assistance

2006-02-10 Thread Kristopher Austin
> -Original Message-
> From: Ed Russell [mailto:[EMAIL PROTECTED]
> Sent: Friday, February 10, 2006 12:32 PM
> To: users@spamassassin.apache.org
> Subject: RE: General assistance
> 
> My homework is:
> 
> 1. Install and configure dnscache.
> 2. Look into RBL at the MTA.
> 3. Begin to investigate user authentication at the MTA.
> 
> Some questions,
> 
> 1. Does anyone have an opinion as to what RBL to contact?  I know there
> are quite a few.
>

We use sbl-xbl.spamhaus.org and I know a lot of others on this list do
the same.  However, I do know that FPs have been mentioned on this list
concerning this RBL.  I have never encountered one.  It is a popular
enough list that anyone who ends up on it usually works quickly to get
off of it.

If there is one list that most people use, SBL+XBL is definitely it.
Go to http://www.spamhaus.org for more info.

Kris 


RE: General assistance

2006-02-10 Thread Ed Russell
My homework is:

1.  Install and configure dnscache.
2.  Look into RBL at the MTA.
3.  Begin to investigate user authentication at the MTA.

Some questions,

1.  Does anyone have an opinion as to what RBL to contact?  I know there
are quite a few.

2.  Once this is in place should I re-activate pyzor, dcc or razor?  Is
one better than the other?  Are there advantages to either?

-Original Message-
From: DAve [mailto:[EMAIL PROTECTED] 
Sent: Friday, February 10, 2006 1:19 PM
To: users@spamassassin.apache.org
Subject: Re: General assistance

Ed Russell wrote:
> User validation is going to be tough or all but impossible.  This box
> forwards off the mail to an NT box running SL Mail.  There is no easy way to
> get a userlist out of this product.  In addition the users change daily and
> some even use multi-drops.  

You don't need to get a user list, you just need to ask the destination 
server if the user exists before accepting the message. This is what 
milter-ahead does on my MailScanner servers. I process and forward to 
servers running qmail (my toasters) and Exchange, GroupMail, Groupwise, 
Sendmail (my clients' servers). All respond correctly to milter-ahead. I 
do not know of a way to duplicate milter-ahead in qmail without 
requiring something like vpopmail or LDAP.

Did you look at using dnscache? That might buy you enough breathing room 
to shop around for a solution to user verification.

DAve



> 
> Ed
> 
> 
> ---
> 
>  Talk is cheap since supply always exceeds demand.
> 
> ---
>  
> 
> -Original Message-
> From: DAve [mailto:[EMAIL PROTECTED] 
> Sent: Friday, February 10, 2006 12:39 PM
> To: users@spamassassin.apache.org
> Subject: Re: General assistance
> 
> Ed Russell wrote:
> 
>>[EMAIL PROTECTED] smtpd]# spamassassin --version
>>SpamAssassin version 3.1.0
>>  running on Perl version 5.8.7
>>
>>
>>Spamd running with:
>>OPTIONS="-L -x -d -u nobody -m 45"
>>
>>No user verification or RBL at the MTA level.
> 
> 
> Absolutely do user verification. I can throw out from 20% to 80% of my 
> traffic depending on the current level of dictionary and Joe-Job 
> attacks. Since you are processing ahead of your clients Exchange boxes 
> I'm not sure how you can do that with qmail. I do it on my gateways 
> running MailScanner via milter-ahead, and on my toasters via checkuser 
> in vpopmail.
> 
> There might be a way to get qmail to check with an Exchange box to 
> validate a user without running vpopmail, but I won't know it.
> 
> DAve
> 
> 
>>
>>12:20pm  up  4:05,  1 user,  load average: 9.49, 9.23, 9.23
>>313 processes: 300 sleeping, 12 running, 1 zombie, 0 stopped
>>CPU states: 18.9% user, 16.6% system,  0.0% nice, 64.4% idle
>>Mem:  2009856K av,  711560K used, 1298296K free,  353776K shrd,  129268K buff
>>Swap: 2097136K av,   0K used, 2097136K free  225380K cached
>>
>>As you can see I have loads of head room as far as memory goes.  I was
>>looking into integrating RBL into Qmail, but with the very high volume I am
>>quite concerned that this will introduce a slowdown.  If I increase the
>>inbound concurrent rate I eventually run into qmail-scanner problems with
>>reformime.  Is there anything else I need consider?
>>
>>Ed
>>
>>---
>>
>> Talk is cheap since supply always exceeds demand.
>>
>>---
>> 
>>
>>-Original Message-
>>From: Kristopher Austin [mailto:[EMAIL PROTECTED] 
>>Sent: Friday, February 10, 2006 12:06 PM
>>To: [EMAIL PROTECTED]; users@spamassassin.apache.org
>>Subject: RE: General assistance
>>
>>
>>
>>>-Original Message-
>>>From: Ed Russell [mailto:[EMAIL PROTECTED]
>>>Sent: Friday, February 10, 2006 10:51 AM
>>>To: users@spamassassin.apache.org
>>>Subject: General assistance
>>>
>>>Am I completely off base in the way I have this all set up?  I have gone with
>>>a higher speed HD to increase the threshold on file I/O.  Can I tune the
>>>performance of razor etc while maintaining delivery time?  Is there anything
>>>else I should be considering?  If I have not explained things well or more
>>>information is needed I will certainly provide anything.
>>>
>>
>>
>>A few quest

Re: General assistance

2006-02-10 Thread DAve

Ed Russell wrote:

User validation is going to be tough or all but impossible.  This box
forwards off the mail to an NT box running SL Mail.  There is no easy way to
get a userlist out of this product.  In addition the users change daily and
some even use multi-drops.  


You don't need to get a user list, you just need to ask the destination 
server if the user exists before accepting the message. This is what 
milter-ahead does on my MailScanner servers. I process and forward to 
servers running qmail (my toasters) and Exchange, GroupMail, Groupwise, 
Sendmail (my clients' servers). All respond correctly to milter-ahead. I 
do not know of a way to duplicate milter-ahead in qmail without 
requiring something like vpopmail or LDAP.


Did you look at using dnscache? That might buy you enough breathing room 
to shop around for a solution to user verification.


DAve
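
For archive readers, the "ask the destination server" idea can be sketched in a few lines of Python with the standard smtplib module. This is an illustration of the general call-ahead technique under stated assumptions, not how milter-ahead itself is implemented; the function names, host, and addresses are hypothetical.

```python
import smtplib


def recipient_accepted(smtp, sender: str, recipient: str) -> bool:
    """Ask an already-connected SMTP server whether it would take mail
    for `recipient`, then reset so no message is actually sent."""
    smtp.helo()
    smtp.mail(sender)
    code, _reply = smtp.rcpt(recipient)
    smtp.rset()
    return 200 <= code < 300  # 2xx means accepted; 5xx usually means unknown user


def call_ahead(host: str, sender: str, recipient: str) -> bool:
    """Open a connection to the destination server and run the check."""
    with smtplib.SMTP(host, 25, timeout=10) as smtp:
        return recipient_accepted(smtp, sender, recipient)
```

A gateway would call something like `call_ahead("mail.internal.example", "probe@example.com", rcpt)` before accepting the message. The check only helps if the destination rejects unknown users at RCPT time; a server that accepts every recipient makes it useless.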





Ed


---

 Talk is cheap since supply always exceeds demand.

---
 


-Original Message-
From: DAve [mailto:[EMAIL PROTECTED] 
Sent: Friday, February 10, 2006 12:39 PM

To: users@spamassassin.apache.org
Subject: Re: General assistance

Ed Russell wrote:


[EMAIL PROTECTED] smtpd]# spamassassin --version
SpamAssassin version 3.1.0
 running on Perl version 5.8.7


Spamd running with:
   OPTIONS="-L -x -d -u nobody -m 45"

No user verification or RBL at the MTA level.



Absolutely do user verification. I can throw out from 20% to 80% of my 
traffic depending on the current level of dictionary and Joe-Job 
attacks. Since you are processing ahead of your clients' Exchange boxes 
I'm not sure how you can do that with qmail. I do it on my gateways 
running MailScanner via milter-ahead, and on my toasters via checkuser 
in vpopmail.


There might be a way to get qmail to check with an Exchange box to 
validate a user without running vpopmail, but I don't know of one.


DAve




12:20pm  up  4:05,  1 user,  load average: 9.49, 9.23, 9.23
313 processes: 300 sleeping, 12 running, 1 zombie, 0 stopped
CPU states: 18.9% user, 16.6% system,  0.0% nice, 64.4% idle
Mem:  2009856K av,  711560K used, 1298296K free,  353776K shrd,  129268K buff
Swap: 2097136K av,   0K used, 2097136K free  225380K cached

As you can see I have loads of head room as far as memory goes.  I was
looking into integrating RBL into Qmail, but with the very high volume I am
quite concerned that this will introduce a slowdown.  If I increase the
inbound concurrent rate I eventually run into qmail-scanner problems with
reformime.  Is there anything else I need consider?

Ed

---

Talk is cheap since supply always exceeds demand.

---


-Original Message-
From: Kristopher Austin [mailto:[EMAIL PROTECTED] 
Sent: Friday, February 10, 2006 12:06 PM

To: [EMAIL PROTECTED]; users@spamassassin.apache.org
Subject: RE: General assistance




-Original Message-
From: Ed Russell [mailto:[EMAIL PROTECTED]
Sent: Friday, February 10, 2006 10:51 AM
To: users@spamassassin.apache.org
Subject: General assistance

Am I completely off base in the way I have this all set up?  I have gone with
a higher speed HD to increase the threshold on file I/O.  Can I tune the
performance of razor etc while maintaining delivery time?  Is there anything
else I should be considering?  If I have not explained things well or more
information is needed I will certainly provide anything.




A few questions I have:
What SA version are you running? spamassassin --version
What do you have --max-children set to?
How much memory do you have free when the box is fully loaded?

I'm trying to see if you have any headroom left to have more spamd
children running.  It sounds like your problem is with waiting on DNS
returns.  This should mean that you have plenty of processing power
remaining just not enough children to handle the requests.

Other things to consider:
Do you use RBLs at the MTA level?
Do you have user verification at the MTA level?

Look for messages your MTA can drop before sending to SA.

Kris












Re: General assistance

2006-02-10 Thread DAve

Ed Russell wrote:

I think there is some confusion, this box does not act as a gateway to
Exchange.  I do not use this product in this scenario, but in others.



I used Exchange as an example, and did state as such. My response made 
it appear you were looking for an Exchange solution.


My apologies.

DAve


Ed


---

 Talk is cheap since supply always exceeds demand.

---
 


-Original Message-
From: Bowie Bailey [mailto:[EMAIL PROTECTED] 
Sent: Friday, February 10, 2006 12:51 PM

To: users@spamassassin.apache.org
Subject: RE: General assistance

DAve wrote:


Ed Russell wrote:


No user verification or RBL at the MTA level.


Absolutely do user verification. I can throw out from 20% to 80% of my
traffic depending on the current level of dictionary and Joe-Job
attacks. Since you are processing ahead of your clients Exchange boxes
I'm not sure how you can do that with qmail. I do it on my gateways
running MailScanner via milter-ahead, and on my toasters via checkuser
in vpopmail.

There might be a way to get qmail to check with an Exchange box to
validate a user without running vpopmail, but I won't know it.



IIRC, Exchange can act as an LDAP server, so you may be able to do user
verification via LDAP lookups.





RE: General assistance

2006-02-10 Thread Ed Russell
I think there is some confusion, this box does not act as a gateway to
Exchange.  I do not use this product in this scenario, but in others.

Ed


---

 Talk is cheap since supply always exceeds demand.

---
 

-Original Message-
From: Bowie Bailey [mailto:[EMAIL PROTECTED] 
Sent: Friday, February 10, 2006 12:51 PM
To: users@spamassassin.apache.org
Subject: RE: General assistance

DAve wrote:
> Ed Russell wrote:
> > 
> > No user verification or RBL at the MTA level.
> 
> Absolutely do user verification. I can throw out from 20% to 80% of my
> traffic depending on the current level of dictionary and Joe-Job
> attacks. Since you are processing ahead of your clients Exchange boxes
> I'm not sure how you can do that with qmail. I do it on my gateways
> running MailScanner via milter-ahead, and on my toasters via checkuser
> in vpopmail.
> 
> There might be a way to get qmail to check with an Exchange box to
> validate a user without running vpopmail, but I won't know it.

IIRC, Exchange can act as an LDAP server, so you may be able to do user
verification via LDAP lookups.

-- 
Bowie



RE: General assistance

2006-02-10 Thread Ed Russell
User validation is going to be tough or all but impossible.  This box
forwards off the mail to an NT box running SL Mail.  There is no easy way to
get a userlist out of this product.  In addition the users change daily and
some even use multi-drops.  

Ed


---

 Talk is cheap since supply always exceeds demand.

---
 

-Original Message-
From: DAve [mailto:[EMAIL PROTECTED] 
Sent: Friday, February 10, 2006 12:39 PM
To: users@spamassassin.apache.org
Subject: Re: General assistance

Ed Russell wrote:
> [EMAIL PROTECTED] smtpd]# spamassassin --version
> SpamAssassin version 3.1.0
>   running on Perl version 5.8.7
> 
> 
> Spamd running with:
> OPTIONS="-L -x -d -u nobody -m 45"
> 
> No user verification or RBL at the MTA level.

Absolutely do user verification. I can throw out from 20% to 80% of my 
traffic depending on the current level of dictionary and Joe-Job 
attacks. Since you are processing ahead of your clients' Exchange boxes 
I'm not sure how you can do that with qmail. I do it on my gateways 
running MailScanner via milter-ahead, and on my toasters via checkuser 
in vpopmail.

There might be a way to get qmail to check with an Exchange box to 
validate a user without running vpopmail, but I don't know of one.

DAve

> 
> 
> 12:20pm  up  4:05,  1 user,  load average: 9.49, 9.23, 9.23
> 313 processes: 300 sleeping, 12 running, 1 zombie, 0 stopped
> CPU states: 18.9% user, 16.6% system,  0.0% nice, 64.4% idle
> Mem:  2009856K av,  711560K used, 1298296K free,  353776K shrd,  129268K buff
> Swap: 2097136K av,   0K used, 2097136K free  225380K cached
> 
> As you can see I have loads of head room as far as memory goes.  I was
> looking into integrating RBL into Qmail, but with the very high volume I am
> quite concerned that this will introduce a slowdown.  If I increase the
> inbound concurrent rate I eventually run into qmail-scanner problems with
> reformime.  Is there anything else I need consider?
> 
> Ed
> 
> ---
> 
>  Talk is cheap since supply always exceeds demand.
> 
> ---
>  
> 
> -Original Message-
> From: Kristopher Austin [mailto:[EMAIL PROTECTED] 
> Sent: Friday, February 10, 2006 12:06 PM
> To: [EMAIL PROTECTED]; users@spamassassin.apache.org
> Subject: RE: General assistance
> 
> 
>>-Original Message-
>>From: Ed Russell [mailto:[EMAIL PROTECTED]
>>Sent: Friday, February 10, 2006 10:51 AM
>>To: users@spamassassin.apache.org
>>Subject: General assistance
>>
>>Am I completely off base in the way I have this all set up?  I have gone with
>>a higher speed HD to increase the threshold on file I/O.  Can I tune the
>>performance of razor etc while maintaining delivery time?  Is there anything
>>else I should be considering?  If I have not explained things well or more
>>information is needed I will certainly provide anything.
>>
> 
> 
> A few questions I have:
> What SA version are you running? spamassassin --version
> What do you have --max-children set to?
> How much memory do you have free when the box is fully loaded?
> 
> I'm trying to see if you have any headroom left to have more spamd
> children running.  It sounds like your problem is with waiting on DNS
> returns.  This should mean that you have plenty of processing power
> remaining just not enough children to handle the requests.
> 
> Other things to consider:
> Do you use RBLs at the MTA level?
> Do you have user verification at the MTA level?
> 
> Look for messages your MTA can drop before sending to SA.
> 
> Kris
> 
> 
> 



RE: General assistance

2006-02-10 Thread Bowie Bailey
DAve wrote:
> Ed Russell wrote:
> > 
> > No user verification or RBL at the MTA level.
> 
> Absolutely do user verification. I can throw out from 20% to 80% of my
> traffic depending on the current level of dictionary and Joe-Job
> attacks. Since you are processing ahead of your clients Exchange boxes
> I'm not sure how you can do that with qmail. I do it on my gateways
> running MailScanner via milter-ahead, and on my toasters via checkuser
> in vpopmail.
> 
> There might be a way to get qmail to check with an Exchange box to
> validate a user without running vpopmail, but I won't know it.

IIRC, Exchange can act as an LDAP server, so you may be able to do user
verification via LDAP lookups.

-- 
Bowie


RE: General assistance

2006-02-10 Thread Bowie Bailey
Ed Russell wrote:
> 
> No user verification or RBL at the MTA level.

You really should consider finding a way to do user verification at the
MTA level.  You can greatly reduce your server's load if you don't
accept mail for nonexistent users.

To give you an example, so far today my server has rejected 7000
messages to unknown users and only delivered 2000 messages.  That's 7000
messages that SA and ClamAV didn't have to scan.  Also, if I had
accepted those messages, I would have had to drop another 7000
non-delivery messages into my delivery queue (most of which would have
sat in the queue for a week before double-bouncing back to postmaster).

-- 
Bowie
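
To put the numbers above in proportion, here is a short worked calculation in Python. It counts only the two figures given in this message (7000 rejected, 2000 delivered); the function name is made up for the example.

```python
def rejected_share(rejected: int, delivered: int) -> float:
    """Fraction of the counted messages turned away at the MTA."""
    total = rejected + delivered
    if total == 0:
        raise ValueError("no messages counted")
    return rejected / total


share = rejected_share(7000, 2000)
print(f"{share:.1%} of the counted messages never reached SA or ClamAV")
# prints: 77.8% of the counted messages never reached SA or ClamAV
```

Each of those rejections is also one non-delivery report that never has to be queued.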


Re: General assistance

2006-02-10 Thread DAve

Ed Russell wrote:

[EMAIL PROTECTED] smtpd]# spamassassin --version
SpamAssassin version 3.1.0
  running on Perl version 5.8.7


Spamd running with:
OPTIONS="-L -x -d -u nobody -m 45"

No user verification or RBL at the MTA level.


Absolutely do user verification. I can throw out from 20% to 80% of my 
traffic depending on the current level of dictionary and Joe-Job 
attacks. Since you are processing ahead of your clients' Exchange boxes 
I'm not sure how you can do that with qmail. I do it on my gateways 
running MailScanner via milter-ahead, and on my toasters via checkuser 
in vpopmail.


There might be a way to get qmail to check with an Exchange box to 
validate a user without running vpopmail, but I don't know of one.


DAve




12:20pm  up  4:05,  1 user,  load average: 9.49, 9.23, 9.23
313 processes: 300 sleeping, 12 running, 1 zombie, 0 stopped
CPU states: 18.9% user, 16.6% system,  0.0% nice, 64.4% idle
Mem:  2009856K av,  711560K used, 1298296K free,  353776K shrd,  129268K buff
Swap: 2097136K av,   0K used, 2097136K free  225380K cached

As you can see I have loads of head room as far as memory goes.  I was
looking into integrating RBL into Qmail, but with the very high volume I am
quite concerned that this will introduce a slowdown.  If I increase the
inbound concurrent rate I eventually run into qmail-scanner problems with
reformime.  Is there anything else I need consider?

Ed

---

 Talk is cheap since supply always exceeds demand.

---
 


-Original Message-
From: Kristopher Austin [mailto:[EMAIL PROTECTED] 
Sent: Friday, February 10, 2006 12:06 PM

To: [EMAIL PROTECTED]; users@spamassassin.apache.org
Subject: RE: General assistance



-Original Message-
From: Ed Russell [mailto:[EMAIL PROTECTED]
Sent: Friday, February 10, 2006 10:51 AM
To: users@spamassassin.apache.org
Subject: General assistance

Am I completely off base in the way I have this all set up?  I have gone with
a higher speed HD to increase the threshold on file I/O.  Can I tune the
performance of razor etc while maintaining delivery time?  Is there anything
else I should be considering?  If I have not explained things well or more
information is needed I will certainly provide anything.




A few questions I have:
What SA version are you running? spamassassin --version
What do you have --max-children set to?
How much memory do you have free when the box is fully loaded?

I'm trying to see if you have any headroom left to have more spamd
children running.  It sounds like your problem is with waiting on DNS
returns.  This should mean that you have plenty of processing power
remaining just not enough children to handle the requests.

Other things to consider:
Do you use RBLs at the MTA level?
Do you have user verification at the MTA level?

Look for messages your MTA can drop before sending to SA.

Kris







Re: General assistance

2006-02-10 Thread DAve

Ed Russell wrote:

If everyone would indulge me, I would like to describe the setup I am
using and get some feedback.  I have a box that I have been using for
some time which acts as a pass-through spam filter for many domains
(currently about 100).  It is a fairly high-traffic server, processing about
150,000 to 200,000 messages per day.  I use the following method.

The box runs Red Hat 6.2 with kernel 2.2.26 on a PIV with 2 GB of RAM.

Qmail accepts the email from the world (with a concurrencyincoming of 100)
and passes it through qmail-scanner, which calls spamd; SpamAssassin checks
the email and writes the spam status to the header.  Each message is then
passed through a procmail filter, which deletes it if it is spam.  The
procmail filter is:

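# Drop anything whose X-Spam-Level reaches the star count below.
:0
* ^X-Spam-Level: \*\*\*\*\*\*\*\*\*\*\*\*\*\*\*
{
  EXITCODE=99
  :0
  /dev/null
}

# Drop anything SpamAssassin has flagged as spam.
:0
* ^X-Spam-Status: Yes
{
  EXITCODE=99
  :0
  /dev/null
}

# Restore a leading "F" that was dropped from the From_ line.
:0
* ^^rom[ ]
{
  LOG="*** Dropped F off From_ header! Fixing up. "

  :0 fhw
  | sed -e '1s/^/F/'
}

:0
/dev/null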
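
For readers less familiar with procmail, the two spam tests in the recipe above can be restated as a short Python sketch. It covers only the decision (not delivery or exit codes), and it assumes the recipe's pattern contains 15 escaped stars; the names are made up for illustration.

```python
import re
from email import message_from_string

STAR_THRESHOLD = 15  # assumed to match the number of \* in the recipe
SPAM_LEVEL = re.compile(r"\*{%d}" % STAR_THRESHOLD)


def should_drop(raw_message: str) -> bool:
    """True if either spam condition from the procmail recipe matches."""
    msg = message_from_string(raw_message)
    level = msg.get("X-Spam-Level", "")
    status = msg.get("X-Spam-Status", "")
    # X-Spam-Level carries one '*' per whole point of the spam score.
    too_many_stars = SPAM_LEVEL.match(level) is not None
    flagged = status.startswith("Yes")
    return too_many_stars or flagged


sample = "X-Spam-Status: No, score=1.2\nX-Spam-Level: *\n\nhello\n"
print(should_drop(sample))  # prints: False
```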

Mail that is clean gets passed off to a second qmail install which then
delivers the mail to our servers using smtproutes.

This has been working fine for a few years now, but recently we have
experienced major delays in the processing of email.  Due to the very high
volume, the system is handling 100 concurrent incoming messages pretty much
all the time.  With everything else going on, it is not uncommon for this
system to have up to 400 processes running.  Sometimes mail can take hours
to get through to its destination.  What I have discovered is that if I
disable RBL checks and razor, pyzor and dcc the delay goes away.  However,
the effectiveness of the filter drops.


Am I completely off base in the way I have this all set up?  I have gone with
a higher speed HD to increase the threshold on file I/O.  Can I tune the
performance of razor etc while maintaining delivery time?  Is there anything
else I should be considering?  If I have not explained things well or more
information is needed I will certainly provide anything.

Thanks


Since you are running qmail, consider doing your rbl checks in 
qmail-smtpd. No sense scanning a message if you can drop it at the door 
first.


Also, are your running dnscache? I run dnscache on all my servers, web, 
webmail, toasters, etc. It can speed things up considerably as it will 
cache your RBL lookups, SURBL lookups, etc. It's a nice thing to do for 
the URIBL and SURBL folks too.
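
To show why a local cache helps, here is a small Python sketch of a
time-limited lookup cache.  It is not how dnscache itself is implemented;
`CachedResolver`, its fixed `ttl` and the injected `lookup` function are
assumptions made for the example, and failed lookups are not cached.  The
point is only that repeated queries for the same name are answered from
memory instead of going back out over the network.

```python
import time
from typing import Callable, Dict, Tuple

class CachedResolver:
    """Remember lookup results for `ttl` seconds."""

    def __init__(self, lookup: Callable[[str], str], ttl: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup
        self._ttl = ttl
        self._clock = clock
        self._cache: Dict[str, Tuple[float, str]] = {}
        self.misses = 0  # lookups that had to go to the upstream resolver

    def resolve(self, name: str) -> str:
        now = self._clock()
        entry = self._cache.get(name)
        if entry is not None and entry[0] > now:
            return entry[1]          # still fresh: no network round trip
        self.misses += 1
        answer = self._lookup(name)  # slow path: ask the upstream resolver
        self._cache[name] = (now + self._ttl, answer)
        return answer
```

With many messages carrying the same sending hosts and URLs, most RBL and
SURBL queries would hit the cache rather than the remote servers.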


If you do run dnscache, consider turning logging off once you are 
configured and satisfied it works as intended. dnscache can keep a disk 
pretty busy with its potential to log a lot of data.


DAve


RE: General assistance

2006-02-10 Thread Ed Russell
[EMAIL PROTECTED] smtpd]# spamassassin --version
SpamAssassin version 3.1.0
  running on Perl version 5.8.7


Spamd running with:
OPTIONS="-L -x -d -u nobody -m 45"

No user verification or RBL at the MTA level.


12:20pm  up  4:05,  1 user,  load average: 9.49, 9.23, 9.23
313 processes: 300 sleeping, 12 running, 1 zombie, 0 stopped
CPU states: 18.9% user, 16.6% system,  0.0% nice, 64.4% idle
Mem:  2009856K av,  711560K used, 1298296K free,  353776K shrd,  129268K buff
Swap: 2097136K av,   0K used, 2097136K free  225380K cached

As you can see I have loads of head room as far as memory goes.  I was
looking into integrating RBL into Qmail, but with the very high volume I am
quite concerned that this will introduce a slowdown.  If I increase the
inbound concurrent rate I eventually run into qmail-scanner problems with
reformime.  Is there anything else I need to consider?

Ed

---

 Talk is cheap since supply always exceeds demand.

---
 

-Original Message-
From: Kristopher Austin [mailto:[EMAIL PROTECTED] 
Sent: Friday, February 10, 2006 12:06 PM
To: [EMAIL PROTECTED]; users@spamassassin.apache.org
Subject: RE: General assistance

> -Original Message-
> From: Ed Russell [mailto:[EMAIL PROTECTED]
> Sent: Friday, February 10, 2006 10:51 AM
> To: users@spamassassin.apache.org
> Subject: General assistance
> 
> Am I completely off base in the way I have this all set up?  I have gone
> with a higher speed HD to increase the threshold on file I/O.  Can I tune
> the performance of razor etc. while maintaining delivery time?  Is there
> anything else I should be considering?  If I have not explained things
> well or more information is needed, I will certainly provide anything.
> 

A few questions I have:
What SA version are you running? spamassassin --version
What do you have --max-children set to?
How much memory do you have free when the box is fully loaded?

I'm trying to see if you have any headroom left to have more spamd
children running.  It sounds like your problem is with waiting on DNS
returns.  This should mean that you have plenty of processing power
remaining, just not enough children to handle the requests.

Other things to consider:
Do you use RBLs at the MTA level?
Do you have user verification at the MTA level?

Look for messages your MTA can drop before sending to SA.

Kris



RE: General assistance

2006-02-10 Thread Kristopher Austin
> -Original Message-
> From: Ed Russell [mailto:[EMAIL PROTECTED]
> Sent: Friday, February 10, 2006 10:51 AM
> To: users@spamassassin.apache.org
> Subject: General assistance
> 
> Am I completely off base in the way I have this all set up?  I have gone
> with a higher speed HD to increase the threshold on file I/O.  Can I tune
> the performance of razor etc. while maintaining delivery time?  Is there
> anything else I should be considering?  If I have not explained things
> well or more information is needed, I will certainly provide anything.
> 

A few questions I have:
What SA version are you running? spamassassin --version
What do you have --max-children set to?
How much memory do you have free when the box is fully loaded?

I'm trying to see if you have any headroom left to have more spamd
children running.  It sounds like your problem is with waiting on DNS
returns.  This should mean that you have plenty of processing power
remaining, just not enough children to handle the requests.
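
A back-of-the-envelope check, written as a hedged Python sketch, shows why
time spent waiting on DNS matters more than CPU here.  It takes the 200,000
messages per day quoted in the original post and assumes arrivals are spread
evenly over the day (real traffic is burstier, so the true limits are
tighter).  The child counts and the function names are illustrative.  By
Little's law, the number of busy children is roughly the arrival rate times
the average scan time.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def max_avg_scan_seconds(children: int, messages_per_day: int) -> float:
    """Longest average scan time a pool of children can sustain without a backlog."""
    arrival_rate = messages_per_day / SECONDS_PER_DAY  # messages per second
    return children / arrival_rate

def children_needed(avg_scan_seconds: float, messages_per_day: int) -> float:
    """Average number of children kept busy at a given scan time (Little's law)."""
    arrival_rate = messages_per_day / SECONDS_PER_DAY
    return arrival_rate * avg_scan_seconds

for children in (20, 45, 100):
    limit = max_avg_scan_seconds(children, 200_000)
    print(f"{children} children: average scan must stay under {limit:.1f}s")
```

If network tests push the average scan time past that limit, messages queue
up even though the CPU is mostly idle, which is when more children or faster
DNS answers help.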

Other things to consider:
Do you use RBLs at the MTA level?
Do you have user verification at the MTA level?

Look for messages your MTA can drop before sending to SA.

Kris