Re: Question regarding meta rule handling

2005-08-02 Thread Theo Van Dinter
On Wed, Aug 03, 2005 at 08:18:16AM +0200, Sven Riedel wrote:
> header __X Content-Type =~ /^(message|multipart)/i
> rawbody __Y /\S/
> meta Z ( !X && !Y )
> 
> and yet the rule triggers for me. Doing a

Of course.  __X != X ... :)
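
To spell that out: the meta expression refers to X and Y, but the rules
that were actually defined are __X and __Y. The sketch below is a toy Python
model, not SpamAssassin's real evaluator. It assumes (consistent with what is
reported in this thread) that a name with no matching rule counts as 0;
eval_meta and the hits dictionary are hypothetical names used only for
illustration.

```python
import re

def eval_meta(expr, hits):
    """Toy model of a meta rule: rule names become 1 (hit) or 0 (no hit).

    Assumption: a name that matches no defined rule evaluates to 0.
    """
    def lookup(match):
        return str(int(hits.get(match.group(0), 0)))
    # Substitute rule names first, then map the C-style operators to Python's.
    py = re.sub(r"[A-Za-z_][A-Za-z0-9_]*", lookup, expr)
    py = py.replace("&&", " and ").replace("||", " or ").replace("!", " not ")
    return bool(eval(py, {"__builtins__": {}}, {}))

# Sven's test message: __X does not match, __Y does.
hits = {"__X": False, "__Y": True}

print(eval_meta("( !X && !Y )", hits))      # X and Y are not defined rules
print(eval_meta("( !__X && !__Y )", hits))  # names match the defined sub-rules
```

Under that assumption the first expression is true for every message, which
matches the symptom, while the second one gives the ( 1 && 0 ) = 0 result
Sven expected.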

-- 
Randomly Generated Tagline:
"The question is to what extent parasites like Microsoft should be parasites
 off the public system, or should be granted any rights at all."
   - Noam Chomsky




Question regarding meta rule handling

2005-08-02 Thread Sven Riedel
Hi,

a while back someone kindly posted a rule here that matches on 
empty mails:

header __X Content-Type =~ /^(message|multipart)/i
rawbody __Y /\S/
meta Z ( !X && !Y )

Now I find that Z matches on all mails - investigation shows
that Y matches on all non-whitespaces as it should, and X 
doesn't match. So I would assume that 

( !0 && !1 ) = ( 1 && 0 ) = 0

and yet the rule triggers for me. Doing a

spamassassin -t -D < testmessage

doesn't show anything of use to me why this rule triggers. 
Any ideas?

Oh, and then I tried to disable the rule by assigning it a score
of 0 (the wiki page on writing rules states that rules with a
score of 0 aren't processed). And yet this rule keeps
turning up in my X-Spam-Status header. I'm a bit puzzled at this
point.

Regs,
Sven





RE: Load balancing spamd

2005-08-02 Thread Gary W. Smith


> 
> How do you (make and) balance the calls to the AV servers?  How do you (make
> and) balance the calls to the spamd machines?  I am very interested in these
> details!

We just call them in the order listed on the connection line.  On two of the 4
SMTP gateways we use node 1 as the primary and node 2 as the secondary;
on the other two, just the opposite.  I know this is the poor man's way
of doing this, but we are lazy and haven't made our way to using
something like LVS.
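
A minimal sketch of that primary/secondary split, in Python. Which gateways
get which order is not stated above, so the even/odd alternation here is an
assumption, and backend_order and the node names are hypothetical.

```python
def backend_order(gateway_index, nodes=("node1", "node2")):
    """Return the backends in the order a gateway should try them.

    Assumption: even-numbered gateways prefer the first node and
    odd-numbered gateways prefer the second, so each node is primary
    for half of the gateways.
    """
    if gateway_index % 2 == 0:
        return list(nodes)
    return list(reversed(nodes))

# Four SMTP gateways, two backends.
for gw in range(4):
    print(gw, backend_order(gw))
```

Each gateway only falls through to its secondary when the primary cannot be
reached, so this spreads load without any balancer in between, at the cost
of not reacting to how busy each node actually is.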

> 
> 
> We are edging up to 95K a day now on only two machines.  You can imagine we
> are anxious to start using the other boxes we have rarin' to go!

Ironically, when we first started this we had everything running on 4
machines and it started choking.  So we went with the two back ends.
It choked.  Then we dropped the -m from 30 to 6.  6 is a small number
but it seems to be working fine.  We have found for our environment that
6 to 8 works well.
 
> 
> > We
> > recently upgrade all of the hardware to Dell Dimension 4700's with 1.5gb
> > ram each.  Budget was $5200.
> >
> > Machines are idle.
> 
> Sweet.  ;)
> 

And it was overall cheap

> Why?  Because your DNS costs to query your RBL list in Postfix is very
> heavy/slowing you down?  Are you going to mirror just one chosen RBL out
> there or a combination of several??
> 
> Do you run DCC in your SA environment?  If so, you are over their recommended
> limit for hosting a DCC server (we are nearing it - 100K a day I think).  Do
> you run a DCC server for yourself?  Any issues to be aware of?
> 

It's on the TODO list.  Item 629 I believe... :)  There are other
pressing items to fix/work on.  This is working great but will be
readdressed during the next maintenance upgrade (which is about every 90
days).

Gary


Re: Bayes: not enough usable tokens found

2005-08-02 Thread Loren Wilton
Hum.  I'm a little confused by that SA score stuff on the bottom of the
message.  If it refers to a message that should be spam you have two serious
problems.  If it referred to a message from this list you may have a serious
problem and a less serious problem.

 pts rule name              description
---- ---------------------- --------------------------------------------------
-3.3 ALL_TRUSTED            Did not pass through any untrusted hosts
 0.0 HTML_30_40             BODY: Message is 30% to 40% HTML
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.0 HTML_TITLE_EMPTY       BODY: HTML title contains no text
-2.6 BAYES_00               BODY: Bayesian spam probability is 0 to 1%

In general ALL_TRUSTED shouldn't be firing for messages coming from an
external source.  This makes me wonder if you have trusted_networks and
internal_networks set correctly.

In general SA (and especially Bayes) shouldn't be seeing this list, since it
has a lot of real spam floating through it, and other spammy tokens.  It is
far better to use postfix or whatever your router is to bypass this list
around SA.

If that header referred to a spam, BAYES_00 says that Bayes thought it was
guaranteed ham.  That would be a sign that you have a corrupted bayes
database.

Loren



Re: Bayes: not enough usable tokens found

2005-08-02 Thread Daryl C. W. O'Shea

Mike Cavanagh wrote:
> Hum.  I can see some messages are being caught via the Bayes test, but I
> would think Bayes would find many more as I have close to 5000 SPAM in
> the Bayes system.
> I get at most 15 messages a day flagged as SPAM while I receive approx.
> 100 messages a day as non-SPAM but should be flagged as SPAM.
>
> I have started to include the Spamassassin footer on all messages to get
> a handle on what passes in the "non-Spam" messages.
>
> Any thoughts on how to improve this would be helpful.
>
>  pts rule name              description
> ---- ---------------------- --------------------------------------------------
> -3.3 ALL_TRUSTED            Did not pass through any untrusted hosts



http://wiki.apache.org/spamassassin/TrustPath



OT: RBL for dynamic "no reverse DNS" lookups

2005-08-02 Thread Rob McEwen
I'm trying to find an RBL which will return a standard RBL return code (like
"127.0.0.2") if/when the IP passed to the RBL doesn't have a reverse DNS
entry.

(1) I know that SA doesn't have a need for this as another function is
already available in SA for this. But I need this for a **different**
utility, not SA (which is why I said, "OT").

(2) This other utility doesn't have the option to check for "no reverse
DNS", but CAN do whatever general RBL lookups I tell it to do. Also, I don't
have access to this utility's source code.  However, if I can find this kind
of RBL I mentioned, then I can use this utility's RBL lookups against that
kind of RBL to accomplish checking a message's sending server for "no
reverse DNS". But, again, doing lookups on (reversedIP).in-addr.arpa is NOT
an option in this utility because it **only** works with the traditional RBL
responses, which are always numeric, unlike reverse DNS lookups.

(3) I know that some aggressive RBLs factor in "no reverse DNS"... but,
instead, I'm looking for an RBL which would do a DYNAMIC lookup to see if
there is "no reverse DNS", even if that RBL hasn't checked that IP before or
hasn't previously added that IP to its "no reverse DNS" nameserver
database.

(4) And, of course, I understand that it is NOT a good idea to block
**solely** due to a sending server's IP not having a reverse DNS lookup.
Rather, I'm using this for auditing, testing, and other things.

Thanks,

Rob McEwen
PowerView Systems



Re: Bayes: not enough usable tokens found

2005-08-02 Thread Mike Cavanagh




Hum.  I can see some messages are being caught via the Bayes test, but
I would think Bayes would find many more as I have close to 5000 SPAM
in the Bayes system.
I get at most 15 messages a day flagged as SPAM while I receive approx.
100 messages a day as non-SPAM but should be flagged as SPAM.

I have started to include the Spamassassin footer on all messages to
get a handle on what passes in the "non-Spam" messages.

Any thoughts on how to improve this would be helpful.

Thanks,
Mike


Loren Wilton wrote:

> > What does this message mean??
> > debug: cannot use bayes on this message; not enough usable tokens found
> > debug: bayes: not scoring message, returning undef
>
> Unless you are seeing this a whole lot, I don't think you are doing anything
> wrong.  I think this just means that the particular mail didn't much match
> anything Bayes had seen before, so it didn't feel competent to assign a
> score to it.  I would have expected that to be a bayes_50 case, but it looks
> like it just decided to bypass the message.
>
> Loren

  



Spam detection software, running on the system "fred.5cs.com", has
identified this incoming email as possible spam.  The original message
has been attached to this so you can view it (if it isn't spam) or label
similar future email.  If you have any questions, see
[EMAIL PROTECTED] for details.

Content preview:  Hum. I can see some messages are being caught via the
  Bayes test, but I would think Bayes would find many more as I have
  close to 5000 SPAM in the Bayes system. I get at most 15 messages a day
  flagged as SPAM while I receive approx. 100 messages a day as non-SPAM
  but should be flagged as SPAM. [...] 

Content analysis details:   (-5.9 points, 10.0 required)

 pts rule name              description
---- ---------------------- --------------------------------------------------
-3.3 ALL_TRUSTED            Did not pass through any untrusted hosts
 0.0 HTML_30_40             BODY: Message is 30% to 40% HTML
 0.0 HTML_MESSAGE           BODY: HTML included in message
 0.0 HTML_TITLE_EMPTY       BODY: HTML title contains no text
-2.6 BAYES_00               BODY: Bayesian spam probability is 0 to 1%
                            [score: 0.]




Re: Bayes: not enough usable tokens found

2005-08-02 Thread Loren Wilton
> What does this message mean??
> debug: cannot use bayes on this message; not enough usable tokens found
> debug: bayes: not scoring message, returning undef

Unless you are seeing this a whole lot, I don't think you are doing anything
wrong.  I think this just means that the particular mail didn't much match
anything Bayes had seen before, so it didn't feel competent to assign a
score to it.  I would have expected that to be a bayes_50 case, but it looks
like it just decided to bypass the message.

Loren



RE: Load balancing spamd

2005-08-02 Thread email builder


--- "Gary W. Smith" <[EMAIL PROTECTED]> wrote:

> We have 4 front end servers running postfix.  These servers call and AV
> process on two additional AV servers behind the wall.  Then these
> servers

By "these", do you mean the AV servers call spamd, or does it go back to the
MTA first?

How do you (make and) balance the calls to the AV servers?  How do you (make
and) balance the calls to the spamd machines?  I am very interested in these
details!

> call spamd on two additional servers behind the wall.  Those two
> servers have a simple MySQL cluster (running Linux-HA and DRBD).  
> 
> In all we have 8 boxes that handle all of our email for our clients.  We
> are generating about 170k emails per day coming into the network.

We are edging up to 95K a day now on only two machines.  You can imagine we
are anxious to start using the other boxes we have rarin' to go!

> We
> recently upgrade all of the hardware to Dell Dimension 4700's with 1.5gb
> ram each.  Budget was $5200.  
> 
> Machines are idle.  

Sweet.  ;)
 
> Something new we have been looking at as well.  We are looking at
> setting up simple relays that will run RBL on the front end and then
> just hand them off to our 4 backend servers.  But since it works right
> now we're not going to fix it.

Why?  Because your DNS costs to query your RBL list in Postfix is very
heavy/slowing you down?  Are you going to mirror just one chosen RBL out
there or a combination of several??

Do you run DCC in your SA environment?  If so, you are over their recommended
limit for hosting a DCC server (we are nearing it - 100K a day I think).  Do
you run a DCC server for yourself?  Any issues to be aware of?

Thanks a TON!!








Bayes: not enough usable tokens found

2005-08-02 Thread Mike Cavanagh

What does this message mean??
   debug: cannot use bayes on this message; not enough usable tokens found
   debug: bayes: not scoring message, returning undef

I am using MimeDefang Ver. 2.52 and SpamAssassin Ver. 3.0.4

Below is:
   current status of bayes database (sa-learn --dump=magic)
   sa-mimedefang.cf
   spamassassin --lint --debug

What am I doing wrong?  I am sure this is something simple, I just can't 
seem to see it.

Thanks,
Mike.

*
SA-LEARN Status:
/usr/local/bin/sa-learn --username=mimedefang 
--siteconfigpath=/etc/mail/spamassassin --dump=magic

0.000          0          3          0  non-token data: bayes db version
0.000          0       4275          0  non-token data: nspam
0.000          0        765          0  non-token data: nham
0.000          0     148928          0  non-token data: ntokens
0.000          0 1120235107          0  non-token data: oldest atime
0.000          0 1123040192          0  non-token data: newest atime
0.000          0 1123030366          0  non-token data: last journal sync atime
0.000          0 1123000571          0  non-token data: last expiry atime
0.000          0    2764800          0  non-token data: last expire atime delta
0.000          0       2580          0  non-token data: last expire reduction count


*
Sa-mimedefang.cf:
required_hits   10
ok_locales  en, zh
skip_rbl_checks 0   # Go ahead and check anyways
use_bayes 1
bayes_auto_learn 1
bayes_auto_learn_threshold_nonspam 0.1
bayes_auto_learn_threshold_spam 12.0
bayes_learn_during_report 1
bayes_path /etc/mail/spamassassin/bayes
bayes_file_mode 0700
bayes_min_ham_num 200
bayes_min_spam_num 200
bayes_use_hapaxes 1
bayes_use_chi2_combining 1
bayes_auto_expire 1
bayes_learn_to_journal 0
bayes_journal_max_size 102400
use_dcc 1
use_pyzor 1
use_razor2 1

*
Spamassassin Lint:
spamassassin -D --lint --siteconfigpath=/etc/mail/spamassassin
debug: SpamAssassin version 3.0.4
debug: Score set 0 chosen.
debug: running in taint mode? yes
debug: Running in taint mode, removing unsafe env vars, and resetting PATH
debug: PATH included '/usr/sbin', keeping.
debug: PATH included '/usr/bin', keeping.
debug: PATH included '/usr/ccs/bin', keeping.
debug: PATH included '/usr/local/bin', keeping.
debug: PATH included '/opt/sfw/bin', keeping.
debug: Final PATH set to: /usr/sbin:/usr/bin:/usr/ccs/bin:/usr/local/bin:/opt/sfw/bin
debug: diag: module not installed: DBI ('require' failed)
debug: diag: module installed: DB_File, version 1.811
debug: diag: module installed: Digest::SHA1, version 2.07
debug: diag: module installed: IO::Socket::UNIX, version 1.21
debug: diag: module installed: MIME::Base64, version 3.03
debug: diag: module installed: Net::DNS, version 0.46
debug: diag: module not installed: Net::LDAP ('require' failed)
debug: diag: module installed: Razor2::Client::Agent, version 2.40
debug: diag: module installed: Storable, version 2.09
debug: diag: module installed: URI, version 1.30
debug: ignore: using a test message to lint rules
debug: using "/opt/sfw/share/spamassassin" for default rules dir
debug: config: read file /opt/sfw/share/spamassassin/10_misc.cf
debug: config: read file /opt/sfw/share/spamassassin/20_anti_ratware.cf
debug: config: read file /opt/sfw/share/spamassassin/20_body_tests.cf
debug: config: read file /opt/sfw/share/spamassassin/20_compensate.cf
debug: config: read file /opt/sfw/share/spamassassin/20_dnsbl_tests.cf
debug: config: read file /opt/sfw/share/spamassassin/20_drugs.cf
debug: config: read file /opt/sfw/share/spamassassin/20_fake_helo_tests.cf
debug: config: read file /opt/sfw/share/spamassassin/20_head_tests.cf
debug: config: read file /opt/sfw/share/spamassassin/20_html_tests.cf
debug: config: read file /opt/sfw/share/spamassassin/20_meta_tests.cf
debug: config: read file /opt/sfw/share/spamassassin/20_phrases.cf
debug: config: read file /opt/sfw/share/spamassassin/20_porn.cf
debug: config: read file /opt/sfw/share/spamassassin/20_ratware.cf
debug: config: read file /opt/sfw/share/spamassassin/20_uri_tests.cf
debug: config: read file /opt/sfw/share/spamassassin/23_bayes.cf
debug: config: read file /opt/sfw/share/spamassassin/25_body_tests_es.cf
debug: config: read file /opt/sfw/share/spamassassin/25_hashcash.cf
debug: config: read file /opt/sfw/share/spamassassin/25_spf.cf
debug: config: read file /opt/sfw/share/spamassassin/25_uribl.cf
debug: config: read file /opt/sfw/share/spamassassin/30_text_de.cf
debug: config: read file /opt/sfw/share/spamassassin/30_text_fr.cf
debug: config: read file /opt/sfw/share/spamassassin/30_text_nl.cf
debug: config: read file /opt/sfw/share/spamassassin/30_text_pl.cf
debug: config: read file /opt/sfw/share/spamassassin/50_scores.cf
debug: config: read file /opt/sfw/share/spamassassin/60_whitelist.cf
debug: using "/etc/mail/spamassassin

Re: Load balancing spamd

2005-08-02 Thread email builder


--- Charles Sprickman <[EMAIL PROTECTED]> wrote:

> On Tue, 2 Aug 2005, email builder wrote:
> 
> > Technically, this should be feasible with just plain DNS load balancing, but
> > in our current medium/low budget scenario, we don't have the rackspace to
> > have numerous boxes that are dedicated ONLY to SA/clam, thus our desire is to
> > figure out a way to *WEIGHT* our spamd balancing.
> 
> I've been very happy with DNS load balancing.  The frontend mxer runs 
> tinydns on a local zone "blah.local.domain.com", and an instance of 
> dnscache with the round-robin patch is pointed to in resolv.conf.  While I 
> thought that the load balancing would be a little "rough", looking at the 
> stats I sent 17011 messages through #1, 17025 through #2, and 17016 
> through #3 yesterday.  I can also weight this by having multiple records, 
> ie:
> 
> spamd1 gets three identical entries in tinydns
> spamd2 gets three identical entries in tinydns
> spamd3 gets three identical entries in tinydns
> spamd4 gets one entry

Ooh, some good bits!  We have always been plenty satisfied with Bind, but
maybe this is the straw that breaks the camel's back, unless anyone knows
whether Bind will behave the same way if we have multiple entries for one host?

 
> that will leave spamd4 seeing about 1/3 the load of the other boxes.  It's 
> not "clustering", but when using the "-d" flag:
> 
> -d host
>Connect to spamd server on given host.  If host resolves to multiple
>addresses, then spamc will fail-over to the other addresses, if
>the first one cannot be connected to.
> 
> it should hit another box if one goes down.  Or some easy scripting could 
> remove the appropriate entries from tinydns if one machine stops 
> responding.
> 
> Speaking of low budget, we have three SA boxes, each of which has a 2GHz 
> AMD processor, 1GB RAM.  The first two cost about $550, the last one about 
> $425.  They are pretty crappy boxes with no RAID, etc., but it's cheaper 
> for me to keep one more box than needed in the equation than to build out 
> a few "uber spamd" boxes.  They are in mini-atx cases, so they barely take 
> up more room than an equivalent number of 1U boxes. I spawn 30 spamd 
> children on each.  I have been very happy with the performance so far.
> 
> > I'm surprised there's not a lot of folks out there who have done this
> > before?
> 
> Maybe they're all cheap like me. :)

Awesome!  Thanks for the advice!!!



RE: Load balancing spamd

2005-08-02 Thread Gary W. Smith
We have 4 front end servers running postfix.  These servers call an AV
process on two additional AV servers behind the wall.  Then these
servers call spamd on two additional servers behind the wall.  Those two
servers have a simple MySQL cluster (running Linux-HA and DRBD).

In all we have 8 boxes that handle all of our email for our clients.  We
are seeing about 170k emails per day coming into the network.  We
recently upgraded all of the hardware to Dell Dimension 4700s with 1.5 GB
of RAM each.  Budget was $5200.

Machines are idle.  

Something new we have been looking at as well.  We are looking at
setting up simple relays that will run RBL on the front end and then
just hand them off to our 4 backend servers.  But since it works right
now we're not going to fix it.



Re: Runaway processes

2005-08-02 Thread Frank M. Cook

> Pretty much answered in my following mail.  In general each child might use
> 30-60mb under NORMAL circumstances, so the amount of memory on your machine
> will determine the upper limit for number of children.

so 8 would be max on a 512meg system (what I have).  I still have free ram 
after firing off 15 but I'll take it back down and see what happens.

> In most cases you shouldn't really need less than about 20 connections

sounds like a place to start.  thanks.

> PS: Could you post plain text rather than html if convenient?

sure

Frank M. Cook
Association Computer Services, Inc.
http://www.acsplus.com 



Re: Load balancing spamd

2005-08-02 Thread Charles Sprickman

On Tue, 2 Aug 2005, email builder wrote:


> Technically, this should be feasible with just plain DNS load balancing, but
> in our current medium/low budget scenario, we don't have the rackspace to
> have numerous boxes that are dedicated ONLY to SA/clam, thus our desire is to
> figure out a way to *WEIGHT* our spamd balancing.


I've been very happy with DNS load balancing.  The frontend mxer runs 
tinydns on a local zone "blah.local.domain.com", and an instance of 
dnscache with the round-robin patch is pointed to in resolv.conf.  While I 
thought that the load balancing would be a little "rough", looking at the 
stats I sent 17011 messages through #1, 17025 through #2, and 17016 
through #3 yesterday.  I can also weight this by having multiple records, 
ie:


spamd1 gets three identical entries in tinydns
spamd2 gets three identical entries in tinydns
spamd3 gets three identical entries in tinydns
spamd4 gets one entry

that will leave spamd4 seeing about 1/3 the load of the other boxes.  It's 
not "clustering", but when using the "-d" flag:


-d host
  Connect to spamd server on given host.  If host resolves to multiple
  addresses, then spamc will fail-over to the other addresses, if
  the first one cannot be connected to.

it should hit another box if one goes down.  Or some easy scripting could 
remove the appropriate entries from tinydns if one machine stops 
responding.
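
To make the weighting arithmetic above concrete, here is a small Python
sketch. It assumes the resolver hands out each identical record with equal
probability, which is an idealisation of what the round-robin patch does;
expected_shares is a hypothetical helper, and the host names are the ones
from the example above.

```python
from fractions import Fraction

def expected_shares(records):
    """Map host -> expected fraction of connections.

    `records` maps each host to the number of identical DNS entries it has.
    Assumption: every entry is equally likely to be returned first.
    """
    total = sum(records.values())
    return {host: Fraction(count, total) for host, count in records.items()}

records = {"spamd1": 3, "spamd2": 3, "spamd3": 3, "spamd4": 1}
shares = expected_shares(records)
for host, share in shares.items():
    print(f"{host}: {share} of connections")

# spamd4 relative to one of the fully weighted boxes
print(shares["spamd4"] / shares["spamd1"])
```

With ten entries in total, spamd4 is expected to see one tenth of all
connections against three tenths for each of the others, which is where the
"about 1/3 the load" figure comes from.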


Speaking of low budget, we have three SA boxes, each of which has a 2GHz 
AMD processor, 1GB RAM.  The first two cost about $550, the last one about 
$425.  They are pretty crappy boxes with no RAID, etc., but it's cheaper 
for me to keep one more box than needed in the equation than to build out 
a few "uber spamd" boxes.  They are in mini-atx cases, so they barely take 
up more room than an equivalent number of 1U boxes. I spawn 30 spamd 
children on each.  I have been very happy with the performance so far.



> I'm surprised there's not a lot of folks out there who have done this
> before?


Maybe they're all cheap like me. :)

Charles






Re: Load balancing spamd

2005-08-02 Thread email builder


--- Jason Frisvold <[EMAIL PROTECTED]> wrote:

> On 8/1/05, email builder <[EMAIL PROTECTED]> wrote:
> > Even if I had forgotten the -A, I think I would have been seeing connection
> > refused notices, but right now, it just seems to time out.  I'm pretty sure
> > this is a LVS question more than a spamc/d question, since I've no problems
> > with the latter -- I am only asking here to see if anyone else does SA
> > weighted load balancing.
> 
> I kinda went the other way around..  I have multiple mail machines,
> each with their own instance of spamd.  I use a Cisco 7206 VXR to do
> the load balancing.  Works like a charm.

Wow, a bit out of our price range here.  :)  

We have also considered just continuing to build out MTA boxes each with an
Amavis/Clamd and SA on them to share our increasing load (just use LVS to
balance the incoming SMTP traffic and there is little reason to worry about
balancing SA or Amavis/Clam), but our first choice is to split the "layers"
-- have a couple separate machines that just do MTA-ish things, and a
separate set of boxes that serve as a "SA (and Clam-av) farm".  The thing
that's better about doing it that way is the redundancy that you don't get if
you aren't sharing spamd instances across all your MTA machines.  

Technically, this should be feasible with just plain DNS load balancing, but
in our current medium/low budget scenario, we don't have the rackspace to
have numerous boxes that are dedicated ONLY to SA/clam, thus our desire is to
figure out a way to *WEIGHT* our spamd balancing.

I'm surprised there's not a lot of folks out there who have done this
before?

Thanks again!







Re: Increase Performance howto

2005-08-02 Thread jdow
From: "Dhanny Kosasih" <[EMAIL PROTECTED]>

> I tested my qmail wtih more than 14000 spam (i used qmail-inject in my
> script). If i use QSheff + ClamAV + SpamAssassin, my server process
> 14000 emails in 1 hour, and if i only use qmail my server process 14000
> emails in 1/3 hours. How can i increase my server performance ? I don't
> understand what 'max-connection' and '-m' for, can u tell me what is that ?

If you do not already have a large amount of memory then adding
memory is one of the sovereign cures for slow SpamAssassin. As soon
as it goes to swap you're dead.

More processor also helps.

Fewer rule sets leads to poorer filtering and faster operation.

You are already processing much faster than my 1GHz Athlon which has
a gigabyte of ram. With all the rules I run it takes on the order of
a second and a half to scan for single messages. With multiple messages
at once there is some net advantage to the multiprocessing that happens.

It may be time to split the server into two machines.
{^_^}




Re: Forwarding mail address

2005-08-02 Thread jdow
From: "Alexandre Cruz" <[EMAIL PROTECTED]>

> Hi all,
>  
> I do understand that this can sound as a very newbie question, however i
> have a doubt that i can't find an answer. We are using Spamassassin with
> procmail/sendmail. It is working fine, however, spam mail is being
> forwarded for a mail account, which is no longer valid. I've been
> looking where this address is in the configuration, in order to forward
> those mails to another account, but no luck. Any suggestion?

Is fetchmail involved? If so then you might have to change contents
of either /etc/fetchmailrc or that person's account's .fetchmailrc
file.

For the /etc/fetchmailrc case all you need to do is redirect that
person's email by changing the local address stanza. If a fetchmail
is running for each account then you would need to disable that
account's fetchmail startup, where ever that happens. Then add
lines to some other account's .fetchmailrc to poll for and
receive the mail instead.

If you are not using fetchmail you need to punt a little. The sendmail
(or substitute) files might need an alias on that account. Others can
suggest tactics for this case.

{^_^}



Re: Runaway processes

2005-08-02 Thread Loren Wilton
> is it better to run five children with 20 connections each, or 20 children
> with five connections each?

Pretty much answered in my following mail.  In general each child might use
30-60 MB under NORMAL circumstances, so the amount of memory on your machine
will determine the upper limit for number of children.

In most cases you shouldn't really need less than about 20 connections
(mails processed before dying) per child.  If you do it may be a sign of
other configuration problems in the system, such as not limiting the size of
large mails going through SA.

Loren

PS: Could you post plain text rather than html if convenient?  OE makes
quoting from HTML a bloody pain.  :-(



Re: Increase Performance howto

2005-08-02 Thread Loren Wilton
> I tested my qmail wtih more than 14000 spam (i used qmail-inject in my
> script). If i use QSheff + ClamAV + SpamAssassin, my server process
> 14000 emails in 1 hour, and if i only use qmail my server process 14000
> emails in 1/3 hours. How can i increase my server performance ? I don't
> understand what 'max-connection' and '-m' for, can u tell me what is that ?

I just did a long reply on this subject, look at the thread 'runaway
processes'.

Loren



Re: Runaway processes

2005-08-02 Thread Loren Wilton
> so you are running 30 per child and 6 children?  180 total.  how many
> messages a day are you handling.  I upped my children from 5 to 15 thinking
> that would help but it hasn't.  I was thinking of taken connections down to
> 5 or 6 on 15 children.  maybe I have it backwards?  I don't have anything
> else running on this computer at all so I was thinking I wanted to use up
> all the memory with children. is that off?

30 connections on 6 children is a reasonable number for many smaller sites,
the type that average probably less than 10K mails/day, at a guess.  It
should work reasonably well on the typical system with at least 500MB of
memory and a 500MHz or faster processor.

With a slower processor, or certainly with less memory, you might want to
take the number of children down, and possibly the number of connections.

Simple description on how this stuff works:

spamd fires off some number of children determined by -m, with the default
of 5.

Each child takes some amount of memory.  This is typically 30-60MB *per
child* depending on the number of rules files you have.  It will start a bit
smaller than that, and will typically grow over the first dozen or so mails.

If you have a lot of rules so your spamd children are taking 60MB each, 5 *
60 = 300MB.  You better have a 512MB system or larger or you will be in heap
big trouble.  Even at 30MB, 30 * 5 = 150MB.  This would probably work in a
256M system, but maybe not.  You might want -m 3 or so in this case.
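
The arithmetic above can be sketched as a small calculation. This is only a
rough illustration: the function names and the 100MB set aside for the rest
of the system are assumptions made for the example, not anything spamd
itself provides or enforces.

```python
def total_footprint_mb(children, per_child_mb):
    """Combined memory of all spamd children, in MB."""
    return children * per_child_mb


def max_children(total_ram_mb, per_child_mb, reserved_mb=100):
    """Largest child count that fits in RAM after reserving some for the OS.

    reserved_mb is an assumed allowance for everything else on the box.
    """
    usable = total_ram_mb - reserved_mb
    return max(usable // per_child_mb, 0)


if __name__ == "__main__":
    print(total_footprint_mb(5, 60))   # 300
    print(total_footprint_mb(5, 30))   # 150
    print(max_children(512, 60))       # candidate value for -m on a 512MB box
    print(max_children(256, 30))       # candidate value for -m on a 256MB box
```

Treat the result as an upper bound for -m rather than a recommendation;
children that have processed large mails can be far bigger than the typical
30-60MB.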

Each child will process --max-conn-per-child messages before it dies and a
new child is created in its place.  If all mail was pretty much the same,
and if the children did nothing but process mail, this really shouldn't
matter.

But the real fact is that all mail isn't the same.  Some are very large.
They should be limited to 250K or so, but some programs like qmail don't
necessarily limit the mail size in the standard configuration.

It is NOT a direct relation from mail size to spamd child size!  A 250KB
mail might easily crank a child up to 250MB!

Once the child gets big, it just stays that way.  If you feed large mails to
SA, you can get some really fat children.  5 children at 250MB each aren't
going to fit well in a 512MB system.

If you only let each child process a few messages before dying, if it
happens to process one large message and gets big, it will only stay big for
a few messages before going away.  Chances are relatively small that all the
children will manage to get fat at the same time, so you will probably
survive just fine.  With a large value of --max-conn-per-child (like the
default) it is pretty easy to get all the children fat at once.
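
A toy model of that effect, for illustration only. It assumes a child's size
is simply the largest peak it has seen since it started (it never shrinks),
and that a child counts as "fat" at or above a 200MB threshold picked for
this example; neither figure comes from spamd.

```python
def fat_message_count(peak_mb_per_message, max_conn, fat_threshold_mb=200):
    """Count messages handled while the child is at or above the threshold.

    peak_mb_per_message: assumed peak child size each message would cause.
    max_conn: messages a child handles before it is replaced by a fresh one.
    """
    fat = 0
    child_mb = 0
    handled = 0
    for peak in peak_mb_per_message:
        if handled == max_conn:
            # The child exits and a fresh, small one takes its place.
            child_mb = 0
            handled = 0
        child_mb = max(child_mb, peak)  # once big, it stays big
        handled += 1
        if child_mb >= fat_threshold_mb:
            fat += 1
    return fat


if __name__ == "__main__":
    # 20 messages, the third one is large.
    messages = [40, 40, 250] + [40] * 17
    print(fat_message_count(messages, max_conn=200))  # 18
    print(fat_message_count(messages, max_conn=5))    # 3
```

With the large limit the child stays fat for the rest of the run; with the
small one it is replaced after its fifth message and the memory is returned.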

Spamd children also do other things than just process mail.  Like doing
database expiration runs.  These tend to get the children very fat,
especially if you have a database that has somehow gotten out of control.
Again, this causes Bad Things(tm) if it happens to a lot of the children at
once.

Loren



Re: Personal Bayes Score

2005-08-02 Thread Dhanny Kosasih

Matthew Yette wrote:


Dankos,

Put this into your /etc/mail/spamassassin/local.cf:

user_scores_sql_custom_query SELECT preference, value FROM _TABLE_ WHERE username = _USERNAME_ OR username = '@GLOBAL' OR username = _DOMAIN_ ORDER BY username ASC

That will make per-user preferences priority, and then roll back to the
GLOBAL if the user doesn't have a preference specified.

 

If I run spamd with the -u [user] option and use your configuration, the
GLOBAL configuration is never used. Is that correct? If not, what is the
correct parameter I must use?


Regards,
dankos.






Re: Runaway processes

2005-08-02 Thread Frank M. Cook




 
is it better to run five children with 20 connections each, or 20 children
with five connections each?

Frank M. Cook
Association Computer Services, Inc.
http://www.acsplus.com
 


Increase Performance howto

2005-08-02 Thread Dhanny Kosasih
I tested my qmail with more than 14000 spam (I used qmail-inject in my
script). If I use QSheff + ClamAV + SpamAssassin, my server processes
14000 emails in 1 hour, and if I only use qmail my server processes 14000
emails in 1/3 hour. How can I increase my server performance? I don't
understand what 'max-connection' and '-m' are for, can you tell me what they are?


Regards,
dankos.







Re: Runaway processes

2005-08-02 Thread Loren Wilton
Most strange.  Could you give us the listing from top or the like?

The normal case, as you are probably aware, is that the children get fat
(use a lot of memory) and your system goes into thrashing.

This sounds like you have some other problem.

Are you using awl (it is on by default in 3.x) or bayes?  Possibly they are
all trying to do expire runs on a huge database that has somehow managed to
grow out of control.

The other parameter you might want to set, if you haven't, and if the kids
are getting fat, is --max-conn-per-child.  It defaults to something quite
large, and setting it down to 20 or so has helped many people.

Loren



RE: Runaway processes

2005-08-02 Thread Pierre Thomson
Herb Martin wrote:
>> When people ask why I haven't upgraded from 2.64 yet... I'm waiting
>> until a week goes by without a new thread about runaway / way-slow /
>> resource-eating SA 3.0.X processes!  :-)
>> 
> 
> I suspect your wait is over 3.10 (due any day now) + 1 week
> should make you happy.
> 
> Improved thread handling and for me it works even in pre-Release.

That's great news.  I'll give it a try after the initial shakedown.  I must 
add, however, that SA 2.64 with "Spamcop URI" (SURBL), Bayes, DCC and a dash of 
SARE has been doing a great job here, 98% - 99% of spam caught with minimal 
FP's.  Together with MailScanner and a virus scanner it's handling 15,000 
emails per day on an old 800MHz PIII box, with the load average usually in the 
0.30 range.  And it's rock-solid; I've never needed to kill an SA process in 
over a year of uptime.

Pierre


Re: userpref with mysql does not work

2005-08-02 Thread Michael Parker
Martin Tanzer wrote:

> My setup:
> Debian 3.1 (sarge) with the provided spamassassin package (3.0.3-2)
> Postfix, spamassassin bound to postfix (no amavisd-new)
> There are no users on the machine, all mails are forwarded to another
> mailserver trough the transport file.
>
> Any ideas?

It seems pretty clear what is happening.  In your "test" above you did
the right thing, calling spamc with -u  and of
course it worked correctly.  Now, when you are calling it via postfix
you are no longer sending the correct address to spamc, either by not
using the -u command line param at all or just simply sending spam as
the username.  Fix how you are calling spamc and all will be well.
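
As a rough sketch of what "calling spamc correctly" means here: the
recipient address has to end up as the argument to -u. The helper names
below are hypothetical, and in a real Postfix setup the recipient would come
from whatever the content filter passes in; only the spamc -u option itself
is taken from the discussion above.

```python
import shutil
import subprocess


def spamc_command(recipient):
    """Build the spamc argument list for one recipient (hypothetical helper)."""
    if not recipient or recipient.startswith("-"):
        raise ValueError("recipient must be a non-empty address")
    # -u tells spamd whose preferences to load for this message.
    return ["spamc", "-u", recipient]


def scan(message_bytes, recipient):
    """Pipe one message through spamc and return the processed message."""
    if shutil.which("spamc") is None:
        raise RuntimeError("spamc not found on PATH")
    result = subprocess.run(
        spamc_command(recipient),
        input=message_bytes,
        stdout=subprocess.PIPE,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    print(spamc_command("user@example.com"))
```

If the username that reaches spamd is the filter's own account rather than
the recipient, the per-user rows in the SQL table will never match.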

Michael





Re: Forwarding mail address

2005-08-02 Thread Mike Jackson

I do understand that this can sound as a very newbie question, however i
have a doubt that i can't find an answer. We are using Spamassassin with
procmail/sendmail. It is working fine, however, spam mail is being
forwarded for a mail account, which is no longer valid. I've been
looking where this address is in the configuration, in order to forward
those mails to another account, but no luck. Any suggestion?


Track the mail through your system, checking every step where it could
change system usernames and/or be forwarded to another address:


1. virtusertable
2. aliases file(s)
3. .forward file in user's home directory
4. System-wide procmailrc
5. User-specific .procmailrc
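
One way to work through that checklist is to search each file for the stale
address. The sketch below is only an illustration: the paths are common
sendmail/procmail defaults and may differ on your system, the address is a
placeholder, and ~ expands to the home directory of whoever runs the script,
so the per-user files need to be checked as (or for) the affected user.

```python
from pathlib import Path

# Assumed default locations; adjust for your installation.
CANDIDATES = [
    "/etc/mail/virtusertable",
    "/etc/aliases",
    "~/.forward",
    "/etc/procmailrc",
    "~/.procmailrc",
]


def find_address(address, paths):
    """Return (path, line number, line) for every line mentioning address."""
    hits = []
    needle = address.lower()
    for path in paths:
        p = Path(path).expanduser()
        try:
            lines = p.read_text(errors="replace").splitlines()
        except OSError:
            continue  # missing or unreadable file: skip it
        for lineno, line in enumerate(lines, start=1):
            if needle in line.lower():
                hits.append((str(p), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Placeholder address; substitute the one the spam is forwarded to.
    for path, lineno, line in find_address("old-user@example.com", CANDIDATES):
        print(f"{path}:{lineno}: {line}")
```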

Mike Jackson
Tech Administrator, Datahost
www.datahost.com 



Re: Forwarding mail address

2005-08-02 Thread Evan Platt

At 09:00 AM 8/2/2005, you wrote:

Hi all,

I do understand that this can sound as a very newbie question, 
however i have a doubt that i can't find an answer. We are using 
Spamassassin with procmail/sendmail. It is working fine, however, 
spam mail is being forwarded for a mail account, which is no longer 
valid. I've been looking where this address is in the configuration, 
in order to forward those mails to another account, but no luck. Any 
suggestion?



You won't find it.

Spamassassin doesn't forward mail. It scans mail. This is something 
that would need to be done on your mailer, or with a procmail recipe, 
depending on your mail setup. 



Re: Forwarding mail address

2005-08-02 Thread Matt Kettler
Alexandre Cruz wrote:
> Hi all,
> 
>  
> 
> I do understand that this can sound as a very newbie question, however i
> have a doubt that i can’t find an answer. We are using Spamassassin with
> procmail/sendmail. It is working fine, however, spam mail is being
> forwarded for a mail account, which is no longer valid. I’ve been
> looking where this address is in the configuration, in order to forward
> those mails to another account, but no luck. Any suggestion?
> 
>  
> 
> Best regards,

Look at your procmailrc.

SpamAssassin itself can't forward mail, so it's not going to be in any of the SA
config files.



Forwarding mail address

2005-08-02 Thread Alexandre Cruz








Hi all,

I do understand that this can sound as a very newbie question, however i have a
doubt that i can’t find an answer. We are using Spamassassin with
procmail/sendmail. It is working fine, however, spam mail is being forwarded
for a mail account, which is no longer valid. I’ve been looking where this
address is in the configuration, in order to forward those mails to another
account, but no luck. Any suggestion?

Best regards,

Alexandre Cruz

 








Re: runaway processes

2005-08-02 Thread Tom Gwilt

My setup is as follows:

FreeBSD 4.10, SpamAssassin 3.0.4, Perl 5.8

Using Bayes and a pile 'o SARE rules.

It scanned 34484 messages last night and the only time we see lags is when 
the bayes database is expiring.


The startup script is as follows:

/usr/local/bin/spamd --max-children=6 --max-conn-per-child=20 -d -x -u daemon -s local0

HTH,

Tom


RE: Runaway processes

2005-08-02 Thread Herb Martin
> > When people ask why I haven't upgraded from 2.64 yet... I'm waiting 
> > until a week goes by without a new thread about runaway / 
> way-slow / 
> > resource-eating SA 3.0.X processes!  :-)
> >

I suspect your wait is over: 3.1.0 (due any day now) + 1 week
should make you happy.

Improved thread handling, and for me it works even in pre-release.

--
Herb Martin




Re: Runaway processes

2005-08-02 Thread Mike Jackson
Sorry, no, that didn't come out right. There's only six children running at 
any time. Each will process 30 messages, then restart. The machine processed 
about 3200 messages yesterday, so each child restarted about once every 
2.5-3 hours.


Mike Jackson
Tech Administrator, Datahost
www.datahost.com


- Original Message - 
From: "Frank M. Cook" <[EMAIL PROTECTED]>

To: "Mike Jackson" <[EMAIL PROTECTED]>
Cc: 
Sent: Tuesday, August 02, 2005 08:21
Subject: Re: Runaway processes


so you are running 30 per child and 6 children?  180 total.  how many 
messages a day are you handling.  I upped my children from 5 to 15 thinking 
that would help but it hasn't.  I was thinking of taken connections down to 
5 or 6 on 15 children.  maybe I have it backwards?  I don't have anything 
else running on this computer at all so I was thinking I wanted to use up 
all the memory with children. is that off?


Frank M. Cook
Association Computer Services, Inc.
http://www.acsplus.com



Re: Runaway processes

2005-08-02 Thread Frank M. Cook



so you are running 30 per child and 6 children?  180 total.  how many
messages a day are you handling?  I upped my children from 5 to 15
thinking that would help but it hasn't.  I was thinking of taking
connections down to 5 or 6 on 15 children.  maybe I have it backwards?
I don't have anything else running on this computer at all so I was
thinking I wanted to use up all the memory with children. is that off?

Frank M. Cook
Association Computer Services, Inc.
http://www.acsplus.com
 


Re: Runaway processes

2005-08-02 Thread nick

Pierre Thomson wrote:

>> I'm running SA 3.0.4 on OpenBSD with Perl 5.8.6 & Exim V4.52.
>>
>> I'm noticing that SA seems to have a big problem with child
>> processes just "running away", never terminating and eating CPU.
>>
>> My mailservers can't cope, and I'm looking at having to switch
>> off SA. (Not something I really want to do..)
>>
>> No matter what I set "-m" to spamd, they all just go into this
>> endless death spiral..
>
> When people ask why I haven't upgraded from 2.64 yet... I'm waiting until a
> week goes by without a new thread about runaway / way-slow / resource-eating SA
> 3.0.X processes!  :-)
>
> Pierre

It's good to know I'm not the only one with this issue.


RE: Runaway processes

2005-08-02 Thread Pierre Thomson
> I'm running SA 3.0.4 on OpenBSD with Perl 5.8.6 & Exim V4.52.
>
> I'm noticing that SA seems to have a big problem with child
> processes just "running away", never terminating and eating CPU.
>
> My mailservers can't cope, and I'm looking at having to switch
> off SA. (Not something I really want to do..)
>
> No matter what I set "-m" to spamd, they all just go into this
> endless death spiral..

When people ask why I haven't upgraded from 2.64 yet... I'm waiting until a 
week goes by without a new thread about runaway / way-slow / resource-eating SA 
3.0.X processes!  :-)

Pierre




Re: Runaway processes

2005-08-02 Thread Frank M. Cook
I've been fighting a problem which may turn out to be similar.  my 
spamassassin just starts falling behind and runaway threads could be the 
cause.  I'm going to try adjusting --max connections per child (check docs 
for exact syntax).  the default is 200.  maybe someone else will jump in 
with a recommended number but I'm thinking the default may be way too high. 
a lower number will cause each child to shut down sooner.  when the max 
number is reached the thread is stopped and a new one is created.


Frank M. Cook
Association Computer Services, Inc.
http://www.acsplus.com



Re: Load balancing spamd

2005-08-02 Thread Jason Frisvold
On 8/1/05, email builder <[EMAIL PROTECTED]> wrote:
> Even if I had forgotten the -A, I think I would have been seeing connection
> refused notices, but right now, it just seems to time out.  I'm pretty sure
> this is a LVS question more than a spamc/d question, since I've no problems
> with the latter -- I am only asking here to see if anyone else does SA
> weighted load balancing.

I kinda went the other way around..  I have multiple mail machines,
each with their own instance of spamd.  I use a Cisco 7206 VXR to do
the load balancing.  Works like a charm.

> Thanks!


-- 
Jason 'XenoPhage' Frisvold
[EMAIL PROTECTED]


Runaway processes

2005-08-02 Thread Gordon Ross
I'm running SA 3.0.4 on OpenBSD with Perl 5.8.6 & Exim V4.52.

I'm noticing that SA seems to have a big problem with child processes just 
"running away", never terminating and eating CPU.

My mailservers can't cope, and I'm looking at having to switch off SA. (Not 
something I really want to do..)

No matter what I set "-m" to spamd, they all just go into this endless death 
spiral..

GTG

Gordon Ross,
Network Manager/Rheolwr Rhydwaith
Countryside Council for Wales/Cyngor Cefn Gwlad Cymru



Re: Qmail + spamassassin + squirellmail

2005-08-02 Thread Tom Q. Citizen

Dhanny Kosasih wrote:


Hi,
  Any body know, how to install qmail + spamassassin + squirellmail 
(can tell spam to spamassassin) ? And how to make spamassassin can 
autolearn for spam ?


Regards,
dankos.


Here are two "toaster" documents I used:

http://sylvestre.ledru.info/howto/howto_qmail_vpopmail.php#vpopmail
http://www.differentpla.net/node/view/165

Good luck!

Peace...

Tom


Re: unwanted breakthrough

2005-08-02 Thread Loren Wilton
> SARE_ADLTSUB2 Subject =~ /\b(?:blow|climax|enlarg(e|ment)|fuck|inter+acial|lick|porn|penis|pervert|pussy|tits|tight|vagina|virgins?)\b/i
>
> Fix the rule, don't ditch the \b's for such a broad rule..
>
> Besides, the whole rule is subject to all kinds of obfuscation tricks. P.e.n.i.s
> still won't match, nor any other character-insertion obfuscation.
>
> I'd suggest creating obfu rules to detect obfuscations, and don't try to expand
> the scope of this already over-broad rule. (which will match a few FP cases
> as-is such as "your photo enlargement is ready")

Um, I was going to point out that this rule is in the _adult set, not the
_obfu set.

Loren