date:20080118

Re: more efficent big scoring

2008-01-18 Thread jdow


From: "Robert - elists" <[EMAIL PROTECTED]>
Sent: Friday, 2008, January 18 21:14




> > You can't run the rules in score-order without driving SA's performance
> > into the ground.
> >
> > The key here is SA doesn't run tests sequentially, it runs them in
> > parallel as it works its way through the body. This allows for good,
> > efficient use of memory cache.
> >
> > By running rules in score-order, you break this, forcing SA to run
> > through the body multiple times, degrading performance.
>
> Mr K
>
> SA is an awesome, incredible product and tool.
>
> Wonderful Job!
>
> I am not an expert on the programming theory, design, and implementation
> behind SA.
>
> So... are you saying SA takes a single email and breaks it apart into
> several pieces and scans those pieces via multiple processing threads and
> comes back with an additive single end result for that single email's
> multiple scan processing threads?


Before going further you should try to find a really good discussion of
how Perl parses regular expressions. Oversimplifications can lead to
massive pessimization of the code in the name of optimization.

{^_^}

RE: more efficent big scoring

2008-01-18 Thread Robert - elists

> 
> You can't run the rules in score-order without driving SA's performance
> into the ground.
> 
> The key here is SA doesn't run tests sequentially, it runs them in
> parallel as it works its way through the body. This allows for good,
> efficient use of memory cache.
> 
> By running rules in score-order, you break this, forcing SA to run
> through the body multiple times, degrading performance.
> 

Mr K

SA is an awesome, incredible product and tool.

Wonderful Job!

I am not an expert on the programming theory, design, and implementation
behind SA.

So... are you saying SA takes a single email and breaks it apart into
several pieces and scans those pieces via multiple processing threads and
comes back with an additive single end result for that single email's
multiple scan processing threads?

I do admit that I am respectfully optimistic about your team's ability to
design code that would run just as fast, if not faster, with a "score order"
end result.

Maybe you could let us make that decision with a local.cf knob?

I mean, most processors are so fast nowadays...

I am thinking we would brute force it under some circumstances until you
folks come forth with even more brilliant design and implementation
breakthroughs.

What do you think?

Is there somewhere you recommend that we can view discussions on making
processing faster?

:-)

 - rh

Re: more efficent big scoring

2008-01-18 Thread Theo Van Dinter

Yes and no.  There aren't many negative scored rules, which could easily be
put into a low priority to run first.

The issue, which is where Matt was going I believe, is that the reason score
based short circuiting was removed is that it's horribly slow to keep checking
the score after each rule runs.  You can do it at the end of a priority's run,
but then you have to split the rules across multiple priorities, which does
impact performance.

I made some comments about this kind of thing in
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=3109 and envisioned SA
auto-prioritizing rules for short circuiting for things like what I mentioned
in c7, but there was some strong disagreement about things like SC based on
score and so it didn't get implemented in the current code.
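
To make the trade-off concrete, here is a minimal Python sketch of checking the score only at the end of each priority group rather than after every rule. It is a toy model, not SA's code: the rule tuples, the `scan_by_priority` name and the threshold of 5.0 are all assumptions made for illustration. It assumes lower priority values run first, so negative-scored rules placed in the lowest priority have all run before any early exit is possible.

```python
from collections import defaultdict

def scan_by_priority(message, rules, threshold=5.0):
    """Toy model: rules is a list of (name, score, priority, test) tuples.

    Groups with a lower priority value run first. The score is compared
    with the threshold once per group, not once per rule.
    Returns (score, number_of_threshold_checks).
    """
    groups = defaultdict(list)
    for rule in rules:
        groups[rule[2]].append(rule)

    score = 0.0
    checks = 0
    for priority in sorted(groups):
        # Run every rule in this priority group without looking at the score.
        for _name, value, _prio, test in groups[priority]:
            if test(message):
                score += value
        # One threshold check per group. This is only safe as an early exit
        # if all negative-scored rules sit in the earliest group.
        checks += 1
        if score >= threshold:
            break
    return score, checks
```

With one negative rule at priority -100, one large positive rule at priority 0 and one small rule at priority 500, a message that hits only the positive rules stops after the second group. The cost Theo mentions shows up as the number of groups: the more priorities the rules are split across, the more separate runs are needed.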


On Fri, Jan 18, 2008 at 11:22:55PM -0500, Matt Kettler wrote:
> You can't run the rules in score-order without driving SA's performance 
> into the ground.
> 
> The key here is SA doesn't run tests sequentially, it runs them in 
> parallel as it works its way through the body. This allows for good, 
> efficient use of memory cache.
> 
> By running rules in score-order, you break this, forcing SA to run 
> through the body multiple times, degrading performance.
> 
> 
> George Georgalis wrote:
> >Noticed today (again) how long some messages take to test.  The
> >first thing that comes to mind is some dns is getting overloaded
> >answering joe-job rbldns backscatter, causing timeouts or slow
> >response times.
> >
> >Then I was thinking about how some tests are excluded because they
> >generate too much regex load, which can be problematic even if
> >it's a good test.
> >
> >Some time back I recall a thread, amounting to why not quit
> >remaining tests if spam threshold is reached, the answer was some
> >tests have negative scores and could change the result.
> >
> >So, here are two ideas, on startup, after all the conf files are
> >parsed create a hash that has tests sorted by score, with the
> >largest positive tests starting after zero, ordered like this
> >
> >-5
> >-5
> >-2
> >-1
> >0
> >6
> >5
> >4
> >2
> >2
> >1
> >
> >then test in that order, whenever a test brings the message
> >to a spam score level, exit with result. (and add a switch to
> >optionally run all tests)
> >
> >Another approach might be simpler to integrate than above, simply
> >do all the negative score tests first and pull out if the score
> >gets to spam level.
> >
> >// George
> >
> >
> >  

-- 
Randomly Selected Tagline:
No one can feel as helpless as the owner of a sick goldfish.



Re: more efficent big scoring

2008-01-18 Thread Matt Kettler

You can't run the rules in score-order without driving SA's performance 
into the ground.


The key here is SA doesn't run tests sequentially, it runs them in 
parallel as it works its way through the body. This allows for good, 
efficient use of memory cache.


By running rules in score-order, you break this, forcing SA to run 
through the body multiple times, degrading performance.
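
As a rough illustration of the point above, the following Python sketch contrasts the two loop orders. It is a toy model only and not how SA's rule engine is written: the function names, the rule dictionary and the idea of counting "passes" over the body are assumptions made for this example.

```python
import re

def scan_interleaved(body, rules):
    """Walk the body once and try every rule on each line while it is at hand.

    body: list of text lines; rules: dict of name -> compiled regex.
    Returns (set_of_hit_rule_names, number_of_passes_over_body).
    """
    hits = set()
    passes = 1  # the body is traversed exactly once
    for line in body:
        for name, pattern in rules.items():
            if pattern.search(line):
                hits.add(name)
    return hits, passes

def scan_rule_at_a_time(body, rule_order, rules):
    """Run the rules in a fixed order (for example by score), one at a time.

    Each rule has to start again from the top of the body, so the body is
    traversed once per rule.
    """
    hits = set()
    passes = 0
    for name in rule_order:
        passes += 1
        for line in body:
            if rules[name].search(line):
                hits.add(name)
                break
    return hits, passes
```

Both functions find the same hits. The difference is that the second one restarts at the top of the body for every rule, which is the repeated traversal described above.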



George Georgalis wrote:

> Noticed today (again) how long some messages take to test.  The
> first thing that comes to mind is some dns is getting overloaded
> answering joe-job rbldns backscatter, causing timeouts or slow
> response times.
>
> Then I was thinking about how some tests are excluded because they
> generate too much regex load, which can be problematic even if
> it's a good test.
>
> Some time back I recall a thread, amounting to why not quit
> remaining tests if spam threshold is reached, the answer was some
> tests have negative scores and could change the result.
>
> So, here are two ideas, on startup, after all the conf files are
> parsed create a hash that has tests sorted by score, with the
> largest positive tests starting after zero, ordered like this
>
> -5
> -5
> -2
> -1
> 0
> 6
> 5
> 4
> 2
> 2
> 1
>
> then test in that order, whenever a test brings the message
> to a spam score level, exit with result. (and add a switch to
> optionally run all tests)
>
> Another approach might be simpler to integrate than above, simply
> do all the negative score tests first and pull out if the score
> gets to spam level.
>
> // George

Re: sa-learn error message

2008-01-18 Thread Jari Fredriksson

> Hello Craig,
> 
> I recently ran into this problem myself.  The solution,
> after being a dolt and not running a backup first, was
> the following sequence followed by line definitions:
> 
>    /etc/init.d/mailserver stop
>    sa-learn --backup > /etc/mail/spamassassin/database.bak
>    sa-learn --dump magic
>    sa-learn --no-sync --ham --progress --mbox /export/home/brian/Ham
>    sa-learn --sync
>    sa-learn --no-sync --spam --progress --mbox /export/home/brian/Spam
>    sa-learn --sync
>    sa-learn --dump magic
>    spamassassin -D --lint
>    /etc/init.d/mailserver start
> 
> 1) Shutdown Sendmail/ClamAV/MIMEDefang/Spamassassin.
> 2) Backup the database.
> 3) View current statistics which will also display the
> current bayes database version.
> 4) Do a ham learn.
> 5) This one was key!  Even after everything was parsed
> and the command line came back, the database was still
> not in a happy place. Doing the --sync brings it to that
> happy place.
> 6) Do a spam learn.
> 7) See #5.
> 8) View current statistics and note nham and nspam
> increases.
> 9) Run through the rules to make sure
> everything is still cool and no errors occur.
> 10) Start Sendmail/ClamAV/MIMEDefang/Spamassassin.
> 
> Notes:
> 
> - Doing a --sync on the sa-learn learning process didn't
> work.  I'm not sure why the system doesn't learn the file
> and then just resync the database when it's done.  Maybe
> Theo has an idea.
> - Shutting down the MTA isn't ideal but
> it prevents lock file conflicts which don't seem to work
> too well under Solaris 8.  Mail queues in the ether for
> about 30 minutes while all of this is going on.  I've
> even thought about automating the process which would
> help keep the Ham and Spam files at a reasonable size and
> shorten that to about 5 minutes.
> 
> -BE
> 

Why do you put that --no-sync argument after each learning command in the first 
place? I have used it when learning several messages one at a time, and then 
run --sync later.

But in your script, I see no reason for 1st learning with --no-sync and then 
--sync after it.
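
The batching pattern described above, where several learning runs use --no-sync and a single --sync follows at the end, can be sketched in Python as below. This is only an illustration: the `learn_commands` helper is hypothetical, it merely builds the command lines and does not run them, and it uses only the sa-learn options that already appear in this thread.

```python
from typing import List

def learn_commands(ham_mboxes: List[str], spam_mboxes: List[str]) -> List[List[str]]:
    """Build sa-learn command lines: every learn uses --no-sync, then one --sync.

    Nothing is executed here; the caller decides how to run the commands.
    """
    cmds = []
    for path in ham_mboxes:
        cmds.append(["sa-learn", "--no-sync", "--ham", "--mbox", path])
    for path in spam_mboxes:
        cmds.append(["sa-learn", "--no-sync", "--spam", "--mbox", path])
    if cmds:
        # A single sync at the very end covers all of the learning runs above.
        cmds.append(["sa-learn", "--sync"])
    return cmds
```

Each returned list could then be passed to something like `subprocess.run(cmd, check=True)`. Whether deferring the sync actually saves time on a given installation is something to measure rather than assume.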

Re: sa-learn error message

2008-01-18 Thread Brian Eliassen


Hello Craig,

I recently ran into this problem myself.  The solution, after being a 
dolt and not running a backup first, was the following sequence 
followed by line definitions:


   /etc/init.d/mailserver stop
   sa-learn --backup > /etc/mail/spamassassin/database.bak
   sa-learn --dump magic
   sa-learn --no-sync --ham --progress --mbox /export/home/brian/Ham
   sa-learn --sync
   sa-learn --no-sync --spam --progress --mbox /export/home/brian/Spam
   sa-learn --sync
   sa-learn --dump magic
   spamassassin -D --lint
   /etc/init.d/mailserver start

1) Shutdown Sendmail/ClamAV/MIMEDefang/Spamassassin.
2) Backup the database.
3) View current statistics which will also display the current bayes 
database version.

4) Do a ham learn.
5) This one was key!  Even after everything was parsed and the 
command line came back, the database was still not in a happy place. 
Doing the --sync brings it to that happy place.

6) Do a spam learn.
7) See #5.
8) View current statistics and note nham and nspam increases.
9) Run through the rules to make sure everything is still cool and no 
errors occur.

10) Start Sendmail/ClamAV/MIMEDefang/Spamassassin.

Notes:

- Doing a --sync on the sa-learn learning process didn't work.  I'm 
not sure why the system doesn't learn the file and then just resync 
the database when it's done.  Maybe Theo has an idea.
- Shutting down the MTA isn't ideal but it prevents lock file 
conflicts which don't seem to work too well under Solaris 8.  Mail 
queues in the ether for about 30 minutes while all of this is going 
on.  I've even thought about automating the process which would help 
keep the Ham and Spam files at a reasonable size and shorten that to 
about 5 minutes.


-BE



Hi again SA experts,

Note the error message in the 2nd-last line of the following transcript:

animalhead:~/sj $ sa-learn --no-rebuild --spam --mbox savejunk
The --no-rebuild option has been deprecated.  Please use --no-sync instead.
Learned tokens from 3025 message(s) (3047 message(s) examined)
animalhead:~/sj $ sa-learn --no-sync --spam thruJunk
bayes: bayes db version 0 is not able to be used, aborting! at 
/usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/BayesStore/DBM.pm 
line 196.

Learned tokens from 170 message(s) (170 message(s) examined)

There are 171 messages in directory thruJunk.  The largest is 495K, 
the next largest is 137K.

$ sa-learn -V yields "spamassassin v 3.2.1"

What should I do about this?

I still have another directory with ham to go.  It includes lots of 
large files.  Should I delete those over a certain size?


Thanks,
Craig MacKenna

Disabling eval rules (was: Re: Testing Botnet)

2008-01-18 Thread Karsten Bräckelmann

On Sat, 2008-01-12 at 12:23 -0800, Robert - elists wrote:

> > Sounds like you've been hit by bug 5519 [1] before the upgrade in Oct.
> > Setting rules scores to 0 did *not* prevent these tests from being
> > evaluated for SA 3.2.x before 3.2.3.
> > 
> > Fixed since 3.2.3.  Plugin eval rules with 0 scores are meant to not be
> > evaluated, and of course to not show up in the report.

> Interesting, does this mean that we should be changing scores we care about
> and want to see eval'd in the reports to .01 or something similar?
> 
> Any other implications in the bug and current or future fix methods?

AFAIK, nope. That should be all.  However...


I noticed that even my SA 3.2.4 still evaluates my URICountry plugin
rules, which are set to a score of 0.0 [1]. Which actually should *not*
happen since 3.2.3.

Anyone got a guess why?  Devs?

  guenther


[1] originally set up for exactly this testing purpose, btw

-- 
char *t="[EMAIL PROTECTED]";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}

more efficent big scoring

2008-01-18 Thread George Georgalis

Noticed today (again) how long some messages take to test.  The
first thing that comes to mind is some dns is getting overloaded
answering joe-job rbldns backscatter, causing timeouts or slow
response times.

Then I was thinking about how some tests are excluded because they
generate too much regex load, which can be problematic even if
it's a good test.

Some time back I recall a thread, amounting to why not quit
remaining tests if spam threshold is reached, the answer was some
tests have negative scores and could change the result.

So, here are two ideas, on startup, after all the conf files are
parsed create a hash that has tests sorted by score, with the
largest positive tests starting after zero, ordered like this

-5
-5
-2
-1
0
6
5
4
2
2
1

then test in that order, whenever a test brings the message
to a spam score level, exit with result. (and add a switch to
optionally run all tests)

Another approach might be simpler to integrate than above, simply
do all the negative score tests first and pull out if the score
gets to spam level.

// George
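
The first idea above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not SA code: the `(name, score, test)` rule tuples, the function names, the threshold of 5.0 and the `run_all` switch are all invented for illustration.

```python
from typing import Callable, List, Tuple

# Hypothetical rule representation: (name, score, test function).
Rule = Tuple[str, float, Callable[[str], bool]]

def order_rules(rules: List[Rule]) -> List[Rule]:
    """Negative and zero scores first (ascending), then positives, largest first."""
    non_positive = sorted((r for r in rules if r[1] <= 0), key=lambda r: r[1])
    positive = sorted((r for r in rules if r[1] > 0), key=lambda r: -r[1])
    return non_positive + positive

def scan(message: str, rules: List[Rule], threshold: float = 5.0,
         run_all: bool = False) -> Tuple[float, int]:
    """Run rules in the order above; return (score, number_of_rules_run)."""
    score = 0.0
    ran = 0
    for _name, value, test in order_rules(rules):
        ran += 1
        if test(message):
            score += value
        # Once we are among the positive rules, every negative rule has
        # already run, so the score can only rise: stopping here is safe.
        if not run_all and value > 0 and score >= threshold:
            break
    return score, ran
```

Given rules scored 1, 2, -1, 6, 0, -5, 4, 2, -2, 5, -5 in any order, `order_rules` yields the sequence listed above. The sketch says nothing about how expensive it is to run each rule separately, which is the cost the replies in this thread are concerned with.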


-- 
George Georgalis, information system scientist <

Re: disable all network test except ...

2008-01-18 Thread Stefan Jakobs

On Friday 18 January 2008 13:46, you wrote:
> Stefan Jakobs wrote:
> > Hello list,
> >
> > I'm using amavisd-new with spamassassin and for some tests I have to
> > disable all network tests in spamassassin except for sorbs, njabl, uribl
> > and maybe some other blackhole lists.
> > I guess I can comment out the corresponding header lines in the files
> > 20_dnsbl_tests.cf and 25_uribl.cf.
>
> Don't do that. Your changes will get clobbered whenever you run
> sa-update or upgrade SA versions..

I know. I will run spamassassin with disabled DNS queries only for 
performance tests, and during that time I will not change the system.

> >  And also deactivate the plugins for razor,
> > pyzor and so on. But is this enough, or is there an easier way to disable
> > most of the network tests?
>
> Set their score to 0 in your local.cf. Note for RBL's you'll need to set
> a 0 score for the normally un-scored "root" rule for that RBL, which is
> the one using check_rbl, not check_rbl_sub.
>
> For example, to disable all the spamhaus tests:
>
> score __RCVD_IN_ZEN 0
> score RCVD_IN_SBL 0
> score RCVD_IN_XBL 0
> score RCVD_IN_PBL 0
>
> The only one you *really* need is the first one, as that one disables
> the DNS query. However, disabling the sub-tests will save you a little
> CPU and prevent SA from constantly checking an empty result to see if
> different IPs match it.

OK, that's good to know.
Are there any other network tests that are not mentioned in the following 
files?
20_dnsbl_tests.cf
25_uribl.cf
50_scores.cf

Thanks guys.
Stefan



Re: disable all network test except ...

2008-01-18 Thread mouss


Stefan Jakobs wrote:

Hello list,

I'm using amavisd-new with spamassassin and for some tests I have to disable 
all network tests in spamassassin except for sorbs, njabl, uribl and maybe 
some other blackhole lists.
I guess I can comment out the corresponding header lines in the files 
20_dnsbl_tests.cf and 25_uribl.cf. And also deactivate the plugins for razor, 
pyzor and so on. But is this enough, or is there an easier way to disable most 
of the network tests?
  


Create a scores.cf file in the directory where you have local.cf, and 
set the scores to zero for any rule you want to disable. Look at 
50_scores.cf in the spamassassin "core" rules directory for the names of 
the rules.

Re: The googolbees are getting craftier

2008-01-18 Thread Jeff Chan


Quoting Justin Mason <[EMAIL PROTECTED]>:



the redirect detection should have no problem finding that...


And the redirected-to domain is on two SURBL blacklists, so it should  
be hitting.


Jeff C.


Loren Wilton writes:

I guess btnl is no longer working.  Now they are doing a redirect:

http://google.co.uk///pagead/iclk?sa=l&ai=livermore&num=970&adurl=http://-low-rate.tw?beast


 Loren

Re: disable all network test except ...

2008-01-18 Thread Matt Kettler


Stefan Jakobs wrote:

> Hello list,
>
> I'm using amavisd-new with spamassassin and for some tests I have to disable
> all network tests in spamassassin except for sorbs, njabl, uribl and maybe
> some other blackhole lists.
> I guess I can comment out the corresponding header lines in the files
> 20_dnsbl_tests.cf and 25_uribl.cf.

Don't do that. Your changes will get clobbered whenever you run 
sa-update or upgrade SA versions.

> And also deactivate the plugins for razor,
> pyzor and so on. But is this enough, or is there an easier way to disable most
> of the network tests?

Set their score to 0 in your local.cf. Note for RBL's you'll need to set 
a 0 score for the normally un-scored "root" rule for that RBL, which is 
the one using check_rbl, not check_rbl_sub.


For example, to disable all the spamhaus tests:

score __RCVD_IN_ZEN 0
score RCVD_IN_SBL 0
score RCVD_IN_XBL 0
score RCVD_IN_PBL 0

The only one you *really* need is the first one, as that one disables 
the DNS query. However, disabling the sub-tests will save you a little 
CPU and prevent SA from constantly checking an empty result to see if 
different IPs match it.
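
If many rules have to be zeroed for a test run, the score lines can be generated rather than typed by hand. A minimal Python sketch follows; the `zero_score_config` helper is hypothetical, and the rule names are simply the Spamhaus ones from the example above. The output is meant to be pasted into local.cf, not into the stock rule files.

```python
from typing import Iterable

def zero_score_config(rule_names: Iterable[str]) -> str:
    """Return 'score NAME 0' lines, one per rule, ready to paste into local.cf."""
    return "".join("score %s 0\n" % name for name in rule_names)

# Rule names taken from the Spamhaus example; the first is the "root" rule.
SPAMHAUS_RULES = ["__RCVD_IN_ZEN", "RCVD_IN_SBL", "RCVD_IN_XBL", "RCVD_IN_PBL"]
```

Calling `zero_score_config(SPAMHAUS_RULES)` produces the four score lines shown above. Running `spamassassin --lint` afterwards is a sensible check that the resulting file still parses.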

Re: The googolbees are getting craftier

2008-01-18 Thread Justin Mason


the redirect detection should have no problem finding that...

Loren Wilton writes:
> I guess btnl is no longer working.  Now they are doing a redirect:
> 
> http://google.co.uk///pagead/iclk?sa=l&ai=livermore&num=970&adurl=http://-low-rate.tw?beast
> 
> 
>  Loren

disable all network test except ...

2008-01-18 Thread Stefan Jakobs

Hello list,

I'm using amavisd-new with spamassassin and for some tests I have to disable 
all network tests in spamassassin except for sorbs, njabl, uribl and maybe 
some other blackhole lists.
I guess I can comment out the corresponding header lines in the files 
20_dnsbl_tests.cf and 25_uribl.cf. And also deactivate the plugins for razor, 
pyzor and so on. But is this enough, or is there an easier way to disable most 
of the network tests?

Thanks for your help.
Stefan



The googolbees are getting craftier

2008-01-18 Thread Loren Wilton


I guess btnl is no longer working.  Now they are doing a redirect:

http://google.co.uk///pagead/iclk?sa=l&ai=livermore&num=970&adurl=http://christmas-low-rate.tw?beast


   Loren
