The googolbees are getting craftier

2008-01-18 Thread Loren Wilton

I guess btnl is no longer working.  Now they are doing a redirect:

http://google.co.uk///pagead/iclk?sa=lai=livermorenum=970adurl=http://christmas-low-rate.tw?beast


   Loren




disable all network test except ...

2008-01-18 Thread Stefan Jakobs
Hello list,

I'm using amavisd-new with spamassassin and for some tests I have to disable 
all network tests in spamassassin except for sorbs, njabl, uribl and maybe 
some other blackhole lists.
I guess I can comment out the corresponding header lines in the files 
20_dnsbl_tests.cf and 25_uribl.cf. And also deactivate the plugins for razor, 
pyzor and so on. But is this enough, or is there an easier way to disable most
of the network tests?

Thanks for your help.
Stefan




Re: The googolbees are getting craftier

2008-01-18 Thread Jeff Chan

Quoting Justin Mason [EMAIL PROTECTED]:



the redirect detection should have no problem finding that...


And the redirected-to domain is on two SURBL blacklists, so it should  
be hitting.


Jeff C.


Loren Wilton writes:

I guess btnl is no longer working.  Now they are doing a redirect:

http://google.co.uk///pagead/iclk?sa=lai=livermorenum=970adurl=http://-low-rate.tw?beast


 Loren








Re: disable all network test except ...

2008-01-18 Thread Stefan Jakobs
On Friday 18 January 2008 13:46, you wrote:
> Stefan Jakobs wrote:
>> Hello list,
>>
>> I'm using amavisd-new with spamassassin and for some tests I have to
>> disable all network tests in spamassassin except for sorbs, njabl, uribl
>> and maybe some other blackhole lists.
>> I guess I can comment out the corresponding header lines in the files
>> 20_dnsbl_tests.cf and 25_uribl.cf.
>
> Don't do that. Your changes will get clobbered whenever you run
> sa-update or upgrade SA versions.

I know. I will run spamassassin with DNS queries disabled only for
performance tests, and during that time I will not change the system.

>> And also deactivate the plugins for razor,
>> pyzor and so on. But is this enough, or is there an easier way to disable
>> most of the network tests?
>
> Set their score to 0 in your local.cf. Note for RBLs you'll need to set
> a 0 score for the normally un-scored root rule for that RBL, which is
> the one using check_rbl, not check_rbl_sub.
>
> For example, to disable all the Spamhaus tests:
>
> score __RCVD_IN_ZEN 0
> score RCVD_IN_SBL 0
> score RCVD_IN_XBL 0
> score RCVD_IN_PBL 0
>
> The only one you *really* need is the first one, as that one disables
> the DNS query. However, disabling the sub-tests will save you a little
> CPU and prevent SA from constantly checking an empty result to see if
> different IPs match it.

OK, that's good to know.
Are there any other network tests that are not mentioned in the following
files?
20_dnsbl_tests.cf
25_uribl.cf
50_scores.cf

Thanks guys.
Stefan




Re: disable all network test except ...

2008-01-18 Thread mouss

Stefan Jakobs wrote:

> Hello list,
>
> I'm using amavisd-new with spamassassin and for some tests I have to disable
> all network tests in spamassassin except for sorbs, njabl, uribl and maybe
> some other blackhole lists.
> I guess I can comment out the corresponding header lines in the files
> 20_dnsbl_tests.cf and 25_uribl.cf. And also deactivate the plugins for razor,
> pyzor and so on. But is this enough, or is there an easier way to disable most
> of the network tests?

Create a scores.cf file in the directory where you have local.cf, and
set the scores to zero for any rule you want to disable. Look at
50_scores.cf in the spamassassin core rules directory for the names of
the rules.


Re: disable all network test except ...

2008-01-18 Thread Matt Kettler

Stefan Jakobs wrote:

> Hello list,
>
> I'm using amavisd-new with spamassassin and for some tests I have to disable
> all network tests in spamassassin except for sorbs, njabl, uribl and maybe
> some other blackhole lists.
> I guess I can comment out the corresponding header lines in the files
> 20_dnsbl_tests.cf and 25_uribl.cf.

Don't do that. Your changes will get clobbered whenever you run
sa-update or upgrade SA versions.

> And also deactivate the plugins for razor,
> pyzor and so on. But is this enough, or is there an easier way to disable most
> of the network tests?

Set their score to 0 in your local.cf. Note for RBLs you'll need to set
a 0 score for the normally un-scored root rule for that RBL, which is
the one using check_rbl, not check_rbl_sub.

For example, to disable all the Spamhaus tests:

score __RCVD_IN_ZEN 0
score RCVD_IN_SBL 0
score RCVD_IN_XBL 0
score RCVD_IN_PBL 0

The only one you *really* need is the first one, as that one disables
the DNS query. However, disabling the sub-tests will save you a little
CPU and prevent SA from constantly checking an empty result to see if
different IPs match it.
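
For a longer list of rules, the override lines can be generated rather than typed. This is only a minimal sketch, assuming a POSIX shell; the helper name emit_zero_scores is made up for illustration and is not part of SpamAssassin. The rule names are the Spamhaus ones from the example above; which root rules exist depends on the installed rule set.

```shell
#!/bin/sh
# Hypothetical helper (not part of SpamAssassin): print a "score NAME 0"
# line for every rule name given as an argument.
emit_zero_scores() {
    for rule in "$@"; do
        printf 'score %s 0\n' "$rule"
    done
}

# Example: the Spamhaus root rule first, then its sub-tests.
emit_zero_scores __RCVD_IN_ZEN RCVD_IN_SBL RCVD_IN_XBL RCVD_IN_PBL
```

The output could be redirected into a .cf file next to local.cf and then checked with spamassassin --lint before it is used.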





Re: The googolbees are getting craftier

2008-01-18 Thread Justin Mason

the redirect detection should have no problem finding that...

Loren Wilton writes:
 I guess btnl is no longer working.  Now they are doing a redirect:
 
 http://google.co.uk///pagead/iclk?sa=lai=livermorenum=970adurl=http://-low-rate.tw?beast
 
 
  Loren


Disabling eval rules (was: Re: Testing Botnet)

2008-01-18 Thread Karsten Bräckelmann
On Sat, 2008-01-12 at 12:23 -0800, Robert - elists wrote:

>> Sounds like you've been hit by bug 5519 [1] before the upgrade in Oct.
>> Setting rule scores to 0 did *not* prevent these tests from being
>> evaluated for SA 3.2.x before 3.2.3.
>>
>> Fixed since 3.2.3.  Plugin eval rules with 0 scores are meant to not be
>> evaluated, and of course to not show up in the report.
>
> Interesting, does this mean that we should be changing scores we care about
> and want to see eval'd in the reports to .01 or something similar?
>
> Any other implications in the bug and current or future fix methods?

AFAIK, nope. That should be all.  However...


I noticed that even my SA 3.2.4 still evaluates my URICountry plugin
rules, which are set to a score of 0.0 [1]. Which actually should *not*
happen since 3.2.3.

Anyone got a guess why?  Devs?

  guenther


[1] originally set up for exactly this testing purpose, btw

-- 
char *t=[EMAIL PROTECTED];
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1:
(c=*++x); c128  (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: sa-learn error message

2008-01-18 Thread Brian Eliassen

Hello Craig,

I recently ran into this problem myself.  The solution, after being a
dolt and not running a backup first, was the following sequence,
followed by an explanation of each line:


   /etc/init.d/mailserver stop
   sa-learn --backup > /etc/mail/spamassassin/database.bak
   sa-learn --dump magic
   sa-learn --no-sync --ham --progress --mbox /export/home/brian/Ham
   sa-learn --sync
   sa-learn --no-sync --spam --progress --mbox /export/home/brian/Spam
   sa-learn --sync
   sa-learn --dump magic
   spamassassin -D --lint
   /etc/init.d/mailserver start

1) Shut down Sendmail/ClamAV/MIMEDefang/Spamassassin.
2) Back up the database.
3) View current statistics which will also display the current bayes 
database version.

4) Do a ham learn.
5) This one was key!  Even after everything was parsed and the 
command line came back, the database was still not in a happy place. 
Doing the --sync brings it to that happy place.

6) Do a spam learn.
7) See #5.
8) View current statistics and note nham and nspam increases.
9) Run through the rules to make sure everything is still cool and no 
errors occur.

10) Start Sendmail/ClamAV/MIMEDefang/Spamassassin.

Notes:

- Doing a --sync on the sa-learn learning process didn't work.  I'm 
not sure why the system doesn't learn the file and then just resync 
the database when it's done.  Maybe Theo has an idea.
- Shutting down the MTA isn't ideal but it prevents lock file 
conflicts which don't seem to work too well under Solaris 8.  Mail 
queues in the ether for about 30 minutes while all of this is going 
on.  I've even thought about automating the process which would help 
keep the Ham and Spam files at a reasonable size and shorten that to 
about 5 minutes.
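
The automation mentioned in the last note could look roughly like the following. This is a sketch under assumptions, not a tested production script: the function names, the DRY_RUN switch and the argument order are invented for illustration, and /etc/init.d/mailserver is the site-specific init script from the sequence above. It uses plain spamassassin --lint rather than the -D variant to keep the output short.

```shell
#!/bin/sh
# Sketch of automating the retraining sequence from this message.
# Function names, the DRY_RUN switch and the argument order are assumptions.

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Back up the Bayes database; sa-learn --backup writes the dump to stdout.
backup_db() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf 'sa-learn --backup > %s\n' "$1"
    else
        sa-learn --backup > "$1"
    fi
}

# learn_steps HAM_MBOX SPAM_MBOX BACKUP_FILE
# Stops at the first failing step and returns its status.
learn_steps() {
    backup_db "$3" &&
    run sa-learn --dump magic &&
    run sa-learn --no-sync --ham --progress --mbox "$1" &&
    run sa-learn --sync &&
    run sa-learn --no-sync --spam --progress --mbox "$2" &&
    run sa-learn --sync &&
    run sa-learn --dump magic &&
    run spamassassin --lint
}

# retrain HAM_MBOX SPAM_MBOX BACKUP_FILE
retrain() {
    run /etc/init.d/mailserver stop || return 1
    status=0
    learn_steps "$@" || status=$?
    # Restart the mail stack even if a learning step failed.
    run /etc/init.d/mailserver start
    return "$status"
}
```

Setting DRY_RUN=1 before calling, for example, retrain /export/home/brian/Ham /export/home/brian/Spam /etc/mail/spamassassin/database.bak prints the commands without running them, which is a cheap way to check the order first.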


-BE



> Hi again SA experts,
>
> Note the error message in the 2nd-last line of the following transcript:
>
> animalhead:~/sj $ sa-learn --no-rebuild --spam --mbox savejunk
> The --no-rebuild option has been deprecated.  Please use --no-sync instead.
> Learned tokens from 3025 message(s) (3047 message(s) examined)
> animalhead:~/sj $ sa-learn --no-sync --spam thruJunk
> bayes: bayes db version 0 is not able to be used, aborting! at
> /usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/BayesStore/DBM.pm
> line 196.
>
> Learned tokens from 170 message(s) (170 message(s) examined)
>
> There are 171 messages in directory thruJunk.  The largest is 495K,
> the next largest is 137K.
>
> $ sa-learn -V yields spamassassin v 3.2.1
>
> What should I do about this?
>
> I still have another directory with ham to go.  It includes lots of
> large files.  Should I delete those over a certain size?
>
>
> Thanks,
> Craig MacKenna




more efficent big scoring

2008-01-18 Thread George Georgalis
Noticed today (again) how long some messages take to test.  The
first thing that comes to mind is that some DNS server is getting
overloaded answering joe-job rbldns backscatter, causing timeouts or
slow response times.

Then I was thinking about how some tests are excluded because they
generate too much regex load, which can be problematic even if
it's a good test.

Some time back I recall a thread amounting to "why not quit the
remaining tests once the spam threshold is reached?"  The answer was
that some tests have negative scores and could change the result.

So, here are two ideas.  First: on startup, after all the conf files
are parsed, create a hash that has the tests sorted by score, negative
scores first and then the positive ones from largest to smallest,
ordered like this

-5
-5
-2
-1
0
6
5
4
2
2
1

then test in that order, and whenever a test brings the message
to a spam score level, exit with the result (and add a switch to
optionally run all tests).

Another approach might be simpler to integrate than the above: simply
do all the negative score tests first and pull out if the score
gets to spam level.
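
To make the ordering idea concrete, here is a toy model in POSIX shell. It is not SpamAssassin code and says nothing about how SA schedules rules internally; the function name score_until is invented, and integer scores are used only because shell arithmetic has no fractions. It assumes the caller passes the scores of the rules that hit, negatives first.

```shell
#!/bin/sh
# Toy model of the "negative tests first" idea; not SpamAssassin code.
# score_until THRESHOLD SCORE...
# Adds up integer scores in the order given and stops at the first
# positive score that brings the running total up to THRESHOLD.
# Prints "<total> <number of scores used>".
score_until() {
    threshold=$1
    shift
    total=0
    used=0
    for s in "$@"; do
        total=$((total + s))
        used=$((used + 1))
        # Assuming negatives are listed first, reaching a positive score
        # means every negative score has already been counted, so the
        # total can no longer drop below the threshold again.
        if [ "$s" -gt 0 ] && [ "$total" -ge "$threshold" ]; then
            break
        fi
    done
    echo "$total $used"
}

# The example ordering from this message, with a threshold of 5.
score_until 5 -5 -5 -2 -1 0 6 5 4 2 2 1
```

With this ordering and a threshold of 5, the call prints "6 10": the total reaches the threshold after ten scores, so the last test (score 1) is never needed.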

// George


-- 
George Georgalis, information system scientist IXOYE


Re: sa-learn error message

2008-01-18 Thread Jari Fredriksson
> Hello Craig,
>
> I recently ran into this problem myself.  The solution,
> after being a dolt and not running a backup first, was
> the following sequence followed by line definitions:
>
>    /etc/init.d/mailserver stop
>    sa-learn --backup > /etc/mail/spamassassin/database.bak
>    sa-learn --dump magic
>    sa-learn --no-sync --ham --progress --mbox /export/home/brian/Ham
>    sa-learn --sync
>    sa-learn --no-sync --spam --progress --mbox /export/home/brian/Spam
>    sa-learn --sync
>    sa-learn --dump magic
>    spamassassin -D --lint
>    /etc/init.d/mailserver start
>
> 1) Shutdown Sendmail/ClamAV/MIMEDefang/Spamassassin.
> 2) Backup the database.
> 3) View current statistics which will also display the
> current bayes database version.
> 4) Do a ham learn.
> 5) This one was key!  Even after everything was parsed
> and the command line came back, the database was still
> not in a happy place. Doing the --sync brings it to that
> happy place.
> 6) Do a spam learn.
> 7) See #5.
> 8) View current statistics and note nham and nspam
> increases.
> 9) Run through the rules to make sure
> everything is still cool and no errors occur.
> 10) Start Sendmail/ClamAV/MIMEDefang/Spamassassin.
>
> Notes:
>
> - Doing a --sync on the sa-learn learning process didn't
> work.  I'm not sure why the system doesn't learn the file
> and then just resync the database when it's done.  Maybe
> Theo has an idea.
> - Shutting down the MTA isn't ideal but
> it prevents lock file conflicts which don't seem to work
> too well under Solaris 8.  Mail queues in the ether for
> about 30 minutes while all of this is going on.  I've
> even thought about automating the process which would
> help keep the Ham and Spam files at a reasonable size and
> shorten that to about 5 minutes.
>
> -BE
 

Why do you put that --no-sync argument on each learning command in the first
place? I have used it when learning several messages one at a time, and then
run --sync later.

But in your script, I see no reason for first learning with --no-sync and then
running --sync right after it.




Re: more efficent big scoring

2008-01-18 Thread Matt Kettler
You can't run the rules in score-order without driving SA's performance 
into the ground.


The key here is that SA doesn't run tests sequentially; it runs them in
parallel as it works its way through the body. This allows for good,
efficient use of memory cache.


By running rules in score-order, you break this, forcing SA to run 
through the body multiple times, degrading performance.







Re: more efficent big scoring

2008-01-18 Thread Theo Van Dinter
Yes and no.  There aren't many negative scored rules, which could easily be
put into a low priority to run first.

The issue, which is where Matt was going I believe, is that the reason score
based short circuiting was removed is that it's horribly slow to keep checking
the score after each rule runs.  You can do it at the end of a priority's run,
but then you have to split the rules across multiple priorities, which does
impact performance.

I made some comments about this kind of thing in
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=3109 and envisioned SA
auto-prioritizing rules for short circuiting for things like what I mentioned
in c7, but there was some strong disagreement about things like SC based on
score and so it didn't get implemented in the current code.


On Fri, Jan 18, 2008 at 11:22:55PM -0500, Matt Kettler wrote:
> You can't run the rules in score-order without driving SA's performance
> into the ground.
>
> The key here is SA doesn't run tests sequentially, it runs them in
> parallel as it works its way through the body. this allows for good,
> efficient use of memory cache.
>
> By running rules in score-order, you break this, forcing SA to run
> through the body multiple times, degrading performance.


-- 
Randomly Selected Tagline:
No one can feel as helpless as the owner of a sick goldfish.




RE: more efficent big scoring

2008-01-18 Thread Robert - elists
 
> You can't run the rules in score-order without driving SA's performance
> into the ground.
>
> The key here is SA doesn't run tests sequentially, it runs them in
> parallel as it works its way through the body. this allows for good,
> efficient use of memory cache.
>
> By running rules in score-order, you break this, forcing SA to run
> through the body multiple times, degrading performance.
 

Mr K

SA is an awesome, incredible product and tool.

Wonderful Job!

I am not an expert on the programming theory, design, and implementation
behind SA.

So... are you saying SA takes a single email and breaks it apart into
several pieces and scans those pieces via multiple processing threads and
comes back with an additive single end result for that single email's
multiple scan processing threads?

I do admit that I am respectfully optimistic about your team's ability to
design code that would run just as fast if not faster with a score order
end result.

Maybe you could let us make that decision with a local.cf knob?

I mean, most processors are so fast nowadays..

I am thinking we would brute force it under some circumstances until you
folks come forth with even more brilliant design and implementation
breakthroughs.

What think?

Is there somewhere you recommend that we can view discussions on making
processing faster?

:-)

 - rh



Re: more efficent big scoring

2008-01-18 Thread jdow

From: Robert - elists [EMAIL PROTECTED]
Sent: Friday, 2008, January 18 21:14




>> You can't run the rules in score-order without driving SA's performance
>> into the ground.
>>
>> The key here is SA doesn't run tests sequentially, it runs them in
>> parallel as it works its way through the body. this allows for good,
>> efficient use of memory cache.
>>
>> By running rules in score-order, you break this, forcing SA to run
>> through the body multiple times, degrading performance.
>
> Mr K
>
> SA is an awesome, incredible product and tool.
>
> Wonderful Job!
>
> I am not an expert on the programming theory, design, and implementation
> behind SA.
>
> So... are you saying SA takes a single email and breaks it apart into
> several pieces and scans those pieces via multiple processing threads and
> comes back with an additive single end result for that single email's
> multiple scan processing threads?


Before going further you should try to find a really good discussion of
how perl parses regular expressions. Oversimplifications can lead to
massive pessimization of the code in the name of optimization.

{^_^}