Re: Nasty bug? in 3.1.1 headers inserting?

2006-05-10 Thread Justin Mason

Daryl C. W. O'Shea writes:
 On 5/9/2006 2:16 PM, Theo Van Dinter wrote:
 
  There's some difference of opinion around this question, but my general
  opinion is that there should be an update to spamass-milter which
  properly handles the newlines either way.  I'm not sure whether or not
  that's happened yet.
 
 As discussed in this SA bug:
 
 http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4844
 
 this spamass-milter bug has a (confirmed to work) patch that fixes the 
 problem with spamass-milter:
 
 http://savannah.nongnu.org/bugs/?func=detailitemitem_id=16164
 
 
 I do not know if there is an updated spamass-milter release.  I'm 
 assuming there isn't since their bug is still open.

by the way this is a FAQ, too.

  http://wiki.apache.org/spamassassin/SaMilter030CorruptMsgs

--j.


RE: Nasty bug? in 3.1.1 headers inserting?

2006-05-10 Thread Sietse van Zanen
Thanks for all of your replies.
 
Think I should have kept a closer eye on the milter. I use Dag Wieers' packages 
for RHEL3 and he doesn't have 0.3.1 available yet. I never cared to look 
whether there was an update of the milter and therefore missed the issue.
 
Apologies for any inconvenience on the mailing list. I will compile the 
milter tonight, as I first have to dig up the source for the sendmail version 
I'm using.
 
Furthermore I did some digging in RFC822, and this is what I found:
 

 3.  LEXICAL ANALYSIS OF MESSAGES

 3.1.  GENERAL DESCRIPTION

A message consists of header fields and, optionally, a body.
 The  body  is simply a sequence of lines containing ASCII charac-
 ters.  It is separated from the headers by a null line  (i.e.,  a
 line with nothing preceding the CRLF).


Ergo, a \r\n inside a header that is not followed by white space (a space or 
tab) is against the RFC: two CRLFs in a row form a null line, which ends the 
headers. I don't know exactly whether it is SpamAssassin inserting this 
sequence or the milter. If it's SpamAssassin, I think it should be corrected 
there. If it's the milter, it has already been fixed.
 
So in the end the Exchange server is actually adhering to the RFC, who would've 
guessed that. :-)
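
To make the RFC 822 point concrete, here is a minimal Python sketch of how a strict parser might split a message. It is an illustration only, not code from SpamAssassin, spamass-milter or Exchange; the function name split_message and both sample messages are made up for this example.

```python
def split_message(raw: str):
    """Split a raw message into (headers, body) at the first null line."""
    # The first CRLF CRLF is the null line that ends the header section.
    head, _sep, body = raw.partition("\r\n\r\n")
    headers = []
    for line in head.split("\r\n"):
        if line[:1] in (" ", "\t") and headers:
            # CRLF followed by a space or tab is a folded continuation line.
            headers[-1] += " " + line.strip()
        else:
            headers.append(line)
    return headers, body


# Hypothetical sample: the second header is folded correctly (CRLF + tab).
good = ("Subject: hi\r\n"
        "X-Spam-Status: No, score=0.1\r\n"
        "\ttests=NONE\r\n"
        "\r\n"
        "body text\r\n")

# Hypothetical sample: a stray empty line inside the header block.
broken = ("Subject: hi\r\n"
          "X-Spam-Status: No, score=0.1\r\n"
          "\r\n"
          "tests=NONE\r\n"
          "\r\n"
          "body text\r\n")

print(split_message(good))
print(split_message(broken))
```

With the first sample the folded line stays part of X-Spam-Status. With the second, everything from "tests=NONE" onward lands in the body, which matches the kind of corruption described in this thread.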
 
-Sietse
 



Spamassassin + Kaspersky SMTP-Scanner

2006-05-10 Thread Thomas Gross






Hi List!

I'm running a Debian mail server with qmail 1.03, vpopmail and Kaspersky anti-virus smtp-scanner 5.5.3.

Now I want to add the latest SpamAssassin to filter the spam, which has grown to 500 mails per day.

I searched the whole internet for a possible configuration, with no result.

Does anyone have a solution for that?

Thank you!

Regards,

Thomas Gross











Re: Spamassassin + Kaspersky SMTP-Scanner

2006-05-10 Thread Michael Monnerie
On Wednesday, 10 May 2006 13:19 Thomas Gross wrote:
 Has anyone a solution for that ?

I have this for postfix, but not qmail.

mfg zmi
-- 
// Michael Monnerie, Ing.BSc-  http://it-management.at
// Tel: 0660/4156531  .network.your.ideas.
// PGP Key:   lynx -source http://zmi.at/zmi3.asc | gpg --import
// Fingerprint: 44A3 C1EC B71E C71A B4C2  9AA6 C818 847C 55CB A4EE
// Keyserver: www.keyserver.net Key-ID: 0x55CBA4EE




Bayes not working

2006-05-10 Thread Steven Stern
On a new SA installation that's as identical to the other 3 we have
running as possible, bayes is not running.

spamassassin -D --lint indicates that all is normal. The test message
generates a Bayes score.  sa-learn is able to talk to the mysql
database:  We're able to update the database using sa-learn.

However, in production, spamassassin does not report any BAYES_ scores.
 When the spam value exceeds the threshold that would normally cause
autolearning, autolearn=no changes to autolearn=unavailable.
Similarly, AWL entries are not being created.

Can anyone see what's wrong?

[3320] dbg: config: read file /usr/share/spamassassin/23_bayes.cf
[3320] dbg: bayes: using username: root
[3320] dbg: bayes: database connection established
[3320] dbg: bayes: found bayes db version 3
[3320] dbg: bayes: Using userid: 1
[3320] dbg: bayes: corpus size: nspam = 178, nham = 168
[3320] dbg: bayes: tok_get_all: token count: 20
[3320] dbg: bayes: score = 0.913557143318889
[3320] dbg: rules: ran eval rule BAYES_80 == got hit
[3320] dbg: auto-whitelist: sql-based connected to DBI:mysql:sa_bayes:ccim-mx2
[3320] dbg: auto-whitelist: sql-based finish: disconnected from DBI:mysql:sa_bayes:ccim-mx2
[3320] dbg: check: tests=BAYES_80,MISSING_SUBJECT,NO_REAL_NAME,NO_RECEIVED,NO_RELAYS,TO_CC_NONE



# grep -i bayes local.cf
# Enable the Bayes system
use_bayes                       1
# Enable Bayes auto-learning
bayes_auto_learn                1
bayes_min_ham_num               100
bayes_min_spam_num              100
# bayes_path                    /var/spool/spamassassin/bayes
bayes_store_module              Mail::SpamAssassin::BayesStore::MySQL
bayes_sql_dsn                   DBI:mysql:sa_bayes:ccim-mx2
bayes_sql_username              spamass
bayes_sql_password              xxx
bayes_sql_override_username     root
bayes_auto_expire               0
user_awl_dsn                    DBI:mysql:sa_bayes:ccim-mx2

# grep -i awl local.cf
user_awl_dsn                    DBI:mysql:sa_bayes:ccim-mx2
user_awl_sql_table              awl
user_awl_sql_username           spamass
user_awl_sql_password           xxx
user_awl_sql_override_username  root


# ps -ef | grep spam
root      2170     1  0 07:01 ?      00:00:04 /usr/bin/spamd -d -c -m5 -H -r /var/run/spamd.pid
root      2247  2170  1 07:01 ?      00:00:20 spamd child
root      2248  2170  0 07:01 ?      00:00:00 spamd child
sa-milt   3264     1  0 07:15 pts/0  00:00:00 /bin/bash /usr/sbin/spamass-milter-wrapper -p /var/run/spamass-milter/spamass-milter.sock -P /var/run/spamass-milter.pid -i 127.0.0.1 -r 10 -- -d localhost -p 783
sa-milt   3265  3264  0 07:15 pts/0  00:00:00 /usr/sbin/spamass-milter -p /var/run/spamass-milter/spamass-milter.sock -P /var/run/spamass-milter.pid -i 127.0.0.1 -r 10 -- -d localhost -p 783


SpamAssassin version 3.1.1
  running on Perl version 5.8.6
spamass-milter - Version 0.3.1


-- 

  Steve


RE: limit child process

2006-05-10 Thread Jean-Paul Natola

 
| Spamd calls it,
| 
| But I have seen my monitor , on more than one occasion, with this error,
| 
| swap_pager_getswapspace: failed
| 
| and the worst part is I don't realize it until I hit the KVM switch , and
| actually get on the console -  
| 
| so can I customize spamd to a lower limit?
| 
| I noticed after I stop /restart spamd my swap goes back to normal


spamd -m 

and what would be an ideal number to set it to?

I came in this morning, got a bunch of those swap messages, and my VM is at
86% right now


RE: Bayes not working

2006-05-10 Thread Greg Allen
You will probably get more ideas from posters, but here is my thought.

Are you running spamassassin -D --lint as the user that SA runs under when
it is running live?

For instance, I call SA with the user filter, not the user root.

So, to properly test SA I have to first type: su filter

This switches me to the user filter.

Now I get a real test of SA when I run spamassassin -D --lint

It looks like you may be testing with the user root?

SA really should not run live under the root user.

Well, that's my idea.

Good luck







Re: Spamassassin + Kaspersky SMTP-Scanner

2006-05-10 Thread Rick Macdougall

Thomas Gross wrote:

 Hi List!

 I'm running a Debian mail server with qmail 1.03, vpopmail and Kaspersky
 anti-virus smtp-scanner 5.5.3.

 Now I want to add the latest SpamAssassin to filter the spam, which has
 grown to 500 mails per day.

 I searched the whole internet for a possible configuration, with no result.

 Does anyone have a solution for that?


simscan.

You can find it at http://www.inter7.com/?page=simscan

Regards,

Rick



whitelist_from_rcvd not working

2006-05-10 Thread Robert Fitzpatrick
Can someone point out what I am doing wrong here? I have this in my
local.cf file:

whitelist_from_rcvd [EMAIL PROTECTED] mail*.magnetmail.net

But messages are getting blocked that I believe should match this?

May  5 14:54:19 esmtp postfix/smtpd[994]: 9315B7FA20: client=mail10.magnetmail.net[209.18.70.10]
May  5 14:54:20 esmtp postfix/cleanup[3083]: 9315B7FA20: message-id=[EMAIL PROTECTED]
May  5 14:54:36 esmtp postfix/qmgr[39594]: 9315B7FA20: from=<>, size=55412, nrcpt=1 (queue active)
May  5 14:54:47 esmtp amavis[3767]: (03767-02-2) Blocked SPAM, [209.18.70.10]  - [EMAIL PROTECTED], quarantine: spam-u95sUSnhhshW.gz, Message-ID: [EMAIL PROTECTED], mail_id: u95sUSnhhshW, Hits: 7.069, 11177 ms
May  5 14:54:47 esmtp postfix/smtp[2820]: 9315B7FA20: to=[EMAIL PROTECTED], relay=127.0.0.1[127.0.0.1], delay=28, status=sent (250 2.5.0 Ok, id=03767-02-2, BOUNCE)
May  5 14:54:47 esmtp postfix/qmgr[39594]: 9315B7FA20: removed

-- 
Robert



Re: Bayes not working

2006-05-10 Thread Andy Spiegl
 [3320] dbg: bayes: corpus size: nspam = 178, nham = 168
Probably because your corpus is still too small.

man Mail::SpamAssassin::Conf
...
   bayes_min_ham_num    (Default: 200)
   bayes_min_spam_num   (Default: 200)
   To be accurate, the Bayes system does not activate until a
   certain number of ham (non-spam) and spam have been learned.
   The default is 200 of each ham and spam, but you can tune these
   up or down with these two settings.
...
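
The activation rule quoted from the man page can be checked against the debug line with a short Python sketch. This is only an illustration: the function name bayes_active is made up, and the "at least as many as the minimum" comparison is my reading of the documentation, not something taken from the SpamAssassin source.

```python
import re


def bayes_active(debug_line: str, min_spam: int = 200, min_ham: int = 200) -> bool:
    """Return True if the learned corpus meets both configured minimums."""
    m = re.search(r"nspam = (\d+), nham = (\d+)", debug_line)
    if m is None:
        raise ValueError("no corpus size found in line")
    nspam, nham = int(m.group(1)), int(m.group(2))
    # Assumption: Bayes is used once both counts reach their minimum.
    return nspam >= min_spam and nham >= min_ham


line = "[3320] dbg: bayes: corpus size: nspam = 178, nham = 168"
print(bayes_active(line))            # with the documented defaults of 200
print(bayes_active(line, 100, 100))  # with the 100/100 values from the posted local.cf
```

With the defaults this corpus is too small. With bayes_min_spam_num and bayes_min_ham_num set to 100, as in the local.cf posted earlier in the thread, it is large enough.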

Bye,
 Andy.

-- 
 Finagle's Sixth Law: Don't believe in miracles -- rely on them.


Re: Bayes not working

2006-05-10 Thread Steven Stern

Andy Spiegl wrote:

[3320] dbg: bayes: corpus size: nspam = 178, nham = 168


Probably because your corpus is still too small.

man Mail::SpamAssassin::Conf
...
   bayes_min_ham_num    (Default: 200)
   bayes_min_spam_num   (Default: 200)
   To be accurate, the Bayes system does not activate until a
   certain number of ham (non-spam) and spam have been learned.
   The default is 200 of each ham and spam, but you can tune these
   up or down with these two settings.
  
I imported a corpus of about 2 messages total and it wasn't working. 
I blew it all away and started from scratch thinking that was the 
problem.  For now, local.cf has a minimum of 100 messages of each type. 
The current database exceeds that.


Re: whitelist_from_rcvd not working

2006-05-10 Thread Matt Kettler
Robert Fitzpatrick wrote:
 Can someone point out what I am doing wrong here? I have this in my
 local.cf file:

 whitelist_from_rcvd [EMAIL PROTECTED] mail*.magnetmail.net

 But messages are getting blocked that I believe should match this?
   
What about the below suggests this mail is from [EMAIL PROTECTED]? The below
suggests that the message is from <> (a bounce), but is being delivered
to [EMAIL PROTECTED]
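
A rough Python sketch of why the rule does not fire: both the sender address and the relay must match, and an empty sender matches no address pattern. This is a simplification for illustration, not SpamAssassin's actual matching code (which looks at more than the envelope sender). The function name whitelisted and the address news@example.com are hypothetical, since the real address is redacted in the archive.

```python
from fnmatch import fnmatch


def whitelisted(sender: str, relay_rdns: str,
                addr_glob: str, relay_glob: str) -> bool:
    """Both the sender address and the relay's reverse DNS must match."""
    return (fnmatch(sender.lower(), addr_glob.lower())
            and fnmatch(relay_rdns.lower(), relay_glob.lower()))


rule = ("news@example.com", "mail*.magnetmail.net")  # hypothetical address

# Normal mail from the whitelisted sender via the expected relay.
print(whitelisted("news@example.com", "mail10.magnetmail.net", *rule))

# A bounce: the envelope sender is empty (<>), so the address part fails.
print(whitelisted("", "mail10.magnetmail.net", *rule))
```

The relay half of the rule matches in both calls; only the sender half differs.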

 May  5 14:54:19 esmtp postfix/smtpd[994]: 9315B7FA20: client=mail10.magnetmail.net[209.18.70.10]
 May  5 14:54:20 esmtp postfix/cleanup[3083]: 9315B7FA20: message-id=[EMAIL PROTECTED]
 May  5 14:54:36 esmtp postfix/qmgr[39594]: 9315B7FA20: from=<>, size=55412, nrcpt=1 (queue active)
 May  5 14:54:47 esmtp amavis[3767]: (03767-02-2) Blocked SPAM, [209.18.70.10]  - [EMAIL PROTECTED], quarantine: spam-u95sUSnhhshW.gz, Message-ID: [EMAIL PROTECTED], mail_id: u95sUSnhhshW, Hits: 7.069, 11177 ms
 May  5 14:54:47 esmtp postfix/smtp[2820]: 9315B7FA20: to=[EMAIL PROTECTED], relay=127.0.0.1[127.0.0.1], delay=28, status=sent (250 2.5.0 Ok, id=03767-02-2, BOUNCE)
 May  5 14:54:47 esmtp postfix/qmgr[39594]: 9315B7FA20: removed

   



RE: limit child process

2006-05-10 Thread Chris Santerre

 -Original Message-
 From: Jean-Paul Natola [mailto:[EMAIL PROTECTED]]
 Sent: Wednesday, May 10, 2006 8:28 AM
 To: ; Matt Kettler
 Cc: users@spamassassin.apache.org
 Subject: RE: limit child process
 
 
 
 
 | Spamd calls it,
 | 
 | But I have seen my monitor , on more than one occasion, with this error,
 | 
 | swap_pager_getswapspace: failed
 | 
 | and the worst part is I don't realize it until I hit the KVM switch , and
 | actually get on the console - 
 | 
 | so can I customize spamd to a lower limit?
 | 
 | I noticed after I stop /restart spamd my swap goes back to normal
 
 spamd -m 
 
 and what would be an ideal number to set it ?
 
 I came in this morning , got a bunch of those swap message , and my VM is at
 86% right now


There is no real answer to that. It depends on traffic and RAM. Start with 4 and see how it goes. Monitor and adjust. 


--Chris 





RE: Strange Bayes results

2006-05-10 Thread Bowie Bailey
Michael Monnerie wrote:
 On Tuesday, 9 May 2006 23:14 Bowie Bailey wrote:
  When I look at the overall stats, bayes does pretty good:
  RANK  RULE NAME   COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
     6  BAYES_99    26754      4.19    44.49    67.00    3.06
 
 3% HAM hits for BAYES_99 is horrible, not good. It's the FP that
 should make you alert.

True enough.  But no complaints so far.  I'm not sure how many of my
clients are even taking advantage of the spam markup.

  But when I do it for only our domain (which is where all the manual
  training happens), it hits less ham, but less spam as well:
  RANK  RULE NAME   COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
     8  BAYES_99     4649      3.29    33.41    54.64    0.20
 
 At least much better FP rate, by a factor of 15!
 
  Just my personal email address (which is trained aggressively) gets
  very few ham hits (partly because I lowered my threshold to 4.0),
  but less spam than overall:
  RANK  RULE NAME   COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
     5  BAYES_99     1643      3.08    27.05    65.72    0.08
 
 Again the FPs reduced...

Of course, it's being constantly trained and the spam threshold is
lower.  I am curious why I don't get more spam hits with a
well-trained database.

  And then when I modify sa-stats to exclude our domain, I find that
  our customers (who are trained exclusively with autolearn) seem to
  do better than us:
  RANK  RULE NAME   COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
     6  BAYES_99    22105      4.44    47.83    70.35    4.11
 
 No, 4% FPs is nothing you should be happy with.
 
  Based on these results, it almost seems like the more training Bayes
  gets, the worse it does!
 
 But remember that sa-stats can never tell if that HAM/SPAM are really
 such, it just tells you what it *believed* was HAM/SPAM.

Right. That's what I was referring to below.

  Are these anomalies just an artifact of sa-stats relying on SA to
  judge ham and spam properly?  Can these numbers be trusted at all if
  my users don't reliably report false negatives and positives?
 
 As I said on the other thread: Be very careful what you feed to bayes.
 Try to find those 4% of FPs, and if they are really FPs. Maybe your SA
 made the mistakes because you don't have enough rules to detect all
 SPAMs.

The group with 4% false positives is trained exclusively through
autolearn.  There is no facility for manual training with those
accounts.

If I follow the false positives, it lines up with expectations.  The
more manual training in the group, the lower the false positives.  Why
don't I see a similar trend with the spam hits?
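
For anyone who wants to check the arithmetic, the four BAYES_99 rows quoted above can be compared with a few lines of Python. The numbers are copied from the sa-stats output in this message; the group labels and the helper name ham_factor are only for illustration.

```python
# (group, %OFSPAM, %OFHAM) for BAYES_99, copied from the sa-stats rows above.
rows = [
    ("all mail", 67.00, 3.06),
    ("our domain", 54.64, 0.20),
    ("my address", 65.72, 0.08),
    ("customers only", 70.35, 4.11),
]


def ham_factor(rows, a, b):
    """How many times higher the %OFHAM of group a is than that of group b."""
    ham = {name: pct for name, _spam, pct in rows}
    return ham[a] / ham[b]


# Sort by ham hit rate, lowest (best) first.
for name, spam, ham in sorted(rows, key=lambda r: r[2]):
    print(f"{name:15s} spam={spam:6.2f}%  ham={ham:5.2f}%")

# The "factor of 15" between the overall stats and our domain.
print(round(ham_factor(rows, "all mail", "our domain"), 1))
```

Sorted by ham hits, the order follows the amount of manual training, while the spam column does not follow the same order.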

-- 
Bowie


RE: Strange Bayes results

2006-05-10 Thread Bowie Bailey
Michael Monnerie wrote:
 On Tuesday, 9 May 2006 23:32 Bowie Bailey wrote:
  And as an additional data point, I found this for one of our
  internal users who has never done any manual training:
  RANK  RULE NAME   COUNT  %OFRULES  %OFMAIL  %OFSPAM  %OFHAM
     1  BAYES_99      373      6.76    78.20    95.64    0.00
     1  BAYES_00       73     20.51    15.30     0.00   83.91
 
 It at least looks as if he didn't feed wrong messages. Is bayes auto
 learn set?

Yes, this user is set with all the default options for Bayes learning
and a spam threshold of 5.0.  The entire Bayes database was created
via autolearn for this user.

It seems to me that Bayes is highly sensitive to the types of ham and
spam that each user gets.  This user has a near perfect Bayes
database created with autolearn.  No false positives or negatives and
95% of spam hit by BAYES_99.  My account, on the other hand, has a few
false positives and only a 66% spam hit rate despite aggressive manual
training.

-- 
Bowie


RE: Latest sa-stats from last week

2006-05-10 Thread Bowie Bailey
Michael Monnerie wrote:
 On Tuesday, 9 May 2006 23:01 Bowie Bailey wrote:
  Hmm... If you are training Bayes, and all of your ham is in English,
  then what does Bayes do with the Chinese ham your customers get?
 
 Nothing. But you won't get a SPAM report from bayes if the e-mail is
 Chinese and you never feed Chinese language e-mail. So no FPs.

I guess that would work if you simply don't feed Bayes with any
foreign language material at all.

  True, spam is spam.  It's the vast differences in ham that I am more
  worried about.  Our customers are salesmen for the most part, so
  they are constantly sending and receiving marketing type emails.
  For us, marketing stuff is almost always considered spam.  I think
  this would cause a problem with false positives for our customers
  if I train Bayes based on our idea of ham and spam.
 
 The important thing is that you should *never* feed to bayes something
 that *could* be a legit e-mail. Most people seem to make that error. I
 do NOT feed SPAM nor HAM that could be a legit mail.

So you are saying that I should not feed Bayes with the unsolicited
marketing garbage that I get because it looks like something that
could have been requested?

 Just those nigerian who want to give you some million $ because you
 are so nice, or those lotteries where you won a lot but before you
 have to pay, the very good jobs a lot of people seem to offer where
 you can earn 5000$ for only 3 hours of work and so on.
 
 No chance this could be HAM for anybody (with at least some brain, but
 anyway you have to protect such people from themselves *g*). The same
 for feeding HAM: Give it only food that *is legit e-mail*, not some
 which could be.
 
 Remember: 10 good SPAM and HAM are better than 200 where 5% are wrong.

Wrong for who?  If it looks like marketing, 99% of the time, I don't
want it.  And for most of the accounts that I deal with, this goes up
to 100%.  Not true for my customers, tho.

My philosophy with Bayes has always been to skip the ham/spam
definitions and go with a wanted/unwanted model.  This way Bayes
learns to filter out the emails you don't want even if some of them
may technically be ham.  (Obviously, I would not be able to do this on
a site-wide installation)

 Another good thing: Since I help with mass-checks, I found that of my
 6000 SPAMs, I had about 4 or 5 which I had to delete (but unlearn
 before), as they were mistakes. That's the advantage you get back when
 running mass-checks.

-- 
Bowie


Re: limit child process

2006-05-10 Thread Michael Monnerie
On Wednesday, 10 May 2006 14:27 Jean-Paul Natola wrote:
 and what would be an ideal number to set it  ?

How many do you have right now?

 I came in this morning ,  got a bunch of those swap message , and my
 VM is at 86% right now

And which processes consume your memory, and how much?

mfg zmi
-- 
// Michael Monnerie, Ing.BSc-  http://it-management.at
// Tel: 0660/4156531  .network.your.ideas.
// PGP Key:   lynx -source http://zmi.at/zmi3.asc | gpg --import
// Fingerprint: 44A3 C1EC B71E C71A B4C2  9AA6 C818 847C 55CB A4EE
// Keyserver: www.keyserver.net Key-ID: 0x55CBA4EE




RE: Latest sa-stats from last week

2006-05-10 Thread Bowie Bailey
jdow wrote:
 From: Bowie Bailey [EMAIL PROTECTED]
 
  Michael Monnerie wrote:
   On Tuesday, 9 May 2006 16:18 Bowie Bailey wrote:
    I've got per-user Bayes and most of my users
    don't bother to train it.
   
   Another reason for site-wide bayes, I'd say.
  
  I've considered that, but it won't work in our setup.  This box
  scans our internal email as well as all of our customer's email.
  Since we are in an entirely different line of business from our
  customers, what we consider to be ham and spam will be quite
  different from theirs. If I could train it on both sets, it might
  work, but I don't have access to any of their emails for training.
  
  Also, I really prefer a per-user bayes for our internal email
  since there are various accounts that get a specific type of ham
  and work very well with Bayes.
 
 Importune them to feed you as large a collection of ham and spam
 as they can, once. Then turn on autolearn, cross your fingers, and
 put on your flak jacket.

What flak jacket?  I have Bayes turned on now and I never did any
manual training on most of the accounts.  I just turned it on and let
autolearn (with the default settings) do its thing.  So far, I have
received very few complaints.

But then again, I think less than half of my users are even taking
advantage of the spam markup.  Since I don't do any blocking or
sorting on the server, it is up to them to use MUA rules to sort or
delete the spam once my server has marked it.

-- 
Bowie


Re: Strange Bayes results

2006-05-10 Thread Michael Monnerie
On Wednesday, 10 May 2006 17:08 Bowie Bailey wrote:
 Yes, this user is set with all the default options for Bayes learning
 and a spam threshold of 5.0.  The entire Bayes database was created
 via autolearn for this user.

Is that possible at all? I thought that for Bayes to work you need 200 ham + 
200 spam first.

 It seems to me that Bayes is highly sensitive to the types of ham and
 spam that each user gets.  This user has a near perfect Bayes
 database created with autolearn.  No false positives or negatives and
 95% of spam hit by BAYES_99.  My account, on the other hand, has a
 few false positives and only a 66% spam hit rate despite aggressive
 manual training.

I had an off-list discussion with somebody, and we tried to compare our setups 
and results. I'll post this as a separate thread tonight or tomorrow, 
I've gotta go now.

mfg zmi
-- 
// Michael Monnerie, Ing.BSc-  http://it-management.at
// Tel: 0660/4156531  .network.your.ideas.
// PGP Key:   lynx -source http://zmi.at/zmi3.asc | gpg --import
// Fingerprint: 44A3 C1EC B71E C71A B4C2  9AA6 C818 847C 55CB A4EE
// Keyserver: www.keyserver.net Key-ID: 0x55CBA4EE




Re: Bayes not working

2006-05-10 Thread Michael Monnerie
On Wednesday, 10 May 2006 16:01 Steven Stern wrote:
 I imported a corpus of about 2 messages total and it wasn't
 working. I blew it all away and started from scratch thinking that
 was the problem.  For now, local.cf has a minimum of 100 messages of
 each type. The current database exceeds that.

I've had such an issue. In ancient times I had done sudo -H -u 
spamscanner sa-learn , but that doesn't work now. I really have to 
do su -l spamscanner and then sa-learn. Maybe that's your problem.

Try to sa-learn --dump magic|grep token to see how many ham/spam there 
really are  - as that user.

mfg zmi
-- 
// Michael Monnerie, Ing.BSc-  http://it-management.at
// Tel: 0660/4156531  .network.your.ideas.
// PGP Key:   lynx -source http://zmi.at/zmi3.asc | gpg --import
// Fingerprint: 44A3 C1EC B71E C71A B4C2  9AA6 C818 847C 55CB A4EE
// Keyserver: www.keyserver.net Key-ID: 0x55CBA4EE




Re: My only problem with URIBL_BLACK

2006-05-10 Thread Matt Kettler
jdow wrote:
 From: Matt Kettler [EMAIL PROTECTED]
 Let's look at their IPs they are hosting their domain from:
 $ host uhmcargo*MUNGED*.com
snip
 
 Fascinating - even the whois registration seems to have MPD, er Multiple
 Personality Disorder. This is what I got in part:
 ===8---
 Registrant:
 Amber Furlong [EMAIL PROTECTED] +1.6785283829
 Private person
 20222 shadowood parkway
 Atlanta,GA,UNITED STATES 30339
 
 
 Domain Name:uhmcargo.net-M

Yeah, I screwed up and used .com instead of .net. When I query the .net I get the
same results as you.


Re: limit child process

2006-05-10 Thread Matt Kettler
Jean-Paul Natola wrote:

 
 spamd -m 
 
 and what would be an ideal number to set it  ?
 
 I came in this morning ,  got a bunch of those swap message , and my VM is at
 86% right now


As Chris S already said, there's no hard-and-fast rule here. However, here's a rule
of thumb to start with:

1) Use ps aux or top to find the RSS of your largest spamd instance. This will
likely be somewhere around 30M, unless you're using some really large add-on
sets. If your answer here is over 60M, see my footnotes on reducing memory use.

2) Add an extra 4M to this, to cover extra storage for data. If you're passing
-s to spamc, use 16 times the parameter (default is 250k, *16 = 4M). I'm going
to pretend my total is 34M. (Yes, I know 16* is generous, but this is a rule of
thumb here)

3) Find out how much free memory you have without spamd running. If you use
linux I'd suggest running free and look at the free column next to -/+
buffers/cache:. I'll pretend we have 512M here.

4) Divide the free memory by your answer from 2. That should give you a good
rough-estimate number to work with.
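
The four steps above can be sketched in a few lines of Python. This is just the rule of thumb written out, using the example figures from this message (30M RSS, 4M extra, 512M free); the function name suggested_max_children is made up, and the result is a starting point for spamd -m, not a guarantee.

```python
def suggested_max_children(largest_rss_mb: float, free_mb: float,
                           data_overhead_mb: float = 4.0) -> int:
    """Rough starting value for spamd -m, following the rule of thumb above."""
    # Step 2: per-child footprint is the largest RSS plus room for message data.
    per_child_mb = largest_rss_mb + data_overhead_mb
    # Step 4: divide free memory by the per-child footprint, never below 1.
    return max(1, int(free_mb // per_child_mb))


# Example figures from the text: 30M RSS + 4M overhead, 512M free.
print(suggested_max_children(30, 512))  # 512 // 34 = 15
```

Monitor swap use after applying the number and lower it if the box still swaps.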



Footnote on memory usage:

If your spamd instances are huge, review the add-on rulesets you're using. Be
wary of any add-on rule file that is over 128k in size.

In particular, do NOT use sa-blacklist unless you have tons of RAM to spare.
This ruleset is nearly 2M in .cf file format and will massively expand your SA's
memory usage.



RE: Strange Bayes results

2006-05-10 Thread Bowie Bailey
Michael Monnerie wrote:
 On Wednesday, 10 May 2006 17:08 Bowie Bailey wrote:
  Yes, this user is set with all the default options for Bayes
  learning and a spam threshold of 5.0.  The entire Bayes database
  was created via autolearn for this user.
 
 Is that possible at all? I thought that for Bayes to work you need 200 ham
 + 200 spam first.

Sure it is.  Bayes will autolearn messages right from the start.  It
just waits until it has seen 200 ham and 200 spam before it starts
contributing to the score.  There is nothing saying that you have to
manually learn the first group of messages.

On the other hand, since there is very little direct feedback from
that initial set of messages, you have to be careful that false
positives and negatives do not corrupt the database before you even
get started.

  It seems to me that Bayes is highly sensitive to the types of ham
  and spam that each user gets.  This user has a near perfect Bayes
  database created with autolearn.  No false positives or negatives
  and 95% of spam hit by BAYES_99.  My account, on the other hand,
  has a few false positives and only a 66% spam hit rate despite
  aggressive manual training.
 
 I had an off-list discussion with somebody, and we tried to compare our
 setups and results. I'll post this as a separate thread tonight or
 tomorrow, I've gotta go now.

Sounds interesting.

-- 
Bowie


RE: My only problem with URIBL_BLACK

2006-05-10 Thread Chris Santerre


On a side note, to anyone watching this seemingly incredible long discusion about one FP:


This is typically what URIBL member do. We take every FP and delist request seriously. We do deep research on each one. Much deeper then anything you have seen here in this thread. Its not the first time someone has told us about an FP that has turned out to be false. Won't be the last. 

We've had spammers request delistings, which of course sets our magic elves into a firey rage or research. This only backfires on the spammers, and not only doesn't get his spam domain delisted, but gets a lot more of them found in research listed. 

A lot of people on other spam lists have said how Soul Grinding running an RBL is. Well we can now attest to that fact. Threads like this happen in private very often. Lots of work. One can often do hours of research to add 100+ domains, only to find another member has already done it! Bastards! :) 

All of this would not be possible without some very incredible people. I can't thank the members of URIBL enough. The people who support us with mirrors. The anonymous non-members who email us privately with lots of helpful info. Hosts for the bandwidth. Jeff Chan and W.Stearns, for that very first conference call. The SA devs for putting up with us,ok, me. And of course.the magic elves. Thanks to all. 

(Might as well add, all of the above also goes for the incredible work of the SARE team!) 



--Chris
(Holy crap! Did I just post a serious message to the list? WTF is wrong with me?)


(Double holy crap! I said something nice about Jeff again! He won't believe it!)





Upgrade issues

2006-05-10 Thread Jason Staudenmayer
Hi all,

I upgraded from 2.63 to 3.1 a few weeks ago and it's running fine but I
can't seem to figure out how to get something working again.
I did RTFM but I'm still at a loss, I'm looking to get my header reports
back in. below is what I have in my local.cf

# This is the right place to customize your installation of
SpamAssassin.
# See 'perldoc Mail::SpamAssassin::Conf' for details of what can be
# tweaked.
#

###
#
#defang_mime 0

lock_method flock

always_add_report 0

#report_header  1

#always_add_headers  1

add_header all Status _YESNO_, score=_SCORE_ required=_REQD_
tests=_TESTS_ autolearn=_AUTOLEARN_ version=_VERSION_

use_terse_report 0
rewrite_subject 0
report_safe 0
required_hits   7.5

auto_whitelist_path /whitelist/auto-whitelist
auto_whitelist_file_mode 666
auto_whitelist_factor   0.5

I know there are some old things still here but this is all I get in the
headers

Processed in 3.636165 secs); 02 May 2006 09:58:55 -
X-Spam-Status: Yes, hits=8.2 required=7.5

No report on what tests it hit on.
What I would like to see is the old terse report style headers

TIA

Jason



Re: My only problem with URIBL_BLACK

2006-05-10 Thread qqqq
| On a side note, to anyone watching this seemingly incredibly long
| discussion about one FP:
|
| This is typically what URIBL members do. We take every FP and delist
| request seriously. We do deep research on each one. Much deeper than
| anything you have seen here in this thread. It's not the first time
| someone has told us about an FP that has turned out to be false.
| Won't be the last.
|
| We've had spammers request delistings, which of course sets our magic
| elves into a fiery rage of research. This only backfires on the
| spammers, and not only doesn't get his spam domain delisted, but gets
| a lot more of them found in research listed.
|
| A lot of people on other spam lists have said how Soul Grinding
| running an RBL is. Well we can now attest to that fact. Threads like
| this happen in private very often. Lots of work. One can often do
| hours of research to add 100+ domains, only to find another member
| has already done it! Bastards! :)
|
| All of this would not be possible without some very incredible
| people. I can't thank the members of URIBL enough. The people who
| support us with mirrors. The anonymous non-members who email us
| privately with lots of helpful info. Hosts for the bandwidth. Jeff
| Chan and W.Stearns, for that very first conference call. The SA devs
| for putting up with us, ok, me. And of course, the magic elves.
| Thanks to all.
|
| (Might as well add, all of the above also goes for the incredible
| work of the SARE team!)
|
| --Chris


Chris,

I brought the issue up as I had a few messages that my customers believed
were FPs.  I only posted 2 examples but there are many.  In my case, I have
1 out of 1000s who will want the mailing. I think what I got out of this
whole discussion was that I need to implement per-user whitelisting.
I will be working on that this weekend.

I support URIBL 100%.  In fact, if you check, you will see that I am a mirror 
and have made
donations for the cause in the past ;-)





RE: Upgrade issues

2006-05-10 Thread Bowie Bailey
Jason Staudenmayer wrote:
 
 I upgraded from 2.63 to 3.1 a few weeks ago and it's running fine but I
 can't seem to figure out how to get something working again.
 I did RTFM but I'm still at a loss, I'm looking to get my header
 reports back in. below is what I have in my local.cf
 
 # This is the right place to customize your installation of
 SpamAssassin.
 # See 'perldoc Mail::SpamAssassin::Conf' for details of what can be
 # tweaked.
 #
 
 ###
 #
 #defang_mime 0
 
 lock_method flock
 
 always_add_report 0
 
 #report_header  1
 
 #always_add_headers  1
 
 add_header all Status _YESNO_, score=_SCORE_ required=_REQD_
 tests=_TESTS_ autolearn=_AUTOLEARN_ version=_VERSION_
 
 use_terse_report 0
 rewrite_subject 0
 report_safe 0
 required_hits   7.5
 
 auto_whitelist_path /whitelist/auto-whitelist
 auto_whitelist_file_mode 666
 auto_whitelist_factor   0.5
 
 I know there are some old things still here but this is all I get in
 the headers
 
 Processed in 3.636165 secs); 02 May 2006 09:58:55 -
 X-Spam-Status: Yes, hits=8.2 required=7.5
 
 No report on what tests it hit on.
 What I would like to see is the old terse report style headers

First, some generic advice.

Run 'spamassassin --lint' and fix any errors that it finds.

As for the headers, it looks like it is giving you what you asked for
with your 'add_header' setting.  It looks like it is really on two
lines in your local.cf since what you got was just the first line of
the status.  Either put the entire 'add_header' definition on one
line, or just remove it to get the default headers.
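
As a rough sketch of the one-line form (assuming the directive in local.cf
really is split where the mail shows it wrapped), the snippet below writes
the joined directive to a scratch file rather than touching a real config.
The mktemp scratch file is only for illustration; the template tags are the
ones from the original post.

```shell
# Scratch file for illustration only; the real setting lives in local.cf.
conf=$(mktemp)

# The whole add_header directive on a single line, tags as in the original post.
cat > "$conf" <<'EOF'
add_header all Status _YESNO_, score=_SCORE_ required=_REQD_ tests=_TESTS_ autolearn=_AUTOLEARN_ version=_VERSION_
EOF

# Count the lines the directive occupies; it should be exactly one.
directive_lines=$(grep -c '' "$conf")
echo "add_header spans $directive_lines line(s)"
```

After making the same change in the real local.cf, rerun
'spamassassin --lint' and restart spamd if it is in use.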

-- 
Bowie


Big Idiot Needs Instructions

2006-05-10 Thread Chris Edwards

Hola,

I have spent two days trying to figure out how to get the following to
work.  I have set up Spamassassin and ClamAV, I am running sendmail on
the Solaris 10 platform.  I would like to be able to scan for all spam
and viruses (in, out and relayed email).  Can someone please point me in
the right direction?  Do I use procmail or something else?  I set this
particular combination up years ago on a Linux box but I have had a lot
of gigo since then.

Thanks for any help

---

Chris Edwards



Re: Big Idiot Needs Instructions

2006-05-10 Thread Mike Jackson

I have spent two days trying to figure out how to get the following to
work.  I have set up Spamassassin and ClamAV, I am running sendmail on
the Solaris 10 platform.  I would like to be able to scan for all spam
and virus (in, out and relayed email).  Can someone please point me in
the right direction?  Do I use procmail or something else.  I set this
particular combination up years ago on a Linux box but I have had a lot
of gigo since then.


Both SA and ClamAV can run from milters - ClamAV comes with its own, but 
you'd have to Google for the SA one (or check the FAQ; I don't know if it's 
listed in there). That would scan the messages before they come in the door, 
and you can reject accordingly at the MTA level. You can also use procmail; 
for SA you'd likely want to run spamd, then invoke spamc from procmail 
(either system-wide or on a per-user basis, your call). I use this script to 
pass messages to clamscan (or clamdscan) via procmail:


http://www.virtualblueness.net/~blueness/clamscan-procfilter/clamscan-procfilter.pl

You can bounce messages that were detected via procmail, but it's a bad 
idea. 
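
For the procmail route, a minimal sketch of a recipe that pipes each
message through spamc might look like the following. It assumes spamd is
already running and that spamc is on procmail's PATH; the scratch file
stands in for a real ~/.procmailrc or system-wide procmailrc.

```shell
# Scratch file standing in for a real procmailrc (illustration only).
rc=$(mktemp)

cat > "$rc" <<'EOF'
# f = treat the program as a filter, w = wait for it and check its exit code
:0fw
| spamc
EOF

# Show the recipe that would be added.
cat "$rc"
```

Later recipes can then sort on the X-Spam-Status header that the filter
adds.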



RE: Big Idiot Needs Instructions

2006-05-10 Thread Matthew.van.Eerde
Mike Jackson wrote:
 I have set up Spamassassin and ClamAV, I am running
 sendmail on the Solaris 10 platform.  I would like to be able to
 scan for all spam and virus (in, out and relayed email).
 
 Both SA and ClamAV can run from milters - ClamAV comes with its own,
 but you'd have to Google for the SA one

Alternately you could install MIMEDefang, which is a milter that calls ClamAV 
and SpamAssassin directly.

-- 
Matthew.van.Eerde (at) hbinc.com   805.964.4554 x902
Hispanic Business Inc./HireDiversity.com   Software Engineer


Re: Bayes not working

2006-05-10 Thread Steven Stern

Michael Monnerie wrote:
 On Wednesday, 10 May 2006 16:01 Steven Stern wrote:
  I imported a corpus of about 2 messages total and it wasn't
  working. I blew it all away and started from scratch thinking that
  was the problem.  For now, local.cf has a minimum of 100 messages of
  each type. The current database exceeds that.
 
 I've had such an issue. In ancient times I had done sudo -H -u 
 spamscanner sa-learn , but that doesn't work now. I really have to 
 do su -l spamscanner and then sa-learn. Maybe that's your problem.
 
 Try to sa-learn --dump magic|grep token to see how many ham/spam there 
 really are  - as that user.

Everything's tweaked to use root as the user. We do sitewide 
processing since this sits on an MX server.




Re: Bayes not working

2006-05-10 Thread Matt Kettler
Steven Stern wrote:
   
 Everything's tweaked to use root as the user. We do sitewide
 processing since this sits on an MX server.
 

Do you use spamd? If so, it WILL NOT use root as the user. Ever. Period.


RE: Big Idiot Needs Instructions

2006-05-10 Thread Chris Edwards
 
Thanks for all the quick replies!  I was able to get both the milters
up and running.  Now I have one new question... When I run spamd...

/usr/local/bin/spamd -d -u nobody

I get these errors...

[24001] warn: unix dgram connect: Socket operation on non-socket at
/usr/perl5/site_perl/5.8.4/Mail/SpamAssassin/Logger/Syslog.pm line 79
[24001] error: no connection to syslog available at
/usr/perl5/site_perl/5.8.4/Mail/SpamAssassin/Logger/Syslog.pm line 79

Any ideas?

Thanks Again

---

Chris Edwards


-Original Message-
From: Mike Jackson [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, May 10, 2006 12:41 PM
To: users@spamassassin.apache.org
Subject: Re: Big Idiot Needs Instructions

 I have spent two days trying to figure out how to get the following to
 work.  I have set up Spamassassin and ClamAV, I am running sendmail on
 the Solaris 10 platform.  I would like to be able to scan for all spam
 and virus (in, out and relayed email).  Can someone please point me in
 the right direction?  Do I use procmail or something else.  I set this
 particular combination up years ago on a Linux box but I have had a 
 lot of gigo since then.

Both SA and ClamAV can run from milters - ClamAV comes with its own, but
you'd have to Google for the SA one (or check the FAQ; I don't know if
it's listed in there). That would scan the messages before they come in
the door, and you can reject accordingly at the MTA level. You can also
use procmail; for SA you'd likely want to run spamd, then invoke spamc
from procmail (either system-wide or on a per-user basis, your call). I
use this script to pass messages to clamscan (or clamdscan) via
procmail:

http://www.virtualblueness.net/~blueness/clamscan-procfilter/clamscan-procfilter.pl

You can bounce messages that were detected via procmail, but it's a bad
idea. 





RE: Big Idiot Needs Instructions

2006-05-10 Thread Matthew.van.Eerde
Chris Edwards wrote:
 
 /usr/local/bin/spamd -d -u nobody
 
 I get these errors...
 
 [24001] warn: unix dgram connect: Socket operation on non-socket at
 /usr/perl5/site_perl/5.8.4/Mail/SpamAssassin/Logger/Syslog.pm line 79
 [24001] error: no connection to syslog available at
 /usr/perl5/site_perl/5.8.4/Mail/SpamAssassin/Logger/Syslog.pm line 79

Add --syslog-socket=inet to the spamd startup line
Source: 
http://lists.roaringpenguin.com/pipermail/mimedefang/2004-April/021539.html
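
Applied to the startup line from the original post, the suggestion would
look roughly like this. The sketch only prints the command so it is safe
to paste anywhere; whether inet logging then works also depends on the
local syslogd accepting network messages, which is an assumption here.

```shell
# Startup line from the original post plus the suggested option.
spamd_cmd="/usr/local/bin/spamd -d -u nobody --syslog-socket=inet"

# Printed rather than executed; run the printed line to actually start spamd.
echo "$spamd_cmd"
```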


seeing a lot of these?

2006-05-10 Thread Ken A



EMPTY_MESSAGE 1.50, MISSING_HEADERS 0.19, MISSING_SUBJECT 1.34, 
MSGID_FROM_MTA_HEADER 0.00, MSGID_FROM_MTA_ID 0.93, NO_REAL_NAME 0.55, 
TO_CC_NONE 0.13, UNCLOSED_BRACKET 2.48


?
We are seeing many, many hundreds of these very small messages, with NO 
Subject: or To: headers and NO Message Body. I assume it's a spam trojan 
run amok?


Currently, I just added a META rule to deal with them appropriately. 
Anyone else seeing these?
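
The rule itself was not posted. A hypothetical meta rule along those lines
might look like the sketch below; the rule name LOCAL_EMPTY_NOHDR, the
choice of sub-rules, and the score are all assumptions for illustration,
and the scratch file stands in for a real local.cf.

```shell
# Scratch file standing in for local.cf (illustration only).
cf=$(mktemp)

cat > "$cf" <<'EOF'
# Hypothetical rule name and score; sub-rules taken from the hit list above.
meta     LOCAL_EMPTY_NOHDR  (EMPTY_MESSAGE && MISSING_SUBJECT && TO_CC_NONE)
describe LOCAL_EMPTY_NOHDR  Empty body with no Subject: or To: header
score    LOCAL_EMPTY_NOHDR  3.0
EOF

# Show the rule that would be added.
cat "$cf"
```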


Ken A
Pacific.Net


spam getting autolearn=ham problem

2006-05-10 Thread Bazooka Joe
more and more i am seeing spam marked as autolearn=ham

I was wondering the best way to correct this? I was going to delete the
bayes and whitelist files and start over but I thought I would see what
you do when this happens.

my setup

using fc4
sendmail
spamass-milter - one bayes file for all users on server
spamassassin

-chris


RE: seeing a lot of these?

2006-05-10 Thread Bowie Bailey
Ken A wrote:
 Bowie Bailey wrote:
  Ken A wrote:
   EMPTY_MESSAGE 1.50, MISSING_HEADERS 0.19, MISSING_SUBJECT 1.34,
   MSGID_FROM_MTA_HEADER 0.00, MSGID_FROM_MTA_ID 0.93, NO_REAL_NAME
   0.55, TO_CC_NONE 0.13, UNCLOSED_BRACKET 2.48
   
   ?
   We are seeing many, many hundreds of these very small messages,
   with NO Subject: or To: headers and NO Message Body. I assume
   it's a spam trojan run amok? 
   
   Currently, I just added a META rule to deal with them
   appropriately. Anyone else seeing these?
  
  I see these from time to time.  I haven't tried to catch them with
  SA, but since they tend to cause problems with Outlook's pop3
  routines, I have a cronjob that sweeps the maildirs and removes
  them every 15 minutes. 
  
 
 I've seen issues with OE and these messages too, which is why I setup
 the Meta rule. Do you know if this is a documented issue with OE?

I don't know if this issue is documented or not.  I'm not sure MS
would consider it a bug anyway since it is caused by a severely
malformed email.

I think the problem is due to there not being a blank line in the
email to mark the end of the headers.

-- 
Bowie


Problem with spamassassin skipping messages and sa-learn coredumps on sync

2006-05-10 Thread marijane white

Greetings all,

I am having a problem where spamassassin is not running on all my messages 
and a bunch of spam is slipping into my inbox as a result.  I would say 
this has been going on for about a month, but it took me a few weeks to 
notice the lack of spamassassin headers in the skipped mail.  I am not 
having any luck figuring out why this happening or how to fix it; I hope 
someone on this list can help.


I'm running version 3.1.0 (Perl 5.8.7) and the platform it's running on is 
NetBSD 3.0.  I invoke it via procmail, with the recipe suggested at the 
spamassassin wiki.



I turned on procmail logging this morning and I am seeing a number of 
potentially troubling messages.


For example, I've got a couple of these failure messages:

procmail: Match on  256000
procmail: Locking spamassassin.lock
procmail: Executing spamassassin
procmail: [7638] Wed May 10 10:29:40 2006
procmail: Program failure (-11) of spamassassin
procmail: Rescue of unfiltered data succeeded
procmail: [7638] Wed May 10 10:29:40 2006
procmail: Unlocking spamassassin.lock


And I see several of these:

procmail: [2585] Wed May 10 10:59:53 2006
procmail: Locking spamassassin.lock
[8699] warn: bayes: unknown packing format for bayes db, please re-learn: 
138 at
 /usr/pkg/lib/perl5/vendor_perl/5.8.0/Mail/SpamAssassin/BayesStore/DBM.pm 
line 1874.

procmail: [2585] Wed May 10 11:00:01 2006
procmail: Locking spamassassin.lock
[8699] warn: bayes: unknown packing format for bayes db, please re-learn: 
68 at
/usr/pkg/lib/perl5/vendor_perl/5.8.0/Mail/SpamAssassin/BayesStore/DBM.pm 
line 1874.
[8699] warn: bayes: expire_old_tokens: panic: sv_setpvn called with 
negative strlen at 
/usr/pkg/lib/perl5/vendor_perl/5.8.0/Mail/SpamAssassin/BayesStore/DBM.pm
line 624.


I notice the warnings mention re-learning. Unfortunately I am also having 
problems with sa-learn --sync dumping core on most of my attempts to train 
the classifier.



Does anyone have any idea what causes these errors and how I might fix 
them?



Thanks,
marijane


Re: spam getting autolearn=ham problem

2006-05-10 Thread Matt Kettler
Bazooka Joe wrote:
 more and more i am seeing spam marked as autolearn=ham
 
 I was wondering the best way to correct this? 

Depends.. Really you first need to figure out why this happened before you
take any action at all.

Can you post a X-Spam-Status header for one of the messages?

Have you modified the required_score, or any of the learning thresholds in your
config?

In general there are only a few rules that can cause a message to be tagged as
spam, but do not count toward the computation of score for learning purposes.
*_IN_BLACKLIST, AWL, BAYES_*, and GTUBE are the most noteworthy ones.


RE: spam getting autolearn=ham problem

2006-05-10 Thread Bowie Bailey
Bazooka Joe wrote:
 more and more i am seeing spam marked as autolearn=ham

This means that the spam is being given a very low score from SA.  The
score used here does not include the Bayes scoring, but if you learn
very many like this, then the effectiveness of your Bayes database
will drop.

 I was wondering the best way to correct this? I was going to delete
 the bayes and whitelist files and start over but I thought I would
 see what you do when this happens.  

It might be useful to start over with these files, but you need to fix
the underlying problem first.  Why are these messages scoring so low?
What kind of messages are they?  If you don't fix the scoring problem,
your new files will just inherit the same problems as the old ones.

 my setup
 
 using fc4
 sendmail
 spamass-milter - one bayes file for all users on server
 spamassassin

-- 
Bowie


Re: spam getting autolearn=ham problem

2006-05-10 Thread Bazooka Joe

X-Spam-Status: No, score=1.0 required=3.0 tests=BAYES_60 autolearn=ham 
	version=3.0.4
X-Spam-Level: *
X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on agwebinc.com

I have a required score of 3, as you can see, and I have the milter
rejecting email with a score above 7.

On 5/10/06, Matt Kettler [EMAIL PROTECTED] wrote:
 Bazooka Joe wrote:
  more and more i am seeing spam marked as autolearn=ham
  
  I was wondering the best way to correct this?
 
 Depends.. Really you first need to figure out why this happened before you
 take any action at all.
 
 Can you post a X-Spam-Status header for one of the messages?
 
 Have you modified the required_score, or any of the learning thresholds in your
 config?
 
 In general there are only a few rules that can cause a message to be tagged as
 spam, but do not count toward the computation of score for learning purposes.
 *_IN_BLACKLIST, AWL, BAYES_*, and GTUBE are the most noteworthy ones.


Re: spam getting autolearn=ham problem

2006-05-10 Thread Matt Kettler
Bazooka Joe wrote:
 
 X-Spam-Status: No, score=1.0 required=3.0 tests=BAYES_60 autolearn=ham
  version=3.0.4
 X-Spam-Level: *
 X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on
 agwebinc.com

As far as the autolearner is concerned, the score of that message is 0.
(BAYES_60 is the only rule matched, and the autolearner doesn't consider BAYES
rule scores to prevent self-feedback in the bayes learning).

0 is less than the default ham learning threshold of 0.1, and the existing
training only scores 60 (not strongly known as spam), so it autolearns it as 
ham.
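
The arithmetic can be sketched as follows. This is a simplification: it
just subtracts the Bayes contribution from the reported total and compares
the result with the ham threshold, while the real autolearner applies
further conditions that are not modelled here. The variable names are
made up for the example.

```shell
total=1.0          # score= value from the X-Spam-Status header
bayes=1.0          # BAYES_60 was the only test hit, so it supplied all of it
ham_threshold=0.1  # default ham learning threshold quoted above

result=$(awk -v t="$total" -v b="$bayes" -v h="$ham_threshold" 'BEGIN {
    learn = t - b                          # score as the autolearner sees it
    verdict = (learn < h) ? "ham" : "no"   # "no" = not learned as ham in this sketch
    printf "autolearn score %.1f -> autolearn=%s", learn, verdict
}')
echo "$result"
```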

I would approach this from two angles.

1) why did the spam message fail to match any rules other than bayes? Your SA
version is a little old, you might consider testing it against 3.1.1. You might
also consider some rulesemporium.com add-on rulesets to help detect the
particular spam message.

2) Why did it only rank as BAYES_60? Have you done any manual training?


Re: spam getting autolearn=ham problem

2006-05-10 Thread Jay Lee




Bazooka Joe wrote:
 X-Spam-Status: No, score=1.0 required=3.0 tests=BAYES_60 autolearn=ham
  version=3.0.4
 X-Spam-Level: *
 X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on agwebinc.com
 
 I have required of 3 which you can see and i have the milter rejecting
 email w/ score more than 7
 
 On 5/10/06, Matt Kettler [EMAIL PROTECTED] wrote:
  Bazooka Joe wrote:
   more and more i am seeing spam marked as autolearn=ham
   
   I was wondering the best way to correct this?
  
  Depends.. Really you first need to figure out why this happened
  before you take any action at all.
  
  Can you post a X-Spam-Status header for one of the messages?
  
  Have you modified the required_score, or any of the learning
  thresholds in your config?
  
  In general there are only a few rules that can cause a message to be
  tagged as spam, but do not count toward the computation of score for
  learning purposes. *_IN_BLACKLIST, AWL, BAYES_*, and GTUBE are the
  most noteworthy ones.

You can set bayes_auto_learn_threshold_nonspam in local.cf to be 0 or a
negative number, then autolearn=ham won't kick in unless it's below a
certain score (not sure if this counts bayes or not). But yes, the
real question is why are no rules triggering... Is DNS working? Are
you using the blacklist rules, etc? What does the spam look like?
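
In local.cf that could look like the sketch below. The value -0.5 is an
arbitrary illustration, not a recommendation, and the scratch file stands
in for the real local.cf.

```shell
# Scratch file standing in for local.cf (illustration only).
cf=$(mktemp)

cat > "$cf" <<'EOF'
# Only autolearn as ham when the autolearn score is below this value.
# -0.5 is an arbitrary example; the default is 0.1.
bayes_auto_learn_threshold_nonspam -0.5
EOF

# Show the setting that would be added.
cat "$cf"
```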

Jay




OT: anyone know how to do server-side MS-Exchange filters?

2006-05-10 Thread Jason Haar
We currently have to rely on users to create Rules Wizard settings
under Outlook to filter off their Spam (via X-Spam-Status: headers).

What would be better is if Exchange could do like procmail/maildrop and
allow the SysAdmin to create the rule on the server - so it hits everyone.

It should create a Spam folder per mailbox and deliver high-scoring
spam to that instead of the INBOX

Has anyone done this, and if so, what sort of tools allow it?

Thanks!

(2006, and still waiting for Microsoft to do what was done 15 years ago
on older systems...)

-- 
Cheers

Jason Haar
Information Security Manager, Trimble Navigation Ltd.
Phone: +64 3 9635 377 Fax: +64 3 9635 417
PGP Fingerprint: 7A2E 0407 C9A6 CAF6 2B9F 8422 C063 5EBB FE1D 66D1



Re: spam getting autolearn=ham problem

2006-05-10 Thread Bazooka Joe
the spam

 Hi Robar
 It is sad but it is true that the large groups of women are unhappy with
 the size of there BF is thing. Don't be that guy,
 www.missusoandforever.org/ab1/
 . and station, designed been grabbed theorized to artist
 Thank you

i run rules

TRUSTED_RULESETS=SARE_STOCKS TRIPWIRE SARE_EVILNUMBERS0 SARE_EVILNUMBERS1 BOGUSVIRUS SARE_ADULT SARE_FRAUD SARE_BML SARE_SPOOF SARE_BAYES_POISON_NXM SARE_OEM SARE_RANDOM SARE_HEADER SARE_HTML SARE_SPECIFIC SARE_OBFU SARE_REDIRECT SARE_GENLSUBJ SARE_UNSUB SARE_WHITELIST;

on my account I get about 10 spams a day scoring below a 3 out of 50
spams total (thats a guess)

I will try moving the ham threshold down.
and no I haven't done any bayes training. and dns is working.

some stats for my box for one week
I block using sbl-xbl.spamhaus.org, or spamassass catches, or clamav
rejects about 45,000 emails. ham email w/ a score of 3 or less is about
9,000

On 5/10/06, Jay Lee [EMAIL PROTECTED] wrote:



  


 Bazooka Joe wrote:
  X-Spam-Status: No, score=1.0 required=3.0 tests=BAYES_60 autolearn=ham
   version=3.0.4
  X-Spam-Level: *
  X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on agwebinc.com
  
  I have required of 3 which you can see and i have the milter rejecting
  email w/ score more than 7
  
  On 5/10/06, Matt Kettler [EMAIL PROTECTED] wrote:
   Bazooka Joe wrote:
    more and more i am seeing spam marked as autolearn=ham
    
    I was wondering the best way to correct this?
   
   Depends.. Really you first need to figure out why this happened
   before you take any action at all.
   
   Can you post a X-Spam-Status header for one of the messages?
   
   Have you modified the required_score, or any of the learning
   thresholds in your config?
   
   In general there are only a few rules that can cause a message to be
   tagged as spam, but do not count toward the computation of score for
   learning purposes. *_IN_BLACKLIST, AWL, BAYES_*, and GTUBE are the
   most noteworthy ones.
 
 You can set bayes_auto_learn_threshold_nonspam in local.cf to be 0 or a
 negative number, then autolearn=ham won't kick in unless it's below a
 certain score (not sure if this counts bayes or not). But yes, the
 real question is why are no rules triggering... Is DNS working? Are
 you using the blacklist rules, etc? What does the spam look like?
 
 Jay






Re: OT: anyone know how to do server-side MS-Exchange filters?

2006-05-10 Thread John D. Hardin
On Thu, 11 May 2006, Jason Haar wrote:

 Has anyone done this, and if so, what sort of tools allow it?

A Linux mail relay in front of the Exchange server. :)

--
 John Hardin KA7OHZ    ICQ#15735746    http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]    FALaholic #11174    pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 - 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  The problem is when people look at Yahoo, slashdot, or groklaw and
  jump from obvious and correct observations like "Oh my God, this
  place is teeming with utter morons" to incorrect conclusions like
  "there's nothing of value here".        -- Al Petrofsky, in Y! SCOX
---



The New SpamAssassin sa-update

2006-05-10 Thread David Baron
1. Depends on some Perl packages: libarchive-tar-perl, libio-zlib-perl. Debian 
packages did not enforce this dependency. I installed them manually.

2. Fails after several time outs with:
http: request failed: 500 read timeout: 500 read timeout
error: no mirror data available for channel updates.spamassassin.org
channel: MIRRORED.BY contents were missing, channel failed

Either the rules update site is not ready or this Perl script needs some 
configuration. Either way, not ready to play.


Re: {SPAM}{!} Re: spam getting autolearn=ham problem

2006-05-10 Thread Matt Kettler
Bazooka Joe wrote:
 the spam
snip
 and no I haven't done any bayes training.  and dns is working.



Are you running with SpamAssassin's built-in support for RBLs and URIBLs?

That message text got *TORN UP* by the URIBLs on my system:

X-EVI-MailScanner-SpamCheck: spam, SpamAssassin (score=15.259, required 5,
HTML_40_50 0.50, HTML_MESSAGE 0.00, INFO_GREYLIST_NOTDELAYED -0.00,
LOCAL_FORGED_REFERENCES 0.10, RAZOR2_CF_RANGE_51_100 0.50,
RAZOR2_CF_RANGE_E8_51_100 1.50, RAZOR2_CHECK 0.50, SPF_PASS -0.00,
SURBL_MULTI1 -0.50, SURBL_MULTI2 -0.20, URIBL_BLACK 1.50,
URIBL_BLACK_OVERLAP -1.00, URIBL_JP_SURBL 4.09, URIBL_SBL 1.64,
URIBL_SC_SURBL 4.50, URIBL_WS_SURBL 2.14)

Check your init.pre and see if the uribl plugin is loaded, also check to make
sure you have Net::DNS installed.
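
Both checks can be sketched from a shell. The init.pre path below is an
assumption (it varies by install; override it with the INIT_PRE
environment variable, which is made up for this example); the plugin name
is the URIDNSBL plugin that ships with SpamAssassin.

```shell
# Assumed location of init.pre; adjust for the local install.
init_pre=${INIT_PRE:-/etc/mail/spamassassin/init.pre}

# Look for an uncommented loadplugin line for the URIDNSBL plugin.
if grep -Eqs '^[[:space:]]*loadplugin[[:space:]]+Mail::SpamAssassin::Plugin::URIDNSBL' "$init_pre"; then
    uribl_status="loaded"
else
    uribl_status="not loaded (or $init_pre not found)"
fi

# Ask perl to load Net::DNS; a non-zero exit means it is missing.
if perl -MNet::DNS -e 1 2>/dev/null; then
    netdns_status="installed"
else
    netdns_status="missing"
fi

echo "URIDNSBL plugin: $uribl_status"
echo "Net::DNS: $netdns_status"
```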


Re: The New SpamAssassin sa-update

2006-05-10 Thread Matt Kettler
David Baron wrote:
 1. Depends on some PERL packages: libarchive-tar-perl, libio-zlib-perl. 
 Debian 
 packages did not enforce this dependency. I installed them manually.
 
 2. Fails after several time outs with:
 http: request failed: 500 read timeout: 500 read timeout
 error: no mirror data available for channel updates.spamassassin.org
 channel: MIRRORED.BY contents were missing, channel failed
 
 EIther the rules update site is not ready or this PERL script needs some 
 configuration. EIther way, not ready to play.

Is the debian package SA 3.1.0 or 3.1.1?

If 3.1.0, known issue. sa-update was fixed in 3.1.1.



Re: spam getting autolearn=ham problem

2006-05-10 Thread Jay Lee

The message you sent directly to me hit the following:

*  0.5 HTML_40_50 BODY: Message is 40% to 50% HTML
*  0.1 HTML_MESSAGE BODY: HTML included in message
*  1.5 RAZOR2_CF_RANGE_E8_51_100 Razor2 gives engine 8 confidence level
*  above 50%
*  [cf: 100]
*  0.5 RAZOR2_CHECK Listed in Razor2 (http://razor.sf.net/)
*  3.5 RAZOR2_CF_RANGE_51_100 Razor2 gives confidence level above 50%
*  [cf: 100]
*   10 URIBL_SBL Contains an URL listed in the SBL blocklist
*  [URIs: missusoandforever.org]
*  4.5 URIBL_JP_SURBL Contains an URL listed in the JP SURBL blocklist
*  [URIs: missusoandforever.org]


Of course, the scores are heavily inflated by my own personal rules (I 
don't recommend doing this unless you know what you're doing) but the 
point is, your SA doesn't seem to be firing on certain things it should, 
do you have the DNS BL's working?  Are you using Razor or DCC?  Are you 
on the latest 3.1.1?


Jay


Re: Latest sa-stats from last week

2006-05-10 Thread Michael Monnerie
On Wednesday, 10 May 2006 17:27 Bowie Bailey wrote:
 So you are saying that I should not feed Bayes with the unsolicited
 marketing garbage that I get because it looks like something that
 could have been requested?

If it's a newsletter from a seemingly legit company I don't feed it to 
bayes. I try to unsubscribe from them. If they still send me, I write 
some rule to filter them. If some customer then rants, I tell them that 
said company doesn't work nicely - and he should make a filter to get 
e-mail from that company out of the SPAM folder again.

  Remember: 10 good SPAM and HAM are better than 200 where 5% are
  wrong.
 Wrong for who?  If it looks like marketing, 99% of the time, I don't
 want it.  And for most of the accounts that I deal with, this goes up
 to 100%.  Not true for my customers, tho.

Yes, some manual filters can catch those. If it's stupid SPAM, then 
bayes.

 My philosophy with Bayes has always been to skip the ham/spam
 definitions and go with a wanted/unwanted model.  This way Bayes
 learns to filter out the emails you don't want even if some of them
 may technically be ham.  (Obviously, I would not be able to do this
 on a site-wide installation)

But as you said your bayes is not quite accurate, so it seems not to 
work really. Wouldn't it be better to have a highly accurate bayes, and 
setup some filters for you personally? If a BAYES_99 would be always 
SPAM for you, you could give it 4.5 or 5 points, and probably filter 
more SPAM than now?

 But then again, I think less than half of my users are even taking
 advantage of the spam markup.  Since I don't do any blocking or
 sorting on the server, it is up to them to use MUA rules to sort or
 delete the spam once my server has marked it.

I do the same, just wrote a nice document for Outlook 2003 describing 
how to filter SPAM.

mfg zmi
-- 
// Michael Monnerie, Ing.BSc-  http://it-management.at
// Tel: 0660/4156531  .network.your.ideas.
// PGP Key:   lynx -source http://zmi.at/zmi3.asc | gpg --import
// Fingerprint: 44A3 C1EC B71E C71A B4C2  9AA6 C818 847C 55CB A4EE
// Keyserver: www.keyserver.net Key-ID: 0x55CBA4EE



Re: OT: anyone know how to do server-side MS-Exchange filters?

2006-05-10 Thread Jay Lee




John D. Hardin wrote:
 On Thu, 11 May 2006, Jason Haar wrote:
  Has anyone done this, and if so, what sort of tools allow it?
 
 A Linux mail relay in front of the Exchange server. :)

That wouldn't allow messages to be put in a subfolder instead of inbox,
just to do the header tagging. Not having used Exchange I can't answer
intelligently on whether or not it supports server side sorting.
However, if it doesn't you could use something like Maia Mailguard and
a Postfix frontend to the exchange server to quarantine and report the
spam, users would be able to configure and safely view and "free"
tagged spam messages via a web interface. It also can send regular
reports to the users on what spam they've gotten, senders and subject,
etc. Website is:

http://www.renaissoft.com/maia/




RE: Latest sa-stats from last week

2006-05-10 Thread Bowie Bailey
Michael Monnerie wrote:
 On Wednesday, 10 May 2006 17:27 Bowie Bailey wrote:
  So you are saying that I should not feed Bayes with the unsolicited
  marketing garbage that I get because it looks like something that
  could have been requested?
 
 If it's a newsletter from a seemingly legit company I don't feed it to
 bayes. I try to unsubscribe from them. If they still send me, I write
 some rule to filter them. If some customer then rants, I tell them
 that said company doesn't work nicely - and he should make a filter
 to get e-mail from that company out of the SPAM folder again.

If it comes to an account that does not subscribe to newsletters
(webmaster, sales, etc), it is spam by definition and is fed to Bayes.

   Remember: 10 good SPAM and HAM are better than 200 where 5% are
   wrong.
  Wrong for who?  If it looks like marketing, 99% of the time, I don't
  want it.  And for most of the accounts that I deal with, this goes
  up to 100%.  Not true for my customers, tho.
 
 Yes, some manual filters can catch those. If it's stupid SPAM, then
 bayes.
 
  My philosophy with Bayes has always been to skip the ham/spam
  definitions and go with a wanted/unwanted model.  This way Bayes
  learns to filter out the emails you don't want even if some of them
  may technically be ham.  (Obviously, I would not be able to do this
  on a site-wide installation)
 
 But as you said, your Bayes is not very accurate, so it doesn't really
 seem to work. Wouldn't it be better to have a highly accurate Bayes
 and set up some filters for you personally? If BAYES_99 were always
 SPAM for you, you could give it 4.5 or 5 points, and probably filter
 more SPAM than now.

If I look at my personal database, the spam percentage shown in the
stats is lower than I'd like, but I wouldn't say it's not accurate.  I
very rarely see a true false positive or negative with Bayes and I
watch my account closely.  I do see a few ham with BAYES_99 and spam
with BAYES_00, but that's usually simply because those were either
spam that only hit BAYES_99 or ham (usually from this list) that
tripped a few extra rules.

  But then again, I think less than half of my users are even taking
  advantage of the spam markup.  Since I don't do any blocking or
  sorting on the server, it is up to them to use MUA rules to sort or
  delete the spam once my server has marked it.
 
 I do the same, just wrote a nice document for Outlook 2003 describing
 how to filter SPAM.

I've done the same for both Outlook Express and Thunderbird.  The
Thunderbird setup is a single checkbox. :)

-- 
Bowie


ALL_TRUSTED causing false negatives?

2006-05-10 Thread Philip Mak
I've been getting a lot of spam lately ever since I moved my mail
server to a new system. Here's one of the false negatives that slipped
through, for example:

X-Spam-Status: No, score=-2.1 required=5.0 tests=ALL_TRUSTED,BAYES_50,
    NO_REAL_NAME,RCVD_BY_IP,YOUR_INCOME autolearn=ham version=3.0.3
X-Spam-Summary:
     0.0 NO_REAL_NAME  From: does not include a real name
     0.1 RCVD_BY_IP    Received by mail server with no name
    -3.3 ALL_TRUSTED   Did not pass through any untrusted hosts
     1.1 YOUR_INCOME   BODY: Doing something with my income
     0.0 BAYES_50      BODY: Bayesian spam probability is 40 to 60%
                       [score: 0.5000]

Why does ALL_TRUSTED have a score of -3.3? Doesn't this mean that any
spammer who connects directly to my mail server has a good chance of
getting past SpamAssassin?
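
For what it's worth, the arithmetic behind that header can be checked with a few lines of Python. This is only a sketch: the rule scores are copied from the X-Spam-Summary above, and the helper name `total` is made up for the illustration.

```python
# Rule scores copied from the X-Spam-Summary header quoted above.
rule_scores = {
    "NO_REAL_NAME": 0.0,
    "RCVD_BY_IP": 0.1,
    "ALL_TRUSTED": -3.3,
    "YOUR_INCOME": 1.1,
    "BAYES_50": 0.0,
}
REQUIRED = 5.0  # "required=5.0" from X-Spam-Status


def total(scores, skip=()):
    """Sum rule scores, leaving out any rule names listed in `skip`."""
    return round(sum(v for k, v in scores.items() if k not in skip), 1)


with_all_trusted = total(rule_scores)
without_all_trusted = total(rule_scores, skip=("ALL_TRUSTED",))

print(with_all_trusted)                 # -2.1, matching score=-2.1
print(without_all_trusted)              # 1.2
print(without_all_trusted >= REQUIRED)  # False
```

So ALL_TRUSTED accounts for the whole negative score here. Without it this particular message would score 1.2, which is still below the required 5.0.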

I did not define any trusted/internal networks when I installed
SpamAssassin.

SpamAssassin version 3.0.3
  running on Perl version 5.8.4

Linux naga.aaanime.net 2.6.8-11-amd64-k8 #1 Sun Oct 2 21:26:54 UTC 2005 x86_64 
GNU/Linux

Running Debian Sarge


Re: ALL_TRUSTED causing false negatives?

2006-05-10 Thread Matt Kettler
Philip Mak wrote:
 I've been getting a lot of spam lately ever since I moved my mail
 server to a new system. Here's one of the false negatives that slipped
 through, for example:
 
 X-Spam-Status: No, score=-2.1 required=5.0 tests=ALL_TRUSTED,BAYES_50,
     NO_REAL_NAME,RCVD_BY_IP,YOUR_INCOME autolearn=ham version=3.0.3
 X-Spam-Summary:
      0.0 NO_REAL_NAME  From: does not include a real name
      0.1 RCVD_BY_IP    Received by mail server with no name
     -3.3 ALL_TRUSTED   Did not pass through any untrusted hosts
      1.1 YOUR_INCOME   BODY: Doing something with my income
      0.0 BAYES_50      BODY: Bayesian spam probability is 40 to 60%
                        [score: 0.5000]
 
 Why does ALL_TRUSTED have a score of -3.3? Doesn't this mean that any
 spammer who connects directly to my mail server has a good chance of
 getting past SpamAssassin?

That should not happen on a properly working SA setup. Odds are very good you've
got a NATed mailserver, which causes the trust path guesser to fail. You'll have
to declare trusted_networks manually to fix it.

http://wiki.apache.org/spamassassin/TrustPath
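
To make the idea concrete, here is a much-simplified Python model of an "all relays trusted" check. It is not SpamAssassin's actual code or its guessing logic. The function name `all_trusted`, the relay list, and the addresses (a private LAN range and a documentation-only range) are all assumptions made up for the illustration.

```python
import ipaddress


def all_trusted(relay_ips, trusted_networks):
    """Return True if every relay IP falls inside one of the trusted networks."""
    nets = [ipaddress.ip_network(n) for n in trusted_networks]
    return all(
        any(ipaddress.ip_address(ip) in net for net in nets)
        for ip in relay_ips
    )


# Hypothetical relay chain, newest hop first: the NATed mail server itself,
# then the outside host that connected to it.
relays = ["192.168.1.10", "203.0.113.45"]

# Only the local network is declared trusted, so the outside host is not.
declared = all_trusted(relays, ["192.168.1.0/24"])

# If the outside host is wrongly treated as trusted, the check passes.
misguessed = all_trusted(relays, ["192.168.1.0/24", "203.0.113.0/24"])

print(declared)    # False
print(misguessed)  # True
```

In this model, declaring only your own network (for the made-up addresses above, a local.cf line like `trusted_networks 192.168.1.0/24`) keeps an outside sender from being counted as trusted.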


Re: ALL_TRUSTED causing false negatives?

2006-05-10 Thread jdow

From: Philip Mak [EMAIL PROTECTED]


I've been getting a lot of spam lately ever since I moved my mail
server to a new system. Here's one of the false negatives that slipped
through, for example:

X-Spam-Status: No, score=-2.1 required=5.0 tests=ALL_TRUSTED,BAYES_50,
    NO_REAL_NAME,RCVD_BY_IP,YOUR_INCOME autolearn=ham version=3.0.3
X-Spam-Summary:
     0.0 NO_REAL_NAME  From: does not include a real name
     0.1 RCVD_BY_IP    Received by mail server with no name
    -3.3 ALL_TRUSTED   Did not pass through any untrusted hosts
     1.1 YOUR_INCOME   BODY: Doing something with my income
     0.0 BAYES_50      BODY: Bayesian spam probability is 40 to 60%
                       [score: 0.5000]

Why does ALL_TRUSTED have a score of -3.3? Doesn't this mean that any
spammer who connects directly to my mail server has a good chance of
getting past SpamAssassin?

I did not define any trusted/internal networks when I installed
SpamAssassin.

SpamAssassin version 3.0.3
 running on Perl version 5.8.4

Linux naga.aaanime.net 2.6.8-11-amd64-k8 #1 Sun Oct 2 21:26:54 UTC 2005 x86_64 
GNU/Linux

Running Debian Sarge


There is a strong indication that your trusted networks are defined
incorrectly. I suggest visiting the wiki and looking up ALL_TRUSTED.

{^_^} 



Re: Big Idiot Needs Instructions

2006-05-10 Thread jdow

From: Chris Edwards [EMAIL PROTECTED]

Hola,

I have spent two days trying to figure out how to get the following to
work.  I have set up SpamAssassin and ClamAV, and I am running sendmail
on the Solaris 10 platform.  I would like to be able to scan all email
(in, out and relayed) for spam and viruses.  Can someone please point me
in the right direction?  Do I use procmail or something else?  I set
this particular combination up years ago on a Linux box but I have had
a lot of gigo since then.

Thanks for any help

jdow: I use procmail with great success. I also use the SpamAssassin
ClamAV plugin. (See plugins on the wiki.)

{^_^}


Re: Latest sa-stats from last week

2006-05-10 Thread jdow

From: Bowie Bailey [EMAIL PROTECTED]


jdow wrote:

From: Bowie Bailey [EMAIL PROTECTED]

 Michael Monnerie wrote:
  On Dienstag, 9. Mai 2006 16:18 Bowie Bailey wrote:
   I've got per-user Bayes and most of my users
   don't bother to train it.
  
  Another reason for site-wide bayes, I'd say.
 
 I've considered that, but it won't work in our setup.  This box
 scans our internal email as well as all of our customers' email.
 Since we are in an entirely different line of business from our
 customers, what we consider to be ham and spam will be quite
 different from theirs. If I could train it on both sets, it might
 work, but I don't have access to any of their emails for training.
 
 Also, I really prefer a per-user bayes for our internal email
 since there are various accounts that get a specific type of ham
 and work very well with Bayes.

Importune them to feed you as large a collection of ham and spam
as they can, once. Then turn on autolearn, cross your fingers, and
put on your flak jacket.


What flak jacket?  I have Bayes turned on now and I never did any
manual training on most of the accounts.  I just turned it on and let
autolearn (with the default settings) do its thing.  So far, I have
received very few complaints.

But then again, I think less than half of my users are even taking
advantage of the spam markup.  Since I don't do any blocking or
sorting on the server, it is up to them to use MUA rules to sort or
delete the spam once my server has marked it.


Fairly frequently I see evidence that autolearn can massively misfire
on SpamAssassin startup. It does not always happen or there'd be a lot
more messages about it. But there is apparently a vulnerable period
that can go bad with just the wrong selection of messages. Once the
database is large, inertia will save the day.

{^_^}


Re: OT: anyone know how to do server-side MS-Exchange filters?

2006-05-10 Thread Tim Litwiller
When you create filters in Outlook connected to Exchange, you can choose
whether the filter runs on the server or on the client.

Disclaimer: I don't know what versions of Outlook or Exchange are
required to make it work. At my previous job, where we had Exchange, I
had Outlook 2000, and I don't know what version of Exchange.


The main problem was that sometimes it would mess up and try to run a
client rule on the server and fail.


Jay Lee wrote:

John D. Hardin wrote:

On Thu, 11 May 2006, Jason Haar wrote:

Has anyone done this, and if so, what sort of tools allow it?

A Linux mail relay in front of the Exchange server. :)
  
That wouldn't allow messages to be put in a subfolder instead of the
inbox; it would only do the header tagging.  Not having used Exchange,
I can't answer intelligently on whether or not it supports server-side
sorting.  However, if it doesn't, you could use something like Maia
Mailguard and a Postfix frontend to the Exchange server to quarantine
and report the spam. Users would be able to configure it and to safely
view and "free" tagged spam messages via a web interface.  It can also
send regular reports to the users on what spam they've gotten (senders,
subjects, etc.).  The website is:


http://www.renaissoft.com/maia/




RE: Latest sa-stats from last week

2006-05-10 Thread Bowie Bailey
jdow wrote:
 From: Bowie Bailey [EMAIL PROTECTED]
  jdow wrote:
   
   Importune them to feed you as large a collection of ham and
   spam as they can, once. Then turn on autolearn, cross your
   fingers, and put on your flak jacket.
  
  What flak jacket?  I have Bayes turned on now and I never did any
  manual training on most of the accounts.  I just turned it on and
  let autolearn (with the default settings) do its thing.  So far, I
  have received very few complaints.
  
  But then again, I think less than half of my users are even taking
  advantage of the spam markup.  Since I don't do any blocking or
  sorting on the server, it is up to them to use MUA rules to sort or
  delete the spam once my server has marked it.
 
 Fairly frequently I see evidence that autolearn can massively misfire
 on SpamAssassin startup. It does not always happen or there'd be a lot
 more messages about it. But there is apparently a vulnerable period
 that can go bad with just the wrong selection of messages. Once the
 database is large, inertia will save the day.

Right.  I understand the danger of doing things this way.  I was just
pointing out that my users don't generally complain about spam.  I
assume SpamAssassin is doing well for them, but since they never tell
me anything, I really have no idea.

-- 
Bowie


Bayes advanced questions

2006-05-10 Thread Michael Monnerie
Dear SA users, I've had an offlist comparison of Bayes DBs, and we found
some interesting differences. We're trying to find out why Bayes on
server #1 produces better scores:

Server #1 local.cf (SA 3.1.1):
bayes_expiry_max_db_size            200
bayes_auto_expire                   0
bayes_file_mode                     0777
bayes_auto_learn_threshold_spam     8.00
bayes_auto_learn_threshold_nonspam  1.0

Server #1 bayes files:
-rw-rw-rw-+  1 vscan  vscan  19738624 May 10 10:04 bayes_db_seen
-rw-rw-rw-+  1 vscan  vscan  41697280 May 10 10:04 bayes_db_toks

Server #1 bayes dump:
0.000  0    93053  0  non-token data: nspam
0.000  0    53428  0  non-token data: nham
0.000  0  1261864  0  non-token data: ntokens

Server #2 local.cf:
bayes_auto_learn        1
bayes_learn_to_journal  1
bayes_auto_expire       1
ok_languages            de en es
ok_locales              en

Server #2 bayes files:
  21M 2006-05-10 10:20 bayes_seen
 5,3M 2006-05-10 10:20 bayes_toks

Server #2 bayes dump:
0.000  0 155791  0  non-token data: nspam
0.000  0  80523  0  non-token data: nham
0.000  0 129852  0  non-token data: ntokens

From the numbers I would say that server #2 has learned more spam+ham,
but holds only about 1/10th as many tokens. That server is also far less
accurate with Bayes than server #1. Could ntokens be the reason?
With the new spam of the last few weeks, which tries to poison Bayes,
could the poisoning be effective against the default of 150,000 tokens?
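
The comparison can be put into numbers with a short Python sketch. The figures are copied from the two bayes dumps above. The dictionary layout and the helper name `tokens_per_message` are made up for the illustration, and "tokens kept per learned message" is just one rough way to look at the gap, not an official SpamAssassin metric.

```python
# Figures copied from the two bayes dumps above.
servers = {
    "server1": {"nspam": 93053, "nham": 53428, "ntokens": 1261864},
    "server2": {"nspam": 155791, "nham": 80523, "ntokens": 129852},
}


def tokens_per_message(stats):
    """Tokens currently stored, divided by the number of learned messages."""
    learned = stats["nspam"] + stats["nham"]
    return stats["ntokens"] / learned


for name, stats in servers.items():
    learned = stats["nspam"] + stats["nham"]
    print(f"{name}: {learned} messages learned, "
          f"{tokens_per_message(stats):.2f} tokens kept per message")

# How large server #2's token store is relative to server #1's.
ratio = servers["server2"]["ntokens"] / servers["server1"]["ntokens"]
print(f"server2 holds {ratio:.1%} of server1's token count")
```

Server #1 keeps roughly 8.6 tokens per learned message, while server #2 keeps about 0.55, which matches the "about 1/10th" impression for the token counts.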


Another tip for all: With the server #1 setting
bayes_auto_learn_threshold_spam 8.00
you would expect this message to be autolearned:

 X-Spam-Status: Yes, hits=8.7 required=5.0 tests=BAYES_99=3.5, 
 HTML_MESSAGE=0.001,HTML_MIME_NO_HTML_TAG=0,HTML_TAG_EXIST_TBODY=0.282, 
 MIME_HTML_ONLY=0.389,RELAY_DE=0.01,REPLY_TO_EMPTY=0.512, 
 SARE_FORGED_EBAY=4 autolearn=no bayes=1.

But it is autolearn=no. This shows that manually re-feeding spam can be
effective for your Bayes, because this sure-is-spam message would not
have been learned automatically. Since it's already BAYES_99, you could
say "don't bother, I'll be fine" *g*, but Bayes needs to be trained
continually, because tokens time out...

And why was SARE_FORGED_EBAY set down to 4? It was so nice at 100+...



Also, we set bayes_expiry_max_db_size to 5, and ran
sa-learn --force-expire --sync
But we still get these numbers:
  0.000  0 242424  0  non-token data: nspam
  0.000  0 313252  0  non-token data: nham
  0.000  0 134001  0  non-token data: ntokens

Why are there still 134k tokens?

mfg zmi
-- 
// Michael Monnerie, Ing.BSc-  http://it-management.at
// Tel: 0660/4156531  .network.your.ideas.
// PGP Key:   lynx -source http://zmi.at/zmi3.asc | gpg --import
// Fingerprint: 44A3 C1EC B71E C71A B4C2  9AA6 C818 847C 55CB A4EE
// Keyserver: www.keyserver.net Key-ID: 0x55CBA4EE




Re: Bayes advanced questions

2006-05-10 Thread Michael Monnerie
On Mittwoch, 10. Mai 2006 23:41 Matt Kettler wrote:
 Particularly on servers with a site-wide DB used against broadly
 diverse spread of mail, increasing the token limit will improve
 accuracy.

 However, this comes at the expense of increased storage needs and
 slower performance. (In particular, expiry takes a LOT longer with
 larger DBs)

The DB files are about 60MB together, so not really big (I just got a
pricelist with the new 750GB SATA drive from Seagate *g*).

And tonight's expiry for server #1:
bayes: synced databases from journal in 11 seconds: 1968 unique entries
(3059 total entries)

So it doesn't take too long either. It could possibly take longer on a
server that gets a few million mails per day, of course.

 score used is the score the message would have got if:
   bayes was disabled
   the AWL was disabled
   no userconf (ie:black/whitelists) rules were enabled.

That's good info which should be in the man page.

 Since that message scored 8.7, and derives 3.5 of its points from
 BAYES_99, it does not surprise me at all the message was not learned.

 Also, EVEN if the learning score is over the threshold, SA will not
 learn a message as spam unless:
   there are at least 3.0 points of header rules
   there are at least 3.0 points of body rules
   Existing learning would not place the message in a low bayes
 category (ie: don't learn as spam if the message would have hit
 BAYES_00 otherwise)
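
A rough Python sketch of the threshold part of this, using the rule hits from the X-Spam-Status header discussed in this thread. It only approximates the learning score by dropping the BAYES_* and AWL hits from the reported ones. The real calculation may use different per-rule scores, and the header/body point checks and the low-Bayes check listed above are not modeled. The helper name `learning_score` is made up for the illustration.

```python
# Rule hits copied from the X-Spam-Status header discussed in this thread.
tests = {
    "BAYES_99": 3.5,
    "HTML_MESSAGE": 0.001,
    "HTML_MIME_NO_HTML_TAG": 0.0,
    "HTML_TAG_EXIST_TBODY": 0.282,
    "MIME_HTML_ONLY": 0.389,
    "RELAY_DE": 0.01,
    "REPLY_TO_EMPTY": 0.512,
    "SARE_FORGED_EBAY": 4.0,
}
SPAM_THRESHOLD = 8.0  # bayes_auto_learn_threshold_spam on server #1


def learning_score(hits):
    """Sum the hits, ignoring Bayes and AWL rules (simplified)."""
    return sum(
        score for name, score in hits.items()
        if not name.startswith("BAYES_") and name != "AWL"
    )


reported = round(sum(tests.values()), 1)
for_learning = round(learning_score(tests), 3)

print(reported)                        # 8.7, as in the header
print(for_learning)                    # 5.194
print(for_learning >= SPAM_THRESHOLD)  # False: not autolearned as spam
```

With the 3.5 points from BAYES_99 taken out, about 5.2 points remain, which is well short of the 8.00 threshold.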

This is written in the man page, except that the last point about
BAYES_00 wasn't clear to me from there. Does it apply just to BAYES_00
and BAYES_99, or also to BAYES_05 and BAYES_95?

  Since it's already BAYES_99, you could say
  don't bother, I'll be fine *g* but bayes needs to be trained
  permanently, because tokens time out...

 Also realize that just because the message got BAYES_99 doesn't mean
 there are no tokens in it that can be learned from. Spam mutates. New
 phrases and words creep in. These need to be learned from, even if
 the current message is already BAYES_99.

Yes, I believe this is very valuable info for others as well.

Thanks for your help on this,
mfg zmi
-- 
// Michael Monnerie, Ing.BSc-  http://it-management.at
// Tel: 0660/4156531  .network.your.ideas.
// PGP Key:   lynx -source http://zmi.at/zmi3.asc | gpg --import
// Fingerprint: 44A3 C1EC B71E C71A B4C2  9AA6 C818 847C 55CB A4EE
// Keyserver: www.keyserver.net Key-ID: 0x55CBA4EE

