Re: Personal rule matching ToCc

2006-02-07 Thread jdow

From: Ramprasad [EMAIL PROTECTED]


Hi,
  I want to write a personal domain-wise rule 
The rule I am using now is 


header __TO_DOMAIN_NET ToCc =~ /[EMAIL PROTECTED]/i

But the above rule would match @domain.net as well as
@domain.net.in 


You have not tried it, have you? The \b assures that it will not match
on @domain.net.in.

{^_^}


Re: Personal rule matching ToCc

2006-02-07 Thread Ramprasad
On Tue, 2006-02-07 at 00:15 -0800, jdow wrote:
 From: Ramprasad [EMAIL PROTECTED]
 
  Hi,
I want to write a personal domain-wise rule 
  The rule I am using now is 
  
  header __TO_DOMAIN_NET ToCc =~ /[EMAIL PROTECTED]/i
  
  But the above rule would match @domain.net as well as
  @domain.net.in 
 
 You have not tried it, have you? The \b assures that it will not match
 on @domain.net.in.

I have tested this with SA3.1
ToCc =~ /[EMAIL PROTECTED]/i   matched  @domain.net as well as
@domain.net.in

Thanks
Ram



Re: Pump and Dump SARE rules

2006-02-07 Thread Jeremy
I've been using it, seems to work well for me on my MDaemon server

Cheers,
Jeremy



Doc Schneider [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]
 Chris Santerre wrote:


   -Original Message-
   From: Doc Schneider [mailto:[EMAIL PROTECTED]
   Sent: Friday, January 27, 2006 5:14 PM
   To: users@spamassassin.apache.org
   Subject: Pump and Dump SARE rules
  
  
   http://rulesemporium.com/rules/70_sare_stocks.cf
  
   Is the latest addition to the SARE rule sets.
  
   -Doc (SARE Ninja)

 This has to be the MOST tested ruleset of any SARE release. :)  If you guys 
 only knew how long Doc and the other SARE ninjas have been working on 
 this set. I think a giant *sigh* of relief can be heard throughout the 
 lands.

 Please give feedback.  And this set will be continually updated.

 --Chris


 I just updated this ruleset with some new rules and also added in the 
 counts for the scoring.

 Also updated http://www.rulesemporium.com/rules.htm adding this new set to 
 it.

 And please if anyone is using this set let us know we like feedback!

 -Doc (SARE Ninja)
 





Re: Couple of newbie questions... (repost)

2006-02-07 Thread Alan Premselaar
Philip Prindeville wrote:
 Matt Kettler wrote:
 
 Philip Prindeville wrote:

 I.e. any provider or country that doesn't have an institutional policy
 of prosecuting spam senders...

 Erm, so you're going to block all of the US, correct?

 No.  We have laws against spam that hopefully most legitimate ISP's attempt
 to conform to.

Interestingly enough, Japan also has laws against spam that most
legitimate ISPs attempt to conform to.  You probably weren't aware of that.


Re: Bayes filtering only runs about 70% of the time - SOLVED

2006-02-07 Thread Ole Nomann Thomsen
Ole Nomann Thomsen wrote:

 Hi All.
 
 I was scanning my SA log-files, when i noticed that about 30% of the
 result: -lines do not contain any BAYES_* score.
 
 Version info:
 SpamAssassin version 3.1.0
   running on Perl version 5.8.4
   on Debian and Redhat Linux

The explanation was two-fold:

1. Whenever two spamd daemons ran opportunistic journal syncing approximately
*at the same time*, one of them would lose the bayes-check.
2. The last journal sync atime was *stuck* at some time in March 2005. So any
run of spamd would sync the journal.

Ad 1: This will never happen in normal circumstances, as syncing is only run
once a day (by one spamd, i surmise). 

Ad 2: This is probably because of the upgrade to SA 3.1.0 where i dumped the
old-format bayes and inserted it in the new format. I must have goofed
somewhere.
To solve, i only had to:

stop SA
sa-learn --backup > /tmp/backup.txt
sa-learn --restore /tmp/backup.txt
start SA

So now I get 1 when I run
perl -ne 'if (/result:/) {$n++; $b++ if (/BAYES/);} } print $b/$n,"\n"; {' current
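
For anyone who would rather not decode the one-liner, here is a rough Python sketch of the same check: count the "result:" lines and report what fraction of them mention BAYES. The function name and the sample log lines are made up for illustration; real spamd log lines will look different.

```python
def bayes_fraction(lines):
    """Fraction of 'result:' log lines that also mention a BAYES rule."""
    total = 0
    with_bayes = 0
    for line in lines:
        if "result:" in line:
            total += 1
            if "BAYES" in line:
                with_bayes += 1
    # Avoid dividing by zero when the log has no result lines at all.
    return with_bayes / total if total else 0.0


# Invented sample lines, only meant to exercise the function.
sample = [
    "spamd: result: Y 12 - BAYES_99,URIBL_SBL scantime=1.2",
    "spamd: result: . 0 - BAYES_00 scantime=0.8",
    "spamd: result: . 1 - HTML_MESSAGE scantime=0.5",
    "spamd: connection from localhost",
]
print(bayes_fraction(sample))
```

A value of 1 means every scanned message got a BAYES_* hit, which is the state described after the backup/restore.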

Thanks for all your inputs, 

Ole.



Re: Re[2]: spam still isn't being caught much.

2006-02-07 Thread Brian S. Meehan
Hi Bob,
spamassassin --lint returns no errors. I've been checking it after I add
rules.

You can view my entire local.cf at http://www.meehanontheweb.com/local.cf.txt

I haven't seen ALL_TRUSTED in any of the x-spam-status headers in any
message.

So far, what I've been doing is logging in as root and running
sa-learn --ham /path/to/each/mail/folder/*   and of course
sa-learn --spam /path/to/spam/folder/*
my email account is actually under a username of brian, though if I run it
logged in as brian, it doesn't change the size of the bayes_toks or
bayes_seen files. Is any of this wrong?

Thanks,
Brian

On Mon, February 6, 2006 23:14, Robert Menschel wrote:
 Sans rules?  What rules do you have in local.cf?  When you do a
 spamassassin --lint (no -D for this test), do you get any error
 messages? An error in custom rules could contribute to the problems
 you're having.

 Bob Menschel








message sneaking past

2006-02-07 Thread Julian Underwood
http://168.100.199.67/message.txt

Sorry if this is the third time I've posted this, I've been having
problems posting.  Anyhow, I've been receiving the above type of emails
(vertical drug advertisements) and they appear to receive a very high
score.  However they always seem to get past spamassassin--other spams
get tagged and redirected to our spam box fine.

The only reason I can think that they may not be getting sent to our
spam box is either SURBL scores aren't registering or somehow these
types of messages can get around spamassassin... Could anyone shed some
light on why these types of messages are getting by?

Thanks a lot,

Julian




Re: Personal rule matching ToCc

2006-02-07 Thread Matt Kettler
jdow wrote:
 From: Ramprasad [EMAIL PROTECTED]

 Hi,
   I want to write a personal domain-wise rule The rule I am using now is
 header __TO_DOMAIN_NET ToCc =~ /[EMAIL PROTECTED]/i

 But the above rule would match @domain.net as well as
 @domain.net.in 

 You have not tried it, have you? The \b assures that it will not match
 on @domain.net.in.
Um, J.. \b will match any word boundary.. a . is not a word character
so /\.net\b/ matches .net.in.
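
This is easy to check outside SpamAssassin. The address in the rule was redacted by the archive, so the pattern below (@domain\.net\b) is only an assumed stand-in for the real rule. Python's re module treats \b the same way Perl does for plain ASCII input like this.

```python
import re

# Assumed stand-in for the redacted rule: "@domain.net" followed by \b.
pattern = re.compile(r"@domain\.net\b", re.IGNORECASE)


def matches(address):
    """Return True if the assumed pattern fires on this address."""
    return pattern.search(address) is not None


# \b only needs a word/non-word transition. "t" is a word character and
# "." is not, so there is a boundary between "net" and ".in".
for addr in ("user@domain.net", "user@domain.net.in", "user@domain.network"):
    print(addr, matches(addr))
```

So \b keeps the rule from firing on domain.network, but not on domain.net.in.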





Re: message sneaking past

2006-02-07 Thread Evan Platt

At 07:13 AM 2/7/2006, you wrote:

http://168.100.199.67/message.txt

Sorry if this is the third time I've posted this, I've been having
problems posting.  Anyhow, I've been receiving the above type of emails
(vertical drug advertisements) and they appear to receive a very high
score.  However they always seem to get past spamassassin--other spams
get tagged and redirected to our spam box fine.

The only reason I can think that they may not be getting sent to our
spam box is either SURBL scores aren't registering or somehow these
types of messages can get around spamassassin... Could anyone shed some
light on why these types of messages are getting by?


So the message gets tagged by spamassassin properly, but not moved to 
a spam 'box'.


So how is the message moved to the spambox? Procmail? 



ADMIN: User with OOO reply ...

2006-02-07 Thread Evan Platt
Please unsubscribe [EMAIL PROTECTED]. I'm getting an Out 
Of Office reply to each of my posts to the list.


Thanks.



Re: message sneaking past

2006-02-07 Thread Matt Kettler
Julian Underwood wrote:
 http://168.100.199.67/message.txt

 Sorry if this is the third time I've posted this, I've been having
 problems posting.  Anyhow, I've been receiving the above type of emails
 (vertical drug advertisements) and they appear to receive a very high
 score.  However they always seem to get past spamassassin--other spams
 get tagged and redirected to our spam box fine.

 The only reason I can think that they may not be getting sent to our
 spam box is either SURBL scores aren't registering or somehow these
 types of messages can get around spamassassin... Could anyone shed some
 light on why these types of messages are getting by?

   
That message looks like it got tagged just fine. What mechanism do you
use to move messages to your spam box?




syntax error, what to do?

2006-02-07 Thread amavis
Hi there,

i have the newest spamassassin and amavisd-new installed.

if i run amavisd-new debug, i receive a syntax error
inside the debug messages:

Feb  7 16:46:10 mail..de /usr/sbin/amavisd-new[31097]: SpamControl: 
initializing Mail::SpamAssassin
auto-whitelist: syntax error at (eval 851) line 2, near require 
Mail::SpamAssassin:
syntax error at (eval 851) line 3, near Mail::SpamAssassin:


what can i do?

Thax peter



Re: spam still isn't being caught much.

2006-02-07 Thread Matt Kettler
Brian S. Meehan wrote:
 Hi Bob,
 spamassassin --lint returns no errors. I've been checking it after I add
 rules.

 You can view my entire local.cf at http://www.meehanontheweb.com/local.cf.txt

I spot one minor bug in your local.cf:

bayes_file_mode 0666

That should be 7's not 6's.. (note in the docs the default is 0700, not 0600) 
bayes_file_mode is really a mask, and sometimes gets used in directory creation.

However, that shouldn't be causing your problems..
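
To see why the 6's matter once the mode ends up on a directory, here is a small Python sketch that renders the two modes the way ls would. It only illustrates the permission bits; it makes no claim about how SA applies bayes_file_mode internally, and the helper names are invented.

```python
import stat


def describe(mode, is_dir):
    """Render permission bits the way `ls -l` shows them."""
    kind = stat.S_IFDIR if is_dir else stat.S_IFREG
    return stat.filemode(kind | mode)


def dir_is_traversable(mode):
    """A directory needs at least one execute (search) bit to be entered."""
    return bool(mode & 0o111)


for mode in (0o666, 0o777):
    print(oct(mode), describe(mode, is_dir=True), dir_is_traversable(mode))
```

With 0666 no execute bit is set, so a directory created with that mode cannot be searched by anyone other than root.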




Re: spam still isn't being caught much.

2006-02-07 Thread Matt Kettler
Brian S. Meehan wrote:
 Dirk,
 I adjusted the rights as follows in /etc/mail/spamassassin:
 drw-rw-rw-  3 root root   352 Feb  5 17:04 .
 drwxr-xr-x  3 root root    80 Jul 13  2005 ..
 drw-rw-rw-  2 root users   48 Nov 29 15:15 bayes
 -rw-------  1 root root    60 Feb  5 07:54 bayes.lock
 -rw-rw-rw-  1 root users 1.2M Feb  4 17:56 bayes_seen
 -rw-rw-rw-  1 root users 5.3M Feb  5 07:54 bayes_toks
 -rw-rw-rw-  1 root root  5.3M Feb  4 17:38 bayes_toks.expire8083

Brian.. I see that your local.cf contains:

bayes_path /etc/mail/spamassassin/bayes

Note the docs carefully, bayes path isn't just a path. That option specifies 
the bayes DB is stored in /etc/mail/spamassassin, and stored in files beginning 
with the name bayes.


However, /etc/mail/spamassassin contains a sub-directory named bayes... did you 
create that? Please remove it. I know in 2.6x versions this would screw up SA's 
file locking.

In your bayes path there must not be any files or directories starting with the 
word bayes other than the ones SA creates.

I'd also really question using /etc/mail/spamassassin for your bayes database. 
The bayes directory needs to be RWX to all the users who SA runs as. You 
*really* don't want /etc/mail/spamassassin being world rwx as that's a gigantic 
security hole that a smart user could likely abuse to gain root privilege 
whenever root calls SA.
 

May I suggest creating a /var/spamassassin/bayes/ directory that's world RWX 
and then setting 

bayes_path  /var/spamassassin/bayes/bayes
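
A quick way to picture what bayes_path means: the last component is a filename prefix, and the database files are named by adding a suffix to it. The Python helper below is hypothetical and just mirrors the bayes_toks, bayes_seen and bayes_journal names that show up in the listings in this thread.

```python
def bayes_files(bayes_path):
    """Expand a bayes_path setting into the file names derived from it."""
    # Suffixes taken from the file names seen in the directory listings.
    return [bayes_path + suffix for suffix in ("_toks", "_seen", "_journal")]


for setting in ("/etc/mail/spamassassin/bayes", "/var/spamassassin/bayes/bayes"):
    print(setting)
    for name in bayes_files(setting):
        print("   ", name)
```

With the suggested setting the files land inside /var/spamassassin/bayes/, so only that directory needs to be writable.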




Re: syntax error, what to do?

2006-02-07 Thread Michael Monnerie
On Dienstag, 7. Februar 2006 16:57 [EMAIL PROTECTED] wrote:
 Feb  7 16:46:10 mail..de /usr/sbin/amavisd-new[31097]:
 SpamControl: initializing Mail::SpamAssassin auto-whitelist: syntax
 error at (eval 851) line 2, near require Mail::SpamAssassin: syntax
 error at (eval 851) line 3, near Mail::SpamAssassin:

 what can i do?

The message says that your auto-whitelist entry is wrong. Check your 
config. There's a : as the last character, which looks wrong.

mfg zmi
-- 
// Michael Monnerie, Ing.BSc  ---   it-management Michael Monnerie
// http://zmi.at   Tel: 0660/4156531  Linux 2.6.11
// PGP Key:   lynx -source http://zmi.at/zmi2.asc | gpg --import
// Fingerprint: EB93 ED8A 1DCD BB6C F952  F7F4 3911 B933 7054 5879
// Keyserver: www.keyserver.net Key-ID: 0x70545879




Spamassassin local.cf

2006-02-07 Thread carlos baptista
Hi,

Can anyone help me with a strange problem? I have installed 
spamassassin+clamav+qmail, and it's working. Today I needed to change the 
local.cf, but spamassassin just keeps using the old settings. I have already 
restarted the services, and even rebooted the server. 

These are my settings:

To start spamd
/usr/sbin/spamd -x -u spamd -H /home/spamd -d -s /var/log/spamd.log -m 10 
--max-conn-per-child=10 --siteconfigpath=/etc/mail/spamassassin

old local.cf
# How many hits before a message is considered spam.
required_score 5.0

# Change the subject of suspected spam
rewrite_header subject :SPAM:

# Encapsulate spam in an attachment (0=no, 1=yes, 2=safe)
report_safe 0

# Enable the Bayes system
use_bayes   1
bayes_path /home/spamd/.spamassassin

# Enable Bayes auto-learning
bayes_auto_learn  1

# Enable or disable network checks
skip_rbl_checks 0
use_razor2  1
use_dcc 1
use_pyzor   1

# Mail using languages used in these country codes will not be marked
# as being possibly spam in a foreign language.
ok_languages all

# Mail using locales used in these country codes will not be marked
# as being possibly spam in a foreign language.
ok_locales all



Changes to local.cf
# Change the subject of suspected spam
rewrite_header subject SPAM
rewrite_header from [EMAIL PROTECTED]

# Encapsulate spam in an attachment (0=no, 1=yes, 2=safe)
report_safe 2


Does anyone have a clue?

Thanks

Carlos Baptista



RE: Bayes filtering only runs about 70% of the time - SOLVED

2006-02-07 Thread Mike Sassaman


 Ole Nomann Thomsen wrote:
 
  Hi All.
  
  I was scanning my SA log-files, when i noticed that about 30% of the
  result: -lines do not contain any BAYES_* score.
  
  Version info:
  SpamAssassin version 3.1.0
running on Perl version 5.8.4
on Debian and Redhat Linux
 
 The explanation was two-fold:
 
 1. Whenever two spamd daemons ran opportunistic journal syncing approximately
 *at the same time*, one of them would lose the bayes-check.
 2. The last journal sync atime was *stuck* at some time in March 2005. So any
 run of spamd would sync the journal.
 
 Ad 1: This will never happen in normal circumstances, as syncing is only run
 once a day (by one spamd, i surmise). 
 
 Ad 2: This is probably because of the upgrade to SA 3.1.0 where i dumped the
 old-format bayes and inserted it in the new format. I must have goofed
 somewhere.
 To solve, i only had to:
 
 stop SA
 sa-learn --backup > /tmp/backup.txt
 sa-learn --restore /tmp/backup.txt
 start SA
 
 So now I get 1 when I
 perl -ne 'if (/result:/) {$n++; $b++ if (/BAYES/);} } print $b/$n,"\n"; {' current
 
 Thanks for all your inputs, 
 
 Ole.
 

Thanks for posting your solution.  I think I am experiencing a similar
problem... Although bayes seems to be working fine most of the time, Bayes
tags do not appear on many false negatives.  Before I try anything, though,
I have a more general related question - can someone give me some context
about the numbers in the 'sa-learn --dump magic' output?  Here's mine:

[mail] /home/_vilter/.spamassassin  sa-learn --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0      62789          0  non-token data: nspam
0.000          0        725          0  non-token data: nham
0.000          0     144079          0  non-token data: ntokens
0.000          0  946707021          0  non-token data: oldest atime
0.000          0 1139412959          0  non-token data: newest atime
0.000          0 1139331748          0  non-token data: last journal sync atime
0.000          0 1139328045          0  non-token data: last expiry atime
0.000          0     111811          0  non-token data: last expire atime delta
0.000          0       9624          0  non-token data: last expire reduction count

What is the number for 'last journal sync atime'?  Is that seconds since the
last sync?  If so does that indicate my last sync time is 'stuck' like
Ole's?  If so what would cause that?  I am running 3.0.4, it is a fresh
database install not an upgrade.
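
On the atime question: as far as I know the atime values in this dump are Unix epoch timestamps (seconds since 1970-01-01 UTC), so each one is a point in time rather than a count of seconds since the last sync. A small Python sketch to decode the values from the output above (the helper name is invented):

```python
from datetime import datetime, timezone


def atime_to_utc(atime):
    """Format a Unix epoch timestamp as a readable UTC date."""
    stamp = datetime.fromtimestamp(atime, tz=timezone.utc)
    return stamp.strftime("%Y-%m-%d %H:%M:%S UTC")


# Values copied from the 'sa-learn --dump magic' output above.
magic = {
    "oldest atime": 946707021,
    "newest atime": 1139412959,
    "last journal sync atime": 1139331748,
    "last expiry atime": 1139328045,
}
for name, value in magic.items():
    print(f"{name:24} {atime_to_utc(value)}")
```

Read that way, the last journal sync here falls on 7 February 2006, which does not look like the stale March 2005 value Ole described.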

Here's my database:

[mail] /home/_vilter/.spamassassin  ls -al
total 8730
drwx------  2 _vilter  wheel      512 Feb  7 12:37 .
drwxr-xr-x  4 _vilter  wheel      512 Jan 10 13:41 ..
-rw-------  1 root     wheel       78 Jan 26 11:55 bayes.lock.mail.stcservices.com.3788
-rw-rw-rw-  1 _vilter  wheel    84456 Feb  7 12:37 bayes_journal
-rw-rw-rw-  1 _vilter  wheel  5111808 Feb  7 12:37 bayes_seen
-rw-rw-rw-  1 _vilter  wheel  5423104 Feb  7 12:37 bayes_toks

bayes_seen and bayes_toks are modified pretty much constantly.  Is this the
expected behavior, or should the sync only occur once a day?

Sorry for all the questions, I'm obviously new at this.  Thanks for any
input.



Re: Pump and Dump SARE rules

2006-02-07 Thread Jeff Chan
On Sunday, February 5, 2006, 3:41:22 PM, Doc Schneider wrote:
 I just updated this ruleset with some new rules and also added in the
 counts for the scoring.

 Also updated http://www.rulesemporium.com/rules.htm adding this new set 
 to it.

 And please if anyone is using this set let us know we like feedback!

 -Doc (SARE Ninja)

Thanks much Doc and all!  It seems to work very well.

What SARE rules would folks recommend for a default 3.1.0
SpamAssassin installation (non RDJ)?

Currently we're using:

  http://rulesemporium.com/rules/70_sare_stocks.cf
  http://www.rulesemporium.com/rules/99_sare_fraud_post25x.cf
  http://www.rulesemporium.com/rules/70_sare_adult.cf

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Spamassassin Learn

2006-02-07 Thread trichard
Can you just feed spamassassin spam or do you need to give it ham also?

I read the docs and it didn't say you had to feed it ham.

I then read another doc and it said you should feed it equal amounts of
spam and ham.



Re: Personal rule matching ToCc

2006-02-07 Thread Loren Wilton
  header __TO_DOMAIN_NET ToCc =~ /[EMAIL PROTECTED]/i
 
  But the above rule would match @domain.net as well as
  @domain.net.in

 You have not tried it, have you? The \b assures that it will not match
 on @domain.net.in.

Well, no, it will.   The dot is a wordbreak, and \b is only looking for a
break.  So it would match.

There are several ways to go.  A fairly trivial one that doesn't use a
lookahead might be

header __TO_DOMAIN_NET ToCc =~ /[EMAIL PROTECTED]/i
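
The pattern suggested above was redacted by the archive, so it cannot be reproduced here. As a guess at the general idea only, one lookahead-free approach is to require that "net" is followed either by the end of the string or by a character that cannot continue a domain name. The pattern below is an assumption for illustration, not the one from the original message, and it is shown in Python rather than as an SA rule.

```python
import re

# Hypothetical pattern: after "@domain.net" allow only end of string or a
# character that is not a letter, digit, underscore, dot or hyphen.
pattern = re.compile(r"@domain\.net(?:[^\w.-]|$)", re.IGNORECASE)


def matches(header_value):
    """Return True if the hypothetical pattern fires on this header value."""
    return pattern.search(header_value) is not None


for value in ("user@domain.net", "<user@domain.net>, other@example.org",
              "user@domain.net.in", "user@domain.network"):
    print(value, matches(value))
```

One known gap in this sketch: an address written with a trailing dot after "net" would not match.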

Loren



RE: Spamassassin Learn

2006-02-07 Thread Bowie Bailey
[EMAIL PROTECTED] wrote:
 Can you just feed spamassassin spam or do you need to give it ham
 also? 
 
 I read the docs and it didn't say you had to feed it ham.
 
 I then read another doc and it said you should feed it equal amounts
 of spam and ham.

You need to feed it both.  I wouldn't worry too much about the ratios,
but the Bayes scoring won't take effect until you have learned at least
200 ham and 200 spam.
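
A small sketch of that check in Python, using the 200/200 figures from this message as assumed minimums. The function name is made up, and the counts come from the nspam and nham lines of a 'sa-learn --dump magic' run.

```python
MIN_SPAM = 200  # minimum learned spam, per the figure quoted above
MIN_HAM = 200   # minimum learned ham, per the figure quoted above


def bayes_ready(nspam, nham, min_spam=MIN_SPAM, min_ham=MIN_HAM):
    """True once enough of both kinds of mail have been learned."""
    return nspam >= min_spam and nham >= min_ham


# Counts taken from the dump posted earlier in the thread, plus a made-up
# case that has plenty of spam but too little ham.
print(bayes_ready(nspam=62789, nham=725))
print(bayes_ready(nspam=5000, nham=150))
```

The second case shows why feeding only spam is not enough: the ham count alone keeps Bayes from scoring.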

-- 
Bowie


Re: Spamassassin Learn

2006-02-07 Thread mike
It needs 200 of each before it will even start working on sa-learn email.  I 
then feed it representative amounts of ham and spam, in the ratio it comes in.


[EMAIL PROTECTED] wrote:


Can you just feed spamassassin spam or do you need to give it ham also?

I read the docs and it didn't say you had to feed it ham.

I then read another doc and it said you should feed it equal amounts of
spam and ham.


 





Re: Re[2]: spam still isn't being caught much.

2006-02-07 Thread Loren Wilton
 So far, what I've been doing is logging in as root and running
 sa-learn --ham /path/to/each/mail/folder/*   and of course
 sa-learn --spam /path/to/spam/folder/*
 my email account is actually under a username of brian, though if I run it
 logged in as brian, it doesn't change the size of the bayes_toks or
 bayes_seen files. Is any of this wrong?

Um, we may be getting somewhere.  How again are you calling SA to process
spam?  Specifically, which usercode will SA be running under when it
processes your spam?  We can be sure that it will NOT be running as 'root',
since SA won't run as root even if you try to do that (it will switch to
'nobody').  I forget if you are calling from Amavis or the like, but if you
are they often determine the usercode SA will run under.

I have two guesses at the moment: 1) SA is running as nobody, which doesn't
have a home directory, so can't create bayes files.  2) SA can get to global
bayes files, but because you have been creating/maintaining them as root, it
doesn't have the permissions it needs to write to them.

Oh wait...

bayes_file_mode 0666

Now there's a problem.  This needs to be 0777.

I think you are the person that is getting 'failed' a lot when trying to
process bayes?  Yea, you probably have a permissions problem here.

Loren



Re: message sneaking past

2006-02-07 Thread Loren Wilton
 http://168.100.199.67/message.txt

I can't seem to connect to your site, so I'll just assume that is a standard
vertical drug spam.

 they appear to receive a very high
 score.  However they always seem to get past spamassassin--other spams
 get tagged and redirected to our spam box fine.

Now wait, something doesn't make sense here.  Are you saying that you see
'ham' that shows a very high score (above the threshold) but it somehow
wasn't flagged as spam?

Or are you saying that when one of these puppies gets through and you go
back later and test it it gets a very high score?

 The only reason I can think that they may not be getting sent to our
 spam box is either SURBL scores aren't registering or somehow these
 types of messages can get around spamassassin... Could anyone shed some
 light on why these types of messages are getting by?

The answer could be both.

If you don't have sare_specific.cf (I believe it is) then these Leo drug
spams will sail right past the SA standard rules.  Even with the sare rules
it is a bit of a fight; Leo is pretty good about updating the format pretty
frequently.

As for SURBL, it will certainly catch these - IF you aren't one of the first
lucky winners that gets the initial batch before they can show up in SURBL.
I suspect this is probably what is happening when you say they have a high
score but sneak past.  They probably had a low score when they first showed
up, and only have a high score now that you run it through by hand some
hours (or even minutes) later.

Grab the SARE rules and most of these will get caught I suspect.  However,
if you are somehow unlucky enough to be on the leading edge of most batches,
you will probably always have some leaking through until SURBL can catch up.

Loren



Re: Pump and Dump SARE rules

2006-02-07 Thread Loren Wilton
 What SARE rules would folks recommend for a default 3.1.0
 SpamAssassin installation (non RDJ)?

I'd recommend at least all of the 0 rule sets, and probably the matching
1 rule sets also.  Also sare_specific.  Some of the old standbys like
tripwire (available from the sare site) also are still doing good work.

Loren



Re: Couple of newbie questions... (repost)

2006-02-07 Thread Philip Prindeville

Alan Premselaar wrote:

Philip Prindeville wrote:

Matt Kettler wrote:

Philip Prindeville wrote:

I.e. any provider or country that doesn't have an institutional policy
of prosecuting spam senders...

Erm, so you're going to block all of the US, correct?

No.  We have laws against spam that hopefully most legitimate ISP's attempt
to conform to.

Interestingly enough, Japan also has laws against spam that most
legitimate ISPs attempt to conform to.  You probably weren't aware of that.

You'd never know it from their effectiveness!

-Philip




Re: Couple of newbie questions... (repost)

2006-02-07 Thread Matt Kettler
Philip Prindeville wrote:

   

 Interestingly enough, Japan also has laws against spam that most
 legitimate ISPs attempt to conform to.  You probably weren't aware of
 that.
  

 
 You'd never know it from their effectiveness!

This isn't really any different than the US with u-can-spam. I mean, yes they
prosecuted Leo, but they never collected any money and he's still operating.


Re: Couple of newbie questions... (repost)

2006-02-07 Thread jdow

From: Philip Prindeville [EMAIL PROTECTED]


Alan Premselaar wrote:

Philip Prindeville wrote:

Matt Kettler wrote:

Philip Prindeville wrote:

I.e. any provider or country that doesn't have an institutional policy
of prosecuting spam senders...

Erm, so you're going to block all of the US, correct?

No.  We have laws against spam that hopefully most legitimate ISP's attempt
to conform to.

Interestingly enough, Japan also has laws against spam that most
legitimate ISPs attempt to conform to.  You probably weren't aware of that.

You'd never know it from their effectiveness!

-Philip


Gee, most spam seems to come from email. I think I'll simply block email
here. I expect I will see a magical reduction in the amount of spam I
receive.

{O,o}   (I see that the men with the canvas coats that have funny sleeves
   are coming for me. They must not like reductio ad absurdum)


Re: Spamassassin Learn

2006-02-07 Thread Matt Kettler
[EMAIL PROTECTED] wrote:
 Can you just feed spamassassin spam or do you need to give it ham also?
 
 I read the docs and it didn't say you had to feed it ham.
 
 I then read another doc and it said you should feed it equal amounts of
 spam and ham.

Yes, you really should feed it both. You also should strive for a 1:1 ratio of
spam and nonspam, but don't kill yourself to get there.

SA's use of chi-squared combining makes it very tolerant of wild imbalances in
training. However, the closer you are to a 1:1 ratio the better SA will be able
to distinguish tokens that are present in both kinds of mail and ignore them. So
this is a worthwhile goal to strive for as long as it doesn't become a burden.

My current training ratio is about 7:1 spam:nonspam, but in the past it's been
as bad as 20:1. Both of those are very far off from equal amounts, but the
imbalance has never caused me any problems.

From my sa-learn --dump magic output as of today:
0.000  0 995764  0  non-token data: nspam
0.000  0 145377  0  non-token data: nham

That works out to a ratio of 6.85:1
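
The arithmetic is easy to script if you want to track your own ratio over time. The Python sketch below pulls nspam and nham out of 'sa-learn --dump magic' text and divides them; the parsing assumes the column layout shown in this thread, and the function name is invented.

```python
def parse_magic(text):
    """Map each 'non-token data' name to its count (the third column)."""
    counts = {}
    for line in text.splitlines():
        if "non-token data:" not in line:
            continue
        fields, name = line.split("non-token data:")
        counts[name.strip()] = int(fields.split()[2])
    return counts


# The two lines quoted above.
dump = """\
0.000  0 995764  0  non-token data: nspam
0.000  0 145377  0  non-token data: nham
"""
counts = parse_magic(dump)
ratio = counts["nspam"] / counts["nham"]
print(f"{ratio:.2f}:1 spam:nonspam")
```

Feeding it a full dump works too, since lines other than nspam and nham are simply stored and ignored.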






Re: Spamassassin local.cf

2006-02-07 Thread Theo Van Dinter
On Tue, Feb 07, 2006 at 04:51:00PM +, carlos baptista wrote:
 Can anyone help me with a strange problem? I have installed 
 spamassassin+clamav+qmail, and it's working. Today I needed to change the 
 local.cf, but spamassassin just keeps using the old settings. I have already 
 restarted the services, and even rebooted the server. 
[...]
 Changes to local.cf
 # Change the subject of suspected spam
 rewrite_header subject SPAM
 rewrite_header from [EMAIL PROTECTED]
 
 # Encapsulate spam in an attachment (0=no, 1=yes, 2=safe)
 report_safe 2
 
 Does anyone have a clue?

My guess is that whatever you have linking qmail and SpamAssassin is doing the
rewrites, so changing local.cf doesn't actually change the part you want
changed.

-- 
Randomly Generated Tagline:
A journey of a thousand miles begins with but a step. - Chinese Proverb




Re: Spamassassin Learn

2006-02-07 Thread Clay Davis
Does anyone have any good techniques for capturing a sample of ham that can be 
used as the ham corpus?  I'm in a corporate environment and am not keen on the 
idea of intercepting non-spam messages.  I will if I have to, but was hoping 
someone had a better idea.

Regards,
Clay


 On 2/7/2006 at 3:16 pm, in message [EMAIL PROTECTED], Matt Kettler
[EMAIL PROTECTED] wrote:
 [EMAIL PROTECTED] wrote:
 Can you just feed spamassassin spam or do you need to give it ham also?
 
 I read the docs and it didn't say you had to feed it ham.
 
 I then read another doc and it said you should feed it equal amounts of
 spam and ham.
 
 Yes, you really should feed it both. You also should strive for a 1:1 ratio of
 spam and nonspam, but don't kill yourself to get there.
 
 SA's use of chi-squared combining makes it very tolerant of wild imbalances in
 training. However, the closer you are to a 1:1 ratio the better SA will be able
 to distinguish tokens that are present in both kinds of mail and ignore them. So
 this is a worthwhile goal to strive for as long as it doesn't become a burden.
 
 My current training ratio is about 7:1 spam:nonspam, but in the past it's been
 as bad as 20:1. Both of those are very far off from equal amounts, but the
 imbalance has never caused me any problems.
 
 From my sa-learn --dump magic output as of today:
 0.000  0 995764  0  non-token data: nspam
 0.000  0 145377  0  non-token data: nham
 
 That works out to a ratio of 6.85:1



Re: Spamassassin local.cf

2006-02-07 Thread Matt Kettler
carlos baptista wrote:
 Hi,
 
 Can anyone help me with a strange problem? I have installed 
 spamassassin+clamav+qmail, and it's working. Today I needed to change the 
 local.cf, but spamassassin just keeps using the old settings. I have already 
 restarted the services, and even rebooted the server. 

snip

 
 Changes to local.cf
 # Change the subject of suspected spam
 rewrite_header subject SPAM
 rewrite_header from [EMAIL PROTECTED]
 
 # Encapsulate spam in an attachment (0=no, 1=yes, 2=safe)
 report_safe 2
 
 
 Does anyone have a clue?
 

How do you call spamassassin from qmail? Do you use qmail-scanner? Do you use
the fast spamassassin option?

If so, it's qmail-scanner that's doing your tagging, not spamassassin. You need
to disable fast spamassassin, or change your qmail-scanner config to get the
format you want.


Re: Couple of newbie questions... (repost)

2006-02-07 Thread jdow

From: Matt Kettler [EMAIL PROTECTED]


Philip Prindeville wrote:

  


Interestingly enough, Japan also has laws against spam that most
legitimate ISPs attempt to conform to.  You probably weren't aware of
that.
 



You'd never know it from their effectiveness!


This isn't really any different than the US with u-can-spam. I mean, yes they
prosecuted Leo, but they never collected any money and he's still operating.


I keep saying that some attitude readjustment is probably the only way
to solve the problem. That can start with a standard kneecapping and
continue onwards until attitude is adjusted. Or we could adopt the
Mohammedan strategy of cutting off hands.

{o.o}Not feeling very politically correct today.


Re: Spamassassin Learn

2006-02-07 Thread jdow

This is what automatic training attempts to solve.

If you are reliably nailing spam with your current setup you can experiment
with the automatic learning. But I'd widen the score ranges a little, as
far as is practical for your mail mix.

{^_^}
- Original Message - 
From: Clay Davis [EMAIL PROTECTED]



Does anyone have any good techniques for capturing a sample of ham that can be used as the 
ham corpus.  I'm in a corporate environment and am not keen on the idea of intercepting 
non-spam messages.  I will if I have to, but was hoping someone had a better idea.


Regards,
Clay



On 2/7/2006 at 3:16 pm, in message [EMAIL PROTECTED], Matt Kettler

[EMAIL PROTECTED] wrote:

[EMAIL PROTECTED] wrote:

Can you just feed spamassassin spam or do you need to give it ham also?

I read the docs and it didn't say you had to feed it ham.

I then read another doc and it said you should feed it equal amounts of
spam and ham.


Yes, you really should feed it both. You also should strive for a 1:1 ratio of
spam and nonspam, but don't kill yourself to get there.

SA's use of chi-squared combining makes it very tolerant of wild imbalances in
training. However, the closer you are to a 1:1 ratio the better SA will be able
to distinguish tokens that are present in both kinds of mail and ignore them. So
this is a worthwhile goal to strive for as long as it doesn't become a burden.

My current training ratio is about 7:1 spam:nonspam, but in the past it's been
as bad as 20:1. Both of those are very far off from equal amounts, but the
imbalance has never caused me any problems.

From my sa-learn --dump magic output as of today:
0.000  0 995764  0  non-token data: nspam
0.000  0 145377  0  non-token data: nham

That works out to a ratio of 6.85:1 




Re: Spamassassin Learn

2006-02-07 Thread Jim C. Nasby
On Tue, Feb 07, 2006 at 03:16:57PM -0500, Matt Kettler wrote:
 My current training ratio is about 7:1 spam:nonspam, but in the past it's been
 as bad as 20:1. Both of those are very far off from equal amounts, but the
 imbalance has never caused me any problems.
 
 From my sa-learn --dump magic output as of today:
 0.000  0 995764  0  non-token data: nspam
 0.000  0 145377  0  non-token data: nham

Interesting... it appears I actually need to do a better job of training
spam!
sa-learn --dump magic|grep am
0.000  0  98757  0  non-token data: nspam
0.000  0 255134  0  non-token data: nham

I just changed bayes_auto_learn_threshold_spam to 5.0, we'll see what
that does...
-- 
Jim C. Nasby, Database Architect[EMAIL PROTECTED] 
Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today?
Linux: Where do you want to go tomorrow?
FreeBSD: Are you guys coming, or what?


Re: Spamassassin Learn

2006-02-07 Thread Matt Kettler
Jim C. Nasby wrote:
 On Tue, Feb 07, 2006 at 03:16:57PM -0500, Matt Kettler wrote:
 My current training ratio is about 7:1 spam:nonspam, but in the past it's 
 been
 as bad as 20:1. Both of those are very far off from equal amounts, but the
 imbalance has never caused me any problems.

 From my sa-learn --dump magic output as of today:
 0.000  0 995764  0  non-token data: nspam
 0.000  0 145377  0  non-token data: nham
 
 Interesting... it appears I actually need to do a better job of training
 spam!
 sa-learn --dump magic|grep am
 0.000  0  98757  0  non-token data: nspam
 0.000  0 255134  0  non-token data: nham
 
 I just changed bayes_auto_learn_threshold_spam to 5.0, we'll see what
 that does...

Actually, you can't effectively set the threshold below 6.0. SA has a hard-coded
requirement of at least 3.0 header points, and 3.0 body points before it will
autolearn as spam. Therefore, any setting below 6 is moot, because the two 3.0
requirements can't both be met without a score of at least 6.
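
A rough model of that logic, as a sketch only. This is not SpamAssassin's actual code; the names are hypothetical, the two 3.0 minimums are taken from the description above, and the use of `>=` for the comparisons is an assumption.

```python
MIN_HEADER_POINTS = 3.0  # hard-coded minimum as described above
MIN_BODY_POINTS = 3.0    # hard-coded minimum as described above

def would_autolearn_as_spam(header_points, body_points, threshold):
    """Simplified model of the spam-side autolearn decision."""
    total = header_points + body_points
    return (total >= threshold
            and header_points >= MIN_HEADER_POINTS
            and body_points >= MIN_BODY_POINTS)

# With a threshold of 5.0, a 5.5-point message still fails because its
# body points are under 3.0, so the effective floor stays at 6.0.
print(would_autolearn_as_spam(3.0, 2.5, threshold=5.0))  # False
print(would_autolearn_as_spam(3.0, 3.0, threshold=5.0))  # True
```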

I would also check to make sure you don't have a lot of spam coming in that's
getting autolearned as ham. (note: the learner's idea of score is very different
than the final message score, so a message CAN be tagged as spam, and still get
autolearned as ham)




Re: Spamassassin Learn

2006-02-07 Thread jdow

From: Jim C. Nasby [EMAIL PROTECTED]


On Tue, Feb 07, 2006 at 03:16:57PM -0500, Matt Kettler wrote:

My current training ratio is about 7:1 spam:nonspam, but in the past it's been
as bad as 20:1. Both of those are very far off from equal amounts, but the
imbalance has never caused me any problems.

From my sa-learn --dump magic output as of today:
0.000  0 995764  0  non-token data: nspam
0.000  0 145377  0  non-token data: nham


Interesting... it appears I actually need to do a better job of training
spam!
sa-learn --dump magic|grep am
0.000  0  98757  0  non-token data: nspam
0.000  0 255134  0  non-token data: nham

I just changed bayes_auto_learn_threshold_spam to 5.0, we'll see what
that does...


If you have the option, manually train the spam for a while. If the threshold
is set too low for autolearning spam you will find yourself with a mangled
database that has a high percentage of actual ham learned as spam. That is
not a good thing. You might actually lower the ham threshold, as well. It
looks like you might be at risk of learning spam as ham. (And in fact may
have done this already to a high degree.)

{^_^}


Re: Spamassassin Learn

2006-02-07 Thread Jim C. Nasby
On Tue, Feb 07, 2006 at 04:40:40PM -0500, Matt Kettler wrote:
 I would also check to make sure you don't have a lot of spam coming in that's
 getting autolearned as ham. (note: the learner's idea of score is very 
 different
 than the final message score, so a message CAN be tagged as spam, and still 
 get
 autolearned as ham)
 
What would be the easiest way to do that? Grep through my caughtspam
maildir?


Re: Spamassassin Learn

2006-02-07 Thread Matt Kettler
Jim C. Nasby wrote:
 On Tue, Feb 07, 2006 at 04:40:40PM -0500, Matt Kettler wrote:
 I would also check to make sure you don't have a lot of spam coming in that's
 getting autolearned as ham. (note: the learner's idea of score is very 
 different
 than the final message score, so a message CAN be tagged as spam, and still 
 get
 autolearned as ham)
  
 What would be the easiest way to do that? Grep through my caughtspam
 maildir?

That would be the way I'd check.. grep for autolearn=ham
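
If grep isn't handy, the same check can be sketched in Python. This is only an illustration: `files_containing` is a made-up helper name, and it does a plain byte search over every file under the directory rather than parsing the X-Spam-Status header.

```python
import os

def files_containing(root, needle=b"autolearn=ham"):
    """Yield paths under root whose raw bytes contain needle."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as fh:
                    if needle in fh.read():
                        yield path
            except OSError:
                # Unreadable file: skip it rather than abort the scan.
                continue
```

Calling `list(files_containing("caughtspam"))` would then give the tagged-spam messages that were nevertheless autolearned as ham.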


Re: Personal rule matching ToCc

2006-02-07 Thread hamann . w

Hi,

I was experimenting with something similar, although as a client of a big ISP I 
need full match
rather than domain match.
My experience so far: some mail that does not have me in To or Cc is definitely 
spam,
or worse. The other part is legit mail, mostly from mailinglists or other mail 
forwarders,
and needs explicit whitelisting (or a meta rule)
Expect to check the return path (such as [EMAIL PROTECTED])

Wolfgang Hamann

 
 Hi,
I want to write a personal domain-wise rule 
 The rule I am using now is 
 
  header __TO_DOMAIN_NETToCc =~ /[EMAIL PROTECTED]/i
 
 But the above rule would match @domain.net as well as
 @domain.net.in 
 Which is the best way to match only @domain.net and not @domain.net.in 
 
 Thanks
 Ram
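
The actual pattern is redacted in the archive, so the following is only a sketch with stand-in patterns. It assumes the rule ended in something like `@domain\.net\b`, and shows in Python (whose `\b` and lookahead behave like Perl's here) why that still matches `@domain.net.in`, plus one possible stricter ending.

```python
import re

# Hypothetical stand-ins for the redacted rule.
LOOSE = re.compile(r"@domain\.net\b", re.I)
# Stricter: the domain must not be followed by a word char, dot, or hyphen.
STRICT = re.compile(r"@domain\.net(?![\w.-])", re.I)

def matches(pattern, header_value):
    """True if the pattern is found anywhere in the header value."""
    return pattern.search(header_value) is not None

# "t" followed by "." is a word boundary, so the loose pattern matches.
print(matches(LOOSE, "ram@domain.net.in"))   # True
print(matches(STRICT, "ram@domain.net.in"))  # False
print(matches(STRICT, "<ram@domain.net>"))   # True
```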
 
 




Re: Spamassassin Learn

2006-02-07 Thread Jim C. Nasby
On Tue, Feb 07, 2006 at 05:02:25PM -0500, Matt Kettler wrote:
 Jim C. Nasby wrote:
  On Tue, Feb 07, 2006 at 04:40:40PM -0500, Matt Kettler wrote:
  I would also check to make sure you don't have a lot of spam coming in 
  that's
  getting autolearned as ham. (note: the learner's idea of score is very 
  different
  than the final message score, so a message CAN be tagged as spam, and 
  still get
  autolearned as ham)
   
  What would be the easiest way to do that? Grep through my caughtspam
  maildir?
 
 That would be the way I'd check.. grep for autolearn=ham
 
Nothing autolearned. Interesting... I know I've fed my sent mail as ham,
but I'm pretty sure I only did that once or twice...

Guess I'll see how the numbers change with the low autolearn
threshold...


Re: Spamassassin Learn

2006-02-07 Thread Mike Jackson
Does anyone have any good techniques for capturing a sample of ham that
can be used as the ham corpus?  I'm in a corporate environment and am not
keen on the idea of intercepting non-spam messages.  I will if I have to, 
but was hoping someone had a better idea.


Depending on your MTA/MDA, you might be able to do it on the fly so that an 
actual copy of the message isn't necessary. For instance, if the messages 
pass through procmail, learn them just before delivery if the X-Spam-Status 
header isn't set to yes. Oh, and make sure you pass the --no-sync flag to 
sa-learn, then schedule the syncing for sometime during off-peak hours. 
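
A minimal sketch of the decision such a delivery-time filter would make, written in Python rather than as a procmail recipe. The function names are hypothetical; `--ham`, `--no-sync` and `--sync` are sa-learn options, and the message is assumed to arrive on sa-learn's standard input.

```python
from email import message_from_bytes

def should_learn_as_ham(raw_message):
    """True when X-Spam-Status is absent or does not start with 'Yes'."""
    msg = message_from_bytes(raw_message)
    status = str(msg.get("X-Spam-Status", ""))
    return not status.strip().lower().startswith("yes")

def sa_learn_ham_command():
    # Learn as ham without syncing the journal on every message.
    return ["sa-learn", "--ham", "--no-sync"]

# Run this one separately (for example from cron) during off-peak hours.
SYNC_COMMAND = ["sa-learn", "--sync"]
```

A wrapper would pipe the raw message to `sa_learn_ham_command()` via `subprocess.run(..., input=raw_message)` only when `should_learn_as_ham` returns True.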



Re: Spamassassin Learn

2006-02-07 Thread Matt Kettler
Jim C. Nasby wrote:
 On Tue, Feb 07, 2006 at 05:02:25PM -0500, Matt Kettler wrote:
 Jim C. Nasby wrote:
 On Tue, Feb 07, 2006 at 04:40:40PM -0500, Matt Kettler wrote:
 I would also check to make sure you don't have a lot of spam coming in 
 that's
 getting autolearned as ham. (note: the learner's idea of score is very 
 different
 than the final message score, so a message CAN be tagged as spam, and 
 still get
 autolearned as ham)
  
 What would be the easiest way to do that? Grep through my caughtspam
 maildir?
 That would be the way I'd check.. grep for autolearn=ham
  
 Nothing autolearned. 

Nothing autolearned at all? or nothing autolearned as ham?

Are there any autolearn strings? Are they all autolearn=no? are there any
decent number that are autolearn=failed or autolearn=disabled?


Re: Spamassassin Learn

2006-02-07 Thread Jim C. Nasby
On Tue, Feb 07, 2006 at 01:45:48PM -0800, jdow wrote:
 From: Jim C. Nasby [EMAIL PROTECTED]
 
 On Tue, Feb 07, 2006 at 03:16:57PM -0500, Matt Kettler wrote:
 My current training ratio is about 7:1 spam:nonspam, but in the past it's 
 been
 as bad as 20:1. Both of those are very far off from equal amounts, but the
 imbalance has never caused me any problems.
 
 From my sa-learn --dump magic output as of today:
 0.000  0 995764  0  non-token data: nspam
 0.000  0 145377  0  non-token data: nham
 
 Interesting... it appears I actually need to do a better job of training
 spam!
 sa-learn --dump magic|grep am
 0.000  0  98757  0  non-token data: nspam
 0.000  0 255134  0  non-token data: nham
 
 I just changed bayes_auto_learn_threshold_spam to 5.0, we'll see what
 that does...
 
 If you have the option manually train the spam for awhile. If the threshold
 is set too low for autolearning spam you will find yourself with a mangled
 database that has a high percentage of actual ham learned as spam. That is
 not a good thing. You might actually lower the ham threshold, as well. It
 looks like you might be at risk of learning spam as ham. (And in fact may
 have done this already to a high degree.)

See my other reply, which showed stats for all spam over 5 this month.
The stats for last month are:
grep -r autolearn oldspam/ | grep -v 'Binary file' | sed -e 's/.*autolearn=\([^ ]*\).*/\1/' | sort | uniq -c
5862 no
1225 spam
  24 unavailable

So based on this, I'd think it's not learning spam as ham...
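
For reference, a rough Python equivalent of that grep/sed/sort/uniq pipeline, as a sketch. `tally_autolearn` is a made-up name, and unlike the sed expression it takes the first `autolearn=` on a line rather than the last.

```python
import re
from collections import Counter

AUTOLEARN_RE = re.compile(r"autolearn=(\S+)")

def tally_autolearn(lines):
    """Count autolearn=<value> occurrences, one per matching line."""
    counts = Counter()
    for line in lines:
        match = AUTOLEARN_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding it the X-Spam-Status lines from a folder gives the same kind of no/spam/ham/unavailable breakdown as above.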

BTW, autolearn ham should be at its default setting...

What's interesting is that I get about 10-20 spams a day that are scored
below 3, and another 30-50 a day that are between 3 and 5 (which go to
my 'probablespam' folder). I send all of these to sa via spamassassin
-r, so I would have thought that I'd have far more spam in the database
than ham...


Re: Spamassassin Learn

2006-02-07 Thread Matt Kettler
Jim C. Nasby wrote:
 Are there any autolearn strings? Are they all autolearn=no? are there any
 decent number that are autolearn=failed or autolearn=disabled?

 
 grep -r autolearn caughtspam/ | grep -v 'Binary file' | sed -e
 's/.*autolearn=\([^ ]*\).*/\1/'|sort|uniq -c
 1545 no
  140 spam
4 unavailable

Fair enough, that at least suggests that the autolearner is working. However,
that learning ratio is pretty low.

Are you using network tests? Without DNSBLs it's often hard to get enough header
points to cause spam learning..

(Note I use mailscanner, hence the odd log syntax)

 grep "is spam," /var/log/maillog |wc -l
   3434
 grep "is spam," /var/log/maillog|grep autolearn=spam |wc -l
   2766
 grep "is spam," /var/log/maillog|grep "autolearn=not spam" | wc -l
  0

So I'm autolearning about 80% of my tagged spam as spam, and none as ham.

I'm also autolearning about 38% of my nonspam as ham.

I'm using the default bayes_auto_learn_threshold_spam (12.0)

I'm also using modified bayes_auto_learn_threshold_nonspam (-0.01). I use this
coupled with a series of custom rules with tiny negative scores (all  -0.1).
This makes nonspam learning something that has to be minimally earned, not just
granted by virtue of a low score.




Re: Spamassassin Learn

2006-02-07 Thread Jim C. Nasby
On Tue, Feb 07, 2006 at 05:47:36PM -0500, Matt Kettler wrote:
 Jim C. Nasby wrote:
  On Tue, Feb 07, 2006 at 05:02:25PM -0500, Matt Kettler wrote:
  Jim C. Nasby wrote:
  On Tue, Feb 07, 2006 at 04:40:40PM -0500, Matt Kettler wrote:
  I would also check to make sure you don't have a lot of spam coming in 
  that's
  getting autolearned as ham. (note: the learner's idea of score is very 
  different
  than the final message score, so a message CAN be tagged as spam, and 
  still get
  autolearned as ham)
   
  What would be the easiest way to do that? Grep through my caughtspam
  maildir?
  That would be the way I'd check.. grep for autolearn=ham
   
  Nothing autolearned. 
 
 Nothing autolearned at all? or nothing autolearned as ham?
 
 Are there any autolearn strings? Are they all autolearn=no? are there any
 decent number that are autolearn=failed or autolearn=disabled?
 

grep -r autolearn caughtspam/ | grep -v 'Binary file' | sed -e 's/.*autolearn=\([^ ]*\).*/\1/'|sort|uniq -c
1545 no
 140 spam
   4 unavailable



Re: Spamassassin Learn

2006-02-07 Thread Jim C. Nasby
On Tue, Feb 07, 2006 at 06:17:20PM -0500, Matt Kettler wrote:
 Jim C. Nasby wrote:
  Are there any autolearn strings? Are they all autolearn=no? are there any
  decent number that are autolearn=failed or autolearn=disabled?
 
  
  grep -r autolearn caughtspam/ | grep -v 'Binary file' | sed -e
  's/.*autolearn=\([^ ]*\).*/\1/'|sort|uniq -c
  1545 no
   140 spam
 4 unavailable
 
 Fair enough, that at least suggests that the autolearner is working. However,
 that learning ratio is pretty low.
 
 Are you using network tests? Without DNSBLs it's often hard to get enough 
 header
 points to cause spam learning..

I believe so...

grep loadplugin /usr/local/etc/mail/spamassassin/init.pre
# loadplugin Mail::SpamAssassin::Plugin::RelayCountry
loadplugin Mail::SpamAssassin::Plugin::URIDNSBL
loadplugin Mail::SpamAssassin::Plugin::Hashcash
loadplugin Mail::SpamAssassin::Plugin::SPF

grep -v '#' ~/.spamassassin/user_prefs | grep -v whitelist
bayes_auto_learn 1
bayes_auto_learn_threshold_spam 5.0


This is basically a stock FreeBSD install from ports, if you're
familiar...


Re: Spamassassin Learn

2006-02-07 Thread mike

Probably would work if you were running Linux.

Jim C. Nasby wrote:


On Tue, Feb 07, 2006 at 05:47:36PM -0500, Matt Kettler wrote:
 


Chupacabra


Re: Spamassassin Learn

2006-02-07 Thread Matt Kettler
Jim C. Nasby wrote:
 Are you using network tests? Without DNSBLs it's often hard to get enough 
 header
 points to cause spam learning..
 
 I believe so...
 
 grep loadplugin /usr/local/etc/mail/spamassassin/init.pre
 # loadplugin Mail::SpamAssassin::Plugin::RelayCountry
 loadplugin Mail::SpamAssassin::Plugin::URIDNSBL
 loadplugin Mail::SpamAssassin::Plugin::Hashcash
 loadplugin Mail::SpamAssassin::Plugin::SPF
 

None of that will tell you if DNSBLs are enabled. The DNSBLs aren't a plugin,
they're a built-in that auto-enables itself if you have perl's Net::DNS
installed. Try running spamassassin --lint -D and look for these lines:

[18000] dbg: dns: is Net::DNS::Resolver available? yes
[18000] dbg: dns: Net::DNS version: 0.48
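
If you'd rather not eyeball the debug output, here is a small sketch that pulls those two values out of captured text. `net_dns_status` is a hypothetical helper, the patterns assume the line format shown above, and capturing the output may require merging stderr into stdout.

```python
import re

def net_dns_status(debug_output):
    """Return (available, version) from captured --lint -D output."""
    avail = re.search(r"is Net::DNS::Resolver available\? (\w+)", debug_output)
    version = re.search(r"Net::DNS version: (\S+)", debug_output)
    return (avail.group(1) if avail else None,
            version.group(1) if version else None)
```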

 This is basically a stock FreeBSD install from ports, if you're
 familiar...

Nope. I personally dislike distro packages and ports of any sort for tools that
are rapidly updated.


Re: spam still isn't being caught much.

2006-02-07 Thread Michael W Cocke
On Tue, 07 Feb 2006 18:58:08 +0100, you wrote:



I know that SuSE had -L as default at one point in time. Just remove the 
'-L' part.

It still does as of 10.0, Bog knows why.  And as of 9.3 it was an
incredibly poor idea to allow YaST to update SpamAssassin.

Mike-

(Amavisd-new, f-prot, clam, SpamAssassin, postfix, and now
Snort-Inline.  The next step is to unplug the network.)

--
If you're not confused, you're not trying hard enough.
--
Please note - Due to the intense volume of spam, we have installed 
site-wide spam filters at catherders.com.  If email from you bounces,
try non-HTML, non-encoded, non-attachments,



Re: Spamassassin Learn

2006-02-07 Thread Jim C. Nasby
On Tue, Feb 07, 2006 at 05:36:56PM -0600, Jim C. Nasby wrote:
 On Tue, Feb 07, 2006 at 06:17:20PM -0500, Matt Kettler wrote:
  Jim C. Nasby wrote:
   Are there any autolearn strings? Are they all autolearn=no? are there 
   any
   decent number that are autolearn=failed or autolearn=disabled?
  
   
   grep -r autolearn caughtspam/ | grep -v 'Binary file' | sed -e
   's/.*autolearn=\([^ ]*\).*/\1/'|sort|uniq -c
   1545 no
140 spam
  4 unavailable
  
  Fair enough, that at least suggests that the autolearner is working. 
  However,
  that learning ratio is pretty low.
  
  Are you using network tests? Without DNSBLs it's often hard to get enough 
  header
  points to cause spam learning..
 
 I believe so...
 
 grep loadplugin /usr/local/etc/mail/spamassassin/init.pre
 # loadplugin Mail::SpamAssassin::Plugin::RelayCountry
 loadplugin Mail::SpamAssassin::Plugin::URIDNSBL
 loadplugin Mail::SpamAssassin::Plugin::Hashcash
 loadplugin Mail::SpamAssassin::Plugin::SPF
 
 grep -v # ~/.spamassassin/user_prefs | grep -v whitelist
 bayes_auto_learn 1
 bayes_auto_learn_threshold_spam 5.0

Hmm... here's something interesting...

grep -r autolearn pgsql/ | grep -v 'Binary file' | sed -e 's/.*autolearn=\([^ ]*\).*/\1/' | sort | uniq -c
2010 ham
 198 no
  17 unavailable

So a big chunk of [EMAIL PROTECTED] email is being learned as ham.
Looking further, I see...

X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham
version=3.1.0

ISTM that having the thresholds setup so that BAYES_00 scores low enough
to autolearn is a BadThing, as it creates a positive feedback loop. :)
I've added bayes_auto_learn_threshold_nonspam -2.6 to my personal
config; we'll see if that helps.


Re: Spamassassin Learn

2006-02-07 Thread Jim C. Nasby
On Tue, Feb 07, 2006 at 06:47:06PM -0500, Matt Kettler wrote:
 Jim C. Nasby wrote:
  Are you using network tests? Without DNSBLs it's often hard to get enough 
  header
  points to cause spam learning..
  
  I believe so...
  
  grep loadplugin /usr/local/etc/mail/spamassassin/init.pre
  # loadplugin Mail::SpamAssassin::Plugin::RelayCountry
  loadplugin Mail::SpamAssassin::Plugin::URIDNSBL
  loadplugin Mail::SpamAssassin::Plugin::Hashcash
  loadplugin Mail::SpamAssassin::Plugin::SPF
  
 
 None of that will tell you if DNSBLs are enabled.. The DNSBLs aren't a plugin,
 they're a built-in that auto-enables itself if you have perl's Net::DNS
 installed. Try running spamassassin --lint -D and look for these lines:
 
 [18000] dbg: dns: is Net::DNS::Resolver available? yes
 [18000] dbg: dns: Net::DNS version: 0.48

spamassassin --lint -D | grep Net::DNS | grep -i version
[50306] dbg: dns: Net::DNS version: 0.55
[50306] dbg: diag: module installed: Net::DNS, version 0.55



Re: Spamassassin Learn

2006-02-07 Thread Jim C. Nasby
On Tue, Feb 07, 2006 at 05:45:54PM -0600, mike wrote:
 Probably would work if you were running Linux.

The problem isn't that it isn't working, the problem is that it's
working too well. I guess maybe that's something you're not used to. :P


Re: Spamassassin Learn

2006-02-07 Thread mike

Jim C. Nasby wrote:


On Tue, Feb 07, 2006 at 05:45:54PM -0600, mike wrote:
 


Probably would work if you were running Linux.
   



The problem isn't that it isn't working, the problem is that it's
working too well. I guess maybe that's something you're not used to. :P
 



Something tells me if that were true you would not be in here asking 
questions but demoing howtos 


IE how to make SA work too well.  Whatever that is supposed to mean.



Re: Spamassassin Learn

2006-02-07 Thread Matt Kettler
Jim C. Nasby wrote:
 On Tue, Feb 07, 2006 at 05:36:56PM -0600, Jim C. Nasby wrote:
   
 On Tue, Feb 07, 2006 at 06:17:20PM -0500, Matt Kettler wrote:
 
 Jim C. Nasby wrote:
   
 Are there any autolearn strings? Are they all autolearn=no? are there 
 any
 decent number that are autolearn=failed or autolearn=disabled?

   
 grep -r autolearn caughtspam/ | grep -v 'Binary file' | sed -e
 's/.*autolearn=\([^ ]*\).*/\1/'|sort|uniq -c
 1545 no
  140 spam
4 unavailable
 
 Fair enough, that at least suggests that the autolearner is working. 
 However,
 that learning ratio is pretty low.

 Are you using network tests? Without DNSBLs it's often hard to get enough 
 header
 points to cause spam learning..
   
 I believe so...

 grep loadplugin /usr/local/etc/mail/spamassassin/init.pre
 # loadplugin Mail::SpamAssassin::Plugin::RelayCountry
 loadplugin Mail::SpamAssassin::Plugin::URIDNSBL
 loadplugin Mail::SpamAssassin::Plugin::Hashcash
 loadplugin Mail::SpamAssassin::Plugin::SPF

 grep -v # ~/.spamassassin/user_prefs | grep -v whitelist
 bayes_auto_learn 1
 bayes_auto_learn_threshold_spam 5.0
 

 Hmm... here's something interesting...

 grep -r autolearn pgsql/ | grep -v 'Binary file' | sed -e
 's/.*autolearn=\([^ ]*\).*/\1/' | sort | uniq -c
 2010 ham
  198 no
   17 unavailable

 So a big chunk of [EMAIL PROTECTED] email is being learned as ham.
 Looking further, I see...

 X-Spam-Status: No, score=-2.6 required=5.0 tests=BAYES_00 autolearn=ham
 version=3.1.0

 ISTM that having the thresholds setup so that BAYES_00 scores low enough
 to autolearn is a BadThing, as it creates a positive feedback loop. :)
 I've added bayes_auto_learn_threshold_nonspam -2.6 to my personal
 config; we'll see if that helps.
   

Jim,

Bayes is NOT used when calculating the autolearning score; that would
promote self-feedback. As I said before, the autolearner's concept of
score is VERY different from the final message score. Score
contributions from bayes, white/blacklists, and the AWL are all ignored
by the autolearner. It also looks up the individual rule scores from set
0 or 1 instead of 2 or 3. This is a MASSIVE difference.


However, the default nonspam autolearn threshold
(bayes_auto_learn_threshold_nonspam) is 0.1. That's a POSITIVE
threshold. To the autolearner that message scored 0 points. 0 is less
than 0.1, so it learned as HAM.
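
To make that concrete, a simplified sketch of the learner's view of the message quoted above. This is not SA's real implementation: the prefix list is an illustrative guess at which rule names to skip, the strict `<` comparison is an assumption, and the switch to score set 0 or 1 is left out entirely.

```python
# Illustrative only: rule-name prefixes treated as ignored by the autolearner.
IGNORED_PREFIXES = ("BAYES_", "AWL", "USER_IN_WHITELIST", "USER_IN_BLACKLIST")
DEFAULT_NONSPAM_THRESHOLD = 0.1

def learner_score(hits):
    """Sum rule scores, skipping bayes, AWL and white/blacklist rules."""
    return sum(score for rule, score in hits.items()
               if not rule.startswith(IGNORED_PREFIXES))

def learned_as_ham(hits, threshold=DEFAULT_NONSPAM_THRESHOLD):
    return learner_score(hits) < threshold

hits = {"BAYES_00": -2.6}  # the only test on the message quoted above
print(learner_score(hits))                    # 0
print(learned_as_ham(hits))                   # True: 0 is below 0.1
print(learned_as_ham(hits, threshold=-2.6))   # False: 0 is not below -2.6
```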

I'd suggest re-adjusting your threshold, as a default SpamAssassin config
will only VERY rarely generate a negative score to the autolearner. The
only rules that can do it are bondedsender, habeas COI/SOI and hashcash.
Hashcash is so rare it may as well not exist at present. BondedSender
and Habeas are only used by large legitimate mailers, so none of your
person-to-person mail will ever get autolearned in your current setup
unless you know someone who uses hashcash.




Re: Spamassassin Learn

2006-02-07 Thread jdow

From: Matt Kettler [EMAIL PROTECTED]


Jim C. Nasby wrote:

Are there any autolearn strings? Are they all autolearn=no? are there any
decent number that are autolearn=failed or autolearn=disabled?



grep -r autolearn caughtspam/ | grep -v 'Binary file' | sed -e
's/.*autolearn=\([^ ]*\).*/\1/'|sort|uniq -c
1545 no
 140 spam
   4 unavailable


Fair enough, that at least suggests that the autolearner is working. However,
that learning ratio is pretty low.

Are you using network tests? Without DNSBLs it's often hard to get enough header
points to cause spam learning..

(Note I use mailscanner, hence the odd log syntax)

grep is spam, /var/log/maillog |wc -l
  3434
grep is spam, /var/log/maillog|grep autolearn=spam |wc -l
  2766
grep is spam, /var/log/maillog|grep autolearn=not spam | wc -l
 0

So I'm autolearning about 80% of my tagged spam as spam, and none as ham.

I'm also autolearning about 38% of my nonspam as ham.

I'm using the default bayes_auto_learn_threshold_spam (12.0)

I'm also using modified bayes_auto_learn_threshold_nonspam (-0.01). I use this
coupled with a series of custom rules with tiny negative scores (all  -0.1).
This makes nonspam learning something that has to be minimally earned, not just
granted by virtue of a low score.


I wonder if he has greylisting turned on.
{^_^}


Mail::DomainKeys 0.80: Known bad with SA 3.1.0?

2006-02-07 Thread Larry Rosenman
I have run into an issue, that I think is SA's.

If I have Mail::DomainKeys 0.80 installed, SA's DomainKeys plugin can't find
Method 'header'.

Is this known?

Is a fix/patch available?

LER


-- 
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 512-248-2683 E-Mail: ler@lerctr.org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3683 US



RE: Mail::DomainKeys 0.80: Known bad with SA 3.1.0?

2006-02-07 Thread Matthew.van.Eerde
Larry Rosenman wrote:
 I have run into an issue, that I think is SA's.
 
 If I have Mail::DomainKeys 0.80 installed, SA's DomainKeys plugin
 can't find Method 'header'.
  
 Is this known?

Yup.

 Is a fix/patch available?

Yup.

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4623

(bug, patch attached)

-- 
Matthew.van.Eerde (at) hbinc.com   805.964.4554 x902
Hispanic Business Inc./HireDiversity.com   Software Engineer


Re: Spamassassin Learn

2006-02-07 Thread Jim C. Nasby
On Tue, Feb 07, 2006 at 07:59:37PM -0500, Matt Kettler wrote:
 Jim,
 
 Bayes is NOT used when calculating the autolearning score; that would
 promote self-feedback. As I said before, the autolearner's concept of
 score is VERY different from the final message score. Score
 contributions from bayes, white/blacklists, and the AWL are all ignored
 by the autolearner. It also looks up the individual rule scores from set
 0 or 1 instead of 2 or 3. This is a MASSIVE difference.
 
 
 However, the default autolearn threshold is 0.1. That's a POSITIVE
 threshold. To the autolearner that message scored 0 points. 0 is less
 than 0.1, so it learned as HAM.
 
 I'd suggest re-adjusting your threshold, as a default SpamAssassin config
 will only VERY rarely generate a negative score to the autolearner. The
 only rules that can do it are bondedsender, habeas COI/SOI and hashcash.
 Hashcash is so rare it may as well not exist at present. BondedSender
 and Habeas are only used by large legitimate mailers, so none of your
 person-to-person mail will ever get autolearned in your current setup
 unless you know someone who uses hashcash.

Ahh, got it. Makes much more sense. :)

So I guess either 0 or -0.1 makes the most sense?


Re: spam still isn't being caught much.

2006-02-07 Thread Daryl C. W. O'Shea

Michael W Cocke wrote:

On Tue, 07 Feb 2006 18:58:08 +0100, you wrote:
I know that SuSE had -L as default at one point in time. Just remove the 
'-L' part.


It still does as of 10.0, Bog knows why.  And as of 9.3 it was an
incredibly poor idea to allow YaST to update SpamAssassin.


I know RedHat's policy is to not have packages query outside services by 
default.  I'd imagine Novell has a similar policy.


Daryl



Re: message sneaking past

2006-02-07 Thread Julian Underwood
Hey guys, thanks for your replies, it's appreciated.

On Tue, 2006-02-07 at 10:44 -0800, Loren Wilton wrote:
  
 
 I can't seem to connect to your site, so I'll just assume that is a standard
 vertical drug spam.
 

Yep.  I've been getting weird Horizontal spams too which are slipping
by.

To answer Evan and Matt's question, I use MIMEDefang to send spams to
the spam box.  Again, most spam is tagged correctly and moved
accordingly.

 Or are you saying that when one of these puppies gets through and you go
 back later and test it it gets a very high score?

These spams do not get marked as spam, they are treated as if they are
regular e-mails (ham), despite the fact that when I check them later,
they get an /extremely/ high score.

 
  The only reason I can think that they may not be getting sent to our
  spam box is either SURBL scores aren't registering or somehow these
  types of messages can get around spamassassin... Could anyone shed some
  light on why these types of messages are getting by?
 
 The answer could be both.
 
 If you don't have sare_specific.cf (I believe it is) then these Leo drug
 spams will sail right past the SA standard rules.  Even with the sare rules
 it is a bit of a fight; Leo is pretty good about updating the format pretty
 frequently.

 Here's another example message:

http://168.100.199.67/message2.txt



 
 As for SURBL, it will certainly catch these - IF you aren't one of the first
 lucky winners that gets the initial batch before they can show up in SURBL.
 I suspect this is probably what is happening when you say they have a high
 score but sneak past.  They probably had a low score when they first showed
 up, and only have a high score now that you run it through by hand some
 hours (or even minutes) later.

Hmm... I don't feel so lucky. ;-)

I think the problem is SURBL points aren't being tallied or even
calculated when a spam first comes in, therefore these messages don't
get tagged.  I tested it by sending a URL to my organization which
grossly triggers SURBL, yet it goes through not being tagged as spam.

Any thoughts on how I could troubleshoot this?  And perhaps rectify it?
Maybe some log I could view?  The annoying thing is if I check a message
manually with spamassassin at the command-line, it calculates the points
correctly.


Thoughts/Suggestions?


Julian


 
 Grab the SARE rules and most of these will get caught I suspect.  However,
 if you are somehow unlucky enough to be on the leading edge of most batches,
 you will probably always have some leaking through until SURBL can catch up.
 
 Loren



Re: Spamassassin Learn

2006-02-07 Thread Matt Kettler
Jim C. Nasby wrote:
 On Tue, Feb 07, 2006 at 07:59:37PM -0500, Matt Kettler wrote:
   
 Jim,

 Bayes is NOT used when calculating the autolearning score; that would
 promote self-feedback. As I said before, the autolearner's concept of
 score is VERY different from the final message score. Score
 contributions from bayes, white/blacklists, and the AWL are all ignored
 by the autolearner. It also looks up the individual rule scores from set
 0 or 1 instead of 2 or 3. This is a MASSIVE difference.


 However, the default autolearn threshold is 0.1. That's a POSITIVE
 threshold. To the autolearner that message scored 0 points. 0 is less
 than 0.1, so it learned as HAM.

 I'd suggest re-adjusting your threshold, as a default SpamAssassin config
 will only VERY rarely generate a negative score to the autolearner. The
 only rules that can do it are bondedsender, habeas COI/SOI and hashcash.
 Hashcash is so rare it may as well not exist at present. BondedSender
 and Habeas are only used by large legitimate mailers, so none of your
 person-to-person mail will ever get autolearned in your current setup
 unless you know someone who uses hashcash.
 

 Ahh, got it. Makes much more sense. :)

 So I guess either 0 or -0.1 makes the most sense?
   
0 makes the most sense, unless you add on negative-scoring rules.  With
a default SA there's really no difference in autolearning threshold
between -1.3 and -0.1, and very little difference between -0.001 and -100.0.

Ignoring hashcash due to its rarity (bayes, the AWL, and all
whitelists can't count, so they are omitted):

There are 0 rules in SA that can get you a learning score at or below -8.001
There are only 3 rules in SA that can get you a learning score at or
below -2.3
There are only 7 rules in SA that can get you a learning score at or
below -0.1.
There are only 12 rules in SA that can get you a learning score at or
below -0.001.

The differences between the 4 cases are more or less moot. You won't
learn much ham at all.

Even if you consider hashcash, that's only another 5 rules, and only
applies when senders realize what hashcash even is.

I run my boxes with -0.01 as a threshold, but I've added on about 30
simple body-text rules looking for industry terminology for my
company's business and assigning -0.02 scores to them. This way I
autolearn any business-related mail without any real chance of a spammer
abusing them to whitelist himself. Even if a spam hit every single one of
my rules, it would only knock 0.6 points off the spam score.
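
The arithmetic behind that claim can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rule count (30) and per-rule score (-0.02) are the figures above, `learning_score` and `would_autolearn_ham` are hypothetical helper names, and the strict "less than" comparison follows the "0 is less than 0.1" wording earlier in this thread rather than anything checked against the SA source.

```python
from fractions import Fraction

# Figures taken from the message above; everything else is illustrative.
HAM_THRESHOLD = Fraction("-0.01")   # autolearn-as-ham threshold
RULE_SCORE = Fraction("-0.02")      # score assigned to each business-term rule
RULE_COUNT = 30                     # "about 30" rules


def learning_score(rules_hit, other_points=Fraction(0)):
    """Score as the autolearner would see it: business-rule hits plus any
    other points (bayes, AWL and whitelists are assumed already excluded)."""
    return rules_hit * RULE_SCORE + other_points


def would_autolearn_ham(score, threshold=HAM_THRESHOLD):
    """Assumed comparison: learn as ham when the score is below the threshold."""
    return score < threshold


one_hit = learning_score(1)               # a single business-term rule hit
no_hit = learning_score(0)                # no negative-scoring rule hit
worst_case = learning_score(RULE_COUNT)   # a spam that hits every rule

print(float(one_hit), would_autolearn_ham(one_hit))
print(float(no_hit), would_autolearn_ham(no_hit))
print(float(worst_case))
```

Exact fractions are used instead of floats so that 30 times -0.02 comes out as exactly -0.6.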

For reference, these are the only rules in a stock SA 3.1.0 that can
give you a negative learning score:

score HABEAS_ACCREDITED_COI 0 -8.0 0 -8.0
score RCVD_IN_BSP_TRUSTED 0 -4.3 0 -4.3
score HABEAS_ACCREDITED_SOI 0 -4.3 0 -4.3
score ALL_TRUSTED -1.360 -1.440 -1.665 -1.800
score RCVD_IN_IADB_VOUCHED 0 -1.825 0 -2.200
score HABEAS_CHECKED 0 -0.2 0 -0.2
score RCVD_IN_BSP_OTHER 0 -0.1 0 -0.1
score NO_RELAYS -0.001
score NO_RECEIVED -0.001
score DK_VERIFIED -0.001
score SPF_PASS -0.001
score SPF_HELO_PASS -0.001

score HASHCASH_20 -0.500
score HASHCASH_21 -0.700
score HASHCASH_22 -1.000
score HASHCASH_23 -2.000
score HASHCASH_24 -3.000
score HASHCASH_25 -4.000
score HASHCASH_HIGH -5.000
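
The counts quoted earlier (0, 3, 7 and 12 rules) can be reproduced from the non-hashcash score lines above with a short Python sketch. Assumptions: a four-value score line lists score sets 0 through 3 in order, a single value applies to every set, and the autolearner is reading set 1 (network tests on, bayes off). The helper names `autolearn_scores` and `rules_at_or_below` are made up for this example.

```python
# Non-hashcash score lines copied from the list above.
SCORE_LINES = """\
score HABEAS_ACCREDITED_COI 0 -8.0 0 -8.0
score RCVD_IN_BSP_TRUSTED 0 -4.3 0 -4.3
score HABEAS_ACCREDITED_SOI 0 -4.3 0 -4.3
score ALL_TRUSTED -1.360 -1.440 -1.665 -1.800
score RCVD_IN_IADB_VOUCHED 0 -1.825 0 -2.200
score HABEAS_CHECKED 0 -0.2 0 -0.2
score RCVD_IN_BSP_OTHER 0 -0.1 0 -0.1
score NO_RELAYS -0.001
score NO_RECEIVED -0.001
score DK_VERIFIED -0.001
score SPF_PASS -0.001
score SPF_HELO_PASS -0.001
"""


def autolearn_scores(lines, score_set=1):
    """Map rule name -> score in the chosen score set (assumed: set 1)."""
    scores = {}
    for line in lines.splitlines():
        parts = line.split()
        if len(parts) < 3 or parts[0] != "score":
            continue
        values = [float(v) for v in parts[2:]]
        # A single value applies to every score set.
        scores[parts[1]] = values[0] if len(values) == 1 else values[score_set]
    return scores


def rules_at_or_below(scores, limit):
    """Names of rules whose score alone is at or below the given limit."""
    return sorted(name for name, value in scores.items() if value <= limit)


scores = autolearn_scores(SCORE_LINES)
counts = [len(rules_at_or_below(scores, limit))
          for limit in (-8.001, -2.3, -0.1, -0.001)]
print(counts)
```

With score set 0 instead (network tests off) most of these rules score 0, which only strengthens the point that a stock install learns very little ham at a negative threshold.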





Re: message sneaking past

2006-02-07 Thread Daryl C. W. O'Shea

Julian Underwood wrote:


 Here's another example message:

http://168.100.199.67/message2.txt


This message contains both SURBL hits, a correct score total, and 
appropriate mark up.



I think the problem is SURBL points aren't being tallied or even
calculated when a spam first comes in, therefore these messages don't
get tagged.  I tested it by sending a URL to my organization which
grossly triggers SURBL, yet it goes through not being tagged as spam.

Any thoughts on how I could troubleshoot this?  And perhaps rectify it?
Maybe some log I could view?  The annoying thing is if I check a message
manually with spamassassin at the command-line, it calculates the points
correctly.


Sample messages that show it working, quite obviously, won't help to 
resolve your problem of it not working.


If you want to provide a sample, provide a copy of a mail that didn't 
work, that is, one that shows the problem.



In any case, I'll just guess.  Remove the -L from the SPAMDOPTIONS in 
/etc/sysconfig/spamassassin.  Restart spamd, and send me some food. :)



Daryl



Re: message sneaking past

2006-02-07 Thread jdow

From: Julian Underwood [EMAIL PROTECTED]


http://168.100.199.67/message.txt

Sorry if this is the third time I've posted this, I've been having
problems posting.  Anyhow, I've been receiving the above type of emails
(vertical drug advertisements) and they appear to receive a very high
score.  However they always seem to get past spamassassin--other spams
get tagged and redirected to our spam box fine.

The only reason I can think that they may not be getting sent to our
spam box is either SURBL scores aren't registering or somehow these
types of messages can get around spamassassin... Could anyone shed some
light on why these types of messages are getting by?

Thanks a lot,

Julian


1) You might unblock the address block from which this message comes.
  It is now a legally allocated block of dynamic addresses for the
  Ontario California area.

2) It obviously scores as spam. Are you expecting SpamAssassin to do
  more than score it? That is ALL that SpamAssassin ever does. Once
  SpamAssassin is done with it something else, procmail in my case,
  would be used to divert the mail if I did such a dastardly thing.
  (I sort it into a spam folder in my MUA, instead.)

3) Given the headers you saw fit to vouchsafe to our tender scrutiny
  there is no problem. Perhaps you are feeding it through SpamAssassin
  twice and we're only seeing here the results of one of the transits?

{^_^}


Re: message sneaking past

2006-02-07 Thread jdow

From: Julian Underwood [EMAIL PROTECTED]


Hey guys, thanks for your replies, it's appreciated.

On Tue, 2006-02-07 at 10:44 -0800, Loren Wilton wrote:
 


I can't seem to connect to your site, so I'll just assume that is a standard
vertical drug spam.



Yep.  I've been getting weird Horizontal spams too which are slipping
by.

To answer Evan and Matt's question, I use MIMEDefang to send spams to
the spam box.  Again, most spam is tagged correctly and moved
accordingly.


Or are you saying that when one of these puppies gets through and you go
back later and test it it gets a very high score?


These spams do not get marked as spam, they are treated as if they are
regular e-mails (ham), despite the fact that when I check them later,
they get an /extremely/ high score.


They very definitely get marked as spam. Something in your MIMEDefang
configuration is hosed. SpamAssassin can't fix that.


 The only reason I can think that they may not be getting sent to our
 spam box is either SURBL scores aren't registering or somehow these
 types of messages can get around spamassassin... Could anyone shed some
 light on why these types of messages are getting by?

The answer could be both.

If you don't have sare_specific.cf (I believe it is) then these Leo drug
spams will sail right past the SA standard rules.  Even with the sare rules
it is a bit of a fight; Leo is pretty good about updating the format pretty
frequently.


Here's another example message:

http://168.100.199.67/message2.txt


When you have your web access blocked to legal address blocks it is hard
for some of us to help you. I have to use an alternate address I can reach
and wget it from a shell. The piece of the message posted indicates it
was marked up. Your MIMEDefang setup is screwing you.


As for SURBL, it will certainly catch these - IF you aren't one of the first
lucky winners that gets the initial batch before they can show up in SURBL.
I suspect this is probably what is happening when you say they have a high
score but sneak past.  They probably had a low score when they first showed
up, and only have a high score now that you run it through by hand some
hours (or even minutes) later.


Hmm... I don't feel so lucky. ;-)

I think the problem is SURBL points aren't being tallied or even
calculated when a spam first comes in, therefore these messages don't
get tagged.  I tested it by sending a URL to my organization which
grossly triggers SURBL, yet it goes through not being tagged as spam.

Any thoughts on how I could troubleshoot this?  And perhaps rectify it?
Maybe some log I could view?  The annoying thing is if I check a message
manually with spamassassin at the command-line, it calculates the points
correctly.


The SURBL points are being tabulated. What do you think these lines mean?
===8---
X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 3.1.0 (2005-09-13) on fedora.LTP.LOCAL
X-Spam-Level: **
X-Spam-Status: Yes, score=18.4 required=4.0 tests=BAYES_95,NO_RECEIVED,
   NO_RELAYS,URIBL_AB_SURBL,URIBL_JP_SURBL,URIBL_OB_SURBL,URIBL_SC_SURB
   autolearn=no version=3.1.0
X-Spam-Report:
   * -0.0 NO_RELAYS Informational: message was not relayed via SMTP
   *  3.0 BAYES_95 BODY: Bayesian spam probability is 95 to 99%
   *  [score: 0.9881]
   *  3.8 URIBL_AB_SURBL Contains an URL listed in the AB SURBL blockli
   *  [URIs: asbiltasa.com-M]
   *  4.1 URIBL_JP_SURBL Contains an URL listed in the JP SURBL blockli
   *  [URIs: asbiltasa.com-M]
   *  3.0 URIBL_OB_SURBL Contains an URL listed in the OB SURBL blockli
   *  [URIs: asbiltasa.com-M]
   *  4.5 URIBL_SC_SURBL Contains an URL listed in the SC SURBL blockli
   *  [URIs: asbiltasa.com-M]
   * -0.0 NO_RECEIVED Informational: message has no Received headers
===8---
It is marked as spam four different ways. MIMEDefang is not properly
picking it up. Take the problem up with the MIMEDefang folks. Your
MIMEDefang setup may not even be using the same configuration as this
test message used. Without the complete message it is impossible to tell.
(Note that the linux mail program does not show you the complete
message. You must look at the raw mail file and clip from that.)
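
For anyone who wants to check this on a raw mail file rather than by eye, here is a rough Python sketch that unfolds an X-Spam-Status value and lists which URIBL tests fired. The header text is the one quoted above, truncation included. The function name `parse_spam_status` and the regular expressions are my own assumptions about the header layout, not anything taken from SpamAssassin or MIMEDefang.

```python
import re

# X-Spam-Status value copied from the message above (truncated rule name kept as-is).
STATUS = (
    "Yes, score=18.4 required=4.0 tests=BAYES_95,NO_RECEIVED,\n"
    "   NO_RELAYS,URIBL_AB_SURBL,URIBL_JP_SURBL,URIBL_OB_SURBL,URIBL_SC_SURB\n"
    "   autolearn=no version=3.1.0"
)


def parse_spam_status(value):
    """Return score, required score and the list of tests from a folded header value."""
    # Unfold: a line break followed by whitespace continues the same header.
    unfolded = re.sub(r"\r?\n[ \t]+", " ", value.strip())
    score = float(re.search(r"score=(-?[\d.]+)", unfolded).group(1))
    required = float(re.search(r"required=(-?[\d.]+)", unfolded).group(1))
    # The test list runs up to the next "key=" field or the end of the header.
    tests_field = re.search(r"tests=(.*?)(?:\s+\w+=|$)", unfolded).group(1)
    tests = [name.strip() for name in tests_field.split(",") if name.strip()]
    return {"score": score, "required": required, "tests": tests}


status = parse_spam_status(STATUS)
uribl_hits = [name for name in status["tests"] if name.startswith("URIBL_")]
print(status["score"], status["required"], uribl_hits)
```

If the copy of a message that reached the inbox shows no URIBL_ names here while a later manual run does, the network tests were not being run (or had nothing to hit yet) at delivery time.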

{^_^}


Re: message sneaking past

2006-02-07 Thread Daryl C. W. O'Shea

jdow wrote:

From: Julian Underwood [EMAIL PROTECTED]

To answer Evan and Matt's question, I use MIMEDefang to send spams to
the spam box.  Again, most spam is tagged correctly and moved
accordingly.


Or are you saying that when one of these puppies gets through and you go
back later and test it it gets a very high score?


These spams do not get marked as spam, they are treated as if they are
regular e-mails (ham), despite the fact that when I check them later,
they get an /extremely/ high score.


They very definitely get marked as spam. Something in your MIMEDefang
configuration is hosed. SpamAssassin can't fix that.


I missed that you're using MIMEDefang.  You must set $SALocalTestsOnly 
to zero in your MIMEDefang configuration.


Daryl



Re: message sneaking past SOLVED

2006-02-07 Thread Julian Underwood
On Tue, 2006-02-07 at 22:51 -0500, Daryl C. W. O'Shea wrote:
 jdow wrote:
  From: Julian Underwood [EMAIL PROTECTED]
  To answer Evan and Matt's question, I use MIMEDefang to send spams to
  the spam box.  Again, most spam is tagged correctly and moved
  accordingly.
 
  Or are you saying that when one of these puppies gets through and you go
  back later and test it it gets a very high score?
 
  These spams do not get marked as spam, they are treated as if they are
  regular e-mails (ham), despite the fact that when I check them later,
  they get an /extremely/ high score.
  
  They very definitely get marked as spam. Something in your MIMEDefang
  configuration is hosed. SpamAssassin can't fix that.
 
 I missed that you're using MIMEDefang.  You must set $SALocalTestsOnly 
 to zero in your MIMEDefang configuration.
 
 Daryl

Daryl,

That was it!  Thanks a bunch for everyone helping, jdow, etc.

Julian




Re: Spamassassin Learn

2006-02-07 Thread Gene Heskett
On Tuesday 07 February 2006 15:27, Clay Davis wrote:
Does anyone have any good techniques for capturing a sample of ham
 that can be used as the ham corpus.  I'm in a corporate environment
 and am not keen on the idea of intercepting non-spam messages.  I
 will if I have to, but was hoping someone had a better idea.

I wouldn't have too guilty a conscience on that subject because
generally, you won't be reading very much other than intercepted spam.
There may be an FP in there occasionally, but you'll soon learn to
catch those and feed them to the ham learner and hence move them to the
correct mailbox folder.  In other words, to make an omelette, you
normally have to break a few eggs.  What you accidentally read in an FP
should be treated with the usual amount of salt and otherwise
forgotten.

Regards,
Clay

 On 2/7/2006 at 3:16 pm, in message [EMAIL PROTECTED],
 Matt Kettler

[EMAIL PROTECTED] wrote:
 [EMAIL PROTECTED] wrote:
 Can you just feed spamassassin spam or do you need to give it ham
 also?

 I read the docs and it didn't say you had to feed it ham.

 I then read another doc and it said you should feed it equal
 amounts of spam and ham.

 Yes, you really should feed it both. You also should strive for a
 1:1 ratio of
 spam and nonspam, but don't kill yourself to get there.

 SA's use of chi-squared combining makes it very tolerant of wild
 imbalances in
 training. However, the closer you are to a 1:1 ratio the better SA
 will be able
 to distinguish tokens that are present in both kinds of mail and
 ignore them. So
 this is a worthwhile goal to strive for as long as it doesn't become
 a burden.

 My current training ratio is about 7:1 spam:nonspam, but in the past
 it's been
 as bad as 20:1. Both of those are very far off from equal amounts,
 but the imbalance has never caused me any problems.

 From my sa-learn --dump magic output as of today:
 0.000  0 995764  0  non-token data: nspam
 0.000  0 145377  0  non-token data: nham

 That works out to a ratio of 6.85:1
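
 As a quick check of that figure, a small Python sketch can pull the two counters out of the dump lines above and compute the ratio. It assumes the count is the third whitespace-separated field and the counter name is the last one, as in the two lines shown; `training_counts` is a hypothetical helper name.

```python
# The two lines quoted above from `sa-learn --dump magic`.
DUMP = """\
0.000  0 995764  0  non-token data: nspam
0.000  0 145377  0  non-token data: nham
"""


def training_counts(dump_text):
    """Return {'nspam': ..., 'nham': ...} from sa-learn magic dump text."""
    counts = {}
    for line in dump_text.splitlines():
        fields = line.split()
        # Assumed layout: the count is the third field, the name is the last.
        if len(fields) >= 4 and fields[-1] in ("nspam", "nham"):
            counts[fields[-1]] = int(fields[2])
    return counts


counts = training_counts(DUMP)
ratio = counts["nspam"] / counts["nham"]
print(f"{ratio:.2f}:1")
```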

-- 
Cheers, Gene
People having trouble with vz bouncing email to me should add the word
'online' between the 'verizon', and the dot which bypasses vz's
stupid bounce rules.  I do use spamassassin too. :-)
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2006 by Maurice Eugene Heskett, all rights reserved.


getmail?

2006-02-07 Thread Gene Heskett
Greetings all;

I just stumbled over an announcement on freshmeat about getmail as a
substitute for fetchmail, but from looking at the web page and FAQ, it's
not clear if getmail can both filter by passing the incoming mail thru
SA, and put it in the /var/spool/user mailfile format that kmail
expects to retrieve incoming messages from.  I'd expect it needs a
scratchfile someplace in order to do the piping thru spamc, but it's
not at all clear from the web page.

It's my intention to move the spamc function from being used as a
filter via a pipe from one of kmail's filter settings, to a function of
getmail which would be asynch from kmail, possibly making kmail
many times more responsive, as its UI is locked for many seconds at a
time while this filter is waiting for spamc to finish.  If getmail can
run in the background like fetchmail does now, I hopefully wouldn't have
the lags slap me in the face.  As it is, kmail can hang long enough for me
to type a line and a half, and while that's disconcerting, the slow
cursor movements when attempting to go back and fix a typo are 7000%
exasperating.

If this transfer of jobs can be done, then kmail's filters would be
reduced to just sorting the mail to the right folder, something I think
it can do many times faster than it does now since it's always waiting on
spamc.

Does anyone here have any experience with previous versions of this 
utility?  And if so, any hints to toss my way?
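
To make the idea concrete, here is a rough Python sketch of the background job described above: pipe one raw message through spamc and append the result to an mbox file, with no mail client involved. This is not getmail's own mechanism or configuration. The function name `filter_and_deliver` is hypothetical, the spool file is assumed to be a standard mbox, and spamc is assumed to read the message on stdin and write the processed copy to stdout.

```python
import mailbox
import subprocess


def filter_and_deliver(raw_message, mbox_path, filter_cmd=("spamc",)):
    """Pipe one raw message (bytes) through filter_cmd and append the
    filtered result to the mbox file at mbox_path."""
    # Run the filter; check=True raises if the command exits non-zero.
    result = subprocess.run(
        list(filter_cmd),
        input=raw_message,
        stdout=subprocess.PIPE,
        check=True,
    )
    box = mailbox.mbox(mbox_path)  # created if it does not exist yet
    box.lock()                     # avoid clashing with a reader of the spool
    try:
        box.add(result.stdout)
        box.flush()
    finally:
        box.unlock()
        box.close()
```

A fetch tool running in the background would call something like this once per retrieved message, so the only spamc wait happens outside the mail client.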

-- 
Cheers, Gene


Re: getmail?

2006-02-07 Thread Craig White
On Wed, 2006-02-08 at 01:10 -0500, Gene Heskett wrote:
 Greetings all;
 
 I just stumbled over an announcement on freshmeat about getmail as a
 substitute for fetchmail, but from looking at the web page and FAQ, it's
 not clear if getmail can both filter by passing the incoming mail thru
 SA, and put it in the /var/spool/user mailfile format that kmail
 expects to retrieve incoming messages from.  I'd expect it needs a
 scratchfile someplace in order to do the piping thru spamc, but it's
 not at all clear from the web page.
 
 It's my intention to move the spamc function from being used as a
 filter via a pipe from one of kmail's filter settings, to a function of
 getmail which would be asynch from kmail, possibly making kmail
 many times more responsive, as its UI is locked for many seconds at a
 time while this filter is waiting for spamc to finish.  If getmail can
 run in the background like fetchmail does now, I hopefully wouldn't have
 the lags slap me in the face.  As it is, kmail can hang long enough for me
 to type a line and a half, and while that's disconcerting, the slow
 cursor movements when attempting to go back and fix a typo are 7000%
 exasperating.
 
 If this transfer of jobs can be done, then kmail's filters would be
 reduced to just sorting the mail to the right folder, something I think
 it can do many times faster than it does now since it's always waiting on
 spamc.
 
 Does anyone here have any experience with previous versions of this 
 utility?  And if so, any hints to toss my way?

Personally, I think you should handle your email entirely differently. I
have gathered that you use a number of computers around your house and
thus, the logical method that I see is to set one computer up - not your
primary desktop...as a mail server and have an MTA (sendmail or postfix)
and an IMAP server (dovecot or cyrus) and run fetchmail and spamassassin
on that system too. Mail would get retrieved by this system on very
frequent intervals like you are doing now, it would be analyzed by SA
and delivered to your IMAP spool or maildir or cyrus mailstore.

This would allow a lot more flexibility...

You could then use any computer to read/respond to mail and with IMAP,
if you have read it/deleted it on one system, all mail clients on any
computer would likewise see the same.

Computer sluggishness from things like SA would not be apparent as they
aren't occurring on your desktop system.

Ultimately, using fetchmail or getmail or whatever mail retrieval tool
you use isn't likely to make much of a difference...segregate your
services and don't force your desktop computer to do everything.

This really has little to do with spamassassin so I hesitate to go on.

Craig



Re: getmail?

2006-02-07 Thread Gene Heskett
On Wednesday 08 February 2006 01:33, Craig White wrote:
On Wed, 2006-02-08 at 01:10 -0500, Gene Heskett wrote:
 Greetings all;

 Does anyone here have any experience with previous versions of this
 utility?  And if so, any hints to toss my way?


Personally, I think you should handle your email entirely differently.
 I have gathered that you use a number of computers around your house
 and thus, the logical method that I see is to set one computer up -
 not your primary desktop...as a mail server and have an MTA (sendmail
 or postfix) and an IMAP server (dovecot or cyrus) and run fetchmail
 and spamassassin on that system too. Mail would get retrieved by this
 system on very frequent intervals like you are doing now, it would be
 analyzed by SA and delivered to your IMAP spool or maildir or cyrus
 mailstore.

A good idea, if the firewall box had the cojones to do it.  It's a 500 MHz
k6-III, and already running the firewall duties fairly fast, but adding
that load to it does concern me somewhat.  As it is, its job is fairly
well defined and easy to maintain.  My contention is that things would
be improved immensely if the SA functions were removed from the
threadless execution kmail does.  Stuff that SA does is totally
charged against kmail's time at the cpu trough, and moving it to a
background task seems like a desirable thing.

That's not saying your idea is not a good one, it is, but that now 8-year-old
box would need a heart transplant to do it all, I fear.

This would allow a lot more flexibility...

You could then use any computer to read/respond to mail and with IMAP,
if you have read it/deleted it on one system, all mail clients on any
computer would likewise see the same.

That would be an advantage in that I could then do email from the shop 
box, probably while emc is carving parts on my milling machine.  As it 
is, I spend entirely too many hours in this chair when I could be up 
doing other things.

I'll think about it, maybe even do it.  If the box isn't up to it, that's
a good excuse to upgrade it, right? :)

Computer sluggishness from things like SA would not be apparent as
 they aren't occurring on your desktop system.

Ultimately, using fetchmail or getmail or whatever mail retrieval tool
you use isn't likely to make much of a difference...segregate your
services and don't force your desktop computer to do everything.

This really has little to do with spamassassin so I hesitate to go on.

Well, SA is the main cause of the lag, so it's almost germane. :)

Craig

-- 
Cheers, Gene