test my bleeding edge broken code. with your finger!

2006-08-24 Thread Faisal N Jawdat
two bits of sa related code i've written, neither of them are what  
i'd particularly call "polished", but if you feel like firing them  
up, i'd love to hear your feedback:


Phisher:
http://www.faisal.com/software/phisher/
This is a plugin that does nothing more complicated than check for  
the case of something like <a href="http://scam.ru">www.paypal.com</a>.  I've run it on and off since August of last year, although most  
of the time was not after 3.1.1 (which is why I only claim it works  
on 3.1).  I don't have a suggested score for it (would love feedback  
there).  I ran it at .1 mostly to see how much it triggered and fp'd  
(not much, as it turns out.  I know this has been a problem in the  
past, so I'm wondering if the normalization code helps there, or I've  
just been lucky).  As noted, this has some rewrite bits coming when I  
get some time.



sa-harvest:
http://www.faisal.com/software/sa-harvest/
This is a script that does several obvious things and one possibly  
not-so-obvious thing:


- You configure it, telling it what your spam and ham folders are,  
and after that it will automatically train whenever you invoke it,  
without having to explicitly configure folders to scan (I find this  
useful for cron jobs, and less typing when I'm doing the same obvious  
thing every couple days).
- It also scans your ham boxes and automatically rebuilds your  
whitelist based on the contents of presumed good folders (this will  
mangle your user_prefs.  READ THE DOCS ON HOW THIS WORKS SO YOU DON'T  
LOSE OTHER SETTINGS.)


I've been using variants of this script since about a week after the  
first SA with training came out.  I finally generalized it a little  
last month, and have been running it nightly via cron ever since.



Feedback would be greatly appreciated.

-faisal



Re: Training Spamassassin

2006-08-24 Thread Edward Diener

John D. Hardin wrote:
> On Thu, 24 Aug 2006, Edward Diener wrote:
>
>> Is this true ? Am I supposed to be putting copies of messages
>> which Spamassassin has not marked as spam and which are not spam
>> into my 'ham-to-learn' folder, as opposed to messages which
>> Spamassassin has erroneously marked as spam ?
>
> That is true. Think of it as training SA in what good mail looks like
> rather than correcting misclassifications. Give it a good taste of
> what legitimate mail looks like and it will do a good job of
> recognizing it in the future.


OK, thanks, I will train it with good e-mails and I hope it will 
recognize the spam after awhile.




Re: How to whitelist_from <> ?

2006-08-24 Thread Matt Kettler
Philip Prindeville wrote:
>
> There's no way to whitelist just the empty address then?  Rather than
> everything?
>
> -Philip
>
>   
Not given the simple file-glob format of the whitelist commands. You'd
need a regular expression and negation.

You could do it with a rule...

header __NULL_RETURN   From !~   /./i
header __RCVD_MYHOST   Received =~ //
meta MY_NULL_RETURN   (__NULL_RETURN && __RCVD_MYHOST)
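
To see why a glob cannot single out the empty address while a negated regex can, here is a minimal Python sketch. It uses `fnmatchcase` only as an analogy for the whitelist glob format (SpamAssassin's own matcher is not fnmatch), and the helper name `is_null_sender` is made up for illustration.

```python
import re
from fnmatch import fnmatchcase

# A glob like "*" matches every address, including the empty one,
# so it cannot express "only the empty address".
glob_hits = [fnmatchcase(addr, "*") for addr in ("", "user@example.com")]

def is_null_sender(header_value: str) -> bool:
    """Same idea as `From !~ /./`: true only when no character matches."""
    return re.search(r".", header_value) is None

print(glob_hits)                           # the glob accepts both values
print(is_null_sender(""))                  # empty value
print(is_null_sender("user@example.com"))  # non-empty value
```

The negation is what does the work: `/./` matches any value with at least one character, so "does not match" leaves only the empty value.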





Re: bayes autolearn acting up

2006-08-24 Thread lists


On Aug 24, 2006, at 10:11 AM, [EMAIL PROTECTED] wrote:

>> Since upgrading to 3.1.4, when I turn on bayes auto-learn with:
>>
>> bayes_auto_learn 1
>>
>> and I set the learn boundaries with:
>>
>> bayes_auto_learn_threshold_nonspam -3.5
>> bayes_auto_learn_threshold_spam    15.5
>>
>> I get unexpected auto-learning.  Example:  I just saw a spam come
>> through that scored 9.9, which is enough for it to be tagged as spam,
>> but it should not be auto-learned as spam.  But, in the header it
>> clearly reads:
>>
>> X-Spam-Status: Yes, score=9.9 required=5.0 tests=AWL,BAYES_99,
>> DATE_IN_PAST_03_06,DCC_CHECK,DIGEST_MULTIPLE,HTML_40_50,HTML_MESSAGE,
>> MIME_HTML_ONLY,RAZOR2_CHECK,RCVD_IN_WHOIS_INVALID autolearn=spam
>> version=3.1.4
>>
>> Any ideas?
>
> SA does not autolearn based on the final message score. So, toss the 9.9
> out the window. That's not the number SA compares to the 15.5.
>
> For learning SA uses what the message score would have been if: 1) the
> AWL is off. 2) Bayes was disabled, including shifting what scoreset is
> used for all the other rules. 3) all white/blacklists are disabled. This
> is often *quite* different from the final score.
>
> However, in this case I don't entirely understand... The default SA 3.1
> scores are:
>
> score DATE_IN_PAST_03_06 0.736 0 1.122 0.478
> score DCC_CHECK 0 1.37 0 2.17
> score DIGEST_MULTIPLE 0 0.233 0 0.765
> score HTML_40_50 0.611 0 0.497 0.496
> score HTML_MESSAGE 0.001
> score MIME_HTML_ONLY 0.414 0.001 0.389 0.001
> score RAZOR2_CHECK 0 0.5 0 0.5
> score RCVD_IN_WHOIS_INVALID 0 2.151 0 2.234
>
> Adding the set1 scores up, the learning score should have been 4.753.
>
> Have you modified any rule scores?
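
The scoreset arithmetic can be checked with a short Python sketch. It assumes the four columns are scoresets 0 to 3 in order, that set 1 is "network tests on, Bayes off", and that a single value applies to every set; the names `SCORES`, `score_for` and `learning_score` are invented for this illustration. As listed, the set1 column sums to about 4.256 rather than the 4.753 quoted (the difference, 0.497, is the set2 value of HTML_40_50). Either figure is far below the 15.5 threshold, so the puzzle stands.

```python
# Default scores as quoted in the thread: (set0, set1, set2, set3),
# or a single value that applies to all four scoresets.
SCORES = {
    "DATE_IN_PAST_03_06": (0.736, 0, 1.122, 0.478),
    "DCC_CHECK": (0, 1.37, 0, 2.17),
    "DIGEST_MULTIPLE": (0, 0.233, 0, 0.765),
    "HTML_40_50": (0.611, 0, 0.497, 0.496),
    "HTML_MESSAGE": (0.001,),
    "MIME_HTML_ONLY": (0.414, 0.001, 0.389, 0.001),
    "RAZOR2_CHECK": (0, 0.5, 0, 0.5),
    "RCVD_IN_WHOIS_INVALID": (0, 2.151, 0, 2.234),
}

def score_for(rule: str, scoreset: int) -> float:
    """Pick the score for one rule in the given scoreset."""
    values = SCORES[rule]
    return values[0] if len(values) == 1 else values[scoreset]

def learning_score(rules, scoreset: int = 1) -> float:
    """Sum the hits; AWL and BAYES_* are left out, as the reply describes."""
    return sum(score_for(rule, scoreset) for rule in rules)

set1_total = learning_score(SCORES)
print(round(set1_total, 3))
```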


Here's another example:

X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 3.1.4 (2006-07-25) on localhost
X-Spam-Level: *
X-Spam-Status: Yes, score=6.0 required=5.0 tests=ADVANCE_FEE_1,BAYES_95,
DNS_FROM_RFC_ABUSE,DNS_FROM_RFC_POST,SPF_HELO_PASS autolearn=spam
version=3.1.4


I just can't see why it is autolearning everything that is tagged as spam.


Regards,
Devin


RE: Adding 'SA scores' to all incoming mails

2006-08-24 Thread Michael Grey

Why is it 'better' ? I didn't say it was... 

Simply one of the possible approaches to getting the full headers.

Michael Grey


-Original Message-
From: John D. Hardin [mailto:[EMAIL PROTECTED] 
Sent: Thursday, August 24, 2006 3:14 PM
To: Michael Grey
Cc: users@spamassassin.apache.org
Subject: RE: Adding 'SA scores' to all incoming mails

On Thu, 24 Aug 2006, Michael Grey wrote:

> In this example, all emails get an additional header :
> 
> X-Spam-score-breakdown calvin score 6.77/4.5  
> 
> "add_header all score-breakdown calvin score _HITS_/_REQD_ "

And that's better than this:

X-Spam-Status: No, score=3.5 required=5.0 tests=BAYES_50,FROM_EXCESS_QP,
FROM_SUBDOMAIN,HTML_COMMENTS,HTML_EMBED_IMG_04,HTML_MESSAGE,
HTML_MIME_NO_HTML_TAG,MIME_HTML_ONLY,SARE_UNSUB38D,SPF_PASS,
SUBJECT_EXCESS_QP autolearn=disabled version=3.1.3

how?

I'm not saying it shouldn't be done, but that the scores and rule hits
are *already there* so why paste them in yet again?

> On Thu, 24 Aug 2006, list wrote:
> 
> > I'd like SA to make an extra line/section under all my mails where it 
> > tells what score the mail got (or maybe even which rules scored on the 
> > mail). Is there such a setting?
> > 
> > it would help me to finetune my SA.
> 
> You mean, actually paste the score into the body or attach it as
> another MIME body part?
> 
> Are the X-Spam-* headers not sufficient?

--
 John Hardin KA7OHZ    ICQ#15735746    http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]    FALaholic #11174    pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 - 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Taking my gun away because I *might* shoot someone is like cutting
  my tongue out because I *might* yell "Fire!" in a crowded theater.
  -- Peter Venetoklis
---
 26 days until Talk Like a Pirate day



Re: Training Spamassassin

2006-08-24 Thread John D. Hardin
On Thu, 24 Aug 2006, Edward Diener wrote:

> Is this true ? Am I supposed to be putting copies of messages
> which Spamassassin has not marked as spam and which are not spam
> into my 'ham-to-learn' folder, as opposed to messages which
> Spamassassin has erroneously marked as spam ?

That is true. Think of it as training SA in what good mail looks like
rather than correcting misclassifications. Give it a good taste of
what legitimate mail looks like and it will do a good job of
recognizing it in the future.

--
 John Hardin KA7OHZ    ICQ#15735746    http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]    FALaholic #11174    pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 - 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  America is an amazing country. We spend billions of dollars sending
  troops halfway around the world to evict a sadistic dictator and
  bring freedom to the people of a foreign country, while at the same
  time working to build a police state at home.
---
 26 days until Talk Like a Pirate day



Re: Filtering Aliases/Forwarders

2006-08-24 Thread jdow

Restart if you are using it daemonized one way or another.

{^_^}
- Original Message - 
From: "DuBois, Joseph" <[EMAIL PROTECTED]>



OK, I added the score to the header, so I now have the following set of
rules (below). I also added the case-insensitive expression.

I also found out the version information. Now, do I have to restart
anything when I adjust rules?
spamassassin -V
SpamAssassin version 3.1.4
running on Perl version 5.8.7


body LOCAL_DEMONSTRATION_RULE   /spam/i
score LOCAL_DEMONSTRATION_RULE 6.0
describe LOCAL_DEMONSTRATION_RULE   This is a simple test rule
header LOCAL_DEMONSTRATION_SUBJECT  Subject =~ /test/i
score LOCAL_DEMONSTRATION_SUBJECT   2
required_score 5
rewrite_header subject * Rated SPAM: Junk This! _SCORE(00)_  *


Thanks again for help
Joe


Re: catch-all = sa-learn folder

2006-08-24 Thread John D. Hardin
On Thu, 24 Aug 2006, Chris Mills (Chrysalis) wrote:

> Idea, I have 100 domains on the same server, for all of which I
> had deleted the catch-all accounts. How about I recreate the catch
> all for all 100 domains, and point them all to the one single pop
> mail account and then run sa-learn on that pop account to which
> all this catch all junk mail would have been saved? Sounds like a
> good idea to me! Let me know what you all think. I think this
> should help significantly with the training process.

Don't do it without manual review first.




RE: Filtering Aliases/Forwarders

2006-08-24 Thread DuBois, Joseph
OK, I added the score to the header, so I now have the following set of
rules (below). I also added the case-insensitive expression.

I also found out the version information. Now, do I have to restart
anything when I adjust rules?
spamassassin -V
SpamAssassin version 3.1.4
running on Perl version 5.8.7


body LOCAL_DEMONSTRATION_RULE   /spam/i
score LOCAL_DEMONSTRATION_RULE 6.0
describe LOCAL_DEMONSTRATION_RULE   This is a simple test rule
header LOCAL_DEMONSTRATION_SUBJECT  Subject =~ /test/i
score LOCAL_DEMONSTRATION_SUBJECT   2
required_score  5
rewrite_header subject * Rated SPAM: Junk This! _SCORE(00)_  *


Thanks again for help
Joe
 


RE: Filtering Aliases/Forwarders

2006-08-24 Thread John D. Hardin
On Thu, 24 Aug 2006, DuBois, Joseph wrote:

> So I can only think, that one it is not parsing Alias/forward
> emails?

Well, check for that. Do the messages have any X-Spam-* headers that
imply SA on that machine has seen the messages? There should be
*something* there if SA processed the message, regardless of what the
message scored.

If there are no X-Spam-* headers, then SA likely is not touching the
messages.
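
As a rough way to run that check on a saved message, the following Python sketch lists any X-Spam-* headers using the standard `email` module. The sample message and the helper name `spam_headers` are made up for illustration; a real test would read the raw message as delivered to the final mailbox.

```python
from email import message_from_string

def spam_headers(raw_message: str) -> dict:
    """Return the X-Spam-* headers found in a raw RFC 822 message."""
    msg = message_from_string(raw_message)
    return {
        name: value
        for name, value in msg.items()
        if name.lower().startswith("x-spam-")
    }

# Hypothetical sample: one message that SA has tagged, one it never saw.
scanned = "X-Spam-Status: No, score=3.5 required=5.0\nSubject: TEST\n\nSPAM\n"
untouched = "Subject: TEST\n\nSPAM\n"

print(spam_headers(scanned))    # at least one X-Spam-* header present
print(spam_headers(untouched))  # empty: SA likely never touched it
```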




Re: Filtering Aliases/Forwarders

2006-08-24 Thread jdow

nb - I typoed the score rule. Should be:
rewrite_header subject * Rated SPAM: _SCORE(00)_ *
                                               ^ I left that off.
sorry.

And, of course, the rule did not fire. You are testing for "spam"
and you sent "SPAM". If you want a case insensitive test then you
need:
body LOCAL_DEMONSTRATION_RULE   /spam/i
                                      ^ makes it case insensitive

And you tested case insensitive for "test" and sent "TEST". 
That should have fired and probably did.
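
The effect of the trailing `i` can be reproduced outside SpamAssassin. This is only a Python analogy for the Perl-style patterns in the rules (`re.IGNORECASE` stands in for `/i`), using the test message described in the thread: subject TEST, body SPAM.

```python
import re

body = "SPAM"
subject = "TEST"

body_rule_strict = re.compile(r"spam")                # like /spam/
body_rule_nocase = re.compile(r"spam", re.IGNORECASE) # like /spam/i
subject_rule = re.compile(r"\btest\b", re.IGNORECASE) # like /\btest\b/i

strict_hit = body_rule_strict.search(body) is not None
nocase_hit = body_rule_nocase.search(body) is not None
subject_hit = subject_rule.search(subject) is not None

print(strict_hit, nocase_hit, subject_hit)
```

With only the subject rule firing, the message scores 2 against a required score of 5, which would explain why the subject was not rewritten.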


{^_^}

- Original Message - 
From: "DuBois, Joseph" <[EMAIL PROTECTED]>

To: 
Sent: Thursday, August 24, 2006 06:35
Subject: RE: Filtering Aliases/Forwarders


Joanne,

Thanks for info, yeah saw the variables I could make substitutions on
and will probably do that once I get it up and running so I can make
better rules, but I am just trying to get it running right now.

For my tests, I am just trying to get it to work. I was sending emails
to myself from work to home accounts with the proper stuff to set off
the rules, but for some reason it doesn't seem to be catching it (and
rewriting the subjects).

For example, I sent an email with the subject TEST and SPAM in the body.
I would expect to receive the email at my home email account with the
subject line rewritten, but it's not happening. The hosting provider
won't give me any help, thus turning to the list. Once I get it working
I will start writing more unique rules to try and filter out all the
truly junk mail I get. Verizon has already shut off my emails once,
because I have about 23 public email accounts forwarding to a single
account, so I really need to start cleaning it up.

On the CPANEL, it says it's enabled, and that's about the only status I
seem to be able to check. It does have a "configuration" button which
brings up a form, where I can see that my local rules are set (from my
user_prefs file). So it seems to be at least reading that. Not sure
which version we are running, but I sent a request to the hosting
provider to get that information along with any other configuration
information he can give me, since he will not help me.

So I can only think, that one it is not parsing Alias/forward emails? Or
something else is wrong.

Again any help in getting this running is greatly appreciated! 


Thanks all.


-Original Message-
From: jdow [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, August 23, 2006 6:49 PM

To: users@spamassassin.apache.org
Subject: Re: Filtering Aliases/Forwarders

Joseph, may I make a slight suggestion for you?

For the rewrite try something about the same size that makes eyeball
filtering ham out of the spam folder much easier:
rewrite_header subject * Rated SPAM: _SCORE(00) *

Then the header subject will start with something like this:
"* Rated SPAM: 019.8 *". It'll be followed by the original subject, of
course. You filter to spam if "* Rated SPAM:" is seen. And you can sort
by subject to bring the low scores to the top.
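
The reason the zero padding matters for sorting can be shown with a small Python sketch. It assumes the padding behaves as described in this thread (19.8 rendered as 019.8 with a two-character pad); `padded_score` is a hypothetical helper, not SpamAssassin code.

```python
def padded_score(score: float, pad: str = "00") -> str:
    """Left-pad a one-decimal score so that 19.8 becomes '019.8'."""
    text = f"{score:.1f}"
    width = len(pad) + 3  # assumption: the pad covers the hundreds and tens digits
    return text.rjust(width, pad[0]) if pad else text

scores = [19.8, 5.3, 102.0]

# Sorting subjects is a plain string sort, so unpadded scores interleave.
unpadded_order = sorted(f"{s:.1f}" for s in scores)
padded_order = sorted(padded_score(s) for s in scores)

print(unpadded_order)  # string order, not numeric order
print(padded_order)    # string order now agrees with numeric order
```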

And for demonstration or test rules I'd use low scores unless you
specifically wanted to see a hit. Then I'd search for something
gibberish in the text. Hm, actually I wonder if "gibberish" itself would
be a safe rule for testing. It almost never appears in normal mail and
spammers USUALLY are averse to calling their mail gibberish.
So {^_-} The scores I use for rule testing are in the 0.001 to
0.1 range. Once they look good I give them real scores.


{^_^}   Joanne
- Original Message -
From: "DuBois, Joseph" <[EMAIL PROTECTED]>


Well met,

Just activated SpamAssassin on my website (by my web hosting provider)
and wanted to do some simple tests which I read from the Wiki site and
FAQ. When it didn't run I opened a ticket with my provider and he said
he didn't support it and I needed to find help elsewhere. So here I am.
Right now, I'm just trying some simple tests to get my
Aliases/Forwarders (which get sent through my site) and forwarded onto
my ISP provider's email account.

i.e. a public email [EMAIL PROTECTED] would get forwarded onto my
local isp provider at verizon, or comcast depending on who I have for a
particular month, so that way I don't have to change my email every
month.

So for my test, I set up the following basic local rules in
~/.spamassassin/user_prefs file.

I assume this would take any email with the word spam in the BODY or
test in SUBJECT and rewrite the SUBJECT with the new HEADER. But for
some reason it does not appear to be working.

body LOCAL_DEMONSTRATION_RULE   /spam/
score LOCAL_DEMONSTRATION_RULE 6.0
describe LOCAL_DEMONSTRATION_RULE   This is a simple test rule
header LOCAL_DEMONSTRATION_SUBJECT  Subject =~ /\btest\b/i
score LOCAL_DEMONSTRATION_SUBJECT   2
required_score  5
rewrite_header subject * Rated SPAM: Junk This! *


Does it not work for Aliases/Forwarders? Do you have to have a special
Client? I am using BAT by RitLABs, and/or Webbrowser.

Thanks!

Joseph DuBois, Lead Application Specialist
Application Standard

Re: Do I have too many rules? [WAS: timeout help]

2006-08-24 Thread jdow

From: "Josh Trutwin" <[EMAIL PROTECTED]>


On Sat, 19 Aug 2006 18:06:59 -0400
"Daryl C. W. O'Shea" <[EMAIL PROTECTED]> wrote:


Josh Trutwin wrote:
> On Fri, 18 Aug 2006 12:16:51 -0400
> "Daryl C. W. O'Shea" <[EMAIL PROTECTED]> wrote:
> 
>> Josh Trutwin wrote:

>>> I've recently had a server experience some really slow spam
>>> processing - I'm not sure what's going on but I notice a lot
>>> of timeouts in the mail log:
>>>
>>> Aug 18 09:20:21 www spamd[27673]: timeout with empty $@
>>> at /usr/local/share/perl/5.8.4/Mail/SpamAssassin/Timeout.pm
>>> line 182,  line 1126. Aug 18 09:22:02 www spamd[27674]:
>>> timeout with empty $@
>>> Any suggestions?
>>>
>>> Debian linux - spamd 3.0.4 with pyzor/dcc/razor
>>>
>>> spamd running with:
>>>
>>> /usr/bin/spamd -d -D -q -x -H /etc/razor --max-children=12
>>> --socketpath=/var/spool/spamassassin/spamd.sock -u spamd
>> Unless you've got at least 600 or more MB of free RAM just for
>> spamd's use, you've got too many children and are swap
>> thrashing.  Back off the --max-children number.
> 
> I was getting the same results with lower values - the box has 1
> GB - more is on the way though.  I disabled network tests with -L
> and things work great again so something along that line is the
> culprit.

Using -L processes messages faster, thus requiring fewer children
to handle the load.  This error is caused by a system not being
able to restore the child's config fast enough after it's done
processing a message.  It's always due to one of two things...
high load, or high load caused by swap thrashing.

If you're using a lot of add-on rulesets your children may be
taking up even more memory than budgeted above.  See what they're
using and confirm that you're not going to hit swap.  If the
problem still persists I'd like to hear about it.
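
The memory budget above works out to roughly 50 MB per child (600 MB for 12 children). A back-of-the-envelope Python sketch of that sizing follows; the 50 MB default is only what the quoted figures imply, and `max_children_for` is a made-up helper. Measure the real per-child memory use, which will be higher with many add-on rulesets.

```python
def max_children_for(free_ram_mb: int, per_child_mb: int = 50) -> int:
    """Largest --max-children value that fits in RAM set aside for spamd."""
    return max(1, free_ram_mb // per_child_mb)

print(max_children_for(600))       # the budget quoted for 12 children
print(max_children_for(250))       # a tighter budget
print(max_children_for(600, 100))  # heavier children from add-on rulesets
```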


Still having problems - even with -L.  Server has 1 GB of memory,
more is on the way I hope.  Anyway - I have the following rules:

fastconcepts:/etc/mail/spamassassin# ls
10_misc.cf                     88_FVGT_headers.cf
70_sare_adult.cf               88_FVGT_rawbody.cf
70_sare_bayes_poison_nxm.cf    88_FVGT_subject.cf
70_sare_evilnum0.cf            88_FVGT_uri.cf
70_sare_genlsubj.cf            99_FVGT_DomainDigits.cf
70_sare_header.cf              99_FVGT_meta.cf
70_sare_highrisk.cf            99_sare_fraud_post25x.cf
70_sare_html.cf                RulesDuJour
70_sare_obfu.cf                antidrug.cf
70_sare_oem.cf                 backhair.cf
70_sare_random.cf              bogus-virus-warnings.cf
70_sare_ratware.cf             chickenpox.cf
70_sare_specific.cf            coaching.cf
70_sare_spoof.cf               init.pre
70_sare_stocks.cf              local.cf
70_sare_unsub.cf               mr_wiggly.cf
70_sare_uri.cf                 random.cf
70_sare_uri0.cf                tripwire.cf
70_sare_whitelist_rcvd.cf      tsa-list.cf
70_sare_whitelist_spf.cf       useless.cf
72_sare_bml_post25x.cf         v310.pre
72_sare_redirect_post3.0.0.cf  v312.pre
88_FVGT_body.cf                weeds.cf

I created coaching.cf and tsa-list.cf - they are basically one
(well two technically) liners.

If I need to cut back I'm not sure where to start - the SARE rules
are the ones listed on rulesemporium.com - the FVGT are on
http://www.exit0.us/index.php?pagename=RulesDuJourRuleSets.  Of the
rest, only bogus-virus-warnings.cf is of any substantial size (and
still 1/3 the size of the largest SARE rule).

Should v310.pre be removed if v312.pre is also there?


No. Cut down on the number of children. 12 is ridiculous.

{^_^}


Re: Calling Regex Experts

2006-08-24 Thread jdow

From: "D.J." <[EMAIL PROTECTED]>

I'm expecting these type of strings for sure:

cat
dog
cat dog
dog cat

But I may get something like this too:

cat cat dog
dog dog

Essentially I want it to match if anything other than cat or dog is in the
string.


And do what with "cat cat dog catapult"?
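
One possible answer, sketched in Python: anchor on word boundaries so that "catapult" counts as something other than cat or dog. The pattern uses only constructs that Perl regexes also support, but it is an illustration rather than a tested SpamAssassin rule; `OTHER_WORD` and `has_other_word` are invented names. Punctuation and whitespace are ignored, only word tokens are checked.

```python
import re

# Match any whole word that is not exactly "cat" or "dog".
OTHER_WORD = re.compile(r"\b(?!(?:cat|dog)\b)\w+")

def has_other_word(text: str) -> bool:
    """True if the string contains a word other than cat or dog."""
    return OTHER_WORD.search(text) is not None

for sample in ("cat", "dog cat", "cat cat dog", "dog dog",
               "cat cat dog catapult", "cat bird"):
    print(repr(sample), has_other_word(sample))
```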

{^_^}


Re: Where to install imageinfo.pm?

2006-08-24 Thread Kenneth Porter
--On Thursday, August 24, 2006 2:12 PM +0530 BG Mahesh <[EMAIL PROTECTED]> 
wrote:



I am using SA-3.1.4. I am in the process of installing
http://www.rulesemporium.com/plugins.htm
Where do I install ImageInfo.pm [which directory]?


On Fedora I put mine in /etc/mail/spamassassin/plugins, and added the path 
to the end of the load line in init.pre:


loadplugin Mail::SpamAssassin::Plugin::ImageInfo 
/etc/mail/spamassassin/plugins/ImageInfo.pm


(That may split across two lines in the mail. It should all be on one line 
in your file.)


I also added the following "informational" rules to 
/etc/mail/spamassassin/imageinfo.cf:


# informational message to test plugin

meta      GIF_ATTACH_1    __GIF_ATTACH_1
describe  GIF_ATTACH_1    One GIF attachment
score     GIF_ATTACH_1    0.001
meta      GIF_ATTACH_2P   __GIF_ATTACH_2P
describe  GIF_ATTACH_2P   Two GIF attachments
score     GIF_ATTACH_2P   0.001

meta      PNG_ATTACH_1    __PNG_ATTACH_1
describe  PNG_ATTACH_1    One PNG attachment
score     PNG_ATTACH_1    0.001
meta      PNG_ATTACH_2P   __PNG_ATTACH_2P
describe  PNG_ATTACH_2P   Two PNG attachments
score     PNG_ATTACH_2P   0.001

meta      JPEG_ATTACH_1   __JPEG_ATTACH_1
describe  JPEG_ATTACH_1   One JPEG attachment
score     JPEG_ATTACH_1   0.001
meta      JPEG_ATTACH_2P  __JPEG_ATTACH_2P
describe  JPEG_ATTACH_2P  Two JPEG attachments
score     JPEG_ATTACH_2P  0.001





RE: Adding 'SA scores' to all incoming mails

2006-08-24 Thread John D. Hardin
On Thu, 24 Aug 2006, Michael Grey wrote:

> In this example, all emails get an additional header :
> 
> X-Spam-score-breakdown calvin score 6.77/4.5  
> 
> "add_header all score-breakdown calvin score _HITS_/_REQD_ "

And that's better than this:

X-Spam-Status: No, score=3.5 required=5.0 tests=BAYES_50,FROM_EXCESS_QP,
FROM_SUBDOMAIN,HTML_COMMENTS,HTML_EMBED_IMG_04,HTML_MESSAGE,
HTML_MIME_NO_HTML_TAG,MIME_HTML_ONLY,SARE_UNSUB38D,SPF_PASS,
SUBJECT_EXCESS_QP autolearn=disabled version=3.1.3

how?

I'm not saying it shouldn't be done, but that the scores and rule hits
are *already there* so why paste them in yet again?

> On Thu, 24 Aug 2006, list wrote:
> 
> > I'd like SA to make an extra line/section under all my mails where it 
> > tells what score the mail got (or maybe even which rules scored on the 
> > mail). Is there such a setting?
> > 
> > it would help me to finetune my SA.
> 
> You mean, actually paste the score into the body or attach it as
> another MIME body part?
> 
> Are the X-Spam-* headers not sufficient?




Re: word doc spams

2006-08-24 Thread John D. Hardin
On Thu, 24 Aug 2006, John D. Hardin wrote:

> Could somebody gzip up a raw word-doc spam (complete message pls, not
> from Outlook) and send it to me offlist? I only ever got one and
> didn't keep a copy of the raw message, and I think I'll take a shot at
> a plugin for them so I need an example.

Okay, thanks for the samples. No more needed for now.




False positives and Bayes

2006-08-24 Thread Justin Lloyd
Hello, all.

A couple of months ago I built new mail servers to replace our existing ones that had aging mail configurations (and disparate OS configurations), running sendmail 8.12.6 and SA 3.0.2. Our configuration now consists of 2 RHEL 4 ES servers that share the load using DNS round-robin, running sendmail 8.13.7 and SpamAssassin 3.1.3, and we are running sa-update and rulesdujour nightly (though actual updates are rare). We use spamass-milter 0.31, which we have configured to drop spams with scores >= 10, thereby dropping about 75% of the incoming email before it gets to our Exchange servers. Speaking of which, these servers do not deliver mail locally, rather all received mail either goes to internal MS Exchange servers or Linux helpdesk and mailing list servers. Also, our company is about 350 people and we receive a good deal of legitimate international email.

Here is our SA configuration from /etc/mail/spamassassin/local.cf:

required_score 5
rewrite_header Subject *** SPAM [_SCORE_] ***
report_safe 0
dcc_path /usr/local/bin/dccproc
razor_config /etc/mail/spamassassin/.razor/razor-agent.conf
dns_available yes
bayes_path /localhost/home/spamd/bayes
bayes_auto_learn_threshold_spam      30
bayes_auto_learn_threshold_nonspam   -0.1
bayes_min_ham_num  10
bayes_min_spam_num 10
auto_whitelist_path /localhost/home/spamd/auto-whitelist
include /etc/mail/spamassassin/whitelist
include /etc/mail/spamassassin/blacklist

Here are the statistics from both mail servers for the past 31 days:

Email:  1303815  Autolearn: 608540  AvgScore:  12.23  AvgScanTime:  1.38 sec
Spam:    745609  Autolearn: 139632  AvgScore:  23.36  AvgScanTime:  1.52 sec
Ham:     558206  Autolearn: 468908  AvgScore:  -2.63  AvgScanTime:  1.20 sec

Email:   945103  Autolearn: 284139  AvgScore:  15.33  AvgScanTime:  1.46 sec
Spam:    701327  Autolearn: 131994  AvgScore:  22.30  AvgScanTime:  1.46 sec
Ham:     243776  Autolearn: 152145  AvgScore:  -4.74  AvgScanTime:  1.44 sec

(We think the disparity in mail counts between the two is due to some senders having cached or hard-coded the first one’s IP address and using it rather than MX lookups like normal people do.)
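
To put the autolearn counts in proportion, here is a small Python sketch that turns the figures above into per-class autolearn rates. The labels "server 1" and "server 2" are only placeholders for the two blocks of statistics (the post does not name the hosts), and the numbers are copied straight from the table.

```python
# (total messages, autolearned messages) per class, copied from the stats above.
# "server 1" / "server 2" are placeholder labels, not real host names.
STATS = {
    "server 1": {"spam": (745609, 139632), "ham": (558206, 468908)},
    "server 2": {"spam": (701327, 131994), "ham": (243776, 152145)},
}

def autolearn_rates(stats):
    """Return the fraction of each class that was autolearned, per server."""
    rates = {}
    for server, classes in stats.items():
        rates[server] = {
            label: learned / total for label, (total, learned) in classes.items()
        }
    return rates

for server, by_class in autolearn_rates(STATS).items():
    for label, rate in by_class.items():
        print(f"{server} {label}: {rate:.1%} autolearned")
```

On these numbers, far more ham than spam is being autolearned on both machines (well over half of ham versus under a fifth of spam), which may be worth keeping in mind when looking at the two bayes_auto_learn_threshold settings.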

The major problem we are seeing is a number of false positives in the 6-8 point range due to 3.5 points from BAYES_99 on messages that should not be hitting that rule from what we can see. One thing we’ve noticed is that many such messages are from mailing lists and newsletters and from ISPs that shall remain nameless, though many of these also score high due to several rfc-ignorant rules being hit.

We have turned off Bayes in the past (before the upgrade) and are debating doing so again, but first we decided to see what constructive criticism and advice the SA community may have regarding our configuration. Please let me know if any additional information would be useful.



Thanks,

Justin C. Lloyd
Senior Engineer and System Administrator
303-684-4166 Office
720-480-0380 Cell
303-684-4100 Fax
[EMAIL PROTECTED]
DigitalGlobe ®, An Imaging and Information Company






Re: RBL Rules Misfiring

2006-08-24 Thread D . J .
On 8/23/06, Stuart Johnston <[EMAIL PROTECTED]> wrote:
> As a quick guess, you probably need to fix your Trust Path:
> http://wiki.apache.org/spamassassin/TrustPath

No, I've got that set properly, as I didn't trust the autodiscovery.  So I've already entered the class C for my MX's and SMTP's there for both trusted_networks and internal_networks.

OK, after Googling around for a bit, I may have stumbled on something... specifically this trust path thing.  I had my trusted_networks and internal_networks set as my SMTP's and MX's class C network.  Because of that, is that causing SA to look at the relay beyond the trusted network as the agent to compare the RBL to?  Come to think of it, this didn't appear (or at least wasn't reported to me) before I set those values.  At any rate, I've completely removed the internal_networks value, and changed the trusted values variable to 127.0.0.1.  Eventually this will be behind a NAT machine, so I have to set *something*.  If anyone thinks I'm on the right path, let me know.  I'm also going to continue monitoring for this problem, so if it goes away now, I'll let the list know for posterity's sake.  Thanks!

- D.J.


word doc spams

2006-08-24 Thread John D. Hardin
Could somebody gzip up a raw word-doc spam (complete message pls, not
from Outlook) and send it to me offlist? I only ever got one and
didn't keep a copy of the raw message, and I think I'll take a shot at
a plugin for them so I need an example.

TIA.




Re: Adding 'SA scores' to all incoming mails

2006-08-24 Thread John D. Hardin
On Thu, 24 Aug 2006, list wrote:

> I'd like SA to make a extra line/section under all my mails where it 
> tells what score the mail got (or maybe even which rules scored on the 
> mail)  is there such a setting?
> 
> it would help me to finetune my SA.

You mean, actually paste the score into the body or attach it as
another MIME body part?

Are the X-Spam-* headers not sufficient?




Re: "sa-learn -q" patch in FreeBSD

2006-08-24 Thread Mark Martinec
Vivek Khera wrote:
> in the current port for 3.1.4, there are no freebsd-specific patches
> to SA, so whatever this was is no longer there.

You are one day behind  :)

> On Aug 23, 2006, at 5:01 PM, Justin Mason wrote:
> > anyone know what this is/does?
> >   http://cia.navi.cx/stats/project/FreeBSD/.message/32ba98d/xml

No idea why it is there, but apparently adds option -q (=quiet)
to sa-learn and to spamassassin, suppressing  the:
  print "$phrase tokens from $learnedcount message(s)
 ($messagecount message(s) examined)\n"
and the:
  print "$count message(s) examined.\n"

Mark


RE: Adding 'SA scores' to all incoming mails

2006-08-24 Thread Michael Grey
Check the docs for 'add_header', which can be set in local.cf or user_prefs.
The key words here are 'add_header all', followed by the text and variables you
want to have displayed; the rule scoring is another 'variable' that can be
sourced.

In this example, all emails get an additional header:

X-Spam-score-breakdown: calvin score 6.77/4.5

"add_header all score-breakdown calvin score _HITS_/_REQD_"

Good luck...

Michael Grey


-----Original Message-----
From: John D. Hardin [mailto:[EMAIL PROTECTED] 
Sent: Thursday, August 24, 2006 2:32 PM
To: list
Cc: users@spamassassin.apache.org
Subject: Re: Adding 'SA scores' to all incoming mails

On Thu, 24 Aug 2006, list wrote:

> I'd like SA to make a extra line/section under all my mails where it 
> tells what score the mail got (or maybe even which rules scored on the 
> mail)  is there such a setting?
> 
> it would help me to finetune my SA.

You mean, actually paste the score into the body or attach it as
another MIME body part?

Are the X-Spam-* headers not sufficient?




Re: Calling Regex Experts

2006-08-24 Thread D . J .
On 8/24/06, Bowie Bailey <[EMAIL PROTECTED]> wrote:
> D.J. wrote:
> > On 8/24/06, Bowie Bailey <[EMAIL PROTECTED]> wrote:
> > > D.J. wrote:
> > > > On 8/24/06, Bart Schaefer <[EMAIL PROTECTED]> wrote:
> > > > > On 8/24/06, D. J. <[EMAIL PROTECTED]> wrote:
> > > > > >
> > > > > > I'm expecting these type of strings for sure:
> > > > > >
> > > > > > cat
> > > > > > dog
> > > > > > cat dog
> > > > > > dog cat
> > > > > >
> > > > > > But I may get something like this too:
> > > > > >
> > > > > > cat cat dog
> > > > > > dog dog
> > > > > >
> > > > > > Essentially I want it to match if anything other than cat or
> > > > > > dog is in the string.
> > > > >
> > > > > That constraint means you have to construct a regex that can be
> > > > > anchored at both beginning and end of string, e.g.
> > > > > /\A(\s*(cat|dog)\s*)+\Z/.  I'm not sure that ever makes sense in
> > > > > the context of a spamassassin rule, except maybe one matching
> > > > > against a specific header.
> > > >
> > > > That's the idea... I've got the RELAY_COUNTRIES plugin that I want
> > > > it to place a small score if the relay server is not in the US or
> > > > Canada.  However, I'm not sure if the plugin will list the same
> > > > country multiple times, which is where my uncertainty in the "cat
> > > > cat dog" scenario came in.  So far my original rule ( !~ /cat|dog/)
> > > > seems to be working well, but if I have a spammer smart enough to
> > > > manage to bounce his spam originating in China off of somewhere in
> > > > the US before it hits my MX, then that rule will fail.  Am I
> > > > possibly too paranoid?
> > >
> > > Ok.  Try this one:
> > >
> > >    $value =~ /\b(?!cat\b|dog\b)\w+\b/i
> > >
> > > This will match any word in the string as long as that word is not
> > > "cat" or "dog".
> >
> > OK, we're actually really close.  That actually matched everything I
> > didn't want to match... we just have to get it to do the opposite of
> > that.  I have 6 test strings I tested against in a test script:
> >
> > cat
> > dog
> > cat dog
> > dog cat
> > bird
> > cat bird
> >
> > It matched the top four (incorrectly).
>
> Are you sure you used it correctly?  This is a positive match (=~), not a
> negative match (!~).
>
> Test program:
> @strings = ( "cat", "dog", "cat dog", "dog cat", "bird",
>              "cat bird", "caterwaul" );
> for $str (@strings) {
>     if ($str =~ /\b(?!cat\b|dog\b)\w+\b/i) {
>         print "$str -- MATCHED\n";
>     }
>     else {
>         print "$str -- no match\n";
>     }
> }
>
> Output:
> cat -- no match
> dog -- no match
> cat dog -- no match
> dog cat -- no match
> bird -- MATCHED
> cat bird -- MATCHED
> caterwaul -- MATCHED
>
> --
> Bowie

BINGO!  I still had my negative in there, I only copied the / to / part of the regex.  You sir, are the man!


RE: Calling Regex Experts

2006-08-24 Thread Bowie Bailey
D.J. wrote:
> On 8/24/06, Bowie Bailey <[EMAIL PROTECTED]> wrote:
> > D.J. wrote:
> > > On 8/24/06, Bart Schaefer <[EMAIL PROTECTED]> wrote:
> > > > On 8/24/06, D. J. <[EMAIL PROTECTED] > wrote:
> > > > > 
> > > > > I'm expecting these type of strings for sure:
> > > > > 
> > > > > cat
> > > > > dog
> > > > > cat dog
> > > > > dog cat
> > > > > 
> > > > > But I may get something like this too:
> > > > > 
> > > > > cat cat dog
> > > > > dog dog
> > > > > 
> > > > > Essentially I want it to match if anything other than cat or
> > > > > dog is in the string.
> > > > 
> > > > That constraint means you have to construct a regex that can be
> > > > anchored at both beginning and end of string, e.g.
> > > > /\A(\s*(cat|dog)\s*)+\Z/.  I'm not sure that ever makes sense in
> > > > the context of a spamassassin rule, except maybe one matching
> > > > against a specific header.
> > > 
> > > That's the idea... I've got the RELAY_COUNTRIES plugin that I want
> > > it to place a small score if the relay server is not in the US or
> > > Canada.  However, I'm not sure if the plugin will list the same
> > > country multiple times, which is where my uncertainty in the "cat
> > > cat dog" scenario came in.  So far my original rule ( !~ /cat|dog/)
> > > seems to be working well, but if I have a spammer smart enough to
> > > manage to bounce his spam originating in China off of somewhere in
> > > the US before it hits my MX, then that rule will fail.  Am I
> > > possibly too paranoid?
> > 
> > Ok.  Try this one:
> > 
> >$value =~ /\b(?!cat\b|dog\b)\w+\b/i
> > 
> > This will match any word in the string as long as that word is not
> > "cat" or "dog".
> 
> OK, we're actually really close.  That actually matched everything I
> didn't want to match... we just have to get it to do the opposite of
> that.  I have 6 test strings I tested against in a test script:  
> 
> cat
> dog
> cat dog
> dog cat
> bird
> cat bird
> 
> It matched the top four (incorrectly).

Are you sure you used it correctly?  This is a positive match (=~), not a
negative match (!~).

Test program:
@strings = ( "cat", "dog", "cat dog", "dog cat", "bird",
             "cat bird", "caterwaul" );
for $str (@strings) {
    if ($str =~ /\b(?!cat\b|dog\b)\w+\b/i) {
        print "$str -- MATCHED\n";
    }
    else {
        print "$str -- no match\n";
    }
}

Output:
cat -- no match
dog -- no match
cat dog -- no match
dog cat -- no match
bird -- MATCHED
cat bird -- MATCHED
caterwaul -- MATCHED
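
For readers who want to try the same pattern outside Perl, here is a rough Python equivalent of the test program above. It is only a sketch: the helper name has_other_word is invented for this example, and it relies on Python's re module treating \b and the (?!...) negative lookahead the same way Perl does for this pattern.

```python
import re

# Same pattern as the Perl test: a whole word that is neither "cat" nor "dog".
OTHER_WORD = re.compile(r"\b(?!cat\b|dog\b)\w+\b", re.IGNORECASE)

def has_other_word(value):
    """True if the string contains any word other than cat or dog."""
    return OTHER_WORD.search(value) is not None

STRINGS = ["cat", "dog", "cat dog", "dog cat", "bird", "cat bird", "caterwaul"]

for s in STRINGS:
    print(f"{s} -- {'MATCHED' if has_other_word(s) else 'no match'}")
```

The lookahead is what keeps "caterwaul" from slipping through: "cat" is only exempted when a word boundary follows it immediately.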

-- 
Bowie


Re: Calling Regex Experts

2006-08-24 Thread D . J .
On 8/24/06, Bowie Bailey <[EMAIL PROTECTED]> wrote:
> D.J. wrote:
> > On 8/24/06, Bart Schaefer <[EMAIL PROTECTED]> wrote:
> > > On 8/24/06, D. J. <[EMAIL PROTECTED]> wrote:
> > > >
> > > > I'm expecting these type of strings for sure:
> > > >
> > > > cat
> > > > dog
> > > > cat dog
> > > > dog cat
> > > >
> > > > But I may get something like this too:
> > > >
> > > > cat cat dog
> > > > dog dog
> > > >
> > > > Essentially I want it to match if anything other than cat or dog is
> > > > in the string.
> > >
> > > That constraint means you have to construct a regex that can be
> > > anchored at both beginning and end of string, e.g.
> > > /\A(\s*(cat|dog)\s*)+\Z/.  I'm not sure that ever makes sense in the
> > > context of a spamassassin rule, except maybe one matching against a
> > > specific header.
> >
> > That's the idea... I've got the RELAY_COUNTRIES plugin that I want it
> > to place a small score if the relay server is not in the US or
> > Canada.  However, I'm not sure if the plugin will list the same
> > country multiple times, which is where my uncertainty in the "cat cat
> > dog" scenario came in.  So far my original rule ( !~ /cat|dog/) seems
> > to be working well, but if I have a spammer smart enough to manage to
> > bounce his spam originating in China off of somewhere in the US
> > before it hits my MX, then that rule will fail.  Am I possibly too
> > paranoid?
>
> Ok.  Try this one:
>
>    $value =~ /\b(?!cat\b|dog\b)\w+\b/i
>
> This will match any word in the string as long as that word is not
> "cat" or "dog".
>
> --
> Bowie

OK, we're actually really close.  That actually matched everything I didn't want to match... we just have to get it to do the opposite of that.  I have 6 test strings I tested against in a test script:

cat
dog
cat dog
dog cat
bird
cat bird

It matched the top four (incorrectly).


Training Spamassassin

2006-08-24 Thread Edward Diener
For my IMAP mail account, my e-mail host has set up Spamassassin to be
automatically trained using 'spam-to-learn' and 'ham-to-learn' IMAP
folders for my mailbox on the server. I had assiduously been moving
messages not already marked as [SPAM] by Spamassassin into the
'spam-to-learn' folder. I had thought that the 'ham-to-learn' folder was
for messages which Spamassassin had marked as [SPAM] but which really
were not spam, so, since these were very few, I had moved almost no
messages into my 'ham-to-learn' folder.

Very little has changed in Spamassassin's ability to mark
messages as [SPAM], despite the fact that messages on this mailbox
run at least 30 to 1 spam to normal and I get about 250 messages a day.
Spamassassin was only marking maybe 5% of my total spam messages as
spam. I then asked my mailbox host why Spamassassin was not catching
more spam, and he told me that until Spamassassin has processed at least
200 messages from each of the 'spam-to-learn' and 'ham-to-learn' folders,
it does very little to pick up spam messages.

Is this true? Am I supposed to be putting copies of messages which
Spamassassin has not marked as spam and which are not spam into my
'ham-to-learn' folder, as opposed to messages which Spamassassin has
erroneously marked as spam?

Even in the latter case, I have so few messages sent to this mailbox
which are not spam that it will take a few weeks to get 200+ messages
processed by Spamassassin in the 'ham-to-learn' folder. Until that time
Spamassassin will continue to not mark spam as such. Is this the normal
way in which Spamassassin works, or is my mail host incorrect?




RE: Calling Regex Experts

2006-08-24 Thread Bowie Bailey
D.J. wrote:
> On 8/24/06, Bart Schaefer <[EMAIL PROTECTED]> wrote:
> > On 8/24/06, D. J. <[EMAIL PROTECTED]> wrote:
> > > 
> > > I'm expecting these type of strings for sure:
> > > 
> > > cat
> > > dog
> > > cat dog
> > > dog cat
> > > 
> > > But I may get something like this too:
> > > 
> > > cat cat dog
> > > dog dog
> > > 
> > > Essentially I want it to match if anything other than cat or dog is
> > > in the string.
> > 
> > That constraint means you have to construct a regex that can be
> > anchored at both beginning and end of string, e.g.
> > /\A(\s*(cat|dog)\s*)+\Z/.  I'm not sure that ever makes sense in the
> > context of a spamassassin rule, except maybe one matching against a
> > specific header.
> 
> That's the idea... I've got the RELAY_COUNTRIES plugin that I want it
> to place a small score if the relay server is not in the US or
> Canada.  However, I'm not sure if the plugin will list the same
> country multiple times, which is where my uncertainty in the "cat cat
> dog" scenario came in.  So far my original rule ( !~ /cat|dog/) seems
> to be working well, but if I have a spammer smart enough to manage to
> bounce his spam originating in China off of somewhere in the US
> before it hits my MX, then that rule will fail.  Am I possibly too
> paranoid?

Ok.  Try this one:

   $value =~ /\b(?!cat\b|dog\b)\w+\b/i

This will match any word in the string as long as that word is not
"cat" or "dog".

-- 
Bowie


Re: Another SARE channel with the most used rules available

2006-08-24 Thread Vivek Khera


On Aug 24, 2006, at 7:26 AM, [EMAIL PROTECTED] wrote:

> Yes, downloading the gpg key and using "sa-update --import" doesn't have
> that problem though. So, how to extract this public key alone from
> the public key ring to copy over to the sa-update public key ring?
> Any idea on this is welcome :)


gpg --armor --export KEYID

the man page is amazingly helpful ;-)





Re: "sa-learn -q" patch in FreeBSD

2006-08-24 Thread Vivek Khera


On Aug 23, 2006, at 5:01 PM, Justin Mason wrote:

> anyone know what this is/does?
>
>   http://cia.navi.cx/stats/project/FreeBSD/.message/32ba98d/xml
>
> --j.


in the current port for 3.1.4, there are no freebsd-specific patches  
to SA, so whatever this was is no longer there.




Re: bayes autolearn acting up

2006-08-24 Thread lists



>> Since upgrading to 3.1.4, when I turn on bayes auto-learn with:
>>
>> bayes_auto_learn 1
>>
>> and I set the learn boundaries with:
>>
>> bayes_auto_learn_threshold_nonspam -3.5
>> bayes_auto_learn_threshold_spam    15.5
>>
>> I get unexpected auto-learning.  Example:  I just saw a spam come
>> through that scored 9.9, which is enough for it to be tagged as spam,
>> but it should not be auto-learned as spam.  But, in the header it
>> clearly reads:
>>
>> X-Spam-Status:
>> Yes, score=9.9 required=5.0 tests=AWL,BAYES_99,
>> DATE_IN_PAST_03_06,DCC_CHECK,DIGEST_MULTIPLE,HTML_40_50,HTML_MESSAGE,
>> MIME_HTML_ONLY,RAZOR2_CHECK,RCVD_IN_WHOIS_INVALID autolearn=spam
>> version=3.1.4
>>
>> Any ideas?
>
> SA does not autolearn based on the final message score. So, toss the 9.9
> out the window. That's not the number SA compares to the 15.5.
>
> For learning SA uses what the message score would have been if: 1) the
> AWL is off. 2) Bayes was disabled, including shifting what scoreset is
> used for all the other rules. 3) all white/blacklists are disabled. This
> is often *quite* different from the final score.
>
> However, in this case I don't entirely understand... The default SA 3.1
> scores are:
>
> score DATE_IN_PAST_03_06 0.736 0 1.122 0.478
> score DCC_CHECK 0 1.37 0 2.17
> score DIGEST_MULTIPLE 0 0.233 0 0.765
> score HTML_40_50 0.611 0 0.497 0.496
> score HTML_MESSAGE 0.001
> score MIME_HTML_ONLY 0.414 0.001 0.389 0.001
> score RAZOR2_CHECK 0 0.5 0 0.5
> score RCVD_IN_WHOIS_INVALID 0 2.151 0 2.234
>
> Adding the set1 scores up, the learning score should have been 4.753.
>
> Have you modified any rule scores?
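
The column arithmetic in that reply is easy to redo by machine. Below is a small Python sketch that sums one scoreset column from "score" lines like the ones quoted. It assumes the usual SpamAssassin layout: four values are scoresets 0 to 3 (no Bayes/no network, no Bayes/network, Bayes/no network, Bayes/network), and a single value applies to every set. The function name scoreset_total is invented for this example.

```python
SCORE_LINES = """\
score DATE_IN_PAST_03_06 0.736 0 1.122 0.478
score DCC_CHECK 0 1.37 0 2.17
score DIGEST_MULTIPLE 0 0.233 0 0.765
score HTML_40_50 0.611 0 0.497 0.496
score HTML_MESSAGE 0.001
score MIME_HTML_ONLY 0.414 0.001 0.389 0.001
score RAZOR2_CHECK 0 0.5 0 0.5
score RCVD_IN_WHOIS_INVALID 0 2.151 0 2.234
"""

def scoreset_total(lines, scoreset):
    """Sum the given scoreset column (0-3) over all 'score' lines."""
    total = 0.0
    for line in lines.splitlines():
        parts = line.split()
        if len(parts) < 3 or parts[0] != "score":
            continue
        values = [float(v) for v in parts[2:]]
        # A single value applies to every scoreset.
        total += values[0] if len(values) == 1 else values[scoreset]
    return total

print(f"set 1 total: {scoreset_total(SCORE_LINES, 1):.3f}")
```

With the rows exactly as quoted, the set 1 column comes to 4.256 rather than 4.753. The difference (0.497) happens to equal the third value on the HTML_40_50 row, so one of the quoted figures may have been mangled along the way. Either total is well below the 15.5 threshold, so it does not change the question being asked.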



Thanks for trying to help Matt.  No, I don't think I have changed any  
of those scores.  I understand the basics of how the autolearn  
works.  For a long time, with the settings above, it would usually  
only autolearn spams with extremely high scores (well over 15).  Now,  
basically EVERY mail tagged as spam is being autolearned as spam  
whether it has scored 30 or 5.2.  The other weird issue is that  
anything that is not being tagged as spam is also being autolearned  
as ham.  (i.e. mails with scores of 3.5)  which is absolutely not  
what I want.


Thanks,
Devin


Re: Calling Regex Experts

2006-08-24 Thread D . J .
On 8/24/06, Bart Schaefer <[EMAIL PROTECTED]> wrote:
> On 8/24/06, D. J. <[EMAIL PROTECTED]> wrote:
> >
> > I'm expecting these type of strings for sure:
> >
> > cat
> > dog
> > cat dog
> > dog cat
> >
> > But I may get something like this too:
> >
> > cat cat dog
> > dog dog
> >
> > Essentially I want it to match if anything other than cat or dog is in the
> > string.
>
> That constraint means you have to construct a regex that can be
> anchored at both beginning and end of string, e.g.
> /\A(\s*(cat|dog)\s*)+\Z/.  I'm not sure that ever makes sense in the
> context of a spamassassin rule, except maybe one matching against a
> specific header.

That's the idea... I've got the RELAY_COUNTRIES plugin that I want it to place a small score if the relay server is not in the US or Canada.  However, I'm not sure if the plugin will list the same country multiple times, which is where my uncertainty in the "cat cat dog" scenario came in.  So far my original rule ( !~ /cat|dog/) seems to be working well, but if I have a spammer smart enough to manage to bounce his spam originating in China off of somewhere in the US before it hits my MX, then that rule will fail.  Am I possibly too paranoid?



Re: Calling Regex Experts

2006-08-24 Thread Bart Schaefer

On 8/24/06, D. J. <[EMAIL PROTECTED]> wrote:
>
> I'm expecting these type of strings for sure:
>
> cat
> dog
> cat dog
> dog cat
>
> But I may get something like this too:
>
> cat cat dog
> dog dog
>
> Essentially I want it to match if anything other than cat or dog is in the
> string.

That constraint means you have to construct a regex that can be
anchored at both beginning and end of string, e.g.
/\A(\s*(cat|dog)\s*)+\Z/.  I'm not sure that ever makes sense in the
context of a spamassassin rule, except maybe one matching against a
specific header.
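
As a rough illustration of the anchored approach, here is a Python sketch that accepts a string only if it consists entirely of the allowed terms, and flags everything else by negating that test. The helper names are invented for this example, and re.fullmatch stands in for the \A ... \Z anchors of the Perl pattern.

```python
import re

# Whole string must be one or more allowed terms separated by optional whitespace.
ONLY_ALLOWED = re.compile(r"(?:\s*(?:cat|dog)\s*)+")

def only_allowed_terms(value):
    """True if the string contains nothing but 'cat' and 'dog' terms."""
    return ONLY_ALLOWED.fullmatch(value) is not None

def should_flag(value):
    """True if anything other than cat or dog appears (or the string is empty)."""
    return not only_allowed_terms(value)

for s in ["cat", "dog dog", "cat cat dog", "cat bird", "bird"]:
    print(f"{s!r}: flag={should_flag(s)}")
```

Note that an empty string is flagged here too, since the pattern requires at least one allowed term; whether that is desirable depends on what the header being tested can contain.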


Re: Calling Regex Experts

2006-08-24 Thread D . J .
> I'm not quite clear on what you want here.  Your example should NOT
> have matched on "cat dog bird" since it contains one of your terms.
> It would have matched on "bird", since it doesn't.

Oops... that's what I meant.  It doesn't match (though I want it to) because it contains one of the terms.


Re: Calling Regex Experts

2006-08-24 Thread D . J .
On 8/24/06, Bowie Bailey <[EMAIL PROTECTED]> wrote:
> D.J. wrote:
> > OK, I'm stumped.  I need to create a regex that will match if
> > anything other than two terms I've specified exist.
> >
> > So for example, I have two terms I like, say "cat" and "dog".  I want
> > the rule to match if a string contains anything other than cat or
> > dog.
> >
> > I tried ...
> >
> > $value !~ /cat|dog/
> >
> > ...but this had the unintended consequence of still matching a string
> > like "cat dog bird" or "cat bird" since the string does contain one
> > of my two terms.  So what do I need to do?  Thanks in advance!
>
> I'm not quite clear on what you want here.  Your example should NOT
> have matched on "cat dog bird" since it contains one of your terms.
> It would have matched on "bird", since it doesn't.
>
> If you want to match any string that doesn't include your terms, you
> do it just like you said.
>
> $value !~ /cat|dog/
>
> If you want to match any string which does not exactly match your
> terms, do this:
>
> $value !~ /^(?:cat|dog)$/
>
> This will match on anything other than "cat" or "dog".
>
> If this doesn't help, give us some more examples of things you expect
> to match and things you don't expect to match.
>
> --
> Bowie

The regex...

$value !~ /^(?:cat|dog)$/

...incorrectly matches a string such as "cat dog" or "dog cat" where both terms are present.  It does however work properly for something like "dog bird".


Re: Calling Regex Experts

2006-08-24 Thread D . J .
On 8/24/06, Bowie Bailey <[EMAIL PROTECTED]> wrote:
> D.J. wrote:
> > OK, I'm stumped.  I need to create a regex that will match if
> > anything other than two terms I've specified exist.
> >
> > So for example, I have two terms I like, say "cat" and "dog".  I want
> > the rule to match if a string contains anything other than cat or
> > dog.
> >
> > I tried ...
> >
> > $value !~ /cat|dog/
> >
> > ...but this had the unintended consequence of still matching a string
> > like "cat dog bird" or "cat bird" since the string does contain one
> > of my two terms.  So what do I need to do?  Thanks in advance!
>
> I'm not quite clear on what you want here.  Your example should NOT
> have matched on "cat dog bird" since it contains one of your terms.
> It would have matched on "bird", since it doesn't.
>
> If you want to match any string that doesn't include your terms, you
> do it just like you said.
>
> $value !~ /cat|dog/
>
> If you want to match any string which does not exactly match your
> terms, do this:
>
> $value !~ /^(?:cat|dog)$/
>
> This will match on anything other than "cat" or "dog".
>
> If this doesn't help, give us some more examples of things you expect
> to match and things you don't expect to match.
>
> --
> Bowie

I'm expecting these type of strings for sure:

cat
dog
cat dog
dog cat

But I may get something like this too:

cat cat dog
dog dog

Essentially I want it to match if anything other than cat or dog is in the string.


RE: Calling Regex Experts

2006-08-24 Thread Bowie Bailey
D.J. wrote:
> OK, I'm stumped.  I need to create a regex that will match if
> anything other than two terms I've specified exist. 
> 
> So for example, I have two terms I like, say "cat" and "dog".  I want
> the rule to match if a string contains anything other than cat or
> dog.  
> 
> I tried ...
> 
> $value !~ /cat|dog/
> 
> ...but this had the unintended consequence of still matching a string
> like "cat dog bird" or "cat bird" since the string does contain one
> of my two terms.  So what do I need to do?  Thanks in advance!  

I'm not quite clear on what you want here.  Your example should NOT
have matched on "cat dog bird" since it contains one of your terms.
It would have matched on "bird", since it doesn't.

If you want to match any string that doesn't include your terms, you
do it just like you said.

$value !~ /cat|dog/

If you want to match any string which does not exactly match your
terms, do this:

$value !~ /^(?:cat|dog)$/

This will match on anything other than "cat" or "dog".

If this doesn't help, give us some more examples of things you expect
to match and things you don't expect to match.

-- 
Bowie


Adding 'SA scores' to all incoming mails

2006-08-24 Thread list
I'd like SA to add an extra line/section under all my mails where it
tells what score the mail got (or maybe even which rules scored on the
mail). Is there such a setting?

It would help me to fine-tune my SA.

tnx


Re: Calling Regex Experts

2006-08-24 Thread Alan Premselaar
D.J. wrote:
> OK, I'm stumped.  I need to create a regex that will match if anything
> other than two terms I've specified exist.
> 
> So for example, I have two terms I like, say "cat" and "dog".  I want
> the rule to match if a string contains anything other than cat or dog.
> 
> I tried ...
> 
> $value !~ /cat|dog/
> 
> ...but this had the unintended consequence of still matching a string
> like "cat dog bird" or "cat bird" since the string does contain one of
> my two terms.  So what do I need to do?  Thanks in advance!
> 
> - D.J.


D.J.,

 you're probably best off using META rules for this.  So you could have
something like (completely untested and off the top of my head in the
middle of the night):

body __CAT  /cat/
body __DOG  /dog/

meta NOT_CAT_AND_DOG  (!__CAT && !__DOG)

you should definitely check the man pages and/or wiki about writing
rules to do this properly, but that should get you started.
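
To see what that meta expression evaluates to, here is a tiny Python model of its boolean logic. It only mimics the truth table of (!__CAT && !__DOG) with two substring tests standing in for the body sub-rules; it is not how SpamAssassin evaluates rules internally, and the function name is invented for this sketch.

```python
import re

def meta_not_cat_and_dog(text):
    """Model of: meta NOT_CAT_AND_DOG (!__CAT && !__DOG)."""
    has_cat = re.search(r"cat", text) is not None  # stands in for body __CAT /cat/
    has_dog = re.search(r"dog", text) is not None  # stands in for body __DOG /dog/
    return (not has_cat) and (not has_dog)

for s in ["bird", "cat bird", "cat dog"]:
    print(f"{s!r}: {meta_not_cat_and_dog(s)}")
```

As written, the meta fires only when neither term is present, so "cat bird" is not flagged; that is the same behaviour as the original $value !~ /cat|dog/ test, and the sub-rules would need adjusting to catch a stray extra word.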

Alan


Calling Regex Experts

2006-08-24 Thread D . J .
OK, I'm stumped.  I need to create a regex that will match if anything other than two terms I've specified exist.

So for example, I have two terms I like, say "cat" and "dog".  I want the rule to match if a string contains anything other than cat or dog.

I tried ...

$value !~ /cat|dog/

...but this had the unintended consequence of still matching a string like "cat dog bird" or "cat bird" since the string does contain one of my two terms.  So what do I need to do?  Thanks in advance!

- D.J.


Re: RBL Rules Misfiring

2006-08-24 Thread D . J .
On 8/24/06, D. J. <[EMAIL PROTECTED]> wrote:

D.J. wrote:> Hello all.>> I searched my archive of the list, and couldn't find a similar issue.
> This is probably something I've misconfigured, but here goes.  Running
> SA 3.14 via the Mail::SpamAssassin Perl plugin from amavisd-new.  Have
> been running into a problem where some dynamic RBL lists are firing just
> because the IP is in the headers, not necessarily because it's the IP
> talking to my MTA.  They are indeed IPs in the list but shouldn't be
> firing because they're really using their ISP's mail servers as you can
> see later in the headers.  I'm *really* hoping this isn't intended
> operation and it's just something I've blundered somehow.  Below is a
> piece of one of the message notifications I receive.  I've been watching
> this on a couple small domains I own before putting it on our main one,
> and it's a good thing!
>
> Thanks in advance for the help.
>
> - D.J.
>
>
> Content analysis details:   (10.9 points, 5.0 required)
>
>  pts rule name              description
> ---- ---------------------- --------------------------------------------------
>  1.4 MSGID_FROM_MTA_ID      Message-Id for external message added locally
> -0.0 SPF_PASS               SPF: sender matches SPF record
>  0.0 HTML_MESSAGE           BODY: HTML included in message
>  0.0 BAYES_50               BODY: Bayesian spam probability is 40 to 60%
>                             [score: 0.4964]
>  2.2 RCVD_IN_SORBS_SOCKS    RBL: SORBS: sender is open SOCKS proxy server
>                             [24.140.8.46 listed in dnsbl.sorbs.net]
>  2.0 RCVD_IN_SORBS_DUL      RBL: SORBS: sent directly from dynamic IP address
>                             [24.140.8.46 listed in dnsbl.sorbs.net]
>  2.6 RCVD_IN_DSBL           RBL: Received via a relay in list.dsbl.org
>                             []
>  0.7 RCVD_IN_NJABL_PROXY    RBL: NJABL: sender is an open proxy
>                             [24.140.8.46 listed in combined.njabl.org]
>  1.9 RCVD_IN_NJABL_DUL      RBL: NJABL: dialup sender did non-local SMTP
>                             [24.140.8.46 listed in combined.njabl.org]
>  1.8 MISSING_SUBJECT        Missing Subject: header
> -1.8 AWL                    AWL: From: address is in the auto white-list
>
> Return-Path:
> Received: from smtp-1.sssnet.com (nat-147.sssnet.com [24.140.1.147])
>     by test.sssnet.com (Postfix) with ESMTP
>     id 663292B803E
>     for ; Wed, 23 Aug 2006 14:58:41 -0400 (EDT)
> Received: (qmail 11376 invoked by uid 507); 23 Aug 2006 18:58:42 -
> Received: from 24.140.8.46 by smtp-1.sssnet.com
>     (envelope-from , uid 501) with qmail-scanner-1.25st
>     (clamdscan: 0.88.2/1715. spamassassin: 3.0.3. perlscan: 1.25st.
>     Clear:RC:1(24.140.8.46):SA:0(1.2/14.0):.
>     Processed in 0.727458 secs); 23 Aug 2006 18:58:42 -
> X-Spam-Status: No, hits=1.2 required=14.0
> X-Spam-Level: +
> Received: from cable-8-46.sssnet.com (HELO SERVER) ([24.140.8.46])
>     (envelope-sender )
>     by 0 (qmail-ldap-1.03) with SMTP
>     for ; 23 Aug 2006 18:58:41 -
> From: "Sue Repp"
> To: "'Mary Richardson'"
> Subject:
> Date: Wed, 23 Aug 2006 14:58:53 -0400
> MIME-Version: 1.0
> Content-Type: multipart/alternative;
>     boundary="=_NextPart_000__01C6C6C4.ABD60F20"
> X-Mailer: Microsoft Office Outlook, Build 11.0.5510
> Thread-Index: AcbG5izxOwnp3dUpR7iOx6AZ33ceQQ==
> X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2962
> X-Qmail-Scanner-Message-ID: <[EMAIL PROTECTED]>
> Message-Id: <[EMAIL PROTECTED]>

On 8/23/06, Stuart Johnston <[EMAIL PROTECTED]> wrote:
> As a quick guess, you probably need to fix your Trust Path:
> http://wiki.apache.org/spamassassin/TrustPath

No, I've got that set properly, as I didn't trust the autodiscovery.  So
I've already entered the class C for my MX's and SMTP's there for both
trusted_networks and internal_networks.
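
To make the Trust Path discussion concrete: as I understand it, the dynamic-IP (DUL) style checks are applied to the first relay found outside the trusted networks, so which hop that is depends on what has been listed as trusted. Below is a minimal Python sketch of that selection step only, using the two relay IPs from the quoted Received headers. The 24.140.1.0/24 network is purely a hypothetical stand-in for the class C mentioned above, and the function name is made up for illustration; this is not SpamAssassin's actual header parsing.

```python
from ipaddress import ip_address, ip_network


def first_untrusted_relay(relay_ips, trusted_networks):
    """Return the first relay IP (newest hop first) outside every trusted network."""
    nets = [ip_network(n) for n in trusted_networks]
    for ip in relay_ips:
        addr = ip_address(ip)
        if not any(addr in net for net in nets):
            return ip
    return None


# Relay IPs taken from the Received headers quoted above, newest hop first.
relays = ["24.140.1.147", "24.140.8.46"]
# Hypothetical trusted_networks entry; the real value was not posted.
trusted = ["24.140.1.0/24"]

print(first_untrusted_relay(relays, trusted))  # 24.140.8.46
```

With smtp-1 inside the trusted range, the customer's cable address becomes the hop that gets checked, which would be consistent with the DUL hits in the report.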





Re: Do I have too many rules? [WAS: timeout help]

2006-08-24 Thread Josh Trutwin
On Thu, 24 Aug 2006 10:11:28 -0400
Bowie Bailey <[EMAIL PROTECTED]> wrote:

> Josh Trutwin wrote:
> > 
> > Still having problems - even with -L.  Server has 1 GB of
> > memory, more is on the way I hope.
> 
> You said previously that you were running 12 children.  With 1GB
> of RAM, I would suggest that you drop it to 6 and see what
> happens.

Thanks - I had backed off to 8, still had probs so now I'm at 4.

> > Anyway - I have the following rules:
> > 
> > fastconcepts:/etc/mail/spamassassin# ls
> > 10_misc.cf                     88_FVGT_headers.cf
> > 70_sare_adult.cf               88_FVGT_rawbody.cf
> > 70_sare_bayes_poison_nxm.cf    88_FVGT_subject.cf
> > 70_sare_evilnum0.cf            88_FVGT_uri.cf
> > 70_sare_genlsubj.cf            99_FVGT_DomainDigits.cf
> > 70_sare_header.cf              99_FVGT_meta.cf
> > 70_sare_highrisk.cf            99_sare_fraud_post25x.cf
> > 70_sare_html.cf                RulesDuJour
> > 70_sare_obfu.cf                antidrug.cf
> > 70_sare_oem.cf                 backhair.cf
> > 70_sare_random.cf              bogus-virus-warnings.cf
> > 70_sare_ratware.cf             chickenpox.cf
> > 70_sare_specific.cf            coaching.cf
> > 70_sare_spoof.cf               init.pre
> > 70_sare_stocks.cf              local.cf
> > 70_sare_unsub.cf               mr_wiggly.cf
> > 70_sare_uri.cf                 random.cf
> > 70_sare_uri0.cf                tripwire.cf
> > 70_sare_whitelist_rcvd.cf      tsa-list.cf
> > 70_sare_whitelist_spf.cf       useless.cf
> > 72_sare_bml_post25x.cf         v310.pre
> > 72_sare_redirect_post3.0.0.cf  v312.pre
> > 88_FVGT_body.cf                weeds.cf
> > 
> > I created coaching.cf and tsa-list.cf - they are basically one
> > (well two technically) liners.
> > 
> > If I need to cut back I'm not sure where to start - the SARE
> > rules are the ones listed on rulesemporium.com - the FVGT are on
> > http://www.exit0.us/index.php?pagename=RulesDuJourRuleSets.  Of
> > the rest, only bogus-virus-warnings.cf is of any substantial
> > size (and still 1/3 the size of the largest SARE rule).
> 
> antidrug.cf can be removed.  These rules are already included in
> the latest SA versions.

Thanks - done.  This should probably be removed from the
rules_du_jour script then or marked Deprecated.  I also was able to
remove mr_wiggly.cf.

> 70_sare_uri.cf and 70_sare_uri0.cf are overlapping.  You should
> use one or the other, not both.  70_sare_uri.cf contains all of
> the SARE uri rules.  70_sare_uri0.cf contains only the uri rules
> that did not hit any ham during testing.

That's strange - 70_sare_uri.cf is not even available on the rules
emporium page anymore - must've been downloaded before it was
split.  I removed em both and went with 70_sare_uri_eng.cf

> > Should v310.pre be removed if v312.pre is also there?
> 
> No, v312.pre contains plugins that were added with SA 3.1.2 and
> v310.pre contains plugins added with SA 3.1.0.  You should not
> remove any of these files.

Thanks - that was my guess.  :)

Josh


Problem with version

2006-08-24 Thread WrackWeb - Jean Respen

Hello,

I've built SpamAssassin 3.1.4. I had some problems, so I installed it with
CPAN and then overwrote it with a source version, hehe, and now
everything is fine, BUT when I look at the headers of an email I can see

(spamassassin: 3.0.3. perlscan: 2.01st.
But a spamd -V shows

spamd -V
SpamAssassin Server version 3.1.4
  running on Perl 5.8.4
  with SSL support (IO::Socket::SSL 0.96)

Why are the versions different?

Thanks,

--
Jean "Wrack" Respen
[EMAIL PROTECTED]
http://www.wrackweb.net


RE: Do I have too many rules? [WAS: timeout help]

2006-08-24 Thread Bowie Bailey
Josh Trutwin wrote:
> 
> Still having problems - even with -L.  Server has 1 GB of memory,
> more is on the way I hope.

You said previously that you were running 12 children.  With 1GB of
RAM, I would suggest that you drop it to 6 and see what happens.
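
For anyone wanting to turn that advice into a number, here is a rough Python sketch of the budgeting arithmetic. The 50 MB per child is an assumption, derived from the "600 MB for 12 children" figure given elsewhere in this thread; add-on rulesets can push the real per-child size higher, so measure it (for example with ps or top) before relying on it. The function name and the 300 MB example are hypothetical.

```python
def max_children(ram_for_spamd_mb, per_child_mb):
    """Largest child count whose combined memory fits in the RAM set aside for spamd."""
    if per_child_mb <= 0:
        raise ValueError("per_child_mb must be positive")
    return max(1, int(ram_for_spamd_mb // per_child_mb))


# Assumed figure: roughly 50 MB per child (about 600 MB for 12 children).
PER_CHILD_MB = 50

print(max_children(600, PER_CHILD_MB))  # 12
print(max_children(300, PER_CHILD_MB))  # 6
```

The second call shows how a box that can only spare about 300 MB for spamd lands on the suggested 6 children.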

> Anyway - I have the following rules:
> 
> fastconcepts:/etc/mail/spamassassin# ls
> 10_misc.cf                     88_FVGT_headers.cf
> 70_sare_adult.cf               88_FVGT_rawbody.cf
> 70_sare_bayes_poison_nxm.cf    88_FVGT_subject.cf
> 70_sare_evilnum0.cf            88_FVGT_uri.cf
> 70_sare_genlsubj.cf            99_FVGT_DomainDigits.cf
> 70_sare_header.cf              99_FVGT_meta.cf
> 70_sare_highrisk.cf            99_sare_fraud_post25x.cf
> 70_sare_html.cf                RulesDuJour
> 70_sare_obfu.cf                antidrug.cf
> 70_sare_oem.cf                 backhair.cf
> 70_sare_random.cf              bogus-virus-warnings.cf
> 70_sare_ratware.cf             chickenpox.cf
> 70_sare_specific.cf            coaching.cf
> 70_sare_spoof.cf               init.pre
> 70_sare_stocks.cf              local.cf
> 70_sare_unsub.cf               mr_wiggly.cf
> 70_sare_uri.cf                 random.cf
> 70_sare_uri0.cf                tripwire.cf
> 70_sare_whitelist_rcvd.cf      tsa-list.cf
> 70_sare_whitelist_spf.cf       useless.cf
> 72_sare_bml_post25x.cf         v310.pre
> 72_sare_redirect_post3.0.0.cf  v312.pre
> 88_FVGT_body.cf                weeds.cf
> 
> I created coaching.cf and tsa-list.cf - they are basically one
> (well two technically) liners.
> 
> If I need to cut back I'm not sure where to start - the SARE rules
> are the ones listed on rulesemporium.com - the FVGT are on
> http://www.exit0.us/index.php?pagename=RulesDuJourRuleSets.  Of the
> rest, only bogus-virus-warnings.cf is of any substantial size (and
> still 1/3 the size of the largest SARE rule).

antidrug.cf can be removed.  These rules are already included in the
latest SA versions.

70_sare_uri.cf and 70_sare_uri0.cf are overlapping.  You should use
one or the other, not both.  70_sare_uri.cf contains all of the SARE
uri rules.  70_sare_uri0.cf contains only the uri rules that did not
hit any ham during testing.

> Should v310.pre be removed if v312.pre is also there?

No, v312.pre contains plugins that were added with SA 3.1.2 and
v310.pre contains plugins added with SA 3.1.0.  You should not remove
any of these files.

-- 
Bowie


Re: How to whitelist_from <> ?

2006-08-24 Thread Philip Prindeville
Matt Kettler wrote:

> Philip Prindeville wrote:
>
>> Well, yes, especially since the IP address of the sender is reserved for
>> a machine that does ticketing and auto-replies exclusively (I was going
>> to use whitelist_from_rcvd and not just whitelist_from).
>
> At that point, you should be able to use:
>
>  whitelist_from_rcvd * rdns.host.name
>
> Which will effectively white-list the host.

There's no way to whitelist just the empty address then?  Rather than
everything?

-Philip



Re: Do I have too many rules? [WAS: timeout help]

2006-08-24 Thread Josh Trutwin
On Sat, 19 Aug 2006 18:06:59 -0400
"Daryl C. W. O'Shea" <[EMAIL PROTECTED]> wrote:

> Josh Trutwin wrote:
> > On Fri, 18 Aug 2006 12:16:51 -0400
> > "Daryl C. W. O'Shea" <[EMAIL PROTECTED]> wrote:
> > 
> >> Josh Trutwin wrote:
> >>> I've recently had a server experience some really slow spam
> >>> processing - I'm not sure what's going on but I notice a lot
> >>> of timeouts in the mail log:
> >>>
> >>> Aug 18 09:20:21 www spamd[27673]: timeout with empty $@
> >>> at /usr/local/share/perl/5.8.4/Mail/SpamAssassin/Timeout.pm
> >>> line 182,  line 1126.
> >>> Aug 18 09:22:02 www spamd[27674]: timeout with empty $@
> >>>
> >>> Any suggestions?
> >>>
> >>> Debian linux - spamd 3.0.4 with pyzor/dcc/razor
> >>>
> >>> spamd running with:
> >>>
> >>> /usr/bin/spamd -d -D -q -x -H /etc/razor --max-children=12
> >>> --socketpath=/var/spool/spamassassin/spamd.sock -u spamd
> >> Unless you've got at least 600 or more MB of free RAM just for
> >> spamd's use, you've got too many children and are swap
> >> thrashing.  Back off the --max-children number.
> > 
> > I was getting the same results with less values - the box has 1
> > GB
> > - more is on the way though.  I disabled network tests with -L
> > and things work great again so something along that line is the
> > culprit.
> 
> Using -L processes messages faster, thus requiring less children
> to handle the load.  This error is caused by a system not being
> able to restore the child's config fast enough after it's done
> processing a message.  It's always due to one of two things...
> high load, or high load caused by swap thrashing.
> 
> If you're using a lot of add-on rulesets your children may be
> taking up even more memory than budgeted above.  See what they're
> using and confirm that you're not going to hit swap.  If the
> problem still persists I'd like to hear about it.

Still having problems - even with -L.  Server has 1 GB of memory,
more is on the way I hope.  Anyway - I have the following rules:

fastconcepts:/etc/mail/spamassassin# ls
10_misc.cf                     88_FVGT_headers.cf
70_sare_adult.cf               88_FVGT_rawbody.cf
70_sare_bayes_poison_nxm.cf    88_FVGT_subject.cf
70_sare_evilnum0.cf            88_FVGT_uri.cf
70_sare_genlsubj.cf            99_FVGT_DomainDigits.cf
70_sare_header.cf              99_FVGT_meta.cf
70_sare_highrisk.cf            99_sare_fraud_post25x.cf
70_sare_html.cf                RulesDuJour
70_sare_obfu.cf                antidrug.cf
70_sare_oem.cf                 backhair.cf
70_sare_random.cf              bogus-virus-warnings.cf
70_sare_ratware.cf             chickenpox.cf
70_sare_specific.cf            coaching.cf
70_sare_spoof.cf               init.pre
70_sare_stocks.cf              local.cf
70_sare_unsub.cf               mr_wiggly.cf
70_sare_uri.cf                 random.cf
70_sare_uri0.cf                tripwire.cf
70_sare_whitelist_rcvd.cf      tsa-list.cf
70_sare_whitelist_spf.cf       useless.cf
72_sare_bml_post25x.cf         v310.pre
72_sare_redirect_post3.0.0.cf  v312.pre
88_FVGT_body.cf                weeds.cf

I created coaching.cf and tsa-list.cf - they are basically one
(well two technically) liners.

If I need to cut back I'm not sure where to start - the SARE rules
are the ones listed on rulesemporium.com - the FVGT are on
http://www.exit0.us/index.php?pagename=RulesDuJourRuleSets.  Of the
rest, only bogus-virus-warnings.cf is of any substantial size (and
still 1/3 the size of the largest SARE rule).

Should v310.pre be removed if v312.pre is also there?

Thanks,

Josh


RE: Filtering Aliases/Forwarders

2006-08-24 Thread DuBois, Joseph
Joanne,

Thanks for info, yeah saw the variables I could make substitutions on
and will probably do that once I get it up and running so I can make
better rules, but I am just trying to get it running right now.

For my tests, I am just trying to get it to work. I was sending emails
to myself from work to home accounts with the proper content to set off
the rules, but for some reason it doesn't seem to be catching them (and
rewriting the subjects).

For example I sent an email with a subject TEST and in the body SPAM. I
would expect to receive the email at my home email account with the
subject line rewritten, but it's not happening. The hosting provider
won't give me any help, thus turning to the list. Once I get it working
I will start writing more specific rules to try and filter out all the
truly junk mail I get. Verizon has already shut off my email once,
because I have about 23 public email accounts forwarding to a single
account, so I really need to start cleaning it up.

On the CPANEL, it says it's enabled and that's about the only status I
seem to be able to check. It does have a "configuration" button which
brings up a form, where I can see that my local rules are set (from my
user_prefs file). So it seems to be at least reading that. Not sure
which version we are running, but I sent a request into the Hosting
provider to get that information along with any other configuration
information he can give me since he will not help me.

So I can only think, that one it is not parsing Alias/forward emails? Or
something else is wrong.

Again any help in getting this running is greatly appreciated! 

Thanks all.
 

-Original Message-
From: jdow [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, August 23, 2006 6:49 PM
To: users@spamassassin.apache.org
Subject: Re: Filtering Aliases/Forwarders

Joseph, may I make a slight suggestion for you?

For the rewrite try something about the same size that makes eyeball
filtering ham out of the spam folder much easier:
rewrite_header subject * Rated SPAM: _SCORE(00)_ *

Then the header subject will start with something like this:
"* Rated SPAM: 019.8 *". It'll be followed by the original subject, of
course. You filter to spam if "* Rated SPAM:" is seen. And you can sort
by subject to bring the low scores to the top.
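
A small Python sketch of why the padding matters for sorting. This only imitates the visible result ("019.8" for a score of 19.8) with ordinary string formatting; it is not SpamAssassin's own tag expansion, and the sample subjects are made up.

```python
def tagged_subject(score, subject):
    """Prefix a subject with a zero-padded score, e.g. 19.8 -> '019.8'."""
    return f"* Rated SPAM: {score:05.1f} * {subject}"


# Made-up sample subjects and scores, purely for illustration.
subjects = [
    tagged_subject(19.8, "Cheap watches"),
    tagged_subject(5.2, "Meeting notes"),
    tagged_subject(104.3, "You won"),
]

# A plain text sort now agrees with numeric order, lowest score first.
for line in sorted(subjects):
    print(line)
```

Without the padding, "104.3" would sort ahead of "19.8" and "5.2", and the borderline messages would no longer float to the top.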

And for demonstration or test rules I'd use low scores unless you
specifically wanted to see a hit. Then I'd search for something
gibberish in the text. Hm, actually I wonder if "gibberish" itself would
be a safe rule for testing. It almost never appears in normal mail and
spammers USUALLY are averse to calling their mail gibberish.
So {^_-} The scores I use for rule testing are in the 0.001 to
0.1 range. Once they look good I give them real scores.


{^_^}   Joanne
- Original Message -
From: "DuBois, Joseph" <[EMAIL PROTECTED]>


Well met,
 
Just activated SpamAssassin on my website (by my web hosting provider)
and wanted to do some simple tests which I read from the Wiki site and
FAQ. When it didn't run I opened a ticket with my provider and he said
he didn't support it and I needed to find help else where. So here I am.
Right now, I'm just trying some simple tests to get my
Aliases/Forwarders (which get sent through my site) and forwarded onto
my ISP providers email account.
 
i.e. a public email [EMAIL PROTECTED] would get forwarded onto my
local isp provider at verizon, or comcast depending on who I have for a
particular month, so that way I don't have to change my email every
month.
 
So for my test, I set up the following basic local rules in
~/.spamassassin/user_prefs file.
 
I assume this would take any email with the word spam in the BODY or
test in SUBJECT and rewrite the SUBJECT with the new HEADER. But for
some reason it does not appear to be working.

body LOCAL_DEMONSTRATION_RULE   /spam/
score LOCAL_DEMONSTRATION_RULE 6.0
describe LOCAL_DEMONSTRATION_RULE   This is a simple test rule
header LOCAL_DEMONSTRATION_SUBJECT  Subject =~ /\btest\b/i
score LOCAL_DEMONSTRATION_SUBJECT   2
required_score  5
rewrite_header subject * Rated SPAM: Junk This! *
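
One thing that can be checked independently of the alias/forwarder question is how these two rules add up. The Python sketch below is a simplified stand-in (two regexes and a sum), not SpamAssassin itself, and the sample messages are assumptions. It shows that the body pattern /spam/ has no /i flag, so a body containing only "SPAM" in capitals would not match it, leaving the subject rule's 2 points below the required 5.

```python
import re

# Python stand-ins for the two rules above: (pattern, score).
BODY_RULE = (re.compile(r"spam"), 6.0)               # /spam/ has no /i flag
SUBJECT_RULE = (re.compile(r"\btest\b", re.I), 2.0)  # /\btest\b/i
REQUIRED_SCORE = 5.0


def score_message(subject, body):
    """Sum the scores of the rules that match (simplified model only)."""
    total = 0.0
    if BODY_RULE[0].search(body):
        total += BODY_RULE[1]
    if SUBJECT_RULE[0].search(subject):
        total += SUBJECT_RULE[1]
    return total


# Hypothetical test messages: same subject, upper- vs lower-case body.
for body in ("SPAM", "this is spam"):
    total = score_message("TEST", body)
    print(body, total, total >= REQUIRED_SCORE)
```

If the test mail really did use capitals in the body, adding /i to the body rule (or sending lower-case "spam") would be a cheap thing to try before suspecting the forwarding setup.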
 
 
Does it not work for Aliases/Forwarders? Do you have to have a special
Client? I am using BAT by RitLABs, and/or Webbrowser.
 
Thanks!

Joseph DuBois, Lead Application Specialist
Application Standards & Specialty Projects
Children's Hospital Boston
[EMAIL PROTECTED]



 





Re: Another SARE channel with the most used rules available

2006-08-24 Thread [EMAIL PROTECTED]

Justin Mason wrote:


hey, btw, it might be better to extract the gpg public key from "gpg",
instead of copying over the entire public key ring -- since that will (a)
overwrite any existing SA-update keys, including the system ones,
and (b) will trust any existing GPG correspondents to publish SA updates!


Yes, downloading the GPG key and using "sa-update --import" doesn't have
that problem though. So, how do I extract this public key alone from the
public key ring to copy over to the sa-update public key ring? Any idea
on this is welcome :)


cheers,
skar.

--
OpenProtect - The email virus/spam filter
http://openprotect.com



Re: Filtering spam in national language

2006-08-24 Thread jdow

Find an Italian who wants to work with say the SARE ninjas for some
Italian specific rule versions, perhaps?

{^_^}
- Original Message - 
From: "cmon" <[EMAIL PROTECTED]>

To: 
Sent: Thursday, August 24, 2006 03:01
Subject: Filtering spam in national language



I work in an italian company, we are receiving some spam written in (very
bad) italian language, obviously produced by some automatic translator.
Although their content is heavily pornographic, the spam score is very low,
because they don't match any of the porn-specific rules, which are designed
only for english language.
Does anybody know how can we extend the basic rules to add support for
italian language pornographic spam?

Thanks, Carlo



Re: Where to install imageinfo.pm?

2006-08-24 Thread Andrew

BG Mahesh wrote:


hi

I am using SA-3.1.4. I am in the process of installing
http://www.rulesemporium.com/plugins.htm
Where do I install ImageInfo.pm [which directory]?




On my FreeBSD box, I put ImageInfo.pm here:
/usr/local/lib/perl5/site_perl/5.8.8/Mail/SpamAssassin/Plugin/

Andrew



Re: a new kind of spam (with images)

2006-08-24 Thread Stephane Bentebba




thanks all,

i set up fuzzyocr yesterday and it gives pretty good results
i need some time to understand it all :
- sometimes, using spamc -R or spamassassin -t, i can see the fuzzy ocr
filter displaying score results
- but looking in the spam folder at the report of a mail marked as spam,
i can't see the fuzzy ocr text ; it seems the mail is processed this
way : the image is converted to text, which then goes through the
normal filtering process as if the mail had been received in text
format

anyhow, here are some tips on how i did it, because it is not so obvious :


<<==

note : i use a redhat like : http://whiteboxlinux.org 
# cat /etc/whitebox-release
White Box Enterprise Linux release 3.0 (Liberation Respin 2)
# uname -a
Linux empereur.rungis 2.4.21-27.EL #1 Mon Feb 28 19:03:06 EST 2005 i686
i686 i386 GNU/Linux
# spamassassin -V
SpamAssassin version 3.0.4
  running on Perl version 5.8.0
and qmail

## first reference : http://wiki.apache.org/spamassassin/FuzzyOcrPlugin
(fanx to decoder)

## prerequisites :
## to check if you have perl module String::Approx installed ?
[EMAIL PROTECTED] root]# perl -e 'use String::Approx'
if you get no error : this is good, else do this : 

wget http://search.cpan.org/CPAN/authors/id/J/JH/JHI/String-Approx-3.26.tar.gz
tar xvzf String-Approx-3.26.tar.gz
cd String-Approx-3.26
perl Makefile.PL
make
make test
make install

## netpbm and other
rpm -qa | grep -i netpbm
rpm -qa | grep -iE "giflib|libungif"

## gocr ? on sourceforge
wget http://ovh.dl.sourceforge.net/sourceforge/jocr/gocr-0.40.tar.gz
tar xvzf gocr-0.40.tar.gz
cd gocr-0.40
wget http://users.own-hero.net/~decoder/fuzzyocr/gocr-segfault.patch
patch -p0 < gocr-segfault.patch

./configure --with-netpbm=yes   (IMPORTANT : you need explicit the
option)
make
make examples
make install

ln -s /usr/local/bin/gocr /usr/bin/gocr   (I needed this because it was
not in path)

## giftext source and patch
rpm -qf `type -p giftext`
rpm -qa | grep -i libungif
cp /usr/bin/giftext /usr/bin/giftext.ori (move the original file in
*.ori)

wget http://users.own-hero.net/~decoder/fuzzyocr/giftext-segfault.patch
wget http://ovh.dl.sourceforge.net/sourceforge/libungif/libungif-4.1.4.tar.bz2
tar xjvf libungif-4.1.4.tar.bz2
cd libungif-4.1.4
(cd util; patch -p0 < ../../giftext-segfault.patch)
./configure
gmake (or make)
(gmake install : i didn't launch it : just copied giftext binary)
cp util/giftext /usr/bin/giftext
    strange : previous file was an executable, the new is only a shell
script... um... but it works

## da FuzzyOcr plugin 
wget http://users.own-hero.net/~decoder/fuzzyocr/fuzzyocr-2.1.tar.gz
cd /etc/mail/spamassassin/
tar xvzf fuzzyocr-2.1.tar.gz
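
Since several steps above depend on helper binaries being reachable (the gocr symlink was needed precisely because it was not in the path), a quick check before testing the plugin can save some head-scratching. A minimal Python sketch; the list of names is taken from the steps above and the function name is made up.

```python
import shutil


def missing_tools(names):
    """Return the helper programs from `names` that are not found on PATH."""
    return [name for name in names if shutil.which(name) is None]


# Helper binaries mentioned in the steps above; extend as needed.
HELPERS = ["gocr", "giftext", "jpegtopnm"]

missing = missing_tools(HELPERS)
if missing:
    print("not on PATH:", ", ".join(missing))
else:
    print("all helpers found")
```

Note that this checks the PATH of whoever runs the script, which may differ from the environment spamd runs under.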

BUG : 
my jpegtopnm which comes from my redhat distro doesn't handle the -
(dash) argument to read on its standard input
so i made a wrapper to cancel its use (...wow...) : 
<<
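# cat /usr/bin/jpegtopnm
#!/bin/sh
# Wrapper: this jpegtopnm build cannot take "-" to mean standard input,
# so drop that argument and let the real binary read stdin by default.
BINAIRE="/usr/bin/jpegtopnm.ori"
if [ "$1" = "-" ]; then
    "$BINAIRE"
else
    "$BINAIRE" "$@"
fi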
>>

else, you get this error : 
<<
[EMAIL PROTECTED] ajustement_spam]# echo glassware.gif | jpegtopnm
Not a JPEG file: starts with 0x67 0x6c
[EMAIL PROTECTED] ajustement_spam]# echo glassware.gif | jpegtopnm -
jpegtopnm: Can't open -.  Errno=No such file or directory(2).
>>
==>>

fanx great to all



jdow wrote:
From: "Spamassassin List" <[EMAIL PROTECTED]>

Spamassassin List wrote:

Stephane Bentebba wrote:

hi all,

i am more or less happy with my spamassassin configuration
works good for one year
but i have problem with a new kind of spam which easily goes
through it :
spam which has poor text, poor token, or none, and a subject
always changing
the only thing which remain the same is the image incorporated in it
it get always very low hit (below 3)
subject on the image in the body is either "breaking news
concerning..." or "we have a runner !"
would it be possible to find a solution ?
add / modify a test to look at first bytes of an attachment and
recognize the image ?
i can send you samples of this spam if you like... (prefer not to
attach them)

Have a look at FuzzyOCR
http://wiki.apache.org/spamassassin/FuzzyOcrPlugin

Works very well for me - I'm using it in conjunction with ImageInfo
and since I'm using them those image spams get through VERY rarely

They will also block off legit emails too

How so?

I wouldn't expect any from FuzzyOCR but ImageInfo certainly has the
chance to block legit mail.

Sorry, I meant ImageInfo plugin.. I have many legit emails blocked by
this plugin.

Redu

Filtering spam in national language

2006-08-24 Thread cmon
I work in an italian company, we are receiving some spam written in (very
bad) italian language, obviously produced by some automatic translator.
Although their content is heavily pornographic, the spam score is very low,
because they don't match any of the porn-specific rules, which are designed
only for english language.
Does anybody know how can we extend the basic rules to add support for
italian language pornographic spam?

Thanks, Carlo


Re: How remove vigra gif message ?

2006-08-24 Thread decoder
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

FuzzyOcr recognizes this image with the following scanset: $gocr -l
100 -i -

- -l 140 also works, but not as good.


Chris

Philippe Couas wrote:
> Message Hi
>
> How could i remover theses messages ? Regard Philippe
>
> - Original Message - *From:* John
>  *To:* [EMAIL PROTECTED]
>  *Sent:* Thursday, August 24, 2006 5:14
> PM *Subject:* Full of health? Then don't click!
>
> 

-BEGIN PGP SIGNATURE-
Version: GnuPG v1.4.5 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFE7W/ZJQIKXnJyDxURAmfCAKCVse/frwZVwetA+vOgCG4brfjLwQCgnBkV
IbmE/aNQJ3sp256mbKyxFZE=
=dYwE
-END PGP SIGNATURE-



Where to install imageinfo.pm?

2006-08-24 Thread BG Mahesh
hi

I am using SA-3.1.4. I am in the process of installing
http://www.rulesemporium.com/plugins.htm
Where do I install ImageInfo.pm [which directory]?

--
--
B.G. Mahesh
http://www.greynium.com/
http://www.oneindia.in/
http://www.click.in/ - Free Indian Classifieds


Re: Train from Outlook?

2006-08-24 Thread Jeremy Fairbrass
I use a nifty tool called OLSpamCop to achieve this functionality with my
Outlook. OLSpamCop is an Outlook plugin, it adds a new toolbar to Outlook
and basically allows you to select an email, hit either a "spam" or "ham"
button on the toolbar, and OLSpamCop will forward the email to an address
you've specified in the options - a different address depending on which
button you hit. It was designed for sending spam to SpamCop, but can be used
to forward the spam (or ham) to any address you specify, eg. to your mail
server and then to SpamAssassin for learning, eg. if you have set up "this
is spam" and "this is ham" receiving email addresses on your server, as my
server (MDaemon) does. When authenticated emails are forwarded to either of
these addresses on my server, it automatically runs the Bayes learning on
them accordingly. Thus this Outlook plugin works perfectly for me.

You can find it at http://www.olspamcop.org/. Oh yeah, and it's freeware...!

Cheers,
Jeremy

---
"Christopher Mills" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]
Tell me something, is there a pluggin for outlook that would allow me to
train spamassassin on the web server?
Eg, messages come in, end up in my Junk Mail folder, can i somehow select
them, and click a button with this 'addin' and have it find our web server
and train spam assassin with the data in my local inbox?  That would be a
very cool addon if someone could develop it.






Re: How to whitelist_from <> ?

2006-08-24 Thread Matt Kettler
Philip Prindeville wrote:
>
>> 
>
> Well, yes, especially since the IP address of the sender is reserved for
> a machine that does ticketing and auto-replies exclusively (I was going
> to use whitelist_from_rcvd and not just whitelist_from).

At that point, you should be able to use:

 whitelist_from_rcvd * rdns.host.name

Which will effectively white-list the host.
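
To illustrate why this whitelists the host as a whole, here is a toy Python model of the two conditions involved: the sender address must match the glob, and the relay's reverse DNS must be the given name or fall under it. It is a simplification for illustration only, not the real matching code; "tickets.example.com" is a placeholder for rdns.host.name and the function name is invented.

```python
from fnmatch import fnmatchcase


def whitelisted(sender, relay_rdns, addr_glob, rdns_domain):
    """Toy model: sender matches the glob AND the relay rDNS is the
    given host name or a subdomain of it."""
    sender, relay_rdns = sender.lower(), relay_rdns.lower()
    addr_glob, rdns_domain = addr_glob.lower(), rdns_domain.lower()
    addr_ok = fnmatchcase(sender, addr_glob)
    host_ok = relay_rdns == rdns_domain or relay_rdns.endswith("." + rdns_domain)
    return addr_ok and host_ok


# "tickets.example.com" is a placeholder for the real rdns.host.name.
print(whitelisted("", "tickets.example.com", "*", "tickets.example.com"))
print(whitelisted("bob@example.net", "tickets.example.com", "*", "tickets.example.com"))
print(whitelisted("", "mail.elsewhere.example", "*", "tickets.example.com"))
```

In this model a bare "*" accepts every sender, including an empty one, so the host restriction is the only thing doing the filtering.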


Re: bayes autolearn acting up

2006-08-24 Thread Matt Kettler
[EMAIL PROTECTED] wrote:
> Hello,
>
> Since upgrading to 3.1.4, when I turn on bayes auto-learn with:
>
> bayes_auto_learn 1
>
> and I set the learn boundaries with:
>
> bayes_auto_learn_threshold_nonspam -3.5
> bayes_auto_learn_threshold_spam   15.5
>
> I get unexpected auto-learning.  Example:  I just saw a spam come
> through that scored 9.9, which is enough for it to be tagged as spam,
> but it should not be auto-learned as spam.  But, in the header it
> clearly reads:
>
> X-Spam-Status: 
> Yes, score=9.9 required=5.0 tests=AWL,BAYES_99,
> DATE_IN_PAST_03_06,DCC_CHECK,DIGEST_MULTIPLE,HTML_40_50,HTML_MESSAGE,
> MIME_HTML_ONLY,RAZOR2_CHECK,RCVD_IN_WHOIS_INVALID autolearn=spam
> version=3.1.4
>
>
> Any ideas?
SA does not autolearn based on the final message score. So, toss the 9.9
out the window. That's not the number SA compares to the 15.5.

For learning SA uses what the message score would have been if: 1) the
AWL is off. 2) Bayes was disabled, including shifting what scoreset is
used for all the other rules. 3) all white/blacklists are disabled. This
is often *quite* different from the final score.

However, in this case I don't entirely understand... The default SA 3.1
scores are:

score DATE_IN_PAST_03_06 0.736 0 1.122 0.478
score DCC_CHECK 0 1.37 0 2.17
score DIGEST_MULTIPLE 0 0.233 0 0.765
score HTML_40_50 0.611 0 0.497 0.496
score HTML_MESSAGE 0.001
score MIME_HTML_ONLY 0.414 0.001 0.389 0.001
score RAZOR2_CHECK 0 0.5 0 0.5
score RCVD_IN_WHOIS_INVALID 0 2.151 0 2.234

Adding the set1 scores up, the learning score should have been 4.753.

Have you modified any rule scores?
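
For anyone who wants to redo the addition, here is a short Python sketch that sums one column of the scores exactly as listed above and compares the result with the poster's thresholds. Treating the second column as "set1" and the boundary handling in the comparison are assumptions on my part. Summed that way the listed values give 4.256 rather than 4.753; the difference, 0.497, happens to equal the third-column value of HTML_40_50. Either figure is far below 15.5, so the question about modified scores stands.

```python
# Scores as listed above: (set0, set1, set2, set3).
# A rule with a single listed value uses it for every set.
SCORES = {
    "DATE_IN_PAST_03_06": (0.736, 0, 1.122, 0.478),
    "DCC_CHECK": (0, 1.37, 0, 2.17),
    "DIGEST_MULTIPLE": (0, 0.233, 0, 0.765),
    "HTML_40_50": (0.611, 0, 0.497, 0.496),
    "HTML_MESSAGE": (0.001,) * 4,
    "MIME_HTML_ONLY": (0.414, 0.001, 0.389, 0.001),
    "RAZOR2_CHECK": (0, 0.5, 0, 0.5),
    "RCVD_IN_WHOIS_INVALID": (0, 2.151, 0, 2.234),
}


def learning_score(hits, scoreset):
    """Sum the listed scores for the given rules in one score set."""
    return round(sum(SCORES[name][scoreset] for name in hits), 3)


def autolearn_decision(score, nonspam_threshold, spam_threshold):
    """Compare a learning score with the thresholds (boundary handling assumed)."""
    if score >= spam_threshold:
        return "spam"
    if score < nonspam_threshold:
        return "ham"
    return "no"


total = learning_score(SCORES, scoreset=1)
print(total)                                  # 4.256
print(autolearn_decision(total, -3.5, 15.5))  # no
```

AWL and BAYES_99 are left out on purpose, since the learning score ignores them.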






Re: bayes autolearn acting up

2006-08-24 Thread jdow

From: <[EMAIL PROTECTED]>


Hello,

Since upgrading to 3.1.4, when I turn on bayes auto-learn with:

bayes_auto_learn 1

and I set the learn boundaries with:

bayes_auto_learn_threshold_nonspam -3.5


This doesn't answer your question. But, I suspect a -3.5 here will
all but turn off learning on ham.

{o.o}