Re: Q. about spam directed towards highest MX Record?

2006-09-27 Thread Daryl C. W. O'Shea

Rob McEwen wrote:

> (CCing Marc Perkel because I seem to recall him knowing about this)
>
> Not that I'd ever outright block based on this one factor alone, but...
>
> Does anyone have any stats about what percentage of spam is directed towards
> the highest MX Record? (that is, where there is more than one MX record?)
>
> Also, has anyone ever seen ANY legit mail go to the highest MX record when
> no mail server failure occurred?


I get lots of mail from a number of different Domino servers delivered 
to my lowest preference MXes.  I've always suspected it was something 
IBM had done to Domino to "improve queue performance" but I've never 
looked into it.



Daryl


Re: really slow spamd scan

2006-09-27 Thread Olivier Nicole
> > 14 seconds may be just the delay for the various network tests to
> > respond.
> You mean the tests from SA? I have googled for this kind of situation
> and I found I am the slowest. If I stop the spamd, the delivery will
> be much faster.

I mean it depends on how your SA is configured.

Some of the tests are based on data in remote databases (all the RBL
tests, Razor/Pyzor, DCC...); depending on your network connectivity, it
can take some time before you get a response for these various tests.

But once again, this is not an issue because it does not create any
load on your server: while SA is waiting for a network test to finish,
the server can process other emails.
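To make the point concrete, here is a small Python sketch (not SpamAssassin code). `time.sleep` stands in for waiting on a remote lookup such as an RBL query, and the `fake_network_test` and `measure` helpers are purely illustrative. Wall-clock time grows with the wait while CPU time stays near zero, which is why slow network tests lengthen the reported scan time without loading the server.

```python
import time


def fake_network_test(delay_s: float) -> str:
    """Hypothetical stand-in for a DNS/RBL lookup: it only waits."""
    time.sleep(delay_s)
    return "not listed"


def measure(delay_s: float) -> tuple:
    """Return (wall-clock seconds, CPU seconds) spent on one fake lookup."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    fake_network_test(delay_s)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


if __name__ == "__main__":
    wall, cpu = measure(0.5)
    print(f"wall clock: {wall:.2f}s, CPU: {cpu:.4f}s")
```

While one scan is blocked like this, the CPU is free for other messages.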

My average SA processing time is between 10 and 15
seconds.

Olivier


Re: sa-learn and "Caught" spams

2006-09-27 Thread Dave Pooser
> For instance, given the explanations above, I'll
> start a system to automatically learn from my 'checkspam' folder, but not
> my 'highspam' folder.

Remember that your 'highspam' may be separated from 'checkspam' largely
based on network tests; I often see identical messages with a 6-8 point
variance depending on the connecting IP and the envelope sender. So if I
don't feed the Bayes the first message because it scored a safe 12 points,
that might mean the second sneaks through because it hits BAYES_20 instead
of BAYES_50, or whatever.

My own theory is "Learn 'em all and let Bayes sort 'em out."
-- 
Dave Pooser
Cat-Herder-in-Chief, Pooserville.com
"NOTHING says love like a monkey. It's a fuzzy screeching
bundle of tenderness!" -- QueenOfWands.net




Re: really slow spamd scan

2006-09-27 Thread Deephay

On 9/28/06, Olivier Nicole <[EMAIL PROTECTED]> wrote:

> > I am quite new to SA (a week of SA life), and the SA is working, the
> > thing is, SA is incredibly slow on my server (2.8GHZ CPU + 2GB Memory
> > + Qmail + Qmail-scanner).  Here's a typical scan log:
> >
> > result: . 0 - SPF_PASS scantime=14.7,size=1689  ...
>
> Hi,
>
> Problem is not that it is slow.
>
> That SA takes 14 seconds to deliver a message is not an issue, email
> is not a real time process anyway and transiting email from one
> gateway to another can take minutes or hours.

Doesn't scantime=14.7 mean SpamAssassin's scan time?

> Problem would be is SA would make high CPU load on your server.
>
> 14 seconds may be just the delay for the various network tests to
> respond.

You mean the tests from SA? I have googled for this kind of situation
and I found I am the slowest. If I stop spamd, the delivery is
much faster.

> Bests
>
> Olivier


Thanks very much for the suggestion!

Deephay


Re: Q. about spam directed towards highest MX Record?

2006-09-27 Thread Dave Pooser
> Also, has anyone ever seen ANY legit mail go to the highest MX record when
> no mail server failure occurred?

I've seen a tiny amount-- little enough that I earlier set my primary to
dump any messages received from my tertiary MX into a quarantine folder for
my review, but since I got ImageInfo.pm working properly I haven't noticed
any spam make it through mail3 unscathed.
-- 
Dave Pooser
Cat-Herder-in-Chief
Pooserville.com
"Dogs are what puppies turn into if you don't eat 'em before
they go all stringy." --Sgt. Schlock 




Re: really slow spamd scan

2006-09-27 Thread Olivier Nicole
> I am quite new to SA (a week of SA life), and the SA is working, the
> thing is, SA is incredibly slow on my server (2.8GHZ CPU + 2GB Memory
> + Qmail + Qmail-scanner).  Here's a typical scan log:
> 
> result: . 0 - SPF_PASS scantime=14.7,size=1689  ...

Hi,

The problem is not that it is slow.

That SA takes 14 seconds to process a message is not an issue; email
is not a real-time process anyway, and transiting email from one
gateway to another can take minutes or hours.

The problem would be if SA put a high CPU load on your server.

14 seconds may be just the delay for the various network tests to
respond.

Bests

Olivier


really slow spamd scan

2006-09-27 Thread Deephay

Greetings all,

I am quite new to SA (a week of SA life). SA is working, but it is
incredibly slow on my server (2.8 GHz CPU + 2 GB memory + Qmail +
Qmail-scanner).  Here's a typical scan log:

result: . 0 - SPF_PASS scantime=14.7,size=1689  ...
.

I have also checked the SA wiki and found a note saying that if you
are using a UTF-8 locale, performance can be poor. What I am
wondering is: would it be that slow?
Any suggestion is appreciated.

Deephay


Re: FORGED_YAHOO_RCVD?

2006-09-27 Thread Matt Kettler
What does your trusted_networks look like? Based on the headers below
you'll need to set it manually.

By default SA assumes that all the "private range" hosts, plus the first
non-private host, are part of your network. However, in this case, the
first non-private host is Yahoo's server. That's bad.
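At its core, trusted_networks drives a membership test: is a given relay address inside a block you trust? The Python sketch below (standard `ipaddress` module, not SpamAssassin code) applies that test to the relay addresses in the headers quoted below. The 192.12.69.0/24 block is an assumption inferred from the cs.arizona.edu addresses, not Jim's real configuration. None of those site addresses is in a private range, which fits Matt's point that the private-range defaults need help here.

```python
import ipaddress
from typing import List, Tuple

# Relay addresses as they appear in the Received headers quoted below.
RELAYS: List[Tuple[str, str]] = [
    ("localhost (amavisd-new)", "127.0.0.1"),
    ("cheltenham.cs.arizona.edu", "192.12.69.60"),
    ("mail-relay1.yahoo.com", "216.145.48.34"),
    ("speedster.cc.kana.corp.yahoo.com", "207.126.228.28"),
]

# Hypothetical trusted_networks value: loopback plus an assumed site block.
TRUSTED = [
    ipaddress.ip_network("127.0.0.0/8"),
    ipaddress.ip_network("192.12.69.0/24"),
]


def classify(ip: str) -> str:
    """Label an address by whether it falls inside a TRUSTED block."""
    addr = ipaddress.ip_address(ip)
    if any(addr in net for net in TRUSTED):
        return "trusted"
    # Outside TRUSTED: report whether it is at least a private-range address.
    return "private, not listed" if addr.is_private else "public, untrusted"


if __name__ == "__main__":
    for name, ip in RELAYS:
        print(f"{ip:>15}  {classify(ip):<20}  {name}")
```

In SpamAssassin itself the equivalent setting is a `trusted_networks` line in local.cf.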



Jim Davis wrote:
> This autoresponse from Yahoo abuse crept over the spam line, mostly
> because of a hit on FORGED_YAHOO_RCVD... but it's not clear from the
> headers why that would be.  This is a from a Fedora Core 5 system
> running SpamAssassin 3.1.3 under amavisd-new 2.4.2:
>
>> Return-Path: <[EMAIL PROTECTED]>
>> Received: from xenopodid.cs.arizona.edu (xenopodid.cs.arizona.edu [192.12.69.105])
>>     by email.cs.arizona.edu (8.13.3/8.13.3) with ESMTP id k8RFY9pl088354
>>     for <[EMAIL PROTECTED]>; Wed, 27 Sep 2006 08:34:09 -0700 (MST)
>>     (envelope-from [EMAIL PROTECTED])
>> Received: from localhost (xenopodid.cs.arizona.edu [127.0.0.1])
>>     by xenopodid.cs.arizona.edu (Postfix) with ESMTP id 6FCBA6DCCEA
>>     for <[EMAIL PROTECTED]>; Wed, 27 Sep 2006 08:34:09 -0700 (MST)
>> X-Virus-Scanned: amavisd-new at cs.arizona.edu
>> X-Spam-Flag: YES
>> X-Spam-Score: 5.019
>> X-Spam-Level: *
>> X-Spam-Status: Yes, score=5.019 tagged_above=- required=5
>>     tests=[BAYES_40=-0.185, DNS_FROM_RFC_ABUSE=0.2,
>>     DNS_FROM_RFC_POST=1.708, DNS_FROM_RFC_WHOIS=1.447,
>>     FORGED_YAHOO_RCVD=1.849]
>> Received: from xenopodid.cs.arizona.edu ([127.0.0.1])
>>     by localhost (xenopodid.cs.arizona.edu [127.0.0.1]) (amavisd-new, port 10024)
>>     with ESMTP id QAQkAwgDuuF1 for <[EMAIL PROTECTED]>;
>>     Wed, 27 Sep 2006 08:34:03 -0700 (MST)
>> Received: from cheltenham.cs.arizona.edu (cheltenham.cs.arizona.edu [192.12.69.60])
>>     by xenopodid.cs.arizona.edu (Postfix) with ESMTP id 90AAC6DCCD1
>>     for <[EMAIL PROTECTED]>; Wed, 27 Sep 2006 08:34:03 -0700 (MST)
>> Received: from mail-relay1.yahoo.com (mail-relay1.yahoo.com [216.145.48.34])
>>     by cheltenham.cs.arizona.edu (8.13.4/8.13.4) with ESMTP id k8RFY0Gj014456
>>     for <[EMAIL PROTECTED]>; Wed, 27 Sep 2006 08:34:03 -0700 (MST)
>>     (envelope-from [EMAIL PROTECTED])
>> Received: from speedster.cc.kana.corp.yahoo.com (speedster.cc.kana.corp.yahoo.com [207.126.228.28])
>>     by mail-relay1.yahoo.com (8.13.6/8.13.6/mr1) with SMTP id k8RFMTSi086721
>>     for <[EMAIL PROTECTED]>; Wed, 27 Sep 2006 08:22:29 -0700 (PDT)
>> Message-Id: <[EMAIL PROTECTED]>
>> Precedence: bulk
>> Auto-Submitted: auto-replied
>> Date: Wed, 27 Sep 2006 08:22:28 -0700
>> To: [EMAIL PROTECTED]
>> Subject: A message from Yahoo! Customer Care  (KMM37445094V70533L0KM)
>> From: Yahoo! Mail <[EMAIL PROTECTED]>
>> Reply-To: Yahoo! Mail <[EMAIL PROTECTED]>
>> MIME-Version: 1.0
>> Content-Type: text/plain; charset = "us-ascii"
>> Content-Transfer-Encoding: 7bit
>> X-Mailer: KANA Response 7.0.1.142
>> X-UID: 371043
>>
>> Thank you for contacting Yahoo! Customer Care to answer your question. A
>> support representative will get back to you within 48 hours regarding
>> your issue. Until then, feel free to visit our online help center at
>> http://help.yahoo.com/
>> for answers if you have not already done so.
>



Re: Received header unparseable

2006-09-27 Thread benthere-nine

A second attempt tests much better.  Added at line 747:

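  # Received: from  ([10.0.0.6]) by myfirewall; Thu,
  # 13 Mar 2003 06:26:21 -0500 (EST)
  # Firewall header with an empty reverse-DNS name: reuse the IP as HELO.
  if (/^from \(\[(${IP_ADDRESS})\]\) by myfirewall/) {
    # The firewall does look names up; a blank name means none was found.
    $mta_looked_up_dns = 1;
    $helo = $1; $ip = $1; $by = 'myfirewall'; goto enough;
  }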
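To check what the added branch should match before touching Received.pm, here is a Python sketch of the same test. The `IPV4` pattern is a simplified stand-in (an assumption) for SpamAssassin's stricter `IP_ADDRESS` regex, and the explicit whitespace collapsing reflects that the Perl pattern expects one space after `from` although the raw header has two. `parse_firewall_received` is a hypothetical helper, not Received.pm code.

```python
import re
from typing import Dict, Optional

# Simplified IPv4 matcher: an assumption standing in for SpamAssassin's
# much stricter IP_ADDRESS pattern.
IPV4 = r"\d{1,3}(?:\.\d{1,3}){3}"

# The firewall's header when reverse DNS found no name, e.g.
#   "from  ([201.19.179.63]) by myfirewall; Tue, ..."
NO_RDNS_RE = re.compile(r"^from \(\[(" + IPV4 + r")\]\) by myfirewall")


def parse_firewall_received(header: str) -> Optional[Dict[str, str]]:
    """Return ip/helo/by for the no-rDNS firewall header, else None."""
    # Collapse runs of whitespace (including header folding) to one space.
    text = re.sub(r"\s+", " ", header).strip()
    if text.startswith("Received: "):
        text = text[len("Received: "):]
    m = NO_RDNS_RE.match(text)
    if m is None:
        return None
    ip = m.group(1)
    # Like the Perl patch: with no name available, reuse the IP as HELO.
    return {"ip": ip, "helo": ip, "by": "myfirewall"}
```

Headers that do carry a name, such as the `f66108.upc-f.chello.nl` example below, fall through (return `None`) and are left to the existing parsing.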


benthere-nine wrote:
> 
> In a desperate newbie attempt to fix this problem myself, I added the
> following lines to Received.pm at line 895:
> 
># Received: from  ([10.0.0.6]) by myfirewalll; Thu,
># 13 Mar 2003 06:26:21 -0500 (EST)
> if (/^from \(\[(${IP_ADDRESS})\]\) by myfirewall/) {
>$ip = $1; $by = 'my.firewall.ip.addr'; goto enough;
> }
> 
> Ummm, it didn't work, but it didn't break anything.  How can I make this
> work?  Add a "$helo =" ?
> 
> Thanks.
> 
>  
> 
> benthere-nine wrote:
>> 
>> My firewall puts a received header on every e-mail it
>> forwards to SA 3.1.5:
>> 
>> Received: from f66108.upc-f.chello.nl ([80.56.66.108])
>> by myfirewall; Tue, 26 Sep 2006 12:35:52 -0500
>> (Central Daylight Time)
>> 
>> But when my firewall can't find a DNS entry for the
>> e-mail's last relay IP address, it just puts in a
>> blank space:
>> 
>> Received: from  ([201.19.179.63]) by myfirewall; Tue,
>> 26 Sep 2006 12:35:53 -0500 (Central Daylight Time)
>> 
>> 20_head_tests.cf hits on this as an UNPARSEABLE_RELAY.
>>  SA isn't able to look up that IP address on all the
>> network tests.
>> 
>> I'm e-mailing Tech Support for the company that
>> publishes the firewall software, but is there anything
>> that can be done on the SA side?
>> 
>> Thank you very much.
>> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Received-header-unparseable-tf2340368.html#a6539503
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: sa-learn and "Caught" spams

2006-09-27 Thread Daniel Staal

--As of September 27, 2006 5:43:28 PM -0700, Kelson is alleged to have said:


> Daniel T. Staal wrote:
>
>> True.  So...  Optimal is obviously to train, once and correctly, on all
>> messages.  Sending a message through that has been trained will consume
>> *some* resources, but less than one that still needs to be learned.
>>
>> So the exact balance is a complicated question.  ;)
>
> I just train on everything.  If it's already learned from a message, it
> takes a few resources for it to recognize that, but almost certainly less
> time than it would have taken me to separate them out.


--As for the rest, it is mine.

Depends on the setup.  For instance, given the explanations above, I'll 
start a system to automatically learn from my 'checkspam' folder, but not 
my 'highspam' folder.  I have procmail automatically sort my spam by score, 
so I can pay extra attention to low-scoring spam.  (Which is more likely to 
be ham which was misplaced than the high-scoring spam.)


So, since I *already* have them separated out, I can avoid the 
double-check.  ;)


Anyway, I just knew that there was an automatic system, and at the very 
least there is *some* load to re-learning, even if a full analysis is 
skipped.  It would be interesting to see how much it actually is, compared 
to an easy filter.  If I find time, I may try to figure out a good test.


Daniel T. Staal

---
This email copyright the author.  Unless otherwise noted, you
are expressly allowed to retransmit, quote, or otherwise use
the contents for non-commercial purposes.  This copyright will
expire 5 years after the author's death, or in 30 years,
whichever is longer, unless such a period is in excess of
local copyright law.
---



Re: an stupide config question

2006-09-27 Thread Matt Kettler
Philippe Couas wrote:
> Hi,
>  
> I have migrated from SpamAssassin 2.63 to 3.15.1, and it seems to be running:
> some mail is flagged and rpm -a sees the new version.
> But previously the rules and local.cf were in /etc/mail/spamassassin, and
> these files were not modified by my rpm -Uvh.
The "Stock" rules should not have been in /etc/mail/spamassassin.

Only add-on rules and your local.cf belong under any part of /etc/. This
is true for ALL versions of SA (or at least going back to 2.2x).

The "stock" rules (50_scores.cf and friends) should be in
/usr/share/spamassassin, or possibly /usr/local/share/spamassassin.


>  
> I want to know if the config files are always in the same directory.
They should be. However, the best way to be sure is to check. As for the
/usr/share/ ones, some packages use /usr/local/share instead, so be sure not to
end up with duplicates.
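Checking for duplicates can be done mechanically. A hedged Python sketch, assuming the two candidate directories named above; `duplicate_rule_files` is a made-up helper that lists any `.cf` filename present in more than one of them:

```python
from pathlib import Path
from typing import Dict, Iterable, List


def duplicate_rule_files(dirs: Iterable[str]) -> Dict[str, List[str]]:
    """Map each *.cf filename found in more than one directory to those dirs."""
    seen: Dict[str, List[str]] = {}
    for d in dirs:
        p = Path(d)
        if not p.is_dir():
            continue  # a missing directory simply contributes nothing
        for cf in sorted(p.glob("*.cf")):
            seen.setdefault(cf.name, []).append(str(p))
    return {name: where for name, where in seen.items() if len(where) > 1}


if __name__ == "__main__":
    # Assumed locations for the stock rules; adjust for your packaging.
    print(duplicate_rule_files(["/usr/share/spamassassin",
                                "/usr/local/share/spamassassin"]))
```

An empty result means no stock rule file appears in both places.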

>  
> Regards
>  
> Philippe COUAS
> Responsable Développement
> INFODEV S.A.
>  



Re: sa-learn and "Caught" spams

2006-09-27 Thread Matt Kettler
Bill Horne wrote:
>
> I have a "follow on" question, so I'll add it to this thread:
>
> Assuming that it's a good idea to feed "Caught" spams through sa-learn
> in order to reinforce the tokens that might not have been autolearned,
> how do I tell SA to ignore the " SPAM " notice in the subject? I
> have ignore-header commands in local.cf for the "X-Spam-Status: Yes" and
> other spam headers, but how do I skip only a portion of the subject?

Provided it's markup that your SpamAssassin generated, SA will
automatically ignore it when learning.


Re: sa-learn and "Caught" spams

2006-09-27 Thread Kelson

Daniel T. Staal wrote:

> True.  So...  Optimal is obviously to train, once and correctly, on all
> messages.  Sending a message through that has been trained will consume
> *some* resources, but less than one that still needs to be learned.
>
> So the exact balance is a complicated question.  ;)


I just train on everything.  If it's already learned from a message, it 
takes a few resources for it to recognize that, but almost certainly 
less time than it would have taken me to separate them out.


--
Kelson Vibber
SpeedGate Communications 


Re: sa-learn and "Caught" spams

2006-09-27 Thread Matt Kettler
Daniel T. Staal wrote:
>
> While I in general agree with this, I was under the impression that
> spamassassin will auto-learn from messages it marks.  (At least, past a
> certain threshold.)  
Actually, that's not entirely true. There's more than just a threshold.
In fact, the score you see isn't even the score compared against the
threshold.

Score computation generalities:
1) The score is computed as if Bayes were disabled. This includes
changing the score set.
2) Any rule with the "noautolearn" tflag (e.g. the white/blacklist
commands) is discarded.

From there, the criteria to learn as spam using this "learning score" are:
1) score above threshold (default 12.0)
2) at least 3.0 points from header rules
3) at least 3.0 points from body rules
4) Existing Bayes learning must not result in the message matching a
BAYES_* rule with a score less than -1.0
5) The Bayes R/W lock must be available on the first try, i.e. no
other autolearn, manual learn or expiry processes are running.

And note that because of 2 and 3, the score needs to be at least 6.0,
regardless of what you have the threshold set to.
If any of the above aren't met, autolearning will not happen.
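The criteria can be restated as a single predicate. The Python sketch below only illustrates the list as written in this post; it is not SpamAssassin's implementation. The `LearningScore` fields are hypothetical names, and the inputs are assumed to already be the "learning score" described above (Bayes off, noautolearn rules dropped).

```python
from dataclasses import dataclass


@dataclass
class LearningScore:
    """Hypothetical container for the "learning score" inputs."""
    total: float             # score recomputed as if Bayes were disabled
    header_points: float     # points contributed by header rules
    body_points: float       # points contributed by body rules
    bayes_rule_score: float  # score of the BAYES_* rule hit (0.0 if none)
    lock_available: bool     # Bayes R/W lock obtained on the first try


def would_autolearn_spam(s: LearningScore, threshold: float = 12.0) -> bool:
    """Return True only if every criterion from the list is met."""
    return (
        s.total > threshold               # 1) above the autolearn threshold
        and s.header_points >= 3.0        # 2) at least 3.0 header points
        and s.body_points >= 3.0          # 3) at least 3.0 body points
        and s.bayes_rule_score >= -1.0    # 4) no BAYES_* hit scoring below -1.0
        and s.lock_available              # 5) lock free on the first try
    )


if __name__ == "__main__":
    # Even with a very low threshold, 2.5 + 2.5 points is not enough.
    low = LearningScore(5.0, 2.5, 2.5, 0.0, True)
    print(would_autolearn_spam(low, threshold=1.0))  # False
```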

In general the autolearner tries very hard to be ABSOLUTELY POSITIVE a
message is spam before autolearning it.

So relying on autolearning to learn all or even most of your spam isn't
a very good idea. It's not going to learn all your spam. It just won't.
> In which case, feeding the spam messages to it again
> would bias the database towards spam, as the messages are being learned
> twice.
>   
Actually, as Jim pointed out, it will skip message IDs that are already
in the Bayes DB.

Also, this skip isn't particularly slow, so you're not wasting a ton of
CPU by re-feeding messages that were already auto learned.
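Why the skip is cheap can be shown with a toy model: a set lookup happens before any tokenizing. In this Python sketch the key choice (Message-ID, falling back to a SHA-1 of the raw text) is an assumption for illustration, not how SpamAssassin actually identifies learned messages, and `SeenDB` is a hypothetical class.

```python
import hashlib
from email import message_from_string


def learning_key(raw: str) -> str:
    """Hypothetical key: Message-ID if present, else SHA-1 of the raw text."""
    msg = message_from_string(raw)
    msgid = msg.get("Message-ID")
    if msgid:
        return msgid.strip()
    return hashlib.sha1(raw.encode("utf-8")).hexdigest()


class SeenDB:
    """Toy stand-in for a Bayes "already learned" record."""

    def __init__(self) -> None:
        self._seen: set = set()

    def learn(self, raw: str) -> bool:
        """Return True if the message was learned, False if skipped."""
        key = learning_key(raw)
        if key in self._seen:
            return False  # cheap set lookup: no tokenizing needed
        self._seen.add(key)
        # Tokenizing and updating token counts would happen here.
        return True
```

Re-feeding a message that was already learned costs one parse of the headers and one lookup.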

> So the question would have to be: Does Spamassassin automatically update
> the Bayes database from (some/any) messages it flags as spam or ham?
>   
Some, yes.
Most, no.
Score less than 6, never.





Re: Non-blocklisted embedded URLs are getting hits on URIBL_AB_SURBL and URIBL_PH_SURBL in SpamAssassin 3.1.5

2006-09-27 Thread Theo Van Dinter
On Wed, Sep 27, 2006 at 02:26:41PM -0700, Donald Craig wrote:
> I'm getting matches whenever I have an embedded URL
> on URIBL_AB_SURBL and URIBL_PH_SURBL -

You're not by chance using the opendns.{com,org} folks for DNS, are you?

-- 
Randomly Selected Tagline:
"You can tell that I got this out from the newspaper because it looks like I
 cut it out with a spatula." - Jim Duncan




Non-blocklisted embedded URLs are getting hits on URIBL_AB_SURBL and URIBL_PH_SURBL in SpamAssassin 3.1.5

2006-09-27 Thread Donald Craig
I'm getting matches whenever I have an embedded URL
on URIBL_AB_SURBL and URIBL_PH_SURBL -
unless the URL is actually in URIBL_SBL, in which case the
logic for all the flavors of URIBL_XX_SURBL seems
to work correctly.  I have verified the
absence of the incorrectly matching URLs from SURBL
with lookups in http://www.rulesemporium.com/cgi-bin/uribl.cgi

This is SpamAssassin 3.1.5, all was fine in 3.1.2.

For now I have set both those tests to 0.00.

Don Craig







RE: duplicate emails

2006-09-27 Thread Steve Ingraham
Loren Wilton wrote:
>occa_phishing.cf
>occa_replica.cf

>I have no knowledge of these.

>From the rules you show these aren't particularly worthwhile (nor all that
>well written rules).  There are a number of SARE rules that cover this area
>much more thoroughly, and I believe these days even a number of standard
>rules in this area.  I'd dump these files.

>I forget how you said you have SA integrated.  If you are using spamc/spamd
>as the interface then you can just kill spamd and restart it.  Depending on
>your system distribution the script to do that sometimes has various names
>and locations.

>Not every setup uses spamd though. I think Mailscanner integrates SA
>directly, and in this case you have to bounce mailscanner.

>So what are the pieces of your mail system again?  And which OS distro?

>Someone will likely know what to knock over the head to restart SA on that
>configuration.

This machine is running RedHat AS 3 with qmail and spamassassin 3.0.4.
We are running spamd.  So you are saying that I should find where spamd
is running and restart it?  "Knocking over the head" is what either I
need or this crazy machine needs.

For all who read this today: I may not be able to read posts, and I
definitely will not see them until tomorrow morning.  Whenever I do see
them, know that your posts are valuable to me and I hope to be able to
use your expertise, so please let me know what you think.

One last note about my outage today: as a last check I looked at our
internal email server, which runs Microsoft Exchange 2000.  When I
examined that machine, its 6 GB C: drive partition had only 116 MB
free.  I have been attempting to free up space on it to see if that may
be causing my sporadic email delivery problems.

I am still attempting to figure out what is causing my problems so I
appreciate all advice.

Steve Ingraham


Re: sa-learn and "Caught" spams

2006-09-27 Thread Bill Horne
On Wed, 2006-09-27 at 06:37 +, Mike Woods wrote:
> Hi guys, bit of a query regarding sa-learn and messages that have 
> already been tagged as spam.
> 
> We have spamassassin scanning mail via amavisd and sending any caught 
> spams to a spam folder in the users accounts (using plus addressing), 
> we've also been getting users to drop any missed spams into this spam 
> folder so we can train spamassassin on them, at present I have a script 
> that moves *only* the missed spams to a master folder for sa-learn, my 
> question is simple, would there be any benefit in including the mails 
> identified as spam in this process, I know sa-learn looks for common 
> patterns in spams to identify them as spam but im unsure if adding known 
> spams in would be beneficial in this ?


I have a "follow on" question, so I'll add it to this thread:

Assuming that it's a good idea to feed "Caught" spams through sa-learn
in order to reinforce the tokens that might not have been autolearned,
how do I tell SA to ignore the " SPAM " notice in the subject? I
have ignore-header commands in local.cf for the "X-Spam-Status: Yes" and
other spam headers, but how do I skip only a portion of the subject?

TIA.

Bill


RE: Newbie Rule Question

2006-09-27 Thread John D. Hardin
On Wed, 27 Sep 2006, Shue, Daniel G. wrote:

> # Catch anything from 8:00 PM to 6:00 AM and score it
> header  RCVD_AT_NIGHT   Date =~ /..., .. ...  [0,2][0-5]:..:..*/
> score   RCVD_AT_NIGHT   0.001
> describe RCVD_AT_NIGHT   Email was received between 8:00PM and 6:00AM

If you want to score based on when the message was *received* I
suggest you match against the Received: header *your* mail relay adds.

The Date: header is subject to forgery, will be in an unpredictable
time zone, and is supposed to indicate when the message was *sent*.
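The suggestion can be sketched in Python rather than as a SpamAssassin rule: take the timestamp after the final `;` of a Received header (where RFC 5322 places the date) and convert it to your own zone before looking at the hour. The function name and the fixed-offset zones in the test are assumptions for illustration; in practice you would pick the Received line that your own relay adds.

```python
import re
from datetime import timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Optional


def received_local_hour(received: str, local_tz: timezone) -> Optional[int]:
    """Return the hour, in local_tz, of a Received header's timestamp.

    Returns None if no usable, zone-aware timestamp is found.
    """
    if ";" not in received:
        return None
    stamp = received.rsplit(";", 1)[1]
    # Drop a trailing comment such as "(MST)" before parsing.
    stamp = re.sub(r"\([^)]*\)\s*$", "", stamp).strip()
    try:
        when = parsedate_to_datetime(stamp)
    except (TypeError, ValueError):  # older Pythons raise TypeError
        return None
    if when is None or when.tzinfo is None:
        return None
    return when.astimezone(local_tz).hour


if __name__ == "__main__":
    hdr = "from x by y; Wed, 27 Sep 2006 08:34:03 -0700 (MST)"
    print(received_local_hour(hdr, timezone(timedelta(hours=-7))))  # 8
    print(received_local_hour(hdr, timezone.utc))  # 15
```

Unlike the Date: header, this timestamp was written by a machine you control, in a zone you know.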

--
 John Hardin KA7OHZ    ICQ#15735746    http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]    FALaholic #11174    pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 - 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  People seem to have this obsession with objects and tools as being
  dangerous in and of themselves, as though a weapon will act of its
  own accord to cause harm. A weapon is just a force multiplier. It's
  *humans* that are (or are not) dangerous.
---



RE: Stats of rules ?

2006-09-27 Thread Bowie Bailey
Chris wrote:
> On Tuesday 26 September 2006 2:50 pm, Bowie Bailey wrote:
> > Noc Phibee wrote:
> > > Hi
> > > 
> > > on my spamassassin server, i use a lot of rules ..
> > > personnal and downloaded.
> > > 
> > > Anyone know if they have a tools for know in 24h or 48h
> > > if a rules are used or not ?
> > 
> > If you just want to know if the rule is getting hits, you can do a
> > simple grep against your maillog file.
> > 
> > For more in-depth stats, try this script:
> > 
> > http://www.rulesemporium.com/programs/sa-stats.txt
> > 
> > Rename it to sa-stats.pl before you run it.
> 
> Your script is still running great over here. If he's looking for
> something different from what sa-stats.pl provides, and if your script
> is for public consumption, you may want to suggest it to him. I've
> also got it running daily in a cronjob if he wants something like
> that.

My script can be used as well, although it is more for add-on rules in
particular.  It does not give stats on any of the built-in rules.

I'm attaching an updated version.  I have fixed the rulename detection
so that it will pick up on the fuzzyocr rules now (it will list their
score as 0 since they don't have a score line associated).
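A "simple grep" can be taken one step further by tallying every rule name in spamd's result lines. The Python sketch below is an illustration, not sa-stats.pl or Bowie's script. The log-line shape is assumed from the sample earlier in this archive (`result: . 0 - SPF_PASS scantime=14.7,size=1689`); real spamd lines carry more fields and may vary by version.

```python
import re
import sys
from collections import Counter
from typing import Iterable

# Assumed shape:  "result: <flag> <score> - <RULE,RULE,...> scantime=..."
# The rule list is optional because a message may hit no rules at all.
RESULT_RE = re.compile(r"result: (\S) (-?[\d.]+) - (?:(\S+) )?scantime=")


def count_rule_hits(lines: Iterable[str]) -> Counter:
    """Count how often each rule name appears in spamd result lines."""
    hits: Counter = Counter()
    for line in lines:
        m = RESULT_RE.search(line)
        if m and m.group(3):
            hits.update(name for name in m.group(3).split(",") if name)
    return hits


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage sketch: python count_hits.py /var/log/maillog
    with open(sys.argv[1], errors="replace") as log:
        for rule, n in count_rule_hits(log).most_common():
            print(f"{n:8d}  {rule}")
```

Rules that never show up in the output over a day or two are candidates for review.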

-- 
Bowie



sa-addon-stats.pl
Description: Binary data


RE: Newbie Rule Question

2006-09-27 Thread Shue, Daniel G.
Ok guys, I figured it out... w/ Loren's help of course! :) Here's what I
came up with:

# Catch anything from 8:00 PM to 6:00 AM and score it
header   RCVD_AT_NIGHT   Date =~ /..., .. ...  [0,2][0-5]:..:..*/
score    RCVD_AT_NIGHT   0.001
describe RCVD_AT_NIGHT   Email was received between 8:00PM and 6:00AM

# Catch anything from 3:00PM to 5:00PM for TESTING
#header   RCVD_DURING_DAY Date =~ /..., .. ...  1[5,6,7]:..:..*/
#score    RCVD_DURING_DAY -0.001
#describe RCVD_DURING_DAY TESTING - Email was received between 3:00PM and 5:00PM

What I can't figure out, though, is why you have to write "Date =~" and
not "Date: ".  Is that something SA understands as a variable?  I also
don't know what to do about the time zone.  Any ideas on that?  I'm going
to test it out tonight and see how it goes; let me know if you're
interested in the outcome.  I think it's a good idea myself: our people
here will never get ham after 8 PM.  And as you can see, my scores are
really low right now; I may move them up some, but not too much.
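One thing worth checking before relying on RCVD_AT_NIGHT: as the pattern appears in this archive (whitespace may have been altered in transit), it has no slot for the four-digit year, so against a Date header like "Wed, 27 Sep 2006 14:17:17 -0400" it would not match at all. The Python sketch below tests the pattern as shown and a hypothetical variant that allows for the year and a one- or two-digit day and drops the stray comma from `[0,2]`. The variant is an illustration only, not a tested SpamAssassin rule.

```python
import re

# RCVD_AT_NIGHT's pattern exactly as it appears in this archive.
AS_POSTED = re.compile(r"..., .. ...  [0,2][0-5]:..:..*")

# Hypothetical variant: one- or two-digit day, a four-digit year, and no
# stray comma in the hour class. Hours 00-05 and 20-23 still match.
WITH_YEAR = re.compile(r"\d{1,2} \w{3} \d{4} [02][0-5]:")


if __name__ == "__main__":
    sample = "Wed, 27 Sep 2006 22:17:17 -0400"
    print(bool(AS_POSTED.search(sample)), bool(WITH_YEAR.search(sample)))
```

Either way the hour is the sender's local time, so the time-zone question remains.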

-Original Message-
From: Brent Kennedy [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 27, 2006 3:39 PM
To: users@spamassassin.apache.org
Subject: RE: Newbie Rule Question
Importance: Low

Nice, I like that!  Most of our spam also comes in during the wee hours of
the morning.. I think adding a half point or even a point would help even
more.  Though, I have trained and continue to train both of my servers and
they are pretty effective.

We get 3500 mails a day of which 70% are classified as spam and that doesn't
count the email addresses I have receive blocked on(about another 1500
emails for them).

-Brent

-Original Message-
From: Loren Wilton [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 27, 2006 3:04 PM
To: users@spamassassin.apache.org
Subject: Re: Newbie Rule Question

I need to check, "Date: Wed, 27 Sep 2006 14:17:17 -0400" and I've looked

Quite ignoring the arguments people will make against this (including me)
you could do something like the following.  Of course remember the date
header is when the mail was made in whatever timezone it was made, not in
YOUR timezone.  For that you would want to check the timestamp in the
received header that your system adds, not in the date header.

#Catch 1700 to 0700

header    A_BAD_TIME    Date =~ /\d\s(?:1[789]|2\d|0[01234567]):/
score     A_BAD_TIME    0.2


Loren




RE: Newbie Rule Question

2006-09-27 Thread Brent Kennedy
Nice, I like that!  Most of our spam also comes in during the wee hours of
the morning. I think adding a half point or even a point would help even
more.  Though, I have trained and continue to train both of my servers and
they are pretty effective.

We get 3500 mails a day, of which 70% are classified as spam, and that doesn't
count the email addresses I have receive-blocked (about another 1500
emails for them).

-Brent

-Original Message-
From: Loren Wilton [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 27, 2006 3:04 PM
To: users@spamassassin.apache.org
Subject: Re: Newbie Rule Question

I need to check, "Date: Wed, 27 Sep 2006 14:17:17 -0400" and I've looked

Quite ignoring the arguments people will make against this (including me)
you could do something like the following.  Of course remember the date
header is when the mail was made in whatever timezone it was made, not in
YOUR timezone.  For that you would want to check the timestamp in the
received header that your system adds, not in the date header.

#Catch 1700 to 0700

header    A_BAD_TIME    Date =~ /\d\s(?:1[789]|2\d|0[01234567]):/
score     A_BAD_TIME    0.2


Loren




spamassassin 3.1.4

2006-09-27 Thread Richard
Installed this today and removed bogofilter.
I also installed spamc, and noticed that one of the suggested installs
was libnet-ident-perl. Is anyone using this with SpamAssassin,
or is it a separate module by itself?

Regards -
Richard


Re: Q. about spam directed towards highest MX Record?

2006-09-27 Thread DAve

Rob McEwen wrote:

> (CCing Marc Perkel because I seem to recall him knowing about this)
>
> Not that I'd ever outright block based on this one factor alone, but...
>
> Does anyone have any stats about what percentage of spam is directed towards
> the highest MX Record? (that is, where there is more than one MX record?)



Our lowest priority MX is just a store and forward box left over from 
when backup MXs were useful. We only keep it around because a few 
(getting fewer) clients say the PC magazine pundits say you need one. So 
they pay.


We do all the normal user validation, greylisting, RBLs, same as our 
other servers but the spammers insist on using it.


Here are the stats for yesterday;

total messages   total viruses   total spam
--------------   -------------   ----------
120,242          1,681           106,102
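Expressed as proportions (a quick Python check that assumes the virus and spam counts are separate categories, which the post doesn't say), roughly 88% of what reached that box in a day was spam:

```python
TOTAL, VIRUSES, SPAM = 120_242, 1_681, 106_102


def share(part: int, total: int) -> float:
    """Percentage of total represented by part."""
    return 100.0 * part / total


spam_pct = share(SPAM, TOTAL)
virus_pct = share(VIRUSES, TOTAL)
other = TOTAL - SPAM - VIRUSES  # only meaningful if the categories don't overlap

print(f"spam: {spam_pct:.1f}%  viruses: {virus_pct:.1f}%  other: {other}")
# spam: 88.2%  viruses: 1.4%  other: 12459
```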


> Also, has anyone ever seen ANY legit mail go to the highest MX record when
> no mail server failure occurred?



Just about any MS Exchange server. I have never had a valid message from 
qmail/Sendmail/Postfix/Exim go to that server. Always Exchange, and 
generally from a small business with a "shrink wrap admin" running the 
mail services.


DAve


--
Three years now I've asked Google why they don't have a
logo change for Memorial Day. Why do they choose to do logos
for other non-international holidays, but nothing for
Veterans?

Maybe they forgot who made that choice possible.


Re: Newbie Rule Question

2006-09-27 Thread Loren Wilton

> I need to check, "Date: Wed, 27 Sep 2006 14:17:17 -0400" and I've looked

Quite ignoring the arguments people will make against this (including me) 
you could do something like the following.  Of course remember the date 
header is when the mail was made in whatever timezone it was made, not in 
YOUR timezone.  For that you would want to check the timestamp in the 
received header that your system adds, not in the date header.


#Catch 1700 to 0700

header    A_BAD_TIME    Date =~ /\d\s(?:1[789]|2\d|0[01234567]):/
score     A_BAD_TIME    0.2
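Because this pattern uses only syntax that Python's `re` module shares with Perl, it can be exercised directly. In this sketch the helper name `is_bad_time` is made up. It shows that the pattern flags hours 17 through 23 and 00 through 07 inclusive, so "1700 to 0700" really runs to 07:59, and that the hour is read in the sender's time zone, not yours.

```python
import re

# The A_BAD_TIME pattern; this syntax means the same in Python and Perl.
A_BAD_TIME = re.compile(r"\d\s(?:1[789]|2\d|0[01234567]):")


def is_bad_time(date_header: str) -> bool:
    """True if the Date: value's hour falls in 17:00-07:59 (sender's zone)."""
    return A_BAD_TIME.search(date_header) is not None


if __name__ == "__main__":
    for value in ("Wed, 27 Sep 2006 14:17:17 -0400",
                  "Wed, 27 Sep 2006 07:59:00 -0400",
                  "Wed, 27 Sep 2006 08:00:00 -0400",
                  "Wed, 27 Sep 2006 22:05:00 -0400"):
        print(value, is_bad_time(value))
```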


   Loren



Re: Newbie Rule Question

2006-09-27 Thread Peter Smith
> Hi folks,
>   I'm a newbie to SA and have looked at a few tutorials on writing
> custom rules, but they all seem to be too simple for what I want to do.
> That, or I'm not smart enough to figure it out on my own.  What I'm
> needing is some guidance on how to write a custom rule that looks at the
> creation time in the header, and if its between 8 PM to 7 AM score
> it appropriately.  I know that this might make some of you cringe at the
> thought of it, but I would say that 95% - 99% of all of ham is sent
> between 8 AM to 6 PM.  The only this that really comes through later
> than that may be some valid newsletters that really don't make a
> difference.  And besides, I'm not talking about scoring it at 999.000,
> just add maybe .8 or even 1.000.  I think it would be a handy little
> rule if I had the brains to figure it out.  Here is the header tag that
> I need to check, "Date: Wed, 27 Sep 2006 14:17:17 -0400" and I've looked
> at the date rules built into SA but I can't see how I could possibly
> create a rule with regex to do what I want. Well... maybe. let me
> think on it, if anyone has any ideas, please let me know!

How about creating two configuration files, each with a different threshold.
Then use cron to switch which file is used at 8am and 6pm.

Cheers,
Peter Smith


Newbie Rule Question

2006-09-27 Thread Shue, Daniel G.
Hi folks,
I'm a newbie to SA and have looked at a few tutorials on writing
custom rules, but they all seem to be too simple for what I want to do.
That, or I'm not smart enough to figure it out on my own.  What I'm
needing is some guidance on how to write a custom rule that looks at the
creation time in the header, and if it's between 8 PM and 7 AM, score
it appropriately.  I know that this might make some of you cringe at the
thought of it, but I would say that 95% - 99% of all our ham is sent
between 8 AM and 6 PM.  The only thing that really comes through later
than that may be some valid newsletters that really don't make a
difference.  And besides, I'm not talking about scoring it at 999.000,
just add maybe .8 or even 1.000.  I think it would be a handy little
rule if I had the brains to figure it out.  Here is the header tag that
I need to check, "Date: Wed, 27 Sep 2006 14:17:17 -0400" and I've looked
at the date rules built into SA but I can't see how I could possibly
create a rule with regex to do what I want. Well... maybe. let me
think on it, if anyone has any ideas, please let me know!

Thanks a bunch!


This email and any files transmitted with it are confidential and intended for 
use only by the individual or entity named above.  If you are not the intended 
recipient or the employee or agent responsible for delivering this message to 
the intended recipient, you are hereby notified that any disclosure, 
dissemination, distribution, copying of this communication, or unauthorized use 
is strictly prohibited.  Please notify us immediately by reply email and then 
delete this message from your system.   Please note that any views or opinions 
presented in this email are solely those of the author and do not necessarily 
represent those of Randolph County Government.  This email and any file 
attachments have been scanned for potential viruses; however, the recipient 
should check this email for the presence of viruses and/or malicious code.  
Randolph County accepts no liability for any damage transmitted via this email.


Q. about spam directed towards highest MX Record?

2006-09-27 Thread Rob McEwen
(CCing Marc Perkel because I seem to recall him knowing about this)

Not that I'd ever outright block based on this one factor alone, but...

Does anyone have any stats about what percentage of spam is directed towards
the highest MX Record? (that is, where there is more than one MX record?)

Also, has anyone ever seen ANY legit mail go to the highest MX record when
no mail server failure occurred?

Thanks!

Rob McEwen
PowerView Systems
[EMAIL PROTECTED]
(478) 475-9032




Re: [qmailtoaster] duplicate emails

2006-09-27 Thread Loren Wilton

Hi,

have a look at rulesemporium.com
There are descriptions of the rules, and you should definitely use only
one out of each set of similarly named ones

Wolfgang Hamann


Be careful there.  It depends on what you mean by "similarly named".

It is perfectly valid to have


70_sare_html0.cf
70_sare_html1.cf
70_sare_html2.cf
70_sare_html3.cf
70_sare_html4.cf


on your system.  Each higher numbered file adds more 'dangerous' tests to 
the previous one.


It would NOT be valid to have only


70_sare_html1.cf
70_sare_html3.cf
70_sare_html4.cf


These depend on html0, so you would need that.  And it would not make much 
sense to have 3 and 4 without 2.


Also, the following would be wrong:


70_sare_html.cf
70_sare_html0.cf
70_sare_html1.cf
70_sare_html2.cf


The "html" file includes html0, html1, and I believe most of the others
except maybe html4.  So if you had the above configuration you would have a 
whole lot of duplicate rules.


The other thing to look out for is versioned files:


70_sare_whitelist_pre30.cf
72_sare_bml_post23x.cf
99_sare_fraud_post25x.cf


This is legal, IF you are running 2.63.  However, assuming we had


70_sare_whitelist_pre30.cf
70_sare_whitelist_post30.cf


It would NOT be valid to have BOTH of those in your configuration.
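The rules of thumb above can be written down as a small checker. This is a hypothetical helper, not part of SpamAssassin or SARE. It encodes only the relationships described in this message: html1-4 build on html0 and the lower levels, 70_sare_html.cf overlaps the numbered files, and a pre/post pair for the same base must not both be loaded. It flags any numbered file next to 70_sare_html.cf, which may be stricter than necessary for html4.

```python
import re


def check_sare_files(filenames):
    """Return warnings for a list of installed SARE .cf file names (hypothetical helper)."""
    names = set(filenames)
    warnings = []

    # Numbered html levels that are present, e.g. 70_sare_html0.cf -> 0.
    levels = set()
    for name in names:
        m = re.fullmatch(r"70_sare_html([0-4])\.cf", name)
        if m:
            levels.add(int(m.group(1)))

    # Each level builds on the ones below it, so there should be no gaps.
    if levels:
        missing = [i for i in range(max(levels)) if i not in levels]
        if missing:
            warnings.append(f"missing lower html levels: {missing}")

    # The combined file already contains most of the numbered levels.
    if "70_sare_html.cf" in names and levels:
        warnings.append("70_sare_html.cf plus numbered html files: duplicate rules")

    # Versioned files, e.g. 70_sare_whitelist_pre30.cf vs 70_sare_whitelist_post30.cf.
    variants = {}
    for name in names:
        m = re.fullmatch(r"(.+)_(pre|post)\d+x?\.cf", name)
        if m:
            variants.setdefault(m.group(1), set()).add(m.group(2))
    for base in sorted(variants):
        if variants[base] == {"pre", "post"}:
            warnings.append(f"both pre and post versions of {base}")

    return warnings
```

Run against the html1/html3/html4 example above, it reports that levels 0 and 2 are missing.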

   Loren



Re: sa-learn and "Caught" spams

2006-09-27 Thread Loren Wilton

Which means, for the original question, that re-learning the already caught
spams will have very little effect other than wasting some processor
cycles.  Doing what he is doing right now is probably best.


This is assuming that they were auto-learned.  Not all systems are configured 
for auto-learning.  (Mine isn't.)  So in that case, if you don't manually 
learn, they don't get learned.


   Loren



Re: duplicate emails

2006-09-27 Thread Loren Wilton

   occa_phishing.cf
   occa_replica.cf



I have no knowledge of these.


From the rules you show these aren't particularly worthwhile (nor all that 
well written rules).  There are a number of SARE rules that cover this area 
much more thoroughly, and I believe these days even a number of standard 
rules in this area.  I'd dump these files.


I forget how you said you have SA integrated.  If you are using spamc/spamd 
as the interface then you can just kill spamd and restart it.  Depending on 
your system distribution the script to do that sometimes has various names 
and locations.


Not every setup uses spamd, though. I think MailScanner integrates SA
directly, and in that case you have to bounce MailScanner.


So what are the pieces of your mail system again?  And which OS distro? 
Someone will likely know what to knock over the head to restart SA on that 
configuration.


   Loren



RE: sa-learn and "Caught" spams

2006-09-27 Thread Bowie Bailey
Mike Woods wrote:
> The internet is a great place for raising more questions than it
> answers :D 
> 
> Given all the opinions I think I will move the caught spams into the
> learning cycle; however, I'm also going to make sure that each spam is
> only ever fed through the system once. This won't be a problem since I
> already make use of their checksums to avoid duplicating files and I
> had intended to use it to remove old spam anyway.

Why not simply turn off autolearning?  Then you can feed everything to
sa-learn and not worry about it.

-- 
Bowie


RE: sa-learn and "Caught" spams

2006-09-27 Thread Rosenbaum, Larry M.
> From: Mike Woods [mailto:[EMAIL PROTECTED]
> 
> The internet is a great place for raising more questions than it
> answers :D
> 
> Given all the opinions I think I will move the caught spams into the
> learning cycle; however, I'm also going to make sure that each spam is
> only ever fed through the system once. This won't be a problem since I
> already make use of their checksums to avoid duplicating files and I had
> intended to use it to remove old spam anyway.

If you look at the X-Spam-Status header, it will tell you if the message
was already autolearned:

X-Spam-Status: Yes, score=19.5 required=5.0 tests=...
  (list of tests)...
  autolearn=spam version=3.1.5
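A script that collects caught spam could use this header to skip messages that were already autolearned. A minimal sketch, assuming the header looks like the example above (the `autolearn=` token may sit on a folded continuation line); the function names are hypothetical. Headers written by amavisd-new may not carry the field at all, in which case the function returns None.

```python
import re

# Captures the value after "autolearn=" (e.g. "spam", "ham" or "no").
AUTOLEARN = re.compile(r"\bautolearn=(\w+)")


def autolearn_result(x_spam_status):
    """Return the autolearn value from an X-Spam-Status header, or None if absent."""
    m = AUTOLEARN.search(x_spam_status)
    return m.group(1) if m else None


def already_learned_as_spam(x_spam_status):
    """True only when the header says the message was autolearned as spam."""
    return autolearn_result(x_spam_status) == "spam"
```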



RE: Migrate dependencies problem

2006-09-27 Thread Nigel Frankcom
On Wed, 27 Sep 2006 12:50:37 -0400, Bowie Bailey
<[EMAIL PROTECTED]> wrote:

>Benny Pedersen wrote:
>> On Wed, September 27, 2006 16:26, Sietse van Zanen wrote:
>> > It's best to use cpan for this. It's very easy to use and will
>> > automagically resolve any dependencies.
>> 
>> just one problem with cpan: it will not resolve rpm dependencies
>> 
>> > Other way is find the modules on http://rpmfind.net/
>> > Specify your search as perl-net-dns etc.
>> 
>> package maintainers need to make it better
>
>I attempted to install SA via rpm one time.  After fighting with Perl
>module dependencies for a couple of hours, I gave up and installed it
>with a single CPAN command.

I've found yum to be about the easiest. It seems CPAN has been
throwing all sorts of errors lately on CentOS. I've ended up
installing the Perl modules through yum as well. Thus far it's been a
painless task (cue all hell breaking loose on my next install) :-D

Nigel


RE: Bayes poisoning (was Re: your mail)

2006-09-27 Thread Bowie Bailey
Peter Smith wrote:
> > > The messages are simply a random stream of words, with punctuation
> > > scattered in them. No HTML, no URLs being advertised, no excessive
> > > capitalisation, just meaningless text.
> 
> I'm cautious about feeding these messages to sa-learn as spam, in
> case it has a negative impact on genuine messages. The punctuation is
> pretty good - full stops every dozen words or so, the odd comma. In
> fact, it's probably better punctuation than most of my users use:) At
> the moment I'm just black-listing host or netblocks which this junk
> is coming from. 

As long as you learn the messages as spam, they will have no negative
impact.  The only way these messages could cause problems is if they
get autolearned as ham instead of spam.

-- 
Bowie


RE: Migrate dependencies problem

2006-09-27 Thread Bowie Bailey
Benny Pedersen wrote:
> On Wed, September 27, 2006 16:26, Sietse van Zanen wrote:
> > It's best to use cpan for this. It's very easy to use and will
> > automagically resolve any dependencies.
> 
> just one problem with cpan: it will not resolve rpm dependencies
> 
> > Other way is find the modules on http://rpmfind.net/
> > Specify your search as perl-net-dns etc.
> 
> package maintainers need to make it better

I attempted to install SA via rpm one time.  After fighting with Perl
module dependencies for a couple of hours, I gave up and installed it
with a single CPAN command.

-- 
Bowie


FORGED_YAHOO_RCVD?

2006-09-27 Thread Jim Davis
This autoresponse from Yahoo abuse crept over the spam line, mostly 
because of a hit on FORGED_YAHOO_RCVD... but it's not clear from the 
headers why that would be.  This is from a Fedora Core 5 system running 
SpamAssassin 3.1.3 under amavisd-new 2.4.2:



Return-Path: <[EMAIL PROTECTED]>
Received: from xenopodid.cs.arizona.edu (xenopodid.cs.arizona.edu [192.12.69.105])
    by email.cs.arizona.edu (8.13.3/8.13.3) with ESMTP id k8RFY9pl088354
    for <[EMAIL PROTECTED]>; Wed, 27 Sep 2006 08:34:09 -0700 (MST)
    (envelope-from [EMAIL PROTECTED])
Received: from localhost (xenopodid.cs.arizona.edu [127.0.0.1])
    by xenopodid.cs.arizona.edu (Postfix) with ESMTP id 6FCBA6DCCEA
    for <[EMAIL PROTECTED]>; Wed, 27 Sep 2006 08:34:09 -0700 (MST)
X-Virus-Scanned: amavisd-new at cs.arizona.edu
X-Spam-Flag: YES
X-Spam-Score: 5.019
X-Spam-Level: *
X-Spam-Status: Yes, score=5.019 tagged_above=- required=5
    tests=[BAYES_40=-0.185, DNS_FROM_RFC_ABUSE=0.2,
    DNS_FROM_RFC_POST=1.708, DNS_FROM_RFC_WHOIS=1.447,
    FORGED_YAHOO_RCVD=1.849]
Received: from xenopodid.cs.arizona.edu ([127.0.0.1])
    by localhost (xenopodid.cs.arizona.edu [127.0.0.1]) (amavisd-new, port 10024)
    with ESMTP id QAQkAwgDuuF1 for <[EMAIL PROTECTED]>;
    Wed, 27 Sep 2006 08:34:03 -0700 (MST)
Received: from cheltenham.cs.arizona.edu (cheltenham.cs.arizona.edu [192.12.69.60])
    by xenopodid.cs.arizona.edu (Postfix) with ESMTP id 90AAC6DCCD1
    for <[EMAIL PROTECTED]>; Wed, 27 Sep 2006 08:34:03 -0700 (MST)
Received: from mail-relay1.yahoo.com (mail-relay1.yahoo.com [216.145.48.34])
    by cheltenham.cs.arizona.edu (8.13.4/8.13.4) with ESMTP id k8RFY0Gj014456
    for <[EMAIL PROTECTED]>; Wed, 27 Sep 2006 08:34:03 -0700 (MST)
    (envelope-from [EMAIL PROTECTED])
Received: from speedster.cc.kana.corp.yahoo.com
    (speedster.cc.kana.corp.yahoo.com [207.126.228.28])
    by mail-relay1.yahoo.com (8.13.6/8.13.6/mr1) with SMTP id k8RFMTSi086721
    for <[EMAIL PROTECTED]>; Wed, 27 Sep 2006 08:22:29 -0700 (PDT)
Message-Id: <[EMAIL PROTECTED]>
Precedence: bulk
Auto-Submitted: auto-replied
Date: Wed, 27 Sep 2006 08:22:28 -0700
To: [EMAIL PROTECTED]
Subject: A message from Yahoo! Customer Care  (KMM37445094V70533L0KM)
From: Yahoo! Mail <[EMAIL PROTECTED]>
Reply-To: Yahoo! Mail <[EMAIL PROTECTED]>
MIME-Version: 1.0
Content-Type: text/plain; charset = "us-ascii"
Content-Transfer-Encoding: 7bit
X-Mailer: KANA Response 7.0.1.142
X-UID: 371043

Thank you for contacting Yahoo! Customer Care to answer your question. A
support representative will get back to you within 48 hours regarding
your issue. Until then, feel free to visit our online help center at
http://help.yahoo.com/
for answers if you have not already done so.
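The score in that X-Spam-Status header is simply the sum of the listed rule scores. Adding them up shows that FORGED_YAHOO_RCVD alone pushed the message over required=5. The small Python sketch below parses the amavisd-new style `tests=[...]` list. The format is assumed from this one header, and `parse_tests` is a hypothetical helper.

```python
import re


def parse_tests(x_spam_status):
    """Map rule name -> score from an amavisd-new style tests=[...] list."""
    m = re.search(r"tests=\[([^\]]*)\]", x_spam_status)
    if m is None:
        return {}
    scores = {}
    for item in m.group(1).split(","):
        name, _, value = item.strip().partition("=")
        if name:
            scores[name] = float(value)
    return scores


HEADER = """Yes, score=5.019 tagged_above=- required=5
    tests=[BAYES_40=-0.185, DNS_FROM_RFC_ABUSE=0.2,
    DNS_FROM_RFC_POST=1.708, DNS_FROM_RFC_WHOIS=1.447,
    FORGED_YAHOO_RCVD=1.849]"""

scores = parse_tests(HEADER)
total = round(sum(scores.values()), 3)
without_forged = round(total - scores["FORGED_YAHOO_RCVD"], 3)
print(total, without_forged)  # 5.019 3.17
```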


RE: your mail

2006-09-27 Thread Bowie Bailey
John D. Hardin wrote:
> On Wed, 27 Sep 2006, Peter Smith wrote:
> 
> > The messages are simply a random stream of words, with punctuation
> > scattered in them. No HTML, no URLs being advertised, no excessive
> > capitalisation, just meaningless text.
> 
> Technically, then, it's not spam. Spam requires a commercial message
> of some sort. :)

That depends on whose definition you use.  I would say that any
unsolicited and unwanted email qualifies as spam.

> > As such, SA is finding very little to complain about, and is even
> > lowering the scoring because the bayes filtering deems it to be
> > good.
> 
> I'm torn about whether or not to train on such messages. I do hand
> training so I keep pretty tight control over what gets trained.

I use a very simple criterion for Bayes training.  If it's something I
want in the inbox, I train it as ham.  If it's something I don't want
in the inbox, I train it as spam.

Messages with random garbage in them are definitely in the second set.
:)

-- 
Bowie


RE: [qmailtoaster] duplicate emails

2006-09-27 Thread hamann . w
Hi,

have a look at rulesemporium.com
There are descriptions of the rules, and you should definitely use only one out
of each set of similarly named ones

Wolfgang Hamann

>> 70_sare_evilnum1.cf
>> 70_sare_evilnum2.cf
>> 70_sare_header0.cf
>> 70_sare_header.cf
>> 70_sare_header_eng.cf
>> 70_sare_html0.cf
>> 70_sare_html1.cf
>> 70_sare_html2.cf
>> 70_sare_html3.cf
>> 70_sare_html4.cf
>> 70_sare_html_eng.cf
>> 70_sare_oem.cf
>> 70_sare_random.cf
>> 70_sare_ratware.cf
>> 70_sare_specfic.cf
>> 70_sare_uri0.cf
>> 70_sare_uri.cf
>> 70_sare_whitlelist.cf
>> 70_sare_whitelist_pre30.cf
>> 72_sare_bml_post23x.cf
>> 99_sare_fraud_post25x.cf
>> antidrug.cf
>> blacklist.cf
>> blacklist-uri.cf
>> bogus-virus-warnings.cf
>> 
>> 



RE: Migrate dependencies problem

2006-09-27 Thread Benny Pedersen

On Wed, September 27, 2006 16:26, Sietse van Zanen wrote:
> It's best to use cpan for this. It's very easy to use and will automagically
> resolve any dependencies.

just one problem with cpan: it will not resolve rpm dependencies

> Other way is find the modules on http://rpmfind.net/
> Specify your search as perl-net-dns etc.

package maintainers need to make it better


-- 
"This message was sent using 100% recycled spam mails."



Re: sa-learn and "Caught" spams

2006-09-27 Thread Mike Woods

The internet is a great place for raising more questions than it answers :D

Given all the opinions I think I will move the caught spams into the 
learning cycle; however, I'm also going to make sure that each spam is 
only ever fed through the system once. This won't be a problem since I 
already make use of their checksums to avoid duplicating files and I had 
intended to use it to remove old spam anyway.
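The checksum approach could look roughly like this: hash each message's raw bytes and pass on only the ones not seen before. This is a sketch with hypothetical names. It only catches byte-identical copies, and SpamAssassin separately keeps its own record of already-learned messages, as discussed elsewhere in this thread.

```python
import hashlib


def unseen_messages(messages, seen):
    """Yield (name, raw_bytes) pairs whose SHA-1 digest is not yet in `seen`.

    `messages` is any iterable of (name, raw_bytes); `seen` is a set of hex
    digests that is updated in place, so it can be saved between runs.
    """
    for name, raw in messages:
        digest = hashlib.sha1(raw).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield name, raw
```

The names that come out could then be copied into the folder that is passed to `sa-learn --spam`.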


Ta guys, much food for thought :D

--
Mike Woods
Systems Administrator


a stupid config question

2006-09-27 Thread Philippe Couas



Hi,

I have migrated from SpamAssassin 2.63 to 3.15.1. It seems to be running:
some mails are flagged, and rpm -a sees the new version.
But previously the rules and local.cf were in /etc/mail/spamassassin, and
these files were not modified by my rpm -Uvh.

I want to know: are the config files still in the same directory?

Regards

Philippe COUAS
Head of Development
INFODEV S.A.


Re: sa-learn and "Caught" spams

2006-09-27 Thread Daniel T. Staal
On Wed, September 27, 2006 11:38 am, Nels Lindquist said:
> Daniel T. Staal wrote:
>
>> On Wed, September 27, 2006 11:10 am, Jim Maul said:
>>
>>> I believe that SA will not learn a message it has seen before so
>>> multiple sa-learn's will not have any effect.
>>
>> Actually, that was my impression too.
>>
>> Which means, for the original question, that re-learning the already
>> caught spams will have very little effect other than wasting some
>> processor cycles.  Doing what he is doing right now is probably best.
>
> Except that there's a significant difference between "already caught" and
> "already learned" spam.  The threshold for learning is much higher (and
> has specific requirements WRT point contributions of various types) so
> it's definitely possible to have, for example, a message that was
> correctly flagged as spam entirely due to network tests that was not
> auto-learned.  Training such messages then reinforces Bayes on the
> content side, so future messages that look similar but perhaps have a new
> URL that hasn't hit the blacklists yet can still be flagged.

True.  So...  Optimal is obviously to train, once and correctly, on all
messages.  Sending a message through that has been trained will consume
*some* resources, but less than one that still needs to be learned.

So the exact balance is a complicated question.  ;)

Daniel T. Staal

---
This email copyright the author.  Unless otherwise noted, you
are expressly allowed to retransmit, quote, or otherwise use
the contents for non-commercial purposes.  This copyright will
expire 5 years after the author's death, or in 30 years,
whichever is longer, unless such a period is in excess of
local copyright law.
---



Re: sa-learn and "Caught" spams

2006-09-27 Thread Nels Lindquist
Daniel T. Staal wrote:

> On Wed, September 27, 2006 11:10 am, Jim Maul said:
>
>> I believe that SA will not learn a message it has seen before so
>> multiple sa-learn's will not have any effect.
> 
> Actually, that was my impression too.
> 
> Which means, for the original question, that re-learning the already caught
> spams will have very little effect other than wasting some processor
> cycles.  Doing what he is doing right now is probably best.

Except that there's a significant difference between "already caught"
and "already learned" spam.  The threshold for learning is much higher
(and has specific requirements WRT point contributions of various types)
so it's definitely possible to have, for example, a message that was
correctly flagged as spam entirely due to network tests that was not
auto-learned.  Training such messages then reinforces Bayes on the
content side, so future messages that look similar but perhaps have a
new URL that hasn't hit the blacklists yet can still be flagged.
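Below is a simplified model of the autolearn decision Nels describes. It assumes the SpamAssassin 3.1-era defaults: bayes_auto_learn_threshold_spam of 12.0, at least 3 points each from header and body rules, and Bayes rules not counted towards the score. Check these numbers against the documentation for your version. The function is hypothetical and is not SpamAssassin's own code.

```python
def would_autolearn_as_spam(score_without_bayes, header_points, body_points,
                            threshold=12.0, min_header=3.0, min_body=3.0):
    """Simplified model of SpamAssassin's spam autolearn test.

    The defaults are assumed 3.1-era values; verify them for your version.
    """
    return (score_without_bayes >= threshold
            and header_points >= min_header
            and body_points >= min_body)


# Flagged at 14 points, but almost entirely from network (header) tests:
print(would_autolearn_as_spam(14.0, header_points=13.5, body_points=0.5))
# A similar score with content evidence as well:
print(would_autolearn_as_spam(14.0, header_points=7.0, body_points=7.0))
```

Under this model, the first message is caught but never learned, which is exactly the kind of message that manual training adds to the Bayes database.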


Nels Lindquist


Re: Received header unparseable

2006-09-27 Thread benthere-nine

In a desperate newbie attempt to fix this problem myself, I added the
following lines to Received.pm at line 895:

   # Received: from  ([10.0.0.6]) by myfirewall; Thu,
   # 13 Mar 2003 06:26:21 -0500 (EST)
   if (/^from \(\[(${IP_ADDRESS})\]\) by myfirewall/) {
     $ip = $1; $by = 'my.firewall.ip.addr'; goto enough;
   }

Ummm, it didn't work, but it didn't break anything.  How can I make this
work?  Add a "$helo =" ?

Thanks.

 

benthere-nine wrote:
> 
> My firewall puts a received header on every e-mail it
> forwards to SA 3.1.5:
> 
> Received: from f66108.upc-f.chello.nl ([80.56.66.108])
> by myfirewall; Tue, 26 Sep 2006 12:35:52 -0500
> (Central Daylight Time)
> 
> But when my firewall can't find a DNS entry for the
> e-mail's last relay IP address, it just puts in a
> blank space:
> 
> Received: from  ([201.19.179.63]) by myfirewall; Tue,
> 26 Sep 2006 12:35:53 -0500 (Central Daylight Time)
> 
> 20_head_tests.cf hits on this as an UNPARSEABLE_RELAY.
>  SA isn't able to look up that IP address on all the
> network tests.
> 
> I'm e-mailing Tech Support for the company that
> publishes the firewall software, but is there anything
> that can be done on the SA side?
> 
> Thank you very much.
> 
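For trying out a pattern outside SpamAssassin, here is a quick Python check. It is not Received.pm itself, and the pattern and function name are illustrative. It matches the firewall's header with or without a hostname. It allows one or more spaces after "from", because the blank-hostname header contains two. Whether that double space is what defeated the Perl attempt is only an assumption.

```python
import re

# "from", whitespace, an optional hostname, then "([a.b.c.d])" and
# "by myfirewall;". Based only on the two headers quoted above.
FIREWALL_RCVD = re.compile(
    r"^from\s+(?:(?P<rdns>\S+)\s+)?"
    r"\(\[(?P<ip>\d{1,3}(?:\.\d{1,3}){3})\]\)\s+by\s+myfirewall;"
)


def relay_ip(received):
    """Return (rdns or None, ip) for a 'by myfirewall' Received header, else None."""
    m = FIREWALL_RCVD.match(received)
    if m is None:
        return None
    return m.group("rdns"), m.group("ip")
```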

-- 
View this message in context: 
http://www.nabble.com/Received-header-unparseable-tf2340368.html#a6529150
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: sa-learn and "Caught" spams

2006-09-27 Thread Daniel T. Staal
On Wed, September 27, 2006 11:10 am, Jim Maul said:
> Daniel T. Staal wrote:
>> On Wed, September 27, 2006 10:43 am, Matt Kettler said:
>>> Mike Woods wrote:
>>>> Hi guys, bit of a query regarding sa-learn and messages that have
>>>> already been tagged as spam.
>>>>
>>>> We have spamassassin scanning mail via amavisd and sending any caught
>>>> spams to a spam folder in the users' accounts (using plus addressing),
>>>> we've also been getting users to drop any missed spams into this spam
>>>> folder so we can train spamassassin on them, at present I have a
>>>> script that moves *only* the missed spams to a master folder for
>>>> sa-learn, my question is simple, would there be any benefit in
>>>> including the mails identified as spam in this process, I know
>>>> sa-learn looks for common patterns in spams to identify them as spam
>>>> but I'm unsure if adding known spams in would be beneficial in this?
>>> YES. There is DEFINITELY a benefit to learning messages tagged as spam.
>>> Even if they got BAYES_99.
>>>
>>> Why? Because spam mutates over time, and even if a spam got bayes_99, it
>>> may still have new variants of "hot" words in it that will help it keep
>>> hitting the same kind of spam as it changes. If you wait till this kind
>>> of message mutates enough to no longer be bayes_99, you've put yourself
>>> behind the curve, and now you have to catch up to the new variant.
>>
>> While I in general agree with this, I was under the impression that
>> spamassassin will auto-learn from messages it marks.  (At least, past a
>> certain threshold.)  In which case, feeding the spam messages to it again
>> would bias the database towards spam, as the messages are being learned
>> twice.
>
> I believe that SA will not learn a message it has seen before so
> multiple sa-learn's will not have any effect.

Actually, that was my impression too.

Which means, for the original question, that re-learning the already caught
spams will have very little effect other than wasting some processor
cycles.  Doing what he is doing right now is probably best.

Daniel T. Staal




RE: duplicate emails

2006-09-27 Thread Steve Ingraham
>sa-blacklist.cf
>sa-blacklist.current.uri.cf

>Get rid of these!  They are evil and probably the root of your problem!

> (They are also long deprecated and very out of date, so wouldn't be doing
>much even if they didn't kill your system.)

I have removed those from /etc/mail/spamassassin.

>occa_phishing.cf
>occa_replica.cf

>I have no knowledge of these.

As far as I can tell these are rules files created by the previous
system manager.  I am not aware if they are functional or not.  The
occa_phishing.cf file was set up to stop phishing emails.  Here is the
content of that file:

body OCCA_PHISH_COMFED_RULE   /Commercial Federal/
score OCCA_PHISH_COMFED_RULE 0.2
describe OCCA_PHISH_COMFED_RULE   This rule tries to eliminate phishing using comfed

The occa_replica.cf file was set up to stop spam emails for replica
rolex watches.  Here is the content of that file:

body OCCA_ROLEX_RULE   /Rolex/
score OCCA_ROLEX_RULE 0.1
describe OCCA_ROLEX_RULE   This rule tries to eliminate Rolex replica spam

body OCCA_REPLICA_RULE  /Replica/
score OCCA_REPLICA_RULE 0.1
describe OCCA_REPLICA_RULE  replica watches

meta OCCA_REPL_ROL_RULE (OCCA_ROLEX_RULE + OCCA_REPLICA_RULE > .1)
score OCCA_REPL_ROL_RULE 2
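In a SpamAssassin meta rule, each sub-rule name effectively stands for whether that rule hit (1 or 0 in the simple case). So `(OCCA_ROLEX_RULE + OCCA_REPLICA_RULE > .1)` is true when either word appears, not only when both do. The tiny Python model below is a simplification of how meta expressions are evaluated, and the function names are hypothetical. It shows the difference from a `&&` version, which would require both words.

```python
def meta_sum(rolex_hit, replica_hit):
    # Mirrors (OCCA_ROLEX_RULE + OCCA_REPLICA_RULE > .1) with hits as 0/1.
    return int(rolex_hit) + int(replica_hit) > 0.1


def meta_both(rolex_hit, replica_hit):
    # Mirrors (OCCA_ROLEX_RULE && OCCA_REPLICA_RULE).
    return bool(rolex_hit) and bool(replica_hit)


for rolex in (False, True):
    for replica in (False, True):
        print(rolex, replica, meta_sum(rolex, replica), meta_both(rolex, replica))
```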

>random.cf
>random.current.cf

>I'm not sure what the 'current' one is, but I strongly suspect one of these
>is not necessary.

They look identical.  I removed the random.current.cf file.

>antidrug.cf

>You shouldn't be using this unless you are on 2.6x or earlier.  Since 3.0
>antidrug has been part of the stock rules.

We are running 3.0.4.  I removed antidrug.cf

>blacklist.cf
>blacklist-uri.cf

>I'm not sure what these are, but they may be early versions of sa-blacklist,
>and probably a bad thing to have.

Removed.

>70_sare_whitelist_pre30.cf
>72_sare_bml_post23x.cf
>99_sare_fraud_post25x.cf

>And which version of SA are you running?  This tells me you are on 2.6x,
>since you have to be after 2.5x and before 3.0.  Is that really true?  If
>so, upgrade to a current version of SA and drop the whitelist_pre30 and
>replace it with whitelist, and possibly pick other versions of the other two
>files.

The information on the Spamassassin version shows that we are running
version 3.0.4.  I suspect there were several rules sets that were from
previous versions of SA.  I am not sure how long the previous
administrator was using SA.  I received no information concerning SA
when I came on board so I am trying to understand how and why all of our
systems are set up the way they are.  This information about what should
or should not be used is very helpful.

I also have another question concerning the spamassassin control.  I
understand that I should be able to restart spamassassin by using:

/etc/init.d/spamassassin restart

However, there are no files in /etc/init.d/ for spamassassin so I get an
error message stating:

Bash: etc/init.d/spamassassin: no such file or directory.

The only way I have been able to restart spamassassin is to restart the
server.  If spamassassin is not in /etc/init.d where would it be and how
can I find it?

Thank you,
Steve Ingraham



Re: sa-learn and "Caught" spams

2006-09-27 Thread Jim Maul

Daniel T. Staal wrote:

> On Wed, September 27, 2006 10:43 am, Matt Kettler said:
>
>> Mike Woods wrote:
>>
>>> Hi guys, bit of a query regarding sa-learn and messages that have
>>> already been tagged as spam.
>>>
>>> We have spamassassin scanning mail via amavisd and sending any caught
>>> spams to a spam folder in the users' accounts (using plus addressing),
>>> we've also been getting users to drop any missed spams into this spam
>>> folder so we can train spamassassin on them, at present I have a
>>> script that moves *only* the missed spams to a master folder for
>>> sa-learn, my question is simple, would there be any benefit in
>>> including the mails identified as spam in this process, I know
>>> sa-learn looks for common patterns in spams to identify them as spam
>>> but I'm unsure if adding known spams in would be beneficial in this?
>>
>> YES. There is DEFINITELY a benefit to learning messages tagged as spam.
>> Even if they got BAYES_99.
>>
>> Why? Because spam mutates over time, and even if a spam got bayes_99, it
>> may still have new variants of "hot" words in it that will help it keep
>> hitting the same kind of spam as it changes. If you wait till this kind
>> of message mutates enough to no longer be bayes_99, you've put yourself
>> behind the curve, and now you have to catch up to the new variant.
>
> While I in general agree with this, I was under the impression that
> spamassassin will auto-learn from messages it marks.  (At least, past a
> certain threshold.)  In which case, feeding the spam messages to it again
> would bias the database towards spam, as the messages are being learned
> twice.

I believe that SA will not learn a message it has seen before so 
multiple sa-learn's will not have any effect.

> So the question would have to be: Does Spamassassin automatically update
> the Bayes database from (some/any) messages it flags as spam or ham?

I would think only if you try to reverse/forget the original learning.

> Daniel T. Staal


-Jim


Re: sa-learn and "Caught" spams

2006-09-27 Thread Daniel T. Staal
On Wed, September 27, 2006 10:43 am, Matt Kettler said:
> Mike Woods wrote:
>> Hi guys, bit of a query regarding sa-learn and messages that have
>> already been tagged as spam.
>>
>> We have spamassassin scanning mail via amavisd and sending any caught
>> spams to a spam folder in the users' accounts (using plus addressing),
>> we've also been getting users to drop any missed spams into this spam
>> folder so we can train spamassassin on them, at present I have a
>> script that moves *only* the missed spams to a master folder for
>> sa-learn, my question is simple, would there be any benefit in
>> including the mails identified as spam in this process, I know
>> sa-learn looks for common patterns in spams to identify them as spam
>> but I'm unsure if adding known spams in would be beneficial in this?
>
> YES. There is DEFINITELY a benefit to learning messages tagged as spam.
> Even if they got BAYES_99.
>
> Why? Because spam mutates over time, and even if a spam got bayes_99, it
> may still have new variants of "hot" words in it that will help it keep
> hitting the same kind of spam as it changes. If you wait till this kind
> of message mutates enough to no longer be bayes_99, you've put yourself
> behind the curve, and now you have to catch up to the new variant.

While I in general agree with this, I was under the impression that
spamassassin will auto-learn from messages it marks.  (At least, past a
certain threshold.)  In which case, feeding the spam messages to it again
would bias the database towards spam, as the messages are being learned
twice.

So the question would have to be: Does Spamassassin automatically update
the Bayes database from (some/any) messages it flags as spam or ham?

Daniel T. Staal




Bayes poisoning (was Re:)

2006-09-27 Thread Peter Smith
> Are you running net tests?  It sounds like someone has a broken zombie net
> that is supposed to be sending out gif spams, but they forgot the images.
> Net tests would probably catch these easily.

Well I'm using the following:

score DCC_CHECK 1.0
score PYZOR_CHECK 1.0
score RAZOR_CHECK 1.0
score RCVD_IN_BL_SPAMCOP_NET 3.0
score X_CHINESE_RELAY 1.5
score X_KOREAN_RELAY 1.5
score X_SPAMHAUS 1.5

But none of these are being triggered by the offending messages. Are there
other net checks I should be using?

Thanks,
Peter Smith


Bayes poisoning (was Re: your mail)

2006-09-27 Thread Peter Smith

>> The messages are simply a random stream of words, with punctuation
>> scattered in them. No HTML, no URLs being advertised, no excessive
>> capitalisation, just meaningless text.
>
> Technically, then, it's not spam. Spam requires a commercial message
> of some sort. :)

Yeah, I think I said 'junk' rather than spam. I wonder if such mail has a name?

> I would agree that it's an attempt to poison your bayes database,
> assuming that you have autolearn turned on, either by skewing the
> scores towards ham or by bloating the database.

Do you think the perpetrators are poisoning the bayes db with a view to sending spam at
a later date? We aren't a big organisation - a few hundred mailboxes - so it seems rather
a lot of effort for a spammer to go to. Another suggestion was that the spammer had
intended to attach an image, which hadn't got through. Given the technical competence of
many spammers, it seems more likely they screwed up and forgot to attach the image. But
I'm just guessing here.

>> Any thoughts on what I can do about these messages? Even with
>> bayes turned off, they would still fail to score more than say 2
>> or 3. Each message contains a different paragraph of random text,
>> so it's not possible to pick out keywords; and the messages are
>> coming from dialup machines, so blocking IP isn't going to be very
>> effective.
>
> Look for punctuation? A good deal of the random bayes poison at one
> time was totally without punctuation.

I'm cautious about feeding these messages to sa-learn as spam, in case it has a negative
impact on genuine messages. The punctuation is pretty good - full stops every dozen
words or so, the odd comma. In fact, it's probably better punctuation than most of my
users use :) At the moment I'm just blacklisting hosts or netblocks which this junk is
coming from.

Apologies for not setting a subject in my original mail, by the way.

Peter Smith



Re: sa-learn and "Caught" spams

2006-09-27 Thread Matt Kettler
Mike Woods wrote:
> Hi guys, bit of a query regarding sa-learn and messages that have
> already been tagged as spam.
>
> We have spamassassin scanning mail via amavisd and sending any caught
> spams to a spam folder in the users' accounts (using plus addressing),
> we've also been getting users to drop any missed spams into this spam
> folder so we can train spamassassin on them, at present I have a
> script that moves *only* the missed spams to a master folder for
> sa-learn, my question is simple, would there be any benefit in
> including the mails identified as spam in this process, I know
> sa-learn looks for common patterns in spams to identify them as spam
> but I'm unsure if adding known spams in would be beneficial in this?

YES. There is DEFINITELY a benefit to learning messages tagged as spam.
Even if they got BAYES_99.

Why? Because spam mutates over time, and even if a spam got bayes_99, it
may still have new variants of "hot" words in it that will help it keep
hitting the same kind of spam as it changes. If you wait till this kind
of message mutates enough to no longer be bayes_99, you've put yourself
behind the curve, and now you have to catch up to the new variant.

In general: DO NOT intentionally try to bias the training of your bayes
database for any reason. That's just self-inflicted bayes poison. If
it's spam, train it as spam. Do not hold back because of "ham-like"
content. Do not hold back because it was already tagged. If it's spam,
train it as such. The same goes for nonspam training. Don't hold back
training any emails that you don't want to be tagged, even if they
contain "spam words".

SpamAssassin's bayes system will handle the gray cases just fine. It
does particularly well at this because of the chi-squared combining, as
compared to the results of simple averaging.
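For readers wondering why chi-squared combining copes better with gray cases than averaging, here is a minimal Python sketch of Robinson/Fisher chi-square combining as popularized by SpamBayes. It is not SpamAssassin's own Bayes code; the token probabilities are made-up illustrative values, and `chi2q`, `chi_combine` and `average` are hypothetical helper names.

```python
import math


def chi2q(x2: float, v: int) -> float:
    """Survival function P(chi-square >= x2) for an even number of degrees of freedom v."""
    assert v % 2 == 0, "this closed form only works for even v"
    m = x2 / 2.0
    term = math.exp(-m)
    total = term
    for i in range(1, v // 2):
        term *= m / i
        total += term
    return min(total, 1.0)  # guard against rounding pushing the sum above 1


def chi_combine(probs):
    """Combine per-token spam probabilities (each strictly between 0 and 1)."""
    n = len(probs)
    ln_p = sum(math.log(p) for p in probs)            # ln of the product of p
    ln_not_p = sum(math.log(1.0 - p) for p in probs)  # ln of the product of (1 - p)
    spamminess = 1.0 - chi2q(-2.0 * ln_not_p, 2 * n)  # near 1 when many p are near 1
    hamminess = 1.0 - chi2q(-2.0 * ln_p, 2 * n)       # near 1 when many p are near 0
    return (spamminess - hamminess + 1.0) / 2.0       # 0.5 means "can't tell"


def average(probs):
    """The naive alternative: a plain arithmetic mean."""
    return sum(probs) / len(probs)


if __name__ == "__main__":
    mostly_neutral = [0.5] * 18 + [0.99] * 2
    print(f"average:      {average(mostly_neutral):.3f}")
    print(f"chi-combined: {chi_combine(mostly_neutral):.3f}")
```

With 18 neutral tokens and two strong spam tokens, the simple average stays close to 0.5 (0.549), while the chi-squared combination comes out noticeably higher; when the evidence is genuinely balanced, both return 0.5.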


Re: your mail

2006-09-27 Thread John D. Hardin
On Wed, 27 Sep 2006, Peter Smith wrote:

> The messages are simply a random stream of words, with punctuation
> scattered in them. No HTML, no URLs being advertised, no excessive
> capitalisation, just meaningless text.

Technically, then, it's not spam. Spam requires a commercial message
of some sort. :)

> As such, SA is finding very little to complain about, and is even
> lowering the scoring because the bayes filtering deems it to be
> good.

I'm torn about whether or not to train on such messages. I do hand
training so I keep pretty tight control over what gets trained.

I would agree that it's an attempt to poison your bayes database,
assuming that you have autolearn turned on, either by skewing the
scores towards ham or by bloating the database.

> Any thoughts on what I can do about these messages? Even with
> bayes turned off, they would still fail to score more than say 2
> or 3. Each message contains a different paragraph of random text,
> so it's not possible to pick out keywords; and the messages are
> coming from dialup machines, so blocking IP isn't going to be very
> effective.

Look for punctuation? A good deal of the random bayes poison at one
time was totally without punctuation.
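A rough way to act on the punctuation suggestion is to measure how many words appear per punctuation mark. The Python sketch below is a hypothetical heuristic, not an existing SpamAssassin rule; the 50-word minimum and the threshold of 40 words per mark are arbitrary assumptions to tune against your own mail.

```python
import re

WORD_RE = re.compile(r"[A-Za-z']+")
PUNCT_RE = re.compile(r"[.,;:!?]")


def words_per_punctuation(body: str) -> float:
    """Average number of words per punctuation mark; infinity if there is none."""
    words = len(WORD_RE.findall(body))
    marks = len(PUNCT_RE.findall(body))
    return float("inf") if marks == 0 else words / marks


def looks_unpunctuated(body: str, min_words: int = 50, threshold: float = 40.0) -> bool:
    """Hypothetical check: flag reasonably long bodies with almost no punctuation."""
    return len(WORD_RE.findall(body)) >= min_words and words_per_punctuation(body) > threshold
```

Note that Peter describes his samples as having a full stop every dozen words or so, so this particular check would not flag them; it only targets the older, punctuation-free style of Bayes poison.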

--
 John Hardin KA7OHZ    ICQ#15735746    http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]    FALaholic #11174    pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 - 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  ...every time I sit down in front of a Windows machine I feel as
  if the computer is just a place for the manufacturers to put their
  advertising.-- fwadling on Y! SCOX
--



RE: Migrate dependencies problem

2006-09-27 Thread Sietse van Zanen



It's best to use cpan for this. It's very easy to use and will automagically resolve any dependencies.
 
The other way is to find the modules on http://rpmfind.net/.
Specify your search as perl-net-dns etc.
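If you go the CPAN route, the module names can be read straight off rpm's "Failed dependencies" output. This Python sketch (hypothetical helper names) extracts them and builds the classic `perl -MCPAN -e 'install Module::Name'` command for each; it assumes the rpm output format shown in Philippe's message and that CPAN.pm is already configured on the server.

```python
import re

# Matches lines such as "perl(Net::DNS) is needed by spamassassin-3.1.5-1.rh9.rf"
PERL_DEP_RE = re.compile(r"perl\(([\w:]+)\) is needed by")


def missing_perl_modules(rpm_output: str) -> list:
    """Return Perl module names from rpm's 'Failed dependencies' text, without duplicates."""
    modules = []
    for name in PERL_DEP_RE.findall(rpm_output):
        if name not in modules:
            modules.append(name)
    return modules


def cpan_install_commands(modules: list) -> list:
    """Build one classic CPAN.pm install command per module."""
    return [f"perl -MCPAN -e 'install {m}'" for m in modules]


if __name__ == "__main__":
    rpm_output = """\
error: Failed dependencies:
    perl(Digest::SHA1) is needed by spamassassin-3.1.5-1.rh9.rf
    perl(Net::DNS) is needed by spamassassin-3.1.5-1.rh9.rf
    perl(Time::HiRes) is needed by spamassassin-3.1.5-1.rh9.rf
"""
    for cmd in cpan_install_commands(missing_perl_modules(rpm_output)):
        print(cmd)
```

Keep in mind that modules installed through CPAN are not recorded in the RPM database, so rpm may still report the same failed dependencies afterwards; installing the matching perl-* RPMs (for example from rpmfind.net) avoids that.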
 
-Sietse
 


From: Philippe Couas
Sent: Wed 27-Sep-06 16:15
To: users@spamassassin.apache.org
Subject: Migrate dependencies problem

Hi, 
 
I want to migrate from SpamAssassin 2.63 to 3.1.5 on my mail server running Red Hat 9.

1 I use Perl 5.8.0
2 I have stopped spamd
3 I ran "sa-learn --rebuild"
4 rpm -Uvh spamassassin-3.1.5-1.rh9.rf.i386.rpm
  warning: spamassassin-3.1.5-1.rh9.rf.i386.rpm: V3 DSA signature: NOKEY, key ID 6b8d79e6
  error: Failed dependencies:
      perl(Digest::SHA1) is needed by spamassassin-3.1.5-1.rh9.rf
      perl(Net::DNS) is needed by spamassassin-3.1.5-1.rh9.rf
      perl(Time::HiRes) is needed by spamassassin-3.1.5-1.rh9.rf

Where can I find these optional Perl packages, and how do I install them?
 
Regards 
Philippe
 
 

Philippe COUAS
Responsable Développement
INFODEV S.A.
 


Re: [qmailtoaster] duplicate emails

2006-09-27 Thread Loren Wilton

   sa-blacklist.cf
   sa-blacklist.current.uri.cf

Get rid of these!  They are evil and probably the root of your problem!
(They are also long deprecated and very out of date, so they wouldn't be
doing much even if they didn't kill your system.)


   occa_phishing.cf
   occa_replica.cf

I have no knowledge of these.

   random.cf
   random.current.cf

I'm not sure what the 'current' one is, but I strongly suspect one of these 
is not necessary.


   antidrug.cf

You shouldn't be using this unless you are on 2.6x or earlier.  Since 3.0,
antidrug has been part of the stock rules.


   blacklist.cf
   blacklist-uri.cf

I'm not sure what these are, but they may be early versions of sa-blacklist, 
and probably a bad thing to have.


   70_sare_whitelist_pre30.cf
   72_sare_bml_post23x.cf
   99_sare_fraud_post25x.cf

And which version of SA are you running?  This tells me you are on 2.6x, 
since you have to be after 2.5x and before 3.0.  Is that really true?  If 
so, upgrade to a current version of SA and drop the whitelist_pre30 and 
replace it with whitelist, and possibly pick other versions of the other two 
files.
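Loren's comments can be turned into a quick checklist. The Python sketch below is hypothetical: the function name and the checks it encodes come only from the remarks in this message (sa-blacklist files, the older blacklist files, antidrug.cf on 3.0+, *_pre30 files on 3.0+), not from any official list of obsolete rulesets.

```python
def review_rule_files(filenames, sa_version):
    """Return {filename: advice} for rule files flagged in this thread (hypothetical helper)."""
    major, minor = (int(part) for part in sa_version.split(".")[:2])
    at_least_30 = (major, minor) >= (3, 0)
    advice = {}
    for name in filenames:
        if name.startswith("sa-blacklist"):
            advice[name] = "remove: very large, long deprecated and out of date"
        elif name in ("blacklist.cf", "blacklist-uri.cf"):
            advice[name] = "check: possibly an early version of sa-blacklist"
        elif name == "antidrug.cf" and at_least_30:
            advice[name] = "remove: part of the stock rules since 3.0"
        elif name.endswith("_pre30.cf") and at_least_30:
            advice[name] = "replace: use the non-pre30 version on 3.0 and later"
    return advice


if __name__ == "__main__":
    installed = ["sa-blacklist.cf", "antidrug.cf", "70_sare_whitelist_pre30.cf", "70_sare_adult.cf"]
    for name, note in review_rule_files(installed, "3.1.5").items():
        print(f"{name}: {note}")
```

Files that aren't mentioned (70_sare_adult.cf in the example) are simply left out of the result rather than declared safe.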


   Loren




- Original Message - 
From: "Steve Ingraham" <[EMAIL PROTECTED]>
To: ; ; 


Sent: Wednesday, September 27, 2006 6:47 AM
Subject: RE: [qmailtoaster] duplicate emails


Jdow wrote:

Steve, it might help if you listed which rule sets. There are some
which are obscenely large and others that are obsolete. Maybe we
can prune the list for you a little.


As some have mentioned I may have too many rules.  I would like to know
what is a must have and what I should not use.  Here is a list of what
is currently in the /etc/mail/spamassassin/ folder:

CURRENTLY IN /ETC/MAIL/SPAMASSASSIN
70_sare_adult.cf
70_sare_bayes_poison_nxm.cf
70_sare_evilnum0.cf
occa_phishing.cf
occa_replica.cf
sa-blacklist.cf
sa-blacklist.current.uri.cf
tripwire.cf
chickenpox.cf
init.pre
random.cf
random.current.cf
weeds2.cf
local.cf

The rules below were moved yesterday and placed in a different folder.
Once I moved these and restarted spamassassin by rebooting the server it
was no longer bogging down and duplicating emails.

REMOVED YESTERDAY FROM /ETC/MAIL/SPAMASSASSIN
70_sare_evilnum1.cf
70_sare_evilnum2.cf
70_sare_header0.cf
70_sare_header.cf
70_sare_header_eng.cf
70_sare_html0.cf
70_sare_html1.cf
70_sare_html2.cf
70_sare_html3.cf
70_sare_html4.cf
70_sare_html_eng.cf
70_sare_oem.cf
70_sare_random.cf
70_sare_ratware.cf
70_sare_specfic.cf
70_sare_uri0.cf
70_sare_uri.cf
70_sare_whitlelist.cf
70_sare_whitelist_pre30.cf
72_sare_bml_post23x.cf
99_sare_fraud_post25x.cf
antidrug.cf
blacklist.cf
blacklist-uri.cf
bogus-virus-warnings.cf


Here is the content of my config file for rules_du_jour:

TRUSTED_RULESETS="TRIPWIRE ANTIDRUG SARE_EVILNUMBERS0 BLACKLIST
BLACKLIST_URI RANDOMVAL BOGUSVIRUS SARE_ADULT SARE_FRAUD
SARE_BAYES_POISON_NXM SARE_OEM SARE_RANDOM SARE_HEADER SARE_HEADER0
SARE_HEADER_ENG SARE_HTML0 SARE_HTML1 SARE_HTML2 SARE_HTML3 SARE_HTML4
SARE_HTML_ENG SARE_RATWARE SARE_SPECIFIC SARE_URI SARE_BML_POST25X
SARE_WHITELIST SARE_WHITELIST_PRE30"
SA_DIR="/etc/mail/spamassassin"
MAIL_ADDRESS="[EMAIL PROTECTED]"
SA_RESTART="killall -HUP spamd"

I have quite a few users who get a lot of spam, especially
pharmaceuticals and stocks delivered to their mailboxes.  They are why I
began trying to work on the spamassassin filtering.  I have also observed
something interesting that I do not understand.  When I updated the rules
on Monday, many users started seeing an increased amount of spam in their
mailboxes.  One user who was getting a great deal of duplicate emails was
also seeing a huge increase in the total amount of spam.  Where she used to
receive 100 spam emails per day before the rules_du_jour update, afterwards
she was seeing 800 or 900 per day.  Much of it was porn spam that she was
not seeing before the update to the rules files.

I would appreciate any advice and/or education offered on the spam
filtering.

Thanks,
Steve Ingraham


{^_^}
- Original Message - 
From: "Steve Ingraham" <[EMAIL PROTECTED]>



Steve Ingraham wrote:

I need help with a problem.  Our users are seeing multiple duplicate
emails coming from the same sender.  This is not occurring with every
email, and there does not seem to be any pattern to which incoming
emails will be duplicated and which ones won't.  They are also
reporting that duplicates are sent when they send to an outside
address.  Has anyone experienced this problem before?  What could be
causing this, and what can I do to stop it?  I am running
qmailtoaster and spamassassin as an external email gateway.  Nothing
has changed with qmail, but I did update some rules in SA using
rules_du_jour yesterday.  Could these rule updates cause this problem?
If so, what would have changed?



Jake Vickers wrote:

If your system is low on resources (i.e. RAM), then the spamd process can
take too long, making Toaster think the mail got lost somewhere, so it
resends it.

Migrate dependencies problem

2006-09-27 Thread Philippe Couas
Hi,

I want to migrate from SpamAssassin 2.63 to 3.1.5 on my mail server running Red Hat 9.

1 I use Perl 5.8.0
2 I have stopped spamd
3 I ran "sa-learn --rebuild"
4 rpm -Uvh spamassassin-3.1.5-1.rh9.rf.i386.rpm
  warning: spamassassin-3.1.5-1.rh9.rf.i386.rpm: V3 DSA signature: NOKEY, key ID 6b8d79e6
  error: Failed dependencies:
      perl(Digest::SHA1) is needed by spamassassin-3.1.5-1.rh9.rf
      perl(Net::DNS) is needed by spamassassin-3.1.5-1.rh9.rf
      perl(Time::HiRes) is needed by spamassassin-3.1.5-1.rh9.rf

Where can I find these optional Perl packages, and how do I install them?

Regards
Philippe

Philippe COUAS
Responsable Développement
INFODEV S.A.
 


Re: sa-learn and "Caught" spams

2006-09-27 Thread Matthias Haegele

Mike Woods wrote:
Hi guys, bit of a query regarding sa-learn and messages that have 
already been tagged as spam.


We have spamassassin scanning mail via amavisd and sending any caught 
spam to a spam folder in the users' accounts (using plus addressing). 
We've also been getting users to drop any missed spam into this spam 
folder so we can train spamassassin on it. At present I have a script 
that moves *only* the missed spam to a master folder for sa-learn. My 
question is simple: would there be any benefit in including the mails 
already identified as spam in this process? I know sa-learn looks for 
common patterns in spam to identify it as spam, but I'm unsure whether 
adding known spam would be beneficial.


If I understand you correctly: I relearn *only* the messages that hit,
e.g., BAYES_00, BAYES_50, BAYES_80 etc., and it seems to help.
On closer inspection, only a percentage of these "retrained"
messages actually get added to the Bayes database ...
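A selective approach like this depends on knowing which Bayes bucket a message hit. A minimal Python sketch, assuming the rule name appears somewhere in the X-Spam-Status header value (SpamAssassin and amavisd-new format that header differently, so the regex just searches for BAYES_nn anywhere); `bayes_bucket` and `should_retrain` are hypothetical names, and the default cut-off of 80 mirrors the examples above.

```python
import re

# SpamAssassin names its Bayes rules BAYES_00 ... BAYES_99.
BAYES_RE = re.compile(r"\bBAYES_(\d{2})\b")


def bayes_bucket(spam_status: str):
    """Return the nn from the first BAYES_nn found in an X-Spam-Status value, or None."""
    match = BAYES_RE.search(spam_status)
    return int(match.group(1)) if match else None


def should_retrain(spam_status: str, max_bucket: int = 80) -> bool:
    """Retrain confirmed spam unless Bayes already put it above max_bucket."""
    bucket = bayes_bucket(spam_status)
    return bucket is None or bucket <= max_bucket
```

Messages with no BAYES_nn hit at all (for example while the database is still too small) are treated as worth retraining. Passing `max_bucket=99` turns this into the train-everything policy Matt Kettler recommends in his reply.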



Ta for any help!

---
Mike Woods
Systems Administrator


hth
MH



RE: [qmailtoaster] duplicate emails

2006-09-27 Thread Steve Ingraham
Jdow wrote:
>Steve, it might help if you listed which rule sets. There are some
>which are obscenely large and others that are obsolete. Maybe we
>can prune the list for you a little.

As some have mentioned I may have too many rules.  I would like to know
what is a must have and what I should not use.  Here is a list of what
is currently in the /etc/mail/spamassassin/ folder:

CURRENTLY IN /ETC/MAIL/SPAMASSASSIN
70_sare_adult.cf
70_sare_bayes_poison_nxm.cf
70_sare_evilnum0.cf
occa_phishing.cf
occa_replica.cf
sa-blacklist.cf
sa-blacklist.current.uri.cf
tripwire.cf
chickenpox.cf
init.pre
random.cf
random.current.cf
weeds2.cf
local.cf

The rules below were moved yesterday and placed in a different folder.
Once I moved these and restarted spamassassin by rebooting the server it
was no longer bogging down and duplicating emails.

REMOVED YESTERDAY FROM /ETC/MAIL/SPAMASSASSIN
70_sare_evilnum1.cf
70_sare_evilnum2.cf
70_sare_header0.cf
70_sare_header.cf
70_sare_header_eng.cf
70_sare_html0.cf
70_sare_html1.cf
70_sare_html2.cf
70_sare_html3.cf
70_sare_html4.cf
70_sare_html_eng.cf
70_sare_oem.cf
70_sare_random.cf
70_sare_ratware.cf
70_sare_specfic.cf
70_sare_uri0.cf
70_sare_uri.cf
70_sare_whitlelist.cf
70_sare_whitelist_pre30.cf
72_sare_bml_post23x.cf
99_sare_fraud_post25x.cf
antidrug.cf
blacklist.cf
blacklist-uri.cf
bogus-virus-warnings.cf


Here is the content of my config file for rules_du_jour:

TRUSTED_RULESETS="TRIPWIRE ANTIDRUG SARE_EVILNUMBERS0 BLACKLIST
BLACKLIST_URI RANDOMVAL BOGUSVIRUS SARE_ADULT SARE_FRAUD
SARE_BAYES_POISON_NXM SARE_OEM SARE_RANDOM SARE_HEADER SARE_HEADER0
SARE_HEADER_ENG SARE_HTML0 SARE_HTML1 SARE_HTML2 SARE_HTML3 SARE_HTML4
SARE_HTML_ENG SARE_RATWARE SARE_SPECIFIC SARE_URI SARE_BML_POST25X
SARE_WHITELIST SARE_WHITELIST_PRE30"
SA_DIR="/etc/mail/spamassassin"
MAIL_ADDRESS="[EMAIL PROTECTED]"
SA_RESTART="killall -HUP spamd"

I have quite a few users who get a lot of spam, especially
pharmaceuticals and stocks delivered to their mailboxes.  They are why I
began trying to work on the spamassassin filtering.  I have also observed
something interesting that I do not understand.  When I updated the rules
on Monday, many users started seeing an increased amount of spam in their
mailboxes.  One user who was getting a great deal of duplicate emails was
also seeing a huge increase in the total amount of spam.  Where she used to
receive 100 spam emails per day before the rules_du_jour update, afterwards
she was seeing 800 or 900 per day.  Much of it was porn spam that she was
not seeing before the update to the rules files.

I would appreciate any advice and/or education offered on the spam
filtering.

Thanks,
Steve Ingraham


{^_^}
- Original Message - 
From: "Steve Ingraham" <[EMAIL PROTECTED]>


Steve Ingraham wrote: 

I need help with a problem.  Our users are seeing multiple duplicate
emails coming from the same sender.  This is not occurring with every
email, and there does not seem to be any pattern to which incoming
emails will be duplicated and which ones won't.  They are also
reporting that duplicates are sent when they send to an outside
address.  Has anyone experienced this problem before?  What could be
causing this, and what can I do to stop it?  I am running
qmailtoaster and spamassassin as an external email gateway.  Nothing
has changed with qmail, but I did update some rules in SA using
rules_du_jour yesterday.  Could these rule updates cause this problem?
If so, what would have changed?

 

Jake Vickers wrote:

If your system is low on resources (ie: RAM), then the spamd process can
take too long, making Toaster think the mail got lost somewhere, so it
resends it.
Might want to check and see how much RAM you're using.
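To check Jake's RAM theory on a Linux-based toaster, /proc/meminfo is the usual place to look. This Python sketch parses it and applies a hypothetical 64 MB threshold on free plus buffer/cache memory; the threshold is an assumption, and steadily growing swap use is often a clearer sign of trouble than free memory alone.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style 'Key:   value kB' lines into {key: value}."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def low_on_ram(info: dict, min_free_kb: int = 64 * 1024) -> bool:
    """Hypothetical rule: 'low' when free + buffers + page cache is under min_free_kb."""
    available = info.get("MemFree", 0) + info.get("Buffers", 0) + info.get("Cached", 0)
    return available < min_free_kb


def swap_used_kb(info: dict) -> int:
    """Swap in use (kB); a value that keeps climbing suggests processes are being swapped out."""
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)
```

On the server itself, something like `info = parse_meminfo(open("/proc/meminfo").read())` followed by `low_on_ram(info)` and `swap_used_kb(info)` gives a quick reading; run it while spamd is busy, not just after a reboot.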

 

I want to thank everyone who posted a reply to my inquiry.  I believe
Jake Vickers was right about the problem.  The RAM on the email server
had been bogged down since yesterday, when I updated the various .cf files
using rules_du_jour.  I had included just a handful of rules from RDJ,
but it appears that RDJ uses far too much of my server's resources for
me to use it to update my spamassassin rules.  It was slowing down the
server so much that simple functions were not responding, and this
appears to have affected the delivery of emails.  In fact, I noticed that
my original message to these mailing lists took several hours to post and
was duplicated as well.  I resolved the problem by moving the various
rules .cf files out of the /etc/mail/spamassassin folder and restarting
spamassassin.

 

If anyone has a simple way of updating rules for spamassassin, I would
welcome your input.  I still need to update the rules, as a great deal of
spam is still getting through to users.  Specifically, we are getting a
lot of the pharmaceutical spam and the stock spam.

 

Again, thanks to everyone for the posts.

Steve Ingraham




sa-learn and "Caught" spams

2006-09-27 Thread Mike Woods
Hi guys, bit of a query regarding sa-learn and messages that have 
already been tagged as spam.


We have spamassassin scanning mail via amavisd and sending any caught 
spam to a spam folder in the users' accounts (using plus addressing). 
We've also been getting users to drop any missed spam into this spam 
folder so we can train spamassassin on it. At present I have a script 
that moves *only* the missed spam to a master folder for sa-learn. My 
question is simple: would there be any benefit in including the mails 
already identified as spam in this process? I know sa-learn looks for 
common patterns in spam to identify it as spam, but I'm unsure whether 
adding known spam would be beneficial.


Ta for any help!

---
Mike Woods
Systems Administrator


Re: performance question

2006-09-27 Thread Patrick Ben Koetter
* [EMAIL PROTECTED] <[EMAIL PROTECTED]>:
> Hi,
> 
> As we have seen the amount of incoming mail increase by 25% in the last 
> few months, our customer is willing to invest in an extra mail relay.
> I was thinking about a system with Sun's T1 chipset (like the Sun Fire
> T1000). I'm thinking the threaded nature of this chipset would work well
> with the type of computing going on in a typical mail relay (lots of
> processes all doing relatively short bursts of CPU). Any ideas?

A good CPU, plenty of RAM for a ramdisk (temp files during message testing),
and fast disks.

[EMAIL PROTECTED]




> 
> Tom.
> 
> 
> 
> 
> 
> Martin Hepworth <[EMAIL PROTECTED]>
> 27/09/2006 11:33
>  
> To: [EMAIL PROTECTED]
> cc: users@spamassassin.apache.org
> Subject:Re: performance question
> 
> 
> [EMAIL PROTECTED] wrote:
> > Hi,
> > 
> > I would like your opinion on whether our mail relay is properly tuned:
> > 
> > I have a mail relay (sendmail / mimedefang / spamassassin with fuzzyocr, 
> > razor and dcc) running on a Sun V20Z with 6 GB RAM and 2 AMD 1.8 GHz CPUs 
> > on Solaris 10.
> > It currently handles 95000 mails per day (most of it spam, of course). 
> > Load is consistently around 6 - 7, and the average scan time of a mail 
> > is about 7 seconds. %io according to top is almost 0.
> > 
> > The mimedefang spool is running on a ramdisk, and all the software is 
> > at the most recent version.
> > 
> > Does this seem like a normal load? I had hoped that our upgrade from 
> > Solaris 9 to Solaris 10 would have drastically improved the number of 
> > emails the system could process (or lowered the load for the same 
> > amount of emails), but I don't notice a major improvement.
> > 
> > I'd appreciate your input on this matter.
> > 
> > regards,
> > tom.
> > 
> > 
> Don't confuse the load average figure with 'overloaded' systems.
> 
> The load figure just means X processes are waiting for some resource 
> (CPU, disk, network, etc.).
> 
> Depending on your setup you may find it useful to use milter-ahead (or a 
> free equivalent) to drop email to unknown users before you send it off 
> to spamassassin etc. for further processing. I drop over 2/3 of my 
> traffic that way.
> 
> Given it's only 7 seconds to scan each email, I'd say your system is more 
> than handling the traffic being processed.
> 
> -- 
> Martin Hepworth
> Senior Systems Administrator
> Solid State Logic
> Tel: +44 (0)1865 842300
> 
> **
> 
> This email and any files transmitted with it are confidential and
> intended solely for the use of the individual or entity to whom they
> are addressed. If you have received this email in error please notify
> the system manager.
> 
> This footnote confirms that this email message has been swept
> for the presence of computer viruses and is believed to be clean.  
> 
> **
> 
> 
> 

-- 
state of mind
Agentur für Kommunikation und Design

Patrick Koetter         Tel: 089 45227227
Echinger Strasse 3      Fax: 089 45227226
85386 Eching            Web: http://www.state-of-mind.de


Re: performance question

2006-09-27 Thread Ralf Hildebrandt
* [EMAIL PROTECTED] <[EMAIL PROTECTED]>:
> Hi,
> 
> As we have seen the amount of incoming mail increase by 25% in the last 
> few months, our customer is willing to invest in an extra mail relay.
> I was thinking about a system with Sun's T1 chipset (like the Sun Fire
> T1000). I'm thinking the threaded nature of this chipset would work well
> with the type of computing going on in a typical mail relay (lots of
> processes all doing relatively short bursts of CPU). Any ideas?

Mailrelays are I/O bound, not CPU bound.
At least mine is.

-- 
Ralf Hildebrandt (i.A. des IT-Zentrums) [EMAIL PROTECTED]
Charite - Universitätsmedizin Berlin          Tel.  +49 (0)30-450 570-155
Gemeinsame Einrichtung von FU- und HU-Berlin  Fax.  +49 (0)30-450 570-962
IT-Zentrum Standort CBF send no mail to [EMAIL PROTECTED]


Re: performance question

2006-09-27 Thread tomvo
Hi,

As we have seen the amount of incoming mail increase by 25% in the last 
few months, our customer is willing to invest in an extra mail relay.
I was thinking about a system with Sun's T1 chipset (like the Sun Fire
T1000). I'm thinking the threaded nature of this chipset would work well
with the type of computing going on in a typical mail relay (lots of
processes all doing relatively short bursts of CPU). Any ideas?

Tom.





Martin Hepworth <[EMAIL PROTECTED]>
27/09/2006 11:33
 
To: [EMAIL PROTECTED]
cc: users@spamassassin.apache.org
Subject:Re: performance question


[EMAIL PROTECTED] wrote:
> Hi,
> 
> I would like your opinion on whether our mail relay is properly tuned:
> 
> I have a mail relay (sendmail / mimedefang / spamassassin with fuzzyocr, 
> razor and dcc) running on a Sun V20Z with 6 GB RAM and 2 AMD 1.8 GHz CPUs 
> on Solaris 10.
> It currently handles 95000 mails per day (most of it spam, of course). 
> Load is consistently around 6 - 7, and the average scan time of a mail 
> is about 7 seconds. %io according to top is almost 0.
> 
> The mimedefang spool is running on a ramdisk, and all the software is 
> at the most recent version.
> 
> Does this seem like a normal load? I had hoped that our upgrade from 
> Solaris 9 to Solaris 10 would have drastically improved the number of 
> emails the system could process (or lowered the load for the same 
> amount of emails), but I don't notice a major improvement.
> 
> I'd appreciate your input on this matter.
> 
> regards,
> tom.
> 
> 
Don't confuse the load average figure with 'overloaded' systems.

The load figure just means X processes are waiting for some resource 
(CPU, disk, network, etc.).

Depending on your setup you may find it useful to use milter-ahead (or a 
free equivalent) to drop email to unknown users before you send it off 
to spamassassin etc. for further processing. I drop over 2/3 of my 
traffic that way.

Given it's only 7 seconds to scan each email, I'd say your system is more 
than handling the traffic being processed.

-- 
Martin Hepworth
Senior Systems Administrator
Solid State Logic
Tel: +44 (0)1865 842300






RE:

2006-09-27 Thread Michael Scheidell

> -Original Message-
> From: Peter Smith [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, September 26, 2006 8:08 PM
> To: users@spamassassin.apache.org
> Subject: 
> 
> 
> Hi,
> 
> Over the last week, my machine (Fedora, SA 3.1.3, qmail, 
> qmail-scanner-queue.pl) has been receiving a fair amount of 
> junk mail which is not being tagged as spam; in fact the 
> total scores are negative.
> 
> The messages are simply a random stream of words, with 
> punctuation scattered in them. No HTML, no URLs being 
> advertised, no excessive capitalisation, just meaningless 
> text. The message headers are pretty clean too, apart from 
> the From field being false.

A right wing conspiracy to poison the global Bayesian database.



Re: performance question

2006-09-27 Thread Ralf Hildebrandt
* [EMAIL PROTECTED] <[EMAIL PROTECTED]>:
> Hi,
> 
> I would like your opinion on whether our mail relay is properly tuned:
> 
> I have a mail relay (sendmail / mimedefang / spamassassin with fuzzyocr, 
> razor and dcc) running on a Sun V20Z with 6 GB RAM and 2 AMD 1.8 GHz CPUs 
> on Solaris 10.
> It currently handles 95000 mails per day (most of it spam, of course). 
> Load is consistently around 6 - 7, and the average scan time of a mail 
> is about 7 seconds. %io according to top is almost 0.

We have a VERY similar setup and a similar amount of mail, but we use
Postfix and amavisd-new (which uses spamassassin with fuzzyocr and
can do its own defanging). amavisd-new uses a ramdisk as well.

I guess this reduces the number of processes/forks and performs a bit
better.

* Linux 2.6.18
* 2 Xeon 2.8 GHz processors and 3 GB RAM.
* Load around the noon peak (it's 11:30 now): 4.64

The average scan time is below your 7s...

-- 
Ralf Hildebrandt (i.A. des IT-Zentrums) [EMAIL PROTECTED]
Charite - Universitätsmedizin Berlin          Tel.  +49 (0)30-450 570-155
Gemeinsame Einrichtung von FU- und HU-Berlin  Fax.  +49 (0)30-450 570-962
IT-Zentrum Standort CBF send no mail to [EMAIL PROTECTED]


Re: performance question

2006-09-27 Thread Martin Hepworth

[EMAIL PROTECTED] wrote:

Hi,

I would like your opinion on whether our mail relay is properly tuned:

I have a mail relay (sendmail / mimedefang / spamassassin with fuzzyocr, 
razor and dcc) running on a Sun V20Z with 6 GB RAM and 2 AMD 1.8 GHz CPUs 
on Solaris 10.
It currently handles 95000 mails per day (most of it spam, of course). 
Load is consistently around 6 - 7, and the average scan time of a mail 
is about 7 seconds. %io according to top is almost 0.


The mimedefang spool is running on a ramdisk, and all the software is 
at the most recent version.


Does this seem like a normal load? I had hoped that our upgrade from 
Solaris 9 to Solaris 10 would have drastically improved the number of 
emails the system could process (or lowered the load for the same 
amount of emails), but I don't notice a major improvement.


I'd appreciate your input on this matter.

regards,
tom.



Don't confuse the load average figure with 'overloaded' systems.

The load figure just means X processes are waiting for some resource 
(CPU, disk, network, etc.).


Depending on your setup you may find it useful to use milter-ahead (or a 
free equivalent) to drop email to unknown users before you send it off 
to spamassassin etc. for further processing. I drop over 2/3 of my 
traffic that way.


Given it's only 7 seconds to scan each email, I'd say your system is more 
than handling the traffic being processed.
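Martin's point about the 7-second scan time can be checked with Little's law (average number in the system = arrival rate × time each item spends in it). The Python sketch below assumes mail arrives evenly over the day, which real traffic does not, so peak-hour concurrency will be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def mean_concurrent_scans(messages_per_day: float, seconds_per_scan: float) -> float:
    """Little's law, L = lambda * W, assuming arrivals are spread evenly over the day."""
    arrival_rate = messages_per_day / SECONDS_PER_DAY  # messages per second
    return arrival_rate * seconds_per_scan


if __name__ == "__main__":
    print(round(mean_concurrent_scans(95000, 7), 1))      # tom's relay -> 7.7
    print(round(mean_concurrent_scans(95000 / 3, 7), 1))  # if 2/3 were rejected up front -> 2.6
```

That works out to roughly 7.7 scans in flight on average, in the same range as the reported load of 6 - 7 if each in-flight scan counts roughly once toward the load figure (an assumption; many of those scans are just waiting on network tests). If milter-ahead removed the same 2/3 of traffic for tom as it does for Martin (also an assumption), the same calculation gives about 2.6.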


--
Martin Hepworth
Senior Systems Administrator
Solid State Logic
Tel: +44 (0)1865 842300




performance question

2006-09-27 Thread tomvo
Hi,

I would like your opinion on whether our mail relay is properly tuned:

I have a mail relay (sendmail / mimedefang / spamassassin with fuzzyocr, 
razor and dcc) running on a Sun V20Z with 6 GB RAM and 2 AMD 1.8 GHz CPUs 
on Solaris 10.
It currently handles 95000 mails per day (most of it spam, of course). 
Load is consistently around 6 - 7, and the average scan time of a mail 
is about 7 seconds. %io according to top is almost 0.

The mimedefang spool is running on a ramdisk, and all the software is 
at the most recent version.

Does this seem like a normal load? I had hoped that our upgrade from 
Solaris 9 to Solaris 10 would have drastically improved the number of 
emails the system could process (or lowered the load for the same 
amount of emails), but I don't notice a major improvement.

I'd appreciate your input on this matter.

regards,
tom.




Re: Infuriating gif spam...

2006-09-27 Thread Steve [Spamassasin]
Bill Landry wrote:
>> Version 2.3j works much better...  I'd previously been using version
>> 2.3b for which I had an ebuild for gentoo.
>>
>> One thing I have noticed, however, is a number of errors/warnings which
>> spamd sticks into /var/log/messages when it is started:
>>
>> -- 
>> Sep 26 17:20:48 server spamd[25563]: Subroutine new redefined at
>> /etc/mail/spamassassin/FuzzyOcr.pm line 122.
>> -- 
>>
>> Have I somehow loaded this module twice? I didn't get these messages
>> until I upgraded to version 2.3j from 2.3b
>
> No problem here, these are just informational messages that only
> recently showed up for me with the more recent versions of the
> FuzzyOcr plugin, as well.  However, with the two latest versions, it
> only gets written to the log once during start-up, not with each image
> file that gets scanned like I was seeing a few versions back.

Jorge Valdes replied to me (though I can't find his email on the list) - he 
said to look at v310.pre - I had an unnecessary line:

> loadplugin FuzzyOcr /etc/mail/spamassassin/FuzzyOcr.pm

After commenting that out, 2.3j works just as well as it did before, and I
don't get the warnings any more.
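A duplicate `loadplugin` line like this is easy to miss when more than one config file loads the same plugin. This Python sketch scans config text for plugins loaded more than once; it assumes you pass in the contents of the .pre and .cf files from /etc/mail/spamassassin, and the file name FuzzyOcr.cf in the example is a hypothetical stand-in for wherever the second load lives.

```python
from collections import defaultdict


def duplicate_loadplugins(files: dict) -> dict:
    """Map plugin name -> [(filename, line number), ...] for plugins loaded more than once.

    `files` maps a config file name (e.g. 'v310.pre') to its text.
    Commented-out lines ('# loadplugin ...') are ignored because their first field is '#'.
    """
    seen = defaultdict(list)
    for fname, text in files.items():
        for lineno, line in enumerate(text.splitlines(), start=1):
            fields = line.split()
            if len(fields) >= 2 and fields[0] == "loadplugin":
                seen[fields[1]].append((fname, lineno))
    return {plugin: locs for plugin, locs in seen.items() if len(locs) > 1}


if __name__ == "__main__":
    configs = {
        "v310.pre": "loadplugin Mail::SpamAssassin::Plugin::URIDNSBL\n"
                    "loadplugin FuzzyOcr /etc/mail/spamassassin/FuzzyOcr.pm\n",
        "FuzzyOcr.cf": "loadplugin FuzzyOcr /etc/mail/spamassassin/FuzzyOcr.pm\n",
    }
    print(duplicate_loadplugins(configs))
```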