Confused about how to use sa-update

2010-03-31 Thread Phill Edwards
I have just found out why most of my emails have been getting tagged
as spam this year. It's because of a bug in a rule which causes this
hit to happen when it shouldn't: FH_DATE_PAST_20XX ("The date is
grossly in the future"). The actual file at fault is 72_active.cf, which
is a SpamAssassin rule file, and it can be fixed by getting the new
file via sa-update.

But I don't understand how to use sa-update. I've run it and I can see
all the new rule files in /var/lib/spamassassin/3.002005. However, I
think my rules run off the files in /usr/share/spamassassin/. The wiki
at http://wiki.apache.org/spamassassin/RuleUpdates#Using_sa-update
says NOT to use the --updatedir parameter to put updates in
/usr/share/spamassassin. So how exactly do you get the new rule files
into /usr/share/spamassassin so they start working? Do you just copy
them across manually, or is there a way of getting sa-update to do it
automatically?


keep-alive check?

2010-03-31 Thread David

I've just found that line on the spamc man page:

-K  Perform a keep-alive check of spamd, instead of a full message check.

Does anyone know what it means and what it actually does?





Re: keep-alive check?

2010-03-31 Thread Mariusz Kruk
On Wednesday, 31 of March 2010, David wrote:
 I've just found that line on the spamc man page:
 
 -K  Perform a keep-alive check of spamd, instead of a full message check.
 
 Someone knows what it means, and what it actually does?

It does what it says. A keep-alive check just connects to spamd to
check whether the daemon is still alive. There is no need to do a full
message scan for this.
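At the protocol level a keep-alive check is a tiny request/response exchange instead of a scan. The sketch below assumes the SPAMC/SPAMD protocol's PING request (the client sends "PING SPAMC/1.5" followed by an empty line, and a healthy spamd replies with a status line such as "SPAMD/1.5 0 PONG") and spamd's usual TCP port 783; check both against the spamc/spamd documentation for your version before relying on it.

```python
import socket


def build_ping(version: str = "1.5") -> bytes:
    # A PING request is just the verb, the protocol version and an empty line.
    return f"PING SPAMC/{version}\r\n\r\n".encode("ascii")


def is_pong(reply: bytes) -> bool:
    # Expect a status line like "SPAMD/1.5 0 PONG"; response code 0 means OK.
    first_line = reply.split(b"\r\n", 1)[0].decode("ascii", "replace")
    parts = first_line.split()
    return (
        len(parts) >= 3
        and parts[0].startswith("SPAMD/")
        and parts[1] == "0"
        and parts[2] == "PONG"
    )


def spamd_alive(host: str = "127.0.0.1", port: int = 783, timeout: float = 5.0) -> bool:
    # Connect, send PING and read the reply; any network error counts as "not alive".
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(build_ping())
            reply = sock.recv(1024)
    except OSError:
        return False
    return is_pong(reply)
```

Calling spamd_alive() returns True when a local spamd answers the PING, and False on a connection error or any other reply, without a message ever being scanned.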

-- 
\/ 
|  k...@epsilon.eu.org   | 
| http://epsilon.eu.org/ | 
/\ 


sa-update

2010-03-31 Thread Andrea Bencini

I installed the following packages with yum:
postfix, amavisd-new and spamassassin.

I have *.cf files in the /usr/share/spamassassin/ directory and now I would like to
update them.


Is that possible with sa-update?
If yes, what is the complete command to use to update the *.cf files in the
/usr/share/spamassassin/ directory?


Thanks
Andrea 



Re: Confused about how to use sa-update

2010-03-31 Thread Karsten Bräckelmann
On Wed, 2010-03-31 at 19:15 +1100, Phill Edwards wrote:
 I have just found out why most of my emails have been getting tagged
 as spam this year. It's because of a bug in a rule which causes this
 hit to happen when it shouldn't: FH_DATE_PAST_20XX ("The date is
 grossly in the future"). The actual file at fault is 72_active.cf which
 is a spamassassin rule file and it can be fixed by getting the new
 file via sa-update.
 
 But I don't understand how to use sa-update. I've run it and I can see
 all the new rule files in /var/lib/spamassassin/3.002005. However, I
 think my rules run off the files in /usr/share/spamassassin/. The wiki

man spamassassin

Pay special attention to the section Configuration Files. sa-update is
doing the right thing, and spamassassin will use the update dir instead
of the base stock rules. You merely need to restart spamd, or whatever
else you are using.


 at http://wiki.apache.org/spamassassin/RuleUpdates#Using_sa-update
 says NOT to use the --updatedir parameter to put updates in
 /usr/share/spamassassin. So how exactly do you get the new rule files
 into /usr/share/spamassassin so they start working? Do you just copy
 them across manually, or is there a way of getting sa-update to do it
 automatically?

Pretty much the same answer as above. Do not move, copy or otherwise
harm those files. :)  sa-update knows where spamassassin expects the
rules.
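The directory choice described above can be modelled roughly like this. It is a simplified sketch, not SpamAssassin's actual code: it assumes the two locations mentioned in this thread (/usr/share/spamassassin for the stock rules, /var/lib/spamassassin/<version> for sa-update's output) and assumes the version directory is named as in Phill's listing, so 3.2.5 becomes 3.002005. Paths differ between builds, so a debug lint remains the authoritative check.

```python
from pathlib import Path


def version_dir_name(version: str) -> str:
    # "3.2.5" -> "3.002005": major, then minor and patch zero-padded to 3 digits.
    major, minor, patch = (int(p) for p in version.split("."))
    return f"{major}.{minor:03d}{patch:03d}"


def active_rules_dir(version: str,
                     update_root: str = "/var/lib/spamassassin",
                     stock_dir: str = "/usr/share/spamassassin") -> Path:
    # Simplified model: if sa-update has left .cf files anywhere under the
    # versioned update directory, use it; otherwise fall back to the stock rules.
    # Nothing is copied or moved in either case.
    update_dir = Path(update_root) / version_dir_name(version)
    if update_dir.is_dir() and any(update_dir.rglob("*.cf")):
        return update_dir
    return Path(stock_dir)
```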

  guenther


-- 
char *t=\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4;
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1:
(c=*++x); c128  (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Confused about how to use sa-update

2010-03-31 Thread Kai Schaetzl
Phill Edwards wrote on Wed, 31 Mar 2010 19:15:18 +1100:

So, you have finally found sa-update? Wow.

 So how exactly do you get the new rule files
 into /usr/share/spamassassin so they start working?

Run a debug lint and you will see that the /var/lib directory gets used 
when it contains rules. Nothing gets moved.

Kai

-- 
Get your web at Conactive Internet Services: http://www.conactive.com





Re: sa-update

2010-03-31 Thread Jari Fredriksson
On 31.3.2010 14:02, Andrea Bencini wrote:
 I installed the following packages with yum:
 postfix, amavisd-new and spamassassin.
 
 I have *.cf files in the /usr/share/spamassassin/ directory and now I would like to
 update them.
 
 Is that possible with sa-update?
 If yes, what is the complete command to use to update the *.cf files in the
 /usr/share/spamassassin/ directory?
 

The updated rules normally go in /var/lib/spamassassin, and sa-update knows
where they are supposed to be in any case.

The simplest case of sa-update is just

 # sa-update

That's all.
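Because a running spamd only picks up new rules when it is restarted, sa-update is often wrapped in a small cron job that restarts the daemon only when something new was installed. A sketch, assuming sa-update's documented exit codes (0 means new rules were installed, 1 means no fresh updates were available, anything higher indicates a problem) and using "/etc/init.d/spamd restart" as a placeholder restart command: substitute whatever your distribution uses.

```python
import subprocess

# Placeholder restart command: adjust for your init system / distribution.
RESTART_CMD = ["/etc/init.d/spamd", "restart"]


def action_for_exit_code(code: int) -> str:
    # sa-update: 0 = new rules installed, 1 = nothing new, >1 = some problem.
    if code == 0:
        return "restart"
    if code == 1:
        return "nothing"
    return "investigate"


def run_update() -> str:
    result = subprocess.run(["sa-update"], capture_output=True, text=True)
    action = action_for_exit_code(result.returncode)
    if action == "restart":
        # spamd reads its rules at startup, so restart it to use the update.
        subprocess.run(RESTART_CMD, check=True)
    elif action == "investigate":
        print("sa-update failed with exit code", result.returncode)
        print(result.stderr)
    return action
```

Run as root from cron (for example once a day), run_update() leaves spamd alone on days when nothing changed.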

But there are options. You may want to include more channels than the
default. A good tutorial can be found at

 http://khopesh.com/wiki/Anti-spam

The author is a member of this list, and is also available on IRC.
Currently I use sa-update just as described in that tutorial, with the same
channels.

-- 
http://www.iki.fi/jarif/

Q:  Why is it that Mexico isn't sending anyone to the '84 summer games?
A:  Anyone in Mexico who can run, swim or jump is already in LA.





Re: sa-update

2010-03-31 Thread Matt Kettler
On 3/31/2010 7:02 AM, Andrea Bencini wrote:
 I installed the following packages with yum:
 postfix, amavisd-new and spamassassin.

 I have *.cf files in the /usr/share/spamassassin/ directory and now I would like to
 update them.

 Is that possible with sa-update?
 If yes, what is the complete command to use to update the *.cf files in the
 /usr/share/spamassassin/ directory?

 Thanks
 Andrea

Don't. Let sa-update put them in /var/lib/spamassassin.

Spamassassin will check this location for new rulesets and use it
instead of /usr/share.

If you're not sure this is happening, try running spamassassin --lint -D
and check the top of the debug output to see which rule paths SA is using.
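To make that check repeatable, the debug output can be captured and searched for the two candidate directories. A sketch, assuming spamassassin is on the PATH and writes its debug output to stderr, and that the debug lines mention the rule directories by their full paths. The exact wording of those lines varies between versions, so the code only looks for the path strings.

```python
import subprocess

CANDIDATES = ("/var/lib/spamassassin", "/usr/share/spamassassin")


def rule_paths_mentioned(debug_output: str, candidates=CANDIDATES) -> dict:
    # Map each candidate directory to the debug lines that mention it.
    hits = {c: [] for c in candidates}
    for line in debug_output.splitlines():
        for c in candidates:
            if c in line:
                hits[c].append(line.strip())
    return hits


def lint_debug() -> str:
    # Run a debug lint and return what it printed on stderr.
    proc = subprocess.run(["spamassassin", "--lint", "-D"],
                          capture_output=True, text=True)
    return proc.stderr
```

For example, printing len(lines) for each entry of rule_paths_mentioned(lint_debug()) shows at a glance whether the update directory is being read.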


Limit SA to scan messages 100k and below

2010-03-31 Thread Keith De Souza
Hi Guys,

My current sysadmin has now left the company and I'm new to SA and Exim.
Needless to say, I have been assigned the task of looking
after the server. I'm hoping I've come to the right place for my
questions to be answered.

The system I have is running on:

Gentoo Base System release 1.12.10
SpamAssassin version 3.2.5
  running on Perl version 5.8.8
Exim version 4.69

Here is my spamd.conf file:

=
SPAMD_OPTS="-m 25 -H -u mail -D"

# spamd stores its pid in this file. If you use the -u option to
# run spamd under another user, you might need to adjust it.

PIDFILE="/var/run/spamd.pid"

# SPAMD_NICELEVEL lets you set the 'nice'ness of the running
# spamd process

SPAMD_NICELEVEL=1
=

I've read somewhere that the default setting for SA to scan a message is
500k.

Can I reduce this, so that SA scans messages 100k and below?


Many Thanks in advance


Re: Limit SA to scan messages 100k and below

2010-03-31 Thread Mikael Syska
Hi

On Wed, Mar 31, 2010 at 2:24 PM, Keith De Souza
kbdeso...@googlemail.com wrote:
 Hi Guys,


[snip]


 I've read somewhere that the default setting for SA to scan a message is
 500k.

 Can I reduce this, so that SA scans messages 100k and below?

Have you tried Google first?
http://www.google.dk/#hl=dasafe=offq=spamd+scan+messages+sizemeta=aq=faqi=aql=oq=gs_rfai=fp=15904d39482f0df0

Maybe this one: http://spamassassin.apache.org/full/3.2.x/doc/spamc.html

I'm no expert at spamc ... but these seem to be the right settings to go for ...

But is there a reason for dropping it?

 Many Thanks in advance




mvh
Mikael Syska


Re: Limit SA to scan messages 100k and below

2010-03-31 Thread Karsten Bräckelmann
On Wed, 2010-03-31 at 13:24 +0100, Keith De Souza wrote:
 My current sysadmin has now left the company and I'm new to SA and
 Exim. [...]

 I've read somewhere that the default setting for SA to scan a message
 is 500k.

That's actually the default for spamc. Messages exceeding the threshold
just won't be passed to spamd. SA (and spamd) will check everything it
gets passed.

 Can I reduce this, so that SA scans messages 100k and below?

You need to change whatever glue you are using to pass messages to SA,
and skip the scanning for messages larger than your desired threshold.

That said, IMHO 100k is rather low. Why do you want that particular
threshold?

  guenther





Re: Limit SA to scan messages 100k and below

2010-03-31 Thread Mikael Syska
Hi,

Remember to respond to the mailing list ... so other users can follow
this also ...

On Wed, Mar 31, 2010 at 2:54 PM, Keith De Souza
kbdeso...@googlemail.com wrote:
 Hi,

 But is there a reason for dropping it?

 I'm having a few errors in my Exim logs from legitimate senders not coming
 through:

 ===
 2010-03-31 01:22:25 1Nwlbc-0001QS-Ua
 H=host81-136-197-86.in-addr.btopenworld.com (mail.duke.tv) [81.136.197.86]
 F=l...@dukeandearl.com temporarily rejected after DATA
 ===

 And after checking my SA logs:

 ===
 Mar 31 01:25:51 mailserver spamd[5379]: spamd: result: . -4 -
 GENESIS_PHONENUMBER07
 scantime=300.0,size=24337,user=nobody,uid=8,required_score=3.2,rhost=localhost,raddr=127.0.0.1,rport=42308,mid=c7d27527.8a78%l...@dukeandearl.com,autolearn=unavailable
 ==

Your required score is very low ... but that's not the problem.

 I'm trying to understand why it is taking 300.0 seconds to scan a message
 only 24 KB in size.

This is not the way to go ... there could be other problems ... like
SA rules, RBLs timing out ...

Are you running sa-update?

 I'm beginning to think that because SA is taking so long to scan the
 message, it is timing out
 and hence Exim is returning a temporary reject after DATA.

 My thought so far is to try reducing the maximum size of message SA will
 scan and see if the scan time goes down.

Are there lots of mails in the queue?

 I may be wrong in my troubleshooting methods, but I'm not sure why this is
 happening at present.

 Many Thanks







mvh


Re: Limit SA to scan messages 100k and below

2010-03-31 Thread Keith De Souza
Hi

 You need to change whatever glue you are using to pass messages to SA,
 and skip the scanning for messages larger than your desired threshold.

Sorry, as I'm new to SA, can you elaborate on what you mean by glue?

 That said, IMHO 100k is rather low. Why do you want that particular
 threshold?

Judging from your response, I may be wrong in what I need to do:

Basically I'm having a few errors in my Exim logs from legitimate senders
not coming through:

===
2010-03-31 01:22:25 1Nwlbc-0001QS-Ua
H=host81-136-197-86.in-addr.btopenworld.com (mail.duke.tv) [81.136.197.86]
F=l...@dukeandearl.com temporarily rejected after DATA
===

And after checking my SA logs:

===
Mar 31 01:25:51 mailserver spamd[5379]: spamd: result: . -4 -
GENESIS_PHONENUMBER07
scantime=300.0,size=24337,user=nobody,uid=8,required_score=3.2,rhost=localhost,raddr=127.0.0.1,rport=42308,mid=c7d27527.8a78%l...@dukeandearl.com,autolearn=unavailable
==

I'm trying to understand why it is taking 300.0 seconds to scan a message
only 24 KB in size.
I'm beginning to think that because SA is taking so long to scan the
message, it is timing out
and hence Exim is returning a temporary reject after DATA.

My thought so far is to try reducing the maximum size of message SA will
scan and see if the scan time goes down.
I may be wrong in my troubleshooting methods, but I'm not sure why this is
happening at present.
Many Thanks










Re: Limit SA to scan messages 100k and below

2010-03-31 Thread Keith De Souza
Hi

Oops, I only realized after I had sent you the message, but will do.

 Are you running sa-update?

I might not be; how can I check?

 Are there lots of mails in the queue?

No mails in the queue. I should also say that mail is coming in fine
and we are receiving it, but certain legitimate mail (like the one sent)
is not, and SA takes 300.0 seconds to scan it.

I'm also receiving these in my logs:

spam acl condition: error reading from spamd socket: Connection timed out

Many Thanks


Re: Limit SA to scan messages 100k and below

2010-03-31 Thread Jeff Mincy
   From: Keith De Souza kbdeso...@googlemail.com
   Date: Wed, 31 Mar 2010 14:10:50 +0100
   
   Hi
   
   You need to change whatever glue you are using to pass messages to SA,
   and skip the scanning for messages larger than your desired threshold.
   
   Sorry, as I'm new to SA, can you elaborate on what you mean by glue?
   
   That said, IMHO 100k is rather low. Why do you want that particular
   threshold?
   
   Judging from your response, I may be wrong in what I need to do:
   
   Basically I'm having a few errors in my Exim logs from legitimate senders
   not coming through:

300 seconds looks like a timeout.   Something is giving up after
waiting 300 seconds.

Note the autolearn=unavailable.   I'd guess that you are getting
locked out from the Bayes database.   You probably had a Bayes expire
running at the same time.   There should be messages about this in a
log file.

If this is the case you can turn off bayes_auto_expire and run expire
from cron.  You could also try learning to the journal and doing
sa-learn --sync periodically from cron.
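Both symptoms mentioned here can be spotted mechanically in spamd's result lines. A minimal sketch that parses the comma-separated key=value fields of a line like the one Keith posted and flags a scan time near a timeout and autolearn=unavailable. The 290-second cutoff is an arbitrary assumption, and "Bayes may be locked" is only the guess made above, not something the log line proves.

```python
import re

# key=value pairs such as scantime=300.0, size=24337, autolearn=unavailable
FIELD_RE = re.compile(r"(\w+)=([^,\s]*)")


def parse_result_fields(line: str) -> dict:
    return dict(FIELD_RE.findall(line))


def symptoms(line: str, timeout_cutoff: float = 290.0) -> list:
    fields = parse_result_fields(line)
    found = []
    if float(fields.get("scantime", 0)) >= timeout_cutoff:
        found.append("scan hit a timeout")
    if fields.get("autolearn") == "unavailable":
        found.append("autolearn unavailable (Bayes may be locked)")
    return found
```

Feeding every "spamd: result:" line from the log through symptoms() shows whether the slow scans coincide with autolearn being unavailable.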

-jeff

   
   
   


Re: Limit SA to scan messages 100k and below

2010-03-31 Thread Bowie Bailey
Keith De Souza wrote:

 I'm trying to understand why it is taking 300.0 seconds to scan a
 message only 24 KB in size.
 I'm beginning to think that because SA is taking so long to scan the
 message, it is timing out
 and hence Exim is returning a temporary reject after DATA.

 My thought so far is to try reducing the maximum size of message SA
 will scan and see if the scan time goes down.
 I may be wrong in my troubleshooting methods, but I'm not sure why this
 is happening at present.

My first suggestion to anyone who is having problems with SA running
slowly is to check memory usage.

You posted previously that your conf file contained this:

SPAMD_OPTS="-m 25 -H -u mail -D"

-m 25 means that you are running 25 spamd processes.  On my system
(with a few extra rulesets), the spamd processes take up about 60-70M
each.  How much memory do you have?  You need to make sure that the
machine doesn't go into swap.  If it does, SA will slow down
dramatically.  Try running the free command to see how much memory you
have available.  If you are close to the edge, you may want to lower the
number of processes.
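Bowie's rule of thumb is simple arithmetic. Taking his figure of roughly 60-70 MB per child (your rulesets may differ), 25 children need about 25 × 70 = 1750 MB before Exim, the OS and everything else get a share, while the default of 5 children needs about 350 MB. A sketch that estimates the largest -m value that fits in memory; the per-child size and the reserve kept for other processes are assumptions to replace with your own measurements from free and top.

```python
def spamd_memory_mb(children: int, per_child_mb: float) -> float:
    # Total memory the spamd children are expected to use.
    return children * per_child_mb


def max_children(available_mb: float, per_child_mb: float, reserve_mb: float = 256) -> int:
    # Largest -m value that keeps spamd within available memory minus a
    # reserve for everything else, so the machine does not start swapping.
    usable = available_mb - reserve_mb
    if usable <= 0:
        return 0
    return int(usable // per_child_mb)
```

For example, a machine with 1 GB free and a 256 MB reserve has room for max_children(1024, 70) = 10 children of about 70 MB each.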

-H is a command to change the home directory and generally requires an
argument, so I'm not sure what it's doing here.

-u mail means spamd is running as the user mail.  So when you are
testing, manually learning the Bayes db, etc, make sure you are logged
in as mail so that you are using the same settings and databases as spamd.

-D puts spamd into debug mode.  Aside from filling up your logs with
excess debug information, this will probably slightly increase the
memory use and slow down the scanning process.  If you don't need it for
some reason, get rid of it.

-- 
Bowie


Re: Limit SA to scan messages 100k and below

2010-03-31 Thread Charles Gregory

On Wed, 31 Mar 2010, Keith De Souza wrote:

Sorry, as I'm new to SA, can you elaborate on what you mean by glue?


Geek terminology for the program, script or other mechanism that 
'connects' your MTA and your SA, i.e. the calling MTA or its script must do 
the size check, then decide *whether* to call SA.
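As an illustration of such glue, here is a minimal sketch of a filter that only hands messages up to a size limit to spamc and passes larger ones through untouched. It assumes spamc is installed and that its -s option (described in the spamc documentation linked earlier) takes a maximum size in bytes; since spamc enforces that limit itself, the explicit check mainly shows where the decision lives. The 100 KB value is Keith's proposed limit, not a recommendation. In an Exim setup like Keith's, the equivalent check would usually be a condition in the ACL that calls spamd.

```python
import subprocess

SIZE_LIMIT = 100 * 1024  # Keith's proposed 100 KB limit, used only as an example


def should_scan(message: bytes, limit: int = SIZE_LIMIT) -> bool:
    # The glue, not SpamAssassin, decides whether a message gets scanned at all.
    return len(message) <= limit


def filter_message(message: bytes, limit: int = SIZE_LIMIT) -> bytes:
    if not should_scan(message, limit):
        return message  # too large: pass through unscanned
    try:
        # spamc writes the (possibly marked-up) message to stdout;
        # -s also sets spamc's own maximum message size.
        proc = subprocess.run(["spamc", "-s", str(limit)], input=message,
                              stdout=subprocess.PIPE)
    except OSError:
        return message  # spamc missing or not executable: deliver unscanned
    return proc.stdout if proc.returncode == 0 and proc.stdout else message
```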



I'm trying to understand why it is taking 300.0 seconds to scan a message
only 24 KB in size.


1) The server is overloaded. Your load only has to go 10-20% over your 
system's 'maximum capacity' to cause processing times to jump from 20 
seconds up to five minutes or more.


2) Something that SA relies upon, like your DNS server, is taking way 
too long to do its job. Check that your DNS has a reasonable timeout 
value. Otherwise it could be waiting on lookups for a non-existent domain.

This would be the case if the problem occurs for certain addresses,
or more often on spam (which comes from 'unknown' systems) than on 
legitimate mail.


3) There may be a 'locking' issue with any databases (Bayes?) that SA 
uses. Again, this may only become a problem under heavy load, with too 
many concurrent SA processes.
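Point 2 can be tested by timing name lookups against a fixed budget. A minimal sketch using only the standard library: socket.getaddrinfo has no timeout parameter of its own, so the lookup runs in a worker thread and the caller waits at most timeout seconds. Which hostnames to test and the 5-second budget are assumptions; domains that appear in the slow messages are a good start.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout

_pool = ThreadPoolExecutor(max_workers=4)


def timed_lookup(name, timeout=5.0, resolver=socket.getaddrinfo):
    """Resolve `name`, waiting at most `timeout` seconds.

    Returns (elapsed_seconds, status) where status is "ok", "timeout" or
    "error" (for example a name that does not exist). A lookup that times
    out keeps running in its worker thread; we just stop waiting for it.
    """
    start = time.monotonic()
    future = _pool.submit(resolver, name, None)
    try:
        future.result(timeout=timeout)
        status = "ok"
    except FutureTimeout:
        status = "timeout"
    except OSError:  # socket.gaierror is a subclass of OSError
        status = "error"
    return time.monotonic() - start, status
```

A call such as timed_lookup("example.com") that regularly takes several seconds or ends in "timeout" points at the resolver rather than at SpamAssassin.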



My thought so far is to try reducing the maximum size of message SA will scan
and see if the scan time goes down.


It is a better idea to try and reduce the number of emails that SA will 
process at the same time.


- C


Re: Scanning large-body spam

2010-03-31 Thread Adam Katz
Alex wrote:
 What settings do people typically have these days for the maximum
 scanned message size? Surprisingly, at least to me, I'm seeing spam in
 the 650k and 700k range, at least a few per hour, and they are not scanned.
 
 Does anyone have any suggestions for optimizing the process for spam
 containing just a large image that would therefore bypass the typical
 scanning? Should I be scanning messages that large, then?

Depends on your available CPU resources.  If you always have a low
load average, you can scan larger messages.  My production deployment
is such a workhorse that I've got it set to 1.1MB.

My general advice is that since many spammers will check against a
default SA scan before blasting out their messages, you want something
slightly larger than whatever the default is (actually, in the event
that it has changed between versions, something slightly larger than
the largest default SA has ever shipped with).

Maybe somebody who knows the innards better can comment on how quickly
and efficiently SA can ignore non-text attachments (for those of us
who don't try to decode word documents and PDFs or use OCR on images).

Wasn't some earlier version of SA capable of scanning just the /first/
[size] of an email?  Probably harder to implement within MIME, but
some control to internally truncate remaining pieces (for scanning
only, like the pseudo-headers) would allow scanning beyond the size limit.


Re: Scanning large-body spam

2010-03-31 Thread Henrik K
On Wed, Mar 31, 2010 at 11:05:57AM -0400, Adam Katz wrote:
 
 Wasn't some earlier version of SA capable of scanning just the /first/
 [size] of an email?  Probably harder to implement within MIME, but
 some control to internally truncate remaining pieces (for scanning
 only, like the pseudo-headers) would allow scanning beyond the size limit.

SA 3.3 has special handling for truncated messages and amavisd-new (if it's
your choice of glue) has already done it since 2.6.3. Never encountered a
problem with it. Here are release notes for the record:


- large messages beyond $sa_mail_body_size_limit are now partially passed
  to SpamAssassin and other spam scanners for checking: a copy passed to
  a spam scanner is truncated near or slightly past the indicated limit.
  Large messages are no longer given an almost free passage through spam
  checks.

  Note that message truncation can invalidate a DKIM or DK signature.
  If using (non-default) SpamAssassin rules to assign score points to mail
  with no valid signatures from authors which are expected to always provide
  a valid signature, the message truncation can cause false positives on
  these rules. As a workaround, to a truncated message passed to spam
  scanners, amavisd inserts a header field:
X-Amavis-MessageSize: m, TRUNCATED to n
  which can be captured by SpamAssassin rules, e.g.:
header __TRUNCATED X-Amavis-MessageSize =~ m{\A[^\n]*TRUNCATED}m
  and used in rules like NOTVALID_EBAY to prevent them from triggering.

  Starting with version 3.3.0 of SpamAssassin, its DKIM plugin understands
  the issue and receives undamaged DKIM signature objects directly from
  amavisd, so the above workaround is not needed. Also, a hit on a __TRUNCATED
  rule is automatically generated (explicit header rule is not necessary),
  just in case it might be useful for some purpose.
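The __TRUNCATED header rule in those notes is a Perl regular expression. The same test can be tried in Python before putting it into a .cf file; a sketch, assuming the header value has the "m, TRUNCATED to n" shape described above (the numbers in the test are made up).

```python
import re

# Python equivalent of the SpamAssassin rule
#   header __TRUNCATED X-Amavis-MessageSize =~ m{\A[^\n]*TRUNCATED}m
# \A anchors at the start of the header value and [^\n]* keeps the match
# on the first line of that value.
TRUNCATED_RE = re.compile(r"\A[^\n]*TRUNCATED", re.MULTILINE)


def is_truncated(header_value: str) -> bool:
    return TRUNCATED_RE.search(header_value) is not None
```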


For other glue, I recommend taking it up with the author to support
truncating properly. (Hmm, I don't think spamc has been enhanced yet..)

Of course we hope that someday SA will have true support for ignoring
useless attachment data.



Re: Limit SA to scan messages 100k and below

2010-03-31 Thread Martin Gregorie
On Wed, 2010-03-31 at 15:06 +0200, Mikael Syska wrote:
  I'm trying to understand why it is taking 300.0 seconds to scan a message
  only 24 KB in size.
 
Use the sysstat tool-set to find out what's going on in your system and
fix that.

I agree with those who say that -m 25 is too large a value. If that's
the problem then you don't need to use the sysstat programs to see it -
just run 'top' and you'll see the swap space used value changing and
that kswapd is busy. Try simply deleting the -m option, which uses the
default of 5 children, and see how SA performance changes. 

To provide some guide numbers I looked at my two SA setups, which both
use the default number of children:

- My SA rule development rig runs on a 1.5GHz CoreDuo laptop with 1GB
  RAM. It can scan my two biggest spam test messages (412KB and 360KB) 
  in 21 seconds: however scan time depends on the message content: the
  412KB message only takes 1 second to scan by itself while the 360KB one 
  takes the other 20 seconds. This set-up uses SA 3.3.0.

- My main server is a lot smaller: an 866MHz P3 with 512 MB RAM. It runs
  SA 3.2.5. Here are the numbers from its set of maillogs:

  Messages scanned: 2758
  Message size: min 2072  avg 7223  max 417840   bytes
  Scan times:   min 0.7   avg 2.247 max 21.1 seconds

  I'm using the default SA child process populations. This machine is
  also running getmail, Postfix, Dovecot, named, ntpd, Samba, Apache and
  PostgreSQL and is used for Java development as well.
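Figures like Martin's can be produced from your own maillog. A sketch, assuming you have already extracted the size= and scantime= fields from spamd's result lines (the field names are the ones visible in the log lines quoted in this thread); it reports count, min, average and max in the same layout.

```python
from statistics import mean


def summarize(samples):
    """samples: list of (size_bytes, scantime_seconds) pairs."""
    if not samples:
        raise ValueError("no samples")
    sizes = [size for size, _ in samples]
    times = [t for _, t in samples]
    return {
        "count": len(samples),
        "size": (min(sizes), mean(sizes), max(sizes)),
        "scantime": (min(times), mean(times), max(times)),
    }


def format_summary(s):
    smin, savg, smax = s["size"]
    tmin, tavg, tmax = s["scantime"]
    return (f"Messages scanned: {s['count']}\n"
            f"Message size: min {smin}  avg {savg:.0f}  max {smax}   bytes\n"
            f"Scan times:   min {tmin}   avg {tavg:.3f} max {tmax} seconds")
```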


Martin





Re: Scanning large-body spam

2010-03-31 Thread Charles Gregory

On Wed, 31 Mar 2010, Henrik K wrote:

SA 3.3 has special handling for truncated messages


Excuse me for not *thinking* earlier, but it occurs to me that there is a 
very big drawback to *truncating* a message before passing it to SA, as 
opposed to my original request/suggestion to *flag* (or set a config 
param?) to tell SA to *ignore* parts of a message past a certain size.


I believe it is fairly common practice for MTA's to expect SA to return 
the *entire* message, complete with X-Spam header 'markup', from SA's 
standard output stream. This is particularly important where mail 
classified as *slightly* spammy is delivered to a special spam folder 
based upon the headers added by SA. Or on a system where all mail tagged 
as spam is quarantined. Having SA's markup/explanations is critical to 
analysing false positives/negatives.


So SA needs to read and write the *entire* message, but then be given a 
parameter to keep it from thrashing over the really large ones.
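The behaviour proposed here can be illustrated outside SpamAssassin. A conceptual sketch only, with hypothetical names: scan stands in for whatever produces the X-Spam headers, the scanner sees at most limit bytes, and the caller gets the complete original message back with those headers added. Real glue would also have to respect MIME structure and the DKIM caveat from the amavisd notes above.

```python
from typing import Callable, List, Tuple

Header = Tuple[str, str]


def scan_prefix_keep_whole(message: bytes, limit: int,
                           scan: Callable[[bytes], List[Header]]) -> bytes:
    # Only the first `limit` bytes are given to the (hypothetical) scanner...
    headers = scan(message[:limit])
    # ...but the returned message is the full original with the markup prepended.
    markup = b"".join(f"{name}: {value}\r\n".encode("ascii") for name, value in headers)
    return markup + message
```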


- Charles


SPAM from a legit Yahoo/Gmail account

2010-03-31 Thread Kaleb Hosie
I'm wondering if anyone else has an issue with SPAM that comes from a real 
yahoo or gmail account?

I've noticed a few emails get let into our organization every day that are sent 
from a free email account such as Yahoo and Gmail. When I do an rDNS lookup of 
the IP, it points back to a real server (not a spam server).

Here's an example of one that just got let in:
Mar 31 12:05:34 mailgate2 spamd[14709]: spamd: processing message 
39701.814...@web36505.mail.mud.yahoo.com for apache:48
Mar 31 12:05:38 mailgate2 spamd[14709]: spamd: clean message (-0.1/4.4) for 
apache:48 in 3.8 seconds, 22865 bytes.
Mar 31 12:05:38 mailgate2 spamd[14709]: spamd: result: . 0 - 
DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,T_RP_MATCHES_RCVD

The subject of this email was: Launch of www.girlsandwomen.com  G(irls) 20 
Summit Website

Does anyone have any recommendations on how to fix that? Thanks!

Kaleb


Is report_safe broken?

2010-03-31 Thread Michael Weber
Greetings!

I upgraded SA from version 3.2.5 to 3.3.1 this morning.

Since that time all of the emails that are marked as spam are being converted 
to attachments.

One other oddity.  If you look closely at the rewrite_header Subject line, you 
will count three %'s after the word SPAM.  This is a change I made to test if 
SA was reading this config file at all.  It is.  My email subject lines went 
from %%SPAM%% to %%SPAM%%% just as expected, but the required score stayed at 
5.0 and didn't change to 50.0.

I re-checked the report_safe setting in the local.cf file and it is still set 
to zero as it was before.  I also checked for another cf file changing that 
parameter but there are no other cf files that mention it.  (Or .pre files for 
that matter.)

The user_prefs file in the home directory has nothing that is not commented out 
and has not been changed from the default.

I am calling spamc from a postfix filter line in master.cf, but that hasn't 
been changed since before the upgrade.

As always, what am I missing?

TIA!

Here is my /etc/mail/spamassassin/local.cf file:
===

rewrite_header Subject   %%SPAM%%% (_SCORE_)

add_header all Level _STARS(X)_

# required_score 5.0
required_score 50.0

report_safe 0

ok_locales  en
# ok_languages  en


#   Use Bayesian classifier (default: 1)
#
use_bayes 1

#   Bayesian classifier auto-learning (default: 1)
#
bayes_auto_learn 0
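When a setting seems to be ignored, it helps to list every place it is set. A sketch that searches SpamAssassin configuration files for given option names; the directory /etc/mail/spamassassin is the one mentioned above, while the file patterns (*.cf, *.pre, plus any extra files such as a user_prefs path you pass in) are assumptions to adjust for your installation. It only reports matches and makes no claim about which definition takes effect.

```python
from pathlib import Path


def find_settings(options, dirs=("/etc/mail/spamassassin",), extra_files=()):
    # Return (file, line_number, line) for every non-comment line that sets
    # one of the given options, e.g. "report_safe" or "required_score".
    wanted = set(options)
    files = []
    for d in dirs:
        p = Path(d)
        if p.is_dir():
            files += sorted(p.glob("*.cf")) + sorted(p.glob("*.pre"))
    files += [Path(f) for f in extra_files if Path(f).is_file()]
    hits = []
    for f in files:
        for n, line in enumerate(f.read_text(errors="replace").splitlines(), 1):
            stripped = line.strip()
            if not stripped or stripped.startswith("#"):
                continue
            if stripped.split()[0] in wanted:
                hits.append((str(f), n, stripped))
    return hits
```

For example, find_settings(["report_safe", "required_score"], extra_files=["/home/user/.spamassassin/user_prefs"]) lists every line that touches either setting; the user_prefs path there is only illustrative.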







Michael Weber
Network Administrator
 
Allied National, Inc.
4551 W . 107th St.
Suite 100
Overland Park, KS 66207

913-945-4313 is my direct number





Re: SPAM from a legit Yahoo/Gmail account

2010-03-31 Thread Kevin Parris
One likely scenario may be that the spammer managed to hack into an existing 
account, then used it to send out their garbage.  One way to fix that is to 
ensure all humans with computer access always employ best practices for 
choosing and protecting secure passwords.

Another possible scenario is the spammer created their own account just so 
their spam would look more legitimate.  This is another human behavior issue 
for which (like the one above) there is unlikely ever to be an acceptable 
technological solution.

You're never going to stop ALL the spam, and for situations that represent, as 
you said, only a few, the effort to catch them is often more trouble than it's 
worth - or the problem may just go away (the freemail host notices and closes 
the account) by the time you start trying to think of a solution.

 Kaleb Hosie kho...@spectraaluminum.com 03/31/10 12:18 PM 
I'm wondering if anyone else has an issue with SPAM that comes from a real 
yahoo or gmail account?

I've noticed a few emails get let into our organization everyday that is sent 
from a free email account such as yahoo and gmail. When I do a rDNS lookup, of 
the IP, it points back to a real server (not a spam server).

Here's an example of one that just got let in:
Mar 31 12:05:34 mailgate2 spamd[14709]: spamd: processing message 
39701.814...@web36505.mail.mud.yahoo.com for apache:48
Mar 31 12:05:38 mailgate2 spamd[14709]: spamd: clean message (-0.1/4.4) for 
apache:48 in 3.8 seconds, 22865 bytes.
Mar 31 12:05:38 mailgate2 spamd[14709]: spamd: result: . 0 - 
DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,T_RP_MATCHES_RCVD

The subject of this email was: Launch of www.girlsandwomen.com  G(irls) 20 
Summit Website

Does anyone have any recommendations on how to fix that? Thanks!

Kaleb



Spamhaus Uncovers Fake DNSBL: nszones.com

2010-03-31 Thread Neil Schwartzman
Spamhaus has uncovered a fake spam filter company which was pirating and
selling DNSBL data stolen from major anti-spam systems including Spamhaus,
CBL and SURBL, republishing the stolen data under the name nszones.com.

more: http://www.spamhaus.org/organization/statement.lasso?ref=8
--
Neil Schwartzman
Senior Director
Security Strategy, Receiver Services
Return Path Inc.
[303] 999-3217
Tweets: ReturnPathHelp



Re: Scanning large-body spam

2010-03-31 Thread Alex
Hi,

 Does anyone have any suggestions for optimizing the process for spam
 containing just a large image that would therefore bypass the typical
 scanning? Should I be scanning messages that large, then?

 Depends on your available CPU resources.  If you always have a low
 load average, you can scan larger messages.  My production deployment
 is such a workhorse that I've got it set to 1.1MB.

Will messages this large get the benefit of Bayes? What would be the
impact on the corresponding sa-learn run for a message of that size?
Perhaps learn only the header and the body parts that aren't
attachments, somehow?

Thanks,
Alex
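Alex's last idea, learning only the header and the non-attachment body parts, could be prototyped outside SpamAssassin. The following is a minimal sketch using Python's standard `email` package; the function name `learnable_text` is hypothetical, and nothing here is a SpamAssassin or sa-learn option. It keeps every header field except the MIME structural ones, drops parts marked as attachments, and joins the remaining text parts into a flat message that could be saved to a file and passed to sa-learn.

```python
from email import message_from_bytes, policy

def learnable_text(raw: bytes) -> str:
    """Hypothetical helper: keep the header fields and inline text parts of a
    message, dropping attachments, so a smaller copy can be fed to sa-learn."""
    msg = message_from_bytes(raw, policy=policy.default)

    # Keep header fields, but skip the MIME ones: the output is a flat text
    # message, so the original Content-* and MIME-Version no longer apply.
    headers = [
        f"{name}: {value}"
        for name, value in msg.items()
        if not name.lower().startswith("content-") and name.lower() != "mime-version"
    ]

    bodies = []
    for part in msg.walk():
        # Skip multipart containers and anything marked as an attachment.
        if part.is_multipart() or part.is_attachment():
            continue
        if part.get_content_maintype() == "text":
            bodies.append(part.get_content())  # decoded text, as a str

    return "\n".join(headers) + "\n\n" + "\n".join(bodies)
```

Whether Bayes actually gains anything from this over learning the full message is exactly the open question in the post; the sketch only shows how the trimming could be done.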


Re: SPAM from legit a Yahoo/Gmail account

2010-03-31 Thread Alex
Hi,

 I've noticed a few emails get let into our organization everyday that is sent 
 from a free email account such as yahoo and gmail. When I do a rDNS lookup, 
 of the IP, it points back to a real server (not a spam server).

 Here's an example of one that just got let in:
 Mar 31 12:05:34 mailgate2 spamd[14709]: spamd: processing message 
 39701.814...@web36505.mail.mud.yahoo.com for apache:48

That's a Yahoo message ID, but did it in fact come from Yahoo?

 Mar 31 12:05:38 mailgate2 spamd[14709]: spamd: result: . 0 - 
 DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,T_RP_MATCHES_RCVD

Where did you get that T_RP_MATCHES_RCVD rule and what does it do? Is
it something you wrote to match on yahoo.com sender?

I've put together a few rules that match on freemail domains with
particular content (typically a URI) in the body, for cases just
like this. If you're really having trouble, paste the message to
pastebin.com and post the link to the list here, so we can
help further.

Best,
Alex
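Alex doesn't show his rules, but the logic he describes (freemail sender AND a URI in the body) can be sketched in Python as below. Within SpamAssassin this would more naturally be a meta rule combining FREEMAIL_FROM with a body or uri rule; the domain list and the URI pattern here are hypothetical simplifications for illustration only.

```python
import re
from email.utils import parseaddr

# Hypothetical sample list; SpamAssassin's FreeMail plugin (which sets
# FREEMAIL_FROM) uses its own, much longer list.
FREEMAIL_DOMAINS = {"yahoo.com", "gmail.com", "hotmail.com"}

# Rough URI pattern: http(s):// links or bare www. hostnames.
URI_RE = re.compile(r"\b(?:https?://|www\.)[^\s<>\"]+", re.IGNORECASE)

def sender_domain(from_header: str) -> str:
    """Return the lower-cased domain of the address in a From: header."""
    _, addr = parseaddr(from_header)
    return addr.rpartition("@")[2].lower()

def freemail_with_uri(from_header: str, body: str) -> bool:
    """True when the sender uses a freemail domain AND the body has a URI."""
    return sender_domain(from_header) in FREEMAIL_DOMAINS and URI_RE.search(body) is not None
```

Requiring both conditions is the point: freemail senders alone are mostly legitimate, as are most URIs, so only the combination is worth a few points.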


Re: Scanning large-body spam

2010-03-31 Thread Mark Martinec
On Wednesday March 31 2010 18:05:52 Charles Gregory wrote:
 Excuse me for not *thinking* earlier, but it occurs to me that there is a
 very big drawback to *truncating* a message before passing it to SA, as
 opposed to my original request/suggestion to *flag* (or set a config
 param?) to tell SA to *ignore* parts of a message past a certain size.

 I believe it is fairly common practice for MTAs to expect SA to return
 the *entire* message, complete with X-Spam header 'markup', from SA's
 standard output stream. This is particularly important where mail
 classified as *slightly* spammy is delivered to a special spam folder
 based upon the headers added by SA. Or on a system where all mail tagged
 as spam is quarantined. Having SA's markup/explanations is critical to
 analysing false positives/negatives.
 
 So SA needs to read and write the *entire* message, but then be given a
 parameter to keep it from thrashing over the really large ones.

There are some drawbacks to depriving SpamAssassin of the full message
and letting it work on a truncated message, appropriately marked as one.
But the message header alone often carries half of the value that goes
into the score, and adding the first 400 kB of the body already covers plenty
of information about a message. It would of course be better to let SA
have access to full or summarized information about the rest of the message
(like its attachments) too, but doing without is not too bad. Comparing
the quality of a score on a partial message to having no score
at all (and passing every big message as clean) makes the decision trivial
(it just needs to be done).

 I believe it is fairly common practice for MTAs to expect SA to return
 the *entire* message, complete with X-Spam header 'markup', from SA's
 standard output stream.

Sure, but this is an implementation detail. There is no underlying reason
that spamc could not keep the original message and only feed part of it
to spamd, then merge the results back and do the final message editing
(like inserting/editing header fields) by itself. Or to modify spamd and
let it handle arbitrary size messages by avoiding its current paradigm
of keeping the entire message in memory.

Anyway, the amavisd glue to SpamAssassin does just that: it lets SpamAssassin
see only the first 400 kB (configurable) of a large message, then edits
the original message based on results obtained from SpamAssassin. This
offers the best of both worlds: it handles arbitrary size messages and avoids
SpamAssassin slurping it all into memory. The tricky details are in editing
the message and ensuring that DKIM and DK signatures survive (which is
done by using an out-of-band channel between a caller and SA with its
plugins, as provided by SA 3.3).
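The truncate-then-merge approach Mark describes can be sketched as below. This is not amavisd's code: `scan` is a hypothetical stand-in for whatever talks to spamd, its return format (a list of header name/value pairs) is an assumption, and the DKIM/DK out-of-band details Mark mentions are ignored.

```python
from typing import Callable, List, Tuple

HeaderList = List[Tuple[str, str]]

def scan_with_limit(raw: bytes, scan: Callable[[bytes], HeaderList],
                    limit: int = 400 * 1024) -> bytes:
    """Let `scan` see at most `limit` bytes, then add the header fields it
    returns to the ORIGINAL, untruncated message."""
    partial = raw[:limit]   # the scanner never sees more than this
    added = scan(partial)   # e.g. [("X-Spam-Status", "No, score=0.1")]

    # Reuse the message's own line ending so the result is consistent.
    first_eol = raw.find(b"\n")
    eol = b"\r\n" if first_eol > 0 and raw[first_eol - 1:first_eol] == b"\r" else b"\n"

    # Prepend the new fields; the original header and body are left
    # byte-for-byte unchanged.
    new_fields = b"".join(f"{name}: {value}".encode("ascii") + eol
                          for name, value in added)
    return new_fields + raw
```

Prepending header fields is the simplest form of "editing the original"; rewriting the Subject, as rewrite_header does, would need more careful header handling than this sketch attempts.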

  Mark


Re: Scanning large-body spam

2010-03-31 Thread Charles Gregory

On Wed, 31 Mar 2010, Mark Martinec wrote:
 and let it handle arbitrary size messages by avoiding its current
 paradigm of keeping the entire message in memory.

Is there really a problem with the in-memory size? I would have thought 
the major concern was the processing time for evaluating 'full' (and 
rawbody?) rules on a large message.

 Anyway, the amavisd glue to SpamAssassin does just that: let SpamAssassin
 see only the first 400 kB (configurable) of a large message, then edit
 the original message based on results obtained from SpamAssassin.

Good for amavisd, but not for those of us relying on SA to do the whole 
job, without having our MTAs perform any further message modification.

I would be interested in having some of the developers offer an opinion on 
this. Where is the real 'cost' in running SA against a large message? Is 
it just the memory used? Or is it, as I suspect, the use of 'full' rules?

- Charles


Re: Scanning large-body spam

2010-03-31 Thread Mark Martinec
On Wednesday March 31 2010 23:43:25 Charles Gregory wrote:
 Is there really a problem with the in-memory size? I would have thought
 the major concern was the processing time for evaluating 'full' (and
 rawbody?) rules on a large message

Yes, sure, the main issue is with evaluating regexp rules over
a large message. Nevertheless, even now the cost of keeping 50 copies of
child processes with a 100 MB memory footprint is not to be underestimated.
Add to that several copies (raw, decoded, array of lines, ...)
of a large message in Perl's data structures, and it can be a big deal.
And bear in mind that once a process running Perl extends its
virtual memory, it cannot shrink back, so it stays huge forever
after processing one large message.


  Mark
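Mark's memory point can be turned into a back-of-envelope calculation. The 50 children and 100 MB per child are his figures; the 20 MB message and the count of 4 in-memory copies are hypothetical values chosen only to illustrate the worst case, where every child has at some point processed one large message and kept its peak size.

```python
def pool_memory_mb(children: int, base_mb: float,
                   message_mb: float = 0.0, copies: int = 0) -> float:
    """Rough upper bound on a spamd child pool's memory use.

    Because a Perl process keeps its peak size, assume every child has at
    some point held `copies` in-memory copies of a `message_mb` message
    on top of its `base_mb` baseline."""
    return children * (base_mb + message_mb * copies)

# Mark's figures: 50 children at about 100 MB each.
baseline = pool_memory_mb(50, 100)               # 50 * 100 = 5000 MB
# Hypothetical: each child has once held 4 copies of a 20 MB message.
after_big_mail = pool_memory_mb(50, 100, 20, 4)  # 50 * (100 + 80) = 9000 MB
```

Even with modest assumptions, a single large message per child nearly doubles the pool's footprint, which is why the size limit matters beyond regexp time.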


Re: Is report_safe broken?

2010-03-31 Thread Matt Kettler
On 3/31/2010 12:34 PM, Michael Weber wrote:
 Greetings!

 I upgraded SA from version 3.2.5 to 3.3.1 this morning.

 Since that time all of the emails that are marked as spam are being converted 
 to attachments.

 One other oddity.  If you look closely at the rewrite_header Subject line, you 
 will count three %'s after the word SPAM.  This is a change I made to test if 
 SA was reading this config file at all.  It is.  My email subject lines went 
 from %%SPAM%% to %%SPAM%%% just as expected, but the required score stayed at 
 5.0 and didn't change to 50.0.

 I re-checked the report_safe setting in the local.cf file and it is still set 
 to zero as it was before.  I also checked for another cf file changing that 
 parameter but there are no other cf files that mention it.  (Or .pre files 
 for that matter.)

 The user_prefs file in the home directory has nothing that is not commented 
 out and has not been changed from the default.

 I am calling spamc from a postfix filter line in master.cf, but that hasn't 
 been changed since before the upgrade.

 As always, what am I missing?
   

That should work fine.

Did you run spamassassin --lint to see if SA can parse the
configuration files? (It should run with no output, but if there's a
parse error, it will complain.) SA could be tripping on an illegal
character and skipping several lines of your config file...

Given that it's picking up the rewrite_header option, it's obvious you've got the
right local.cf and have restarted spamd, etc., so something else is amiss.
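To hunt for the kind of stray character Matt suspects, a quick byte-level scan of the config file can help locate candidates. This is a hypothetical helper, not part of SpamAssassin, and it does not reproduce SA's parser: it simply flags the first control byte or non-ASCII byte on each line (which may be harmless, but is worth a look). `spamassassin --lint` remains the real check.

```python
from typing import Iterator, Tuple

def suspicious_lines(data: bytes) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, description) for the first byte on each line that
    is a control character or outside 7-bit ASCII. Line endings (LF or CRLF)
    are not reported; tabs are allowed."""
    for lineno, line in enumerate(data.splitlines(), start=1):
        for col, byte in enumerate(line, start=1):
            if byte == 0x09:                  # tab: ordinary whitespace
                continue
            if byte < 0x20 or byte == 0x7F:
                yield lineno, f"control byte 0x{byte:02x} at column {col}"
                break
            if byte > 0x7F:
                yield lineno, f"non-ASCII byte 0x{byte:02x} at column {col}"
                break
```

For example, `list(suspicious_lines(pathlib.Path("/etc/mail/spamassassin/local.cf").read_bytes()))` would list line numbers to inspect; a non-breaking space pasted in from a web page is one plausible culprit it would catch.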




 TIA!

 Here is my /etc/mail/spamassassin/local.cf file:
 ===

 rewrite_header Subject   %%SPAM%%% (_SCORE_)

 add_header all Level _STARS(X)_

 # required_score 5.0
 required_score 50.0

 report_safe 0

 ok_locales  en
 # ok_languages  en


 #   Use Bayesian classifier (default: 1)
 #
 use_bayes 1

 #   Bayesian classifier auto-learning (default: 1)
 #
 bayes_auto_learn 0
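When a setting seems to be ignored, it can help to list what the file itself says, so it can be compared against what SA reports (for example the required= value in the X-Spam-Status header). A minimal sketch, assuming whole-line `#` comments only; the function name is hypothetical and this is not how SpamAssassin parses its configuration.

```python
from typing import Dict

def last_settings(cf_text: str) -> Dict[str, str]:
    """Map each directive to the value on the LAST uncommented line using it.

    Simplification: SpamAssassin lets some directives (add_header,
    rewrite_header, ...) appear several times with different arguments;
    here a later line simply overwrites an earlier one."""
    settings: Dict[str, str] = {}
    for raw_line in cf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue                      # blank line or whole-line comment
        parts = line.split(None, 1)       # directive, then the rest of the line
        settings[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return settings
```

Run on the local.cf above, this reports required_score as 50.0 and report_safe as 0, so if SA still behaves as if 5.0 were in effect, the problem lies in how SA read the file rather than in the file's intended content.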





   



Re: Confused about how to use sa-update

2010-03-31 Thread Phill Edwards
 But I don't understand how to use sa-update. I've run it and I can see
 all the new rule files in /var/lib/spamassassin/3.002005. However, I
 think my rules run off the files in /usr/share/spamassassin/. The wiki
 at http://wiki.apache.org/spamassassin/RuleUpdates#Using_sa-update
 says NOT to use the --updatedir parameter to put updates in
 /usr/share/spamassassin. So how exactly do you get the new rule files
 into /usr/share/spamassassin so they start working? Do you just copy
 them across manually, or is there a way of getting sa-update to do it
 automatically?

OK, I've found the answer to my own question at
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6269#c41: the
rules in /usr/share/spamassassin are not consulted if a directory
/var/lib/spamassassin/3.x exists, and sa-update only updates the
latter.
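The lookup order described in that bug comment can be sketched as follows. This mimics the stated behavior, not SpamAssassin's actual code. The directory naming (3.2.5 becoming 3.002005) is inferred from the /var/lib/spamassassin/3.002005 path seen earlier in this thread, the function names are hypothetical, and the default paths are the ones from this thread; other distributions may use different locations.

```python
import os

def update_dir_name(version: str) -> str:
    """Convert a plain 'X.Y.Z' version to the X.YYYZZZ form seen in
    /var/lib/spamassassin, e.g. '3.2.5' -> '3.002005'."""
    major, minor, patch = (int(x) for x in version.split("."))
    return f"{major}.{minor:03d}{patch:03d}"

def active_rules_dir(version: str,
                     update_root: str = "/var/lib/spamassassin",
                     default_dir: str = "/usr/share/spamassassin") -> str:
    """If the sa-update directory for this version exists, it is used
    INSTEAD of the packaged rules; otherwise fall back to default_dir."""
    candidate = os.path.join(update_root, update_dir_name(version))
    return candidate if os.path.isdir(candidate) else default_dir
```

So after a successful sa-update there is nothing to copy by hand: the updated directory simply takes precedence over /usr/share/spamassassin.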

One other question, which is more about this mailing list: is
the list actually still active? It seems to have extremely low traffic
for a product like SpamAssassin, and I've also found it can be quite
difficult to get replies to questions. It makes me wonder whether I'm
actually posting to the right place! Is this the official SpamAssassin
mailing list?


Re: Confused about how to use sa-update

2010-03-31 Thread Matt Kettler
On 3/31/2010 9:10 PM, Phill Edwards wrote:
 But I don't understand how to use sa-update. I've run it and I can see
 all the new rule files in /var/lib/spamassassin/3.002005. However, I
 think my rules run off the files in /usr/share/spamassassin/. The wiki
 at http://wiki.apache.org/spamassassin/RuleUpdates#Using_sa-update
 says NOT to use the --updatedir parameter to put updates in
 /usr/share/spamassassin. So how exactly do you get the new rule files
 into /usr/share/spamassassin so they start working? Do you just copy
 them across manually, or is there a way of getting sa-update to do it
 automatically?
 
 OK - I've found an answer to my own question at
 https://issues.apache.org/SpamAssassin/show_bug.cgi?id=6269#c41 - The
 rules in /usr/share/spamassassin are not consulted if a directory
 /var/lib/spamassassin/3.x exists. The sa-update only updates the
 latter.

 One other question I have which is more about this mailing list - is
 the list actually still active? It seems to have extremely low traffic
 for a product like spamassassin. And I've also found it can be quite
 difficult to get replies to questions. It makes me wonder whether I'm
 actually posting to the right place! Is this the official spamassassin
 mailing list?

   
The list is definitely active. Now, is it 100 messages a minute? No,
but your original post did get two replies providing the answer, both
slightly over 2 hours after your question.




Re: Confused about how to use sa-update

2010-03-31 Thread Phill Edwards
 The list is definitely active. Now, is it 100 messages a minute? No..
 but your original post did get two replies providing the answer, both
 slightly over 2 hours after your question.

Yeah, I've subsequently found them on Nabble. For some reason
I'm not getting any email from this list into my Gmail mailbox. Other
mailing lists are coming through just fine, but not SpamAssassin. I
wondered if Gmail had been flagging them as spam, in some sort of delicious
twist of irony, but no. I have no idea why they're not showing up, but
I'm glad to hear that the list is so active!