RE: Spamd dies after tcp timeout

2005-09-28 Thread Sander Holthaus - Orange XL
I did some more digging, and I also find the following entry several times
in the maillog:
 
prefork: select returned undef! recovering
 
Spamd is currently dying twice a day :-| Never had any problem with any
other version in this regard...
 
Kind Regards,
Sander Holthaus - Orange XL




From: Sander Holthaus - Orange XL [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, September 27, 2005 9:21 PM
To: users@spamassassin.apache.org
Subject: Spamd dies after tcp timeout


For a few days now, I have been having more serious problems with
SpamAssassin 3.1.0. It just dies after the following two messages in the
error log:
 
Sep 27 15:16:32 OrangeXL4 spamd[63730]: prefork: child states: II
Sep 27 15:18:12 OrangeXL4 spamd[63730]: tcp timeout at
/usr/local/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/SpamdForkScaling.pm
line 195.
Sep 27 15:18:12 OrangeXL4 spamd[63730]: tcp timeout at
/usr/local/lib/perl5/site_perl/5.8.5/Mail/SpamAssassin/SpamdForkScaling.pm
line 195.
 
Besides that, I get quite a few Pyzor/alarm errors (for about 5% of
all mail).

I never had any such problems with SpamAssassin 3.0.x or 2.xx.
 
I'm using SpamAssassin on FreeBSD 4.10 with Perl 5.8.5 installed. 
 
Kind Regards,
Sander Holthaus




Empty/trusted href's in spam

2005-09-01 Thread Sander Holthaus - Orange XL



Today I noticed a spam mail in my backlog which had about 10 empty hrefs
scattered across it. Is SpamAssassin protected against this sort of abuse? (I
take it those hrefs are meant either to fool SpamAssassin, to trigger
unnecessary DNS/blacklist lookups, or to DoS installations that have ClamAV
installed with the follow-URLs option.)

Kind Regards,
Sander Holthaus


RE: phish/bayes

2005-08-28 Thread Sander Holthaus - Orange XL



I wouldn't count too much on ClamAV to protect you from phishing. I supplied
them with various phishing samples, but only a select few have been added to
the database. Besides that, I wonder how well suited ClamAV is for this job.

But couldn't some 'simple' rules fix this? One meta-rule that looks for valid
links (images, hrefs, email addresses) to eBay, Amazon, banks, etc., and
another meta-rule that looks for links that point to non-eBay, non-Amazon,
non-bank sites. A phisher will always need to point the user to a site that
is under his control, and it shouldn't be too hard to recognize this site.
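
A rough sketch of that two-rule idea, written in Python rather than as actual SpamAssassin rules. The brand list and all function names are made up for illustration and are not part of SpamAssassin:

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical list of domains that phishers commonly imitate.
BRAND_DOMAINS = {"ebay.com", "amazon.com", "paypal.com"}


class LinkCollector(HTMLParser):
    """Collect the host of every href/src attribute in an HTML body."""

    def __init__(self):
        super().__init__()
        self.hosts = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                host = urlparse(value).hostname
                if host:
                    self.hosts.append(host.lower())


def is_brand_host(host):
    """True if host is a brand domain or one of its subdomains."""
    return any(host == d or host.endswith("." + d) for d in BRAND_DOMAINS)


def looks_like_phish(html):
    """Meta-rule: links to a real brand site AND links to somewhere else."""
    collector = LinkCollector()
    collector.feed(html)
    has_brand = any(is_brand_host(h) for h in collector.hosts)
    has_other = any(not is_brand_host(h) for h in collector.hosts)
    return has_brand and has_other
```

Legitimate mail that mixes brand links with third-party links would also match, so a check like this could only ever contribute to a score; it could not decide on its own.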
 
Kind Regards,
Sander Holthaus


  
  
  From: Greg Allen [mailto:[EMAIL PROTECTED]
  Sent: Sunday, August 28, 2005 12:19 PM
  To: satalk; users@spamassassin.apache.org
  Subject: RE: phish/bayes

  I wouldn't worry about it. You can whitelist the real ebay servers with SA.

  Also, if you want to catch more of the phish messages you can install the
  Clamav plugin for SA, it does very good at finding phishies. You have to
  also install Clamav, but it is a fairly simple thing to install.

  On a side note, Ebay is not too smart IMO. Their real emails sometimes look
  a lot like phish, which must confuse the heck out of their customers. I am
  sure the bad guys like it though.

    -Original Message-
    From: satalk (sent by Nabble.com) [mailto:[EMAIL PROTECTED]
    Sent: Thursday, August 25, 2005 6:49 PM
    To: users@spamassassin.apache.org
    Subject: phish/bayes

    I could not find any email in this forum addressing this issue - it does
    not mean there is not one - I just could'nt find it :)

    MY question is as follows: Given that so many valid tokens from
    ebay/paypal sites exist in phish emails, am I correct in saying that it is
    imperative to avoid phish emails entering the bayes database?

    Anthony

    Sent from the SpamAssassin - Users forum at Nabble.com.


RE: spurious __alarm__ messages in spamd log

2005-08-20 Thread Sander Holthaus - Orange XL
Steve Martin wrote:
> Sat Aug 20 00:28:36 2005 [16014] info: spamd: processing
>   message <[EMAIL PROTECTED]> for filter:88
> Sat Aug 20 00:28:42 2005 [16014] error: __alarm__
> Sat Aug 20 00:28:42 2005 [16014] error: __alarm__
> Sat Aug 20 00:28:49 2005 [16014] info: spamd: identified spam
>   (35.1/5.0) for filter:88 in 13.7 seconds, 2360 bytes.
> 
> Anyone know how one might track these down (what debug areas
> to start with)?  I've seen 2 in the last 30 hours or so.
> 
> Rerunning the same message through with spamc gives the same
> score, but no __alarm__ messages.
> 
> SA 3.1rc1

They are quite easy to find in the code. Looking at them, I can only say
that I don't understand how they get used in the first place. The code seems
to use the return value of alarm, but I don't have a clue why. alarm returns
the remaining time of the previous timer, or 0. But from what I can see of
how it is used, you only want to set the timer back to zero.

Next to that, if an eval which has an alarm fails, the alarm isn't
immediately reset; some other code is called first
( $permsgstatus->leave_helper_run_mode(); ).

Last, I would say the evals are quite wide. Besides the actual code that
calls Pyzor/DCC, they also contain other bits, meaning they can time out
even though the called app ran successfully (it took long, but still less
than the timeout specified by the user).
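
To make the three points concrete, here is a minimal sketch of a narrowly scoped timeout. It is written in Python, whose signal.alarm() wraps the same system call as Perl's alarm and likewise returns the seconds left on the previous timer, or 0. It is not SpamAssassin's code; the names are illustrative, and it assumes a Unix system and the main thread:

```python
import signal


class Timeout(Exception):
    """Raised when the guarded call exceeds its time budget."""


def _on_alarm(signum, frame):
    raise Timeout()


def run_with_timeout(func, seconds):
    """Run func() under an alarm that covers only the call itself."""
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    try:
        # The return value (time left on a previous timer) is not needed here.
        signal.alarm(seconds)
        return func()
    finally:
        # Cancel the timer first, before any other cleanup code runs.
        signal.alarm(0)
        signal.signal(signal.SIGALRM, old_handler)
```

Only the external call sits inside the guarded region, and the alarm is cancelled before anything else happens, whether the call succeeded or failed.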

Kind Regards,
Sander Holthaus 



RE: How to reenable DCC and Razor (was RE: ANNOUNCE: SpamAssassin 3.1.0-rc1 release candidate available!)

2005-08-14 Thread Sander Holthaus - Orange XL
Dan Kohn wrote:
> Justin said:
> 
>> - - Razor: disable Razor2 support by default per our policy, since
>>  the service is not free for non-personal use.  It's trivial to
>> reenable. 

But is that correct? As far as I know, it is free for
non-personal use.

Kind Regards,
Sander Holthaus



RE: DCC vs Razor2

2005-08-10 Thread Sander Holthaus - Orange XL
William Albert wrote:
> Dr Robert Young wrote:
> 
>> We have been using Razor2 for some time on SA 3.0.4. I was recently
>> reading about DCC. We have never tried it, so I was wondering about
>> opinions as to its use. How effective is it? Should it be used with,
>> or in place of, Razor?
> 
> SpamAssassin will use both, so there's no need to choose
> between the two unless network traffic is a major concern.

I would use both. Razor2 is more effective than DCC, but DCC seems to
generate more connection errors, so without those it could perhaps have the
same recognition rate as Razor2.
There is no guarantee that either one won't produce false positives or that
it has 100% uptime. So using both (and perhaps even Pyzor) seems a sensible
thing to me.

Some stats for recognized spam ( >10 ) over the last 10 days:

BAYES_99                ( 97%)
RAZOR2_CHECK            ( 86%) <--
RAZOR2_CF_RANGE_51_100  ( 86%) <--
HTML_MESSAGE            ( 69%)
DIGEST_MULTIPLE         ( 66%)
URIBL_BLACK             ( 66%)
PYZOR_CHECK             ( 57%) <--
URIBL_JP_SURBL          ( 57%)
DCC_CHECK               ( 57%) <--
URIBL_SBL               ( 54%)
URIBL_SC2_SURBL         ( 51%)
URIBL_OB_SURBL          ( 51%)
URIBL_XS_SURBL          ( 47%)
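
For anyone wanting similar numbers, a table like this can be approximated by counting rule names in the X-Spam-Status headers of stored spam. The sketch below is not the script used for the stats above. It assumes the "score=... tests=A,B,C" header layout that SpamAssassin 3.0.x writes, and the function name is made up:

```python
import re
from collections import Counter

STATUS_RE = re.compile(r"score=(-?[\d.]+).*?tests=(\S+)")


def rule_hit_percentages(status_headers, min_score=10.0):
    """Return {rule: percent of messages scoring above min_score that hit it}."""
    hits = Counter()
    total = 0
    for header in status_headers:
        # Join folded continuation lines back into a single line.
        unfolded = re.sub(r"\s*\n\s+", "", header)
        match = STATUS_RE.search(unfolded)
        if not match or float(match.group(1)) <= min_score:
            continue
        total += 1
        hits.update({rule for rule in match.group(2).split(",") if rule})
    if total == 0:
        return {}
    return {rule: round(100 * n / total) for rule, n in hits.most_common()}
```

Feeding it one header string per message gives a percentage per rule, computed only over the messages that scored above the threshold.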

Kind Regards,
Sander Holthaus



RE: [sa-list] Re: spamd children run as root (again)

2005-08-09 Thread Sander Holthaus - Orange XL
I've been running spamc and spamd (3.0.4) on FreeBSD 4.10 with Perl 5.8.5
for quite a while, and using the -u vmail flag doesn't cause any problems.

vmail   15329  0.0  2.9 59052 30300  ??  INsJ  5:55AM   0:03.05
/usr/local/bin/spamd -x -d -m 2 -r /var/run/spamd/spamd.pid -u vmail
--socketpath=/tmp/spamd.sock -H /usr/local/mail/.spamassassin
vmail   15355  0.0  5.9 64984 61072  ??  INJ   5:55AM   1:39.07 spamd child
(perl5.8.5)
vmail   15356  0.0  6.0 67352 63096  ??  INJ   5:55AM   0:24.58 spamd child
(perl5.8.5)

However, it does behave oddly when using sa-learn. Sometimes (but only
sometimes), it will change the owner of one of the bayes_ files or
bayes.mutex to root. :-?

Sander Holthaus

Dan Mahoney, System Admin wrote:
> On Tue, 9 Aug 2005, Craig McLean wrote:
> 
> I applied the patch, and it fixed things on my end.  I noted
> in my PR that it was also odd to me that before, the children
> showed in ps as "perl" and afterwards as "perl5.8.6" or something
> very similar. 
> 
>> FWIW I *don't* see this issue on FBSD 5.2.1 running SA 3.0.4 with
>> perl 
>> 5.6.1
>> 
>> Craig.
>> 
>> Justin Mason wrote:
>>> 
>>> ah, good to hear -- although it would have been nice to
> have had that
>> noted on bug 3900, which was still listed as "awaiting
>> confirmation"... 
>>> 
>>> --j.
>>> 
>>> Charles Sprickman writes:
>>> 
> I've seen this problem as well, even in the latest "ports"
> version. Still runs as root.  If I apply the attached patch
> (obtained from one of the
>> bugzilla entries), it works properly.  Running FBSD 4.11 w/perl 5.6.2
>> (5.8.7 had the same problem, I backed out of 5.8 since it chewed up
>> more
> memory than I was comfortable with).
> Charles
> On Mon, 8 Aug 2005, Dan Mahoney, System Admin wrote:
>> On Tue, 26 Apr 2005, Justin Mason wrote:
>>> It's specifically a problem with perl on *BSD platforms --
>>> there's a
>> bug open about it, but it's stalled because we don't have any
>> developers with BSD machines ;)
>> Anyone want a test machine where this is occurring?  Where it
>> DIDN'T
>> occur
>> before under 3.0.3?  Contact me offlist.
>> I've had a bugzilla report sitting in "NEW" status for over a
>> month
>> now, I
>> think.  I flagged it as "security" because I a) thought maybe
>> there
>> was some
>> priority to that and b) actually believe it to be, but nobody has
>> done
>> 
>> anything with it.
>> http://bugzilla.spamassassin.org/show_bug.cgi?idD98
>> -Dan
>>> at least on some platforms (MacOS X) it appears perl's setuid
>>> support
>> substantially does not work.
>>> --j.
>>> Brandon Kuczenski writes:
>>>> I've seen this question posted a couple times in the mailing list
>>>> archives (from October 2004) but no resolution.  The question again:
>>>> I'm running SpamAssassin 3.0.2 on FreeBSD 4.10 in spamc/spamd format
>>>> with the '-u spamd' flag.  Problem is, all the child processes are
>>>> running as root:
>>>>
>>>> $ ps aux | grep spam
>>>> root  333  0.0 10.1 27636 25932  ??  I    11Apr05   1:03.83 spamd child (perl)
>>>> root  332  0.0 10.5 29020 27032  ??  I    11Apr05   1:07.96 spamd child (perl)
>>>> root  331  0.0  9.7 26544 24852  ??  I    11Apr05   0:52.68 spamd child (perl)
>>>> root  330  0.0  9.9 27152 25524  ??  I    11Apr05   1:04.40 spamd child (perl)
>>>> root  329  0.0  9.8 26864 25116  ??  I    11Apr05   0:58.08 spamd child (perl)
>>>> spamd 294  0.0  7.1 22392 18220  ??  Is   11Apr05   0:01.61 /usr/local/bin/spamd -d -c -u spamd -H /home/spamd -r /var/run/spamd.pid (perl)
>>>> $
>>>>
>>>> Is this intended or is it a bug?  The two threads I've seen that
>>>> pertain to it (both dating from Oct04) are left unresolved:
>>>>
>>>> http://thread.gmane.org/gmane.mail.spam.spamassassin.general/57900
>>>> http://thread.gmane.org/gmane.mail.spam.spamassassin.general/58087
>>>>
>>>> The practical consequence of this (aside from the unorthodoxy --
>>>> undesired processes owned by root) is that the permissions of my
>>>> ~user/.spamassassin/bayes_journal file get changed to root:spamd 0660.
>>>> I wanted them to be spamd:user 0660, so that the user can run sa-learn
>>>> without asking for root's help.  Is that not the 'right way' to do
>>>> things?
>>>>
>>>> Has there been a resolution to this question?  If not, .. doesn't
>>>> everybody have this problem?  Or is it not a problem?  If not, why
>>>> not?
>>>>
>>>> -Brandon
>> --
>> "Don't try to out-wierd me.  I get stranger things than you free
>> with
>> my
>> breakfast cereal."
>> -Button seen at I-CON XVII (and subsequently purchased)
>> Dan Mahoney--

RE: Install Issue

2005-08-01 Thread Sander Holthaus - Orange XL
Daniel Straka wrote:
> [EMAIL PROTECTED] Mail-SpamAssassin-3.0.0]# perl Makefile.PL Perl
> v5.6.1 required--this is only v5.6.0, stopped at Makefile.PL line 2.
> 
> Now what can I do? RH 7.2
> 
> Dan Straka
> Casper College
> (307)268-2399
> 
>  **  Visit Casper College Online at www.caspercollege.edu  **

"v5.6.1 required--this is only v5.6.0" seems pretty obvious. Upgrade RH to a
newer version, 7.3 includes 5.6.1. You can also upgrade Perl to a newer
version (current is 5.8.7), but this may have some side-effects (not
familiar with RH).

I'm pretty sure Google can answer your question much faster and better than
this mailing list, btw ;-)

Kind regards,
Sander Holthaus

PS: Sorry for the short answer, but answering it completely would be
somewhat offtopic and long. More important, you will learn much more by
finding the answer yourself (including learning how to find answers :-) ).



RE: Please test sc2.surbl.org (and xs.surbl.org)

2005-07-29 Thread Sander Holthaus - Orange XL
From the last three days:

SpamAssassinRuleHits for SPAM (score 10 and higher):
BAYES_99                ( 95%)
RAZOR2_CHECK            ( 90%)
RAZOR2_CF_RANGE_51_100  ( 85%)
DIGEST_MULTIPLE         ( 74%)
URIBL_BLACK             ( 72%)
HTML_MESSAGE            ( 71%)
DCC_CHECK               ( 66%)
URIBL_OB_SURBL          ( 60%)
URIBL_JP_SURBL          ( 60%)
URIBL_WS_SURBL          ( 57%)
URIBL_SC2_SURBL         ( 57%)  <--
PYZOR_CHECK             ( 55%)
URIBL_SBL               ( 52%)
URIBL_SC_SURBL          ( 50%)
URIBL_XS_SURBL          ( 44%)  <--
URIBL_AB_SURBL          ( 43%)
MIME_HTML_ONLY          ( 40%)
RCVD_IN_SORBS_DUL       ( 39%)
FORGED_OUTLOOK_TAGS     ( 31%)
RCVD_IN_NJABL_DUL       ( 30%)

Kind Regards,
Sander Holthaus



RE: RBL lookup failures

2005-07-26 Thread Sander Holthaus - Orange XL
Daniel O'Connor wrote:
> Hi,
> I am using Spam Assassin 3.0.4 called from MIMEDefang 2.51 on
> a FreeBSD
> 4.9 box with perl 5.6.2 and I get the following messages in
> my maillog on occasion..
> 
> Jul 26 12:43:36 cain sm-mta[81183]: j6Q3DUp1081183:
>   from=<[EMAIL PROTECTED]>, size=4221,
>   class=-30, nrcpts=1, msgid=<[EMAIL PROTECTED]>,
>   proto=ESMTP, daemon=smtp, relay=mx2.freebsd.org [216.136.204.119]
> Jul 26 12:43:36 cain mimedefang-multiplexor[80550]: Slave 0
>   stderr: Failed to run __RFC_IGNORANT_ENVFROM RBL SpamAssassin
>   test, skipping: (Can't call method "bgsend" on an undefined value at
>   /usr/local/lib/perl5/site_perl/5.6.2/Mail/SpamAssassin/Dns.pm line 112. )
> Jul 26 12:43:36 cain mimedefang-multiplexor[80550]: Slave 0
>   stderr: Failed to run NO_DNS_FOR_FROM RBL SpamAssassin test,
>   skipping: (Can't call method "bgsend" on an undefined value at
>   /usr/local/lib/perl5/site_perl/5.6.2/Mail/SpamAssassin/Dns.pm line 141. )
> Jul 26 12:43:36 cain mimedefang-multiplexor[80550]: Slave 0
>   stderr: Failed to run DNS_FROM_AHBL_RHSBL RBL SpamAssassin
>   test, skipping: (Can't call method "bgsend" on an undefined value at
>   /usr/local/lib/perl5/site_perl/5.6.2/Mail/SpamAssassin/Dns.pm line 112. )
> Jul 26 12:43:36 cain mimedefang.pl[80553]:
>   MDLOG,j6Q3DUp1081183,mail_in,,,<[EMAIL PROTECTED]org>,<[EMAIL PROTECTED]>,6-BETA1
>   iwi + wpa_supplicant fails, and sometimes silently reboots
> 
> ie the slave errors.
> 
> I have..
> skip_rbl_checks 0
> use_razor2 0
> 
> ##
> # Add your own customised scores for some tests below.  The default
> # scores are read from the installed "spamassassin.cf" file, but you can
> # override them here.  To see the list of tests and their default scores,
> # go to http://spamassassin.taint.org/tests.html .
> 
> urirhssub URIBL_JP_SURBL  multi.surbl.org.  A  64
> body      URIBL_JP_SURBL  eval:check_uridnsbl('URIBL_JP_SURBL')
> describe  URIBL_JP_SURBL  Has URI in JP at http://www.surbl.org/lists.html
> tflags    URIBL_JP_SURBL  net
> 
> score     URIBL_JP_SURBL  3.0
> 
> trusted_networks 203.31.81.0/24 203.122.192.0/26
> dns_available yes
> 
> in the MD .cf file.
> 
> Anyone have any clues about how I can resolve this?
> Thanks.

I can't be sure, but I think the problem occurs if something went wrong while
creating the Net::DNS::Resolver object on which the method bgsend gets called.
I'm not sure why that error would occur, but older versions of DNS.pm have
some problems/bugs. Do nslookups/digs from the command prompt work? And a very
simple script that does DNS lookups?
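
As an example of such a very simple script, here is a sketch in Python. It uses the system resolver rather than Net::DNS, so it only tells you whether the box itself can resolve names; the function name and the injectable lookup parameter are illustrative choices:

```python
import socket


def resolve(hostname, lookup=socket.gethostbyname_ex):
    """Return the sorted IPv4 addresses for hostname, or [] if the lookup fails."""
    try:
        _name, _aliases, addresses = lookup(hostname)
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError.
        return []
    return sorted(addresses)
```

Calling resolve("mx2.freebsd.org") from the mail server should return at least one address; an empty list would point to a resolver problem on the machine rather than to SpamAssassin.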

My best bet would be to upgrade Perl to a recent version, such as 5.8.7.
This should be fairly easy on FreeBSD through the ports-system. (Be aware
though that some installed modules may need to be re-upgraded/re-installed,
though I believe they made some changes to the perl-port recently to do this
automatically.)
Also upgrade to a recent version of Net::DNS (0.52+).

Personally, I would use Razor2 too. It has an excellent spam-detection
ratio, much better than rbl's/uribl's.

Kind Regards,
Sander Holthaus



RE: How can I correct this FalsePositive?

2005-07-15 Thread Sander Holthaus - Orange XL
Kai Schaetzl wrote:
> Thomas Booms wrote on Fri, 15 Jul 2005 10:29:35 +0200:
> 
>> Content analysis details:   (2.2 points, 2.0 required)
> 
> Your problem is this setting. You should know by now from
> following the list that this is stupid. So, why do you do that and
> then ask for help? Set your spam threshold correctly and your FP
> problem is gone. 
> 
> Kai

2.0 is indeed low. However, I would also notify the sending party, because
their mail looks needlessly spammish. No plaintext content, no MIME-headers,
a webbug, bad html/css...

Using Bayes might help with your problem, and you can whitelist the sender. But
with a 2.0 point level for spam, you're always going to have some FPs.

Kind Regards,
Sander Holthaus



RE: cleaning whitelist?

2005-07-14 Thread Sander Holthaus - Orange XL
Dr Robert Young wrote:
> I was wondering since our whitelist database is currently >
> 8Mb and when looking at it, a vast majority are only /1 scores...
> 
> 
> 
> On Jul 14, 2005, at 7:31 AM, Kai Schaetzl wrote:
> 
>> Dr Robert Young wrote on Wed, 13 Jul 2005 22:00:04 -0400:
>> 
>>> How often should one "clean" the whitelist ?
>> 
>> Unless there is a problem: never.
>> 
>> Kai
>> 
>> --
>> Kai Schätzl, Berlin, Germany
>> Get your web at Conactive Internet Services: http://www.conactive.com
>> IE-Center: http://ie5.de & http://msie.winware.org

I would suggest it depends on the volume of mail going through your server
and your users' email habits. A lot of one-time AWL entries are spammers, but
certainly not all. Seeing how AWL works, an occasional cleanup cannot do much
harm.
A few (recent) legitimate entries are lost, but the chance that this will
lead to a false positive or false negative is small.

Cleaning the database as often as once a day is a bad idea; even cleaning
once a week may do more harm than good. With once a month, you should
be in the clear, but the best way would be to just keep an eye on it and
clean it when you notice there are too many one-time entries.

Do something like '  | grep "/1)
--" | wc -l' to count the one-time entries.
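
The same count can be sketched in Python. It assumes the whitelist has been dumped as text lines of the form "-2.8 (-2.8/1) -- address|ip=...", where the number after the slash is the message count; the function name is made up:

```python
import re

# Matches the "(total/count)" part of a whitelist dump line, e.g. "(-43.1/19)".
ENTRY_RE = re.compile(r"\((-?[\d.]+)/(\d+)\)\s+--\s")


def count_one_time_entries(lines):
    """Return (one_time, total) entry counts for whitelist dump lines."""
    one_time = total = 0
    for line in lines:
        match = ENTRY_RE.search(line)
        if not match:
            continue
        total += 1
        if int(match.group(2)) == 1:
            one_time += 1
    return one_time, total
```

Comparing the two numbers shows at a glance how much of the database consists of one-time senders.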

Kind Regards,
Sander Holthaus



RE: sa-learn user

2005-07-14 Thread Sander Holthaus - Orange XL
Matt Kettler wrote:
> At 09:29 PM 7/13/2005, Mun Fai wrote:
> 
>> I'd appreciate it if anyone could spot the user.
> 
> spamd is the user your mail gets scanned as.
> 
> All the spamd children and the main spamd process are running as that
> user. The main spamd process has it's command like cut off, but I
> bet if you checked it's got a "-u spamd" in it somewhere.

Also check to make sure that your bayes database is writable by user spamd,
though in most cases the installer does this for you.

You might want to check out the manual pages ("man ") for commands
such as ps, grep, awk, sed, sort, ls, etc. Mastering the basic *nix commands
can make your life a lot easier and save you a lot of time googling or
scouring/posting to mailing lists ;-) If you are serious about spamassassin
and/or system administration, some reference books (O'Reilly) can also be of
great help.

Kind Regards,
Sander Holthaus



RE: How to shut down

2005-07-12 Thread Sander Holthaus - Orange XL



That might be a little overkill, though it does the job ;-)

Stopping running things on *nix platforms is generally done by killing them,
along with their children. "man kill" will teach you how. Programs that start
during boot usually have special scripts to both start and stop them; your
best option is to use those. Where they live depends on your platform and
distribution; use the supplied documentation, the man command and Google to
find out exactly where. If you don't want it to run at all at bootup, disable
the script (there are various ways of doing that).
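
For a daemon that writes a PID file (spamd does so when started with -r), the "kill" step can be sketched as below. This is only an illustration in Python; the function names are made up, and the platform's own stop script remains the better option where one exists:

```python
import os
import signal


def read_pid(pid_file):
    """Return the process ID stored in pid_file as an integer."""
    with open(pid_file) as handle:
        return int(handle.read().strip())


def stop_daemon(pid_file, kill=os.kill):
    """Send SIGTERM to the process named in pid_file; return its PID."""
    pid = read_pid(pid_file)
    kill(pid, signal.SIGTERM)
    return pid
```

SIGTERM asks the process to shut down cleanly, which gives a parent the chance to stop its children as well; the PID file path to pass in depends on how the daemon was started.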
 
Kind Regards,
Sander Holthaus
 
PS: Never turn on things for which you don't know how 
to turn them off. 
 

  
  
  From: Chris Santerre [mailto:[EMAIL PROTECTED]
  Sent: Tuesday, July 12, 2005 7:19 PM
  To: users@spamassassin.apache.org
  Subject: RE: How to shut down

  Unplug the power to the server.

  If that fails, I assume you would need to contact the person that set it
  up. You sysadmin could allow all your email to come thru without being
  scanned.

  I hope that helps,

  Thinking of you,

  Tom Cruise

    -Original Message-
    From: Michael [mailto:[EMAIL PROTECTED]
    Sent: Tuesday, July 12, 2005 12:48 PM
    To: users@spamassassin.apache.org
    Subject: How to shut down

    How to shut down the spamassassin? so it doesnt run ??


RE: simultaneous sa-learn processes

2005-07-11 Thread Sander Holthaus - Orange XL
JamesDR wrote:
> Chavdar Videff wrote:
>> Hi List,
>> 
>> Our mailserver server serves about 100 users. Our config:
>> Sendmail+Procmail+SpamAssassin.
>> The question is:
>> If I got it right, we should run sa-learn for each user in order to
>> benefit from bayes. We intend to run a cron job for each user and do
>> it at night by supplying a daily snapshot of our spam and ham
>> collections to sa-learn. Can our mailserver handle it (256 MB RAM,
>> Celeron 400 Mhz)?

Why would you want to set up Bayes on a per-user basis if you are going to
feed it system-wide hams and spams? Feeding it system-wide hams is
especially odd.
 
>> A weekly collection run for 1 user usually eats 100% of CPU load. My
>> concern is whether the system is going to crash or just do the job
>> slower and if you can point out how many sa-learn tasks could we run
>> simultaneously with our setup.

Systems shouldn't crash under high load, so that's not a real concern. If it
does happen, you have a more serious problem elsewhere. What would be more
of a concern is how it is going to affect other processes running on your
system. Slower is not a problem, but if you really put the load on your box
from a lot of processes, you might start seeing time-outs.

>> All hints will be appreciated, for we scheduled an initial load for
>> 16 users of the big collection of spam received so far.

If you are going to learn spam and ham for 16 users simultaneously and
want to keep running your mailserver/spamassassin too (I take it you also have
a virus scanner running somewhere), I would at least run the
sa-learn processes under nice to keep them from stalling more essential
services. But, depending on your system setup (OS, DB, etc.), you might want
to cut down a little on the number of processes run simultaneously.
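
A minimal sketch of both suggestions, low priority plus a cap on parallel runs, in Python. The job tuples, mailbox paths and function names are hypothetical, and the sa-learn options used (--spam/--ham, -u, --mbox) should be checked against the installed version:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def learn_command(user, kind, mailbox, niceness=19):
    """Build a low-priority sa-learn command line for one user's mailbox."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    return ["nice", "-n", str(niceness),
            "sa-learn", "--" + kind, "-u", user, "--mbox", mailbox]


def learn_all(jobs, max_parallel=2, runner=subprocess.call):
    """Run the (user, kind, mailbox) jobs, at most max_parallel at a time."""
    commands = [learn_command(*job) for job in jobs]
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # Each worker blocks on one sa-learn process, which caps the parallelism.
        return list(pool.map(runner, commands))
```

With max_parallel set to 1 or 2, a box of that size keeps most of its CPU and memory available for the mail server while the learning jobs work through the queue.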

>> 
>> Thanks guys
>> 
>> Chavdar Videff
>> 
>> 
> What kind of Bayes db are you using? We use MySQL here and
> haven't seen SA-Learn use up that much cpu... I've run it
> manually up to 10 processes at once without any noticeable
> slowing of the machine. (p2 450mhz, 256mb)




RE: Scoring - Display none

2005-07-08 Thread Sander Holthaus - Orange XL
Jean-Paul Natola wrote:
> -Original Message-
> From: Steven Dickenson [mailto:[EMAIL PROTECTED]
> Sent: Friday, July 08, 2005 2:47 PM
> To: Jean-Paul Natola
> Cc: users@spamassassin.apache.org
> Subject: Re: Scoring - Display none
> 
> Jean-Paul Natola wrote:
>> Note to sound t  ignorant , but how can I add that rule
> because it
> seems
>> that I don't have that installed. On that note how can I make sure
>> all the SARE rules are updated.
> 
> Download it from the SARE website and install it in
> /etc/spamassassin or /etc/mail/spamassassin; wherever your local.cf
> is.  Then HUP spamd. 
> 
> Keep SARE rules up to date with RulesDuJour.
> 
> PS - Please don't top-post.  This isn't a Windows list.  If
> you're running Outlook, check out Outlook-QuoteFix.
> http://jump.to/outlook-quotefix
> 
> Steven
> 
> Ok I did a search for all SARE files, apparently  they were installed,
> 
> So I downloaded the info so I can set it to run automatically.
> 
> The only thing that I'm a bit confused with is why half of the them
> reside in; /usr/local/etc/mail/spamassassin/72_sare_bml_post25x.cf
> ./usr/local/etc/mail/spamassassin/99_sare_fraud_post25x.cf
> 
> And the other half reside in
> 
> ./usr/ports/mail/spamass-rules/work/71_sare_redirect_pre3.0.0.cf
> ./usr/ports/mail/spamass-rules/work/70_sare_bayes_poison_nxm.cf
> 
> Is this a misconfiguration on my part, Should I be concerned?

No, those last two are in the working directory of the FreeBSD port. Go
to /usr/ports/mail/spamass-rules/ and do "make clean". They will disappear.
Personally, I don't use that port, though I do use some of its rulesets.
The best way is to install a script that updates your custom rulesets on a
more regular basis (though only a few actually need that).

Kind Regards,
Sander Holthaus



RE: ALL_TRUSTED and Razor, DCC and Pyzor

2005-07-07 Thread Sander Holthaus - Orange XL
I think that is an excellent idea!

I call spamc from maildrop, so I can filter out some messages that do not
need to be processed by SpamAssassin. But it would be much easier for most
installations if such behaviour were possible from within SpamAssassin.

You might even want to add an extra option that doesn't scan local messages
(things like daily/weekly/monthly outputs), i.e. mail from the box itself
that spamassassin is running on.
And an option that disables scanning from or to certain addresses entirely
(for instance, if you have a mail account to which friends can send you
sample spam, which requires neither filtering nor anything like AWL or
Bayes learning).

Kind Regards,
Sander Holthaus

> -Original Message-
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
> Sent: Friday, July 08, 2005 12:33 AM
> To: Theo Van Dinter
> Cc: users@spamassassin.apache.org
> Subject: Re: ALL_TRUSTED and Razor, DCC and Pyzor 
> 
> -BEGIN PGP SIGNED MESSAGE-
> Hash: SHA1
> 
> 
> Theo Van Dinter writes:
> > On Thu, Jul 07, 2005 at 04:34:03PM -0500, Kenneth S. wrote:
> > > Is there anyway to configure SA so that if the 
> ALL_TRUSTED rule is 
> > > hit it skips the Razor, DCC and Pyzor tests?
> > 
> > Not without modifying code.
> 
> However, it is something we've been thinking of. patches welcome! ;)
> 
> ps: fwiw, we were considering that rules like ALL_TRUSTED 
> that are 100% trustworthy would be set to run at a higher 
> priority (that's
> implemented) and cause the check to exit immediately (that's not).
> 
> - --j.
> -BEGIN PGP SIGNATURE-
> Version: GnuPG v1.2.5 (GNU/Linux)
> Comment: Exmh CVS
> 
> iD8DBQFCza2bMJF5cimLx9ARAt5OAJ9J/AOBFbr8g3ii6dC2xxc64ouO0QCdGLX2
> 4LU3Kh861VAxZGv5Hs6TTM0=
> =IgAN
> -END PGP SIGNATURE-
> 



RE: spamassassin --lint ....how long does it take?

2005-07-07 Thread Sander Holthaus - Orange XL
> I downloaded many of the SARE rulesets (not  bigevil 
> however), and I am running  "spamassassin -D --lint". It 
> seems like it is taking a very long time to run. Is this 
> typical or am I "hosed"?  I am running it on a test system 
> (non-production) so it is not currently a serious problem, 
> but I want to be sure of what's up before I try anything on 
> production (probably in a few days).

How many? Or better, can you specify which you downloaded?



RE: Autolearn problem

2005-07-06 Thread Sander Holthaus - Orange XL
 
> Hi,
> These are the lines in my local.cf:
> 
> bayes_path /home/sharedspam/.spamassassin/bayes
> auto_whitelist_path /home/sharedspam/.spamassassin/auto-whitelist
> bayes_file_mode 777
> auto_whitelist_file_mode 777
> lock_method flock

777 is rather insecure... What's the output of ls -al on your .spamassassin
directory?
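
As a scripted version of that check, the sketch below lists which entries in a directory are writable by everyone. It is written in Python, the function name is made up, and the directory to inspect is whatever holds the bayes and auto-whitelist files:

```python
import os
import stat


def world_writable(directory):
    """Return names in directory whose mode allows writing by 'other'."""
    risky = []
    for name in sorted(os.listdir(directory)):
        mode = os.stat(os.path.join(directory, name)).st_mode
        if mode & stat.S_IWOTH:
            risky.append(name)
    return risky
```

Any file name it returns can be modified by every local user, which is what a file mode of 777 allows.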
 
> Except for the last line, the others were also present for my 
> 2.63 config and I believed they worked fine.
> 
> Lint and sync seem to work fine. I've also tried the following:
> 
> ./check_whitelist /home/sharedspam/.spamassassin/auto-whitelist
> -2.3  (-43.1/19)  --  [EMAIL PROTECTED]|ip=217.15
> -2.0    (-6.1/3)  --  [EMAIL PROTECTED]|ip=217.15
> -2.8    (-5.5/2)  --  [EMAIL PROTECTED]|ip=none
> -2.8    (-2.8/1)  --  [EMAIL PROTECTED]|ip=217.15
> -2.6    (-2.6/1)  --  [EMAIL PROTECTED]|ip=217.15
> -2.6    (-5.3/2)  --  [EMAIL PROTECTED]|ip=217.15
> I'm assuming that's fine.

Don't see a problem
 
> Finally, I also tried:
> spamassassin -D < 
> /root/.cpan/build/Mail-SpamAssassin-3.0.4/sample-spam.txt
> 
> This gave me the following: 
> 
> From: Sender <[EMAIL PROTECTED]>
> To: Recipient <[EMAIL PROTECTED]>
> Subject: *SPAM* Test spam mail (GTUBE)
> Date: Wed, 23 Jul 2003 23:30:00 +0200
> Message-Id: <[EMAIL PROTECTED]>
> X-Spam-Flag: YES
> X-Spam-Checker-Version: SpamAssassin 3.0.4 (2005-06-05) on 
> mailhost.datastream.com.mt
> X-Spam-Level: **
> X-Spam-Status: Yes, score=997.2 required=5.0 tests=ALL_TRUSTED,
> DNS_FROM_AHBL_RHSBL,GTUBE autolearn=unavailable version=3.0.4
> 
> Wierdly this reports unavailable as opposed to failed. 

I think that for certain messages, such as ALL_TRUSTED, bayes learning is
unavailable. At least, that is what I'm seeing here.

> 
> -Original Message-
> From: crisppy fernandes [mailto:[EMAIL PROTECTED]
> Sent: 06 July 2005 12:28
> To: Joe Borg; users@spamassassin.apache.org
> Subject: Re: Autolearn problem
> 
> On 7/6/05, Joe Borg <[EMAIL PROTECTED]> wrote:
> > Hi,
> > I've tried moving the file. What happens is that a new one 
> is created; 
> > usually belonging to a different user on the system and 
> then, that get
> 
> This is something strange.
> 
> > stuck. In other words the problem keeps on reoccurring.
> 


Which user? And as which user is spamassassin installed? As which user is
spamd running, and as which users are spamd's children running? 



RE: dcc / razor

2005-07-05 Thread Sander Holthaus - Orange XL
> Ronan McGlue wrote:
> > what is the official stance on using razor/dcc for not personal use.
> > I've looked at the 3.1 docs and its off by default. I cant seem to 
> > find any liscencing info on either site. Anyone got any URLS/ info 
> > regarding this issue. Im using it at a uni here sitewide so 
> Id like to 
> > forgo a lawsuit...
> > 
> > Ronan
> > 
> 
> Someone already pointed out the one for DCC. Here's the one for razor:
> 
> http://razor.sourceforge.net/docs/doc.php?type=text&name=SERVICE_POLICY
> 
> Although that policy is considerably vague.

"Use of the SpamNet service by Razor-agent-enabled software will remain
free for personal use, subject to capacity constraints that Cloudmark may
enforce against intensive users of the service as it sees fit"

That's indeed vague. They talk about personal use and about commercially
embedded software, but where does this leave an ISP? It also means that
anyone not paying license fees has absolutely no guarantee that the service
will remain reliable.

Kind Regards,
Sander Holthaus



RE: autolearn=failed

2005-07-05 Thread Sander Holthaus - Orange XL



File-permission issue?

  
  
  From: Michael [mailto:[EMAIL PROTECTED]
  Sent: Tuesday, July 05, 2005 6:21 PM
  To: users@spamassassin.apache.org
  Subject: autolearn=failed

  When I receive spam msgs I have this line in headers autolearn=failed . I
  tried to look it on wiki but I couldn't find anything. I know it that the
  SA cannot gain lock on bayes database. I checked maillog file but I
  couldn't find anything that was suspicious. Anybody could help me with
  this one ?


RE: dcc / razor

2005-07-05 Thread Sander Holthaus - Orange XL
As far as I know, the licensing issues have to do with people selling
anti-spam solutions. E.g. you can download and install DCC/Razor and turn
them on in SpamAssassin, but you cannot sell (SpamAssassin with) Razor/DCC
as part of an anti-spam solution without paying license fees.

Kind Regards,
Sander Holthaus

> -Original Message-
> From: Ronan McGlue [mailto:[EMAIL PROTECTED] 
> Sent: Tuesday, July 05, 2005 6:48 PM
> To: SPAMASSASSIN
> Subject: dcc / razor
> 
> What is the official stance on using Razor/DCC for non-personal use?
> I've looked at the 3.1 docs and it's off by default. I can't
> seem to find any licensing info on either site. Anyone got
> any URLs or info regarding this issue? I'm using it at a uni
> here, sitewide, so I'd like to forgo a lawsuit...
> 
> Ronan



RE: Domain of the sender does not resolve

2005-02-25 Thread Sander Holthaus - Orange XL
> Hi all,
> I think this is a general sendmail issue, but maybe you can 
> help me figure out what to do.
> I'm seeing mails being rejected with "Domain of the
> sender does not resolve". How do I stop the MTA from
> rejecting these? Will adding this to the sendmail.mc file help me?
> FEATURE(`accept_unresolvable_domains')dnl
> 
> Many thanks,
> 
> Yang

I'm not into Sendmail, but you might want to ask yourself if you really
want to accept such mails. Usually, such mails are spam or viruses, and if
the domain does not resolve, they can cause problems when those messages get
bounced.
If the mails are legitimate (very small chance), it means that there is a
configuration error somewhere. A good sysadmin will pick up that a large
portion of outgoing mail is refused with "Domain of the sender does not
resolve".

One other thing: you might want to check your DNS server(s) and verify their
results. I've had a similar problem recently and it turned out the
DNS server was bricked.

Kind Regards,
Sander Holthaus

PS: This message doesn't really belong on the SpamAssassin mailing list...



RE: spamd hanging or looping

2005-02-15 Thread Sander Holthaus - Orange XL
> On Mon, Feb 14, 2005 at 08:18:47PM +0100, Henk van Lingen wrote:
>   >
>   > Additional info on this bug:
> 
>   Being a bit surprised about the lack of interest in a bug like this
>   here, I'm trying to submit something to this 'bugzilla'. I've made an
>   account and now it suggests reading the guidelines. However:
> 
>   "The requested URL /bugwritinghelp.html was not found on this server."
> 
>   So, what is priority, severity and URL? Or does it not really matter?
>   And what component to choose? spamd seems logical but I think the
>   problem is in a library.
> 
>   (I hate having to edit a message without vim in a 'textarea' :-))

From reading your messages, I wouldn't be sure that it is a bug in spamd
itself. It could very well be in either the Perl version or the related
modules you are using. Some are actually quite old and have known bugs in
them which can lead to endless loops.

Before submitting a bug report, upgrade Perl and the related modules to their
latest versions. Also save your db-files and the related messages (which can
be handy if it is indeed an unresolved bug).

Kind Regards,
Sander Holthaus

PS: What OS and which Perl-version are you using?



RE: Less spam blocked with 3.02 - AWL-related?

2005-02-14 Thread Sander Holthaus - Orange XL
> On Fri, Feb 11, 2005 at 10:37:06AM -0500, Chris Santerre wrote:
> > 
> > >
> > >> > 4) Can you share the output from a --lint with us?
> > >> $ spamassassin --lint
> > >> [EMAIL PROTECTED]:~$
> > >
> > >What about spamassassin -D --lint?
> > >
> > >Kind Regards,
> > >Sander Holthaus
> > 
> > LOL, yeah I need to start typing exactly what I mean :)
> > 
> > Can we get the output from the "spamassassin -D --lint" please?
> 
> I should have read what you meant!  Ok.  Here it is (I have
> since added a few more rules):
> 
> debug: SpamAssassin version 3.0.2
> debug: Score set 0 chosen.
> debug: running in taint mode? yes
> debug: Running in taint mode, removing unsafe env vars, and resetting PATH
> debug: PATH included '/home/spamd/bin', keeping.
> debug: PATH included '/usr/local/bin', keeping.
> debug: PATH included '/usr/bin', keeping.
> debug: PATH included '/bin', keeping.
> debug: PATH included '/usr/bin/X11', which doesn't exist, dropping.
> debug: PATH included '/usr/games', keeping.
> debug: Final PATH set to: /home/spamd/bin:/usr/local/bin:/usr/bin:/bin:/usr/games
> debug: diag: module installed: DBI, version 1.21
> debug: diag: module installed: DB_File, version 1.75
> debug: diag: module installed: Digest::SHA1, version 2.00
> debug: diag: module installed: IO::Socket::UNIX, version 1.20
> debug: diag: module installed: MIME::Base64, version 2.12

You should upgrade that one 

> debug: diag: module installed: Net::DNS, version 0.48
> debug: diag: module not installed: Net::LDAP ('require' failed)
> debug: diag: module not installed: Razor2::Client::Agent ('require' failed)
> debug: diag: module installed: Storable, version 1.014
> debug: diag: module installed: URI, version 1.18

You should probably upgrade this one as well

> debug: ignore: using a test message to lint rules
> debug: using "/etc/spamassassin/init.pre" for site rules init.pre
> debug: config: read file /etc/spamassassin/init.pre
> debug: using "/usr/share/spamassassin" for default rules dir
> debug: config: read file /usr/share/spamassassin/10_misc.cf
> debug: config: read file /usr/share/spamassassin/20_anti_ratware.cf
> debug: config: read file /usr/share/spamassassin/20_body_tests.cf
> debug: config: read file /usr/share/spamassassin/20_compensate.cf
> debug: config: read file /usr/share/spamassassin/20_dnsbl_tests.cf
> debug: config: read file /usr/share/spamassassin/20_drugs.cf
> debug: config: read file /usr/share/spamassassin/20_fake_helo_tests.cf
> debug: config: read file /usr/share/spamassassin/20_head_tests.cf
> debug: config: read file /usr/share/spamassassin/20_html_tests.cf
> debug: config: read file /usr/share/spamassassin/20_meta_tests.cf
> debug: config: read file /usr/share/spamassassin/20_phrases.cf
> debug: config: read file /usr/share/spamassassin/20_porn.cf
> debug: config: read file /usr/share/spamassassin/20_ratware.cf
> debug: config: read file /usr/share/spamassassin/20_uri_tests.cf
> debug: config: read file /usr/share/spamassassin/23_bayes.cf
> debug: config: read file /usr/share/spamassassin/25_body_tests_es.cf
> debug: config: read file /usr/share/spamassassin/25_hashcash.cf
> debug: config: read file /usr/share/spamassassin/25_spf.cf
> debug: config: read file /usr/share/spamassassin/25_uribl.cf
> debug: config: read file /usr/share/spamassassin/30_text_de.cf
> debug: config: read file /usr/share/spamassassin/30_text_fr.cf
> debug: config: read file /usr/share/spamassassin/30_text_nl.cf
> debug: config: read file /usr/share/spamassassin/30_text_pl.cf
> debug: config: read file /usr/share/spamassassin/50_scores.cf
> debug: config: read file /usr/share/spamassassin/60_whitelist.cf
> debug: config: read file /usr/share/spamassassin/65_debian.cf
> debug: using "/etc/spamassassin" for site rules dir
> debug: config: read file /etc/spamassassin/70_sare_bayes_poison_nxm.cf
> debug: config: read file /etc/spamassassin/70_sare_genlsubj.cf
> debug: config: read file /etc/spamassassin/70_sare_genlsubj0.cf
> debug: config: read file /etc/spamassassin/70_sare_genlsubj1.cf
> debug: config: read file /etc/spamassassin/70_sare_genlsubj3.cf
> debug: config: read file /etc/spamassassin/70_sare_genlsubj_arc.cf
> debug: config: read file /etc/spamassassin/70_sare_genlsubj_eng.cf
> debug: config: read file /etc/spamassassin/70_sare_genlsubj_x30.cf
> debug: config: read file /etc/spamassassin/70_sare_header.cf
> debug: config: read file /etc/spamassassin/70_sare_header0.cf
> debug: config: read file /etc/spamassassin/70_sare_header1.cf
> debug: config: read file /etc/spamassassin/70_sare_header2.cf
> debug: config: read file /etc/spamassassin/70_sare_header3.cf
> debug: config: read file /etc/spamassassin/70_sare_header_arc.cf
> debug: config: read file /etc/spamassassin/70_sare_header_eng.cf
> debug: config: read file /etc/spamassassin/70_sare_header_x264_x30.cf

This one is not intended for 3.x; it is already included in the base
distribution.

> debug: config: read file /etc/spamassassin/70_sare_header_x30.cf

Thi

RE: spamd hanging or looping

2005-02-14 Thread Sander Holthaus - Orange XL
Is there anything specific about the message? (Well, there most likely is :-) )
What kind of message is it? Are Perl and all related modules up to date
with the latest versions? What is the absolute last message that is logged?

Kind Regards,
Sander Holthaus

> -Original Message-
> From: Henk van Lingen [mailto:[EMAIL PROTECTED] 
> Sent: Sunday, February 13, 2005 4:04 PM
> To: users@spamassassin.apache.org
> Subject: spamd hanging or looping
> 
> 
> Hi,
> 
> I'm using SA 3.0.2 on a CentOS 3.4 Linux system in a 
> spamd/spamc setup.
> spamc is called via 'maildrop' with user privileges. Now the 
> problem is, I have a message in my queue (postfix) that seems 
> to hang spamd. Every time it tries local delivery, spamc 
> times out after 10 minutes but the spamd child is going on 
> using the CPU forever, without doing any syscalls.
> The last thing spamd is telling in debug mode is it is doing 
> 'tokenize:'
> steps.
> 
> I don't know how to investigate this further?
> 
> Regards,
> -- 
> Henk van Lingen, Systems & Network Administrator
> Dept. of Computer Science, Utrecht University.
> phone: +31-30-2535278
> http://henk.vanlingen.net/ 
> http://www.tuxtown.net/netiquette/



RE: Humor: "The Ultimate Spam Email"

2005-02-11 Thread Sander Holthaus - Orange XL
 > >
> >Scores pretty well (14.3, 9.0 required), though I heard that
> >SpamAssassin 3+ included the SARE_FRAUD rules (or similar). But looking
> >at the report, I don't see any fraud hits from SpamAssassin :-/
> >
> >Would it be wise to re-add SARE_FRAUD as an extra ruleset to
> >SpamAssassin?
> >
> >Kind Regards,
> >Sander Holthaus
> 
> The beauty of it, is you can add them and score them to your 
> taste, just for a test. But I see no harm in adding them in 
> anyway. But I might be biased ;)
> 
> 
> --Chris

I see that they now (as of 01.03.02?) should be SpamAssassin 3.0+
compatible, so they're back! :-) I was kind of missing the 100+ point
spams...

Kind Regards,
Sander Holthaus



RE: bayesian filter training

2005-02-11 Thread Sander Holthaus - Orange XL
 > but would that not mean that the bayes filter will learn the 
> headers that spam assassin adds as spam .. and then after a 
> while only start classing mail that already has the spam 
> headers as bayes_99 ?

You can set Bayes to ignore certain headers. See the documentation for more
info. I would think it does not learn X-Spam-* headers by default. Also, I
would try to add other spam-related headers as well, as those can only be
abused by spammers IMHO.

Kind Regards,
Sander Holthaus



RE: Humor: "The Ultimate Spam Email"

2005-02-11 Thread Sander Holthaus - Orange XL
> -Original Message-
> From: Thomas Arend [mailto:[EMAIL PROTECTED] 
> Sent: Friday, February 11, 2005 1:41 PM
> To: users@spamassassin.apache.org
> Subject: Re: Humor: "The Ultimate Spam Email"
> 
> On Thursday, 10 February 2005 22:46, Jim Maul wrote:
> > Mike Jackson wrote:
> > >> http://lowendmac.com/lite/05/0210.html
> > >
> > > I sent it to myself...
> > >
> > > X-Spam-Report:
> > > *  1.8 URG_BIZ BODY: Contains urgent matter
> > > *  0.7 SARE_MONEYTERMS BODY: Talks about money in some way.
> > > *  0.7 SARE_URGBIZ BODY: Contains urgent matter
> > > *  2.6 NA_DOLLARS BODY: Talks about a million North American dollars
> > > *  0.4 US_DOLLARS_3 BODY: Mentions millions of $ ($NN,NNN,NNN.NN)
> > > * -2.6 BAYES_00 BODY: Bayesian spam probability is 0 to 1%
> > > *  [score: 0.0003]
> > > *  1.0 URIBL_SBL Contains an URL listed in the SBL blocklist
> > > *  [URIs: walla.com]
> > > *  1.7 SARE_FRAUD_10 Matches 2 phrases commonly used in fraud spam
> > > *  1.7 SARE_FRAUD_1 Matches 2 phrases commonly used in fraud spam
> > > *  3.4 NIGERIAN_BODY1 Message body looks like a Nigerian spam message 1+
> > > *  1.7 SARE_FRAUD_X5 Matches 5+ phrases commonly used in fraud spam
> > > *  1.7 SARE_FRAUD_X6 Matches 6+ phrases commonly used in fraud spam
> > > *  1.2 MISSING_SUBJECT Missing Subject: header
> > > *  0.6 NIGERIAN_BODY2 Message body looks like a Nigerian spam message 2+
> > > *  1.7 SARE_FRAUD_X3 Matches 3+ phrases commonly used in fraud spam
> > > *  1.7 SARE_FRAUD_X4 Matches 4+ phrases commonly used in fraud spam
> > > *  1.7 SARE_FRAUD_6 Matches 2 phrases commonly used in fraud spam
> > > *  1.7 SARE_FRAUD_3 Matches 2 phrases commonly used in fraud spam
> > > *  1.7 SARE_FRAUD_5 Matches 2 phrases commonly used in fraud spam
> > > *  0.1 NIGERIAN_BODY3 Message body looks like a Nigerian spam message 3+
> > > *  1.7 SARE_FRAUD_2 Matches 2 phrases commonly used in fraud spam
> > > *  0.9 SARE_FRAUD_9 Matches 2 phrases commonly used in fraud spam
> > > *  -15 AWL AWL: From: address is in the auto white-list
> > >
> > > The AWL hit is because I sent it from my work address. The low Bayes
> > > score surprises me; my Bayes database should be loaded with crap
> > > like that.
> >
> > I just tried saving the text to a file and running spamc on it.  It
> > didn't have any headers or anything but it still managed a score of
> > 10.6 on my system without any add-on rules... pretty good I think.  My
> > Bayes hit with a BAYES_44.
> >
> > -Jim
> 
> Without the SARE rules I got the following low scores. One 
> rule is my own TA_Save_b01.
> 
> Thomas
> 
> Content analysis details:   (6.0 points, 5.0 required)
> 
>  pts rule name       description
> ---- --------------- --------------------------------------------------
> -3.3 ALL_TRUSTED     Did not pass through any untrusted hosts
>  1.8 URG_BIZ         BODY: Contains urgent matter
>  0.5 TA_Save_b01     BODY: Promise to save
>  2.6 NA_DOLLARS      BODY: Talks about a million North American dollars
>  0.4 US_DOLLARS_3    BODY: Mentions millions of $ ($NN,NNN,NNN.NN)
>  0.0 BAYES_50        BODY: Bayesian spam probability is 40 to 60%
>                      [score: 0.5000]
>  0.6 NIGERIAN_BODY2  Message body looks like a Nigerian spam message 2+
>  3.4 NIGERIAN_BODY1  Message body looks like a Nigerian spam message 1+
>  0.1 NIGERIAN_BODY3  Message body looks like a Nigerian spam message 3+

X-Spam-Report: 
* -0.0 SPF_PASS SPF: sender matches SPF record
*  2.3 MANGLED_FREE BODY: mangled free
*  0.3 OFFER BODY: Offers you Something
*  1.0 SARE_OEM_SOFT_IS BODY: Software that is OEM
*  0.8 SARE_OEM_OEMCD BODY: Mentions a OEM cd
*  0.4 US_DOLLARS_3 BODY: Mentions millions of $ ($NN,NNN,NNN.NN)
*  1.8 URG_BIZ BODY: Contains urgent matter
*  0.6 J_CHICKENPOX_62 BODY: 6alpha-pock-2alpha
*  0.7 SARE_MONEYTERMS BODY: Talks about money in some way.
*  0.7 SARE_URGBIZ BODY: Contains urgent matter
*  2.6 NA_DOLLARS BODY: Talks about a million North American dollars
*  1.9 BAYES_99 BODY: Bayesian spam probability is 99 to 100%
*  [score: 1.]
*  3.4 NIGERIAN_BODY1 Message body looks like a Nigerian spam message 1+
*  1.2 MISSING_SUBJECT Missing Subject: header
*  0.6 NIGERIAN_BODY2 Message body looks like a Nigerian spam message 2+
*  3.0 NIGERIAN_BODY_2 More Nigerian scum body content
*  0.1 NIGERIAN_BODY3 Message body looks like a Nigerian spam message 3+
* -7.0 AWL AWL: From: address is in the auto white-list

Scores pretty well (14.3, 9.0 required), though I heard that SpamAssassin
3+ included the SARE_FRAUD rules (or similar). But looking at the report, I
don't see any fraud hits from SpamAssassin :-/

Would it be wise to re-add SARE_FRAUD as an extra ruleset to SpamAssassin?

RE: Less spam blocked with 3.02 - AWL-related?

2005-02-11 Thread Sander Holthaus - Orange XL
> > 4) Can you share the output from a --lint with us?
> $ spamassassin --lint
> [EMAIL PROTECTED]:~$

What about spamassassin -D --lint?

Kind Regards,
Sander Holthaus



RE: Less spam blocked with 3.02 - AWL-related?

2005-02-10 Thread Sander Holthaus - Orange XL
> 3) Stop using AWL. Seriously, I found it did more harm than 
> good and got big too fast. 

I don't have any problem with it, and it is doing its job quite well
actually. BUT I do think that it will only work if you have a good working
setup, in which there is a clear distinction in scores between ham and
spam. Otherwise, it may backfire. Without any extra rulesets and/or various
net lookups (SPF, SURBL, etc.), I indeed can't imagine that it will work...
Also, the AWL factor may need some tuning in order to have a positive
effect.

> --Chris 

Kind Regards,
Sander Holthaus



RE: Less spam blocked with 3.02 - AWL-related?

2005-02-10 Thread Sander Holthaus - Orange XL
> On Thu, Feb 10, 2005 at 11:48:18AM +0100, Sander Holthaus - 
> Orange XL wrote:
> > Your (mail)logs might come in handy for this, if you write out
> > SpamAssassin's basic output there. With a basic Perl script (you can
> > do this in almost any other scripting language of course) you can most
> > likely see everything you need: spam, ham and mail scores,
> > scan times, tests that were hit (!), etc. With only a small bit of
> > programming, you can calculate and see everything you need! You should
> > check what the AWL and BAYES tests are doing, especially if they hit
> > on spam.
> 
> True.  Maybe I was too lazy to think about that ;)
> 
> I was looking at the logfile /var/log/mail.info which shows 
> which rules were used, but not with the individual values e.g.  
> 
> Feb 10 14:42:44 mail1 spamd[16031]: result: . -2 - 
> AWL,BAYES_20,DRUG_ED_CAPS,HTML_MESSAGE
> scantime=0.1,size=3491,mid=<01E4C22DDCD5E94DAC1863202903F26809[EMAIL PROTECTED]>,
> bayes=0.0983660349113599,autolearn=disabled
> 
> But in exim's rejectlogs the full spamreport appears.

Well, I didn't get to it either until recently. I think there are not too
many who automate analysis of SpamAssassin output, even though it is quite
handy. From looking at the entry above, I think a few changes could be made
to your setup. Indeed you appear to have a problem with AWL; it shouldn't hit
on spam. But I think it is more likely to be related to the fact that messages
which are spam aren't getting enough points to be seen as spam.
BAYES_20 is also quite low (but not that unusual) for a spam mail, not to
mention that only two other rules hit on the message. Do you perform any
network tests? (Pyzor, Razor, DCC, URIDNSBL)

> 
> > When I upgraded (2.64 > 3.02), I noticed only a small increase in
> > scores for spam and decrease for ham from SpamAssassin. Not the big
> > results I had hoped for, but I'll patiently wait for 3.1. Overall
> > results are slightly better, and technically, there should be a lower
> > possibility of ham being marked as spam (due to SPF checking; did you
> > install that?).
> 
> No, I did not install SPF-checking. I will have to read up about it.  

It is a nice addition, though not widely implemented (most major
webmail providers use SPF nowadays, but many medium and small
ISPs/webmail providers don't). http://spf.pobox.com will tell you what it
is.

> > As to your setup. How up to date are those extra custom rules? 
> 
> A few days ago.

That's good. No problem there.

> > Any reason
> > why you are using 70_sare_html2.cf and 70_sare_html3.cf but not
> > 70_sare_header0.cf, 70_sare_header1.cf, 70_sare_genlsubj0.cf,
> > 70_sare_genlsubj1.cf, etc.?
> 
> I did not know about them. 

Check out www.rulesemporium.com. You will find all available rules,
descriptions and hints on how to use them. There are also links to
non-SARE rules, which can give excellent results too (e.g. chickenpox,
weeds / weeds2 and mangled, to name just a few).

> > There are more effective rules out there than just sare_html or just
> > SARE rules!
> 
> > I use most of the SARE rules + some extra rules, and results are very
> > good (though watch your memory and scan times!). Have yet to see a
> > false positive with a threshold of 9, and only 1-2% of all traffic
> > scores between 5 and 9.
> 
> I have tried now to download them with rule_du_jour and it 
> ends with an error:
> 
> 70_sare_bayes_poison_nxm.cf was up to date [skipped 
> downloading of 
> http://www.rulesemporium.com/rules/70_sare_bayes_poison_nxm.cf ] ...
> 
> No index found for ruleset named SARE_GENLSUBJ2.  Check that 
> this ruleset is still valid.
> 
> No index found for ruleset named SARE_GENLSUBJ2.  Check that 
> this ruleset is still valid.
> 
> No index found for ruleset named SARE_GENLSUBJ3.  Check that 
> this ruleset is still valid.
> 
> No index found for ruleset named SARE_GENLSUBJ_ARC.  Check 
> that this ruleset is still valid.
> 
> No index found for ruleset named SARE_GENLSUBJ_ENG.  Check 
> that this ruleset is still valid.
> 
> No index found for ruleset named SARE_GENLSUBJ.  Check that 
> this ruleset is still valid.
> No files updated; No restart required.
> 
> 
> 
> 
> 
> Rules Du Jour Run Summary:RulesDuJour Run Summary on archive3:
> 
> No index found for ruleset named SARE_GENLSUBJ2.  Check that 
> this ruleset is still valid.
> 
> No index found for ruleset named SARE_GENLSUBJ2.  Check that 
> this ruleset is still valid.
> 
> No index found for ruleset named SARE_GENLSUBJ3.  Check that 
> this ruleset is still valid.
> 
> No index found

RE: Less spam blocked with 3.02 - AWL-related?

2005-02-10 Thread Sander Holthaus - Orange XL
Your (mail)logs might come in handy for this, if you write out
SpamAssassin's basic output there. With a basic Perl script (you can do this
in almost any other scripting language of course) you can most likely see
everything you need: spam, ham and mail scores, scan times, tests that were
hit (!), etc. With only a small bit of programming, you can calculate and
see everything you need! You should check what the AWL and BAYES tests are
doing, especially if they hit on spam.
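
A minimal sketch of such a script, in Python rather than Perl. It assumes
spamd 3.0-style syslog lines of the form
"spamd[PID]: result: Y 14 - RULE_A,RULE_B scantime=2.5,..." (Y for spam, a
dot for ham). The function name summarize and the layout of the returned
dictionary are made up for illustration, so adjust the pattern to whatever
your own logs contain:

```python
import re
from collections import Counter

# Assumed spamd 3.0-style result line, e.g.
#   ... spamd[16031]: result: . -2 - AWL,BAYES_20 scantime=0.1,size=3491,...
# "Y" marks spam and "." marks ham; the score is logged as a whole number.
RESULT_RE = re.compile(
    r"spamd\[\d+\]: result: (?P<flag>[Y.])\s+(?P<score>-?\d+) - "
    r"(?P<tests>\S*)\s+scantime=(?P<scantime>[\d.]+)"
)


def summarize(lines):
    """Tally counts, summed scores, scan time and rule hits for spam and ham."""
    stats = {
        kind: {"count": 0, "score": 0, "scantime": 0.0, "rules": Counter()}
        for kind in ("spam", "ham")
    }
    for line in lines:
        match = RESULT_RE.search(line)
        if match is None:
            continue  # not a spamd result line
        kind = "spam" if match.group("flag") == "Y" else "ham"
        entry = stats[kind]
        entry["count"] += 1
        entry["score"] += int(match.group("score"))
        entry["scantime"] += float(match.group("scantime"))
        # The rule names are a comma-separated list, which may be empty.
        entry["rules"].update(t for t in match.group("tests").split(",") if t)
    return stats
```

Feeding it the lines of the mail log and printing
stats["spam"]["rules"].most_common(20) would show, for example, how often AWL
or a low BAYES_xx rule fires on mail that was tagged as spam.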

When I upgraded (2.64 > 3.02), I noticed only a small increase in scores for
spam and decrease for ham from SpamAssassin. Not the big results I had hoped
for, but I'll patiently wait for 3.1. Overall results are slightly better,
and technically, there should be a lower possibility of ham being marked as
spam (due to SPF checking; did you install that?).

As to your setup: how up to date are those extra custom rules? Any reason
why you are using 70_sare_html2.cf and 70_sare_html3.cf but not
70_sare_header0.cf, 70_sare_header1.cf, 70_sare_genlsubj0.cf,
70_sare_genlsubj1.cf, etc.?
There are more effective rules out there than just sare_html or just SARE
rules!
I use most of the SARE rules + some extra rules, and results are very good
(though watch your memory and scan times!). Have yet to see a false positive
with a threshold of 9, and only 1-2% of all traffic scores between 5 and 9.

Kind Regards,
Sander Holthaus

> -Original Message-
> From: Johann Spies [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, February 10, 2005 8:20 AM
> To: [EMAIL PROTECTED]
> Subject: Less spam blocked with 3.02 - AWL-related?
> 
> I have upgraded spamassassin on three mail servers (2.63 -> 3.02 on two
> and 2.64 -> 3.02 on the other) about two weeks ago.
> 
> On the old system I have disabled AWL and Auto-learn because 
> they corrupted my bayesian database on at least one occasion.
> 
> I have decided to try out AWL with 3.02.
> 
> At first I did not use any extra rules but installed the 
> following after a week:
> 
> 70_sare_bayes_poison_nxm.cf
> 70_sare_html2.cf
> 99_sare_fraud_post25x.cf
> 70_sare_html0.cf 
> 70_sare_html3.cf 
> evilnumbers.cf
> 70_sare_html1.cf
> 70_sare_html_eng.cf
> 
> I have experienced fewer false positives with the new one.  
> Complaints came down from about 6 per week to maybe 1 in the 
> last two weeks.
> 
> But the feedback from users about spam received has increased, and 
> the following statistics show that something is not working 
> as effectively as it was previously:
> 
> Average spam blocked per minute for the last
>   
>         Day    Week   Month  Year (since April-June last year)
> mail1   5.94   6.21   7.67   8.20
> mail2   5.04   5.95   6.48   6.69
> mail3   4.95   4.67*  6.23   6.85
> 
> *  mail3 was down for a few hours during the week.
> 
> The three servers started out with the same bayesian database 
> and are trained with the same spam/ham on a nearly daily basis.
> 
> 
> I suspect AWL is the culprit, but I am not sure how 
> to determine that other than by switching it off for a period.
> 
> Any commentary?
> 
> Regards
> Johann
> -- 
> Johann Spies  Telefoon: 021-808 4036
> Informasietegnologie, Universiteit van Stellenbosch
> 
>  "I was glad when they said unto me, Let us go into the 
>   house of the LORD."  Psalms 122:1 



RE: Manually training SpamAssassin by forwarding mail

2005-02-04 Thread Sander Holthaus - Orange XL
 

> -Original Message-
> From: Stuart Johnston [mailto:[EMAIL PROTECTED] 
> Sent: Friday, February 04, 2005 7:35 PM
> To: Peter Marshall; SpamAssassin Users
> Subject: Re: Manually training SpamAssassin by forwarding mail
> 
> Peter Marshall wrote:
> > Stuart Johnston wrote:
> > 
> >> Peter Marshall wrote:
> >>
> >>> Kevin Sullivan wrote:
> >>>
> >>>> --On 02/03/05 01:59:21 +0100 Sander Holthaus - Orange XL wrote:
> >>>>
> >>>>> I've been interested in offering customers to train 
> manually train 
> >>>>> the SpamAssassin Bayes filter for ham and spam (to reduce false 
> >>>>> positives and negatives). However, I can only find 
> documentation 
> >>>>> to this for local mailboxes and IMAP. Most users 
> however, retrieve 
> >>>>> their mail through POP and use Outlook (Express) as 
> mail client. 
> >>>>> Is there a way to train SpamAssassin with such a setup (e.g. 
> >>>>> forwarding mail with Outlook
> >>>>> (Express) using SMTP)?
> >>>>
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> If you want to do a lot of programming, you could save 
> all incoming 
> >>>> messages for a few days in a database somewhere.  When a user 
> >>>> forwards a message to a special "ham" or "spam" mailbox, 
> you pull 
> >>>> the message-id from the message and use it to recover 
> the original 
> >>>> message from your database.
> >>>>
> >>>> -Kevin
> >>>
> >>>
> >>>
> >>>
> >>> My question is the same as Henrik, I have a bunch of 
> email that is 
> >>> spam (either tagged by spam assassin or not tagged at all.  I 
> >>> forwared it as an attachment to a "spam" mail box.  What 
> do I have 
> >>> to do now before I can get bayes to learn the message ... 
> I read you 
> >>> have to remove the headers  Could anyone give me a 
> little more 
> >>> detail ?
> >>
> >>
> >>
> >> I use a modified version of the DMZS-sa-learn.pl from: 
> >> http://www.dmzs.com/tools/files/spam.phtml
> >>
> >> When someone forwards a spam to me, I move the message to 
> a special 
> >> imap folder that gets processed by the script.  My additions look 
> >> something like:
> >>
> >> use Email::MIME;
> >> ...
> >> my $msg = Email::MIME->new($raw_message_body);
> >>
> >> my @parts = $msg->parts;
> >>
> >> foreach (@parts) {
> >>   if ($_->content_type =~ m|message/rfc822|) {
> >>     sa_learn($_->body_raw);
> >>   }
> >> }
> >>
> >>
> >> I've tested this with messages forwarded as attachment from Outlook
> >> and Thunderbird.  I'm not sure how effective it is though.  I'm sure
> >> that it still loses something in the translation.  All IMAP is
> >> really the way to go if you can.
> >>
> >>
> >> Stuart Johnston
> >>
> >>
> > But I have no imap .. only pop .. they would forwared (as 
> attachment) 
> > to a mailbox, and then I have to run sa-learn ... I assume as root ?
> > 
> > Will the stuff you posted work for this setup as well ??
> > 
> > Would there be big problems just running it after the forwared as 
> > attachment. ??
> 
> The code I posted only shows how you can extract the attached 
> spam from the email.  You'll need to write your own code to 
> integrate it into your particular setup.
> 
> BTW, in Outlook, you can easily attach multiple spams to one 
> message and this code should handle it.

CTRL-a, right click, "Forward Items" will indeed do the trick.

> > 
> > Can users also forward, as an attachment, mail that was sent and was 
> > already marked as spam ... or is there any advantage to this ?
> 
> If you use Bayes auto learn, I suspect that this wouldn't do much. 
> Otherwise, it might help.

I would check the headers of the forwarded messages to see if their
spam score is above your auto-learning threshold. If it is, relearning is
probably of little use. You might also wonder why they received the message
at all (I would think that something that is good enough to autolearn is good
enough to refuse or discard).
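
A rough sketch of that check in Python, assuming the attached original still
carries the X-Spam-Status header that SpamAssassin added (3.x writes
score=..., 2.x wrote hits=...). The default of 12.0 mirrors SpamAssassin's
documented default for bayes_auto_learn_threshold_spam; the function names
are hypothetical, and you should pass whatever threshold your own
configuration uses:

```python
import re
from email import message_from_string

# "score=" is what SpamAssassin 3.x writes; 2.x used "hits=" instead.
SCORE_RE = re.compile(r"\b(?:score|hits)=(-?\d+(?:\.\d+)?)")


def spam_score(raw_message):
    """Return the score found in X-Spam-Status, or None if there is none."""
    status = message_from_string(raw_message).get("X-Spam-Status")
    if status is None:
        return None
    match = SCORE_RE.search(status)
    return float(match.group(1)) if match else None


def worth_relearning(raw_message, autolearn_spam_threshold=12.0):
    """True if the message scored below the (assumed) autolearn threshold.

    Messages without a parseable score are treated as worth learning.
    """
    score = spam_score(raw_message)
    return score is None or score < autolearn_spam_threshold
```

This is only a first filter: autolearning has further conditions besides the
total score, so where the header also carries an autolearn= field, that field
is the more direct indication of whether the message was already learned.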

Kind Regards,
Sander Holthaus



RE: Manually training SpamAssassin by forwarding mail

2005-02-04 Thread Sander Holthaus - Orange XL
 

> -Original Message-
> From: Stuart Johnston [mailto:[EMAIL PROTECTED] 
> Sent: Friday, February 04, 2005 5:20 PM
> To: users@spamassassin.apache.org
> Cc: Peter Marshall
> Subject: Re: Manually training SpamAssassin by forwarding mail
> 
> Peter Marshall wrote:
> > Kevin Sullivan wrote:
> > 
> >> --On 02/03/05 01:59:21 +0100 Sander Holthaus - Orange XL wrote:
> >>
> >>> I've been interested in offering customers to train 
> manually train 
> >>> the SpamAssassin Bayes filter for ham and spam (to reduce false 
> >>> positives and negatives). However, I can only find 
> documentation to 
> >>> this for local mailboxes and IMAP. Most users however, retrieve 
> >>> their mail through POP and use Outlook (Express) as mail 
> client. Is 
> >>> there a way to train SpamAssassin with such a setup (e.g. 
> forwarding 
> >>> mail with Outlook
> >>> (Express) using SMTP)?
> >>
> >>
> >>
> >> If you want to do a lot of programming, you could save all 
> incoming 
> >> messages for a few days in a database somewhere.  When a user 
> >> forwards a message to a special "ham" or "spam" mailbox, 
> you pull the 
> >> message-id from the message and use it to recover the original 
> >> message from your database.
> >>
> >> -Kevin
> > 
> > 
> > My question is the same as Henrik, I have a bunch of email that is 
> > spam (either tagged by spam assassin or not tagged at all.  
> I forwared 
> > it as an attachment to a "spam" mail box.  What do I have to do now 
> > before I can get bayes to learn the message ... I read you have to 
> > remove the headers  Could anyone give me a little more detail ?
> 
> I use a modified version of the DMZS-sa-learn.pl from: 
> http://www.dmzs.com/tools/files/spam.phtml
> 
> When someone forwards a spam to me, I move the message to a 
> special imap folder that gets processed by the script.  My 
> additions look something like:
> 
> use Email::MIME;
> ...
> my $msg = Email::MIME->new($raw_message_body);
> 
> my @parts = $msg->parts;
> 
> foreach (@parts) {
>   if ($_->content_type =~ m|message/rfc822|) {
>     sa_learn($_->body_raw);
>   }
> }
> 
> 
> I've tested this with messages forwarded as attachment from 
> Outlook and Thunderbird.  I'm not sure how effective it is 
> though.  I'm sure that it still loses something in the 
> translation.  All IMAP is really the way to go if you can.
> 
> 
> Stuart Johnston

Would it be an idea to strip the Delivered-To header from the message, as
it carries no information for distinguishing between ham and spam?
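
A sketch of what that could look like in Python instead of Perl, using only
the standard email package. It pulls every message/rfc822 attachment out of a
forwarded mail and drops Delivered-To plus the X-Spam-*, X-Scanned-By and
X-Sanitized headers before the text is handed to sa-learn. The function names
and the exact header list are assumptions for illustration:

```python
from email import message_from_string

# Headers that say nothing about ham vs. spam (assumed list, adjust to taste).
DROP_EXACT = {"delivered-to", "x-scanned-by", "x-sanitized"}
DROP_PREFIXES = ("x-spam-",)


def strip_local_headers(message):
    """Remove delivery and scanner headers from a parsed message in place."""
    for name in set(message.keys()):
        lowered = name.lower()
        if lowered in DROP_EXACT or lowered.startswith(DROP_PREFIXES):
            del message[name]  # deletes every occurrence of that header


def attached_originals(raw_forward):
    """Return each message/rfc822 attachment of a forwarded mail as text."""
    outer = message_from_string(raw_forward)
    if not outer.is_multipart():
        return []  # forwarded inline, nothing attached
    originals = []
    for part in outer.get_payload():
        if part.get_content_type() == "message/rfc822":
            inner = part.get_payload(0)  # the attached message itself
            strip_local_headers(inner)
            originals.append(inner.as_string())
    return originals
```

Each returned string can then be written to a file and passed to
sa-learn --spam (or --ham), much like the sa_learn() call in the Perl snippet
quoted above.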

Also, I was wondering if anybody who is using spam-learn and ham-learn
addresses has any protection built in to stop non-system users from mailing
to those addresses?

Kind Regards,
Sander Holthaus



RE: Manually training SpamAssassin by forwarding mail

2005-02-04 Thread Sander Holthaus - Orange XL
> --On 02/04/05 16:08:53 +0100 Sander Holthaus - Orange XL wrote:
> > Basically, I've got two options. All mail that is received is backed up
> > on the mail server before any headers are added. I could match those
> > with mail received in the spam-learn and ham-learn accounts. However,
> > mail is backed up only for a limited amount of time before being moved,
> > after which the mail server no longer has access to it. So unless
> > people report mail that found its way through the filters on a very
> > regular basis, it won't be a foolproof solution.
> 
> You don't really need a 100% solution; something which works 
> 80% of the time would probably be fine.  But you may not want 
> to do the programming needed to automate this.

I don't have the time for it yet, but I should be able to make something in
Perl. Personally, I'm no big fan of the 80% rule in programming, as that last
undone 20% usually forms 80% of my problems :-)
 
> > The other option sounds more viable. I would only need to strip off
> > the X-Scanned-By, X-Spam-* and X-Sanitized headers (which are ignored
> > in my setup for bayes anyhow), BUT I have no guarantee that the
> > message is in its original format. Some MIME boundary rewriting may be
> > done by the mailserver (where necessary), as is converting 8bit to
> > 7bit where possible. And I think that there are many client-side
> > mail filtering engines, spam scanners and virus scanners out there that
> > may do some rewriting as well.
> 
> You'll probably find that the various changes don't affect 
> bayes that much. 
> When a re-written message is learned you may make bayes miss 
> email which (in an ideal world) it would have caught, but I 
> think it will tend to classify messages around 50% "I don't 
> know if this is ham or spam" rather than classifying it 
> incorrectly.  And there should be enough unchanged tokens in 
> the messages to let bayes work anyways.
> 
> So I say strip off what you can but don't obsess about the 
> rest.  Feed it into bayes and see how it works, and only try 
> to fix it if you see bayes misclassifying email.

I'm not sure I know of a good way to check whether Bayes is
misclassifying, but I should be able to get some of that information from the
logfiles. Perhaps throwing away mail that has been rewritten/reformatted
would be a solution, though I don't know if such mail can be recognized easily.
We'll see :-)
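
As a rough illustration of getting that information from the logfiles, here is a minimal Python sketch. It assumes the spamd "result:" line format quoted elsewhere in this archive (verdict, score, dash, comma-separated rule names) and assumes that "Y" marks spam and "." marks ham. The function name is hypothetical, and BAYES_00 / BAYES_99 are simply taken as the two extreme Bayes rules. It flags lines where Bayes and the final verdict disagree:

```python
import re

# Matches the part of a spamd log line of the form
#   spamd[11786]: result: Y 76 - BAYES_99,MANGLED_STOCK,...
# Assumption: "Y" marks spam and "." marks ham in the verdict field.
RESULT_RE = re.compile(
    r"spamd\[\d+\]: result: (?P<verdict>[Y.])\s+(?P<score>-?\d+) - (?P<tests>\S*)"
)


def bayes_disagreements(lines):
    """Yield (verdict, score, rule) where Bayes and the verdict disagree."""
    for line in lines:
        m = RESULT_RE.search(line)
        if not m:
            continue  # not a result line
        is_spam = m.group("verdict") == "Y"
        score = int(m.group("score"))
        for rule in m.group("tests").split(","):
            if rule == "BAYES_99" and not is_spam:
                # Bayes was sure it was spam, but the mail passed as ham.
                yield (".", score, rule)
            elif rule == "BAYES_00" and is_spam:
                # Bayes was sure it was ham, but the mail was tagged as spam.
                yield ("Y", score, rule)
```

A disagreement is only a hint that a message deserves a manual look; it does not say which side was wrong.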

Thanks for all the help and suggestions!

Kind Regards,
Sander Holthaus



RE: Relay Country

2005-02-04 Thread Sander Holthaus - Orange XL
> What does one need to do to activate the relay country tests? 
> We have the CPAN module installed and added this line to local.cf
> 
> loadplugin Mail::SpamAssassin::Plugin::RelayCountry
> 
> Do we need to add any scores or tests?
> 
> We are not yet seeing any evidence that the test is being used.

You should set it in init.pre, found in the same dir as local.cf 

In init.pre, you'll find:

# RelayCountry - add metadata for Bayes learning, marking the countries
# a message was relayed through
#
loadplugin Mail::SpamAssassin::Plugin::RelayCountry

That also answers your second question, why you are not seeing any
evidence: the plugin is used by the Bayes filter. The only thing you could
observe is a slight improvement in Bayes scores.

Kind regards,
Sander Holthaus



RE: Manually training SpamAssassin by forwarding mail

2005-02-04 Thread Sander Holthaus - Orange XL
> --On 02/04/05 09:17:55 -0400 Peter Marshall wrote:
> > My question is the same as Henrik's. I have a bunch of email that is
> > spam (either tagged by SpamAssassin or not tagged at all). I forwarded
> > it as an attachment to a "spam" mailbox. What do I have to do now
> > before I can get bayes to learn the message? I read that you have to
> > remove the headers. Could anyone give me a little more detail?
> 
> There's no 100% good way to do this; it depends on how the 
> message was mangled by the client (and possibly server).  The 
> only guaranteed way is (as I described) to save a copy at the 
> same point as it is inspected by SpamAssassin so you can use it later.
> 
> That being said, forwarding a message as an attachment will 
> usually preserve the headers pretty well.  The perl MailTools 
> and MIME-tools modules have procedures to pull out 
> attachments and save them in the Unix format which sa-learn wants.
> 
> Sorry I don't have any ready-made scripts for this; my users 
> dump messages into shared IMAP mailboxes which don't need any 
> preprocessing before being fed to sa-learn.
> 
>   -Kevin

Basically, I've got two options. All mail that is received is backed up on
the mailserver before any headers are added. I could match those copies with
mail received in the spam-learn and ham-learn accounts. However, mail is
backed up only for a limited amount of time before being moved, after which
the mail server no longer has access to it. So unless people report mail
that found its way through the filters on a very regular basis, it won't be
a foolproof solution.
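
A minimal Python sketch of this first option, matching reported mail against a short-lived backup by Message-ID (as Kevin suggested). Everything here is an assumption for illustration: the SQLite table layout, the function names, and the choice to key only on Message-ID, which spam sometimes omits or duplicates.

```python
import sqlite3
from email import message_from_bytes


def open_archive(path=":memory:"):
    """Create (if needed) a table mapping Message-ID to the raw message."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS archive ("
        " message_id TEXT PRIMARY KEY,"
        " raw BLOB NOT NULL,"
        " stored_at TEXT DEFAULT CURRENT_TIMESTAMP)"
    )
    return db


def message_id_of(raw):
    """Return the Message-ID header of a raw message, or None."""
    value = message_from_bytes(raw)["Message-ID"]
    return str(value).strip() if value else None


def store(db, raw):
    """Save an incoming message exactly as received; True if it had an ID."""
    mid = message_id_of(raw)
    if mid is None:
        return False
    with db:  # commits on success
        db.execute(
            "INSERT OR IGNORE INTO archive (message_id, raw) VALUES (?, ?)",
            (mid, raw),
        )
    return True


def recover(db, reported_raw):
    """Look up the untouched copy of a message that a user reported."""
    mid = message_id_of(reported_raw)
    if mid is None:
        return None
    row = db.execute(
        "SELECT raw FROM archive WHERE message_id = ?", (mid,)
    ).fetchone()
    return bytes(row[0]) if row else None


def expire(db, days):
    """Drop copies older than the given number of days."""
    with db:
        db.execute(
            "DELETE FROM archive WHERE stored_at < datetime('now', ?)",
            (f"-{int(days)} days",),
        )
```

The recovered bytes, not the forwarded copy, would then be handed to sa-learn. Reports that arrive after the backup has expired simply find nothing, which is the limitation described above.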

The other option sounds more viable. I would only need to strip off the
X-Scanned-By, X-Spam-* and X-Sanitized headers (which are ignored in my
setup for bayes anyhow), BUT I have no guarantee that the message is in its
original format. Some MIME boundary rewriting may be done by the mailserver
(where necessary), as is converting 8bit to 7bit where possible. And I think
that there are many client-side mail filtering engines, spam scanners and
virus scanners out there that may do some rewriting as well.
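
For the header-stripping part, a minimal Python sketch using only the standard library. The header names are the ones listed above; the function name is hypothetical. It does nothing about MIME boundary or 8bit/7bit rewriting, which cannot be undone this way.

```python
from email import message_from_bytes

# Headers named above; matching is case-insensitive.
EXACT = {"x-scanned-by", "x-sanitized"}
PREFIXES = ("x-spam-",)


def strip_local_headers(raw):
    """Return the message bytes without locally added scanner headers."""
    msg = message_from_bytes(raw)
    for name in list(msg.keys()):
        lower = name.lower()
        if lower in EXACT or lower.startswith(PREFIXES):
            # Deleting by name removes every occurrence of that header.
            del msg[name]
    return msg.as_bytes()
```

Note that re-serialising a message through a parser can itself change details such as line endings, so this is one more (small) rewrite of the original.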

From the above, I'm not sure that training SpamAssassin with forwarded
messages, which may or may not be in the original format in which SpamAssassin
first received them, is a good idea. But I don't have enough knowledge of
SpamAssassin's internal workings and its Bayes filter to be sure...

Kind Regards,
Sander Holthaus



broken line in maillog

2005-02-04 Thread Sander Holthaus - Orange XL



In my mail log files I came across the following line:

spamd[11786]: result: Y 76 -
BAYES_99,J_CHICKENPOX_101,J_CHICKENPOX_12,J_CHICKENPOX_13,J_CHICKENPOX_14,J_CHICKENPOX_15,J_CHICKENPOX_16,J_CHICKENPOX_21,J_CHICKENPOX_210,J_CHICKENPOX_22,J_CHICKENPOX_23,J_CHICKENPOX_24,J_CHICKENPOX_25,J_CHICKENPOX_26,J_CHICKENPOX_27,J_CHICKENPOX_29,J_CHICKENPOX_31,J_CHICKENPOX_32,J_CHICKENPOX_33,J_CHICKENPOX_34,J_CHICKENPOX_35,J_CHICKENPOX_36,J_CHICKENPOX_37,J_CHICKENPOX_41,J_CHICKENPOX_42,J_CHICKENPOX_43,J_CHICKENPOX_44,J_CHICKENPOX_45,J_CHICKENPOX_46,J_CHICKENPOX_47,J_CHICKENPOX_51,J_CHICKENPOX_52,J_CHICKENPOX_54,J_CHICKENPOX_62,J_CHICKENPOX_63,J_CHICKENPOX_65,J_CHICKENPOX_71,J_CHICKENPOX_74,J_CHICKENPOX_81,J_CHICKENPOX_91,MANGLED_AFFORD,MANGLED_FULL,MANGLED_INCLDN,MANGLED_LIPS,MANGLED_ONLINE,MANGLED_PLEASE,MANGLED_SMALL,MANGLED_SOLTNS,MANGLED_SPCALS,MANGLED_STOCK,MANGLED_TOOL,MANGLED_WHILE,MANGLED_WHLSAL,MANGLED_WRLDWD,NO_RDNS2,PERCENT_RANDOM,RATWARE_RCVD_LC_ESMTP,RCVD_HELO_IP_MISMATCH,RCVD_NUMERIC_HELO,SARE_MONEYTERMS,SARE_OEM_FAKE_YEAR,SARE_RAND_2,SARE_RECV_IP_220116,SA
 
The line ends quite abruptly. Is this due to a limit or error in
SpamAssassin, or should I look elsewhere?

Kind Regards,
Sander Holthaus


RE: Odd subject line spam

2005-02-03 Thread Sander Holthaus - Orange XL
> Hello,
> 
> We're seeing quite a few spam emails with subject lines 
> similar to the below...
> 
> "Better st0ck perfOrmance fr0m 0tc helpline"
> 
> Does anyone have a rule for these yet?
> 
> --
> Regards,
>  Matt 
> 

There are rules for those; however, they only seem to exist for the body.
The Mangled, Chickenpox and SARE_ADULT rule sets all hit on that line.
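
To make the pattern in that subject line concrete, here is a small Python sketch of the kind of check such a rule performs. It is only an illustration, not how the Mangled or Chickenpox rules are actually written, and the function name is hypothetical. It picks out words that mix letters with the digit 0:

```python
import re


def zero_for_o_words(subject):
    """Return words that mix letters with the digit 0, e.g. "st0ck"."""
    words = re.findall(r"[A-Za-z0-9]+", subject)
    return [w for w in words if "0" in w and any(c.isalpha() for c in w)]
```

Legitimate tokens such as "win2000" would also match, so on its own a check like this would only deserve a low score.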

Kind Regards,
Sander Holthaus



RE: spamassassin scoring message twice

2005-02-03 Thread Sander Holthaus - Orange XL
What kind of setup are you using? What do you do when an email is tagged as
spam? SpamAssassin ran twice, but because of the -2.8 ALL_TRUSTED hit, I would
say that it is a configuration issue in how you quarantine spam.

Kind Regards,
Sander Holthaus

> -----Original Message-----
> From: Peter Marshall [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, February 03, 2005 2:36 PM
> To: users@spamassassin.apache.org
> Subject: spamassassin scoring message twice
> 
> I am not sure why it is doing this ... but every time I get a
> spam, it looks like it does the spam rating twice.  And it
> gives different scores each time.  Here is the new header
> from the last email I got.  Notice how it looks like
> SpamAssassin ran twice.  Any ideas?  (Yes, my threshold
> is low ... I am just testing what happens when spam arrives.)
> 
> --
> Spam detection software, running on the system 
> "mailtestlx.mydomain.com", has identified this incoming email 
> as possible spam.  The original message has been attached to 
> this so you can view it (if it isn't spam) or label similar 
> future email.  If you have any questions, see 
> [EMAIL PROTECTED] for details.
> 
> Content preview:  Spam detection software, running on the system
>   "mailtestlx.mydomain.com", has identified this incoming email
>   as possible spam. The original message has been attached to this so
>   you can view it (if it isn't spam) or label similar future email. If
>   you have any questions, see [EMAIL PROTECTED] for details. [...]
> 
> Content analysis details:   (8.6 points, 3.0 required)
> 
>  pts rule name              description
> ---- ---------------------- ----------------------------------------
>  0.5 FROM_ENDS_IN_NUMS      From: ends in numbers
>  0.9 PLING_QUERY            Subject has exclamation mark and question mark
> -2.8 ALL_TRUSTED            Did not pass through any untrusted hosts
>  1.1 FORGED_HOTMAIL_RCVD2   hotmail.com 'From' address, but no 'Received:'
>  0.8 BODY_ENHANCEMENT2      BODY: Information on getting larger body parts
>  0.2 HTML_TEXT_AFTER_HTML   BODY: HTML contains text after HTML close tag
>  0.2 HTML_TEXT_AFTER_BODY   BODY: HTML contains text after BODY close tag
>  0.3 MIME_HTML_MOSTLY       BODY: Multipart message mostly text/html MIME
>  0.0 HTML_MESSAGE           BODY: HTML included in message
>  0.5 HTML_OBFUSCATE_05_10   BODY: Message is 5% to 10% HTML obfuscation
>  1.5 MPART_ALT_DIFF         BODY: HTML and text parts are different
>  0.2 HTML_90_100            BODY: Message is 90% to 100% HTML
>  0.0 HTML_TITLE_EMPTY       BODY: HTML title contains no text
>  0.1 MIME_BASE64_TEXT       RAW: Message text disguised using base64 encoding
>  0.8 MIME_BASE64_BLANKS     RAW: Extra blank lines in base64 encoding
>  1.2 OBFUSCATING_COMMENT    HTML comments which obfuscate text
>  3.1 PERCENT_RANDOM         PERCENT_RANDOM
> 
> The original message was not completely plain text, and may 
> be unsafe to open with some email clients; in particular, it 
> may contain a virus, or confirm that your address can receive 
> spam.  If you wish to view it, it may be safer to save it to 
> a file and open it with an editor.
> 
> 
> 
> 
> Subject:
> ???SPAM??? FW: Get it now!
> From:
> "Joe" <[EMAIL PROTECTED]>
> Date:
> Wed, 02 Feb 2005 22:17:00 -0400
> To:
> [EMAIL PROTECTED]
> 
> Spam detection software, running on the system 
> "mailtestlx.mydomain.com", has identified this incoming email 
> as possible spam.  The original message has been attached to 
> this so you can view it (if it isn't spam) or label similar 
> future email.  If you have any questions, see 
> [EMAIL PROTECTED] for details.
> 
> Content preview:  >From: "Fastest Penis Growth Available" To:
>   [EMAIL PROTECTED] Subject: >Get it now! >Date: Mon, 31 Jan 2005
>   09:21:56 -0800 > Nah, it's not what i'm looking for. clickhere . .
>   [...]
> 
> Content analysis details:   (9.6 points, 3.0 required)
> 
>  pts rule name              description
> ---- ---------------------- ----------------------------------------
>  0.5 FROM_ENDS_IN_NUMS      From: ends in numbers
>  0.8 BODY_ENHANCEMENT2      BODY: Information on getting larger body parts
>  0.2 HTML_TEXT_AFTER_HTML   BODY: HTML contains text after HTML close tag
>  0.2 HTML_TEXT_AFTER_BODY   BODY: HTML contains text after BODY close tag
>  0.3 MIME_HTML_MOSTLY       BODY: Multipart message mostly text/html MIME
>  0.0 HTML_MESSAGE           BODY: HTML included in message
>  0.5 HTML_OBFUSCATE_05_10   BODY: Message is 5% to 10% HTML obfuscation
>  1.5 MPART_ALT_DIFF         BODY: HTML and text parts are different
>  0.2 HTML_90_100            BODY: Message is 90% to 100% HTML
>  0.0 HTML_TITLE_EMPTY       BODY: HTML title contains no text
>  0.1 MIME_BASE64_TEXT       RAW: Message text disguised using base64 encoding
>  0.8 MIME_BASE64_BLANKS     RAW: Extra blank lines in base64 encoding
> 

RE: bayes: bayes db version 2 is not able to be used, aborting!

2005-02-03 Thread Sander Holthaus - Orange XL
My first guess would be that you have two Bayes databases on your system, and
that spamassassin running as root is not looking in the same place for the
Bayes database as spamd is. When upgrading from 2.64 to 3.0.2 I had a similar
issue, where 2.64 was using the virtual mail user's home directory for its
files (such as the Bayes database) whereas 3.0.2 started using root's home
directory.

Kind Regards,
Sander Holthaus 

> -----Original Message-----
> From: Kevin Blackwell [mailto:[EMAIL PROTECTED] 
> Sent: Thursday, February 03, 2005 8:48 AM
> To: users@spamassassin.apache.org
> Subject: bayes: bayes db version 2 is not able to be used, aborting!
> 
> I'm running Debian stable and I updated SpamAssassin from 2.63 to 3.0.
> It seems to be running fine, but I keep getting this error.
> 
> bayes: bayes db version 2 is not able to be used, aborting! 
> at /usr/share/perl5/Mail/SpamAssassin/BayesStore/DBM.pm line 
> 160,  line 44.
> 
> I've seen the link to this problem on the SpamAssassin wiki, but
> it also says that if, after you run
> 
> sa-learn -D --sync
> 
> the message doesn't go away, you should post to this group. If anyone
> can help, that would be great.
> 
> Here's the output of sa-learn -D --sync
> 
> debug: SpamAssassin version 3.0.2
> debug: Score set 0 chosen.
> debug: running in taint mode? yes
> debug: Running in taint mode, removing unsafe env vars, and 
> resetting PATH
> debug: PATH included '/usr/local/sbin', keeping.
> debug: PATH included '/usr/local/bin', keeping.
> debug: PATH included '/usr/sbin', keeping.
> debug: PATH included '/usr/bin', keeping.
> debug: PATH included '/sbin', keeping.
> debug: PATH included '/bin', keeping.
> debug: PATH included '/usr/bin/X11', keeping.
> debug: Final PATH set to:
> /usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/bin/X11
> debug: using "/etc/spamassassin/init.pre" for site rules init.pre
> debug: config: read file /etc/spamassassin/init.pre
> debug: using "/usr/share/spamassassin" for default rules dir
> debug: config: read file /usr/share/spamassassin/10_misc.cf
> debug: config: read file /usr/share/spamassassin/20_anti_ratware.cf
> debug: config: read file /usr/share/spamassassin/20_body_tests.cf
> debug: config: read file /usr/share/spamassassin/20_compensate.cf
> debug: config: read file /usr/share/spamassassin/20_dnsbl_tests.cf
> debug: config: read file /usr/share/spamassassin/20_drugs.cf
> debug: config: read file /usr/share/spamassassin/20_fake_helo_tests.cf
> debug: config: read file /usr/share/spamassassin/20_head_tests.cf
> debug: config: read file /usr/share/spamassassin/20_html_tests.cf
> debug: config: read file /usr/share/spamassassin/20_meta_tests.cf
> debug: config: read file /usr/share/spamassassin/20_phrases.cf
> debug: config: read file /usr/share/spamassassin/20_porn.cf
> debug: config: read file /usr/share/spamassassin/20_ratware.cf
> debug: config: read file /usr/share/spamassassin/20_uri_tests.cf
> debug: config: read file /usr/share/spamassassin/23_bayes.cf
> debug: config: read file /usr/share/spamassassin/25_body_tests_es.cf
> debug: config: read file /usr/share/spamassassin/25_hashcash.cf
> debug: config: read file /usr/share/spamassassin/25_spf.cf
> debug: config: read file /usr/share/spamassassin/25_uribl.cf
> debug: config: read file /usr/share/spamassassin/30_text_de.cf
> debug: config: read file /usr/share/spamassassin/30_text_fr.cf
> debug: config: read file /usr/share/spamassassin/30_text_nl.cf
> debug: config: read file /usr/share/spamassassin/30_text_pl.cf
> debug: config: read file /usr/share/spamassassin/50_scores.cf
> debug: config: read file /usr/share/spamassassin/60_whitelist.cf
> debug: config: read file /usr/share/spamassassin/65_debian.cf
> debug: using "/etc/spamassassin" for site rules dir
> debug: config: read file /etc/spamassassin/Chinese_rules.cf
> debug: config: read file /etc/spamassassin/antidrug.cf
> debug: config: read file /etc/spamassassin/chickenpox.cf
> debug: config: read file /etc/spamassassin/local.cf
> debug: config: read file /etc/spamassassin/rolex.cf
> debug: config: read file /etc/spamassassin/sa-blacklist.current.uri.cf
> debug: using "/root/.spamassassin/user_prefs" for user prefs file
> debug: config: read file /root/.spamassassin/user_prefs
> debug: plugin: loading Mail::SpamAssassin::Plugin::URIDNSBL from @INC
> debug: plugin: registered 
> Mail::SpamAssassin::Plugin::URIDNSBL=HASH(0x8a47634)
> debug: plugin: loading Mail::SpamAssassin::Plugin::Hashcash from @INC
> debug: plugin: registered 
> Mail::SpamAssassin::Plugin::Hashcash=HASH(0x8a45378)
> debug: plugin: loading Mail::SpamAssassin::Plugin::SPF from @INC
> debug: plugin: registered 
> Mail::SpamAssassin::Plugin::SPF=HASH(0x8a46a28)
> debug: plugin: Mail::SpamAssassin::Plugin::URIDNSBL=HASH(0x8a47634)
> implements 'parse_config'
> debug: plugin: Mail::SpamAssassin::Plugin::Hashcash=HASH(0x8a45378)
> implements 'parse_config'
> debug: bayes: 12444 tie-ing to DB file R/O 
> /root/.spamassassin/bayes_toks
> debug: bayes: 12444 tie-ing to DB file R/O 
> /root

SpamAssassin 3 memory usage

2005-02-03 Thread Sander Holthaus - Orange XL
I've noticed that my current memory consumption of spamd (3.x), when using a
number of custom rule-sets such as SARE, is relatively high (~50MB according
to ps). When running with a large number of children, this would consume
quite a large portion of memory.
Or am I wrong here, and is a portion of that 50MB per child actually shared?
 
Kind Regards,
Sander Holthaus



RE: Manually training SpamAssassin by forwarding mail

2005-02-03 Thread Sander Holthaus - Orange XL
> At 07:59 PM 2/2/2005, Sander Holthaus - Orange XL wrote:
> >I've been interested in offering customers a way to manually train the
> >SpamAssassin Bayes filter for ham and spam (to reduce false positives
> >and negatives). However, I can only find documentation on this for
> >local mailboxes and IMAP. Most users, however, retrieve their mail
> >through POP and use Outlook (Express) as their mail client. Is there a
> >way to train SpamAssassin with such a setup (e.g. forwarding mail with
> >Outlook (Express) using SMTP)?
> >

Matt Kettler wrote:
> 
> Only if you can somehow get the users to forward an un-mangled message,
> complete with original headers, as an attachment. You can then have a
> script strip off the attachments and feed those to sa-learn.
> 
> The fundamental problem with normal forwarding is that, from an SA
> perspective, the forwarded message looks very little like the original:
> new headers, different encoding, extra text often added to the body,
> superfluous MIME sections dropped, others added.
> 
> Since SA learns from the message headers, and some of the message
> encoding has an impact on learning, these changes cause problems.
>

Will Yardley wrote:
> There are various schemes to do this; the tricky part is 
> getting people to submit emails in a consistent format - if 
> you can get them to forward them as message/rfc822 
> attachments, it probably wouldn't be too hard to write a 
> program to extract them and train... I imagine this would be 
> too complicated for many users, though.
> 
> One scheme that we've used is to have specially named IMAP 
> folders that users can place mis-classified emails in for 
> training.. then you can have a server-side robot which trains 
> the filter and then discards the emails.


Thanks, I figured that that would be the problem. It makes it pretty hard, if
not impossible, to create such a system for average users. I was hoping that
SpamAssassin would include a system similar to DSPAM.

On the side, if I did get such a system working (where users are able to
forward emails untouched and I am able to extract those messages for
sa-learn), could I expect problems with some locally added headers? For
instance, headers added when the message passes through a local anti-spam or
anti-virus proxy. Or, in the case of IMAP, when users flag messages (or when
they are flagged automatically) before moving them to a learn-ham / learn-spam
folder?
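
For the extraction step, a minimal Python sketch using only the standard library. It pulls every message/rfc822 attachment out of a forwarded message, the same idea as the Email::MIME loop posted elsewhere in this thread. The function name is hypothetical, and what happens to the extracted bytes afterwards (for example, writing them to a mailbox that sa-learn reads) is left out.

```python
from email import message_from_bytes


def forwarded_originals(raw):
    """Return the raw bytes of every message/rfc822 attachment in raw."""
    outer = message_from_bytes(raw)
    originals = []
    for part in outer.walk():
        if part.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a list holding the
            # attached message as a parsed Message object.
            for inner in part.get_payload():
                originals.append(inner.as_bytes())
    return originals
```

Because `walk()` also descends into the attached messages, a forward of a forward yields both levels; whether that is wanted depends on how users report mail. A message forwarded inline, rather than as an attachment, yields nothing.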

Kind Regards,
Sander Holthaus



Manually training SpamAssassin by forwarding mail

2005-02-03 Thread Sander Holthaus - Orange XL



I've been interested in offering customers a way to manually train the
SpamAssassin Bayes filter for ham and spam (to reduce false positives and
negatives). However, I can only find documentation on this for local
mailboxes and IMAP. Most users, however, retrieve their mail through POP and
use Outlook (Express) as their mail client. Is there a way to train
SpamAssassin with such a setup (e.g. forwarding mail with Outlook (Express)
using SMTP)?

Kind Regards,
Sander Holthaus


Attempt to free unreferenced scalar: SV 0xbb91874.

2005-02-03 Thread Sander Holthaus - Orange XL



Yesterday, I saw the following message in my logs after shutting down spamd:

    Attempt to free unreferenced scalar: SV 0xbb91874.

I have no clue as to what it means. Can anyone enlighten me? I'm using
SpamAssassin 3.0.2, Perl 5.8.5 and FreeBSD 4.10.

Kind Regards,
Sander Holthaus