Re: DCC in Spamassassin

2005-03-13 Thread Norman Zhang
I also uncommented the DCCIFD_ARGS line.
# used to start dccifd
#   a common value is
#   DCCIFD_ARGS=-SHELO -Smail_host -SSender -SList-ID
DCCIFD_ARGS=
My DCCIFD_ARGS is empty. Should I add the options that are shown above?
Was there a dccd file created in /etc/init.d as part of the
installation process for dcc? It starts dccd, grey, and dccifd. Here's
# ls -l /etc/rc.d/init.d/
-rwx------  1 root root  1406 Mar  1 18:44 amavisd*
-rwx------  1 root root  1101 Jan 28 08:26 clamd*
-rwx------  1 root root  3266 Sep 28 03:32 dccd*
-rwx------  1 root root  1219 Jan 28 08:26 freshclam*
Someone pointed out to me I should look for rcdcc, but I only have
# slocate cdcc
/usr/bin/cdcc
Should I use this instead? BTW do I need to set use_dcc = 0 if I want to 
use dccifd?

Regards,
Norman Zhang


Re: SA addr tests need to be updated

2005-03-13 Thread List Mail User
...
Date: Sat, 12 Mar 2005 18:46:52 -0500
From: Eric A. Hall [EMAIL PROTECTED]
To: users@spamassassin.apache.org
Subject: Re: SA addr tests need to be updated
...

After considering all the discussion, I've filed these three bugs:

 4188--RCVD_HELO_IP_MISMATCH should check address literals (this was
   argued against by Justin, but I'm convinced it's spam-sign)

 4186--RCVD_NUMERIC_HELO does not test reserved addresses (they are
   still 'numeric' and aren't hostnames, and should still hit)

 4187--RCVD_ILLEGAL_IP does not fire in all cases (reserved, malformed,
   and literals should all be tested, but aren't)

The rest of it can stay where it is and still be useful

Thanks

-- 
Eric A. Hall             http://www.ehsco.com/
Internet Core Protocols  http://www.oreilly.com/catalog/coreprot/

 
Eric,

I know what I say certainly holds no authority, but I clearly agree
with 4186 and 4187.  And if you mean literals unqualified by brackets, I
not only agree with 4188, but would argue that it and the others should be
promoted to be DSN_ style rules and that the finding of unbracketed numeric
HELO/EHLOs anywhere in the received chain is an *excellent* spam-sign
(especially when forged one or two levels below the relay machine).
For 4186 and 4187, it would seem that brackets are irrelevant - you are
correct that all cases should be tested.

The only exception I would make, if they were DSN_* rules, would be
a -notfirsthop qualifier for RFC1918 IP hosts and rule #4186 since they are
so common for internal corporate networks running DHCP.
 
Paul Shupak
[EMAIL PROTECTED]
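
The distinctions drawn in this thread (bracketed address literal, bare numeric HELO, RFC 1918 address) can be made concrete with a small Python sketch. This is only an illustration under stated assumptions, not how SpamAssassin implements RCVD_NUMERIC_HELO or RCVD_ILLEGAL_IP; the helper name `classify_helo` and its category labels are invented for the example, and only IPv4 is considered.

```python
import ipaddress

# RFC 1918 private ranges, listed explicitly rather than relying on
# ipaddress.is_private, which also covers other special-purpose ranges.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def classify_helo(helo: str) -> str:
    """Classify a HELO/EHLO argument. Hypothetical helper; labels are illustrative."""
    text = helo.strip()
    bracketed = text.startswith("[") and text.endswith("]")
    candidate = text[1:-1] if bracketed else text
    try:
        addr = ipaddress.IPv4Address(candidate)
    except ValueError:
        # Not a valid dotted quad: a bad literal if bracketed, else treat as a hostname.
        return "malformed-literal" if bracketed else "hostname"
    kind = "literal" if bracketed else "bare-numeric"
    if any(addr in net for net in RFC1918_NETWORKS):
        kind += "/rfc1918"
    return kind


for helo in ("[192.168.1.5]", "10.1.2.3", "[8.8.8.8]", "mail.example.com"):
    print(helo, "->", classify_helo(helo))
```

A "bare-numeric" result corresponds to the unbracketed numeric HELO described above as a strong spam sign, while the "/rfc1918" suffix marks the internal-network case for which a first-hop exception was suggested.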


Re: DCC in Spamassassin

2005-03-13 Thread Matt Kettler
At 10:07 PM 3/12/2005, Norman Zhang wrote:
 Someone pointed out to me I should look for rcdcc, but I only have
 # slocate cdcc
 /usr/bin/cdcc

I'm not too familiar with the scripts that come with DCC for this. I just
wrote my own init script to start dccifd.

 Should I use this instead? BTW do I need to set use_dcc = 0 if I want to
 use dccifd?

No. SA will use neither dccproc nor dccifd if you do that. The use_dcc
option completely enables or completely disables all of DCC at once.




RE: DCC in Spamassassin

2005-03-13 Thread Greg Allen
My DCC on Red Hat 8.0 is located in the

/var/dcc

directory. You should see dccifd there as well.

I had to change a line in /var/dcc/dcc_conf to

DCCIFD_ENABLE=ON

Also, make sure you install the latest DCC with dccifd from the Rhyolite
website (if you do not have it).

Put the path to DCC in your local SpamAssassin config file (local.cf) just
to be safe:

dcc_home /var/dcc

Do all of this and SpamAssassin should automatically use dccifd instead of
dccproc when it detects it.










What's wrong with my test?

2005-03-13 Thread shirlei


Hi everyone!
This is probably a stupid question, but here it goes anyway:
I saved some messages that I had received classified as not spam in a
folder. Then I ran this command:
sa-learn --spam <path to folder>
and got the following output:
Learned from 2 message(s) (3 message(s) examined).

Then I redirected those messages to myself, and I received them as not spam
again! Am I doing something wrong? What is wrong with my test?
Thanks for your attention.
Bye



Re: DCC in Spamassassin

2005-03-13 Thread Bill Randle
On Sat, 2005-03-12 at 19:07 -0800, Norman Zhang wrote:
  I also uncommented the DCCIFD_ARGS line.
 
 # used to start dccifd
 #   a common value is
 #   DCCIFD_ARGS=-SHELO -Smail_host -SSender -SList-ID
 DCCIFD_ARGS=
 
 My DCCIFD_ARGS is empty. Should I add the options that is shown above?

I would.

  Was there a dccd file created in /etc/init.d as part of the
  installation process for dcc? It starts dccd, grey, and dccifd. Here's
 
 # ls -l /etc/rc.d/init.d/
 -rwx------  1 root root  1406 Mar  1 18:44 amavisd*
 -rwx------  1 root root  1101 Jan 28 08:26 clamd*
 -rwx------  1 root root  3266 Sep 28 03:32 dccd*
 -rwx------  1 root root  1219 Jan 28 08:26 freshclam*
 
 Someone pointed out to me I should look for rcdcc, but I only have
 
 # slocate cdcc
 /usr/bin/cdcc
 
 Should I use this instead? BTW do I need to set use_dcc = 0 if I want to 
 use dccifd?

I don't have rcdcc either. With SpamAssassin, use dccifd as previously
mentioned. Once you have edited the dcc_conf file to enable dccifd, start it
using the init script:
# /etc/init.d/dccd start

I see you also have amavisd installed. If you run spamassassin from
amavisd you will need to reload it as well:
# amavisd reload

This will force it to re-read the spamassassin config files and pick up
your use_dcc change (should be set to 1) and look for the dccifd socket.

-Bill




Re: What's wrong with my test?

2005-03-13 Thread shirlei
Quoting Matt Kettler [EMAIL PROTECTED]:

 At 11:57 PM 3/12/2005, you wrote:
 So, I redirect those messages to me and I received them as no spam again!

 Define exactly what you mean by redirect those messages. What specific
 actions did you do?

I used the redirect tool in the webmail client (Horde).

 Also, tell us a bit about how your mail gets scanned for spam. What tools
 do you use? procmailrc? a milter? etc.

I'm just testing and learning about SpamAssassin. I was not the one who
installed and configured all the tools, but qmail is used. I connected to
the server as user spamd and ran the command I mentioned in my last
message. Did I miss something? What other information would be useful?






Re: [Slight OT] Problems with perl modules req for rpmbuild -tb Mail-SpamAssassin-3.0.2.tar.gz

2005-03-13 Thread Bob Proulx
Jens Ahlin wrote:
 When trying to build rpm using rpmbuild -tb Mail-SpamAssassin-3.0.2.tar.gz
 fails with
 error: Failed build dependencies:
 perl(Digest::SHA1) is needed by spamassassin-3.0.2-1
 perl(HTML::Parser) is needed by spamassassin-3.0.2-1
 
 After installing these modules using CPAN rpmbuild still fails with the
 same error.

You are mixing CPAN installations and RPM installations.  RPM does not
know about CPAN.  Once you have made the decision to install from CPAN
you are committed to installing everything from CPAN.  You should
either install everything with CPAN or install everything with RPM.
Don't mix them.  The person with two clocks is never really sure of
the current time.

Since you have been installing perl modules by CPAN you should
probably just continue and install spamassassin by CPAN too.  Which
means you don't need to build an rpm package.  But if you want rpm to
know that you have those perl modules installed then you need to
install them by RPM.

 Installing these modules from RPMs solves the problem. Why doesn't rpmbuild
 find the modules installed using CPAN?

Your question is rather like Bilbo Baggins asking "What do I have in
my pocket?"  How would RPM know what you have installed by CPAN?

Bob


Re: rule for mail not to me

2005-03-13 Thread Vicki Brown
At 20:15 -0800 03/06/2005, Vicki Brown wrote:
I can create a user rule for mail not addressed (To or Cc) to me

  header CF_NOT_FOR_ME ToCc !~ /[EMAIL PROTECTED]/
  score CF_NOT_FOR_ME 4.0
  describe CF_NOT_FOR_ME  Neither To nor Cc me

However, the still-not-addressed user scores bug prevents me from setting the
score any higher than 1 for these.
   http://bugzilla.spamassassin.org/show_bug.cgi?id=4121



Many thanks to the SpamAssassin development team for fixing bug 4121!

However, I'm still interested in knowing:

is there a magic variable for my address that would allow me to set up a
general site-wide rule of this type?

-- 
Vicki Brown  ZZZ
Journeyman Sourceror:  zz  |\ _,,,---,,_ Code, Docs, Process,
Scripts  Philtres  zz /,`.-'`'-.  ;-;;,_   Perl, WWW, Mac OS X
http://cfcl.com/vlb   |,4-  ) )-,_. ,\ ( `'-'   SF Bay Area, CA  USA
___  '---''(_/--'  `-'\_)  ___
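
For readers unfamiliar with the rule quoted in this message, the following Python sketch shows the logic it expresses: the message is flagged when a given address appears in neither To nor Cc. It is a rough approximation only. SpamAssassin's ToCc pseudo-header is matched with a regular expression, whereas this sketch parses the addresses; the function name `not_addressed_to` and the example addresses are hypothetical.

```python
from email import message_from_string
from email.utils import getaddresses


def not_addressed_to(raw_message: str, my_address: str) -> bool:
    """Return True when my_address appears in neither To nor Cc (illustrative helper)."""
    msg = message_from_string(raw_message)
    recipients = msg.get_all("To", []) + msg.get_all("Cc", [])
    # getaddresses() splits "Name <addr>" lists into (name, addr) pairs.
    addresses = {addr.lower() for _name, addr in getaddresses(recipients)}
    return my_address.lower() not in addresses


raw = (
    "From: sender@example.org\n"
    "To: Some List <list@example.org>\n"
    "Cc: other@example.org\n"
    "Subject: hello\n"
    "\n"
    "body\n"
)
print(not_addressed_to(raw, "me@example.com"))
```

The address is passed in as a parameter here, which is the part a site-wide rule would need to vary per user.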


Bayes DB does not grow anymore

2005-03-13 Thread GRP Productions
Hello,
For some days now my Bayesian DB does not seem to grow. Its size remains
stable. It is read with no problems by SA 3.0.2, but nothing new is written.
I send an email to myself, and it is classified as BAYES_50. I sa-learn it as
spam, send it again, and it is still BAYES_50 (I expected to see it as
BAYES_99).

I use SpamAssassin 3.0.2. No configuration change has been made recently. It
used to work fine.
I've tried --sync and --force-expire, but no luck.
Any help would be appreciated.
Thanks
Greg




Re: [Slight OT] Problems with perl modules req for rpmbuild -tb Mail-SpamAssassin-3.0.2.tar.gz

2005-03-13 Thread Jens Ahlin
 Jens Ahlin wrote:
 When trying to build rpm using rpmbuild -tb
 Mail-SpamAssassin-3.0.2.tar.gz
 fails with
 error: Failed build dependencies:
 perl(Digest::SHA1) is needed by spamassassin-3.0.2-1
 perl(HTML::Parser) is needed by spamassassin-3.0.2-1

 After installing these modules using CPAN rpmbuild still fails with the
 same error.

 You are mixing CPAN installations and RPM installations.  RPM does not
 know about CPAN.  Once you have made the decision to install from CPAN
 you are committed to installing everything from CPAN.  You should
 either install everything with CPAN or install everything with RPM.
 Don't mix them.  The person with two clocks is never really sure of
 the current time.

 Since you have been installing perl modules by CPAN you should
 probably just continue and install spamassassin by CPAN too.  Which
 means you don't need to build an rpm package.  But if you want rpm to
 know that you have those perl modules installed then you need to
 install them by RPM.

 Installing these modules from RPMs solves the problem. Why doesn't
 rpmbuild find the modules installed using CPAN?

 Your question is rather like Bilbo Baggins asking "What do I have in
 my pocket?"  How would RPM know what you have installed by CPAN?

 Bob



Thanks for the clarification.

 Jens



Re: [SURBL-Discuss] List of spamvertised sites sent via zombies, open proxies, etc.?

2005-03-13 Thread Jeff Chan
On Friday, March 11, 2005, 11:27:52 PM, Jeff Chan wrote:
 Does anyone have or know about a list of spam-advertised URIs
 where the spam they appeared in was sent through open relays,
 zombies, open proxies, etc.  In other words does anyone know
 of a list of spamvertised web sites or their domains that's
 been cross referenced to exploited hosts?

 We could use that information as a valuable tool for getting
 more records into SURBLs.

One fairly easy way for anyone running a large SpamAssassin
installation to help us get this data would be to simply grep
for XBL and SURBL rules hitting the same message and report
out the URI domains from those messages.

Perhaps some kind person could write a reporting function in
SpamAssassin for this?

Jeff C.
--
If it appears in hams, then don't list it.



Was: List of spamvertised sites sent via zombies, open proxies, etc.?

2005-03-13 Thread Jeff Chan
On Sunday, March 13, 2005, 5:12:30 AM, Jeff Chan wrote:
 On Friday, March 11, 2005, 11:27:52 PM, Jeff Chan wrote:
 Does anyone have or know about a list of spam-advertised URIs
 where the spam they appeared in was sent through open relays,
 zombies, open proxies, etc.  In other words does anyone know
 of a list of spamvertised web sites or their domains that's
 been cross referenced to exploited hosts?

 We could use that information as a valuable tool for getting
 more records into SURBLs.

 One fairly easy way for anyone running a large SpamAssassin
 installation to help us get this data would be to simply grep
 for XBL and SURBL rules hitting the same message and report
 out the URI domains from those messages.

 Perhaps some kind person could write a reporting function in
 SpamAssassin for this?

Hmm, perhaps if we could extract *all* URI domains from messages
sent through XBLed senders then prioritize those say by frequency
of appearance, we could create a new SURBL list of spamvertised
domains sent through exploited hosts.  That would pretty directly
address the use of zombies, etc. and put a penalty on using them
to advertise sites through them.  Even with volume weighting, such
a list of sites could be attacked by a major joe job unless we took
additional countermeasures, but does anyone else think this might
be a useful type of data source for SURBLs?

Jeff C.
--
If it appears in hams, then don't list it.
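
The idea in this message (collect URI domains from messages that arrived through XBL-listed hosts, then rank them by how often they appear) could be prototyped roughly as below. This is a sketch under several assumptions: the input is taken to be pairs of (set of rule names that hit, message body), the rule name `RCVD_IN_XBL` is used as the marker, URIs are found with a simple regular expression, and full hostnames are counted rather than registered domains. The function name `rank_uri_domains` is made up for the example.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Deliberately simple URI matcher; real mail needs far more careful parsing.
URI_RE = re.compile(r"https?://[^\s<>\"']+", re.IGNORECASE)


def _host(uri):
    """Return the lower-cased hostname of a URI, or None if it cannot be parsed."""
    try:
        return urlparse(uri).hostname
    except ValueError:
        return None


def rank_uri_domains(messages, marker_rule="RCVD_IN_XBL"):
    """messages: iterable of (rules_hit, body). Returns [(host, count)], most frequent first."""
    counts = Counter()
    for rules_hit, body in messages:
        if marker_rule not in rules_hit:
            continue  # only messages relayed through a listed (exploited) host
        # Count each host once per message so one message cannot inflate a host.
        hosts = {_host(uri) for uri in URI_RE.findall(body)}
        hosts.discard(None)
        counts.update(hosts)
    return counts.most_common()


sample = [
    ({"RCVD_IN_XBL", "URIBL_SC_SURBL"}, "buy http://pills.example.com/a or http://pills.example.com/b"),
    ({"RCVD_IN_XBL"}, "see http://pills.example.com/ and http://other.example.net/x"),
    ({"BAYES_50"}, "newsletter http://good.example.org/"),
]
for host, count in rank_uri_domains(sample):
    print(count, host)
```

Counting per message rather than per URI is one simple form of volume weighting; it does nothing by itself against the joe-job risk raised above.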



Re: Bayes DB does not grow anymore

2005-03-13 Thread Kai Schaetzl
GRP Productions wrote on Sun, 13 Mar 2005 11:21:12 +0200:

 for some days now my bayesian DB does not seem to grow. Its size remains 
 stable. It is read with no problems by SA 3.0.2, but nothing new is written. 
 I send an email to me, it is classified as BAYES_50. I sa-learn it as spam, 
 send it again, and it is still BAYES_50 (I expected to see it as BAYES_99).


This doesn't prove anything. sa-learn --dump magic shows you what's inside.
Also, Bayes is not a checksum system like Razor; that's its strength. When
you teach it a message, it extracts tokens (short pieces) from the message
and adjusts its internal probability of each being ham or spam by a certain
factor. If it doesn't know a token yet, it adds it.
The size may not grow for several reasons, e.g. expiry, or the fact that
the db format seems to have some air in it, so that it grows in jumps
and not continuously.

Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de  http://msie.winware.org
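
To illustrate why learning a single message does not flip its classification, here is a deliberately simplified token-based classifier in Python. It is a toy naive-Bayes sketch, not SpamAssassin's actual Bayes implementation (which tokenizes and combines probabilities differently); the class name `ToyBayes`, the tokenizer, and the smoothing constants are all assumptions made for the example.

```python
import math
import re
from collections import Counter


class ToyBayes:
    """Toy token classifier; illustrative only, not SpamAssassin's algorithm."""

    def __init__(self):
        self.spam_tokens = Counter()
        self.ham_tokens = Counter()
        self.nspam = 0
        self.nham = 0

    @staticmethod
    def tokenize(text):
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def learn(self, text, is_spam):
        tokens = self.tokenize(text)
        if is_spam:
            self.spam_tokens.update(tokens)
            self.nspam += 1
        else:
            self.ham_tokens.update(tokens)
            self.nham += 1

    def token_prob(self, token):
        # Smoothed share of spam vs ham messages that contained this token.
        s = (self.spam_tokens[token] + 1) / (self.nspam + 2)
        h = (self.ham_tokens[token] + 1) / (self.nham + 2)
        return s / (s + h)

    def score(self, text):
        # Combine the per-token probabilities in log-odds space.
        log_odds = sum(
            math.log(p / (1 - p)) for p in map(self.token_prob, self.tokenize(text))
        )
        return 1 / (1 + math.exp(-log_odds))


bayes = ToyBayes()
for _ in range(50):
    bayes.learn("meeting report", is_spam=False)
before = bayes.score("meeting report")
bayes.learn("meeting report", is_spam=True)
after = bayes.score("meeting report")
print(f"before={before:.3f} after={after:.3f}")
```

In this toy model the score rises after the single spam learn but stays well below 0.5, because the tokens still carry the weight of fifty earlier ham messages. A checksum system would instead recognize the exact message immediately.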





Re: What's wrong with my test?

2005-03-13 Thread Kai Schaetzl
shirlei wrote on Sun, 13 Mar 2005 01:57:18 -0300:

 So, I redirect those messages to me and I received them as no spam again!


As I just wrote to GRP Productions: Bayes doesn't work this way.

Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de  http://msie.winware.org





Tests results are different?

2005-03-13 Thread David Suen
Hi all, I installed SpamAssassin (3.0.2) on my Linux box and it looks like
it does not work (probably due to my configuration). I tried RTFM but still
cannot make it work.

Situation:
I use SpamAssassin with simscan + qmail. My problem is that when I use
spamassassin (not spamc) with sample-spam.txt it gives me the correct result
(I also tested another real spam with the 'keyword' viagra). However,
when I use spamc, it keeps saying the score is 0.0 for all emails.

Header using spamc:
X-Spam-Status: spam=No, score=0.0 required=4.2 tests= none
autolearn=unavailable

Header using spamassassin:
X-Spam-Level:
X-Spam-Status: spam=Yes, score=4.3 required=4.2 tests= BIZ_TLD=0.527,
DATE_IN_PAST_06_12=0.211,DRUGS_ERECTILE=0.026,DRUG_ED_CAPS=1.535,
EXTRA_MPART_TYPE=0.222,SUBJECT_DRUG_GAP_VIA=1.77 autolearn=no


local.cf:

rewrite_subject 1
report_header 1
report_safe 2
required_hits 4.2
add_header all Status spam=_YESNO_, score=_SCORE_ required=_REQD_ tests=
_TESTSSCORES(,)_ autolearn=_AUTOLEARN_



Do you have any idea why spamc says the score is 0 for all emails?

If you need more information, I am happy to provide it.


Thanks

David
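
When comparing the two headers in this message, it can help to check that the per-test scores really add up to the reported total. The Python sketch below parses the custom X-Spam-Status format shown above (produced by the poster's add_header line with _TESTSSCORES(,)_) and sums the scores. The regular expression and the function name `parse_test_scores` are assumptions that fit this particular header layout, not a general SpamAssassin header parser.

```python
import re

# Rule names are upper-case identifiers followed by "=<score>"; lower-case
# fields such as score=, required= and autolearn= are intentionally skipped.
TEST_SCORE_RE = re.compile(r"\b([A-Z][A-Z0-9_]+)=(-?\d+(?:\.\d+)?)")


def parse_test_scores(status_header: str) -> dict:
    """Return {rule_name: score} from an X-Spam-Status header in the format above."""
    return {name: float(score) for name, score in TEST_SCORE_RE.findall(status_header)}


header = """X-Spam-Status: spam=Yes, score=4.3 required=4.2 tests= BIZ_TLD=0.527,
DATE_IN_PAST_06_12=0.211,DRUGS_ERECTILE=0.026,DRUG_ED_CAPS=1.535,
EXTRA_MPART_TYPE=0.222,SUBJECT_DRUG_GAP_VIA=1.77 autolearn=no"""

scores = parse_test_scores(header)
total = sum(scores.values())
print(f"{len(scores)} tests, total {total:.3f}")
```

The six scores sum to 4.291, which rounds to the reported 4.3 and exceeds the required 4.2. The spamc header, by contrast, lists no tests at all, so there is nothing to sum.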




Re: [SURBL-Discuss] List of spamvertised sites sent via zombies, open proxies, etc.?

2005-03-13 Thread Kai Schaetzl
Jeff Chan wrote on Sun, 13 Mar 2005 05:12:30 -0800:

 One fairly easy way for anyone running a large SpamAssassin
 installation to help us get this data would be to simply grep
 for XBL and SURBL rules hitting the same message and report
 out the URI domains from those messages.


I have a large corpus of spam and ham from quarantining in MailScanner.
Unfortunately, MailScanner doesn't alter the quarantined messages, so I
would need a tool to scan the saved score data in the MailWatch db
and then scan each corresponding message for URIs (and I still wouldn't
know which one of them matched).
So, depending on how you run SA, it's not that easy to get at this data.
Wouldn't it be possible to have an option in SA that adds the matching URI
to the score (URI_SURBL_domain.com) or saves it in a summary? Wouldn't a
statistics module for SA make sense anyway?

Kai

-- 
Kai Schätzl, Berlin, Germany
Get your web at Conactive Internet Services: http://www.conactive.com
IE-Center: http://ie5.de  http://msie.winware.org





Re: [Slight OT] Problems with perl modules req for rpmbuild -tb Mail-SpamAssassin-3.0.2.tar.gz

2005-03-13 Thread List Mail User
   ...The person with two clocks is never really sure of
 the current time.

(OT, but the above is *not* a good quote, though it sounds nice.)
To be `sure' of the time, you need at least three clocks (look at the
documentation for ntp/ntpd).


 ...
...

Paul Shupak
[EMAIL PROTECTED]


Re: Was: List of spamvertised sites sent via zombies, open proxies, etc.?

2005-03-13 Thread Jeff Chan
On Sunday, March 13, 2005, 5:36:55 AM, Raymond Dijkxhoorn wrote:
 Hi!

 Perhaps some kind person could write a reporting function in
 SpamAssassin for this?

 Hmm, perhaps if we could extract *all* URI domains from messages
 sent through XBLed senders then prioritize those say by frequency
 of appearance, we could create a new SURBL list of spamvertised
 domains sent through exploited hosts.  That would pretty directly
 address the use of zombies, etc. and put a penalty on using them
 to advertise sites through them.  Even with volume weighting such
 a list of sites could be attacked by major joe job unless we took
 additional countermeasures, but does anyone else think this might
 be a useful type of data source for SURBLs?
[...]

 Spamtraps are bad news if you use them 1:1; you need to parse out a LOT.
 Did you run polluted spamtraps? I have been running two proxypots (I
 still might have some tars), and most of it was really useless. What helps
 more is wider coverage. I would rather see some automated system like the
 SpamCop setup, so people can report, and we auto-parse it with Joe's tool,
 for example. With a larger footprint we also get spam earlier. It's not
 like they send to the spamtraps first and only then to 'real' users.

 I understand you want to cover new areas, but please don't rely on other
 RBLs too much; I think waiting with our own checks does much more in the
 end. If SBL picks it up we can pick it up faster. But we also want to
 pick up ones NOT listed by any RBL, don't we?

I think you're not understanding what I'm asking for.  :-)

I'm not asking for trap data.  I'm asking to look for XBL hits,
then take the URIs from messages that hit XBL.  In other words
I want to get the sites that are being advertised through
exploited hosts.

Nothing to do with traps or SBL.  ;-)

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: [SURBL-Discuss] Re: Was: List of spamvertised sites sent via zombies, open proxies, etc.?

2005-03-13 Thread Jeff Chan
On Sunday, March 13, 2005, 7:31:01 AM, Raymond Dijkxhoorn wrote:
 I'm not asking for trap data.  I'm asking to look for XBL hits,
 then take the URIs from messages that hit XBL.  In other words
 I want to get the sites that are being advertised through
 exploited hosts.

 Nothing to do with traps or SBL.  ;-)

 If you can get a feed, why limit this to hosts found inside XBL?

This is not for a spam feed specifically.  It's to get data about
what sites are spam advertised through compromised hosts.  XBL
happens to be a good, reliable list of compromised hosts.  Other
lists like list.dsbl.org may be ok too, but those are the only
two RBLs I have a lot of confidence in.  The goal would not be to
get all data but to get all reliable data.

Jeff C.
-- 
Jeff Chan
mailto:[EMAIL PROTECTED]
http://www.surbl.org/



Re: Was: List of spamvertised sites sent via zombies, open proxies, etc.?

2005-03-13 Thread Jeff Chan
It would probably help if I explained that I brought up two
different but related ideas in quick succession:

1.  Asking for URI domains of messages sent through zombies, open
relays, open proxies, etc. detected by XBL that mentioned SURBL URIs.

2.  Asking for URI domains of messages sent through zombies, open
relays, open proxies, etc. detected by XBL regardless of whether
those domains were already listed in SURBLs or not.

The latter may actually be more useful since it's broader and
more inclusive.  We could easily intersect them against SURBLs
ourselves if it were useful for other applications.

I believe this could be a valuable new data source.  It's true
that Spamhaus and others probably already have this data
internally but we don't.  ;-)  It's also possibly true that
existing trap based lists like ob.surbl.org and jp.surbl.org
may already have similar data in them.  As Paul notes there
is probably a lot of overlap between the various datasets
being used or proposed.

I'd probably ask for messages sent through XBL and list.dsbl.org
listed hosts since both lists are pretty reliable.  Completeness
of compromised host detection is probably non-essential for this
application.  The resulting dataset would be so large that missing
some fraction of zombies probably would not affect the end result
very much.  The sites of the biggest spammers would tend to
bubble to the top of a volume-ranked list.

Jeff C.
--
If it appears in hams, then don't list it.



[RD] evilnumbers update changes

2005-03-13 Thread Matt Yackley
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi all,

I've released a new version of evilnumbers and there are several changes in
the new version.

Ruleset name change:
In order to get this old setup in line with current SARE standards, the name
of the ruleset has changed from evilnumbers.cf to 70_sare_evilnum*.cf

Multiple files:
The set has been split into three different files:
70_sare_evilnum0.cf = hits 0 ham during SARE masschecks
70_sare_evilnum1.cf = hits a few ham, but most folks consider these messages spam
70_sare_evilnum2.cf = hit 0 spam & ham during last masscheck, but may come back

RulesDuJour:
A new version of RDJ will be released soon to handle these changes, but here
is a manual fix.

In your RDJ or MyRDJ config file, locate the evilnumbers entry and change the
following lines.
ADD = OLD_CF_FILES[8]=evilnumbers.cf
CHANGE = CF_FILES[8]=70_sare_evilnum0.cf
CHANGE = CF_URLS[8]=http://www.rulesemporium.com/rules/70_sare_evilnum0.cf;

Info on adding files 1 & 2 to RDJ:
http://www.exit0.us/index.php?pagename=RulesDuJourRuleSets

Language files:
If you use a local language file 98_text_**_evilnumbers.cf, please delete
this file. The structure of the rules may change soon; if/when that happens
I'll release updated language files.
Cheers,
matt








-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.4 (GNU/Linux)

iD8DBQFCNITmjzAeShEp8NMRAkS2AJ9O3Wvt4qvc5BmRlKh1fFmxJP+/WACfQch7
gSpphFJ7593ULRK4L79hnck=
=ECdQ
-----END PGP SIGNATURE-----


Re: Bayes DB does not grow anymore

2005-03-13 Thread GRP Productions
 This doesn't prove anything. sa-learn --dump magic shows you what's inside.
 Also, Bayes is not a checksum system like Razor; that's its strength. When
 you teach it a message, it extracts tokens (short pieces) from the message
 and adjusts its internal probability of each being ham or spam by a certain
 factor. If it doesn't know a token yet, it adds it.
 The size may not grow for several reasons, e.g. expiry, or the fact that
 the db format seems to have some air in it, so that it grows in jumps
 and not continuously.

Perhaps I have not been clear enough. It's not only that the files' size is
constant. I am pasting the output of dump magic below, and I have to explain
that the nham and nspam values have been the same for many days now. This is
not normal, since we are talking about a very busy server (more than 4,000
messages per day). This has not always been the case; it used to
work fine. If I send myself a message from Yahoo with the subject 'Viagra
sex teen' and other nice words, I certainly do not want it to pass.
Bayes classifies it as 50% spam. I tried sa-learn --forget and then
re-learned it; it is still BAYES_50. The nham and nspam values used to
increase very rapidly (sometimes by 200-300 per day). No errors are
produced. I wouldn't have noticed this particular problem, but during the
last few days more spam than usual has been getting through. Also, I
tried to force an expiration many times, but as you can see the expiration
did not take place. It's definitely not a file permission issue.

Thanks
Number of Spam Messages:      49,740
Number of Ham Messages:       47,167
Number of Tokens:             123,325
Oldest Token:                 Wed, 2 Feb 2005 06:37:53 +0200
Newest Token:                 Sat, 12 Mar 2005 16:07:30 +0200
Last Journal Sync:            Fri, 11 Feb 2005 18:03:10 +0200
Last Expiry:                  Fri, 11 Feb 2005 15:45:34 +0200
Last Expiry Reduction Count:  3,475 tokens



Re: Bayes DB does not grow anymore

2005-03-13 Thread GRP Productions
 That is the output of --dump magic? I haven't ever seen it formatted that
 nicely. I assume you skipped the first line, but the expire atime delta is
 also missing. So, where did you get this from? Not directly from sa-learn
 --dump magic, I'd say. Are you running SA through some interface? You
 should have said something about the details of your installation.

You are right, I am using MailWatch. I just posted that output so that one
could easily see the actual dates without having to convert them. Here is
the actual output:

# /usr/bin/sa-learn -p /opt/MailScanner/etc/spam.assassin.prefs.conf --dump magic
0.000          0          3          0  non-token data: bayes db version
0.000          0      49740          0  non-token data: nspam
0.000          0      47167          0  non-token data: nham
0.000          0     123325          0  non-token data: ntokens
0.000          0 1107319073          0  non-token data: oldest atime
0.000          0 1110636450          0  non-token data: newest atime
0.000          0 1108137790          0  non-token data: last journal sync atime
0.000          0 1108129534          0  non-token data: last expiry atime
0.000          0     804361          0  non-token data: last expire atime delta
0.000          0       3475          0  non-token data: last expire reduction count

 Ok. Get the values. Then learn a message to it. Make sure it says that it
 actually learned, then check the values again. Has either the spam or ham
 count increased by one or not?

No, it hasn't. This is exactly the point I mentioned. But as I said earlier,
sa-learn claims it has learned, even from the web interface:
SA Learn: Learned from 1 message(s) (1 message(s) examined).

 Ok, this finally looks a bit suspicious. No sync and no expire for a month.
 If it doesn't sync you don't get new tokens. Check in your bayes directory
 how big your bayes_journal is. I'd think it's quite big. Do a sync now.
 (Please don't do it via an interface, do it on the command line.) What's the
 output? Is the journal gone and has the number of tokens increased now? If
 so, you need to investigate why it doesn't sync anymore. Also do an expire
 then.

This is getting more suspicious: there is no bayes_journal file!
# ll /var/spool/MailScanner/bayes/
total 11780
drwxrwxrwx  2 root nobody 4096 Mar 14 00:22 .
drwxr-xr-x  4 root nobody 4096 Mar 13 11:55 ..
-rw-rw-rw-  1 root nobody 1236 Mar 14 00:22 bayes.mutex
-rw-rw-rw-  1 root nobody 10452992 Mar 14 00:22 bayes_seen
-rw-rw-rw-  1 root nobody  5509120 Mar 14 00:02 bayes_toks
I can assure you no one has touched anything inside this directory. If this
is the reason for the problems I've been facing, is there a way to recreate
the file without losing my current data? (Perhaps by copying the
above files somewhere, executing sa-learn --clear, and some time later
restoring the above files?)

Thanks for your help
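
The raw --dump magic output in this message stores its atime values as Unix epoch seconds, which is why the MailWatch view was posted first. A short Python sketch such as the following converts them. It assumes each line has the four numeric columns followed by "non-token data: <label>" as shown above (with the wrapped lines rejoined); the helper names `parse_magic` and `atime_to_utc` are invented for the example.

```python
from datetime import datetime, timezone

MAGIC_DUMP = """\
0.000  0  3  0  non-token data: bayes db version
0.000  0  49740  0  non-token data: nspam
0.000  0  47167  0  non-token data: nham
0.000  0  123325  0  non-token data: ntokens
0.000  0  1107319073  0  non-token data: oldest atime
0.000  0  1110636450  0  non-token data: newest atime
0.000  0  1108137790  0  non-token data: last journal sync atime
0.000  0  1108129534  0  non-token data: last expiry atime
0.000  0  804361  0  non-token data: last expire atime delta
0.000  0  3475  0  non-token data: last expire reduction count
"""


def parse_magic(dump: str) -> dict:
    """Map each 'non-token data' label to its integer value (third column)."""
    values = {}
    for line in dump.splitlines():
        left, sep, label = line.partition("non-token data:")
        if not sep:
            continue  # skip anything that is not a magic line
        values[label.strip()] = int(left.split()[2])
    return values


def atime_to_utc(seconds: int) -> datetime:
    """Convert epoch seconds to an aware UTC datetime."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


magic = parse_magic(MAGIC_DUMP)
for label, value in magic.items():
    if label.endswith("atime"):
        print(f"{label:25s} {atime_to_utc(value):%Y-%m-%d %H:%M:%S} UTC")

gap_days = (magic["newest atime"] - magic["last journal sync atime"]) / 86400
print(f"days between last journal sync and newest token: {gap_days:.1f}")
```

The converted values agree with the MailWatch dates once the +0200 offset is applied, and the gap between the last journal sync and the newest token comes out at just under 29 days, consistent with the "no sync for a month" observation quoted above.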