Problem with Bayes and AutoLearning

2004-09-24 Thread Thomas Bolioli
I am having a problem with SpamAssassin 2.63 not using Bayes. (NB: the
setup uses individual per-user data and is triggered via .forward,
procmail and postfix, with no individual .sa and .procmail files.) I
have trained each of three accounts with over 1000 ham and some 48K
spam messages. SA is working and tagging spam based on all tests other
than Bayes. When I change the global SA conf, the changes take effect,
so I know that spamd is reading my global conf (below). Also below is a
sample header with report. The auto-learn feature is not working
either, which is how I first noticed that something was wrong. The
machine is a standard Mandrake 10 setup with regard to SA.
Thanks in advance,
Tom

My Conf:
auto_whitelist_path /var/spool/spamassassin/auto-whitelist
auto_whitelist_file_mode   0666
use_bayes 1
bayes_path ~/.spammer
bayes_file_mode 0700
bayes_use_hapaxes 1
bayes_expiry_max_db_size 150
#bayes_learn_to_journal 1
bayes_auto_learn 1
bayes_auto_learn_threshold_nonspam 1
bayes_auto_learn_threshold_spam 6
rewrite_subject 0
report_safe 0
skip_rbl_checks 1
# How many hits before a message is considered spam.
required_hits   3.0
## Optional Score Increases
#score BAYES_99 4.300
#score BAYES_90 3.500
#score BAYES_80 3.000
Sample Header:
Return-Path: [EMAIL PROTECTED]
X-Original-To: [EMAIL PROTECTED]
Delivered-To: [EMAIL PROTECTED]
Received: from g66dc.g.pppool.de (g66dc.g.pppool.de [80.185.102.220])
   by smtp.terranovum.com (Postfix) with SMTP id 708503E6F9B
   for [EMAIL PROTECTED]; Fri, 24 Sep 2004 13:54:40 -0400 (EDT)
Original-Encoded-Information-Types: multipart/alternative
Language: English
Disclose-Recipients: No
Reply-To: Lillian Fitzpatrick [EMAIL PROTECTED]
From: Lillian Fitzpatrick [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: no more red light tickets!
Date: Fri, 24 Sep 2004 14:40:57 -0500
MIME-Version: 1.0
Content-Type: multipart/alternative;
   boundary=--58012207185158267337
Message-Id: [EMAIL PROTECTED]
X-Spam-Flag: YES
X-Spam-Checker-Version: SpamAssassin 2.63 (2004-01-11) on
   nova.terranovum.com
X-Spam-Level: ***
X-Spam-Status: Yes, hits=7.3 required=3.0
   tests=CLICK_BELOW,FORGED_YAHOO_RCVD,
   HTML_50_60,HTML_FONTCOLOR_RED,HTML_FONT_INVISIBLE,HTML_IMAGE_ONLY_04,
   HTML_LINK_CLICK_HERE,HTML_MESSAGE,MIME_HTML_ONLY,MIME_HTML_ONLY_MULTI,
   MSGID_FROM_MTA_SHORT autolearn=no version=2.63
X-Spam-Report:
   *  0.1 HTML_LINK_CLICK_HERE BODY: HTML link text says click here
   *  0.0 HTML_MESSAGE BODY: HTML included in message
   *  0.1 MIME_HTML_ONLY BODY: Message only has text/html MIME parts
   *  0.4 HTML_FONT_INVISIBLE BODY: HTML font color is same as background
   *  0.2 HTML_50_60 BODY: Message is 50% to 60% HTML
   *  0.1 HTML_FONTCOLOR_RED BODY: HTML font color is red
   *  1.5 HTML_IMAGE_ONLY_04 BODY: HTML: images with 200-400 bytes of words
   *  3.3 MSGID_FROM_MTA_SHORT Message-Id was added by a relay
   *  0.5 FORGED_YAHOO_RCVD 'From' yahoo.com does not match 'Received' headers
   *  0.0 CLICK_BELOW Asks you to click below
   *  1.1 MIME_HTML_ONLY_MULTI Multipart message only has text/html MIME parts
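
Editor's sketch: the symptom described above (no Bayes test in the
results, autolearn=no) can be checked mechanically on saved messages.
The Python below is only an illustration, assuming the X-Spam-Status
layout shown in the sample header; the function name parse_spam_status
is made up for this example and is not part of SpamAssassin.

```python
import re

def parse_spam_status(header_value):
    """Extract the test names and autolearn flag from an X-Spam-Status value."""
    # Collapse folded continuation lines into a single logical line.
    unfolded = " ".join(header_value.split())
    # The tests= list may have been folded after a comma; close those gaps.
    unfolded = re.sub(r",\s+", ",", unfolded)

    tests_match = re.search(r"tests=(\S*)", unfolded)
    tests = [t for t in tests_match.group(1).split(",") if t] if tests_match else []

    autolearn_match = re.search(r"autolearn=(\S+)", unfolded)
    return {
        "tests": tests,
        # Any BAYES_nn rule in the list means the Bayes classifier ran.
        "bayes_tests": [t for t in tests if t.startswith("BAYES_")],
        "autolearn": autolearn_match.group(1) if autolearn_match else None,
    }

sample = """Yes, hits=7.3 required=3.0 tests=CLICK_BELOW,FORGED_YAHOO_RCVD,
   HTML_50_60,HTML_FONTCOLOR_RED,HTML_FONT_INVISIBLE,HTML_IMAGE_ONLY_04,
   HTML_LINK_CLICK_HERE,HTML_MESSAGE,MIME_HTML_ONLY,MIME_HTML_ONLY_MULTI,
   MSGID_FROM_MTA_SHORT autolearn=no version=2.63"""

info = parse_spam_status(sample)
print(info["bayes_tests"], info["autolearn"])
```

An empty bayes_tests list on mail that went through spamd, next to a
BAYES_nn hit when the same account runs spamassassin by hand, would
point at the delivery path rather than at the databases.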




Re: Problem with Bayes and AutoLearning

2004-09-24 Thread Thomas Bolioli
I do not believe that is an issue. It only puts the Bayes databases at
~/.spammer_toks and ~/.spammer_seen. sa-learn has not had a problem
loading the databases; they have grown every time I have used it. I
can't see why spamd would have a problem with it.
Tom

Matt Kettler wrote:
At 03:40 PM 9/24/2004, Thomas Bolioli wrote:
bayes_path ~/.spammer

This statement is invalid if a directory named .spammer exists in
the user's home directory.

Please read the docs on bayes_path VERY carefully. Despite being named
"path", it is really a path plus a filename prefix.

Thus bayes_path should be something like ~/.spammer/bayes
However, why override it at all? It defaults to ~/.spamassassin/bayes
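
Editor's sketch: the "path plus filename prefix" behaviour described
above can be illustrated in a few lines of Python. This is not
SpamAssassin code; the helper names are invented for the example, and
only the _toks and _seen suffixes mentioned in this thread are modelled.

```python
import os

def expand_prefix(bayes_path, home):
    """Replace a leading ~ with the given home directory."""
    if bayes_path == "~" or bayes_path.startswith("~/"):
        return home + bayes_path[1:]
    return bayes_path

def bayes_db_files(bayes_path, home):
    """File names obtained by appending the suffixes to the prefix."""
    prefix = expand_prefix(bayes_path, home)
    return [prefix + "_toks", prefix + "_seen"]

def prefix_is_directory(bayes_path, home):
    """True when the prefix itself names an existing directory,
    which is the case the message above warns about."""
    return os.path.isdir(expand_prefix(bayes_path, home))

# The setting from the original post, for a home of /home/webmaster:
print(bayes_db_files("~/.spammer", "/home/webmaster"))
# The default prefix:
print(bayes_db_files("~/.spamassassin/bayes", "/home/webmaster"))
```

With ~/.spammer the files land directly in the home directory as
.spammer_toks and .spammer_seen, which matches what the original poster
reports seeing.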



Re: Problem with Bayes and AutoLearning

2004-09-24 Thread Matt Kettler
At 04:10 PM 9/24/2004, Thomas Bolioli wrote:
I do not believe that is an issue. It only puts the bayes databases at
~/.spammer_toks and ~/.spammer_seen. sa-learn has not had a problem
loading the databases. They have grown every time I have used it. I can't
see why spamd would have a problem with it.

Fair enough. Like I said, it's a syntax error if a directory named
~/.spammer/ exists. However, if it doesn't exist, it's fine.

Are you sure spamc is being invoked as the proper user, and not as root?
spamd will fall back to nobody if it finds itself still running as root
after calling setuid to switch to the client user. You could try copying
a set of Bayes database files into nobody's home directory and see if
Bayes starts running.
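
Editor's sketch: because ~ is resolved per user, the same bayes_path
points at different files depending on which account spamd ends up
running as. The Python below illustrates that with throwaway
directories standing in for two home directories. The layout, the
helper names and the default prefix argument are assumptions made for
the example, not SpamAssassin internals.

```python
import os
import tempfile

def expand_prefix(bayes_path, home):
    """Replace a leading ~ with the given home directory."""
    if bayes_path == "~" or bayes_path.startswith("~/"):
        return home + bayes_path[1:]
    return bayes_path

def has_bayes_db(home, bayes_path="~/.spamassassin/bayes"):
    """True if both <prefix>_toks and <prefix>_seen exist under this home."""
    prefix = expand_prefix(bayes_path, home)
    return all(os.path.isfile(prefix + suffix) for suffix in ("_toks", "_seen"))

# Temporary directories play the role of two home directories.
with tempfile.TemporaryDirectory() as root:
    homes = {
        "webmaster": os.path.join(root, "webmaster"),
        "nobody": os.path.join(root, "nobody"),
    }
    for home in homes.values():
        os.makedirs(os.path.join(home, ".spamassassin"))
    # Only the trained account gets database files.
    for suffix in ("_toks", "_seen"):
        db_file = os.path.join(homes["webmaster"], ".spamassassin", "bayes" + suffix)
        open(db_file, "w").close()
    found = {user: has_bayes_db(home) for user, home in homes.items()}

print(found)  # {'webmaster': True, 'nobody': False}
```

If scanning happens under the second account, the trained databases are
simply never looked at, which would be consistent with every other test
firing while Bayes stays silent.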





Re: Problem with Bayes and AutoLearning

2004-09-24 Thread Thomas Bolioli
I changed the path just in case; it was set that way by mistake anyhow.
Here is the output of lint (it is exactly the same as with the other
path, so I am sure that is not the issue). Note that Bayes works there,
although not when run through procmail. I think your idea about users
is on to something.
My .forward file is
|IFS=' '  exec /usr/bin/procmail || exit 75 #webmaster
Quotes and all. Is that correct?
Tom

[EMAIL PROTECTED] webmaster]$ spamassassin -D --lint
debug: Score set 0 chosen.
debug: running in taint mode? yes
debug: Running in taint mode, removing unsafe env vars, and resetting PATH
debug: PATH included '/sbin', keeping.
debug: PATH included '/usr/sbin', keeping.
debug: PATH included '/bin', keeping.
debug: PATH included '/usr/bin', keeping.
debug: PATH included '/usr/X11R6/bin', which doesn't exist, dropping.
debug: PATH included '/usr/local/bin', keeping.
debug: PATH included '/usr/local/sbin', keeping.
debug: Final PATH set to: /sbin:/usr/sbin:/bin:/usr/bin:/usr/local/bin:/usr/local/sbin
debug: ignore: using a test message to lint rules
debug: using /usr/share/spamassassin for default rules dir
debug: using /etc/mail/spamassassin for site rules dir
debug: using /home/webmaster/.spamassassin for user state dir
debug: using /home/webmaster/.spamassassin/user_prefs for user prefs file
debug: bayes: 28490 tie-ing to DB file R/O /home/webmaster/.spamassassin/bayes_toks
debug: bayes: 28490 tie-ing to DB file R/O /home/webmaster/.spamassassin/bayes_seen
debug: bayes: found bayes db version 2
debug: Score set 3 chosen.
debug: Initialising learner
debug: running header regexp tests; score so far=0
debug: running body-text per-line regexp tests; score so far=2.077
debug: bayes corpus size: nspam = 47336, nham = 1028
debug: uri tests: Done uriRE
debug: tokenize: header tokens for *F = U*ignore D*compiling.spamassassin.taint.org D*spamassassin.taint.org D*taint.org D*org
debug: tokenize: header tokens for *m =  1096056335 lint_rules 
debug: bayes token 'TextCat' = 0.0489090909090909
debug: bayes token 'somewhat' = 0.095669124722507
debug: bayes token 'H*F:D*org' = 0.122005426957751
debug: bayes: score = 0.0118746978798883
debug: bayes: 28490 untie-ing
debug: bayes: 28490 untie-ing db_toks
debug: bayes: 28490 untie-ing db_seen
debug: Razor2 is not available
debug: running raw-body-text per-line regexp tests; score so far=2.077
debug: running uri tests; score so far=2.077
debug: uri tests: Done uriRE
debug: running full-text regexp tests; score so far=2.077
debug: Razor2 is not available
debug: Current PATH is: /sbin:/usr/sbin:/bin:/usr/bin:/usr/local/bin:/usr/local/sbin
debug: Pyzor is not available: pyzor not found
debug: DCCifd is not available: no r/w dccifd socket found.
debug: DCC is not available: no executable dccproc found.
debug: all '*From' addrs: [EMAIL PROTECTED]
debug: all '*To' addrs:
debug: is Net::DNS::Resolver available? no
debug: is DNS available? 0
debug: running meta tests; score so far=2.077
debug: is spam? score=0.553 required=3 tests=BAYES_01,DATE_MISSING,NO_REAL_NAME
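
Editor's sketch: when comparing debug output captured under different
accounts, the handful of Bayes lines are the ones that matter. The
Python below pulls them out, assuming the 2.63 debug wording shown
above; the function name summarize_bayes_debug is invented for this
example.

```python
import re

def summarize_bayes_debug(debug_text):
    """Collect DB file paths, corpus size and Bayes score from -D output."""
    summary = {"db_files": [], "nspam": None, "nham": None, "score": None}

    # "tie-ing to DB file R/O <path>"; \s+ also copes with a wrapped line.
    for _mode, path in re.findall(r"tie-ing to DB file (\S+)\s+(\S+)", debug_text):
        summary["db_files"].append(path)

    corpus = re.search(r"bayes corpus size: nspam = (\d+), nham = (\d+)", debug_text)
    if corpus:
        summary["nspam"] = int(corpus.group(1))
        summary["nham"] = int(corpus.group(2))

    score = re.search(r"bayes: score = ([0-9.]+)", debug_text)
    if score:
        summary["score"] = float(score.group(1))
    return summary

sample = "\n".join([
    "debug: bayes: 28490 tie-ing to DB file R/O /home/webmaster/.spamassassin/bayes_toks",
    "debug: bayes: 28490 tie-ing to DB file R/O /home/webmaster/.spamassassin/bayes_seen",
    "debug: bayes corpus size: nspam = 47336, nham = 1028",
    "debug: bayes: score = 0.0118746978798883",
    "debug: bayes: 28490 untie-ing db_toks",
])
print(summarize_bayes_debug(sample))
```

A run that lists no DB files, or lists files under a different home
directory, would show at a glance which account the scan actually ran
as.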
