Bug#289758: bogofilter always reports spamicity=0.520000

2005-01-12 Thread Christian Garbs
On Mon, Jan 10, 2005 at 04:15:18PM -0500, Clint Adams wrote:
> > in the bogofilter manpage (except for an extra -l for syslog output
> > and an explicit -d ~/.bogofilter to point to my directory).
>
> What's the output of bogoutil -p ~/.bogofilter .MSG_COUNT ?

Thanks, this was a huge push in the right direction!

I had about 25 spam messages, but no ham messages at all, so
everything was flagged as unsure.  I now sort unsure mails to an extra
folder where I manually classify the messages as either ham or spam
(see below) and everything works.


I've always trained bogofilter on the fly with the procmail recipe
from the manpage.  Since the current bogofilter versions use a
tristate classification (ham/spam/unsure), this recipe won't work if
you start with an empty wordlist: only spam is moved to an extra
folder, so you can't distinguish ham from unsure.

Perhaps the recipe in the manpage should be changed to this:

   # filter mail through bogofilter, tagging it as spam and
   # updating the wordlist

   :0fw
   | bogofilter -u -e -p

   # if bogofilter failed, return the mail to the queue, the MTA will
   # retry to deliver it later
   # 75 is the value for EX_TEMPFAIL in /usr/include/sysexits.h

   :0e
   { EXITCODE=75 HOST }

   # file the mail to unsure-bogofilter if it is neither ham nor spam.

   :0:
   * ^X-Bogosity: Unsure, tests=bogofilter
   unsure-bogofilter

   # file the mail to spam-bogofilter if it's spam.

   :0:
   * ^X-Bogosity: Spam, tests=bogofilter
   spam-bogofilter


With this recipe, you can train bogofilter starting with an empty
wordlist.  Be sure to keep your unsure folder empty by marking the
messages in it as either ham or spam.  If you do this, bogofilter
will learn automatically, and after some time no messages should be
marked as unsure any more.
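
Marking a message from the unsure folder can be sketched as a small
shell helper.  This is only an illustration, not part of the manpage
recipe: the function name train_unsure and the BOGOFILTER and BOGODIR
variables are made up for this example.  It relies on bogofilter's -s
(register as spam) and -n (register as ham) options and on the -d
directory option mentioned in the original report.  As far as I
understand the manpage, -u does not register messages classified as
unsure, so a plain -s or -n should be enough for them; a message that
was already registered wrongly would need -Ns or -Sn instead.

```shell
#!/bin/sh
# Hypothetical helper: register one message (read from stdin) from the
# unsure folder as ham or spam.
#   BOGOFILTER  program to run (assumption, defaults to "bogofilter")
#   BOGODIR     wordlist directory (assumption, defaults to ~/.bogofilter)
train_unsure() {
    case "$1" in
        spam) flag=-s ;;   # register the message as spam
        ham)  flag=-n ;;   # register the message as non-spam (ham)
        *)    echo "usage: train_unsure ham|spam < message" >&2
              return 2 ;;
    esac
    # The message on stdin is passed straight through to bogofilter.
    "${BOGOFILTER:-bogofilter}" -d "${BOGODIR:-$HOME/.bogofilter}" "$flag"
}
```

Each message is fed to the helper individually on standard input, for
example "train_unsure spam < some-message" for a single saved message
file.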

Feel free to close this bug or, if you think it's appropriate, forward
my suggestion on the manpage to upstream (and perhaps lower the bug
importance).

Regards,
Christian
-- 
Christian.Garbs.http://www.cgarbs.de

Futurama - comming soon to an illegal DVD




Bug#289758: bogofilter always reports spamicity=0.520000

2005-01-10 Thread Christian Garbs
Package: bogofilter
Version: 0.93.3.1-1
Severity: important

I had some problems upgrading from bogofilter version 0.93.1, so I
just removed my old wordlist and started fresh with a clean and
empty ~/.bogofilter directory.

I'm using procmail to classify my mail; the recipe looks like the one
in the bogofilter manpage (except for an extra -l for syslog output
and an explicit -d ~/.bogofilter to point to my directory).

The mails actually get filtered by bogofilter, but every single mail
gets this classification:

X-Bogosity: Unsure, tests=bogofilter, spamicity=0.52, version=0.93.3.1

I can mark a mail as spam and then bounce it back to myself; it will
be scanned properly and get spamicity=0.52 again.

My syslog looks like this:

Jan 10 08:23:29 yggdrasil bogofilter[8910]: register-Ns, 250 words, 1 messages
Jan 10 08:23:37 yggdrasil bogofilter[8932]: register-Ns, 282 words, 1 messages
Jan 10 08:24:07 yggdrasil bogofilter[8966]: X-Bogosity: Unsure, spamicity=0.52, version=0.93.3.1
Jan 10 08:24:14 yggdrasil bogofilter[8972]: register-Ns, 175 words, 1 messages
Jan 10 09:12:15 yggdrasil bogofilter[11668]: X-Bogosity: Unsure, spamicity=0.52, version=0.93.3.1
Jan 10 09:24:08 yggdrasil bogofilter[12207]: X-Bogosity: Unsure, spamicity=0.52, version=0.93.3.1
[...]
Jan 10 16:24:06 yggdrasil bogofilter[2119]: X-Bogosity: Unsure, spamicity=0.52, version=0.93.3.1
Jan 10 16:36:10 yggdrasil bogofilter[2965]: X-Bogosity: Unsure, spamicity=0.52, version=0.93.3.1
Jan 10 17:24:09 yggdrasil bogofilter[5313]: X-Bogosity: Unsure, spamicity=0.52, version=0.93.3.1
Jan 10 17:56:28 yggdrasil bogofilter[6937]: register-Ns, 207 words, 1 messages
Jan 10 17:56:39 yggdrasil bogofilter[6955]: register-Ns, 1069 words, 1 messages
Jan 10 17:56:56 yggdrasil bogofilter[6963]: register-Ns, 195 words, 1 messages
Jan 10 18:00:14 yggdrasil bogofilter[17895]: X-Bogosity: Unsure, spamicity=0.52, version=0.93.3.1
Jan 10 18:12:06 yggdrasil bogofilter[24940]: X-Bogosity: Unsure, spamicity=0.52, version=0.93.3.1
[...]
Jan 10 21:36:14 yggdrasil bogofilter[16926]: X-Bogosity: Unsure, spamicity=0.52, version=0.93.3.1
Jan 10 21:37:36 yggdrasil bogofilter[17010]: register-Ns, 159 words, 1 messages
Jan 10 21:38:19 yggdrasil bogofilter[17060]: X-Bogosity: Unsure, spamicity=0.52, version=0.93.3.1
Jan 10 21:38:34 yggdrasil bogofilter[17083]: X-Bogosity: Unsure, spamicity=0.52, version=0.93.3.1
Jan 10 21:38:37 yggdrasil bogofilter[17086]: register-Sn, 144 words, 1 messages
Jan 10 21:38:58 yggdrasil bogofilter[17102]: register-Sn, 201 words, 1 messages
Jan 10 21:39:00 yggdrasil bogofilter[17114]: X-Bogosity: Unsure, spamicity=0.52, version=0.93.3.1
Jan 10 21:39:04 yggdrasil bogofilter[17148]: X-Bogosity: Unsure, spamicity=0.52, version=0.93.3.1
Jan 10 21:39:05 yggdrasil bogofilter[17151]: register-Sn, 255 words, 1 messages
Jan 10 21:48:08 yggdrasil bogofilter[17748]: X-Bogosity: Unsure, spamicity=0.52, version=0.93.3.1
Jan 10 21:48:09 yggdrasil bogofilter[17764]: X-Bogosity: Unsure, spamicity=0.52, version=0.93.3.1
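
To see at a glance how the verdicts in such a log are distributed, the
X-Bogosity lines can be tallied.  The following is a rough sketch and
not something bogofilter ships: the function name tally_verdicts is
made up, and it assumes the syslog line format shown in the excerpt
above (program tag "bogofilter[PID]:" followed by "X-Bogosity: <verdict>,").

```shell
#!/bin/sh
# Hypothetical helper: read syslog lines on stdin and print one line
# per verdict ("Ham", "Spam", "Unsure") with the number of occurrences.
tally_verdicts() {
    awk '/bogofilter\[[0-9]+\]: X-Bogosity: / {
             v = $0
             sub(/.*X-Bogosity: /, "", v)   # drop everything before the verdict
             sub(/,.*/, "", v)              # drop everything after it
             n[v]++
         }
         END { for (v in n) printf "%s %d\n", v, n[v] }' | sort
}
```

It would be fed the relevant log, for example "tally_verdicts <
/var/log/syslog"; the log path is an assumption and depends on the
local syslog configuration.  A log in which only "Unsure" shows up
points at the situation described in this report.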

With the old version 0.93.1 everything was fine for me:

Jan  6 11:48:09 yggdrasil bogofilter[15096]: X-Bogosity: Spam, spamicity=1.00, version=0.93.1, register-s, 220 words, 1 messages
Jan  6 11:48:09 yggdrasil bogofilter[15101]: X-Bogosity: Spam, spamicity=1.00, version=0.93.1, register-s, 179 words, 1 messages
Jan  6 11:52:57 yggdrasil bogofilter[15413]: X-Bogosity: Unsure, spamicity=0.499758, version=0.93.1
Jan  6 12:24:07 yggdrasil bogofilter[16686]: X-Bogosity: Ham, spamicity=0.00, version=0.93.1, register-n, 320 words, 1 messages
Jan  6 12:36:06 yggdrasil bogofilter[17203]: X-Bogosity: Ham, spamicity=0.00, version=0.93.1, register-n, 359 words, 1 messages

My freshly created directory looks like this:

[EMAIL PROTECTED]:~$ ls -l .bogofilter/
insgesamt 2772
-rw---  1 mitch mitch   16384 2005-01-10 21:38 __db.001
-rw---  1 mitch mitch 5251072 2005-01-10 21:38 __db.002
-rw---  1 mitch mitch   98304 2005-01-10 21:38 __db.003
-rw---  1 mitch mitch 4063232 2005-01-10 21:38 __db.004
-rw---  1 mitch mitch   16384 2005-01-10 21:38 __db.005
-rw---  1 mitch mitch       0 2005-01-10 21:38 lockfile-d
-rw---  1 mitch mitch    1024 2005-01-10 21:48 lockfile-p
-rw---  1 mitch mitch 1048576 2005-01-10 21:48 log.01
-rw---  1 mitch mitch   20480 2005-01-10 21:39 wordlist.db

My crontab contains this entry to remove the huge logs that can
accumulate:

#
# remove bogofilter database transaction logs
#
44 4 * * *   db4.3_archive -h ~/.bogofilter -d


What is wrong here?
Why do all my mails get classified as spamicity=0.52?

-- System Information:
Debian Release: 3.1
  APT prefers testing
  APT policy: (500, 'testing'), (50, 'unstable')
Architecture: i386 (i686)
Kernel: Linux 2.6.10
Locale: [EMAIL PROTECTED], [EMAIL PROTECTED] (charmap=ISO-8859-15)

Versions of packages bogofilter depends on:
ii