Re: Can't get sa-learn to work

2006-09-08 Thread Bo Mellberg

Theo Van Dinter wrote:

On Thu, Sep 07, 2006 at 11:27:36AM +0200, Bo Mellberg wrote:

max:/#sa-learn -D --sync

which would upgrade the db from version 0 to version 2.


FWIW, the upgrade occurs anytime a DB write is attempted, --sync just
forces a write.


OK. Got it.




[25567] dbg: bayes: found bayes db version 0
[25567] dbg: bayes: detected bayes db format 0, upgrading
[25567] dbg: bayes: upgrading database format from v0 to v2
[25567] dbg: locker: refresh_lock: refresh /root/.spamassassin/bayes.lock
[25567] dbg: locker: refresh_lock: refresh /root/.spamassassin/bayes.lock
[25567] dbg: locker: refresh_lock: refresh /root/.spamassassin/bayes.lock
[25567] dbg: locker: refresh_lock: refresh /root/.spamassassin/bayes.lock

The last row just keeps repeating itself until I ctrl-C out of it.


Nothing seems crazy so far -- the upgrade may take a long time if there
are a lot of tokens, so SA refreshes the lock periodically so it doesn't
lose it.


You were perfectly right! I waited some more and it actually finished:

[29404] dbg: locker: refresh_lock: refresh /root/.spamassassin/bayes.lock
[29404] dbg: locker: refresh_lock: refresh /root/.spamassassin/bayes.lock
[29404] dbg: locker: refresh_lock: refresh /root/.spamassassin/bayes.lock
[29404] dbg: locker: refresh_lock: refresh /root/.spamassassin/bayes.lock
[29404] dbg: bayes: upgraded database format from v2 to v3 in 89 seconds
[29404] dbg: locker: refresh_lock: refresh /root/.spamassassin/bayes.lock
[29404] dbg: bayes: expiry completed
[29404] dbg: bayes: untie-ing
[29404] dbg: bayes: untie-ing db_toks
[29404] dbg: bayes: untie-ing db_seen
[29404] dbg: bayes: files locked, now unlocking lock
[29404] dbg: locker: safe_unlock: unlink /root/.spamassassin/bayes.lock




What does sa-learn --dump magic say?



Well, before it didn't say anything, just complained about DB version 0, 
but now it gives me:


max:~# sa-learn --dump magic
0.000  0  3  0  non-token data: bayes db version
0.000  0  0  0  non-token data: nspam
0.000  0  0  0  non-token data: nham
0.000  0  0  0  non-token data: ntokens
0.000  0 1157694555  0  non-token data: oldest atime
0.000  0 1157694555  0  non-token data: newest atime
0.000  0 1157694555  0  non-token data: last journal sync atime
0.000  0 1157694555  0  non-token data: last expiry atime
0.000  0  0  0  non-token data: last expire atime delta
0.000  0  0  0  non-token data: last expire reduction count


So I guess I'm good. Thanks a bunch!

/Bo


Which DB is actually used?

2006-09-08 Thread Bo Mellberg

I have SA 3.1.4 configured and running on Debian Sarge using apt-get.

I'm finding it hard to know what directory is actually used for the 
bayes-database:


max:~# ls /root/.spamassassin/ -al
total 2344
drwx------  2 root root    4096 Sep  8 07:52 .
drwxr-xr-x 12 root root    4096 Sep  5 09:37 ..
-rw-------  1 root root   12288 Sep  4 14:20 auto-whitelist
-rw-rw-rw-  1 root root       6 Sep  4 14:20 auto-whitelist.mutex
-rw-rw-rw-  1 root root   13992 Sep  4 14:08 bayes.mutex
-rw-------  1 root root  344064 Sep  4 14:05 bayes_seen
-rw-------  1 root root 2605056 Sep  8 07:52 bayes_toks
-rw-r--r--  1 root root    1487 Sep  4 14:20 user_prefs
max:~# ls /home/bosse/.spamassassin/ -al
total 4564
drwx--S--- 2 bosse bosse    4096 Sep  7 10:35 .
drwxr-sr-x 5 bosse bosse    4096 Aug 31 16:19 ..
-rw------- 1 root  bosse   12288 Sep  6 01:06 auto-whitelist
-rw------- 1 root  bosse       6 Sep  6 01:06 auto-whitelist.mutex
-rw-rw-rw- 1 bosse bosse   15282 Sep  6 01:06 bayes.mutex
-rw------- 1 root  bosse   86136 Sep  6 01:06 bayes_journal
-rw------- 1 bosse bosse  339968 Sep  6 01:06 bayes_seen
-rw------- 1 root  bosse 5255168 Sep  6 01:06 bayes_toks
-rw------- 1 root  bosse    1165 Oct  2  2005 user_prefs
max:~# ls /var/spool/exim4/.spamassassin/ -al
total 3424
drwx------ 2 Debian-exim Debian-exim    4096 Sep  8 08:04 .
drwxr-x--- 7 Debian-exim Debian-exim    4096 Sep  5 15:54 ..
-rw------- 1 Debian-exim Debian-exim 1298432 Sep  8 08:04 auto-whitelist
-rw-rw-rw- 1 Debian-exim Debian-exim       6 Sep  4 14:15 auto-whitelist.mutex
-rw-rw-rw- 1 Debian-exim Debian-exim       6 Sep  4 14:15 bayes.mutex
-rw------- 1 Debian-exim Debian-exim   64704 Sep  8 08:04 bayes_journal
-rw------- 1 Debian-exim Debian-exim  319488 Sep  8 08:04 bayes_seen
-rw------- 1 Debian-exim Debian-exim 2629632 Sep  8 08:04 bayes_toks
-rw-r--r-- 1 Debian-exim Debian-exim    1175 Nov  1  2005 user_prefs

As you can see there are three directories which are all quite recently 
changed. How can I make sure that only one directory is used?


I would like to make SA site-wide, but the filtering is working really 
well right now so I'm afraid I'll break something. BTW, the user bosse 
is my own account used for my email.


* I just performed sa-learn --sync -D as root.
* I've never touched the exim directory, still it has the latest change 
date.


Thanks in advance.

/Bo


Can't get sa-learn to work

2006-09-08 Thread Al Smith

Since upgrading to 3.1.5, I get this when trying to use sa-learn:

# sa-learn --showdots --spam --mbx spam
archive-iterator: unable to open spam.spam: No such file or directory
archive-iterator: unable to open spam.spam: No such file or directory
archive-iterator: unable to open spam.spam: No such file or directory
archive-iterator: unable to open spam.spam: No such file or directory
archive-iterator: unable to open spam.spam: No such file or directory
archive-iterator: unable to open spam.spam: No such file or directory
archive-iterator: unable to open spam.spam: No such file or directory
archive-iterator: unable to open spam.spam: No such file or directory
archive-iterator: unable to open spam.spam: No such file or directory
archive-iterator: unable to open spam.spam: No such file or directory
archive-iterator: unable to open spam.spam: No such file or directory
archive-iterator: unable to open spam.spam: No such file or directory
archive-iterator: unable to open spam.spam: No such file or directory

Learned tokens from 0 message(s) (0 message(s) examined)
#

Any ideas?
Al.





Re: Can't get sa-learn to work

2006-09-08 Thread Loren Wilton

max:~# sa-learn --dump magic
0.000  0  3  0  non-token data: bayes db version
0.000  0  0  0  non-token data: nspam
0.000  0  0  0  non-token data: nham
0.000  0  0  0  non-token data: ntokens
0.000  0 1157694555  0  non-token data: oldest atime
0.000  0 1157694555  0  non-token data: newest atime
0.000  0 1157694555  0  non-token data: last journal sync atime
0.000  0 1157694555  0  non-token data: last expiry atime
0.000  0  0  0  non-token data: last expire atime delta
0.000  0  0  0  non-token data: last expire reduction count


So I guess I'm good. Thanks a bunch!


Well, maybe.  That report says that you don't have any ham or spam messages 
in the database, and you need at least 200 of each before Bayes will do 
anything for you.


If that was what you were expecting, fine.  If not...

   Loren
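
A rough way to check that threshold from the shell, offered as a sketch rather than anything SpamAssassin ships. The function name `bayes_ready` is made up for illustration, and the default of 200 is simply the figure quoted above. It reads `sa-learn --dump magic` output on standard input and reports the nspam and nham counters.

```shell
# Read `sa-learn --dump magic` output on stdin and report whether the
# database holds enough learned messages for Bayes to be used.
# bayes_ready is a hypothetical helper; 200 is the figure from this thread.
bayes_ready() {
    awk -v min="${1:-200}" '
        $NF == "nspam" { nspam = $3 }   # third column holds the counter
        $NF == "nham"  { nham  = $3 }
        END {
            state = (nspam + 0 >= min + 0 && nham + 0 >= min + 0) ? "ready" : "not-ready"
            printf "nspam=%d nham=%d %s\n", nspam, nham, state
        }'
}

# Typical use, run as the user that filters the mail:
#   sa-learn --dump magic | bayes_ready
```

For the root database shown above (nspam 0, nham 0) this reports not-ready, which matches the observation that root was never trained.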



Re: Can't get sa-learn to work

2006-09-08 Thread Bo Mellberg




AHA! That brings me to my other question on this mailing list, "Which DB 
is actually used?".


I did sa-learn --dump magic as root, but root has never been taught any 
spam or ham.


I did sa-learn --dump magic --dbpath ...:

max:~# sa-learn --dump magic --dbpath /home/bosse/.spamassassin/
0.000  0  3  0  non-token data: bayes db version
0.000  0   2447  0  non-token data: nspam
0.000  0   1320  0  non-token data: nham
0.000  0 152520  0  non-token data: ntokens
0.000  0 1145221056  0  non-token data: oldest atime
0.000  0 1157497564  0  non-token data: newest atime
0.000  0  0  0  non-token data: last journal sync atime
0.000  0 1157464926  0  non-token data: last expiry atime
0.000  0   11059200  0  non-token data: last expire atime delta
0.000  0  21382  0  non-token data: last expire reduction count


max:~# sa-learn --dump magic --dbpath /var/spool/exim4/.spamassassin/
0.000  0  3  0  non-token data: bayes db version
0.000  0    477  0  non-token data: nspam
0.000  0   1966  0  non-token data: nham
0.000  0  69851  0  non-token data: ntokens
0.000  0 1130849621  0  non-token data: oldest atime
0.000  0 1157701646  0  non-token data: newest atime
0.000  0 1157687964  0  non-token data: last journal sync atime
0.000  0  0  0  non-token data: last expiry atime
0.000  0  0  0  non-token data: last expire atime delta
0.000  0  0  0  non-token data: last expire reduction count


Since the filtering is working quite well, I guess one of these two 
databases is used. The user bosse is my own user for emails. Is the 
database in /var/spool/exim4 the one used for auto-learning?


Questions:

1. How can I know which of these two is actually used for filtering the 
emails for bosse?


2. Can I move the db in use for site-wide usage and auto-learning?


Re: Can't get sa-learn to work

2006-09-08 Thread Loren Wilton
AHA! That brings me to my other question on this mailing list, "Which DB is 
actually used?".


General rule for learning: learn as the user you use to filter mail. 
Corollary: don't learn as root, since SA never runs as root.


You can generally set up SA two different ways: site-wide bayes, or 
individual user bayes.  How you do either probably depends in part on what 
the rest of your mail system is.


It isn't quite clear to me how you have your system set up.  In any case I 
don't know enough to tell you exactly what you need to do to get either 
effect; I could only suggest RTFM, which doesn't always help much.


It looks to me like the exim user is probably the user that your main mail 
processing is running under.  I'd say that was the main database and you 
were set up site-wide, but that doesn't explain why there is a second 
database under your usercode.


Perhaps the exim user is the main filter and has been doing auto-learning, 
and you have also been manually learning mails under your own usercode?  But 
the database sizes are relatively even, so maybe you somehow are processing 
mail under two different usercodes for different recipients?


   Loren
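
One way to put the candidate databases side by side is a small loop over `sa-learn --dump magic --dbpath`, the same invocation used earlier in this thread. This is only a sketch: `compare_bayes_dbs` and the `SA_LEARN` override are hypothetical names added for illustration, and the counters alone do not prove which database the filter reads. A database whose "newest atime" keeps advancing is the likelier candidate.

```shell
# Print nspam/nham for each candidate Bayes directory given as an argument.
# SA_LEARN is a hypothetical override so the loop can be tried without a
# live installation; it defaults to the real sa-learn command.
compare_bayes_dbs() {
    for dir in "$@"; do
        "${SA_LEARN:-sa-learn}" --dump magic --dbpath "$dir" |
            awk -v dir="$dir" '
                $NF == "nspam" { nspam = $3 }
                $NF == "nham"  { nham  = $3 }
                END { printf "%s nspam=%d nham=%d\n", dir, nspam, nham }'
    done
}

# Example (needs read access to both directories):
#   compare_bayes_dbs /home/bosse/.spamassassin /var/spool/exim4/.spamassassin
```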




Re: Which DB is actually used?

2006-09-08 Thread jdow

Bo - I can't particularly help you with the single site-wide database
thing. It seems you have a bit of a mishmash that, depending on what
you have done, may actually be acting the way you want it to act. It
looks like you might have played with training or tests as bosse
and root and otherwise have everything working on the exim4 global
database. Always test and train as the user that is used for filtering
the email by the MTA. Other tests and training are meaningless.

If you do not have many users at all, dozens or fewer, then do
consider using per-user BAYES. It CAN provide the users with a better
anti-spam experience. The reasoning behind this is that one user's
spam is almost always going to be some other user's ham. If you have
hundreds then there might be a good reason for a single BAYES database.
By the time you're into thousands you're using virtual accounts and
a global database may be required. But it won't provide quite the
pinpoint accuracy of a per-user database.

{^_^}



Re: Juste a little question

2006-09-08 Thread jdow

From: [EMAIL PROTECTED]


Hi everyone,

I have a little problem:

As you can see below, this message should score 25.1 points, but instead I 
see 10,00 points.


How do I set SpamAssassin to use a higher limit for the score?

Content analysis details:   (10,00 points, 6,00 required)

pts rule name  description
 -- --
0,3 DATE_IN_PAST_03_06 Date: is 3 to 6 hours before Received: date
0,2 HTML_MESSAGE   HTML included in message
3,0 FORGED_MUA_OUTLOOK Forged mail pretending to be from MS Outlook
9,9 URIBL_SURBLURIBL_SURBL  Contains an URL listed in the SURBL b
0,8 MY_DSL I could use a BL for this.
0,0 NO_RDNS2   Sending MTA has no reverse DNS
0,6 J_CHICKENPOX_36
0,6 J_CHICKENPOX_51
0,6 J_CHICKENPOX_74
9,0 URIBL_SURBLA   URIBL_SURBLA  Contains an URL listed in the SURBL 


Are you sure that the path from the sender to you involves exactly
one SpamAssassin run, yours? The LAST SpamAssassin run is the one that
gets scored. And many initial setups seem to somehow get SpamAssassin
into the loop twice, which is not good.

{^_^}



Re: Which DB is actually used?

2006-09-08 Thread Bo Mellberg

Thanks for this info,

It seems like the exim user's database is being touched regularly, so I'm 
guessing that it has been set up by apt-get in some auto-learning state.


I have earlier trained spam and ham as user bosse, which is why there 
is a working db there as well.


As I am the only user on my system, it really doesn't matter if I use 
site-wide or not, but rather how I invoke sa-learn.


Let's say I remove the databases for bosse and root. Is this the 
proper way to invoke sa-learn:


1. Log on as user bosse
2. sa-learn --showdots --sync --dbpath /var/spool/exim4/.spamassassin --spam /home/bosse/Maildir/.MissedSpam/cur


If I set up a cron job to do the above, I could just toss missed spam 
into the MissedSpam folder, right?


Thanks again!

/Bo


RE: Some Spam getting through

2006-09-08 Thread Bowie Bailey
David Reta wrote:
 I am having an issue with spam not getting caught by the filter.
 
 The spam will score low initially but when I run it on the
 quarantined message a minute later the message will score well over
 the threshold.  
 
 I am using spamassassin 3.1.4 and it is being called through
 mimedefang. I quarantine the message so I can keep a copy on the
 relay. I have a bayes database that is shared over nfs. On this
 particular instance it looks like the bayes test is skipped. Since I
 am using a bayes database that is shared, could this be causing a
 timeout issue and if so how can I increase the timeout so this does
 not occur?  
 
 Here is the MSG.0 File from the quarantine directory
 
 Content analysis details:   (3.6 points, 4.5 required)
 
  pts rule name          description
 ---- ------------------ --------------------------------------------------
  1.1 EXTRA_MPART_TYPE   Header has extraneous Content-type:...type= entry
  0.1 FORGED_RCVD_HELO   Received: contains a forged HELO
  0.4 HTML_30_40         BODY: Message is 30% to 40% HTML
  0.0 HTML_MESSAGE       BODY: HTML included in message
  2.0 RCVD_IN_SORBS_DUL  RBL: SORBS: sent directly from dynamic IP address
                         [85.99.173.13 listed in dnsbl.sorbs.net]
 
  3.647 4.5
  EXTRA_MPART_TYPE,FORGED_RCVD_HELO,HTML_30_40,HTML_MESSAGE,RCVD_IN_SORBS_DUL
 
 Here is the output from when I run it manually a minute later.
 
 [EMAIL PROTECTED] qdir-2006-09-07-15.33.07-001]$ spamassassin ENTIRE_MESSAGE | more
 Content analysis details:   (7.1 points, 4.5 required)
 
  pts rule name          description
 ---- ------------------ --------------------------------------------------
  1.1 EXTRA_MPART_TYPE   Header has extraneous Content-type:...type= entry
  0.1 FORGED_RCVD_HELO   Received: contains a forged HELO
  0.4 HTML_30_40         BODY: Message is 30% to 40% HTML
  0.0 HTML_MESSAGE       BODY: HTML included in message
  3.5 BAYES_99           BODY: Bayesian spam probability is 99 to 100%
                         [score: 0.9997]
  2.0 RCVD_IN_SORBS_DUL  RBL: SORBS: sent directly from dynamic IP address
                         [85.99.221.218 listed in dnsbl.sorbs.net]

The only difference between these two runs is Bayes.  Based on this, I
would say that mimedefang is running as one user, and you are testing
as a different user.  The mimedefang user either has Bayes disabled,
or has not learned enough ham and spam to run Bayes.

If you do your test while logged in as the mimedefang user, you should
see identical results to the first run.

Most likely, you need to either use a global Bayes db, or make sure
you are doing your ham/spam learning as the mimedefang user.

-- 
Bowie
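
To make the comparison between two runs mechanical, the rule names from each report can be diffed. The following is a minimal sketch: `rules_only_in_second` is a hypothetical helper, the first list is the tests line from the quarantined copy, and the second list was assembled by hand from the rule names in the manual run's report.

```shell
# Print the rule names that appear only in the second comma-separated list.
# rules_only_in_second is an illustrative helper, not part of SpamAssassin.
rules_only_in_second() {
    first=$(mktemp) || return 1
    second=$(mktemp) || return 1
    # comm needs sorted input, one rule name per line
    printf '%s\n' "$1" | tr ',' '\n' | sort > "$first"
    printf '%s\n' "$2" | tr ',' '\n' | sort > "$second"
    comm -13 "$first" "$second"
    rm -f "$first" "$second"
}

rules_only_in_second \
    "EXTRA_MPART_TYPE,FORGED_RCVD_HELO,HTML_30_40,HTML_MESSAGE,RCVD_IN_SORBS_DUL" \
    "EXTRA_MPART_TYPE,FORGED_RCVD_HELO,HTML_30_40,HTML_MESSAGE,BAYES_99,RCVD_IN_SORBS_DUL"
# prints: BAYES_99
```

A Bayes rule showing up in only one of the two runs points at the two runs using different Bayes settings or databases, which is the diagnosis given above.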


RE: site-wide config?

2006-09-08 Thread Bowie Bailey
Russell Jones wrote:
 Sorry if this is covered somewhere in the documentation, and if so
 can someone be nice enough to point it to me :) I can't seem to
 locate it.  
 
 I would like to set spamassassin to use a site-wide configuration, so
 that when I tell it to sa-learn, it will apply what it learns to
 every single email account on the server.  
 
 If someone can point me to the documentation and/or examples of how
 to set this, I would be very grateful. 
 
 Thanks!

man Mail::SpamAssassin::Conf

Search for the bayes_path and bayes_file_mode settings.

You need to create a bayes directory that is not relative to the
user's home and set the mode so that everyone can read and write to
it.

Make sure to read the bayes_path description carefully.  This is NOT a
simple directory path.

-- 
Bowie
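
As an illustration of why bayes_path is not a simple directory path: the value is a directory plus a filename prefix, and SpamAssassin appends suffixes such as _toks and _seen to that prefix. The sketch below prints a fragment that could be added to local.cf. The function name and the directory /var/lib/spamassassin/bayes are assumptions made for the example, and mode 0777 simply mirrors the "everyone can read and write" advice above; a tighter group-based mode may suit some sites better.

```shell
# Print a site-wide Bayes configuration fragment.
# sitewide_bayes_conf and the default directory are hypothetical examples.
sitewide_bayes_conf() {
    dir=${1:-/var/lib/spamassassin/bayes}
    # "bayes" after the directory is the filename prefix, giving files
    # such as <dir>/bayes_toks and <dir>/bayes_seen
    printf 'bayes_path %s/bayes\n' "$dir"
    # execute bits are included because the mode may also be used
    # when directories are created
    printf 'bayes_file_mode 0777\n'
}

# Example: review the output, then append it to the site configuration:
#   sitewide_bayes_conf >> /etc/mail/spamassassin/local.cf
```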


Spam on Exchange

2006-09-08 Thread Floyd

Hi,

I am new to SpamAssassin, I have been testing it for about 3 to 4 weeks now
and I have categorized about
4000 ham and about 1000 spam, and I still get the same spam after a few
days. 
What I am trying to figure out is about sa-learn. 
When I look at the logs and it classifies an incoming mail that is spam it
gives it a score of 2.8 out of 6 and therefore it classifies as ham.
Log file:

PreFile:  C:\ESA\NEW\msg060907173314_A5169.in.eml  PostFile: 
C:\ESA\NEW\msg060907173314_A5169.out.eml
09-07-2006 05:33:14 :   SpamAssassin: C:\PERL\BIN\SPAMASSASSIN.BAT  
C:\ESA\NEW\msg060907173314_A5169.in.eml 
C:\ESA\NEW\msg060907173314_A5169.out.eml


But when I run spamassassin -t  c:\esa\ham\message in question it gives me
a score of 10.6.

Now why is the previous score different from the manual score? They both are
running spamassassin from c:\perl\bin.

The only difference is that the logs show c:\perl\bin\spamassassin.bat
and I use c:\perl\bin\spamassassin -t.

Is there a difference between the two? I don't think so.

Any help would be appreciated.

Thanks,



RE: Spam on Exchange

2006-09-08 Thread Bowie Bailey
Floyd wrote:
 The only difference is the logs show c:\perl\bin\spamassassin.bat
 and i use c:\perl\bin\spamassassin -t
 
 Is there a difference between the two? I don't think so.

c:\perl\bin\spamassassin is a Perl program
c:\perl\bin\spamassassin.bat is a DOS batch file

Take a look at the .bat file and see what options it uses when it
calls spamassassin.  Then do your tests with the same options and see
what you get.

-- 
Bowie


Re: Which DB is actually used?

2006-09-08 Thread Logan Shaw

On Fri, 8 Sep 2006, Bo Mellberg wrote:
It seems like the exim-users database is being touched regularly, so I'm 
guessing that it has been set up by apt-get in some auto-learning state.


Yes, you might want to check whatever's running SpamAssassin and
see what user it's running as and also check the configuration
files (probably in /etc/mail/spamassassin) to see where it's
storing the database.

I have earlier trained spam and ham as user bosse, which is why there is a 
working db there as well.


As I am the only user on my system, it really doesn't matter if I use 
site-wide or not, but rather how I invoke sa-learn.


Let's say I remove the databases for bosse and root. Is this the proper 
way to invoke sa-learn:


1. Log on as user bosse
2. sa-learn --showdots --sync --dbpath /var/spool/exim4/.spamassassin --spam /home/bosse/Maildir/.MissedSpam/cur


Probably not, or at least not the best way.

First of all, you need to run sa-learn as the same user that
runs the filtering.  Since you haven't said what user that
is (whether it's bosse or some other user), it's impossible
to say whether that's the correct user to run sa-learn as.

Second, once you determine the correct user, in most cases
sa-learn should consult the same configuration file that
the filtering process does, so there shouldn't be a reason to
give --dbpath.

And finally, you don't really need to run --sync every time
you train the Bayes database, although I guess it wouldn't hurt.

If I set up a cron job to do the above I could just toss missed spam into the 
MissedSpam-folder right?


Yeah, but for efficiency reasons, you'd probably not want
messages in that folder to keep accumulating forever, so you'd
probably want a way to purge them after some period of time.
sa-learn can cope with a situation where you feed it the same
message repeatedly with no harm, but it's still a waste of
CPU cycles.

  - Logan
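
The advice above could be wrapped into something a cron job calls. This is a sketch under stated assumptions: `learn_and_purge` and the `SA_LEARN` override are hypothetical names, the 7-day cutoff is arbitrary, and the job is assumed to run as the user that does the filtering, so no --dbpath is given.

```shell
# Learn every message in a folder as spam, then delete messages older
# than a cutoff so the folder does not grow forever.
# learn_and_purge and SA_LEARN are illustrative names, not SA features.
learn_and_purge() {
    folder=$1
    days=${2:-7}
    # stop before deleting anything if learning fails
    "${SA_LEARN:-sa-learn}" --spam "$folder" || return 1
    # remove regular files last modified more than $days days ago
    find "$folder" -type f -mtime +"$days" -exec rm -f {} +
}

# Example crontab command, using the folder named earlier in the thread:
#   learn_and_purge /home/bosse/Maildir/.MissedSpam/cur 7
```

Messages younger than the cutoff are fed to sa-learn again on the next run, which is harmless but costs some CPU, as noted above.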


Re: Spam on Exchange

2006-09-08 Thread Duane Hill
On Friday, September 8, 2006 at 3:40:06 PM, Bowie confabulated:

 Floyd wrote:
 Hi,
 
 I am new to SpamAssassin, I have been testing it for about 3 to 4
 weeks now and I have categorized about
 4000 ham and about 1000 spam, and I still get the same spam after a
 few days.
 What I am trying to figure out is about sa-learn.
 When I look at the logs and it classifies an incoming mail that is
 spam it gives it a score of 2.8 out of 6 and therefore it classifies
 as ham. 
 Log file:
 
 PreFile:  C:\ESA\NEW\msg060907173314_A5169.in.eml  PostFile:
 C:\ESA\NEW\msg060907173314_A5169.out.eml
 09-07-2006 05:33:14 :   SpamAssassin: C:\PERL\BIN\SPAMASSASSIN.BAT  
 C:\ESA\NEW\msg060907173314_A5169.in.eml 
 C:\ESA\NEW\msg060907173314_A5169.out.eml
 
 
 But when i run spamassassin -t  c:\esa\ham\message in question it
 gives me a score of 10.6,
 
 Now why is the previous score different from the manual score. They
 both are running spamassassin from c:\perl\bin
 
 The only difference is the logs show c:\perl\bin\spamassassin.bat
 and i use c:\perl\bin\spamassassin -t
 
 Is there a difference between the two? I don't think so

 c:\perl\bin\spamassassin is a Perl program
 c:\perl\bin\spamassassin.bat is a DOS batch file

 Take a look at the .bat file and see what options it uses when it
 calls spamassassin.  Then do your tests with the same options and see
 what you get.

By default, when something is run on the command line without an
extension, Windows appends the appropriate extension before
determining the proper application to use. Therefore, 'spamassassin'
on the command line ultimately runs 'spamassassin.bat'. The other
'spamassassin', without the extension, would never get run unless you
pass it to the Perl interpreter explicitly, i.e. 'perl spamassassin'
or 'c:\perl\bin\perl.exe c:\perl\bin\spamassassin'.

The same holds true for 'sa-learn' and 'sa-update'.
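
A simplified model of that lookup, for illustration only. It considers a single directory, uses a shortened extension list (the real PATHEXT setting on a given machine is usually longer), and ignores everything else the Windows command interpreter does. resolve_command is a made-up helper, not a Windows or Perl API:

```python
import os


def resolve_command(name, directory_listing, pathext=".COM;.EXE;.BAT;.CMD"):
    """Return the file that a PATHEXT-style lookup would pick, or None.

    `directory_listing` is the list of file names in the directory searched.
    Matching is case-insensitive, as file names are on Windows.
    """
    files = {f.lower(): f for f in directory_listing}
    # A name typed with an extension is looked up as-is.
    if os.path.splitext(name)[1]:
        return files.get(name.lower())
    # Otherwise each extension is tried in order; the bare name is never run.
    for ext in pathext.split(";"):
        candidate = (name + ext).lower()
        if candidate in files:
            return files[candidate]
    return None
```

With a directory holding 'spamassassin', 'spamassassin.bat' and 'perl.exe', resolve_command("spamassassin", ...) picks 'spamassassin.bat', which matches the behaviour described above.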

-- 
This message was sent using 100% recycled electrons.



RE: Spam on Exchange

2006-09-08 Thread Bowie Bailey
Duane Hill wrote:
 On Friday, September 8, 2006 at 3:40:06 PM, Bowie confabulated:
 
  Floyd wrote:
   Hi,
   
   I am new to SpamAssassin, I have been testing it for about 3 to 4
   weeks now and I have categorized about
   4000 ham and about 1000 spam, and I still get the same spam after
   a few days. What I am trying to figure out is about sa-learn.
   When I look at the logs and it classifies an incoming mail that is
   spam it gives it a score of 2.8 out of 6 and therefore it
   classifies as ham. Log file:
   
   PreFile:  C:\ESA\NEW\msg060907173314_A5169.in.eml  PostFile:
   C:\ESA\NEW\msg060907173314_A5169.out.eml
   09-07-2006 05:33:14 :   SpamAssassin:
   C:\PERL\BIN\SPAMASSASSIN.BAT  
   C:\ESA\NEW\msg060907173314_A5169.in.eml 
   C:\ESA\NEW\msg060907173314_A5169.out.eml 
   
   
   But when i run spamassassin -t  c:\esa\ham\message in question
   it gives me a score of 10.6, 
   
   Now why is the previous score different from the manual score.
   They both are running spamassassin from c:\perl\bin
   
   The only difference is the logs show c:\perl\bin\spamassassin.bat
   and i use c:\perl\bin\spamassassin -t
   
   Is there a difference between the two? I don't think so
 
  c:\perl\bin\spamassassin is a Perl program
  c:\perl\bin\spamassassin.bat is a DOS batch file
 
  Take a look at the .bat file and see what options it uses when it
  calls spamassassin.  Then do your tests with the same options and
  see what you get.
 
 By  default,  when  something  is  ran  on the command line without an
 extension,  Windows  assumes  to  add the appropriate extension before
 determining  the  proper application to use. Therefore, 'spamassassin'
 on  the  command  line  ultimately  runs 'spamassassin.bat'. The other
 'spamassassin'  without  the  extension would never get ran unless you
 supply the Perl interpreter with the command. I.e. 'perl spamassassin'
 or 'c:\perl\bin\perl.exe c:\perl\bin\spamassassin'.
 
 The same would hold true for 'sa-learn' as would 'sa-update'.

Ok, I'll give you that one.  I tend to forget about that limitation
since I mostly work with linux.

Still, the symptom that you describe indicates that your mail system
and your test are either running different spamassassin programs, or
using different config files.

Are you logged in as the same user your mail system uses to do the
scan (probably not an issue with Windows, but still a valid question)?

And just out of curiosity, try this command:
c:\perl\bin\spamassassin.bat -t  c:\esa\ham\message

It may or may not take the -t argument, but if not, you can look at
the headers in the message returned.

-- 
Bowie


Customizing RBL and SURBL lists

2006-09-08 Thread D . J .
Greetings all:

I intend to eventually have local copies for the lists I wish to use,
so it's important for me to figure this out. Currently, I'm just
trying to get SA to only do checks against the lists that I want to
eventually check. However, it still appears to be checking all of
them. I've basically taken the 20_dnsbl_tests.cf and 25_uribl.cf
files, copied them to my /etc/mail/spamassassin directory, and
commented out the lists that I don't want checked. However, as I said,
it still appears to be checking things. I have a feeling this is
because the original 20_dnsbl_tests.cf and 25_uribl.cf files are still
in the /usr/share/spamassassin directory, and simply commenting them
out in a later file won't change matters. However, the catch is that
from what I've read, I *don't* want to edit the files under
/usr/share/spamassassin as they'll be overwritten upon upgrades, thus
losing my customizations. What is the best course of action to
accomplish this goal?

Thanks!
- D.J.


Re: Customizing RBL and SURBL lists

2006-09-08 Thread Kelson

D.J. wrote:
I've 
basically taken the 20_dnsbl_tests.cf and 25_uribl.cf files, copied them 
to my /etc/mail/spamassassin directory, and commented out the lists that 
I don't want checked.


That won't work, because SpamAssassin will try to merge a bunch of 
comments with the default rules... and come up with the default rules.


The way to disable a rule is to copy the score line to a .cf file in 
your local config directory and assign it a score of 0.  This will 
override the score in the default config, and rules with scores 
explicitly assigned to 0 are not run.


In short, all you need is:

score NAME_OF_RULE 0

--
Kelson Vibber
SpeedGate Communications www.speed.net


SPF Scores

2006-09-08 Thread Michel Vaillancourt

I set up SPF for Wolfstar.ca yesterday, and I've been reading a bit on 
the website about SPF itself.  With regard to SA, I'm interested in knowing 
whether folks have adjusted their stock SPF scores or written custom rules 
to leverage this technology.
-- 
--Michel Vaillancourt
Wolfstar Systems
www.wolfstar.ca


Fuzzy OCR false positives from Screenshots...

2006-09-08 Thread Michael Grey

We are testing a new configuration using FuzzyOCR, and found
it to work very well overall...

However, there have been two occasions in the last 24 hrs
where screenshots embedded into the emails caused false positives.

One was an account summary from a cell
company, the other was some internal marketing info.

Are there other approaches to getting certain images white listed
if they contain, say, our specific company name ?

Any other ideas on how to deal with this ?

Many thanks !

Michael Grey


RE: Spam on Exchange

2006-09-08 Thread Floyd

Thanks for the responses

I tried using the command 
spamassassin.bat  c:\esa\ham\message 

and it gave me the same score as spamassassin -t   c:\esa\ham\message
and it also gave me the same score as spamassassin.bat -t 
c:\esa\ham\message

I am at a loss for words and actions... I do not know why it is not marked
as spam, then


Bowie Bailey wrote:
 
 Duane Hill wrote:
 On Friday, September 8, 2006 at 3:40:06 PM, Bowie confabulated:
 
  Floyd wrote:
   Hi,
   
   I am new to SpamAssassin, I have been testing it for about 3 to 4
   weeks now and I have categorized about
   4000 ham and about 1000 spam, and I still get the same spam after
   a few days. What I am trying to figure out is about sa-learn.
   When I look at the logs and it classifies an incoming mail that is
   spam it gives it a score of 2.8 out of 6 and therefore it
   classifies as ham. Log file:
   
   PreFile:  C:\ESA\NEW\msg060907173314_A5169.in.eml  PostFile:
   C:\ESA\NEW\msg060907173314_A5169.out.eml
   09-07-2006 05:33:14 :   SpamAssassin:
   C:\PERL\BIN\SPAMASSASSIN.BAT  
   C:\ESA\NEW\msg060907173314_A5169.in.eml 
   C:\ESA\NEW\msg060907173314_A5169.out.eml 
   
   
   But when i run spamassassin -t  c:\esa\ham\message in question
   it gives me a score of 10.6, 
   
   Now why is the previous score different from the manual score.
   They both are running spamassassin from c:\perl\bin
   
   The only difference is the logs show c:\perl\bin\spamassassin.bat
   and i use c:\perl\bin\spamassassin -t
   
   Is there a difference between the two? I don't think so
 
  c:\perl\bin\spamassassin is a Perl program
  c:\perl\bin\spamassassin.bat is a DOS batch file
 
  Take a look at the .bat file and see what options it uses when it
  calls spamassassin.  Then do your tests with the same options and
  see what you get.
 
 By  default,  when  something  is  ran  on the command line without an
 extension,  Windows  assumes  to  add the appropriate extension before
 determining  the  proper application to use. Therefore, 'spamassassin'
 on  the  command  line  ultimately  runs 'spamassassin.bat'. The other
 'spamassassin'  without  the  extension would never get ran unless you
 supply the Perl interpreter with the command. I.e. 'perl spamassassin'
 or 'c:\perl\bin\perl.exe c:\perl\bin\spamassassin'.
 
 The same would hold true for 'sa-learn' as would 'sa-update'.
 
 Ok, I'll give you that one.  I tend to forget about that limitation
 since I mostly work with linux.
 
 Still, the symptom that you describe indicates that your mail system
 and your test are either running different spamassassin programs, or
 using different config files.
 
 Are you logged in as the same user your mail system uses to do the
 scan (probably not an issue with Windows, but still a valid question)?
 
 And just out of curiosity, try this command:
 c:\perl\bin\spamassassin.bat -t  c:\esa\ham\message
 
 It may or may not take the -t argument, but if not, you can look at
 the headers in the message returned.
 
 -- 
 Bowie
 
 

-- 
View this message in context: 
http://www.nabble.com/Spam-on-Exchange-tf2239711.html#a6212649
Sent from the SpamAssassin - Users forum at Nabble.com.



Re: Spam on Exchange

2006-09-08 Thread Duane Hill
On Friday, September 8, 2006 at 4:09:55 PM, Bowie confabulated:

 Ok, I'll give you that one.  I tend to forget about that limitation
 since I mostly work with linux.

We migrated away from Windows to FreeBSD about three months ago, mostly
because of tools that were readily available to the *nix and BSD
worlds. I was already familiar with FreeBSD from a previous life and
had forgotten a lot of the FreeBSD stuff. So I know how that goes.

-- 
This message was sent using 100% recycled electrons.



Re: Fuzzy OCR false positives from Screenshots...

2006-09-08 Thread John D. Hardin
On Fri, 8 Sep 2006, Michael Grey wrote:

 However, there have been two occasions in the last 24 hrs where screenshots
 embedded into the emails caused false positives.
 
 One was an 'account summary' from a cell company, the other was some internal
 marketing info.
 
 Are there other approaches to getting certain images white listed if they
 contain, say, our specific company name ?

Don't run SA against internal email.

And what the heck is a cell-phone company doing sending you
screenshots?

--
 John Hardin KA7OHZ    ICQ#15735746    http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]    FALaholic #11174    pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 - 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  If someone has a gun and is trying to kill you, it would be
  reasonable to shoot back with your own gun.
  -- the Dalai Lama, May 15, 2001
---
 9 days until The 219th anniversary of the signing of the U.S. Constitution



RE: Fuzzy OCR false positives from Screenshots...

2006-09-08 Thread Michael Grey

You will have to ask the cell company about the first issue ...

In regards to the second, many large companies have outside companies do work
for them in the areas of marketing and other aspects. So this also will
happen regardless.

Let me clarify; this is an OUTSIDE relay to INSIDE...

A FuzzyOCR White List with (very privately held) keywords would help. 

Any other ideas ?



Michael Grey




-Original Message-
From: John D. Hardin [mailto:[EMAIL PROTECTED] 
Sent: Friday, September 08, 2006 10:10 AM
To: Michael Grey
Cc: users@spamassassin.apache.org
Subject: Re: Fuzzy OCR false positives from Screenshots...

On Fri, 8 Sep 2006, Michael Grey wrote:

 However, there have been two occasions in the last 24 hrs where screenshots
 embedded into the emails caused false positives.
 
 One was an 'account summary' from a cell company, the other was some internal
 marketing info.
 
 Are there other approaches to getting certain images white listed if they
 contain, say, our specific company name ?

Don't run SA against internal email.

And what the heck is a cell-phone company doing sending you
screenshots?

--
 John Hardin KA7OHZ    ICQ#15735746    http://www.impsec.org/~jhardin/
 [EMAIL PROTECTED]    FALaholic #11174    pgpk -a [EMAIL PROTECTED]
 key: 0xB8732E79 - 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  If someone has a gun and is trying to kill you, it would be
  reasonable to shoot back with your own gun.
  -- the Dalai Lama, May 15, 2001
---
 9 days until The 219th anniversary of the signing of the U.S. Constitution



Re: Quarantined Spam.

2006-09-08 Thread Vincent Li

Vincent Li
System Admin

On Fri, 8 Sep 2006, Jared wrote:

Hi,

Is there any way I can resend the emails which have been quarantined,
as some of the emails should not have been quarantined?

I'm using Plesk 7.5 Reloaded with SpamAssassin.

I am using Amavisd-new SQL quarantine and MailZu http://www.mailzu.net, it 
just works.

Thanks in advance.


Re: Which DB is actually used?

2006-09-08 Thread Logan Shaw

On Fri, 8 Sep 2006, Logan Shaw wrote:

Second, once you determine the correct user, in most cases
sa-learn should consult the same configuration file that
the learning process does, so there shouldn't be a reason to
give --dbpath.


Oops, that should have said that the scanning process does.

  - Logan


Re: Fuzzy OCR false positives from Screenshots...

2006-09-08 Thread Logan Shaw

On Fri, 8 Sep 2006, Michael Grey wrote:

We are testing a new configuration using FuzzyOCR, and found it to work very
well overall...

However, there have been two occasions in the last 24 hrs where screenshots
embedded into the emails caused false positives.

One was an 'account summary' from a cell company, the other was some internal
marketing info.

Are there other approaches to getting certain images white listed if they
contain, say, our specific company name ?


You could probably hack FuzzyOcr.pm pretty easily.

The basic strategy would be to create another list just like
@words, but with whitelist words instead.  You should be able
to duplicate the code where it parses config file options (look
for focr_word) and put in your own config file option, say
focr_word_whitelist.  Then at the bottom, there is a foreach
loop that iterates through @words and looks for matches.
You can just duplicate that loop and create a separate count
of whitelist words matched.  Then modify the way the score is
computed (the my $score = ...) line, and you're done.
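
The shape of that change can be sketched as follows. This is not FuzzyOcr's code (the plugin is Perl and does approximate matching); it is a Python illustration that uses exact word matching, and the function name, parameter names and default numbers are all made up for the example:

```python
def ocr_score(text, spam_words, whitelist_words,
              base_score=5.0, per_word=1.0, per_whitelist=2.0):
    """Score OCR'd text: spam-word hits raise it, whitelist hits lower it.

    Returns 0.0 when no spam word matched at all, and never goes negative.
    """
    tokens = set(text.lower().split())
    spam_hits = sum(1 for w in spam_words if w.lower() in tokens)
    white_hits = sum(1 for w in whitelist_words if w.lower() in tokens)
    if spam_hits == 0:
        return 0.0
    # Hypothetical formula: a base score for the first hit, a bonus for each
    # further hit, and a deduction for each whitelist word found.
    score = base_score + per_word * (spam_hits - 1) - per_whitelist * white_hits
    return max(score, 0.0)
```

With a (privately held) whitelist word such as a company name, an image that contains both that word and one spam word would score 3.0 here instead of 5.0.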

  - Logan


RE: [Bump] No log to syslog after upgrade

2006-09-08 Thread Kurt Buff
This is news to me - I'll have to dig into the docs again, and see what I
can find.

Sigh.

I must be getting old or something.

| -Original Message-
| From: Stuart Johnston [mailto:[EMAIL PROTECTED]
| Sent: Thursday, September 07, 2006 15:27
| To: Kurt Buff; users@spamassassin.apache.org
| Subject: Re: [Bump] No log to syslog after upgrade
| 
| 
| Kurt Buff wrote:
|  I've requested an account, and am waiting for the password.
|  
|  I understand about command line tools and their use, but SA is a bit
|  of a special case, as it's used as more than simply a command line
|  tool - especially when you consider its use with Amavis, etc.
| 
| amavisd-new has its own logging facilities including the option to
| log to syslog or a separate log file.  There is also an option to log
| debugging output from SA.  You should ask on the amavis list if you
| need more details.
| 


  



RE: Fuzzy OCR false positives from Screenshots...

2006-09-08 Thread David B Funk
On Fri, 8 Sep 2006, Michael Grey wrote:

 In regards to the second, many large companies have outside companies do work
 for them in the areas of marketing and other aspects. So this also will
 happen regardless.

 Let me clarify; this is an OUTSIDE relay to INSIDE...

 A FuzzyOCR White List with (very privately held) keywords would help.

 Any other ideas ?

Sit down and have a little heart-to-heart with your marketing people.
They may want to rethink their methods. Put the images on a web-server
and e-mail links to them.

You can hack your local mail system to not spam-tag those messages
but what about the intended potential customer recipients?
If you're tagging them then that should be an indication that other
people will too.

The world changes, sometimes due to actions of bad people. After 9/11
it became a bad idea to try to send white powder thru snail-mail.
Thanks to image-spammers it's becoming not so good an idea to send
embedded-image e-mails (particularly if it's a marketing campaign ;).

-- 
Dave Funk                          University of Iowa
dbfunk (at) engineering.uiowa.edu  College of Engineering
319/335-5751   FAX: 319/384-0549   1256 Seamans Center
Sys_admin/Postmaster/cell_admin    Iowa City, IA 52242-1527
#include std_disclaimer.h
Better is not better, 'standard' is better. B{


Re: Customizing RBL and SURBL lists

2006-09-08 Thread D . J .
On 9/8/06, Bowie Bailey [EMAIL PROTECTED] wrote:
 D.J. wrote:
  Greetings all:
  
  I intend to eventually have local copies for the lists I wish to
  use, so it's important for me to figure this out.  Currently, I'm
  just trying to get SA to only do checks against the lists that I
  want to eventually check.  However, it still appears to be
  checking all of them.  I've basically taken the 20_dnsbl_tests.cf
  and 25_uribl.cf files, copied them to my /etc/mail/spamassassin
  directory, and commented out the lists that I don't want checked.
  However, as I said, it still appears to be checking things.  I have
  a feeling this is because the original 20_dnsbl_tests.cf and
  25_uribl.cf files are still in the /usr/share/spamassassin
  directory, and simply commenting them from a later file won't
  change matters.  However, the catch is that from what I've read, I
  *don't* want to edit the files under /usr/share/spamassassin as
  they'll be overwritten upon upgrades, thus losing my
  customizations.  What is the best course of action to accomplish
  this goal?
 
 First, delete the copied files.  There is no point in having copies of
 those files in /etc/mail/spamassassin.
 
 Then, add score lines to /etc/mail/spamassassin/local.cf to change the
 score of the rules to 0.  This will prevent them from running.
 
 For example:
 
 score RCVD_IN_SORBS_DUL 0
 score RCVD_IN_NJABL_PROXY 0
 
 --
 Bowie

Excellent! This has worked flawlessly. Now for part 2 of this project,
sort of related to the first part. So now I have only the zones I want
to query. When I wish to move to my local servers, is it as simple as
adding a new header line into my local.cf file for each list, or will I
need to disable the old lists entirely via the 0 score method and
completely rewrite the rules for my local server? Thanks for all your
help!


A Note Regarding DHCP Zone

2006-09-08 Thread David Cary Hart
Based upon removal requests, we are seeing a considerable increase in
SA usage. I added some notes to our website recently that I wanted to
share on this list:

Please Note: The dhcp zone also contains some static generic hosts:

* Most of these are in mixed dynamic and static ranges. We are
  whitelisting these immediately upon request and verification.
* Many of these have mis-configured DNS such as inconsistent
  forward and reverse DNS or no A record for the host name.
* In all cases, where we have received maps from the ISP, those
  will override all other considerations. We are continuing to make
  progress in obtaining increasing cooperation from providers in that
  regard.
* We add a dynamic range only when we have received spam from the
  range.

It's a balancing act.  

-- 
Our DNSRBL - Eliminate Spam at the Source: http://www.TQMcube.com
   Don't Subsidize Criminals: http://boulderpledge.org


BAYES_00

2006-09-08 Thread Michael Grey

Forgive what may be a newbie question;

If you hit on BAYES_00, does that mean explicitly that the
email has been learned as NOT SPAM ?

If this is not the case ( or ONLY the case,) what other
conditions may cause this ? ( Presuming the DB is available / healthy etc. )

Thanks

Michael Grey


Re: BAYES_00

2006-09-08 Thread Theo Van Dinter
On Fri, Sep 08, 2006 at 12:04:12PM -0700, Michael Grey wrote:
 If you hit on BAYES_00, does that mean explicitly that the email has been
 learned as NOT SPAM ? 

The rule BAYES_00 has nothing to do with whether or not the message has
already been learned.

 If this is not the case ( or ONLY the case,) what other conditions may cause
 this ? ( Presuming the DB is available / healthy etc. )

This happens anytime the tokens found in the scanned message, that have been
previously learned (so they're in the bayes DB), indicate that the message is
very probably ham.
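
To make the distinction concrete, here is a toy sketch of scoring a message from previously learned tokens. It is a plain naive-Bayes combination with smoothing, written only to illustrate the idea; SpamAssassin's real combining function is more involved, and the token counts and the function name are invented for the example:

```python
import math


def combined_spam_probability(message_tokens, token_db):
    """Estimate P(spam) for a message from tokens seen during learning.

    `token_db` maps token -> (times_seen_in_spam, times_seen_in_ham).
    Tokens that were never learned are ignored. With no known tokens
    the result is exactly 0.5 (no evidence either way).
    """
    log_spam = 0.0
    log_ham = 0.0
    for tok in set(message_tokens):
        if tok not in token_db:
            continue
        spam_count, ham_count = token_db[tok]
        # Laplace smoothing keeps the probability away from 0 and 1.
        p = (spam_count + 1) / (spam_count + ham_count + 2)
        log_spam += math.log(p)
        log_ham += math.log(1 - p)
    return 1 / (1 + math.exp(log_ham - log_spam))
```

A message that has never been learned itself can still come out very close to 0 if its tokens were mostly seen in ham, which is the situation in which a rule like BAYES_00 fires.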

-- 
Randomly Generated Tagline:
 Professor: Oh, dear. She's stuck in an infinite loop and he's an idiot. 
Well, that's love for you.




Re: Which DB is actually used?

2006-09-08 Thread jdow

From: Logan Shaw [EMAIL PROTECTED]


On Fri, 8 Sep 2006, Bo Mellberg wrote:
It seems like the exim-users database is being touched regularly, so I'm guessing that 
it has been set up by apt-get in some auto-learning state.


Yes, you might want to check whatever's running SpamAssassin and
see what user it's running as and also check the configuration
files (probably in /etc/mail/spamassassin) to see where it's
storing the database.

I have earlier trained spam and ham as user bosse, which is why there is a working db 
there as well.


As I am the only user on my system, it really doesn't matter if I use site-wide or not, 
but rather how I invoke sa-learn.


Lets say I remove the databases for bosse and root. Is this the proper  way to 
invoke sa-learn:


1. Log on as user bosse
2. sa-learn --showdots --sync --dbpath /var/spool/exim4/.spamassassin --spam 
/home/bosse/Maildir/.MissedSpam/cur


Probably not, or at least not the best way.


Absolutely not. The database under bosse is quite apparently not
being used except for his misplaced training. He needs to su -l exim4
and then run sa-learn.

(Were it me I'd rip out amavisd-new and put in something that
(IMAO {^,-}) works like procmail. I'm not sure I'd use Exim, either,
unless it can explicitly run spamc as the user bosse. At the VERY
least I'd read whatever manual existed for amavisd-new and Exim such
that spamc -u bosse will work and have spamd access the bosse
database. Of course, if spamd is running in a sandbox it can't make
that reach without some skullduggery. So the entire installation needs
to be examined and manipulated so that per user BAYES can work fer
shure. That's a LOT of RTFM and examine your system configuration,
to be sure. But learning only hurts a little and having learned is
a nice feeling.)


First of all, you need to run sa-learn as the same user that
runs the filtering.  Since you haven't said what user that it
is (whether it's bosse or some other user), it's impossible
to say whether that's the correct user to run sa-learn as.


Exactly - and he's not doing that.

If I set up a cron job to do the above I could just toss missed spam into the 
MissedSpam-folder right?


Yeah, but for efficiency reasons, you'd probably not want
messages in that folder to keep accumulating forever, so you'd
probably want a way to purge them after some period of time.
sa-learn can cope with a situation where you feed it the same
message repeatedly with no harm, but it's still a waste of
CPU cycles.


I have ham, spam, oldham, and oldspam entries for my learning process
done via IMAP folders. Once a night spam is learned. When the spam or
ham folder gets more than say a dozen entries I move them over to oldspam
or oldham respectively. That way I keep my old learn database around so
I can rebuild it. Of course, I manually train ONLY. There's none of that
silly autolearn happening here. It's too prone to going off in wild
strange new directions orthogonal (at the very least) to good sense.
Again, YMMV and IMAO liberally apply to the above statements.

{^_^} 



Re: Fuzzy OCR false positives from Screenshots...

2006-09-08 Thread jdow

It MAY be that FuzzyOcr needs to become aware of whitelists. But then,
if you use a whitelist entry for the cell account summaries the FuzzyOcr
scores are basically meaningless.

{^_^}
- Original Message - 
From: Michael Grey [EMAIL PROTECTED]

To: users@spamassassin.apache.org
Sent: Friday, September 08, 2006 09:40
Subject: Fuzzy OCR false positives from Screenshots...


We are testing a new configuration using FuzzyOCR, and found it to work very
well overall...

However, there have been two occasions in the last 24 hrs where screenshots
embedded into the emails caused false positives.

One was an 'account summary' from a cell company, the other was some internal
marketing info.

Are there other approaches to getting certain images white listed if they
contain, say, our specific company name ?

Any other ideas on how to deal with this ?

Many thanks !

Michael Grey


RE: Customizing RBL and SURBL lists

2006-09-08 Thread Bowie Bailey
D.J. wrote:
 On 9/8/06, Bowie Bailey [EMAIL PROTECTED] wrote:
  D.J. wrote:
   Greetings all:
   
   I intend to eventually have local copies for the lists I wish to
   use, so it's important for me to figure this out.  Currently, I'm
   just trying to get SA to only do checks against the lists that I
   want to eventually check.  However, it still appearst to be
   checking all of them.  I've basically taken the 20_dnsbl_tests.cf
   and 25_uribl.cf files, copied them to my /etc/mail/spamassassin
   directory, and commented out the lists that I don't want checked. 
   However, as I said, it still appears to be checking things.  I have
   a feeling this is because the original 20_dnsbl_tests.cf and
   25_uribl.cf files are still in the /usr/share/spamassassin
   directory, and simply commenting them from a later file won't
   change matters.  However, the catch is that from what I've read, I
   *don't* want to edit the files under /usr/share/spamassassin as
   they'll be overwritten upon upgrades, thus losing my
   customizations.  What is the best course of action to accomplish
   this goal? 
  
  First, delete the copied files.  There is no point in having copies of
  those files in /etc/mail/spamassassin.
  
  Then, add score lines to /etc/mail/spamassassin/local.cf to change the
  score of the rules to 0.  This will prevent them from running.
  
  For example:
  
  score RCVD_IN_SORBS_DUL 0
  score RCVD_IN_NJABL_PROXY 0
 
 Excellent!  This has worked flawlessly.  Now for part 2 of this
 project, sort of related to the first part.  So now I have only the
 zones I want to query.  When I wish to move to my local servers, is
 it as simple as adding a new header line into my local.cf file for
 each list, or will I need to disable the old lists entirely via the 0
 score method and completely rewrite the rules for my local server? 
 Thanks for all your help!  

You could just change the header line, but I would suggest rewriting
the rules to avoid confusion.

1) Leave the current rule with a zero score
2) Copy the header, describe, tflags, and score settings for the rule
   from /usr/share/spamassassin to local.cf
3) Change the name and description of the rule to indicate that it is
   using your servers.
4) Change the server name so that it uses your server.
5) Give it the same or different scores than the original

Giving the rule a new name will avoid confusion in the future as it
will make it obvious that this is a different rule.  The simplest
thing would be to add a short prefix or postfix to the current
rulename (RCVD_IN_SORBS_DUL_DJ, for example).

-- 
Bowie


Re: Fuzzy OCR false positives from Screenshots...

2006-09-08 Thread jdow

From: David B Funk [EMAIL PROTECTED]


On Fri, 8 Sep 2006, Michael Grey wrote:


In regards to the second, many large companies have outside companies do work
for them in the areas of marketing and other aspects. So this also will
happen regardless.

Let me clarify; this is an OUTSIDE relay to INSIDE...

A FuzzyOCR White List with (very privately held) keywords would help.

Any other ideas ?


Sit down and have a little heart-to-heart with your marketing people.
They may want to rethink their methods. Put the images on a web-server
and e-mail links to them.

You can hack your local mail system to not spam-tag those messages
but what about the intended potential customer recipients?
If you're tagging them then that should be an indication that other
people will too.

The world changes, sometimes due to actions of bad people. After 9/11
it became a bad idea to try to send white powder thru snail-mail.
Thanks to image-spammers it's becoming not so good an idea to send
embedded-image e-mails (particularly if it's a marketing campaign ;).


Marketing campaigns via email are deadly these days regardless of
embedded whazzits or whozzits. I don't CARE if I have any prior
relationship with company fubar. If they send me marketing it goes
to the spam bucket. And it is liable to be used for anti-spam training
if the BAYES score was too low.

Sending an account summary as an email image is perhaps one of the
stupidest ideas any marketdroid has concocted. (If it is part of a
marketing plan this raises sincere concerns about the cell company's
privacy policy, too.)

{^_^}



Re: Customizing RBL and SURBL lists

2006-09-08 Thread D . J .
  Excellent!  This has worked flawlessly.  Now for part 2 of this
  project, sort of related to the first part.  So now I have only the
  zones I want to query.  When I wish to move to my local servers, is
  it as simple as adding a new header line into my local.cf file for
  each list, or will I need to disable the old lists entirely via the 0
  score method and completely rewrite the rules for my local server?
  Thanks for all your help!
 
 You could just change the header line, but I would suggest rewriting
 the rules to avoid confusion.
 
 1) Leave the current rule with a zero score
 2) Copy the header, describe, tflags, and score settings for the rule
    from /usr/share/spamassassin to local.cf
 3) Change the name and description of the rule to indicate that it is
    using your servers.
 4) Change the server name so that it uses your server.
 5) Give it the same or different scores than the original
 
 Giving the rule a new name will avoid confusion in the future as it
 will make it obvious that this is a different rule.  The simplest
 thing would be to add a short prefix or postfix to the current
 rulename (RCVD_IN_SORBS_DUL_DJ, for example).
 
 --
 Bowie

Thanks very much! This is the direction I was leaning (although without
the pre/postfix, which is an excellent idea) from the start, but wanted
to have a second opinion from the community as to the best way :-)



Marking Mail in the future as SPAM?

2006-09-08 Thread robert
Hi,

Are there any rules right now that pertain to mail with bogus dates?

A common trick is to use a date in the future in order to ensure the message
is noticed at the very top of the list.

Is there anything in place that can calculate, based on your location, how far
in the future (relative to the rest of the world) a date can legitimately be?

For instance, from where I am located it must be possible to calculate the
furthest into the future that something could legitimately be datestamped
because it was sent from a different timezone. Anything beyond that I want
marked as SPAM.





This message was sent using IMP, the Internet Messaging Program.



Re: Marking Mail in the future as SPAM?

2006-09-08 Thread Theo Van Dinter
On Fri, Sep 08, 2006 at 03:27:21PM -0500, [EMAIL PROTECTED] wrote:
 Are there any rules right now that pertain to mail with bogus dates?

Yes.

 Is there anything in place that can calculate based on your location where is 
 in
 the future (relative to the rest of the world?)

DATE_IN_FUTURE_* ?
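
As a rough illustration of the question above: a Date: header carries its
own UTC offset, so "how far in the future" can be measured without knowing
anyone's location. The Python sketch below is not how SpamAssassin's
DATE_IN_FUTURE_* rules are implemented; the function name hours_in_future
and the sample timestamps are made up for illustration.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def hours_in_future(date_header, received_at):
    """Return how many hours the Date header lies ahead of the receipt time.

    Negative values mean the Date header is in the past.
    """
    sent = parsedate_to_datetime(date_header)
    if sent.tzinfo is None:
        # A "-0000" offset parses as a naive datetime; treat it as UTC.
        sent = sent.replace(tzinfo=timezone.utc)
    return (sent - received_at).total_seconds() / 3600.0


# Hypothetical receipt time: 8 Sep 2006, 15:27:21 CDT, expressed in UTC.
received = datetime(2006, 9, 8, 20, 27, 21, tzinfo=timezone.utc)

# Sent from UTC+10: already the 9th for the sender, but not in the future.
ok = hours_in_future("Sat, 09 Sep 2006 06:27:00 +1000", received)

# Claims to be a full day ahead, using the recipient's own offset.
bogus = hours_in_future("Sat, 09 Sep 2006 15:27:21 -0500", received)
```

Here ok comes out slightly negative (the message is a few seconds old) even
though its calendar date is the 9th, while bogus is a full 24 hours ahead.
Any threshold for "too far ahead" is a policy choice, not something this
sketch decides.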

-- 
Randomly Generated Tagline:
 I gotta be sure this isn't another scientific fraud like global warming
 or second-hand smoke. -Mayor 




Re: Marking Mail in the future as SPAM?

2006-09-08 Thread robert
Turns out Horde's IMP allows me to sort by Arrival Date, so that will do what I
want. Doing what I describe below would be for when you don't want to rely on the
client but instead manipulate the message on its way in and rewrite its Date:
header, since at the very least a mail client will allow you to sort by Date.

Quoting [EMAIL PROTECTED]:

 When using a web client like IMP from Horde it seems the Date header is kept
 in the original format and never converted to my local timezone. I figure
 that if I converted the Date to my local timezone I would have people leaving
 messages in the future that always sit at the top of my Inbox. For instance
 it's still the 8th here in CDT but elsewhere it's the 9th and those messages
 now sit at the top of the list of messages to be read.

 Is there anybody here who chooses to convert Date: to their local timezone
 and to store the original date in say X-Original-Date: or something in order
 to ensure that you have a last in first seen approach to managing your email?

 So you'd record Date: in terms of Date Received in your local timezone not
 just a local time zone conversion of the original date.
 
  
  
 
 
 



Re: Marking Mail in the future as SPAM?

2006-09-08 Thread jdow

From: [EMAIL PROTECTED]


When using a web client like IMP from Horde it seems the Date header is kept
in the original format and never converted to my local timezone. I figure that
if I converted the Date to my local timezone I would have people leaving
messages in the future that always sit at the top of my Inbox. For instance
it's still the 8th here in CDT but elsewhere it's the 9th and those messages
now sit at the top of the list of messages to be read.

Is there anybody here who chooses to convert Date: to their local timezone and
to store the original date in say X-Original-Date: or something in order to
ensure that you have a last in first seen approach to managing your email?

So you'd record Date: in terms of Date Received in your local timezone not just a
local time zone conversion of the original date.


There is an ancient observation that goes something like this, "If
kicking that brick wall hurts your foot, stop kicking the wall." If
IMP has a fault THAT dramatic what other faults lurk below its hood?

There is no way to make the conversion in SpamAssassin. There are
ways within the standard C libraries that IMP could incorporate if
its developers either had the time or the brains to do so. You might
be able to gen up a futility that would allow a tool like procmail
to perform that modification. But it might eat more of your time
than finding a higher quality web mail wazzit.

{o.o}   - Not known to be charitable about "web mail", so YMMV.
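
For what it's worth, the kind of small utility mentioned above might look
something like the following. This is only a sketch under assumptions: the
function name restamp is made up, X-Original-Date: is simply the header
name proposed earlier in the thread, and nothing here is part of
SpamAssassin or procmail. It moves the existing Date: header aside and
stamps the message with the receipt time in the local timezone.

```python
from datetime import datetime, timedelta, timezone
from email import message_from_string
from email.utils import format_datetime


def restamp(raw_message, now=None):
    """Move Date: to X-Original-Date: and set Date: to the receipt time."""
    msg = message_from_string(raw_message)
    if now is None:
        # Current time with the machine's local UTC offset attached.
        now = datetime.now(timezone.utc).astimezone()
    original = msg["Date"]
    if original is not None:
        # Keep the sender's timestamp, then drop the old header.
        msg["X-Original-Date"] = original
        del msg["Date"]
    msg["Date"] = format_datetime(now)
    return msg.as_string()


# Example with a fixed, hypothetical receipt time in CDT (UTC-5).
cdt = timezone(timedelta(hours=-5))
sample = (
    "From: someone@example.com\n"
    "Date: Sat, 09 Sep 2006 08:00:00 +1000\n"
    "Subject: hello\n"
    "\n"
    "body\n"
)
result = restamp(sample, now=datetime(2006, 9, 8, 17, 50, 36, tzinfo=cdt))
```

A delivery agent that can pipe a message through a filter could call
something like this on stdin and stdout. Whether that is worth the effort
compared with sorting by arrival date in the client is a separate question.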


Re: LOG: Re: Marking Mail in the future as SPAM?

2006-09-08 Thread Robert Nicholson
I've used Horde/IMP for ever but is there a commonly regarded better quality web based IMAP client then?

On Sep 8, 2006, at 5:50 PM, jdow wrote:

 Accepting to folder lists/unix/spamassassin-users

 From: "jdow" [EMAIL PROTECTED]
 Date: September 8, 2006 5:50:36 PM CDT
 To: users@spamassassin.apache.org
 Subject: Re: Marking Mail in the future as SPAM?

 From: [EMAIL PROTECTED]

 When using a web client like IMP from Horde it seems the Date header is kept
 in the original format and never converted to my local timezone. I figure that
 if I converted the Date to my local timezone I would have people leaving
 messages in the future that always sit at the top of my Inbox. For instance
 it's still the 8th here in CDT but elsewhere it's the 9th and those messages
 now sit at the top of the list of messages to be read.

 Is there anybody here who chooses to convert Date: to their local timezone and
 to store the original date in say X-Original-Date: or something in order to
 ensure that you have a last in first seen approach to managing your email?

 So you'd record Date: in terms of Date Received in your local timezone not just a
 local time zone conversion of the original date.

 There is an ancient observation that goes something like this, "If
 kicking that brick wall hurts your foot, stop kicking the wall." If
 IMP has a fault THAT dramatic what other faults lurk below its hood?

 There is no way to make the conversion in SpamAssassin. There are
 ways within the standard C libraries that IMP could incorporate if
 its developers either had the time or the brains to do so. You might
 be able to gen up a futility that would allow a tool like procmail
 to perform that modification. But it might eat more of your time
 than finding a higher quality web mail wazzit.

 {o.o}   - Not known to be charitable about "web mail", so YMMV.

Re: LOG: Re: Marking Mail in the future as SPAM?

2006-09-08 Thread jdow

Take to heart my comment about web mail clients. At today's prices for
bits on disks IMAO Web Mail clients are not worth the price of their
storage for the executable file.

Even Outlook Express, which I use, is faster and sorts by either
received date or sent date apparently with correct regard for time
zone for the sending time. And I'm not one for kicking stone walls
about slow email presentation tools when I can use something that
reads from its cache on disk.

{^_^}
- Original Message - 
From: Robert Nicholson [EMAIL PROTECTED]



I've used Horde/IMP for ever but is there a commonly regarded better  
quality web based IMAP client then?


On Sep 8, 2006, at 5:50 PM, jdow wrote:


Accepting to folder lists/unix/spamassassin-users

From: jdow [EMAIL PROTECTED]

From: [EMAIL PROTECTED]

When using a web client like IMP from Horde it seems the Date header is kept
in the original format and never converted to my local timezone. I figure that
if I converted the Date to my local timezone I would have people leaving
messages in the future that always sit at the top of my Inbox. For instance
it's still the 8th here in CDT but elsewhere it's the 9th and those messages
now sit at the top of the list of messages to be read.

Is there anybody here who chooses to convert Date: to their local timezone and
to store the original date in say X-Original-Date: or something in order to
ensure that you have a last in first seen approach to managing your email?

So you'd record Date: in terms of Date Received in your local timezone not just a
local time zone conversion of the original date.


There is an ancient observation that goes something like this, If
kicking that brick wall hurts your foot, stop kicking the wall. If
IMP has a fault THAT dramatic what other faults lurk below its hood?

There is no way to make the conversion in SpamAssassin. There are
ways within the standard C libraries that IMP could incorporate if
its developers either had the time or the brains to do so. You might
be able to gen up a futility that would allow a tool like procmail
to perform that modification. But it might eat more of your time
than finding a higher quality web mail wazzit.

{o.o}   - Not known to be charitable about web mail, so YMMV.







pyzor: check failed: internal error

2006-09-08 Thread John Thompson
I just updated SA from v3.1.4 to v3.1.5 and have started seeing this 
error (pyzor: check failed: internal error) in my maillog. I rebuilt 
pyzor and the message continues to appear. Is this significant? Running 
on FreeBSD-5.2 with SA/pyzor/etc. built from the ports collection.

-- 

John ([EMAIL PROTECTED])



OT: Webmail (was Re: LOG: Re: Marking Mail in the future as SPAM?)

2006-09-08 Thread Kelson

jdow wrote:

Take to heart my comment about web mail clients. At today's prices for
bits on disks IMAO Web Mail clients are not worth the price of their
storage for the executable file.

Even Outlook Express, which I use, is faster and sorts by either
received date or sent date apparently with correct regard for time
zone for the sending time. And I'm not one for kicking stone walls
about slow email presentation tools when I can use something that
reads from its cache on disk.


For most people who use it, the appeal of web-based email is not that it 
does email better or more capably than a native mail client (because 
generally speaking, it doesn't), but that it does email more 
*conveniently*.  Zero install, minimal configuration, virtually infinite 
portability, and you can let someone else worry about your backups.


--
Kelson Vibber
SpeedGate Communications www.speed.net


.spamassassin folder not created after bugfix #4932

2006-09-08 Thread Jo for Groups and Lists
Greetings! I am hoping someone can confirm if this is a bug or
desired behaviour? Or if something else is causing it to go astray?

Running SA 3.1.3
SPAMDOPTIONS="-d -c -m5 -H"

Runs on a per user basis - i.e. users who want to use SA have this
code placed in 
  /home/user/domain-mail/.rc.local.init
--
DROPPRIVS=yes
# Filter messages smaller than 256000 bytes through spamc
:0fw
* < 256000
| spamc
--

I am having an issue since upgrading from 2.64 to 3.1.3. I believe I
have narrowed it down to this:
SpamAssassin.pm - sub get_and_create_userstate_dir

  # bug 4932: use the last default_userstate_dir entry if none of the others
  # already exist
  $fname ||= $self->sed_path($default_userstate_dir[-1]);

Below is an excerpt from maillog. The top part shows a message
successfully going through for user 'dilucy', who has had SA installed
previously. The bottom portion shows a message going through for user
'council', a new SA user. In theory, the first message SA handles for
council should trigger the creation of council's /.spamassassin/
folder and user_prefs file. This was not happening. Via testing I
found that the user_prefs would be copied over if I manually created
the /.spamassassin/ folder first. Hence the "No such file or
directory" error means the .spamassassin folder does not exist: it is
not getting created.

In the logs below you will note that I added some extra warnings to
assist my troubleshooting.

What I found is that $fname in this sub is being assigned the
directory for 'dilucy', the last user, instead of the current user.
The test   if (!-d $fname)   then finds an existing directory, and so
the mkpath command is skipped. It then returns to the other sub and
resumes the $fname value for user 'council', trying to copy the
user_prefs into a nonexistent folder. I also found that if I hammered
off several email messages addressed to the 'council' account such
that 2 messages in a row handled by SA were both for user 'council',
the second message would correctly cause the creation of both the
/.spamassassin/ folder and the user_prefs file. Presumably this is
because the last user in this case ('council') was coincidentally the
same as the current user ('council').

-
 
Sep 8 17:16:17 host spamd[1482]: spamd: connection from localhost [127.0.0.1] at port 46608
Sep 8 17:16:17 host spamd[1482]: spamd: setuid to dilucy succeeded
Sep 8 17:16:17 host spamd[1482]: spamd: processing message [EMAIL PROTECTED] for dilucy:665
Sep 8 17:16:20 host spamd[1482]: spamd: identified spam (19.7/4.0) for dilucy:665 in 3.2 seconds, 1540 bytes.
Sep 8 17:16:20 host spamd[1482]: spamd: result: Y 19 - MSGID_FROM_MTA_ID,RCVD_IN_BL_SPAMCOP_NET,RCVD_IN_NJABL_DUL,RCVD_IN_SORBS_DUL,RCVD_IN_XBL,URIBL_JP_SURBL,URIBL_OB_SURBL,URIBL_SBL,URIBL_SC_SURBL scantime=3.2,size=1540,user=dilucy,uid=665,required_score=4.0,rhost=localhost,raddr=127.0.0.1,rport=46608,mid=200609082116.k88LGFn90020[EMAIL PROTECTED],autolearn=disabled

Sep 8 17:16:23 host spamd[1482]: spamd: connection from localhost [127.0.0.1] at port 46638
Sep 8 17:16:23 host spamd[1482]: spamd: setuid to council succeeded
Sep 8 17:16:23 host spamd[1482]: spamd: creating default_prefs: /home/council/.spamassassin/user_prefs
Sep 8 17:16:23 host spamd[1482]: config: /home/council/.spamassassin/user_prefs is not a file
Sep 8 17:16:23 host spamd[1482]: in sub_get_and_create_userstate_dir
Sep 8 17:16:23 host spamd[1482]: config: fname is /home/dilucy/.spamassassin
Sep 8 17:16:23 host spamd[1482]: config: /home/dilucy/.spamassassin is a directory
Sep 8 17:16:23 host spamd[1482]: config: cannot write to prefs file /home/council/.spamassassin/user_prefs: No such file or directory
Sep 8 17:16:23 host spamd[1482]: spamd: failed to create readable default_prefs: /home/council/.spamassassin/user_prefs
Sep 8 17:16:23 host spamd[1482]: spamd: processing message [EMAIL PROTECTED] for council:855
Sep 8 17:16:23 host spamd[1482]: spamd: clean message (0.5/5.0) for council:855 in 0.1 seconds, 955 bytes.
Sep 8 17:16:23 host spamd[1482]: spamd: result: . 0 - NO_REAL_NAME,NO_RELAYS scantime=0.1,size=955,user=council,uid=855,required_score=5.0,rhost=localhost,raddr=127.0.0.1,rport=46638,mid=200609082116.k88LGMMw0020[EMAIL PROTECTED],autolearn=disabled
- 
I notice that several messages are often handled by the same pid
[1482], even for different users. Could this be contributing to the
issue, and if so, what could you suggest changing to avoid it?
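
The behaviour described above can be modelled with a toy example. This is
a minimal Python sketch, not SpamAssassin's code: the Scanner class and its
method names are hypothetical, and it assumes (as the log suggests) that
one spamd child serves several users in turn and that a directory value
survives from one message to the next.

```python
class Scanner:
    """Toy stand-in for a long-lived worker that serves many users."""

    def __init__(self):
        # Survives from one message to the next within the same process.
        self.state_dir = None

    def state_dir_buggy(self, user):
        # Mirrors `$fname ||= ...`: assigns only when nothing is set yet,
        # so a value left over from the previous user is reused.
        self.state_dir = self.state_dir or f"/home/{user}/.spamassassin"
        return self.state_dir

    def state_dir_fixed(self, user):
        # Recompute for every message; never trust the leftover value.
        self.state_dir = f"/home/{user}/.spamassassin"
        return self.state_dir


worker = Scanner()
first = worker.state_dir_buggy("dilucy")    # dilucy's directory
second = worker.state_dir_buggy("council")  # still dilucy's directory
third = worker.state_dir_fixed("council")   # council's directory
```

In this model the second call returns dilucy's directory, which already
exists, so a "create it if missing" step would be skipped for council. Two
council messages in a row would only behave differently if something reset
the leftover value in between, which this toy does not attempt to model.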


Many thanks in advance,
Jo



Re: BUG? sa-learn --ham vs spamassassin -r different results

2006-09-08 Thread Matt Kettler
Michael Scheidell wrote:
 Matt Kettler wrote:


 Further, spamassassin -r and sa-learn --spam learn differently, give
 different results:
   
 
 By any chance was the message used scanned by SA already?

 I'm wondering if it's a bug where spamassassin -r is stripping markups,
 but sa-learn is not.

   
 no, see my test methodology
 I erased the file and created a blank one.
First, sorry for the delay, my internet service got knocked out.

Second, you misunderstood my question. I know you erased your bayes DB.

I was asking if the message had already been scanned, and thus marked,
by SA. I'm asking a question about the text content of the message, not
about the contents of your bayes DB.


Re: Marking Mail in the future as SPAM?

2006-09-08 Thread Stuart Johnston

[EMAIL PROTECTED] wrote:

When using a web client like IMP from Horde it seems the Date header is kept
in the original format and never converted to my local timezone. I figure that
if I converted the Date to my local timezone I would have people leaving
messages in the future that always sit at the top of my Inbox. For instance
it's still the 8th here in CDT but elsewhere it's the 9th and those messages
now sit at the top of the list of messages to be read.


The IMP4 install on my server (which I don't generally use) does convert 
dates to local timezone.  However, sorting by arrival is the only 
sensible default sort for an Inbox.  Now, ascending vs. descending is a 
different matter.


Re: SPF Scores

2006-09-08 Thread Matt Kettler
Michel Vaillancourt wrote:
   I set up SPF for Wolfstar.ca yesterday, and I've been reading a bit off 
 the website about SPF itself.  WRT to SA, I'm interested in knowing if folks 
 have adjusted their stock SPF scores or if they've done some custom rules 
 to lever this technology?
   
I make use of the SPF features of SA, but I've not adjusted any of the
scores.

Why, or to what end, would you want to adjust the scores?


Re: Marking Mail in the future as SPAM?

2006-09-08 Thread John Rudd


On Sep 8, 2006, at 5:59 PM, Stuart Johnston wrote:


[EMAIL PROTECTED] wrote:
When using a web client like IMP from Horde it seems the Date header is kept
in the original format and never converted to my local timezone. I figure that
if I converted the Date to my local timezone I would have people leaving
messages in the future that always sit at the top of my Inbox. For instance
it's still the 8th here in CDT but elsewhere it's the 9th and those messages
now sit at the top of the list of messages to be read.


The IMP4 install on my server (which I don't generally use) does 
convert dates to local timezone.  However, sorting by arrival is the 
only sensible default sort for an Inbox.  Now, ascending vs. 
descending is a different matter.


I sort by position in the mail folder, regardless of any date stamps.  
But I don't know of any webmail clients that do something that 
sensible.  I generally stick to using IMAP clients that support that 
feature (such as Apple Mail).




Re: pyzor: check failed: internal error

2006-09-08 Thread Daryl C. W. O'Shea

John Thompson wrote:
I just updated SA from v3.1.4 to v3.1.5 and have started seeing this 
error (pyzor: check failed: internal error) in my maillog. I rebuilt 
pyzor and the message continues to appear. Is this significant? Running 
on FreeBSD-5.2 with SA/pyzor/etc. built from the ports collection.


Quoting a mail I sent in January...


On 03/01/2006 5:19 AM, Chris Purves wrote:

[EMAIL PROTECTED] wrote:



I'm getting the errormessage below;
Who can help ?
Wolfgang

Jan  2 09:25:58 saxophon spamd[13330]: spamd: connection from localhost [127.0.0.1] at port 40156
Jan  2 09:25:58 saxophon spamd[13330]: spamd: checking message [EMAIL PROTECTED] for exim:502
Jan  2 09:26:00 saxophon spamd[13330]: internal error
Jan  2 09:26:00 saxophon spamd[13330]: pyzor: check failed: internal error



I get this a lot, and I've posted myself about this, but so far no help. 


This was actually discussed numerous times during the fall.


 Am I correct to assume that it works sometimes and fails others?


See http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4580 for a discussion 
(and available Pyzor patches) about the problem with Pyzor that SpamAssassin is 
now reporting instead of ignoring.


Daryl 







Re: Customizing RBL and SURBL lists

2006-09-08 Thread Daryl C. W. O'Shea

Bowie Bailey wrote:

D.J. wrote:



Excellent!  This has worked flawlessly.  Now for part 2 of this
project, sort of related to the first part.  So now I have only the
zones I want to query.  When I wish to move to my local servers, is
it as simple as adding a new header line into my local.cf file for
each list, or will I need to disable the old lists entirely via the 0
score method and completely rewrite the rules for my local server? 
Thanks for all your help!  


You could just change the header line, but I would suggest rewriting
the rules to avoid confusion.


Why go to all the trouble of rewriting/editing rules when it'd be a lot 
easier to maintain by just delegating the appropriate zones to your own 
DNSBL server?


Daryl


Re: Marking Mail in the future as SPAM?

2006-09-08 Thread Stuart Johnston

John Rudd wrote:


On Sep 8, 2006, at 5:59 PM, Stuart Johnston wrote:


[EMAIL PROTECTED] wrote:
When using a web client like IMP from Horde it seems the Date header is kept
in the original format and never converted to my local timezone. I figure that
if I converted the Date to my local timezone I would have people leaving
messages in the future that always sit at the top of my Inbox. For instance
it's still the 8th here in CDT but elsewhere it's the 9th and those messages
now sit at the top of the list of messages to be read.


The IMP4 install on my server (which I don't generally use) does 
convert dates to local timezone.  However, sorting by arrival is the 
only sensible default sort for an Inbox.  Now, ascending vs. 
descending is a different matter.


I sort by position in the mail folder, regardless of any date stamps.  
But I don't know of any webmail clients that do something that 
sensible.  I generally stick to using IMAP clients that support that 
feature (such as Apple Mail).


Hmm.  All of the webmail apps I use do: IMP4, Hastymail, CGP.  Sorting 
by arrival generally means the same as by folder position.


Re: Marking Mail in the future as SPAM?

2006-09-08 Thread John Rudd


On Sep 8, 2006, at 9:17 PM, Stuart Johnston wrote:


John Rudd wrote:

On Sep 8, 2006, at 5:59 PM, Stuart Johnston wrote:

[EMAIL PROTECTED] wrote:
When using a web client like IMP from Horde it seems the Date header is kept
in the original format and never converted to my local timezone. I figure that
if I converted the Date to my local timezone I would have people leaving
messages in the future that always sit at the top of my Inbox. For instance
it's still the 8th here in CDT but elsewhere it's the 9th and those messages
now sit at the top of the list of messages to be read.


The IMP4 install on my server (which I don't generally use) does 
convert dates to local timezone.  However, sorting by arrival is the 
only sensible default sort for an Inbox.  Now, ascending vs. 
descending is a different matter.
I sort by position in the mail folder, regardless of any date stamps. 
 But I don't know of any webmail clients that do something that 
sensible.  I generally stick to using IMAP clients that support that 
feature (such as Apple Mail).


Hmm.  All of the webmail apps I use do: IMP4, Hastymail, CGP.  Sorting 
by arrival generally means the same as by folder position.




IIRC, CGP's arrival date doesn't get updated when you move a message 
between folders (as, if I'm remembering correctly, it's the date/time 
the message was delivered, not the date/time the message was put into 
the folder), so if you end up moving messages around a lot, the arrival 
dates aren't in the same order as the order-in-the-folder.


CGP is the one I'm most familiar with (as that's what I use/run), but I 
don't actually frequently use the webmail interface for webmail; just 
for settings and rules.  So, I could be wrong about what arrival date 
means in CGP.