New DNS white/blacklist + spamassassin rules Re: Please report IPs delivering ham and spam with this script

2011-04-01 Thread darxus
While I still plan for this to primarily be used via rsync and a
spamassassin plugin, I've loaded the data into DNS records and created
spamassassin rules so it can easily be tested now.  It's updating
automatically once a day.

I'm hoping this will encourage people to contribute data, because you
should now get an immediate improvement in your spam filtering, based on the
data you've provided about which IPs send you ham and spam.

More info, including the script to submit data (either from spam/ham
folders, or individual emails piped to standard input) here:
http://www.chaosreigns.com/iprep/

The spamassassin rules:


ifplugin Mail::SpamAssassin::Plugin::DNSEval
header   __RCVD_IN_IPREP        eval:check_rbl('iprep-firsttrusted', 'iprep.chaosreigns.com.')
tflags   __RCVD_IN_IPREP        nice net

header   RCVD_IN_IPREPDNS_100   eval:check_rbl_sub('iprep-firsttrusted', '127.\d+.\d+.100')
describe RCVD_IN_IPREPDNS_100   Sender listed at http://www.chaosreigns.com/iprep/, 100% ham
tflags   RCVD_IN_IPREPDNS_100   nice net

header   RCVD_IN_IPREPDNS_50    eval:check_rbl_sub('iprep-firsttrusted', '127.\d+.\d+.50')
describe RCVD_IN_IPREPDNS_50    Sender listed at http://www.chaosreigns.com/iprep/, 50% ham
tflags   RCVD_IN_IPREPDNS_50    nice net

header   RCVD_IN_IPREPDNS_0     eval:check_rbl_sub('iprep-firsttrusted', '127.\d+.\d+.0')
describe RCVD_IN_IPREPDNS_0     Sender listed at http://www.chaosreigns.com/iprep/, 0% ham
tflags   RCVD_IN_IPREPDNS_0     net

meta     RCVD_NOT_IN_IPREPDNS   ( ! RCVD_IN_IPREPDNS_100 && ! RCVD_IN_IPREPDNS_50 && ! RCVD_IN_IPREPDNS_0 && ! NO_RELAYS )
describe RCVD_NOT_IN_IPREPDNS   Sender not listed at http://www.chaosreigns.com/iprep/
tflags   RCVD_NOT_IN_IPREPDNS   net

score    RCVD_IN_IPREPDNS_100   -0.1
score    RCVD_IN_IPREPDNS_50    -0.0001
score    RCVD_IN_IPREPDNS_0      0.1
score    RCVD_NOT_IN_IPREPDNS    0.0001
endif



For people not contributing data, this is not likely to be useful yet.

Out of the 86,899 IPs I have data for, all but 38 are either 100% spam or
100% ham, so an IP's history is a great predictor of what the next email
from that IP will be.  This is why blacklists and whitelists, including
spamassassin's AWL (which is another combination of both), are nothing new.
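The per-IP bookkeeping behind these numbers can be sketched in a few lines of Python. This is a minimal illustration, not the iprep code: it assumes the input is a list of (sending IP, label) pairs taken from hand-sorted mail, and the helper names `ham_percentages` and `mixed_ips` are hypothetical.

```python
from collections import defaultdict


def ham_percentages(labelled):
    """Map each sending IP to the percentage of its mail that was ham.

    `labelled` is an iterable of (ip, label) pairs; any label other than
    "ham" is counted as spam.
    """
    counts = defaultdict(lambda: [0, 0])  # ip -> [ham count, spam count]
    for ip, label in labelled:
        counts[ip][0 if label == "ham" else 1] += 1
    return {ip: 100.0 * ham / (ham + spam) for ip, (ham, spam) in counts.items()}


def mixed_ips(percentages):
    """IPs that sent both ham and spam, i.e. neither 0% nor 100% ham."""
    return sorted(ip for ip, pct in percentages.items() if 0.0 < pct < 100.0)
```

An IP that only ever sent ham maps to 100.0 and one that only sent spam maps to 0.0; `mixed_ips` returns everything in between, which in the data above is just 38 addresses.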

The advantages I'm providing over SA's AWL are:
1) It's based on human verified ham and spam, not SA's previous opinions of
   emails.
2) Shared knowledge from other people's email.

What I hope to be an advantage over dnswl.org, which I've been involved in,
is increased automation.


Here's a test I ran using only the last 500 of my own emails, all hand
categorized as spam or ham and sorted by received date.  One by one, it
learns each IP as a ham source, a spammer, or a mix, and uses what it has
learned to guess what the next email is.  Every 100 emails it reports its
success rate for the last 100 emails:

$ ./progress.pl
Rank 100, hit 51.7647058823529% of ham, hit 0% of spam.
Rank 50, hit 0% of ham, hit 0% of spam.
Rank 0, hit 0% of ham, hit 0% of spam.
Rank none, hit 48.2352941176471% of ham, hit 100% of spam.

Rank 100, hit 76% of ham, hit 0% of spam.
Rank 50, hit 0% of ham, hit 0% of spam.
Rank 0, hit 0% of ham, hit 28% of spam.
Rank none, hit 24% of ham, hit 72% of spam.

Rank 100, hit 72.3684210526316% of ham, hit 0% of spam.
Rank 50, hit 0% of ham, hit 0% of spam.
Rank 0, hit 0% of ham, hit 4.17% of spam.
Rank none, hit 27.6315789473684% of ham, hit 95.8% of spam.

Rank 100, hit 79.4520547945205% of ham, hit 0% of spam.
Rank 50, hit 0% of ham, hit 0% of spam.
Rank 0, hit 0% of ham, hit 48.1481481481481% of spam.
Rank none, hit 20.5479452054795% of ham, hit 51.8518518518519% of spam.

Rank 100, hit 79.2682926829268% of ham, hit 0% of spam.
Rank 50, hit 0% of ham, hit 0% of spam.
Rank 0, hit 0% of ham, hit 27.8% of spam.
Rank none, hit 20.7317073170732% of ham, hit 72.2% of spam.
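A rough Python reconstruction of the kind of replay progress.pl performs, for anyone who wants to run the same experiment on their own corpus. It is an assumption-laden sketch, not the script itself: mail is given as (IP, label) pairs already sorted by received date, each IP is bucketed as 100 (only ham so far), 0 (only spam so far), 50 (both) or none (never seen), and the guess for each email uses only mail that arrived before it.

```python
from collections import Counter, defaultdict

RANKS = ("100", "50", "0", "none")


def rank_for(history):
    """Bucket an IP's [ham, spam] history (this three-way split is assumed)."""
    ham, spam = history
    if ham + spam == 0:
        return "none"
    if spam == 0:
        return "100"
    if ham == 0:
        return "0"
    return "50"


def evaluate(emails, window=100):
    """Replay (ip, label) pairs in order, predicting each from earlier mail only.

    Yields one report per `window` emails: {rank: (% of ham hit, % of spam hit)}.
    """
    history = defaultdict(lambda: [0, 0])  # ip -> [ham, spam] seen so far
    hits = Counter()    # (rank, label) -> count within the current window
    totals = Counter()  # label -> count within the current window
    for n, (ip, label) in enumerate(emails, start=1):
        hits[(rank_for(history[ip]), label)] += 1
        totals[label] += 1
        history[ip][0 if label == "ham" else 1] += 1  # learn only after guessing
        if n % window == 0:
            yield {
                rank: tuple(
                    100.0 * hits[(rank, lbl)] / totals[lbl] if totals[lbl] else 0.0
                    for lbl in ("ham", "spam")
                )
                for rank in RANKS
            }
            hits.clear()
            totals.clear()
```

Each yielded report corresponds to one block of output above: for every rank, the share of that window's ham and the share of its spam that fell into it.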


So after 400 emails, RCVD_IN_IPREPDNS_100 is hitting 79% of ham and no
spam.  I don't think anything else spamassassin uses can do this well.

But I have data from 184,335 emails.  Using all that data, results for
the last 10,000 emails were:

Rank 100, hit 94.1176470588235% of ham, hit 0.0101553772722657% of spam.
Rank 50, hit 1.30718954248366% of ham, hit 0.0101553772722657% of spam.
Rank 0, hit 0% of ham, hit 64.2022951152635% of spam.
Rank none, hit 4.57516339869281% of ham, hit 35.7773941301919% of spam.

RCVD_IN_IPREPDNS_100 hits 94% of ham, and 0.01% of spam.
RCVD_IN_IPREPDNS_0 hits 64% of spam and no ham.  Again, I don't think
anything else spamassassin uses can do this well.  

But results this good can only be expected for people contributing data.
At least until we get more people contributing data.

-- 
The price of freedom is the willingness to do sudden battle, anywhere,
at any time, and with utter recklessness. - Robert A. Heinlein
http://www.ChaosReigns.com


Re: New DNS white/blacklist + spamassassin rules Re: Please report IPs delivering ham and spam with this script

2011-04-01 Thread Mark Martinec
 eval:check_rbl_sub('iprep-firsttrusted', '127.\d+.\d+.100') describe

Do not forget to backslash-quote dots in a regular expression
if you mean a literal dot instead of 'any character'.
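To see what the missing backslashes (and anchors) change, the sketch below runs three variants of the 0%-ham rule's pattern from the first post against one answer. Python's `re.search` stands in for an unanchored Perl match; for these particular constructs (`.`, `\d+`, `\.`, `^`, `$`) the two engines behave the same. The answer `127.5.100.100` is hypothetical: a 100%-ham listing whose third octet happens to be 100.

```python
import re

# Hypothetical DNS answer: last octet 100 (100% ham), third octet also 100.
ANSWER = "127.5.100.100"

PATTERNS = {
    "unescaped, unanchored": r"127.\d+.\d+.0",
    "escaped, unanchored": r"127\.\d+\.\d+\.0",
    "escaped, anchored": r"^127\.\d+\.\d+\.0$",
}

for name, pattern in PATTERNS.items():
    # re.search, like an unanchored Perl =~ match, may match anywhere in the string
    matched = re.search(pattern, ANSWER) is not None
    print(f"{name:22} matches {ANSWER}: {matched}")
```

The unescaped pattern matches because its dots can match digits: `\d+` takes just the `1` of the third octet, the next `.` consumes a `0`, and the literal `0` is then satisfied, so a 100%-ham answer would also trigger the 0%-ham rule.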

  Mark


Re: New DNS white/blacklist + spamassassin rules Re: Please report IPs delivering ham and spam with this script

2011-04-01 Thread Michael Scheidell

On 4/1/11 2:34 PM, dar...@chaosreigns.com wrote:

header   RCVD_IN_IPREPDNS_0 eval:check_rbl_sub('iprep-firsttrusted', '127.\d+.\d+.0')
describe RCVD_IN_IPREPDNS_0 Sender listed at http://www.chaosreigns.com/iprep/, 0% ham
tflags   RCVD_IN_IPREPDNS_0 net


might actually need a quantity qualifier.

(if this ip is 0 % ham... does that actually mean it is 100% spam?)

or does that mean that I (so far) only saw one email hit it, and it is spam?

other than this marking 'spam rates' while DCC commercial does the same
thing for 'bulk' rates, what is the difference between this and DCC?


note: dcc uses (for large installs) a local, VLDB that they 'sync' 
(flood they call it) in real time.  but it not only tells you the bulk
rate of the sender's ip, but also the 'bulk hit rate' for the email you just got.


sounds similar, but bulk vs spam.

(and its inverse.. you collect percentages of HAM.  they collect
percentages of BULK).


maybe 2nd or 3rd octet could contain 'confidence factor'.. eg:

some sliding scale of how many actual emails you have seen?



--
Michael Scheidell, CTO
o: 561-999-5000
d: 561-948-2259
ISN: 1259*1300
SECNAP Network Security Corporation


Re: New DNS white/blacklist + spamassassin rules Re: Please report IPs delivering ham and spam with this script

2011-04-01 Thread darxus
On 04/01, Mark Martinec wrote:
  eval:check_rbl_sub('iprep-firsttrusted', '127.\d+.\d+.100') describe
 
 Do not forget to backslash-quote dots in a regular expression
 if you mean a literal dot instead of 'any character'.

Eep.  That was copied from existing rules.  I believe you're right, and
there are a bunch of rules that need more escaping.  Thanks.

-- 
Will I ever learn? I hope not, I'm having too much fun.
- Brent Minime Avis, motorcycle.com
http://www.ChaosReigns.com


Re: New DNS white/blacklist + spamassassin rules Re: Please report IPs delivering ham and spam with this script

2011-04-01 Thread darxus
On 04/01, Michael Scheidell wrote:
 On 4/1/11 2:34 PM, dar...@chaosreigns.com wrote:
 header   RCVD_IN_IPREPDNS_0 eval:check_rbl_sub('iprep-firsttrusted', '127.\d+.\d+.0')
 describe RCVD_IN_IPREPDNS_0 Sender listed at http://www.chaosreigns.com/iprep/, 0% ham
 tflags   RCVD_IN_IPREPDNS_0 net
 
 might actually need a quantity qualifier.
 
 (if this ip is 0 % ham... does that actually mean it is 100% spam?)
 
 or does that mean that I (so far) only saw one email hit it, and it is spam?

It means that all of the email seen from that IP so far has been spam.
Which may only have been one email.

 other than this marking 'spam rates' while DCC commercial does the
 same thing for 'bulk' rates, what is the difference between this
 and DCC?

The commercial part.  

 maybe 2nd or 3rd octet could contain 'confidence factor'.. eg:

It does, actually.  A logarithm of the count of emails seen from that IP
(newer emails weighted more than old emails, and scaled up so small old
counts are greater than 0).
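One way such a confidence value could be computed, as a hedged sketch: the exponential decay, the 30-day half-life and the scale factor of 50 are all assumptions chosen for illustration, not the values iprep uses. Rounding up with `ceil` is one way to get the property described above, where even a few old emails yield a value greater than 0.

```python
import math

HALF_LIFE_DAYS = 30.0  # hypothetical decay rate
SCALE = 50.0           # hypothetical: stretches log10 values over a 0-255 octet


def weighted_count(ages_in_days, half_life=HALF_LIFE_DAYS):
    """Sum of per-email weights that halve every `half_life` days."""
    return sum(0.5 ** (age / half_life) for age in ages_in_days)


def confidence_octet(ages_in_days):
    """Log-scaled, decay-weighted email count squeezed into one DNS octet.

    ceil() keeps any non-empty history at 1 or more, so a few old emails
    still show up as greater than 0.
    """
    value = math.log10(1.0 + weighted_count(ages_in_days))
    return min(255, math.ceil(SCALE * value))
```

With these made-up constants a single email from a year ago still yields 1, while ten emails from today yield a value in the fifties.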

I haven't studied data enough to figure out what threshold is best for
what, and I don't think the existing rule definition language provides
a good way to specify a range.
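Since `check_rbl_sub` takes a regular expression, a numeric range can still be spelled out as an alternation of every allowed value. The sketch below generates such a pattern; it assumes, purely for illustration, that the confidence value sits in the third octet, and `octet_range_regex` and `rule_regex` are hypothetical helpers.

```python
import re


def octet_range_regex(lo, hi):
    """Regex alternation matching exactly the decimal integers lo..hi."""
    if not 0 <= lo <= hi <= 255:
        raise ValueError("octet range must lie within 0..255")
    return "(?:" + "|".join(str(n) for n in range(lo, hi + 1)) + ")"


def rule_regex(ham_percent, min_confidence):
    """Anchored pattern for 127.<any>.<confidence>.<ham_percent>.

    Putting the confidence in the third octet is an assumption of this sketch.
    """
    conf = octet_range_regex(min_confidence, 255)
    return rf"^127\.\d+\.{conf}\.{ham_percent}$"
```

For a threshold of 20 the alternation lists 236 numbers, which is clumsy, but it is still a single regex that could be pasted into a rule.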

Also, ignoring it is working quite well.

-- 
I refuse to tip toe through life only to arrive safely at death.
http://www.ChaosReigns.com


Re: New DNS white/blacklist + spamassassin rules Re: Please report IPs delivering ham and spam with this script

2011-04-01 Thread Mark Martinec
  Do not forget to backslash-quote dots in a regular expression
  if you mean a literal dot instead of 'any character'.
 
 Eep.  That was copied from existing rules.  I believe you're right, and
 there are a bunch of rules that need more escaping.  Thanks.

True, there are a bunch of rules that need more escaping.
It is noted somewhere in the bug tracker (but not as a standalone ticket),
and needs a volunteer to do the cleaning :)

  Mark


Re: New DNS white/blacklist + spamassassin rules Re: Please report IPs delivering ham and spam with this script

2011-04-01 Thread darxus
On 04/01, Mark Martinec wrote:
  eval:check_rbl_sub('iprep-firsttrusted', '127.\d+.\d+.100') describe
 
 Do not forget to backslash-quote dots in a regular expression
 if you mean a literal dot instead of 'any character'.

Updated rules (thanks again):


ifplugin Mail::SpamAssassin::Plugin::DNSEval
header   __RCVD_IN_IPREPDNS     eval:check_rbl('iprep-firsttrusted', 'iprep.chaosreigns.com.')
tflags   __RCVD_IN_IPREPDNS     nice net

header   RCVD_IN_IPREPDNS_100   eval:check_rbl_sub('iprep-firsttrusted', '^127\.\d+\.\d+\.100$')
describe RCVD_IN_IPREPDNS_100   Sender listed at http://www.chaosreigns.com/iprep/, 100% ham
tflags   RCVD_IN_IPREPDNS_100   nice net

header   RCVD_IN_IPREPDNS_50    eval:check_rbl_sub('iprep-firsttrusted', '^127\.\d+\.\d+\.50$')
describe RCVD_IN_IPREPDNS_50    Sender listed at http://www.chaosreigns.com/iprep/, 50% ham
tflags   RCVD_IN_IPREPDNS_50    nice net

header   RCVD_IN_IPREPDNS_0     eval:check_rbl_sub('iprep-firsttrusted', '^127\.\d+\.\d+\.0$')
describe RCVD_IN_IPREPDNS_0     Sender listed at http://www.chaosreigns.com/iprep/, 0% ham
tflags   RCVD_IN_IPREPDNS_0     net

meta     RCVD_NOT_IN_IPREPDNS   ( ! RCVD_IN_IPREPDNS_100 && ! RCVD_IN_IPREPDNS_50 && ! RCVD_IN_IPREPDNS_0 && ! NO_RELAYS )
describe RCVD_NOT_IN_IPREPDNS   Sender not listed at http://www.chaosreigns.com/iprep/
tflags   RCVD_NOT_IN_IPREPDNS   net

score    RCVD_IN_IPREPDNS_100   -0.1
score    RCVD_IN_IPREPDNS_50    -0.0001
score    RCVD_IN_IPREPDNS_0      0.1
score    RCVD_NOT_IN_IPREPDNS    0.0001
endif
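The combined effect of the updated rules can be modelled in a few lines. `rules_hit` is a hypothetical helper that applies the three anchored sub-rule patterns to one DNS answer and then the meta rule's logic; it does not reproduce how SpamAssassin itself schedules or evaluates rules.

```python
import re

# Sub-rule patterns copied from the updated rules above.
SUB_RULES = {
    "RCVD_IN_IPREPDNS_100": r"^127\.\d+\.\d+\.100$",
    "RCVD_IN_IPREPDNS_50": r"^127\.\d+\.\d+\.50$",
    "RCVD_IN_IPREPDNS_0": r"^127\.\d+\.\d+\.0$",
}


def rules_hit(answer, no_relays=False):
    """Names of the iprep rules that would fire for one DNS answer.

    `answer` is the A record returned for the first trusted relay, or None
    when the lookup returned nothing.
    """
    hits = {name for name, pattern in SUB_RULES.items()
            if answer is not None and re.search(pattern, answer)}
    # meta RCVD_NOT_IN_IPREPDNS: none of the sub-rules fired and not NO_RELAYS
    if not hits and not no_relays:
        hits.add("RCVD_NOT_IN_IPREPDNS")
    return hits
```

One consequence of the rules as written: if the zone ever returns a last octet other than 0, 50 or 100 (say 75), none of the sub-rules matches, so RCVD_NOT_IN_IPREPDNS fires for that sender as if it were unlisted.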


-- 
Go forth, and be excellent to one another. - http://www.jhuger.com/fredski.php
http://www.ChaosReigns.com


Re: Please report IPs delivering ham and spam with this script

2011-04-01 Thread David F. Skoll
On Fri, 1 Apr 2011 14:34:16 -0400
dar...@chaosreigns.com wrote:

 Out of the 86,899 IPs I have data for, all but 38 are either 100%
 spam or 100% ham,

That sounds a bit funny.

We have data on over 17 million IP addresses (collected using
http://mimedefang.org/reputation). Of those, about 9 million report at
least one ham or one spam -- the remainder either never made it past
greylisting or only tried emailing nonexistent recipient addresses.

Of those 9,102,875 hosts:

o 536,596 (5.8%) sent _only_ ham

o 7,821,574 (86%) sent _only_ spam

o The remaining 744,705 (8.2%) sent a mixture.  Most Yahoo! servers are in
  this category.

You saw less than 0.05% sending a mixture, which means you are probably
not getting a good sample.
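The comparison can be checked directly from the figures quoted in this thread:

```python
# Figures quoted in this thread.
skoll_total = 9_102_875
skoll_only_ham, skoll_only_spam, skoll_mixed = 536_596, 7_821_574, 744_705
darxus_total, darxus_mixed = 86_899, 38

# The three categories account for every host in the MIMEDefang sample.
assert skoll_only_ham + skoll_only_spam + skoll_mixed == skoll_total

skoll_mixed_share = skoll_mixed / skoll_total     # roughly 8.2%
darxus_mixed_share = darxus_mixed / darxus_total  # roughly 0.044%

print(f"mixed senders, MIMEDefang data: {skoll_mixed_share:.2%}")
print(f"mixed senders, iprep data:      {darxus_mixed_share:.3%}")
print(f"ratio: {skoll_mixed_share / darxus_mixed_share:.0f}x")
```

The mixed-sender share in the larger data set is well over 100 times the share in the iprep data, which is the gap Skoll is pointing at.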

Regards,

David.

PS: If anyone wants to contribute to and download *our* reputation
list, please see http://mimedefang.org/reputation and email me
off-list.  Please be aware that unlike darxus' list, ours is not
freely-available, though we generally give free downloads to
organizations willing to feed us reputation data if they do a
statistically-useful amount of mail (>= 50K messages/day).




Re: Please report IPs delivering ham and spam with this script

2011-04-01 Thread darxus
On 04/01, David F. Skoll wrote:
 o 536,596 (5.8%) sent _only_ ham
 
 o 7,821,574 (86%) sent _only_ spam
 
 o The remaining 744,705 (8.2%) sent a mixture.  Most Yahoo! servers are in
   this category.

Sounds reasonable.  It's nice to see the numbers, thanks.

 You saw less than 0.05% sending a mixture, which means you are probably
 not getting a good sample.

Yup.  I don't have enough data.  That's why I'm asking for more.

-- 
Life is either a daring adventure or it is nothing at all.
- Helen Keller
http://www.ChaosReigns.com


Please report IPs delivering ham and spam with this script

2011-03-30 Thread darxus
My plan is to create another free reputation service, like a combination of
a whitelist and a blacklist, except providing the actual data instead of
just yes/no/maybe.  To help SpamAssassin filtering, obviously.

The data I'm planning to provide is, for every IP address, the percentage
of email from it which was ham (normalized like the S/O value in
SpamAssassin ruleqa), and total count of recent emails from that IP
(a logarithm of it).  Output data based on my own email:

http://www.chaosreigns.com/iprep/iprep.txt
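A sketch of what "normalized like the S/O value" could mean for a single IP, assuming the normalization works as in ruleqa: each count is first divided by the size of its own corpus, so an unbalanced ham/spam corpus does not skew the percentage. `normalized_ham_percent` and `log_count` are hypothetical names, and `log10(1 + n)` is just one choice of logarithm.

```python
import math


def normalized_ham_percent(ip_ham, ip_spam, total_ham, total_spam):
    """Ham share for one IP after scaling each count by its corpus size.

    Mirrors the idea behind ruleqa's S/O column (compare rates, not raw
    counts) with ham in the numerator; the exact iprep formula is assumed.
    """
    ham_rate = ip_ham / total_ham
    spam_rate = ip_spam / total_spam
    if ham_rate + spam_rate == 0:
        raise ValueError("no mail seen from this IP")
    return 100.0 * ham_rate / (ham_rate + spam_rate)


def log_count(ip_ham, ip_spam):
    """Logarithm of the number of emails seen from the IP, log10(1 + n)."""
    return math.log10(1 + ip_ham + ip_spam)
```

With a corpus of 2618 hams and 2956 spams (the numbers below), an IP that sent 10 of each comes out slightly above 50% ham, because each ham is a larger fraction of the smaller ham corpus.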


With my 2618 hams, and 2956 spams, there were only *two* IP addresses that
were not 100% spam or 100% ham (both belong to google).  This kind of thing
is why black lists and white lists are useful for predicting if an email is
spam or ham.  The highest ranked test in SpamAssassin is RCVD_IN_XBL, a
spamhaus.org blacklist.  #7 is RCVD_IN_PSBL, and #11 is RCVD_IN_DNSWL_HI,
which is also the highest ranking nice rule.


To do this, I need data from you.

Create a folder containing only email you've confirmed is ham, and another
containing what you've confirmed is spam.

http://www.chaosreigns.com/iprep/dl/iprep.pl

./iprep.pl ham:dir:~/masscheckwork/ham spam:dir:~/masscheckwork/spam/

The arguments are the same as the targets used by SpamAssassin's
mass-check (using its perl modules):

class:format:location
class     is spam or ham
format    is dir, file, mbx, mbox, or detect
location  is a file or directory name; globbing of ~ and * is supported

You can specify many targets at once.  
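The target syntax is simple enough to sketch. The parser below is hypothetical and is not mass-check's own code; it only mirrors the rules listed above: the first two colon-separated fields are the class and format, and `~` and `*` in the location are expanded.

```python
import glob
import os

CLASSES = {"ham", "spam"}
FORMATS = {"dir", "file", "mbx", "mbox", "detect"}


def parse_target(target):
    """Split a class:format:location target and expand ~ and * in the location.

    Returns (class, format, [paths]).  Only the first two colons split the
    string, so a location may itself contain ':'.
    """
    try:
        klass, fmt, location = target.split(":", 2)
    except ValueError:
        raise ValueError(f"expected class:format:location, got {target!r}") from None
    if klass not in CLASSES:
        raise ValueError(f"class must be ham or spam, got {klass!r}")
    if fmt not in FORMATS:
        raise ValueError(f"unknown format {fmt!r}")
    expanded = os.path.expanduser(location)
    # glob() returns only paths that exist; keep the literal path otherwise
    paths = sorted(glob.glob(expanded)) or [expanded]
    return klass, fmt, paths
```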

Please run it as a daily cron job.

The required ~/.ipreprc config file:
$trusted_networks = 'space delimited list of trusted hosts';
$user = 'username';
$pass = 'password';

$trusted_networks is very important, and needs to contain everything from
both your trusted_networks and internal_networks values from SpamAssassin,
which are documented here:  
http://spamassassin.apache.org/full/3.3.x/doc/Mail_SpamAssassin_Conf.html#network_test_options
http://wiki.apache.org/spamassassin/TrustPath
This is to prevent reporting the IP of your trusted relays instead of the
actual IP sending the email.  
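The idea behind `$trusted_networks` can be sketched as follows: walk the relay IPs from the Received headers, newest hop first, skip the ones inside your own trusted networks, and report the first one outside them. This is a simplified illustration using Python's `ipaddress` module, not SpamAssassin's trust-path algorithm; the helper names are hypothetical and only plain addresses and CIDR blocks are accepted.

```python
import ipaddress


def parse_networks(spec):
    """Parse a space-delimited list of IPs/CIDR blocks, like $trusted_networks.

    SpamAssassin's own syntax accepts more forms than this sketch does.
    """
    return [ipaddress.ip_network(item, strict=False) for item in spec.split()]


def first_untrusted(relay_ips, trusted):
    """Return the first relay IP (newest hop first) outside every trusted network.

    That is the address worth reporting; the hops before it are your own
    relays.  Returns None if every hop is trusted.
    """
    for ip in relay_ips:
        addr = ipaddress.ip_address(ip)
        if not any(addr in net for net in trusted):
            return ip
    return None
```

If a relay of yours is missing from the list, the function stops at that relay and returns its address instead of the real sender's, which is exactly the mistake the warning above is about.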

Email me to get an account to upload the data.  Please email me from a
non-freemail account, one not listed in
http://svn.apache.org/repos/asf/spamassassin/trunk/rules/20_freemail_domains.cf
Major examples of freemail accounts, which I don't want you to email me from,
are:  gmail.com, yahoo.com, and hotmail.com.  This is just to make it
slightly harder for spammers to send me bad data.  And if you're on this
list, I know you have a non-freemail account.

I won't tell anybody your email address, and I consider the uploaded data
confidential.


I'm thinking about providing the data only via rsync, instead of via DNS,
because I think that should reduce network load.  I'd create a plugin that
would grab the data directly.


Just as a disclosure, I have been involved with dnswl.org since November
2006.  I have no plan to use any of their data, other than to look for
problems in my data.

-- 
Let's just say that if complete and utter chaos was lightning, then
he'd be the sort to stand on a hilltop in a thunderstorm wearing wet
copper armour and shouting 'All gods are bastards'. - The Color of Magic
http://www.ChaosReigns.com


learn spam as ham or ham as spam

2007-08-13 Thread Robert Nicholson

So I have to make two calls to learn?

i.e. one to forget and another to relearn?

Learning is never combined with forgetting, right?


Re: learn spam as ham or ham as spam

2007-08-13 Thread Theo Van Dinter
On Tue, Aug 14, 2007 at 10:32:56AM +0700, Robert Nicholson wrote:
 i.e. one to forget and another to relearn?
 Learning is never combined with forgetting, right?

One call is sufficient.  If the message was previously learned the wrong way,
it'll be forgotten for you, then learned the way you specify.
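A toy model of the behaviour Theo describes, purely for illustration: it is not SpamAssassin code, and the class and its return strings are hypothetical. Learning a message under the other class implicitly forgets the old entry, so one call is enough.

```python
class ToyBayesStore:
    """Toy model of sa-learn's relearning behaviour, not SpamAssassin code."""

    def __init__(self):
        self.learned = {}                  # message id -> "spam" or "ham"
        self.counts = {"spam": 0, "ham": 0}

    def learn(self, msg_id, label):
        previous = self.learned.get(msg_id)
        if previous == label:
            return "already learned"       # nothing to do
        if previous is not None:
            self.counts[previous] -= 1     # implicit forget of the old class
        self.learned[msg_id] = label
        self.counts[label] += 1
        return "relearned" if previous else "learned"
```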

-- 
Randomly Selected Tagline:
Yeah, that's it!  I was right!  It's reality that has it wrong!  - Jim Toth




Re: HAM and SPAM mailboxes

2007-03-05 Thread Luis Hernán Otegui

OK, Chris, I think I'll go with your suggestion. It seems simpler, and a
lower load for my busted servers. However, I'm not a Perl guru myself, so
would you mind clarifying what you meant by "In that case, Perl's
Mail::Box::Manager is your friend."?

How do I extract the original mail from the forwarded one?


Thanks,



Luis






--
-
GNU-GPL: May The Source Be With You...
-


Re: HAM and SPAM mailboxes

2007-03-05 Thread Chris St. Pierre

On Mon, 5 Mar 2007, Luis Hernán Otegui wrote:


OK, Chris, I think I'll go with your suggestion. It seems simpler, and a
lower load for my busted servers. However, I'm not a Perl guru myself, so
would you mind clarifying what you meant by "In that case, Perl's
Mail::Box::Manager is your friend."?

How do I extract the original mail from the forwarded one?


No idea -- I'm not doing things that way myself.  My suggestion was
actually to bounce or resend the FPs and FNs, since it's a lot
simpler.  You can just call sa-learn on the bounced messages
themselves rather than extracting the forwarded messages from the
attachments.

If you decide to have your users forward as an attachment, though,
Mail::Box[::Manager] is a Perl module for doing magic with mailboxes.
You'll probably want to do something like this, assuming you're using
Maildir:

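use strict;
use warnings;
use Mail::Box::Manager;

my $mgr    = Mail::Box::Manager->new();
my $folder = $mgr->open(folder      => '/path/to/spam/mailbox',
                        fix_headers => 1);

foreach my $msg ($folder->messages()) {
    foreach my $part ($msg->parts()) {
        # magic with $part goes here: find the attached original message
        # and hand it to sa-learn
    }
}
$folder->close();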

If you're not using Maildir, you'll have to figure out what to do from
there.  I know Mail::Box supports MH, Mbox, and who knows what else,
but haven't used those myself.

http://search.cpan.org/~markov/Mail-Box-2.069/lib/Mail/Box-Overview.pod
should get you started.
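For comparison, the same extraction can be sketched with Python's standard `email` package instead of Mail::Box. This assumes the client attached the original as a `message/rfc822` part at the top level of the forwarded mail, which not every mail client does; `attached_messages` is a hypothetical helper.

```python
import email
from email import policy


def attached_messages(raw_bytes):
    """Return the bytes of each top-level message/rfc822 attachment."""
    wrapper = email.message_from_bytes(raw_bytes, policy=policy.default)
    found = []
    for part in wrapper.iter_parts():  # top-level parts only; empty if not multipart
        if part.get_content_type() == "message/rfc822":
            # For message/rfc822 parts the payload is a list holding one message.
            inner = part.get_payload(0)
            found.append(inner.as_bytes())
    return found
```

Each returned byte string could be written to its own file and the directory then passed to sa-learn.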

Chris St. Pierre
Unix Systems Administrator
Nebraska Wesleyan University

Never send mail to [EMAIL PROTECTED]


Re: HAM and SPAM mailboxes

2007-03-05 Thread Johann Spies
On Mon, Mar 05, 2007 at 10:58:00AM -0300, Luis Hernán Otegui wrote:
 OK, Chris, I think I'll go with your suggestion. It seems simpler, and a
 lower load for my busted servers. However, I'm not a Perl guru myself, so
 would you mind clarifying what you meant by "In that case, Perl's
 Mail::Box::Manager is your friend."?
 
 How do I extract the original mail from the forwarded one?

I have written a small program in OCaml which I use for that purpose.
It extracts emails that were forwarded as attachments and puts them into
a separate directory, from where they can be processed.

At the moment the directories are hardcoded, but I can adapt it for more
generic situations if there is a need for that.

If someone is interested, let me know and I will try and make it
available.

Regards
Johann
-- 
Johann Spies  Telefoon: 021-808 4036
Informasietegnologie, Universiteit van Stellenbosch

 The LORD is my light and my salvation; whom shall I 
  fear? the LORD is the strength of my life; of whom 
  shall I be afraid?   Psalms 27:1 


HAM and SPAM mailboxes

2007-03-02 Thread Luis Hernán Otegui

Hi, people, I am currently researching, trying to implement a way for my
POP3 users to train SA via message forwarding. I've read in the list that
the messages should be forwarded as attachments. My question is how do you
make SA process them. I was thinking of creating two accounts (
[EMAIL PROTECTED], and [EMAIL PROTECTED]), but frankly, I don't understand
the way to hand the forwarded messages to SA...
Currently, I run two production servers, with virtual users, and they have
separate HAM and SPAM IMAP folders for each user. Via a cron job, I teach
the system the spam messages (I've instructed my users to move the spam
messages there via our webmail). But now I'm looking to expand the
service to my POP3 users. Any suggestions are welcome.

BTW, I run SA (v 3.1.7) through AMaViS over Postfix, on a Debian Sarge based
install.


Thanks in advance,


Luis
--
-
GNU-GPL: May The Source Be With You...
-


Re: HAM and SPAM mailboxes

2007-03-02 Thread Chris St. Pierre

On Fri, 2 Mar 2007, Luis Hernán Otegui wrote:


Hi, people, I am currently researching, trying to implement a way for my
POP3 users to train SA via message forwarding. I've read in the list that
the messages should be forwarded as attachments. My question is how do you
make SA process them. I was thinking of creating two accounts (
[EMAIL PROTECTED], and [EMAIL PROTECTED]), but frankly, I don't understand
the way to hand the forwarded messages to SA...


Instead of forwarding as an attachment, I have my users
bounce/redirect/resend their mail, which maintains the message in its
original state and is a lot easier to process than messages in
attachments.  That way, I can just have a cron job go through the
[EMAIL PROTECTED] and [EMAIL PROTECTED] mailboxes and have sa-learn learn each message.
Otherwise, you'll have to strip the attachments and pipe them into
sa-learn, which is a lot less trivial.  In that case, Perl's
Mail::Box::Manager is your friend.

Chris St. Pierre
Unix Systems Administrator
Nebraska Wesleyan University

Never send mail to [EMAIL PROTECTED]


ham and spam

2006-06-20 Thread Michael Di Martino
I am sure this has been posted before, but I am having a hard
time finding it in the archives.

How does one feed Bayes ham and spam on an SMTP gateway (no local
delivery)? All the server does is accept mail for one or two domains,
scrub it for viruses and spam, and then forward it to its nasty little
Exchange server.
 


Regards,
Michael Di Martino
Director of MIS
The telx Group, Inc
17 State St 33rd Floor
New York, NY 10004
p:  212.480.3300
m: 646.207.6603
www.telx.com

-Original Message-
From: Dirk Bonengel [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, June 20, 2006 11:57 AM
To: users@spamassassin.apache.org
Subject: Re: How to install iXhash

Marc,

just drop both files (.cf and .pm) into the directory where your
local.cf is.
One important piece of (missing) info: You must be running SA v 3.1.0 or
higher (not 3.0 as stated). If this is a problem I can easily post a
version working with 3.0.x

Dirk

Marc Perkel schrieb:


 Matt Kettler wrote:
 Marc Perkel wrote:
   
 Here's the link to the wiki, but I don't know what to do with it.

 http://wiki.apache.org/spamassassin/iXhash

 
 Disclaimer: I've never tried this. However, the following is a fairly
 well-educated guess at how to install it.

 1) copy paste the bottom half into a file called iXhash.pm
 2) copy-paste the top half into a file called ixhash.cf
 3) place iXhash.pm somewhere that is global r/x
 4) edit the ixhash.cf to reflect where iXhash.pm is.
 5) copy ixhash.cf into /etc/mail/spamassassin
 6) run spamassassin --lint
 7) if it passes, restart spamd or any other persistent daemons that 
 use the spamassassin perl API.


   

 Thanks Matt - what directory would you put iXhash.pm in? If I get this
 to work I'll update the wiki.



RE: ham and spam

2006-06-20 Thread Gary W. Smith
Michael, 

There are a couple of ways of doing it.  It really depends on how easy you
want to make it for your users/admins.  It also depends on your
configuration.

We use MySQL for bayes and awl.  This makes it easy for us, as we have an
internal machine running Cyrus and SA.  We have a local account with an
imap account on it that we copy the email to.  From there we have a
script that runs it against the ham/spam folders (including unlearn for
the same).  If you are running bayes on a local DB, your options are a
little different.


* You can create an imap account on the gateway, move mail to it and
learn from it.
* You can create another non-gateway machine, install SA on it, and configure
the spamd servers to listen on additional subnets beyond localhost
* You can take the messages and SFTP them to the gateway and run some
type of automated job there (this is what we did in the very beginning,
some years ago).

There are other ways, these are just the ones that come off the top of
my head.

Gary





RE: ham and spam

2006-06-20 Thread Bret Miller

  How does one feed Bayes ham and spam on an SMTP gateway (no local
  delivery)? All the server does is accept mail for one or two domains,
  scrub it for viruses and spam, and then forward it to its nasty little
  Exchange server.

 Can you set up shared Exchange folders that can be exported to mbox
 format? If so, set up learn-ham and learn-spam folders, tell people to
 train to them, then periodically export them, transfer them to the SA
 host, and run sa-learn on them.

 Perhaps someone sufficiently motivated could write an sa-learn -
 IMAP client utility to train from arbitrary IMAP folders hosted
 remotely...

Actually a few people have created various versions of that already. I
modified one based on the IMAP interface found here
http://gagravarr.org/code/ and use that. I found another here
http://www.dmzs.com/tools/files/spam.phtml.

Essentially, they just use IMAP to retrieve messages from the learn-spam
and learn-ham folders (or whatever you want to call them), and pass it
to SA to learn.

I don't use Exchange, but it is my understanding that it supports IMAP
access to its folders...
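A minimal sketch of such an IMAP-to-sa-learn bridge, using Python's `imaplib`. The folder names, host and credentials are placeholders, and this is not one of the tools linked above: it fetches every message in each training folder (opened read-only), writes one message per file into a temporary directory, and runs sa-learn on that directory.

```python
import imaplib
import os
import subprocess
import tempfile

# Hypothetical folder names; use whatever your users file misclassified mail into.
FOLDERS = {"learn-spam": "--spam", "learn-ham": "--ham"}


def sa_learn_command(folder, directory):
    """Build the sa-learn call for one folder's worth of saved messages."""
    return ["sa-learn", FOLDERS[folder], directory]


def train_from_imap(host, user, password):
    """Fetch each training folder over IMAP and feed it to sa-learn."""
    imap = imaplib.IMAP4_SSL(host)
    imap.login(user, password)
    try:
        for folder in FOLDERS:
            status, _ = imap.select(folder, readonly=True)
            if status != "OK":
                continue  # folder missing or not accessible
            _, data = imap.search(None, "ALL")
            with tempfile.TemporaryDirectory() as tmp:
                for num in data[0].split():
                    _, msg_data = imap.fetch(num, "(RFC822)")
                    # one message per file, since sa-learn accepts a directory
                    with open(os.path.join(tmp, num.decode()), "wb") as fh:
                        fh.write(msg_data[0][1])
                subprocess.run(sa_learn_command(folder, tmp), check=True)
    finally:
        imap.logout()
```

Because the folders are opened read-only and nothing is deleted, every run re-feeds the same messages; sa-learn skips messages it has already learned, but you may prefer to expunge or move processed mail instead.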

Bret





Re: ham and spam

2006-06-20 Thread Steven Stern
John D. Hardin wrote:
 On Tue, 20 Jun 2006, Michael Di Martino wrote:
 
 How does one feed Bayes ham and spam on an SMTP gateway (no local
 delivery)? All the server does is accept mail for one or two domains,
 scrub it for viruses and spam, and then forward it to its nasty little
 Exchange server.
 
 Can you set up shared Exchange folders that can be exported to mbox
 format? If so, set up learn-ham and learn-spam folders, tell people to
 train to them, then periodically export them, transfer them to the SA
 host, and run sa-learn on them.
 
 Perhaps someone sufficiently motivated could write an sa-learn -
 IMAP client utility to train from arbitrary IMAP folders hosted
 remotely...
 

We have trained users to put misclassified ham and spam into two public
folders, should-be-spam and should-be-ham.  We created an exchange user,
spamiam, that has full rights to these folders.

At the top of every hour, this script is run on the one MX server:


# more get_ham_spam
#! /bin/sh
rm -f /var/spool/mail/spamiam
touch /var/spool/mail/spamiam
chown spamiam:mail /var/spool/mail/spamiam
su spamiam -c 'fetchmail -a -K -f /usr/local/scripts/spamiam.fetchmailrc -r "Public Folders/should-be-spam"'
cat /var/spool/mail/spamiam >> /var/www/html/spamstuff/should-be-spam
sa-learn --spam --mbox /var/www/html/spamstuff/should-be-spam
rm -f /var/spool/mail/spamiam
touch /var/spool/mail/spamiam
chown spamiam:mail /var/spool/mail/spamiam
su spamiam -c 'fetchmail -a -K -f /usr/local/scripts/spamiam.fetchmailrc -r "Public Folders/should-be-ham"'
cat /var/spool/mail/spamiam >> /var/www/html/spamstuff/should-be-ham
sa-learn --ham --mbox /var/www/html/spamstuff/should-be-ham

# more spamiam.fetchmailrc
poll exchange..com
proto imap
user spamiam
password x
is spamiam here

At 15 past each hour, the two other mail servers use wget to grab the
should-be files to their local /tmp and run sa-learn.

The files are included in logrotate, so they get zeroed every Sunday
morning.

-- 

  Steve


Re: enabable x-spam-report in all emails ham or spam

2005-09-28 Thread Matt Kettler
Keith Amling wrote:
Fascinating the man page seems to indicate this is not one of the options
for add_header. They mention other headers but not Report. I guess you
found a cheat.
 
 What is not one of the options?  'add_header', 'all', and '_REPORT_' are all
 mentioned directly in the perldoc for Conf.  How is my suggestion a 'cheat'?
 

All the options are documented, and using one configuration option to override
another is also documented.

However, using a configuration option to override a hard-coded setting from
Conf.pm is definitely NOT documented.

The fact that it depends on the specific, and undocumented, way the developers
chose to implement adding the header for report_safe 0 makes it a cheat.

I wouldn't say it's a particularly egregious cheat, but it's certainly not
documented. It's just taking advantage of the fact that the developers
(sensibly) made use of existing functionality to do this.




Re: enabable x-spam-report in all emails ham or spam

2005-09-27 Thread Keith Amling
 Fascinating the man page seems to indicate this is not one of the options
 for add_header. They mention other headers but not Report. I guess you
 found a cheat.
What is not one of the options?  'add_header', 'all', and '_REPORT_' are all
mentioned directly in the perldoc for Conf.  How is my suggestion a 'cheat'?

 {^_^}
Keith


Re: enabable x-spam-report in all emails ham or spam

2005-09-27 Thread jdow

From: Keith Amling [EMAIL PROTECTED]


Fascinating the man page seems to indicate this is not one of the options
for add_header. They mention other headers but not Report. I guess you
found a cheat.
What is not one of the options?  'add_header', 'all', and '_REPORT_' are all
mentioned directly in the perldoc for Conf.  How is my suggestion a 'cheat'?



{^_^}


I was looking for Report in and around add_header on the 3.04 docs
I have here.
{^_^} 





Re: enabable x-spam-report in all emails ham or spam

2005-09-25 Thread Keith Amling
 is it possible to enable the addition of the x-spam-report in all emails? 
I note

$self->{headers_spam}->{Report} = "_REPORT_";

in Conf.pm which amounts to the configuration

add_header spam Report _REPORT_

I wanted the exact same thing you want and

add_header all Report _REPORT_

has worked perfectly for me.  YMMV, esp. wrt. report_safe.
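To confirm that a change like add_header all Report _REPORT_ actually took effect, one option is to pipe a known ham message through spamc and look for the header. This sketch only scans the header section (the lines before the first empty line); the message file name in the usage note, ham.eml, is hypothetical.

```shell
#!/bin/sh
# Succeeds (exit 0) if the message on stdin carries an X-Spam-Report header.
# Only the header section (everything before the first empty line) is
# examined, so a body line starting with "X-Spam-Report:" is ignored.
# Header names are compared case-insensitively.
has_spam_report() {
    awk '
        /^\r?$/ { exit }                                # end of headers
        tolower($0) ~ /^x-spam-report:/ { found = 1 }
        END { exit (found ? 0 : 1) }
    '
}
```

For example: spamc < ham.eml | has_spam_report && echo present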

 -matt
Keith


Re: enabable x-spam-report in all emails ham or spam

2005-09-25 Thread Matthew Lenz

works perfectly.  thanks dude!

-Matt

- Original Message - 
From: Keith Amling [EMAIL PROTECTED]

To: users@spamassassin.apache.org
Sent: Sunday, September 25, 2005 1:33 AM
Subject: Re: enabable x-spam-report in all emails ham or spam



is it possible to enable the addition of the x-spam-report in all emails?

I note

$self->{headers_spam}->{Report} = "_REPORT_";

in Conf.pm which amounts to the configuration

add_header spam Report _REPORT_

I wanted the exact same thing you want and

add_header all Report _REPORT_

has worked perfectly for me.  YMMV, esp. wrt. report_safe.


-matt

Keith



Re: enabable x-spam-report in all emails ham or spam

2005-09-25 Thread jdow

From: Keith Amling [EMAIL PROTECTED]


is it possible to enable the addition of the x-spam-report in all emails?

I note

$self->{headers_spam}->{Report} = "_REPORT_";

in Conf.pm which amounts to the configuration

add_header spam Report _REPORT_

I wanted the exact same thing you want and

add_header all Report _REPORT_

has worked perfectly for me.  YMMV, esp. wrt. report_safe.


Fascinating the man page seems to indicate this is not one of the options
for add_header. They mention other headers but not Report. I guess you
found a cheat.

{^_^}




Re: enabable x-spam-report in all emails ham or spam

2005-09-25 Thread Loren Wilton
 is it possible to enable the addition of the x-spam-report in all emails?

Depends on what you are using to integrate SA.  If you are using spamd, yes.
Some of the other tools that make their own headers, no.

Loren



enabable x-spam-report in all emails ham or spam

2005-09-24 Thread Matthew Lenz
is it possible to enable the addition of the x-spam-report in all emails? 


-matt


Re: enabable x-spam-report in all emails ham or spam

2005-09-24 Thread Matt Kettler
Matthew Lenz wrote:
 is it possible to enable the addition of the x-spam-report in all emails?
 -matt
 

Well, there is no X-Spam-Report header made by SA's default configuration.
By default SA does add X-Spam-Status to all messages, which would include the
score and list of rules that hit.

However, in general all you'd need to do is modify the "add_header spam Report ..."
command to use "all" instead of "spam".
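A minimal sketch of making that change in a site config file. It assumes the site rules live in /etc/mail/spamassassin/local.cf, which is a common location but not universal, and it appends the "all" line only if it is not already there, so running it twice is harmless.

```shell
#!/bin/sh
# Append "add_header all Report _REPORT_" to a SpamAssassin config file once.
# ASSUMPTION: the default path below is typical but not guaranteed.
REPORT_LINE='add_header all Report _REPORT_'

enable_report_for_all() {
    cf=${1:-/etc/mail/spamassassin/local.cf}
    # -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    if grep -qxF "$REPORT_LINE" "$cf" 2>/dev/null; then
        return 0                      # already configured, nothing to do
    fi
    printf '%s\n' "$REPORT_LINE" >> "$cf" || return 1
}
```

Afterwards, spamassassin --lint should come back clean, and spamd needs a restart (or a HUP) before it picks up the new setting.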


Of course, that's assuming your X-Spam-Report header is being made by SA. If
you're using an integration tool like MailScanner, qmail or Mimedefang you may
have to change their configuration, not SA.

Post some more details about your configuration if you're still having problems.



Re: enabable x-spam-report in all emails ham or spam

2005-09-24 Thread Matthew Lenz
- Original Message - 
From: Matt Kettler [EMAIL PROTECTED]

To: Matthew Lenz [EMAIL PROTECTED]
Cc: users@spamassassin.apache.org
Sent: Saturday, September 24, 2005 6:59 PM
Subject: Re: enabable x-spam-report in all emails ham or spam



Matthew Lenz wrote:

is it possible to enable the addition of the x-spam-report in all emails?
-matt



Well, there is no X-Spam-Report header made by SA's default configuration.


huh? it adds it to spam by default

X-Spam-Status: Yes, score=6.4 required=5.0 tests=HTML_MESSAGE,RCVD_IN_DSBL,
RCVD_IN_XBL,URIBL_SBL,URIBL_WS_SURBL autolearn=no version=3.0.3
X-Spam-Report:
*  0.0 HTML_MESSAGE BODY: HTML included in message
*  2.8 RCVD_IN_DSBL RBL: Received via a relay in list.dsbl.org
*  [http://dsbl.org/listing?218.234.40.38]
*  2.5 RCVD_IN_XBL RBL: Received via a relay in Spamhaus XBL
*  [218.234.40.38 listed in sbl-xbl.spamhaus.org]
*  0.6 URIBL_SBL Contains an URL listed in the SBL blocklist
*  [URIs: grounansho.com]
*  0.5 URIBL_WS_SURBL Contains an URL listed in the WS SURBL blocklist
*  [URIs: grounansho.com]

unless that's my imagination

By default SA does add X-Spam-Status to all messages, which would include the
score and list of rules that hit.


yep sure does.

However, in general all you'd need to do is modify the "add_header spam Report ..."
command to use "all" instead of "spam".


what is ... ?  add_header all Report doesn't do squat.



Of course, that's assuming your X-Spam-Report header is being made by SA. If
you're using an integration tool like MailScanner, qmail or Mimedefang you may
have to change their configuration, not SA.


i gave you everything.. yes i'm only using SA and sending mail through spamc
using procmail. i don't have it encapsulate spam, since i put report_safe 0 in
my local.cf.


Post some more details about your configuration if you're still having problems.


Re: enabable x-spam-report in all emails ham or spam

2005-09-24 Thread jdow

From: Matthew Lenz [EMAIL PROTECTED]

From: Matt Kettler [EMAIL PROTECTED]


Matthew Lenz wrote:
is it possible to enable the addition of the x-spam-report in all emails?

-matt



Well, there is no X-Spam-Report header made by SA's default configuration.


huh? it adds it to spam by default


Can't be done according to the docs. The X-Spam-Report header is only added to
spam. How it is added is handled by the report_safe option.

If you want this for testing then use spamassassin -t. If you want
it for everybody then you cannot use spamc/spamd. You'd need to use
spamassassin itself and add the -t option explicitly.

Please RTFM, man Mail::SpamAssassin::Conf, for more details.
{^_^} 





Re: Spamassassin only autolearning ham, not spam after upgrade to 3.0.2

2005-04-06 Thread Kevin Peuhkurinen
Kelly Corbin wrote:
I have 4 machines configured identically (with the exception of the -m
option due to differences in resources on each machine) with
SpamAssassin and spamass-milter.  I recently upgraded to 3.0.2 from 2.64
and everything seems to be working pretty good with the exception of one
machine.  After watching the mail log, I noticed that it is not
autolearning any spam, no matter how high it scores.  It does autolearn
ham however, and the other 3 machines autolearn spam fine.
I've looked at everything I can think of (configuration files, file
permissions, checked FAQs, searched list archives, etc.) and can't
figure out why it won't autolearn any spam.
Any ideas?
Take an email with lots of hits and save it as 'spam-email', then run 
'spamassassin -t -D < spam-email' and see what the debug has to say 
about it.   Feel free to post the Bayes-specific parts of the debug here 
if you aren't sure of how to read it.
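The -D output is long, so it can help to keep only the lines that bear on the autolearn decision. A sketch, assuming the debug messages go to standard error (where -D sends them) and reusing the spam-email file name from the post; the grep patterns are a guess at what is useful, not a canonical list.

```shell
#!/bin/sh
# Keep only autolearn-related lines from spamassassin -D output on stdin:
# "auto-learn"/"autolearn", "bayes", and the final "is spam?" verdict.
autolearn_lines() {
    grep -iE 'auto-?learn|bayes|is spam\?'
}

# Run a saved message in test mode with debugging on. The debug log (stderr)
# goes into the pipe; the rewritten message (stdout) is discarded.
debug_autolearn() {
    spamassassin -t -D < "${1:-spam-email}" 2>&1 >/dev/null | autolearn_lines
}
```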

Thanks!
Kelly


Re: Spamassassin only autolearning ham, not spam after upgrade to 3.0.2

2005-04-06 Thread Kelly Corbin
Here's my auto-learn lines from the machine that doesn't work:
debug: auto-learn: currently using scoreset 3, recomputing score based on scoreset 1.
debug: auto-learn: message score: 23.316, computed score for autolearn: 24.06
debug: auto-learn? ham=0.1, spam=10, body-points=16.82, head-points=9.84, learned-points=-2.599
debug: auto-learn? no: scored as spam but learner indicated ham (-2.599 < -1)
debug: is spam? score=23.316 required=6

And here's my output from the machine that's learning OK:

debug: auto-learn: currently using scoreset 3, recomputing score based on scoreset 1.
debug: auto-learn: message score: 25.916, computed score for autolearn: 24.06
debug: auto-learn? ham=0.1, spam=10, body-points=16.82, head-points=9.84, learned-points=0.001
debug: auto-learn? yes, spam (24.06 > 10)
debug: Learning Spam
What is this 'learned-points'?  Is my database poisoned on the affected
machine?
Thanks!
Kelly
Kevin Peuhkurinen wrote:
Kelly Corbin wrote:
I have 4 machines configured identically (with the exception of the -m
option due to differences in resources on each machine) with
SpamAssassin and spamass-milter.  I recently upgraded to 3.0.2 from 2.64
and everything seems to be working pretty good with the exception of one
machine.  After watching the mail log, I noticed that it is not
autolearning any spam, no matter how high it scores.  It does autolearn
ham however, and the other 3 machines autolearn spam fine.
I've looked at everything I can think of (configuration files, file
permissions, checked FAQs, searched list archives, etc.) and can't
figure out why it won't autolearn any spam.
Any ideas?
Take an email with lots of hits and save it as 'spam-email', then run 
'spamassassin -t -D < spam-email' and see what the debug has to say 
about it.   Feel free to post the Bayes-specific parts of the debug here 
if you aren't sure of how to read it.

Thanks!
Kelly

--

-- Kelly Corbin
-- Network Administrator
--
-- http://www.theiqgroup.com
--
-- The IQ Group, Inc.
-- 6740 Antioch Suite 260
-- Merriam, KS 66204
-- (913)722-6700 x105
-- Fax (913)722-7264




Re: Spamassassin only autolearning ham, not spam after upgrade to 3.0.2

2005-04-06 Thread Kevin Peuhkurinen
Kelly Corbin wrote:
Here's my auto-learn lines from the machine that doesn't work:
debug: auto-learn: currently using scoreset 3, recomputing score based on scoreset 1.
debug: auto-learn: message score: 23.316, computed score for autolearn: 24.06
debug: auto-learn? ham=0.1, spam=10, body-points=16.82, head-points=9.84, learned-points=-2.599
debug: auto-learn? no: scored as spam but learner indicated ham (-2.599 < -1)
debug: is spam? score=23.316 required=6

And here's my output from the machine that's learning OK:
debug: auto-learn: currently using scoreset 3, recomputing score based on scoreset 1.
debug: auto-learn: message score: 25.916, computed score for autolearn: 24.06
debug: auto-learn? ham=0.1, spam=10, body-points=16.82, head-points=9.84, learned-points=0.001
debug: auto-learn? yes, spam (24.06 > 10)
debug: Learning Spam

What is this 'learned-points'?  Is my database poisoned on the 
affected machine?

I'm guessing here that the email is hitting BAYES_00 (which has a score 
of -2.599 by default, and which matches the learned points). SA now has 
some code to ensure that emails that hit low BAYES scores will not be 
autolearned as spam and emails that hit high BAYES scores will not be 
autolearned as ham, no matter what they score otherwise.  I'm assuming, 
then, that all or most of your emails are hitting BAYES_00 to BAYES_40 
only.   This means that indeed your Bayes database is pooched.   

The easiest solution is likely to just delete the database from this 
machine and copy over the database from one of your other systems, 
provided that they are handling similar types of emails.
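The suggestion above copies the database files directly. A possible alternative, sketched here, uses sa-learn's own --backup and --restore options (provided by sa-learn in the 3.x series), which move the data as a text dump instead of raw database files. Assumptions: the backup was made on a healthy machine with sa-learn --backup > bayes.backup and copied over by whatever means is convenient, and the commands run as the user that owns the Bayes database. The sa-learn --clear step is a precaution so nothing from the poisoned database survives; DRY_RUN=1 prints the commands instead of running them.

```shell
#!/bin/sh
# Replace the local Bayes database with a backup taken on a healthy machine
# (created there with: sa-learn --backup > bayes.backup).
# Set DRY_RUN=1 to print the sa-learn commands instead of running them.

run() {
    if [ -n "${DRY_RUN:-}" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

restore_bayes() {
    backup=$1
    # Refuse to wipe the current database if the backup is missing or empty.
    if [ ! -s "$backup" ]; then
        echo "backup file '$backup' is missing or empty" >&2
        return 1
    fi
    run sa-learn --clear || return 1      # wipe the poisoned database
    run sa-learn --restore "$backup"      # load the healthy host's data
}
```

As with copying the files, this only makes sense if the healthy machine sees similar mail.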




Re: Spamassassin only autolearning ham, not spam after upgrade to 3.0.2

2005-04-06 Thread Kelly Corbin
That did the trick!  I just copied over the databases from one of the 
good machines and right away it started doing the autolearn=spam.

Thanks for all your help.
Kelly
Kevin Peuhkurinen wrote:
Kelly Corbin wrote:
Here's my auto-learn lines from the machine that doesn't work:
debug: auto-learn: currently using scoreset 3, recomputing score based on scoreset 1.
debug: auto-learn: message score: 23.316, computed score for autolearn: 24.06
debug: auto-learn? ham=0.1, spam=10, body-points=16.82, head-points=9.84, learned-points=-2.599
debug: auto-learn? no: scored as spam but learner indicated ham (-2.599 < -1)
debug: is spam? score=23.316 required=6

And here's my output from the machine that's learning OK:
debug: auto-learn: currently using scoreset 3, recomputing score based on scoreset 1.
debug: auto-learn: message score: 25.916, computed score for autolearn: 24.06
debug: auto-learn? ham=0.1, spam=10, body-points=16.82, head-points=9.84, learned-points=0.001
debug: auto-learn? yes, spam (24.06 > 10)
debug: Learning Spam

What is this 'learned-points'?  Is my database poisoned on the 
affected machine?

I'm guessing here that the email is hitting BAYES_00 (which has a score 
of -2.599 by default, and which matches the learned points). SA now has 
some code to ensure that emails that hit low BAYES scores will not be 
autolearned as spam and emails that hit high BAYES scores will not be 
autolearned as ham, no matter what they score otherwise.  I'm assuming, 
then, that all or most of your emails are hitting BAYES_00 to BAYES_40 
only.   This means that indeed your Bayes database is pooched.

The easiest solution is likely to just delete the database from this 
machine and copy over the database from one of your other systems, 
provided that they are handling similar types of emails.





Re: Spamassassin only autolearning ham, not spam after upgrade to 3.0.2

2005-04-06 Thread Matt Kettler
Kelly Corbin wrote:


 What is this 'learned-points'?

That's what score the BAYES_* rules would have given this message based
on existing learning.

This is basically used to prevent SA from automatically learning
anything that noticeably contradicts the existing training.

   Is my database poisoned on the affected
 machine?

Possibly. It's either poisoned, or it's just not trained on a wide
enough variety of spam.

It looks like SA's existing training tells it to regard that message as
BAYES_00 (i.e., less than a 1% chance of being spam). I'm basing the
BAYES_00 claim on the learned points being -2.599, which matches the
score of the BAYES_00 rule.
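The guard described in this thread can be modelled as a small decision function. This is a simplified sketch, not SpamAssassin's code: it uses the thresholds visible in Kelly's debug output (ham=0.1, spam=10, and the -1 cut-off in the "learner indicated ham" line), assumes a mirror-image +1 cut-off for the ham side, and omits the other checks the real autolearner makes, such as the body-points and head-points values that also appear in the log.

```shell
#!/bin/sh
# Simplified model of the autolearn threshold decision seen in the debug log.
# ASSUMPTIONS: ham=0.1, spam=10 and the -1 learner cut-off come from the log;
# the +1 cut-off on the ham side is assumed by symmetry; real SpamAssassin
# applies further checks (e.g. minimum head/body points) not modelled here.
# Usage: autolearn_decision AUTOLEARN_SCORE LEARNED_POINTS
# Prints "spam", "ham", or "no".
autolearn_decision() {
    awk -v score="$1" -v learned="$2" 'BEGIN {
        ham_threshold = 0.1; spam_threshold = 10
        said_ham = -1; said_spam = 1
        if (score >= spam_threshold) {
            # Bayes already thinks this is ham: refuse to learn it as spam.
            if (learned < said_ham) print "no"; else print "spam"
        } else if (score < ham_threshold) {
            # Bayes already thinks this is spam: refuse to learn it as ham.
            if (learned > said_spam) print "no"; else print "ham"
        } else {
            print "no"
        }
    }'
}
```

With Kelly's numbers, autolearn_decision 24.06 -2.599 prints no and autolearn_decision 24.06 0.001 prints spam, which lines up with the two machines' logs.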





Spamassassin only autolearning ham, not spam after upgrade to 3.0.2

2005-04-05 Thread Kelly Corbin
I have 4 machines configured identically (with the exception of the -m
option due to differences in resources on each machine) with
SpamAssassin and spamass-milter.  I recently upgraded to 3.0.2 from 2.64
and everything seems to be working pretty good with the exception of one
machine.  After watching the mail log, I noticed that it is not
autolearning any spam, no matter how high it scores.  It does autolearn
ham however, and the other 3 machines autolearn spam fine.
I've looked at everything I can think of (configuration files, file
permissions, checked FAQs, searched list archives, etc.) and can't
figure out why it won't autolearn any spam.
Any ideas?
Thanks!
Kelly




List number of ham and spam in Bayes

2004-10-26 Thread Mathieu Nantel
Good day,

I'm sorry if that question has been answered before, but I could not find an 
answer.

Is there a command / way that will show how many spams and hams have been 
learned by the Bayesian filter?

-- 
Mathieu Nantel, RHCE - Systems Manager
Ecopia BioSciences Inc.
(514) 336-2724 x434


Re: List number of ham and spam in Bayes

2004-10-26 Thread Adam Lanier
Mathieu Nantel wrote:
Good day,
I'm sorry if that question has been answered before, but I could not find an 
answer.

Is there a command / way that will show how many spams and hams have been 
learned by the Bayesian filter?

sa-learn --dump magic
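The counts appear on the nspam and nham lines of that output. A small sketch for pulling them out, assuming the usual layout in which the value sits in the third whitespace-separated column of lines ending in "non-token data: nspam" and "non-token data: nham".

```shell
#!/bin/sh
# Print "spam=N ham=M" from `sa-learn --dump magic` output read on stdin.
# ASSUMPTION: the count is the third column of the nspam/nham lines.
bayes_counts() {
    awk '
        /non-token data: nspam/ { spam = $3 }
        /non-token data: nham/  { ham = $3 }
        END { printf "spam=%s ham=%s\n", spam + 0, ham + 0 }
    '
}
```

Usage: sa-learn --dump magic | bayes_counts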
--
Adam Lanier
Bernard L. Madoff Investment Securities LLC


Re: Manually learnt HAM as SPAM. Can I undo?

2004-10-16 Thread Tobias von Koch
Hey,

On Sat, 16 Oct 2004 09:46:36 +0200, Nicolas wrote:

N> This morning I made a mistake with spamassassin. I manually learnt
N> (/usr/bin/sa-learn --spam) an IMPORTANT message as SPAM. I
N> immediately learnt it as HAM, but is that sufficient?
N> Do I have to delete all of the bayes tokens I accumulated for over 1
N> year?

Nope, you're fine. If you RTM, you'd know that:

   --ham
   Learn the input message(s) as ham. If you have previously
   learnt any of the messages as spam, SpamAssassin
   will forget them first, then re-learn them as ham.
   Alternatively, if you have previously learnt them as ham,
   it'll skip them this time around. If the messages have
   already been filtered through SpamAssassin, the learner
   will ignore any modifications SpamAssassin may have made.

tobias