About Training ( sa-learn )

2010-03-04 Thread Henrique Fernandes
I have set up SpamAssassin to train an individual database (MySQL) for each
user, with this filter in Postfix:

spamassassin unix - n   n   -   -   pipe
  flags=Rq user=spamassassin argv=/usr/bin/spamc -u ${recipient} -f -e
  /usr/sbin/sendmail -oi -f ${sender} -- ${recipient}

With this filter in place, auto-learn writes to each individual user's
database, so mail does get learned.

But if I send the same email that was auto-learned again, it does not get a
higher score. Is that expected, or should the score be higher?

And how do I know whether the training is working?

thanks!


[]'sf.rique


Re: About Training ( sa-learn )

2010-03-04 Thread Kai Schaetzl
Henrique Fernandes wrote on Thu, 4 Mar 2010 11:45:38 -0300:

> But if I send the same email that was auto-learned, it does not get a
> higher score. Is that expected, or should the score be higher?

If I understand you correctly, you want to learn a message twice. sa-learn
won't do this, and the docs say so.

Kai

-- 
Get your web at Conactive Internet Services: http://www.conactive.com





Re: About Training ( sa-learn )

2010-03-04 Thread Henrique Fernandes
No. What I want is that, after I have trained on an email, the same email
should get a higher score, because SpamAssassin was taught that it is spam. So
when it arrives again, SpamAssassin should look in the database and add some
extra points to the score, right?



[]'sf.rique







Re: About Training ( sa-learn )

2010-03-04 Thread Bowie Bailey
Henrique Fernandes wrote:
> No. What I want is that, after I have trained on an email, the same email
> should get a higher score, because SpamAssassin was taught that it is spam.
> So when it arrives again, it should look in the database and add some extra
> points to the score, right?

That is a fairly common misconception.  When you learn an email as spam,
the Bayes system breaks it into tokens (words/character strings) and
then makes a note that each of those tokens was seen in a spam.  When an
email comes in, it breaks up the new email into tokens and then checks
to see how frequently each of those tokens was previously seen in spam
or ham.  Based on what it finds, it ranks the email from BAYES_00 (very
unlikely to be spam) to BAYES_99 (almost certainly spam).

Since learning from a single email only adds one data point to each
token, it is unlikely to make a major difference on its own.  The value
comes in learning from lots of spam and ham.  This is why the Bayes
rules will not run until you have learned from at least 200 ham and 200
spam.

-- 
Bowie


Re: About Training ( sa-learn )

2010-03-04 Thread Bowie Bailey
(Please send replies to the list)

Henrique Fernandes wrote:

> Hmm.
>
> Thanks. So each individual user has to have learned lots of emails before
> it starts to make a difference to the score?

Yes. Each individual user will need to learn at least 200 ham and 200
spam (manually or via auto-learn) before Bayes will start scoring.  The
more they learn, the better the accuracy.
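
One way to see whether a user's database has reached that threshold is to look at the counters printed by `sa-learn --dump magic`. The sketch below parses that kind of output in Python. The sample text and its numbers are made up for illustration, the helper names are hypothetical, and the 200/200 minimum is simply the figure quoted above.

```python
import re


def bayes_counts(dump_text):
    """Pull the nspam / nham counters out of `sa-learn --dump magic` output."""
    counts = {}
    for line in dump_text.splitlines():
        # The counter value is the third column of the "non-token data" lines.
        match = re.search(
            r"\s(\d+)\s+\d+\s+non-token data: (nspam|nham)\s*$", line
        )
        if match:
            counts[match.group(2)] = int(match.group(1))
    return counts


def bayes_is_active(counts, min_each=200):
    """True once both spam and ham counts have reached the minimum."""
    return (
        counts.get("nspam", 0) >= min_each
        and counts.get("nham", 0) >= min_each
    )


# Illustrative sample only; real numbers will differ per user.
SAMPLE = """\
0.000          0          3          0  non-token data: bayes db version
0.000          0        412          0  non-token data: nspam
0.000          0        157          0  non-token data: nham
0.000          0      61234          0  non-token data: ntokens
"""
```

In this made-up sample the user has enough spam but not enough ham, so Bayes would not be scoring yet. Watching these counters grow over time is also a simple check that auto-learn is writing to the database at all.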

> So is it better to train just one database for all users instead of one
> database for each user?
>
> With just one database I am afraid of getting too many false positives.
> Sometimes "Viagra" is not spam for someone who researches it, but if
> everyone shares the same database, it will be marked as spam...

Depends on your users.  Unless they are wildly different, a single
database should work fairly well.  Individual databases can be more
accurate in some instances, but a single well-trained database will
probably work better than a bunch of individual databases that are not
trained consistently.

-- 
Bowie


Re: About Training ( sa-learn )

2010-03-04 Thread Henrique Fernandes
Thanks!

I will discuss it here and find out which one is better.

What is the weight of the Bayes score once the databases are well trained? Do
you have any idea?

[]'sf.rique




Re: About Training ( sa-learn )

2010-03-04 Thread Bowie Bailey
Henrique Fernandes wrote:
> Thanks!
>
> I will discuss it here and find out which one is better.
>
> What is the weight of the Bayes score once the databases are well trained?
> Do you have any idea?

I'm not sure what you are asking.  What do you mean by weight?

The default scores (as of 3.2.5) are:

BAYES_00  -2.599
BAYES_05  -1.110
BAYES_20  -0.740
BAYES_40  -0.185
BAYES_50   0.001
BAYES_60   1.0
BAYES_80   2.0
BAYES_95   3.0
BAYES_99   3.5

Take a look at
/var/lib/spamassassin/<version>/updates_spamassassin_org/50_scores.cf to
see the scores on your system.
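
As a rough illustration of how these values enter the result, the Python sketch below adds the score of whichever BAYES_* rule fired to the points from all other rules. The table is the 3.2.5 list quoted above; the function names are hypothetical, and the required score of 5.0 is assumed to be the usual default rather than taken from this thread.

```python
# Default BAYES_* scores as listed above (SpamAssassin 3.2.5).
BAYES_SCORES = {
    "BAYES_00": -2.599,
    "BAYES_05": -1.110,
    "BAYES_20": -0.740,
    "BAYES_40": -0.185,
    "BAYES_50": 0.001,
    "BAYES_60": 1.0,
    "BAYES_80": 2.0,
    "BAYES_95": 3.0,
    "BAYES_99": 3.5,
}


def total_score(other_rules_score, bayes_rule=None):
    """Points from all other rules plus the score of the Bayes rule that hit."""
    if bayes_rule is None:
        # Bayes did not run, e.g. too little training data so far.
        return other_rules_score
    return other_rules_score + BAYES_SCORES[bayes_rule]


def is_spam(other_rules_score, bayes_rule=None, required=5.0):
    # required=5.0 is an assumption; check the setting on your own system.
    return total_score(other_rules_score, bayes_rule) >= required
```

With these numbers, a message that collects 3.0 points from other rules is tagged as spam if it also hits BAYES_99, and stays well under the threshold if it hits BAYES_00.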

-- 
Bowie


Re: About Training ( sa-learn )

2010-03-04 Thread Henrique Fernandes
That was what I asked. Sorry, I am not fluent in English.

It is the score that Bayes adds to the final score, right?


[]'sf.rique





Re: About Training ( sa-learn )

2010-03-04 Thread Bowie Bailey
Right.

Henrique Fernandes wrote:
> That was what I asked. Sorry, I am not fluent in English.
>
> It is the score that Bayes adds to the final score, right?






Re: About Training ( sa-learn )

2010-03-04 Thread LuKreme
On 4-Mar-2010, at 07:45, Henrique Fernandes wrote:
 
> I have set up SpamAssassin to train an individual database (MySQL) for each
> user, with this filter in Postfix:
>
> spamassassin unix - n   n   -   -   pipe
>   flags=Rq user=spamassassin argv=/usr/bin/spamc -u ${recipient} -f -e
>   /usr/sbin/sendmail -oi -f ${sender} -- ${recipient}

Wait, what exactly is this doing?


-- 
Windle shook his head sadly. Five exclamation marks, the sure sign of an insane 
mind. --Reaper Man



Re: About Training ( sa-learn )

2010-03-04 Thread Henrique Fernandes
Every email that arrives in Postfix is sent to that filter, and the filter
then sends the email on. When I call spamc with the option -u ${recipient}, it
overrides the user it is running as and processes the message as the user who
is receiving the email. When it auto-learns, the data goes to a different user
in the table, so I have a separate database for each user.

After the message has gone through the spamc filter, it is handed back to
sendmail for delivery.

Good enough?
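
The substitution described above can be sketched in Python. This only mimics how the ${sender} and ${recipient} macros end up in the command line for a single recipient; the function name is hypothetical, and how Postfix handles several recipients per delivery is not covered here.

```python
# The argv from the filter definition, split into separate arguments.
TEMPLATE = [
    "/usr/bin/spamc", "-u", "${recipient}", "-f", "-e",
    "/usr/sbin/sendmail", "-oi", "-f", "${sender}", "--", "${recipient}",
]


def expand_pipe_argv(template, sender, recipient):
    """Fill in the two macros the way the filter line uses them."""
    return [
        arg.replace("${sender}", sender).replace("${recipient}", recipient)
        for arg in template
    ]


if __name__ == "__main__":
    # Example addresses are placeholders.
    argv = expand_pipe_argv(TEMPLATE, "alice@example.org", "bob@example.com")
    # spamc checks the message as user bob@example.com (-u) and pipes the
    # result to the sendmail command that follows -e for re-injection.
    print(" ".join(argv))
```

Because the recipient address is passed to -u, each recipient's mail is checked and auto-learned under that recipient's own entry in the database.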

[]'sf.rique





Training SA

2009-06-08 Thread snowweb

Hi, I'm new to SA. I run an Exim/Dovecot CentOS 5.0 mailserver (VPS), on
which I have recently installed SA.

I have configured 'Autolearn = yes' but I have no way to know whether this
is working. Please can someone explain to me how this works, since my
understanding of this is as follows, and makes no sense!

SpamAssassin identifies a mail as spam and stores the details of it so that
it is easier to identify future emails which are similar. However, I fail to
understand how this will help, since it's already successfully identifying
those emails?

Furthermore, how can I train SpamAssassin to recognize as spam the emails that
it is currently giving only a very low score? I'm getting many emails each day
about Acai Berries, but they are only getting a score of around 3.3! How can I
train it to recognize these, server wide?

Thanks.

pete



RE: training SA

2007-06-27 Thread Bowie Bailey
zigniew szalbot wrote:
 Hi,
 
  I tried to learn SA and used the following syntax:
  
  sa-learn --spam -f /usr/home/zbyszek/june.txt
  I guess I made a mistake with the syntax but how should I change it
  so that I can train SA?
 
 I already found out:
 sa-learn --spam --no-sync /usr/home/zbyszek/june.txt

The important bit is that you leave off the '-f' since that specifies
that the directories to learn from are IN the file you specify.

The '--no-sync' can be useful, but remember that if you always learn
that way, you need to run 'sa-learn --sync' from time to time.
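
The difference the '-f' makes can be sketched in Python. This is only a model of the idea described above, not sa-learn's own code, and the function names are hypothetical: with '-f', every line of the named file is taken as a path to learn from, which is why the first line of the spam message showed up in the "unable to open" error.

```python
import os


def targets_with_f(list_file):
    """Model of '-f': each non-empty line of the file is treated as a path."""
    with open(list_file) as handle:
        return [line.strip() for line in handle if line.strip()]


def targets_without_f(path):
    """Without '-f' the argument itself is what gets learned."""
    return [path]


def missing_targets(targets):
    """Paths that do not exist and would trigger an 'unable to open' error."""
    return [target for target in targets if not os.path.exists(target)]
```

Feeding a saved message to targets_with_f yields its text lines as "paths", none of which exist, so nothing is learned. Without '-f' the single target is the message file itself.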

-- 
Bowie


training SA

2007-06-26 Thread zigniew szalbot
Hello,

I tried to learn SA and used the following syntax:

sa-learn --spam -f /usr/home/zbyszek/june.txt
archive-iterator: unable to open  Dear Valued Customer,: No such file
or directory

june.txt is a spam email message downloaded from squirrelmail for the
purpose of feeding to SA. I only got unable to open message. And at the
end:
Learned tokens from 0 message(s) (0 message(s) examined)

I guess I made a mistake with the syntax but how should I change it so
that I can train SA?

Thank you in advance!

Zbigniew Szalbot




Re: training SA

2007-06-26 Thread zigniew szalbot
Hi,

 I tried to learn SA and used the following syntax:

 sa-learn --spam -f /usr/home/zbyszek/june.txt
 I guess I made a mistake with the syntax but how should I change it so
 that I can train SA?

I already found out:
sa-learn --spam --no-sync /usr/home/zbyszek/june.txt

Sorry to have bothered!

Warm regards,

Zbigniew Szalbot



Re: training SA

2007-06-26 Thread Nigel Frankcom
On Wed, 27 Jun 2007 07:35:01 +0200 (CEST), zigniew szalbot
[EMAIL PROTECTED] wrote:

Hello,

I tried to learn SA and used the following syntax:

sa-learn --spam -f /usr/home/zbyszek/june.txt
archive-iterator: unable to open  Dear Valued Customer,: No such file
or directory

june.txt is a spam email message downloaded from squirrelmail for the
purpose of feeding to SA. I only got unable to open message. And at the
end:
Learned tokens from 0 message(s) (0 message(s) examined)

I guess I made a mistake with the syntax but how should I change it so
that I can train SA?



Hi,

Have you double checked the path for typos?

Also, you may well need the -u switch. I use:


sa-learn --spam -u sauser /downloads/spam && \
  mv -f /downloads/spam/*.Mail /downloads/spam/fn

The last bit, && mv -f /downloads/spam/*.Mail /downloads/spam/fn, just moves
the files to another directory so I can track what has been trained, and is
probably surplus to your requirements.

I have mine as a script so I just call ./ham or ./spam as required.

HTH

Nigel


Re: Training SA-Migrating from old IMAP to new IMAP server

2007-03-12 Thread Magnus Holmgren
On Sunday 11 March 2007 18:09, Don Ireland wrote:
 I'm migrating my email over from the services of fusemail.com to the IMAP server that
 comes with my shared hosting account.

 When I copy my messages over from the old server, do I just run SA-learn
 against the messages as they are?  Or will the fact that they have fusemail
 headers in them cause SA to think messages without fusemail headers are
 spam?

If so, you can make bayes ignore those headers with bayes_ignore_header in 
local.cf. See the Mail::SpamAssassin::Conf(3pm) manpage.

 I've always deleted spam after training the filters so I don't have any to
 feed to to the new system.  Will that be a problem?

Having too great an imbalance in numbers between ham and spam will bias the 
bayes classifier towards "everything is spam" or, in this case, "everything is 
ham".
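
The effect of ignoring provider-specific headers can be pictured with a small Python sketch. It is only an analogy for what bayes_ignore_header achieves, not how SpamAssassin implements it; the header name X-Old-Provider and the function name are hypothetical placeholders.

```python
from email import message_from_string

# Hypothetical header added by the old provider; replace with the real names.
IGNORED_HEADERS = {"x-old-provider"}


def headers_for_learning(raw_message, ignored=IGNORED_HEADERS):
    """Return the (name, value) header pairs that would still be learned."""
    message = message_from_string(raw_message)
    return [
        (name, value)
        for name, value in message.items()
        if name.lower() not in ignored
    ]
```

Headers that are dropped this way contribute no tokens, so their presence in old mail and absence in new mail cannot skew the classifier.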

-- 
Magnus Holmgren[EMAIL PROTECTED]
   (No Cc of list mail needed, thanks)




Re: Training SA-Migrating from old IMAP to new IMAP server

2007-03-12 Thread Don Ireland
So it sounds like I may be better off NOT training on existing messages.  Only 
on new that come in.

Don Ireland




Training SA-Migrating from old IMAP to new IMAP server

2007-03-11 Thread Don Ireland
I'm migrating my email over from the services of fusemail.com to the IMAP server that 
comes with my shared hosting account.

When I copy my messages over from the old server, do I just run SA-learn 
against the messages as they are?  Or will the fact that they have fusemail 
headers in them cause SA to think messages without fusemail headers are spam?

I've always deleted spam after training the filters so I don't have any to feed 
to to the new system.  Will that be a problem?

Don Ireland



Re: Training sa-learn from Outlook.

2006-09-21 Thread Loren Wilton



This sort of question gets asked a lot and there are various answers.

The most common solution is to set up some public folders that are really 
IMAP folders, probably on your main mail machine, but that doesn't really matter 
much. Then as you suggest, run a cron job to pull the mail from them and 
do the learning.

If you look in the wiki I believe there is a page or two devoted to this 
sort of thing with Outlook or OE.

Do you have individual bayes databases or site-wide? If you have 
individual bayes databases then they would most likely each be under a usercode 
for the individual owner. In that case having global spam and ham folders 
won't work all that well, since you would have to learn the whole mess many 
times, once into each bayes database. It would make more sense to have 
per-user ham and spam folders, which could still use the IMAP solution.

I assume that you have a global bayes database. In that case you 
should run sa-learn under whichever usercode SA is running under when it 
accesses that database.

  Loren



Training sa-learn from Outlook.

2006-09-20 Thread Andrew van Tilburg








I imagine the following questions have been asked a lot, but I haven't seen
the exact answers I'm after yet, so here goes.

We are running qmail, vpopmail, spamassassin, smb shares using samba, among
other things, on freebsd. I want to set up public ham and spam folders such
that our users can drag emails from Outlook. I can then set up a cron job that
runs sa-learn on those folders and deletes the mail.

Can I just create two public samba shares, then use those for the emails and
run sa-learn on them? I guess not, because the emails by this stage are
wrecked by Outlook. How else can I do this?

Also, I don't understand exactly the implications of which user you run
sa-learn under. How do I set this up when running sa-learn? I suppose if I run
it as the same user as vpopmail then this will work?

Apologies if these questions have already been covered in this mailing list or
elsewhere.

Andrew.








Re: Training SA with Thunderbird Junk folder

2006-03-29 Thread martin
Edward Diener eddielee at tropicsoft.com writes:

deleted...

> > sth like this?
> >
> > sa-learn --mbox --spam --showdots Thunderbird_Junk_folder?
>
> That was what I was looking for. Thanks !

And also please take care of the running user (-u) and the database path
(--dbpath): without the user parameter, sa-learn will look for the bayes_*
files in the .spamassassin/ folder of the home directory of the currently
logged-in user, and will create or overwrite the files as that user.

> > beware of dos/unix format after uploaded Junk folder file, as at FreeBSD,
> > ascii upload seem no problem, but FC3 need to run dos2unix to reformat the
> > folder file
>
> I have WinScp running, so I should be able to tell it to transform any
> Windows line endings to Unix line endings since the server is Linux and
> the client Windows.

I just had another question: how can I tell what effect an sa-learn run has
had on blocking spam? For example, can dumping the bayes_* files give any hint
about whether accuracy has increased?

thx
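
The line-ending caveat above can be checked before running sa-learn. The Python sketch below converts an uploaded folder file to Unix line endings and counts the messages in it using the standard mailbox module. The function names are hypothetical, and this is a stand-in for dos2unix plus a sanity check, not part of SpamAssassin.

```python
import mailbox


def normalise_line_endings(src, dst):
    """Write a copy of src with CRLF line endings converted to LF."""
    with open(src, "rb") as handle:
        data = handle.read()
    with open(dst, "wb") as handle:
        handle.write(data.replace(b"\r\n", b"\n"))


def count_messages(path):
    """Number of messages Python's mbox reader finds in the folder file."""
    box = mailbox.mbox(path, create=False)
    try:
        return len(box)
    finally:
        box.close()
```

If the count matches the number of messages shown in the Junk folder, the file is probably intact and can be passed to sa-learn with --mbox as suggested above.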



Re: Training SA with Thunderbird Junk folder

2006-03-25 Thread Edward Diener

martin wrote:

Craig Morrison craigsa at 2cah.com writes:


JamesDR wrote:

Edward Diener wrote:
Does anybody know the instructions for training SA with the contents 
of the Thunderbird Junk folder ?
Upload them as single messages to your ISP account. If you have a 
special folder in TB (Thunderbird) for the messages you want to train on 
you can find that folder file (in your TB user folder) and upload that. 
TB stores messages in mbox format which SA can parse.


I have my users use the redirect plugin to send spams to an account on 
the server just for this purpose. The redirect plugin will add a few 
headers (and so will your mail server) that need to be cleaned out 
first. If you just train on the junk folder file, you'll have to remove 
all of the thunderbird related stuff first -- this was more work than 

These will help for the TB headers:

bayes_ignore_header X-Account-Key
bayes_ignore_header X-UIDL
bayes_ignore_header X-Mozilla-Status
bayes_ignore_header X-Mozilla-Status2

Craig




> sth like this?
>
> sa-learn --mbox --spam --showdots Thunderbird_Junk_folder?

That was what I was looking for. Thanks !

> beware of dos/unix format after uploaded Junk folder file, as at FreeBSD,
> ascii upload seem no problem, but FC3 need to run dos2unix to reformat the
> folder file

I have WinScp running, so I should be able to tell it to transform any 
Windows line endings to Unix line endings since the server is Linux and 
the client Windows.




Re: Training SA with Thunderbird Junk folder

2006-03-24 Thread Sander Holthaus
 
Matt Kettler wrote:
 Forrest Aldrich wrote:
 Such a mechanism would still depend upon some organization on the
 server side... as far as I can tell, it's very much to the local
 sysadmin (ie: aliases to send to, forward or attach properly,
 etc). Would this even work well potentially?

 You don't need any of that in modern SA.

 Spamd allows clients to connect and perform a learn operation if
 you start it with the --allow-tell command. All you'd need to do
 is set up spamd that way and have the t-bird plugin speak the same
 protocol as spamc does.

 (possibly not suited to all environments, but if you trust your
 users..)


 Might be interesting if there were somehow a way to collect data
 on the client side (ie: thunderbird/windows or whichever
 platform) and have a mechanism to contribute that data to your
 account (or database entry, if it's MySQL backend), to your
 bayes.

 Like spamd --allow-tell ? :)

The problem with using that approach is that you can't authenticate
users. In small, closed, trusted environments it can be useful, but in
most situations, I don't think it will be usable. The nice thing about
using an IMAP-based solution is that the user is authenticated
(provided you set it up correctly).

Kind Regards,
Sander Holthaus



Re: Training SA with Thunderbird Junk folder

2006-03-24 Thread Michael Parker
Sander Holthaus wrote:
 The problem with using that approach is that you can't authenticate
 users. In small, closed, trusted environments it can be useful, but in
 most situations, I don't think it will be usable. The nice thing about
 using an IMAP-based solution is that the user is authenticated
 (provided you set it up correctly).

Actually, there exists a plugin and a patch for a new plugin hook that
implements a password for spamd protocol transactions.  It never really
went anywhere but could probably be picked up and fixed up a bit if
there was enough interest.

Michael


Re: Training SA with Thunderbird Junk folder

2006-03-24 Thread Mike Pepe

mouss wrote:

Edward Diener a écrit :

Does anybody know the instructions for training SA with the contents of
the Thunderbird Junk folder ?

My web host, where SA is running, suggests I do this in order to reduce
the amount of spam I get, and I can login to my web host, transfer files
from my local machine to my web host, and run SA commands.



so the messages are accessible on your SA system? if so, then run
spamassassin or spamc with the right option.

what I would like to see is a plugin to J a message...


If your mail server and users are using IMAP, the Junk E-mail folder 
is on the server already.


I've got a script that runs from cron that will learn from that folder 
and then delete its contents several times a day.


looks like this:

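#!/bin/bash

# Learn the Junk folder (mbox format) as spam, then empty it.
# The folder name contains a space, so it has to be quoted.
sa-learn --spam --mbox "./mail/Junk E-mail"
rm "./mail/Junk E-mail"
touch "./mail/Junk E-mail"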

you could probably adapt the concept to work system-wide, though I'm not 
sure how your hosting people would take to it.


-Mike


Re: Training SA with Thunderbird Junk folder

2006-03-24 Thread mouss

Mike Pepe wrote:
If your mail server and users are using IMAP, the Junk E-mail folder 
is on the server already.


I've got a script that runs from cron that will learn from that folder 
and then delete its contents several times a day.




My issue is when spam is missed, I'd like to J it so it goes to the 
Junk folder. This way, the server script will pick it.


Unfortunately, if you don't enable TB adaptive filter, TB won't move 
the message to the Junk folder. This is a bug, but I don't know if it 
will ever be fixed (it dates back...). Now, I don't want the TB 
adaptive filter.


Re: Training SA with Thunderbird Junk folder

2006-03-23 Thread mouss
Edward Diener a écrit :
 Does anybody know the instructions for training SA with the contents of
 the Thunderbird Junk folder ?
 
 My web host, where SA is running, suggests I do this in order to reduce
 the amount of spam I get, and I can login to my web host, transfer files
 from my local machine to my web host, and run SA commands.
 

so the messages are accessible on your SA system? if so, then run
spamassassin or spamc with the right option.

what I would like to see is a plugin to J a message...


Re: Training SA with Thunderbird Junk folder

2006-03-23 Thread Michael Parker
mouss wrote:
 
 what I would like to see is a plugin to J a message...
 

<AOL>

Me Too!

</AOL>

If anyone is a Thunderbird plugin wizard and interested in doing a
plugin that will report/learn to spamd speak up, I'm very interested.

Michael



Re: Training SA with Thunderbird Junk folder

2006-03-23 Thread Forrest Aldrich
Such a mechanism would still depend upon some organization on the server 
side... as far as I can tell, it's very much to the local sysadmin (ie: 
aliases to send to, forward or attach properly, etc).  


Would this even work well potentially?

Might be interesting if there were somehow a way to collect data on the 
client side (ie: thunderbird/windows or whichever platform) and have a 
mechanism to contribute that data to your account (or database entry, if 
it's MySQL backend), to your bayes.


Just some ramblings. 






Re: Training SA with Thunderbird Junk folder

2006-03-23 Thread mouss
Forrest Aldrich a écrit :
 Such a mechanism would still depend upon some organization on the server
 side... as far as I can tell, it's very much to the local sysadmin (ie:
 aliases to send to, forward or attach properly, etc). 
 Would this even work well potentially?

oh I'm not asking for that much.


currently, TB lets you mark a message as junk (in which case it can
move it to a junk folder, or other). but it has two problems:

- this enables TB filter. which I don't want
- I see no keybinding (I'd like to just click J).

I don't know how to write TB plugins, but this shouldn't be that hard,
is it?

 
 Might be interesting if there were somehow a way to collect data on the
 client side (ie: thunderbird/windows or whichever platform) and have a
 mechanism to contribute that data to your account (or database entry, if
 it's MySQL backend), to your bayes.
 

that would be another thing. but for those using imap, just putting it
in a Junk folder is enough. for others, this is feasible, but more
elaborate.



Re: Training SA with Thunderbird Junk folder

2006-03-23 Thread Matt Kettler
Forrest Aldrich wrote:
 Such a mechanism would still depend upon some organization on the server
 side... as far as I can tell, it's very much to the local sysadmin (ie:
 aliases to send to, forward or attach properly, etc). 
 Would this even work well potentially?

You don't need any of that in modern SA.

Spamd allows clients to connect and perform a learn operation if you start it
with the --allow-tell option. All you'd need to do is set up spamd that way
and have the t-bird plugin speak the same protocol as spamc does.

(possibly not suited to all environments, but if you trust your users..)


 
 Might be interesting if there were somehow a way to collect data on the
 client side (ie: thunderbird/windows or whichever platform) and have a
 mechanism to contribute that data to your account (or database entry, if
 it's MySQL backend), to your bayes.

Like spamd --allow-tell ? :)


Training SA with Thunderbird Junk folder

2006-03-22 Thread Edward Diener
Does anybody know the instructions for training SA with the contents of 
the Thunderbird Junk folder ?


My web host, where SA is running, suggests I do this in order to reduce 
the amount of spam I get, and I can log in to my web host, transfer files 
from my local machine to my web host, and run SA commands.




Re: Training SA with Thunderbird Junk folder

2006-03-22 Thread JamesDR

Edward Diener wrote:
Does anybody know the instructions for training SA with the contents of 
the Thunderbird Junk folder ?


My web host, where SA is running, suggests I do this in order to reduce 
the amount of spam I get, and I can log in to my web host, transfer files 
from my local machine to my web host, and run SA commands.




I have my users use the redirect plugin to send spams to an account on 
the server just for this purpose. The redirect plugin will add a few 
headers (and so will your mail server) that need to be cleaned out 
first. If you just train on the junk folder file, you'll have to remove 
all of the thunderbird related stuff first -- this was more work than 
redirecting to a mail box. I have a script (VBS) that runs on the mail 
server every night that takes the redirected mails, cleans the headers, 
and moves them over to the folder for the SA server to pick up from. A 
little later a bash script grabs the mail off the win mail server, runs 
through the files and learns them as spam (ham is done in bulk learns 
manually, but I could automate this as well.)
This same mailbox is my spam trap, so any other mails that end up there 
are also trained as spam. With my user base, they are good enough to 
police themselves, and I have just about all of our customers and 
vendors whitelisted.


--
Thanks,
James


Re: Training SA with Thunderbird Junk folder

2006-03-22 Thread Craig Morrison

JamesDR wrote:

Edward Diener wrote:
Does anybody know the instructions for training SA with the contents 
of the Thunderbird Junk folder ?


Upload them as single messages to your ISP account. If you have a 
special folder in TB (Thunderbird) for the messages you want to train on 
you can find that folder file (in your TB user folder) and upload that. 
TB stores messages in mbox format which SA can parse.


I have my users use the redirect plugin to send spams to an account on 
the server just for this purpose. The redirect plugin will add a few 
headers (and so will your mail server) that need to be cleaned out 
first. If you just train on the junk folder file, you'll have to remove 
all of the thunderbird related stuff first -- this was more work than 


These will help for the TB headers:

bayes_ignore_header X-Account-Key
bayes_ignore_header X-UIDL
bayes_ignore_header X-Mozilla-Status
bayes_ignore_header X-Mozilla-Status2

Craig
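
If you would rather remove those headers from the uploaded folder file than have bayes ignore them, the following is a minimal Python sketch using the standard mailbox module. The function name strip_tb_headers and the file paths are hypothetical; the header list is the one given above.

```python
import mailbox

# Thunderbird bookkeeping headers named in this thread.
TB_HEADERS = ("X-Account-Key", "X-UIDL", "X-Mozilla-Status", "X-Mozilla-Status2")


def strip_tb_headers(src_path: str, dst_path: str) -> int:
    """Copy every message from the mbox at src_path into a new mbox at
    dst_path, dropping the Thunderbird headers. Returns the message count.
    The source file is opened but never modified.
    """
    src = mailbox.mbox(src_path, create=False)
    dst = mailbox.mbox(dst_path)
    count = 0
    try:
        for msg in src:
            for name in TB_HEADERS:
                del msg[name]      # deleting an absent header is not an error
            dst.add(msg)
            count += 1
        dst.flush()
    finally:
        dst.close()
        src.close()
    return count
```

For example, strip_tb_headers("Junk", "Junk.clean") would write a cleaned copy that can then be handed to sa-learn with --mbox.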


Re: Training SA with Thunderbird Junk folder

2006-03-22 Thread Sander Holthaus
Craig Morrison wrote:
 JamesDR wrote:
 Edward Diener wrote:
 Does anybody know the instructions for training SA with the
 contents of the Thunderbird Junk folder ?

 Upload them as single messages to your ISP account. If you have a
 special folder in TB (Thunderbird) for the messages you want to
 train on you can find that folder file (in your TB user folder) and
  upload that. TB stores messages in mbox format which SA can parse.


 I have my users use the redirect plugin to send spams to an
 account on the server just for this purpose. The redirect plugin
 will add a few headers (and so will your mail server) that need
 to be cleaned out first. If you just train on the junk folder
 file, you'll have to remove all of the thunderbird related stuff
 first -- this was more work than

 These will help for the TB headers:

 bayes_ignore_header X-Account-Key
 bayes_ignore_header X-UIDL
 bayes_ignore_header X-Mozilla-Status
 bayes_ignore_header X-Mozilla-Status2

 Craig

Optionally

X-WebMail
X-JunkFolder
X-Message-Status
X-SID-PRA
X-SID-Result
X-Message-Info

if you're using the webmail-extension and a few other extensions...

If you look back through the mailing list archive, you should be able to find a
discussion on using IMAP folders to train SA. That might be helpful as well.

Kind Regards,
Sander Holthaus



Re: Training SA with Thunderbird Junk folder

2006-03-22 Thread martin
Craig Morrison craigsa at 2cah.com writes:

 
 JamesDR wrote:
  Edward Diener wrote:
  Does anybody know the instructions for training SA with the contents 
  of the Thunderbird Junk folder ?
 
 Upload them as single messages to your ISP account. If you have a 
 special folder in TB (Thunderbird) for the messages you want to train on 
 you can find that folder file (in your TB user folder) and upload that. 
 TB stores messages in mbox format which SA can parse.
 
  I have my users use the redirect plugin to send spams to an account on 
  the server just for this purpose. The redirect plugin will add a few 
  headers (and so will your mail server) that need to be cleaned out 
  first. If you just train on the junk folder file, you'll have to remove 
  all of the thunderbird related stuff first -- this was more work than 
 
 These will help for the TB headers:
 
 bayes_ignore_header X-Account-Key
 bayes_ignore_header X-UIDL
 bayes_ignore_header X-Mozilla-Status
 bayes_ignore_header X-Mozilla-Status2
 
 Craig
 
 

  Something like this?

  sa-learn --mbox --spam --showdots Thunderbird_Junk_folder

  Beware of DOS vs. Unix line endings after uploading the Junk folder file. On
FreeBSD an ASCII-mode upload seemed to cause no problem, but on FC3 I needed to
run dos2unix to convert the folder file.
  Hope this helps.
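
Where dos2unix is not installed on the host, the same conversion can be sketched in a few lines of Python. The function names and paths are hypothetical, and the whole file is read into memory, which is only reasonable for a folder file of modest size.

```python
def to_unix_line_endings(data: bytes) -> bytes:
    """Convert CRLF (DOS) line endings to LF; lone LFs are left alone."""
    return data.replace(b"\r\n", b"\n")


def convert_file(src_path: str, dst_path: str) -> None:
    """Write a copy of src_path to dst_path with Unix line endings."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(to_unix_line_endings(data))
```

The converted copy, not the original upload, is then the file to pass to sa-learn --mbox.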







Training SA with postfix

2004-12-31 Thread Jason Gauthier
Hey all,


 I've just spent a good amount of time installing postfix, amavisd-new, ClamAV and SA (with DCC, razor, pyzor) -- [all the latest versions]

I'm trying to figure out if there is any way I can incorporate sa-learn to learn ham based on what my people send through the box. This is a relay-only server, which from my reading kind of complicates things. 

My end goal, if possible, is to have sa-learn train itself on ham whenever I send mail outbound.


Is this possible? If so, can someone help me with how it's done or point me to documentation?





Re: Training SA with postfix

2004-12-31 Thread Matt Kettler
At 09:10 AM 12/31/2004 -0500, Jason Gauthier wrote:
I'm trying to figure out if there is any way I can incorporate sa-learn to 
learn ham based on what my people send through the box.   This is a relay 
only server, which from my reading, kind of complicates things.

My end goal, if possible, is to have sa-learn train itself on ham whenever 
I send mail outbound.

Is this possible?  If so, can someone help me with how it's done or point 
me to documentation?

One possible way of approximating this is to take advantage of the 
autolearner...

Write yourself a negative scoring rule that looks at the Received: headers 
for signs of relay from the inside. For added security against forgery you 
could use a meta rule and also check other header fields (message ID, from, 
etc).

With a decently hefty negative scoring rule firing, the autolearner should 
try to learn most of the messages as ham.
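
To make the effect of such a rule concrete, here is a simplified Python sketch of the threshold comparison the autolearner applies. The values 0.1 and 12.0 are assumed to be the stock defaults of bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam; the function name and the exact boundary handling are assumptions, and the real autolearner also leaves some rules out of the score and applies extra conditions before learning spam.

```python
from typing import Optional

# Assumed defaults for bayes_auto_learn_threshold_nonspam / _spam.
HAM_THRESHOLD = 0.1
SPAM_THRESHOLD = 12.0


def autolearn_decision(score: float,
                       ham_threshold: float = HAM_THRESHOLD,
                       spam_threshold: float = SPAM_THRESHOLD) -> Optional[str]:
    """Return 'ham', 'spam', or None (do not learn) for an autolearn score.

    Simplified model: only the two thresholds are considered.
    """
    if score < ham_threshold:
        return "ham"
    if score >= spam_threshold:
        return "spam"
    return None


# A message that would otherwise score 0.5 is not learned at all, but with
# a -1.0 "outbound mail" rule firing it drops below the ham threshold.
print(autolearn_decision(0.5))         # None
print(autolearn_decision(0.5 - 1.0))   # ham
```

This is why the negative score needs to be reasonably large: it has to pull typical outbound mail below the nonspam threshold, not just nudge it.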



RE: Training SA with postfix

2004-12-31 Thread Jason Gauthier
Thanks for the tip.  Due to my newbie-ness with these products I'm a
little uncertain where to start.  Amavis seems to build many rules, and
interface with SA where it actually has options in it.

Would I build this rule within amavis or SA?

And of course, could you (or someone) point me to some documentation or
example?
I'm not sure where to even begin.

Thanks,

Jason



RE: Training SA with postfix

2004-12-31 Thread Matt Kettler
At 02:45 PM 12/31/2004, Jason Gauthier wrote:
Thanks for the tip.  Due to my newbie-ness with these products I'm a
little uncertain where to start.  Amavis seems to build many rules, and
interface with SA where it actually has options in it.
Would I build this rule within amavis or SA?

I'd do the rule as a SA rule, since it's SA's autolearner you want to affect.

And of course, could you (or someone) point me to some documentation or
example?

http://wiki.apache.org/spamassassin/WritingRules

So for this header:
Received: from mattk-801-567.evi-inc.com (mattk-801-567.evitechnology.com 
[10.0.6.249])
by xanadu.evi-inc.com (8.12.8/8.12.8) with ESMTP id iBV0gIZP031926

Assuming my internal machines are 10.0.6.0/24, and all reverse-resolve to 
evitechnology.com names, I might write:

header L_OUTBOUND_MAIL  Received =~ /from .{1,60}\.evitechnology\.com \[10\.0\.6\.\d{1,3}\]\).{0,10}by xanadu\.evi\-inc\.com .{1,50} with ESMTP id/s
score L_OUTBOUND_MAIL   -1.0

Other, less specific variants:
header L_OUTBOUND_MAIL0 Received =~ /from .{1,60}\.evitechnology\.com \[10\.0\.6\.\d{1,3}\]\).{0,10}by xanadu\.evi\-inc\.com/s
score L_OUTBOUND_MAIL0  -1.0

Caution: these last two are easily forged:
header L_OUTBOUND_MAIL2 Received =~ /from .{1,60}\.evitechnology\.com \[10\.0\.6\.\d{1,3}\]\)/
score L_OUTBOUND_MAIL2  -1.0

header L_OUTBOUND_MAIL3 Received =~ /from .{1,60}\.evitechnology\.com/
score L_OUTBOUND_MAIL3  -1.0
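
Before loading patterns like these into SA, it can help to try them against sample headers. The following is a minimal Python sketch that does so for the most specific and the least specific pattern. Python's re is not Perl's regex engine, so treat this only as a sanity check; the forged header, its addresses and the variable names are made up for illustration.

```python
import re

# The sample Received header from this message, folded as on the wire.
SAMPLE = ("from mattk-801-567.evi-inc.com (mattk-801-567.evitechnology.com "
          "[10.0.6.249])\n\tby xanadu.evi-inc.com (8.12.8/8.12.8) "
          "with ESMTP id iBV0gIZP031926")

# A made-up header an outside sender could produce by lying about its name.
FORGED = ("from mail.evitechnology.com (unknown [203.0.113.5]) "
          "by mx.example.net with SMTP id abc123")

# Most specific pattern (L_OUTBOUND_MAIL); re.S plays the role of the /s flag.
STRICT = re.compile(
    r"from .{1,60}\.evitechnology\.com \[10\.0\.6\.\d{1,3}\]\)"
    r".{0,10}by xanadu\.evi-inc\.com .{1,50} with ESMTP id", re.S)

# Least specific pattern (L_OUTBOUND_MAIL3).
LOOSE = re.compile(r"from .{1,60}\.evitechnology\.com")

for name, header in (("sample", SAMPLE), ("forged", FORGED)):
    print(name,
          "strict:", bool(STRICT.search(header)),
          "loose:", bool(LOOSE.search(header)))
```

The strict pattern accepts only the genuine header, while the loose one also accepts the forged header, which is the forgery risk the caution above refers to.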


RE: Training SA with postfix

2004-12-31 Thread Jason Gauthier
Great!

Using your example and the website I'm able to understand this much
better.
My idea is to start small and make sure it works.

So I simply added this:

header L_FROM Received =~ /server24/
score L_FROM -1.0

If the received line contains server24 then score it as -1.0.  I know
this is easy to fib, but like I said, it's just for testing :)

I go ahead and look at the headers and see the following:
Microsoft Mail Internet Headers Version 2.0
 
Received: from server24.ctg.com (unknown [192.168.50.11])
by spamfilter.lastar.com (Postfix) with ESMTP id 9EACAEFCC1
for [EMAIL PROTECTED]; Fri, 31 Dec 2004 16:09:23 -0500
(EST)

The originating server is server24, then it hits spamfilter.
As you can see server24 is contained in that string.

But looking below, I see spam_scan is scored as 0.28.

Dec 31 16:09:24 spamfilter amavis[8276]: (08276-02) spam_scan: hits=0.28
tests=ALL_TRUSTED,AWL,HTML_90_100,HTML_MESSAGE,HTML_SHORT_COMMENT 

I looked at the headers and I don't see the X-Spam-* headers at all, (I
set it to -999), so I'm not sure why amavisd-new didn't add the headers.




Re: Training SA with postfix

2004-12-31 Thread Sam Nilsson
Jason Gauthier wrote:
Thanks for the tip.  Due to my newbie-ness with these products I'm a
little uncertain where to start.  Amavis seems to build many rules, and
interface with SA where it actually has options in it.

Read the docs at the amavisd-new site here:
  -- http://www.ijs.si/software/amavisd/

Amavis runs SA, but does not allow SA to rewrite the message. Amavis 
does the rewriting, quarantining, and ultimate scoring.

SA still looks to its own config file (typically named local.cf) to run 
and score all of its tests, it just doesn't get to rewrite the original 
message.

More info here:
  -- http://www.ijs.si/software/amavisd/

Would I build this rule within amavis or SA?

All SA rules go in SA config (ok, this may be too absolute, I just can't 
think of any exceptions at the moment ;-).

There are many ways to train this anti-spam software stack 
(amavis/sa/razor/pyzor/bayes/etc.). Amavisd can soft-blacklist, 
blacklist, and soft-whitelist based on *envelope senders*, while SA's 
black and whitelists work on message headers. SA also has the trainable 
bayes engine. It all depends on what kind of features, performance, 
flexibility, accuracy, etc. etc. etc. that you need.

- Sam Nilsson


Re: Training SA with postfix

2004-12-31 Thread Sam Nilsson
Sam Nilsson wrote:
SA still looks to its own config file (typically named local.cf) to run 
and score all of its tests, it just doesn't get to rewrite the original 
message.

More info here:
  -- http://www.ijs.si/software/amavisd/

Sorry! More info here:
  -- http://www.ijs.si/software/amavisd/#faq-spam
- Sam