Training SA

2009-06-08 Thread snowweb

Hi, I'm new to SA. I run an Exim/Dovecot CentOS 5.0 mailserver (VPS), on
which I have recently installed SA.

I have configured 'Autolearn = yes' but I have no way to know whether this
is working. Please can someone explain to me how this works, since my
understanding of this is as follows, and makes no sense!

SpamAssassin identifies a mail as spam and stores the details of it so that
it is easier to identify future emails which are similar. However, I fail to
understand how this will help, since it's already successfully identifying
those emails?

Furthermore, how can I train SpamAssassin to recognize as spam the emails it
currently gives only a very low score? I'm getting many emails each day about
"Acai Berries", but they are only getting a score of around 3.3! How can I
train it to recognize these, server-wide?

Thanks.

pete
-- 
View this message in context: 
http://www.nabble.com/Training-SA-tp23921166p23921166.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



training SA

2007-06-26 Thread zigniew szalbot
Hello,

I tried to train SA using the following syntax:

sa-learn --spam -f /usr/home/zbyszek/june.txt
archive-iterator: unable to open  Dear Valued Customer,: No such file
or directory

june.txt is a spam email message downloaded from SquirrelMail for the
purpose of feeding it to SA. I only got "unable to open" messages, and at
the end:
Learned tokens from 0 message(s) (0 message(s) examined)

I guess I made a mistake with the syntax, but how should I change it so
that I can train SA?

Thank you in advance!

Zbigniew Szalbot




Re: training SA

2007-06-26 Thread zigniew szalbot
Hi,

> I tried to learn SA and used the following syntax:
>
> sa-learn --spam -f /usr/home/zbyszek/june.txt
> I guess I made a mistake with the syntax but how should I change it so
> that I can train SA?

I already found out:
sa-learn --spam --no-sync /usr/home/zbyszek/june.txt

Sorry to have bothered!

Warm regards,

Zbigniew Szalbot



Re: training SA

2007-06-26 Thread Nigel Frankcom
On Wed, 27 Jun 2007 07:35:01 +0200 (CEST), "zigniew szalbot"
<[EMAIL PROTECTED]> wrote:

>Hello,
>
>I tried to learn SA and used the following syntax:
>
>sa-learn --spam -f /usr/home/zbyszek/june.txt
>archive-iterator: unable to open  Dear Valued Customer,: No such file
>or directory
>
>june.txt is a spam email message downloaded from squirrelmail for the
>purpose of feeding to SA. I only got "unable to open message". And at the
>end:
>Learned tokens from 0 message(s) (0 message(s) examined)
>
>I guess I made a mistake with the syntax but how should I change it so
>that I can train SA?
>


Hi,

Have you double-checked the path for typos?

Also, you may well need the -u switch. I use:

sa-learn --spam -u sauser /downloads/spam && mv -f /downloads/spam/*.Mail /downloads/spam/fn

The last bit, && mv -f /downloads/spam/*.Mail /downloads/spam/fn, just
moves the files into a directory so I can track what's been trained, and
is probably surplus to your requirements.

I have mine as a script so I just call ./ham or ./spam as required.

HTH

Nigel
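Nigel's ./spam and ./ham helpers could look roughly like the sketch below. It is a hypothetical Python version (the function name sa_learn_argv and its parameters are made up for illustration, not Nigel's actual script); it only assembles the sa-learn command line from options that appear in this thread (--spam/--ham, -u, --no-sync) and leaves running it to you.

```python
import shlex


def sa_learn_argv(kind, user, target, no_sync=False):
    """Build a hypothetical sa-learn command line for learning `target`."""
    if kind not in ("spam", "ham"):
        raise ValueError("kind must be 'spam' or 'ham'")
    argv = ["sa-learn", f"--{kind}", "-u", user]
    if no_sync:
        argv.append("--no-sync")  # remember to run 'sa-learn --sync' later
    argv.append(target)  # a message file or a directory of messages, not a -f list
    return argv


print(shlex.join(sa_learn_argv("spam", "sauser", "/downloads/spam")))
```

To actually run it you would pass the list to subprocess.run(argv, check=True) and move the files only after it returns, which mirrors the && in Nigel's shell line.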


RE: training SA

2007-06-27 Thread Bowie Bailey
zigniew szalbot wrote:
> Hi,
> 
> > I tried to learn SA and used the following syntax:
> > 
> > sa-learn --spam -f /usr/home/zbyszek/june.txt
> > I guess I made a mistake with the syntax but how should I change it
> > so that I can train SA?
> 
> I already found out:
> sa-learn --spam --no-sync /usr/home/zbyszek/june.txt

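The important bit is that you leave off the '-f', since that option tells
sa-learn that the file you specify contains a list of files or directories
to learn from, not a message to learn. That is why it tried to open a line
of the message, "Dear Valued Customer,", as a path.

The '--no-sync' option can be useful, but remember that if you always learn
that way, you need to run 'sa-learn --sync' from time to time.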

-- 
Bowie
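The failure is easier to see with a small simulation. The Python sketch below is hypothetical and is not sa-learn's code: it treats every line of a message file as a path to open, which is what a "read the list of files/directories from this file" option like -f does, and so ends up examining zero messages.

```python
import os
import tempfile


def paths_from_list_file(list_file):
    """Return each non-blank line of list_file, stripped, as a path."""
    with open(list_file, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


def try_open_all(paths):
    """Split paths into existing ones and error messages for the rest."""
    opened, errors = [], []
    for path in paths:
        if os.path.exists(path):
            opened.append(path)
        else:
            errors.append(f"unable to open {path}: No such file or directory")
    return opened, errors


# A spam message saved as a text file, standing in for june.txt.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write("Dear Valued Customer,\nYou have won!\n")
    message_file = tmp.name

opened, errors = try_open_all(paths_from_list_file(message_file))
os.remove(message_file)
print(len(opened), "message(s) examined")
for error in errors:
    print(error)
```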


About Training ( sa-learn )

2010-03-04 Thread Henrique Fernandes
I have set up SpamAssassin to train individual databases (MySQL) with
this filter in Postfix:

spamassassin unix -     n       n       -       -       pipe
  flags=Rq user=spamassassin argv=/usr/bin/spamc -u ${recipient} -f -e
  /usr/sbin/sendmail -oi -f ${sender} -- ${recipient}

With this filter, autolearning goes into each individual user's database,
and it does learn!

But if I send the same email that was autolearned, it does not get a higher
score. Is that how it should be, or should it get higher?

And how do I know if the training is working?

thanks!


[]'sf.rique


Training SA with postfix

2004-12-31 Thread Jason Gauthier

Hey all,

I've just spent a good amount of time installing Postfix, amavisd-new, ClamAV and SA (with DCC, Razor, Pyzor) [all the "latest" versions].

I'm trying to figure out if there is any way I can incorporate sa-learn to learn ham based on what my people send through the box. This is a relay-only server, which, from my reading, kind of complicates things.

My end goal, if possible, is to have sa-learn train itself on ham whenever I send mail "outbound".

Is this possible? If so, can someone help me with how it's done or point me to documentation?


Training sa-learn from Outlook.

2006-09-20 Thread Andrew van Tilburg

I imagine the following questions have been asked a lot,
but I haven’t seen the exact answers I’m after yet, so here goes.

We are running qmail, vpopmail, SpamAssassin, and SMB shares
using Samba, among other things, on FreeBSD. I want to set up public ham and
spam folders so that our users can drag emails into them from Outlook. I can
then set up a cron job that runs sa-learn on those folders and deletes
the mail.

Can I just create two public Samba shares, use those
for the emails, and run sa-learn on them? I guess not, because
by this stage the emails have been wrecked by Outlook. How else can I do this?

Also, I don’t understand exactly the implications of
which user you run sa-learn as. How do I set this up
when running sa-learn? I suppose if I run it as the
same user as vpopmail then this will work?

Apologies if these questions have already been covered in
this mailing list or elsewhere.

Andrew.


Re: About Training ( sa-learn )

2010-03-04 Thread Kai Schaetzl
Henrique Fernandes wrote on Thu, 4 Mar 2010 11:45:38 -0300:

> But if i send the same email that was autolearned it does not get an higher
> score..  it should be lik eit or it shoul get higher ?

If I understand you correctly, you want to learn a message twice. sa-learn
won't do this, and the docs say so.

Kai

-- 
Get your web at Conactive Internet Services: http://www.conactive.com
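Kai's point can be illustrated with a toy model. This Python sketch is not SpamAssassin's code: it assumes a learner that records an ID for every message it has learned, much as SA keeps message IDs in its Bayes "seen" database, and it uses the Message-ID string directly for simplicity (SA derives its own ID). The class name ToyLearner is made up.

```python
from collections import Counter


class ToyLearner:
    def __init__(self):
        self.seen = {}                # message id -> "spam" or "ham"
        self.spam_tokens = Counter()  # token -> number of spam messages containing it
        self.nspam = 0

    def learn_spam(self, msg_id, text):
        """Learn text as spam; return False if already learned as spam."""
        if self.seen.get(msg_id) == "spam":
            return False  # already counted, so nothing changes
        # Real sa-learn also forgets a message previously learned as ham
        # before relearning it as spam; this toy does not track ham at all.
        self.seen[msg_id] = "spam"
        self.nspam += 1
        self.spam_tokens.update(set(text.lower().split()))
        return True


learner = ToyLearner()
print(learner.learn_spam("<abc@example.com>", "Buy acai berries now"))  # True
print(learner.learn_spam("<abc@example.com>", "Buy acai berries now"))  # False
print(learner.nspam, learner.spam_tokens["acai"])                      # 1 1
```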





Re: About Training ( sa-learn )

2010-03-04 Thread Henrique Fernandes
Nope, what I want is that after I train it, the same email should get a higher
score, because SpamAssassin was trained that it is spam, so when it comes
again, it should look in the database and add some extra points to the score, right?



[]'sf.rique




Re: About Training ( sa-learn )

2010-03-04 Thread Bowie Bailey
Henrique Fernandes wrote:
> Nope, what I want is that after I train it, the same email should get a
> higher score, because SpamAssassin was trained that it is spam, so when it
> comes again, it should look in the database and add some extra points
> to the score, right?

That is a fairly common misconception.  When you learn an email as spam,
the Bayes system breaks it into tokens (words/character strings) and
then makes a note that each of those tokens was seen in a spam.  When an
email comes in, it breaks up the new email into tokens and then checks
to see how frequently each of those tokens was previously seen in spam
or ham.  Based on what it finds, it ranks the email from BAYES_00 (very
unlikely to be spam) to BAYES_99 (almost certainly spam).

Since learning from a single email only adds one data point to each
token, it is unlikely to make a major difference on its own.  The value
comes in learning from lots of spam and ham.  This is why the Bayes
rules will not run until you have learned from at least 200 ham and 200
spam.

-- 
Bowie
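Bowie's description can be made concrete with a toy model. The Python sketch below is a deliberate simplification and is NOT SpamAssassin's Bayes code (SA uses Robinson-style token probabilities combined with a chi-squared test); the tokenizer, the strength/prior smoothing and the class name ToyBayes are assumptions chosen only to show that one learned message is one data point per token, and that scoring waits for 200 spam and 200 ham.

```python
import re
from collections import Counter

MIN_SPAM = 200  # "at least 200 spam" before Bayes scores anything
MIN_HAM = 200   # "at least 200 ham"


def tokens(text):
    """Crude tokenizer: the set of lower-cased words in text."""
    return set(re.findall(r"[a-z0-9$]+", text.lower()))


class ToyBayes:
    def __init__(self):
        self.spam = Counter()  # token -> number of spam messages containing it
        self.ham = Counter()   # token -> number of ham messages containing it
        self.nspam = 0
        self.nham = 0

    def learn(self, text, is_spam):
        if is_spam:
            self.spam.update(tokens(text))
            self.nspam += 1
        else:
            self.ham.update(tokens(text))
            self.nham += 1

    def token_spamminess(self, tok, strength=1.0, prior=0.5):
        """Near 0 = mostly ham, near 1 = mostly spam, 0.5 = no evidence."""
        spam_rate = self.spam[tok] / max(self.nspam, 1)
        ham_rate = self.ham[tok] / max(self.nham, 1)
        if spam_rate + ham_rate == 0:
            return prior
        raw = spam_rate / (spam_rate + ham_rate)
        seen = self.spam[tok] + self.ham[tok]
        # Rarely seen tokens are pulled toward the neutral prior.
        return (strength * prior + seen * raw) / (strength + seen)

    def ready(self):
        """Bayes only starts scoring once enough spam and ham are learned."""
        return self.nspam >= MIN_SPAM and self.nham >= MIN_HAM


bayes = ToyBayes()
bayes.learn("Cheap acai berries, order now", is_spam=True)
print(round(bayes.token_spamminess("acai"), 2))  # 0.75 after one message
for _ in range(49):
    bayes.learn("acai berries for weight loss", is_spam=True)
print(round(bayes.token_spamminess("acai"), 2))  # 0.99 after fifty
print(bayes.ready())                              # False: not 200/200 yet
```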


Re: About Training ( sa-learn )

2010-03-04 Thread Bowie Bailey
(Please send replies to the list)

Henrique Fernandes wrote:
>
> hmm
>
> Thanks. So each individual user has to have learned lots of emails, and
> only after that will it start to make a difference to the score?

Yes. Each individual user will need to learn at least 200 ham and 200
spam (manually or via auto-learn) before Bayes will start scoring.  The
more they learn, the better the accuracy.

> So is it better to train just one database for all users instead of one
> database for each user?
>
> With just one database I am afraid of getting too many false positives,
> because sometimes Viagra is not spam for someone who researches it, but
> if it is in the same database, it will be marked as spam...

Depends on your users.  Unless they are wildly different, a single
database should work fairly well.  Individual databases can be more
accurate in some instances, but a single well-trained database will
probably work better than a bunch of individual databases that are not
trained consistently.

-- 
Bowie


Re: About Training ( sa-learn )

2010-03-04 Thread Henrique Fernandes
Thanks!

I will discuss it here and find out which one is better.

What is the weight of the Bayes score once the databases are well trained?
Any ideas about it?

[]'sf.rique




Re: About Training ( sa-learn )

2010-03-04 Thread Bowie Bailey
Henrique Fernandes wrote:
> Thanks!
>
> I will discuss it here and find out which one is better.
>
> What is the weight of the Bayes score once the databases are well trained?
> Any ideas about it?

I'm not sure what you are asking.  What do you mean by "weight"?

The default scores (as of 3.2.5) are:

BAYES_00   -2.599
BAYES_05   -1.110
BAYES_20   -0.740
BAYES_40   -0.185
BAYES_50    0.001
BAYES_60    1.0
BAYES_80    2.0
BAYES_95    3.0
BAYES_99    3.5

Take a look at
/var/lib/spamassassin//updates_spamassassin_org/50_scores.cf to
see the scores on your system.

-- 
Bowie
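As a sketch of how these scores are used: whichever BAYES_xx rule fires simply adds its score to the scores of all the other rules that hit. The Python helper below is hypothetical; the bucket edges are an assumption based on the stock rule names (BAYES_99 covers 0.99 to 1.00, BAYES_95 covers 0.95 to 0.99, and so on), so check your own rule files, and note that a score line in local.cf overrides these defaults.

```python
import bisect

# (upper edge of the probability bucket, rule name, 3.2.5 default score)
BUCKETS = [
    (0.01, "BAYES_00", -2.599),
    (0.05, "BAYES_05", -1.110),
    (0.20, "BAYES_20", -0.740),
    (0.40, "BAYES_40", -0.185),
    (0.60, "BAYES_50", 0.001),
    (0.80, "BAYES_60", 1.0),
    (0.95, "BAYES_80", 2.0),
    (0.99, "BAYES_95", 3.0),
    (1.00, "BAYES_99", 3.5),
]
UPPER_EDGES = [edge for edge, _, _ in BUCKETS]


def bayes_rule(probability):
    """Return (rule_name, default_score), treating buckets as [lower, upper)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    index = bisect.bisect_right(UPPER_EDGES, probability)
    index = min(index, len(BUCKETS) - 1)  # probability 1.0 belongs to BAYES_99
    _, name, score = BUCKETS[index]
    return name, score


other_rules = 3.3  # e.g. the "Acai Berries" mail from the first thread
name, score = bayes_rule(0.999)
print(name, round(other_rules + score, 2))  # BAYES_99 6.8
```

With the default required_score of 5.0, a message like that 3.3-point mail would cross the spam threshold if Bayes put it in BAYES_99, assuming nothing else about its score changed.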


Re: About Training ( sa-learn )

2010-03-04 Thread Henrique Fernandes
That is what I asked; sorry, I am not fluent in English.

It is the score that Bayes adds to the final score, right?


[]'sf.rique




Re: About Training ( sa-learn )

2010-03-04 Thread Bowie Bailey
Right.

Henrique Fernandes wrote:
> That is what I asked; sorry, I am not fluent in English.
>
> It is the score that Bayes adds to the final score, right?


Re: About Training ( sa-learn )

2010-03-04 Thread LuKreme
On 4-Mar-2010, at 07:45, Henrique Fernandes wrote:
> 
> I have set up SpamAssassin to train individual databases (MySQL) with
> this filter in Postfix:
>
> spamassassin unix -     n       n       -       -       pipe
>   flags=Rq user=spamassassin argv=/usr/bin/spamc -u ${recipient} -f -e
>   /usr/sbin/sendmail -oi -f ${sender} -- ${recipient}

Wait, what exactly is this doing?


-- 
Windle shook his head sadly. Five exclamation marks, the sure sign of an insane 
mind. --Reaper Man



Re: About Training ( sa-learn )

2010-03-04 Thread Henrique Fernandes
Every email that comes into Postfix is sent to that filter, and the filter
sends the email on. When I use the option -u ${recipient}, it overrides the
user that is running and does the processing as the user who is receiving
the email, so when it autolearns, it goes to a different user in the table.
So I have a different database for each user.

After going through the spamc filter, the email is passed on again.

Good enough?

[]'sf.rique




Re: Training SA with postfix

2004-12-31 Thread Matt Kettler
At 09:10 AM 12/31/2004 -0500, Jason Gauthier wrote:
>I'm trying to figure out if there is any way I can incorporate sa-learn to
>learn ham based on what my people send through the box.   This is a relay
>only server, which from my reading, kind of complicates things.
>
>My end goal, if possible, is to have sa-learn train itself on ham whenever
>I send mail "outbound".
>
>Is this possible?  If so, can someone help me with how it's done or point
>me to documentation?

One possible way of approximating this is to take some advantage of the 
autolearner...

Write yourself a negative scoring rule that looks at the Received: headers 
for signs of relay from the inside. For added security against forgery you 
could use a meta rule and also check other header fields (message ID, from, 
etc).

With a decently hefty negative scoring rule firing, the autolearner should 
try to learn most of the messages as ham.
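Why a negative rule helps can be shown with a simplified model of the autolearn decision. This Python sketch is not SpamAssassin's code; the two thresholds are the documented defaults of bayes_auto_learn_threshold_nonspam (0.1) and bayes_auto_learn_threshold_spam (12.0). As an assumption, it only models one detail of the real autolearner (it ignores the BAYES_* rules when adding up the score); the real one ignores some other rules too and also requires a minimum number of header and body points before learning spam.

```python
HAM_THRESHOLD = 0.1    # default bayes_auto_learn_threshold_nonspam
SPAM_THRESHOLD = 12.0  # default bayes_auto_learn_threshold_spam


def autolearn_decision(rule_scores):
    """Return "ham", "spam" or None for a {rule_name: score} mapping."""
    learn_score = sum(score for name, score in rule_scores.items()
                      if not name.startswith("BAYES_"))
    if learn_score < HAM_THRESHOLD:
        return "ham"
    if learn_score >= SPAM_THRESHOLD:
        return "spam"  # simplification: real SA checks header/body points too
    return None


# SOME_RULES is a hypothetical stand-in for everything else that fired.
print(autolearn_decision({"SOME_RULES": 0.28}))                            # None
print(autolearn_decision({"SOME_RULES": 0.28, "L_OUTBOUND_MAIL": -1.0}))   # ham
```

The 0.28 figure echoes the hits value Jason reports later in the thread, and L_OUTBOUND_MAIL is the name of the -1.0 example rule Matt gives further down; neither pairing comes from a real scan.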



RE: Training SA with postfix

2004-12-31 Thread Jason Gauthier
Thanks for the tip. Due to my "newbie-ness" with these products I'm a
little uncertain where to start. Amavis seems to build many rules, and it
interfaces with SA and actually has SA options of its own.

Would I build this rule within amavis or SA?

And of course, could you (or someone) point me to some documentation or
an example? I'm not sure where to even begin.

Thanks,

Jason



RE: Training SA with postfix

2004-12-31 Thread Matt Kettler
At 02:45 PM 12/31/2004, Jason Gauthier wrote:
>Thanks for the tip.  Due to my "newbie-ness" with these products I'm a
>little uncertain where to start.  Amavis seems to build many rules, and
>interface with SA where it actually has options in it.
>Would I build this rule within amavis or SA?

I'd do the rule as an SA rule, since it's SA's autolearner you want to affect.

>And of course, could you (or someone) point me to some documentation or
>example?

http://wiki.apache.org/spamassassin/WritingRules

So for this header:

Received: from mattk-801-567.evi-inc.com (mattk-801-567.evitechnology.com [10.0.6.249])
	by xanadu.evi-inc.com (8.12.8/8.12.8) with ESMTP id iBV0gIZP031926

Assuming my "internal machines" are 10.0.6.0/24, and all RDNS to
evitechnology.com names, I might write:

header L_OUTBOUND_MAIL  Received =~ /from .{1,60}\.evitechnology\.com \[10\.0\.6\.\d{1,3}\]\).{0,10}by xanadu\.evi\-inc\.com .{1,50} with ESMTP id/s
score L_OUTBOUND_MAIL   -1.0

Other, less specific variants:

header L_OUTBOUND_MAIL0 Received =~ /from .{1,60}\.evitechnology\.com \[10\.0\.6\.\d{1,3}\]\).{0,10}by xanadu\.evi\-inc\.com/s
score L_OUTBOUND_MAIL0  -1.0

Caution: these last two are easily forged:

header L_OUTBOUND_MAIL2 Received =~ /from .{1,60}\.evitechnology\.com \[10\.0\.6\.\d{1,3}\]\)/
score L_OUTBOUND_MAIL2  -1.0

header L_OUTBOUND_MAIL3 Received =~ /from .{1,60}\.evitechnology\.com/
score L_OUTBOUND_MAIL3  -1.0
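Before putting rules like these in local.cf it helps to test the regexes against sample headers. The Python sketch below assumes Python's re is close enough to Perl for these patterns (re.DOTALL plays the role of Perl's /s so ".{0,10}" can cross the folded line) and escapes the dot in evitechnology\.com so it matches only a literal dot. The "external" and "forged" headers are made-up examples; the forged one shows why the loose variant is easy to trigger from outside.

```python
import re

STRICT = re.compile(
    r"from .{1,60}\.evitechnology\.com \[10\.0\.6\.\d{1,3}\]\).{0,10}"
    r"by xanadu\.evi\-inc\.com .{1,50} with ESMTP id",
    re.DOTALL,
)
LOOSE = re.compile(r"from .{1,60}\.evitechnology\.com")  # like L_OUTBOUND_MAIL3

HEADERS = {
    "internal": "from mattk-801-567.evi-inc.com (mattk-801-567.evitechnology.com "
                "[10.0.6.249])\n\tby xanadu.evi-inc.com (8.12.8/8.12.8) with ESMTP "
                "id iBV0gIZP031926",
    "external": "from mail.example.net (mail.example.net [203.0.113.7])\n\t"
                "by xanadu.evi-inc.com (8.12.8/8.12.8) with ESMTP id ABC123",
    "forged": "from x.evitechnology.com (dsl.example.net [198.51.100.9])\n\t"
              "by xanadu.evi-inc.com (8.12.8/8.12.8) with ESMTP id DEF456",
}

for label, value in HEADERS.items():
    print(f"{label:9} strict={bool(STRICT.search(value))} "
          f"loose={bool(LOOSE.search(value))}")
```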


RE: Training SA with postfix

2004-12-31 Thread Jason Gauthier
Great!

Using your example and the website I'm able to understand this much
better.
My idea is to start small and make sure it works.

So I simply added this:

header L_FROM Received =~ /server24/
score L_FROM -1.0

If the received line contains server24 then score it as -1.0.  I know
this is easy to fib, but like I said, it's just for testing :)

I went ahead and looked at the headers and saw the following:

Microsoft Mail Internet Headers Version 2.0
Received: from server24.ctg.com (unknown [192.168.50.11])
	by spamfilter.lastar.com (Postfix) with ESMTP id 9EACAEFCC1
	for <[EMAIL PROTECTED]>; Fri, 31 Dec 2004 16:09:23 -0500 (EST)

The originating server is server24, then it hits "spamfilter".
As you can see "server24" is contained in that string.

But looking below, I see spam_scan is scored as 0.28.

Dec 31 16:09:24 spamfilter amavis[8276]: (08276-02) spam_scan: hits=0.28
tests=ALL_TRUSTED,AWL,HTML_90_100,HTML_MESSAGE,HTML_SHORT_COMMENT 

I looked at the headers and I don't see the X-Spam-* headers at all (I
set it to -999), so I'm not sure why amavisd-new didn't add them.




Re: Training SA with postfix

2004-12-31 Thread Sam Nilsson
Jason Gauthier wrote:
>Thanks for the tip.  Due to my "newbie-ness" with these products I'm a
>little uncertain where to start.  Amavis seems to build many rules, and
>interface with SA where it actually has options in it.

Read the docs at the amavisd-new site here:
  -- http://www.ijs.si/software/amavisd/

Amavis runs SA, but does not allow SA to rewrite the message. Amavis
does the rewriting, quarantining, and ultimate scoring.

SA still looks to its own config file (typically named local.cf) to run
and score all of its tests; it just doesn't get to rewrite the original
message.

More info here:
  -- http://www.ijs.si/software/amavisd/

>Would I build this rule within amavis or SA?

All SA rules go in SA config (ok, this may be too absolute, I just can't
think of any exceptions at the moment ;-).

There are many ways to train this anti-spam software stack 
(amavis/sa/razor/pyzor/bayes/etc.). Amavisd can soft-blacklist, 
blacklist, and soft-whitelist based on *envelope senders*, while SA's 
black and whitelists work on message headers. SA also has the trainable 
bayes engine. It all depends on what kind of features, performance, 
flexibility, accuracy, etc. etc. etc. that you need.

- Sam Nilsson


Re: Training SA with postfix

2004-12-31 Thread Sam Nilsson
Sam Nilsson wrote:
>SA still looks to its own config file (typically named local.cf) to run
>and score all of its tests, it just doesn't get to rewrite the original
>message.
>
>More info here:
>  -- http://www.ijs.si/software/amavisd/

Sorry! More info here:
  -- http://www.ijs.si/software/amavisd/#faq-spam

- Sam


RE: Training SA with postfix

2004-12-31 Thread Jason Gauthier
Yup, I'm well aware that amavisd-new doesn't allow SA to change the
header.  I am mostly looking to detect certain info in the header and
score from it. 



Re: Training sa-learn from Outlook.

2006-09-20 Thread Loren Wilton

This sort of question gets asked a lot and there are various answers.

The most common solution is to set up some public folders that are really
IMAP folders, probably on your main mail machine, though that doesn't really
matter much. Then, as you suggest, run a cron job to pull the mail from them
and do the learning.

If you look in the wiki, I believe there is a page or two devoted to this
sort of thing with Outlook or OE.

Do you have individual Bayes databases or a site-wide one? If you have
individual Bayes databases, then they would most likely each be under the
user ID of the individual owner. In that case, global spam and ham folders
won't work all that well, since you would have to learn the whole mess many
times, once into each Bayes database. It would make more sense to have
per-user ham and spam folders, which could still use the IMAP solution.

I assume that you have a global Bayes database. In that case you should
run sa-learn as whichever user SA runs as when it accesses that database.

        Loren



Training SA with Thunderbird Junk folder

2006-03-22 Thread Edward Diener
Does anybody know the instructions for training SA with the contents of 
the Thunderbird Junk folder ?


My web host, where SA is running, suggests I do this in order to reduce
the amount of spam I get. I can log in to my web host, transfer files
from my local machine to it, and run SA commands.




Re: Training SA with Thunderbird Junk folder

2006-03-22 Thread JamesDR

Edward Diener wrote:
>Does anybody know the instructions for training SA with the contents of
>the Thunderbird Junk folder ?
>
>My web host, where SA is running, suggests I do this in order to reduce
>the amount of spam I get, and I can login to my web host, transfer files
>from my local machine to my web host, and run SA commands.

I have my users use the redirect plugin to send spams to an account on 
the server just for this purpose. The redirect plugin will add a few 
headers (and so will your mail server) that need to be cleaned out 
first. If you just train on the junk folder file, you'll have to remove 
all of the thunderbird related stuff first -- this was more work than 
redirecting to a mail box. I have a script (VBS) that runs on the mail 
server every night that takes the redirected mails, cleans the headers, 
and moves them over to the folder for the SA server to pick up from. A 
little later a bash script grabs the mail off the Windows mail server, runs
through the files, and learns them as spam (ham is learned manually in bulk,
but I could automate that as well).
This same mailbox is my spam trap, so any other mails that end up there 
are also trained as spam. With my user base, they are good enough to 
police themselves, and I have just about all of our customers and 
vendors whitelisted.


--
Thanks,
James


Re: Training SA with Thunderbird Junk folder

2006-03-22 Thread Craig Morrison

JamesDR wrote:

> Edward Diener wrote:
>> Does anybody know the instructions for training SA with the contents
>> of the Thunderbird Junk folder ?

Upload them as single messages to your ISP account. If you have a 
special folder in TB (Thunderbird) for the messages you want to train on 
you can find that folder file (in your TB user folder) and upload that. 
TB stores messages in mbox format which SA can parse.
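If you would rather upload individual messages than the whole folder file, a short script can split the mbox first. The Python sketch below uses the standard mailbox module; the file and directory names and the function name split_mbox are hypothetical. (sa-learn can also read an mbox file directly when given its --mbox option.)

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def split_mbox(mbox_path, out_dir):
    """Write each message in mbox_path to out_dir/NNNNN.eml; return the count."""
    os.makedirs(out_dir, exist_ok=True)
    count = 0
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for count, message in enumerate(box, start=1):
            with open(os.path.join(out_dir, f"{count:05d}.eml"), "wb") as fh:
                fh.write(message.as_bytes())  # no "From " separator line
    finally:
        box.close()
    return count


# Demo: a throwaway two-message mbox stands in for Thunderbird's "Junk" file.
demo_dir = tempfile.mkdtemp()
junk_path = os.path.join(demo_dir, "Junk")
junk = mailbox.mbox(junk_path)
for i in range(2):
    msg = EmailMessage()
    msg["Subject"] = f"Acai berries {i}"
    msg.set_content("Lose weight fast!")
    junk.add(msg)
junk.close()

written = split_mbox(junk_path, os.path.join(demo_dir, "spam"))
print(written, "message(s) written")
```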


> I have my users use the redirect plugin to send spams to an account on
> the server just for this purpose. The redirect plugin will add a few
> headers (and so will your mail server) that need to be cleaned out
> first. If you just train on the junk folder file, you'll have to remove
> all of the thunderbird related stuff first -- this was more work than

These will help for the TB headers:

bayes_ignore_header X-Account-Key
bayes_ignore_header X-UIDL
bayes_ignore_header X-Mozilla-Status
bayes_ignore_header X-Mozilla-Status2

Craig
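To put those lines in place without duplicating them on repeated runs, a small helper along these lines could be used. This is a sketch: `add_ignore_headers` is a hypothetical name, and the location of local.cf varies by installation (commonly /etc/mail/spamassassin/local.cf), so check yours.

```shell
#!/bin/sh
# add_ignore_headers CONF HEADER...
# Append "bayes_ignore_header HEADER" to CONF for each HEADER that is not
# already present as an exact line, so running it twice adds nothing new.
add_ignore_headers() {
  conf=$1
  shift
  for h in "$@"; do
    line="bayes_ignore_header $h"
    if ! grep -qxF "$line" "$conf" 2>/dev/null; then
      printf '%s\n' "$line" >> "$conf"
    fi
  done
}
```

For example: `add_ignore_headers /etc/mail/spamassassin/local.cf X-Account-Key X-UIDL X-Mozilla-Status X-Mozilla-Status2`, then run `spamassassin --lint` to catch configuration errors.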


Re: Training SA with Thunderbird Junk folder

2006-03-22 Thread Sander Holthaus
Craig Morrison wrote:
> JamesDR wrote:
>> Edward Diener wrote:
>>> Does anybody know the instructions for training SA with the
>>> contents of the Thunderbird Junk folder ?
>
> Upload them as single messages to your ISP account. If you have a
> special folder in TB (Thunderbird) for the messages you want to
> train on you can find that folder file (in your TB user folder) and
>  upload that. TB stores messages in mbox format which SA can parse.
>
>
>> I have my users use the redirect plugin to send spams to an
>> account on the server just for this purpose. The redirect plugin
>> will add a few headers (and so will your mail server) that need
>> to be cleaned out first. If you just train on the junk folder
>> file, you'll have to remove all of the thunderbird related stuff
>> first -- this was more work than
>
> These will help for the TB headers:
>
> bayes_ignore_header X-Account-Key
> bayes_ignore_header X-UIDL
> bayes_ignore_header X-Mozilla-Status
> bayes_ignore_header X-Mozilla-Status2
>
> Craig
>
Optionally, also ignore these:

X-WebMail
X-JunkFolder
X-Message-Status
X-SID-PRA
X-SID-Result
X-Message-Info

if you're using the webmail-extension and a few other extensions...
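Which X- headers are worth ignoring depends on which extensions you run. One way to find out, sketched here under the assumption that the Junk folder is a plain mbox file with LF line endings, is to count the X- header names that actually occur in its header sections. `list_x_headers` is a hypothetical helper.

```shell
#!/bin/sh
# list_x_headers MBOX
# Print "count name" for each X-* header found in the header sections of
# an mbox, most frequent first. Lines in message bodies are not counted.
list_x_headers() {
  awk '
    /^From / { in_hdr = 1; next }        # mbox separator starts a message
    in_hdr && /^$/ { in_hdr = 0; next }  # blank line ends the headers
    in_hdr && /^[Xx]-[A-Za-z0-9-]*:/ {
      name = $0; sub(/:.*/, "", name); print name
    }
  ' "$1" | sort | uniq -c | sort -rn
}
```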

If you look back through the mailing list archives, you should be able to find a
discussion on using IMAP folders to train SA. That might be helpful as well.

Kind Regards,
Sander Holthaus



Re: Training SA with Thunderbird Junk folder

2006-03-22 Thread martin
Craig Morrison  2cah.com> writes:

> 
> JamesDR wrote:
> > Edward Diener wrote:
> >> Does anybody know the instructions for training SA with the contents 
> >> of the Thunderbird Junk folder ?
> 
> Upload them as single messages to your ISP account. If you have a 
> special folder in TB (Thunderbird) for the messages you want to train on 
> you can find that folder file (in your TB user folder) and upload that. 
> TB stores messages in mbox format which SA can parse.
> 
> > I have my users use the redirect plugin to send spams to an account on 
> > the server just for this purpose. The redirect plugin will add a few 
> > headers (and so will your mail server) that need to be cleaned out 
> > first. If you just train on the junk folder file, you'll have to remove 
> > all of the thunderbird related stuff first -- this was more work than 
> 
> These will help for the TB headers:
> 
> bayes_ignore_header X-Account-Key
> bayes_ignore_header X-UIDL
> bayes_ignore_header X-Mozilla-Status
> bayes_ignore_header X-Mozilla-Status2
> 
> Craig
> 
> 

  Something like this?

  sa-learn --mbox --spam --showdots Thunderbird_Junk_folder

  Beware of DOS/Unix line endings in the uploaded Junk folder file: on
FreeBSD an ASCII-mode upload caused no problem, but on FC3 I needed to run
dos2unix to convert the folder file.
  Hope this helps.
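If dos2unix isn't installed on the host, a similar conversion can be sketched with the standard tr utility. Note that this version deletes every carriage return in the file, not only those at line ends; `to_unix` is a hypothetical helper name.

```shell
#!/bin/sh
# to_unix FILE
# Delete all carriage returns from FILE in place. Writing back with cat
# (rather than mv) keeps the file's original owner and permissions.
to_unix() {
  tmp=$(mktemp) || return 1
  tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1"
  rm -f "$tmp"
}
```

For example: `to_unix Thunderbird_Junk_folder && sa-learn --mbox --spam --showdots Thunderbird_Junk_folder`.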







Re: Training SA with Thunderbird Junk folder

2006-03-23 Thread mouss
Edward Diener wrote:
> Does anybody know the instructions for training SA with the contents of
> the Thunderbird Junk folder ?
> 
> My web host, where SA is running, suggests I do this in order to reduce
> the amount of spam I get, and I can login to my web host, transfer files
> from my local machine to my web host, and run SA commands.
> 

So the messages are accessible on your SA system? If so, then run
spamassassin or spamc with the right options.

What I would like to see is a plugin to "J" (mark as junk) a message...


Re: Training SA with Thunderbird Junk folder

2006-03-23 Thread Michael Parker
mouss wrote:
> 
> what I would like to see is a plugin to "J" a message...
> 



Me Too!



If anyone is a Thunderbird plugin wizard and interested in doing a
plugin that will report/learn to spamd speak up, I'm very interested.

Michael



Re: Training SA with Thunderbird Junk folder

2006-03-23 Thread Forrest Aldrich
Such a mechanism would still depend upon some organization on the server 
side... as far as I can tell, it's very much up to the local sysadmin (e.g. 
aliases to send to, forward or attach properly, etc).  


Would this even work well, potentially?

It might be interesting if there were some way to collect data on the 
client side (e.g. Thunderbird/Windows or whichever platform) and have a 
mechanism to contribute that data to your account (or database entry, if 
it's a MySQL backend), that is, to your Bayes database.


Just some ramblings. 




Michael Parker wrote:
> mouss wrote:
>> what I would like to see is a plugin to "J" a message...
>
> Me Too!
>
> If anyone is a Thunderbird plugin wizard and interested in doing a
> plugin that will report/learn to spamd speak up, I'm very interested.
>
> Michael


Re: Training SA with Thunderbird Junk folder

2006-03-23 Thread mouss
Forrest Aldrich wrote:
> Such a mechanism would still depend upon some organization on the server
> side... as far as I can tell, it's very much to the local sysadmin (ie:
> aliases to send to, forward or attach properly, etc). 
> Would this even work well potentially?

Oh, I'm not asking for that much.


Currently, TB lets you mark a message as junk (in which case it can
move it to a junk folder, or elsewhere), but this has two problems:

- it enables TB's own junk filter, which I don't want
- I see no keybinding (I'd like to just press "J").

I don't know how to write TB plugins, but this shouldn't be that hard,
should it?

> 
> Might be interesting if there were somehow a way to collect data on the
> client side (ie: thunderbird/windows or whichever platform) and have a
> mechanism to contribute that data to your account (or database entry, if
> it's MySQL backend), to your bayes.
> 

That would be another thing. But for those using IMAP, just putting it
in a Junk folder is enough. For others, this is feasible, but more
elaborate.



Re: Training SA with Thunderbird Junk folder

2006-03-23 Thread Matt Kettler
Forrest Aldrich wrote:
> Such a mechanism would still depend upon some organization on the server
> side... as far as I can tell, it's very much to the local sysadmin (ie:
> aliases to send to, forward or attach properly, etc). 
> Would this even work well potentially?

You don't need any of that in modern SA.

Spamd allows clients to connect and perform a learn operation if you start it
with the "--allow-tell" option. All you'd need to do is set up spamd that way
and have the t-bird plugin speak the same protocol as spamc does.

(possibly not suited to all environments, but if you trust your users..)
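The learn request Matt describes can already be made from a shell with spamc's `-L` option (type spam, ham or forget), which only works when spamd was started with `--allow-tell`. A minimal sketch, assuming one message per file in a directory; `learn_dir_via_spamd` is a hypothetical helper, and spamc talks to whatever spamd host it is configured for (`-d` sets it explicitly).

```shell
#!/bin/sh
# learn_dir_via_spamd DIR
# Send every regular file in DIR to spamd to be learned as spam, printing
# each file name as it is sent. Requires spamd started with --allow-tell.
learn_dir_via_spamd() {
  for msg in "$1"/*; do
    [ -f "$msg" ] || continue   # skip subdirectories and an unmatched glob
    # spamc's reply is discarded; its exit status is deliberately not
    # interpreted here.
    spamc -L spam < "$msg" > /dev/null
    printf '%s\n' "$msg"
  done
}
```

As Sander points out in the next message, nothing here authenticates the user, so this only suits trusted environments.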


> 
> Might be interesting if there were somehow a way to collect data on the
> client side (ie: thunderbird/windows or whichever platform) and have a
> mechanism to contribute that data to your account (or database entry, if
> it's MySQL backend), to your bayes.

Like spamd --allow-tell ? :)


Re: Training SA with Thunderbird Junk folder

2006-03-24 Thread Sander Holthaus
Matt Kettler wrote:
> Forrest Aldrich wrote:
>> Such a mechanism would still depend upon some organization on the
>> server side... as far as I can tell, it's very much to the local
>> sysadmin (ie: aliases to send to, forward or attach properly,
>> etc). Would this even work well potentially?
>
> You don't need any of that in modern SA.
>
> Spamd allows clients to connect and perform a learn operation if
> you start it with the "--allow-tell" command. All you'd need to do
> is set up spamd that way and have the t-bird plugin speak the same
> protocol as spamc does.
>
> (possibly not suited to all environments, but if you trust your
> users..)
>
>
>> Might be interesting if there were somehow a way to collect data
>> on the client side (ie: thunderbird/windows or whichever
>> platform) and have a mechanism to contribute that data to your
>> account (or database entry, if it's MySQL backend), to your
>> bayes.
>
> Like spamd --allow-tell ? :)
>
The problem with using that approach is that you can't authenticate
users. In small, closed, trusted environments it can be useful, but in
most situations, I don't think it will be usable. The nice thing about
using an IMAP-based solution is that the user is authenticated
(provided you set it up correctly).

Kind Regards,
Sander Holthaus



Re: Training SA with Thunderbird Junk folder

2006-03-24 Thread Michael Parker
Sander Holthaus wrote:
> The problem with using that approach is that you can't authenticate
> users. In small, closed, trusted environments it can be useful, but in
> most situations, I don't think it will be usable. The nice thing about
> using an IMAP-based solution is that the user is authenticated
> (provided you set it up correctly).

Actually, there exists a plugin and a patch for a new plugin hook that
implements a password for spamd protocol transactions.  It never really
went anywhere but could probably be picked up and fixed up a bit if
there was enough interest.

Michael


Re: Training SA with Thunderbird Junk folder

2006-03-24 Thread Mike Pepe

mouss wrote:
> Edward Diener wrote:
>> Does anybody know the instructions for training SA with the contents of
>> the Thunderbird Junk folder ?
>>
>> My web host, where SA is running, suggests I do this in order to reduce
>> the amount of spam I get, and I can login to my web host, transfer files
>> from my local machine to my web host, and run SA commands.
>
> so the messages are accessible on your SA system? if so, then run
> spamassassin or spamc with the right option.
>
> what I would like to see is a plugin to "J" a message...


If your mail server and users are using IMAP, the "Junk E-mail" folder 
is on the server already.


I've got a script that runs from cron that will learn from that folder 
and then delete its contents several times a day.


It looks like this:

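#!/bin/bash
# Learn the IMAP "Junk E-mail" mbox as spam, then empty it (run from cron).
# cd explicitly so the relative path does not depend on cron's working dir.
cd "$HOME" || exit 1
JUNK="mail/Junk E-mail"
# Empty the folder only if sa-learn exits successfully; ': >' truncates in
# place and, unlike rm + touch, keeps the file's owner and permissions.
# Mail delivered between the two steps is still dropped unlearned.
sa-learn --spam --mbox "$JUNK" && : > "$JUNK"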

you could probably adapt the concept to work system-wide, though I'm not 
sure how your hosting people would take to it.
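A system-wide variant of the cron script above might look like the following. This is a sketch to be run as root, under a hypothetical layout: home directories live under `$HOME_BASE`, each directory name equals the user name, and each user's IMAP junk mbox is `<home>/mail/Junk E-mail`. `list_junk_folders` and `learn_all` are hypothetical helpers.

```shell
#!/bin/sh
# ASSUMPTIONS (hypothetical layout): homes under $HOME_BASE, directory name
# equals user name, junk mbox at <home>/mail/Junk E-mail. Run as root.
HOME_BASE=${HOME_BASE:-/home}
JUNK_REL="mail/Junk E-mail"
TAB=$(printf '\t')

# Print "user<TAB>path" for every home directory that has a junk mbox.
list_junk_folders() {
  for dir in "$HOME_BASE"/*/; do
    dir=${dir%/}
    if [ -f "$dir/$JUNK_REL" ]; then
      printf '%s\t%s\n' "$(basename "$dir")" "$dir/$JUNK_REL"
    fi
  done
}

# Learn each folder as its owner, so sa-learn uses that user's own
# ~/.spamassassin Bayes files, then empty the folder only on success.
learn_all() {
  list_junk_folders | while IFS=$TAB read -r user path; do
    sudo -u "$user" sa-learn --spam --mbox "$path" < /dev/null && : > "$path"
  done
}

# From cron, call: learn_all
```

Running sa-learn as each owner, rather than as root, avoids the problem martin describes later in this thread, where sa-learn creates or overwrites Bayes files in the invoking user's home directory.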


-Mike


Re: Training SA with Thunderbird Junk folder

2006-03-24 Thread mouss

Mike Pepe wrote:
> If your mail server and users are using IMAP, the "Junk E-mail" folder 
> is on the server already.
>
> I've got a script that runs from cron that will learn from that folder 
> and then delete its contents several times a day.




My issue is that when spam is missed, I'd like to "J" it so it goes to the 
Junk folder. This way, the server script will pick it up.


Unfortunately, if you don't enable TB "adaptive filter", TB won't move 
the message to the Junk folder. This is a bug, but I don't know if it 
will ever be fixed (it dates back...). Now, I don't want the TB 
"adaptive filter".


Re: Training SA with Thunderbird Junk folder

2006-03-25 Thread Edward Diener

martin wrote:

>   sth like this?
>
>   sa-learn --mbox --spam --showdots Thunderbird_Junk_folder?


That was what I was looking for. Thanks !



>   beware of dos/unix format after uploaded Junk folder file, as at FreeBSD,
> ascii upload seem no problem, but FC3 need to run dos2unix to reformat the
> folder file


I have WinSCP running, so I should be able to tell it to transform any 
Windows line endings to Unix line endings, since the server is Linux and 
the client Windows.




Re: Training SA with Thunderbird Junk folder

2006-03-29 Thread martin
Edward Diener  tropicsoft.com> writes:

deleted...
> > 
> >   sth like this?
> > 
> >   sa-learn --mbox --spam --showdots Thunderbird_Junk_folder?
> 
> That was what I was looking for. Thanks !
  Also, please take care with the running user (-u) and the database path
(--dbpath): without the running user parameter, sa-learn looks for the bayes_*
files in the current login user's home directory, in the .spamassassin/ folder,
and creates or overwrites them as that user.

> >   beware of dos/unix format after uploaded Junk folder file, as at FreeBSD,
> > ascii upload seem no problem, but FC3 need to run dos2unix to reformat the
> > folder file
> 
> I have WinScp running, so I should be able to tell it to transform any 
> Windows line endings to Unix line endings since the server is Linux and 
> the client Windows.
> 
> 

  I have another question: how can I tell what effect an sa-learn run has
on blocking spam? For example, can dumping the bayes_* files give any hint
of increased accuracy?

   Thanks
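One rough way to answer this is to check how much the Bayes database has learned with `sa-learn --dump magic`, whose nspam and nham lines give the number of spam and ham messages learned so far. SpamAssassin does not use Bayes at all until both reach bayes_min_spam_num and bayes_min_ham_num (200 each by default); once it does, BAYES_* rules appear in the X-Spam-Status header of scanned mail. The helper names below are hypothetical, and the parsing assumes the usual dump layout in which the value is the third column.

```shell
#!/bin/sh
# bayes_counts: read `sa-learn --dump magic` output on stdin and print
# "nspam nham" (0 for any value that is missing).
bayes_counts() {
  awk '
    /non-token data: nspam[ \t]*$/ { s = $3 }
    /non-token data: nham[ \t]*$/  { h = $3 }
    END { print s + 0, h + 0 }
  '
}

# bayes_ready NSPAM NHAM: succeed once both counts reach 200, the default
# bayes_min_spam_num / bayes_min_ham_num below which Bayes is not used.
bayes_ready() {
  [ "$1" -ge 200 ] && [ "$2" -ge 200 ]
}
```

For example, run as the user whose database you care about: `set -- $(sa-learn --dump magic | bayes_counts)`, then `bayes_ready "$1" "$2" && echo "Bayes is active"`.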



Training SA-Migrating from old IMAP to new IMAP server

2007-03-11 Thread Don Ireland
I'm moving my email over from the services of fusemail.com to the IMAP server that 
comes with my shared hosting account.

When I copy my messages over from the old server, do I just run sa-learn 
against the messages as they are?  Or will the fact that they have fusemail 
headers in them cause SA to think messages without fusemail headers are spam?

I've always deleted spam after training the filters, so I don't have any to feed 
to the new system.  Will that be a problem?

Don Ireland



Re: Training SA-Migrating from old IMAP to new IMAP server

2007-03-12 Thread Magnus Holmgren
On Sunday 11 March 2007 18:09, Don Ireland wrote:
> I'm my email over from the services of fusemail.com to the IMAP server that
> comes with my shared hosting account.
>
> When I copy my messages over from the old server, do I just run SA-learn
> against the messages as they are?  Or will the fact that they have fusemail
> headers in them cause SA to think messages without fusemail headers are
> spam?

If so, you can make bayes ignore those headers with bayes_ignore_header in 
local.cf. See the Mail::SpamAssassin::Conf(3pm) manpage.

> I've always deleted spam after training the filters so I don't have any to
> feed to to the new system.  Will that be a problem?

Having too great an imbalance in numbers between ham and spam will bias the 
bayes classifier towards "everything is spam" or in this case "everything is 
ham".

-- 
Magnus Holmgren[EMAIL PROTECTED]
   (No Cc of list mail needed, thanks)




Re: Training SA-Migrating from old IMAP to new IMAP server

2007-03-12 Thread Don Ireland
So it sounds like I may be better off NOT training on existing messages, only 
on new ones as they come in.

Don Ireland