Re: Bayes classifier

2010-07-26 Thread Bowie Bailey
 On 7/26/2010 5:58 AM, andrij wrote:
 Hi all,

 I am new to SpamAssassin and its Bayes classifier. I have several questions and
 I would greatly appreciate your help with them.

 1) Training the Bayes classifier on _multipart_ e-mails (e.g., an
 e-mail that contains other e-mails within its body). If I set
 bayes_ignore_header Some-header, will the Bayes classifier ignore (while
 learning) the header Some-header in the nested messages as well?

As far as SA is concerned, this is a single message with a single set of
headers.  Bayes will ignore the specified header in the main message,
but not in the body (where the rest of the e-mails are stored).  If you
want them treated as separate messages, you will need to run something
to split them into separate files and then learn them.
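
For what it's worth, the splitting step could look roughly like the sketch below. It is only an illustration in Python using the standard library's email package, not a SpamAssassin tool, and the function name split_nested_messages is made up for this example. It assumes the nested e-mails are attached as message/rfc822 parts; messages pasted inline as plain text would need different handling.

```python
import email
from email import policy


def split_nested_messages(raw_bytes):
    """Return the serialized bytes of every message/rfc822 part in raw_bytes."""
    outer = email.message_from_bytes(raw_bytes, policy=policy.default)
    nested = []
    for part in outer.walk():
        if part.get_content_type() == "message/rfc822":
            # A message/rfc822 part carries its attached message as the
            # single element of its payload list.
            inner = part.get_payload(0)
            nested.append(inner.as_bytes())
    return nested
```

Each returned item could then be written to its own file and learned separately. Note that walk() also descends into the attached messages, so a message nested two levels deep is returned on its own as well as inside its parent.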

 2) Evaluating whether an email is spam or not. Again, if I set
 bayes_ignore_header Some-header, will the bayes classifier ignore the
 header while evaluating an e-mail?

Yes.  That's what it's for.

 3) Evaluating whether an email is spam or not. Does the bayes classifier
 analyze headers if I have, for example, the following rule: body BAYES_05
 eval:check_bayes('0.00', '0.05'). According to the
 http://wiki.apache.org/spamassassin/WritingRules : Body rules also include
 the Subject as the first line of the body content. So, any headers that
 precede subject header are not considered by the bayes classifier?

I don't have an answer for you here, but just another question.  Why do
you want to mess with the bayes rules?  They work very well as-is as
long as you make sure the database is being fed properly (learning spam
as spam and ham as ham with a decent mix of both being learned on a
regular basis).

-- 
Bowie


Re: Bayes classifier

2010-07-26 Thread andrij



 2) Evaluating whether an email is spam or not. Again, if I set
 bayes_ignore_header Some-header, will the bayes classifier ignore the
 header while evaluating an e-mail?
 
 Yes.  That's what it's for.
 

So, the Bayes classifier will ignore Some-header in both the learning and spam
detection phases. Did I understand that correctly?



 3) Evaluating whether an email is spam or not. Does the bayes classifier
 analyze headers if I have, for example, the following rule: body BAYES_05
 eval:check_bayes('0.00', '0.05'). According to the
 http://wiki.apache.org/spamassassin/WritingRules : Body rules also include
 the Subject as the first line of the body content. So, any headers that
 precede subject header are not considered by the bayes classifier?
 
 I don't have an answer for you here, but just another question.  Why do
 you want to mess with the bayes rules?
 

Maybe I am mistaken, but what is the point of training the Bayes classifier on
headers if headers (at least those that precede the Subject header) are not
considered during the spam detection phase?

Thank you.
-- 
View this message in context: 
http://old.nabble.com/Bayes-classifier-tp29264841p29266978.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.



Re: Bayes classifier

2010-07-26 Thread Bowie Bailey
 On 7/26/2010 10:12 AM, andrij wrote:
 2) Evaluating whether an email is spam or not. Again, if I set
 bayes_ignore_header Some-header, will the bayes classifier ignore the
 header while evaluating an e-mail?
 Yes.  That's what it's for.
 So, the bayes classifier will ignore Some-header in both learning and spam
 detection phases. Did I understand it correctly?

I'm not an expert, just another user, but as I understand it, this
config option causes Bayes to ignore that particular header in both
learning and scoring modes.
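
To picture what "ignored in both modes" means, here is a toy sketch in Python. It is not SpamAssassin's tokenizer or database code; the function names and the 'name:word' token format are invented for illustration. The only point is that one filter sits in front of both phases, so an ignored header contributes no tokens to either.

```python
from collections import Counter


def tokenize_headers(headers, ignored):
    """Yield 'name:word' tokens, skipping headers named in `ignored`."""
    skip = {name.lower() for name in ignored}
    for name, value in headers:
        if name.lower() in skip:
            continue  # Dropped before the classifier ever sees it
        for word in value.split():
            yield f"{name.lower()}:{word.lower()}"


def learn(token_counts, headers, ignored):
    """Learning phase: count only the tokens that survive the ignore list."""
    token_counts.update(tokenize_headers(headers, ignored))


def tokens_for_scoring(headers, ignored):
    """Scoring phase: the same filter applies, so ignored headers add nothing."""
    return set(tokenize_headers(headers, ignored))
```

With ignored set to ["Some-header"], a message carrying that header learns and scores exactly as if the header were absent.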

 3) Evaluating whether an email is spam or not. Does the bayes classifier
 analyze headers if I have, for example, the following rule: body BAYES_05
 eval:check_bayes('0.00', '0.05'). According to the
 http://wiki.apache.org/spamassassin/WritingRules : Body rules also include
 the Subject as the first line of the body content. So, any headers that
 precede subject header are not considered by the bayes classifier?
 I don't have an answer for you here, but just another question.  Why do
 you want to mess with the bayes rules?
 Maybe I am mistaken, but what is the sense to train the bayes classifier on
 headers if headers (at least those that precede a subject header) are not
 considered during the spam detection phase?

Bayes learns based on the entire message -- headers and all. 
(Otherwise, what would be the point of the bayes_ignore_header option?)

I can see where you might get that impression by looking at the rule,
but if I understand it correctly, Bayes has already been run and the
rule is just checking the result.
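
A rough way to picture that, as a Python sketch rather than SpamAssassin's actual code: the probability is assumed to have been computed once from the whole message, and each BAYES_nn rule only asks whether that number falls inside its range. The function name in_bayes_range is hypothetical, and treating both ends of the range as inclusive is an assumption of this sketch.

```python
def in_bayes_range(probability, low, high):
    """Return True if an already-computed probability lies within [low, high].

    No message text is examined here; the classifier is assumed to have
    run earlier. Inclusive bounds are an assumption of this sketch.
    """
    return low <= probability <= high


# Assumed result of an earlier classifier run over headers and body.
bayes_probability = 0.03

# The rule from the question, expressed as a plain range check.
bayes_05_hit = in_bayes_range(bayes_probability, 0.00, 0.05)
```

Seen this way, the leading "body" keyword on the rule says nothing about which parts of the message fed the classifier.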

-- 
Bowie


Re: Bayes classifier

2010-07-26 Thread John Hardin

On Mon, 26 Jul 2010, Bowie Bailey wrote:


3) Evaluating whether an email is spam or not. Does the bayes
   classifier analyze headers if I have, for example, the following
   rule: body BAYES_05 eval:check_bayes('0.00', '0.05'). According to
   the http://wiki.apache.org/spamassassin/WritingRules : Body rules
   also include the Subject as the first line of the body content. So,
   any headers that precede subject header are not considered by the
   bayes classifier?


I don't have an answer for you here, but just another question.  Why do 
you want to mess with the bayes rules?  They work very well as-is as 
long as you make sure the database is being fed properly (learning spam 
as spam and ham as ham with a decent mix of both being learned on a 
regular basis).


A better answer here would be: the order of the headers doesn't matter.

--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  I'm seriously considering getting one of those bright-orange prison
  overalls and stencilling PASSENGER on the back. Along with the paper
  slippers, I ought to be able to walk right through security.
 -- Brian Kantor in a.s.r
---
 10 days until the 275th anniversary of John Peter Zenger's acquittal


Re: Bayes classifier

2010-07-26 Thread Matus UHLAR - fantomas
 On Mon, 26 Jul 2010, Bowie Bailey wrote:

 3) Evaluating whether an email is spam or not. Does the bayes
classifier analyze headers if I have, for example, the following
rule: body BAYES_05 eval:check_bayes('0.00', '0.05'). According to
the http://wiki.apache.org/spamassassin/WritingRules : Body rules
also include the Subject as the first line of the body content. So,
any headers that precede subject header are not considered by the
bayes classifier?

 I don't have an answer for you here, but just another question.  Why do 
 you want to mess with the bayes rules?  They work very well as-is as  
 long as you make sure the database is being fed properly (learning spam 
 as spam and ham as ham with a decent mix of both being learned on a  
 regular basis).

On 26.07.10 08:13, John Hardin wrote:
 A better answer here would be the order of the headers doesn't matter.

at least until we have a rule that scores by header order :)
(probably a bayes score)
-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Microsoft dick is soft to do no harm


Re: Bayes classifier

2010-07-26 Thread John Hardin

On Mon, 26 Jul 2010, Matus UHLAR - fantomas wrote:


On Mon, 26 Jul 2010, Bowie Bailey wrote:


3) Evaluating whether an email is spam or not. Does the bayes
   classifier analyze headers if I have, for example, the following
   rule: body BAYES_05 eval:check_bayes('0.00', '0.05'). According to
   the http://wiki.apache.org/spamassassin/WritingRules : Body rules
   also include the Subject as the first line of the body content. So,
   any headers that precede subject header are not considered by the
   bayes classifier?


I don't have an answer for you here, but just another question.  Why do
you want to mess with the bayes rules?  They work very well as-is as
long as you make sure the database is being fed properly (learning spam
as spam and ham as ham with a decent mix of both being learned on a
regular basis).


On 26.07.10 08:13, John Hardin wrote:

A better answer here would be the order of the headers doesn't matter.


at least until we have a rule that scores by header order :)
(probably a bayes score)


The context of the question (as far as I can determine - it's a pretty 
rambling question) was within the Bayes classifier, not within general 
rules. There _are_ some rules where header order is significant and 
explicitly checked for.


So, let me amend my response:

A better answer here would be: the order of the headers doesn't matter to
the bayes classifier.
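
As a toy illustration of why order is irrelevant to a token-based classifier, consider the Python sketch below. It is not SpamAssassin's real tokenizer; the function name token_set and the 'name:word' token format are invented here. Once the headers are reduced to an unordered collection of tokens, two messages with the same headers in a different order look identical.

```python
def token_set(headers):
    """Reduce (name, value) header pairs to an unordered set of tokens."""
    return frozenset(
        f"{name.lower()}:{word.lower()}"
        for name, value in headers
        for word in value.split()
    )


subject_first = [("Subject", "Cheap pills"), ("From", "someone@example.com")]
subject_last = [("From", "someone@example.com"), ("Subject", "Cheap pills")]

# Both orderings yield the same tokens, so anything computed from the
# token set alone cannot depend on where the Subject header appeared.
same_tokens = token_set(subject_first) == token_set(subject_last)
```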


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Insofar as the police deter by their presence, they are very, very
  good. Criminals take great pains not to commit a crime in front of
  them. -- Jeffrey Snyder
---
 10 days until the 275th anniversary of John Peter Zenger's acquittal


Re: Bayes classifier

2010-07-26 Thread RW
On Mon, 26 Jul 2010 09:47:24 -0400
Bowie Bailey bowie_bai...@buc.com wrote:


  3) Evaluating whether an email is spam or not. Does the bayes
  classifier analyze headers if I have, for example, the following
  rule: body BAYES_05 eval:check_bayes('0.00', '0.05'). According
  to the http://wiki.apache.org/spamassassin/WritingRules : Body
  rules also include the Subject as the first line of the body
  content. So, any headers that precede subject header are not
  considered by the bayes classifier?
 
 I don't have an answer for you here, but just another question.  Why
 do you want to mess with the bayes rules?  

That's actually the way BAYES rules are already set up (except that
BAYES_05 has '0.01' not '0.00'), so they are already body rules. It
doesn't mean they only run on the body and subject.


Re: Bayes classifier

2010-07-26 Thread andrij



Bowie Bailey wrote:
 
 3) Evaluating whether an email is spam or not. Does the bayes classifier
 analyze headers if I have, for example, the following rule: body BAYES_05
 eval:check_bayes('0.00', '0.05'). According to the
 http://wiki.apache.org/spamassassin/WritingRules : Body rules also include
 the Subject as the first line of the body content. So, any headers that
 precede subject header are not considered by the bayes classifier?
 I don't have an answer for you here, but just another question.  Why do
 you want to mess with the bayes rules?
 Maybe I am mistaken, but what is the sense to train the bayes classifier on
 headers if headers (at least those that precede a subject header) are not
 considered during the spam detection phase?
 
 Bayes learns based on the entire message -- headers and all. 
 (Otherwise, what would be the point of the bayes_ignore_header option?)
 
 I can see where you might get that impression by looking at the rule,
 but if I understand it correctly, Bayes has already been run and the
 rule is just checking the result.
 

Thank you for the clarification. The word body at the beginning of the rule
confused me. So, in general it does not matter which word (body or
header) is put there -- the Bayes classifier analyzes both headers (except
those listed with bayes_ignore_header) and body during both learning and
scoring phases. Right?




Re: Bayes classifier

2010-07-26 Thread Bowie Bailey
 On 7/26/2010 2:46 PM, andrij wrote:
 Bowie Bailey wrote:
 3) Evaluating whether an email is spam or not. Does the bayes classifier
 analyze headers if I have, for example, the following rule: body BAYES_05
 eval:check_bayes('0.00', '0.05'). According to the
 http://wiki.apache.org/spamassassin/WritingRules : Body rules also include
 the Subject as the first line of the body content. So, any headers that
 precede subject header are not considered by the bayes classifier?
 I don't have an answer for you here, but just another question.  Why do
 you want to mess with the bayes rules?
 Maybe I am mistaken, but what is the sense to train the bayes classifier on
 headers if headers (at least those that precede a subject header) are not
 considered during the spam detection phase?
 Bayes learns based on the entire message -- headers and all. 
 (Otherwise, what would be the point of the bayes_ignore_header option?)

 I can see where you might get that impression by looking at the rule,
 but if I understand it correctly, Bayes has already been run and the
 rule is just checking the result.
 Thank you for the clarification. The word body at the beginning of the rule
 confused me. So, in general it does not matter which word (body or
 header) is put there -- the Bayes classifier analyzes both headers (except
 those listed with bayes_ignore_header) and body during both learning and
 scoring phases. Right?

Right.

-- 
Bowie