Re: Bayes classifier
On 7/26/2010 5:58 AM, andrij wrote:
> Hi all,
>
> I am new to spamassassin and bayes classifier. I have several questions and I will greatly appreciate your help with that.
>
> 1) Training of the bayes classifier with _multipart_ e-mails (e.g., an e-mail contains other e-mails within its body). If I set bayes_ignore_header Some-header, will bayes classifier ignore (while learning) the header Some-header in the nested messages as well?

As far as SA is concerned, this is a single message with a single set of headers. Bayes will ignore the specified header in the main message, but not in the body (where the rest of the e-mails are stored). If you want them treated as separate messages, you will need to run something to split them into separate files and then learn them.

> 2) Evaluating whether an email is spam or not. Again, if I set bayes_ignore_header Some-header, will the bayes classifier ignore the header while evaluating an e-mail?

Yes. That's what it's for.

> 3) Evaluating whether an email is spam or not. Does the bayes classifier analyze headers if I have, for example, the following rule:
>
>   body BAYES_05 eval:check_bayes('0.00', '0.05')
>
> According to http://wiki.apache.org/spamassassin/WritingRules : "Body rules also include the Subject as the first line of the body content." So, any headers that precede subject header are not considered by the bayes classifier?

I don't have an answer for you here, but just another question. Why do you want to mess with the bayes rules? They work very well as-is as long as you make sure the database is being fed properly (learning spam as spam and ham as ham with a decent mix of both being learned on a regular basis).

-- Bowie
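
For the "run something to split them" step in the answer to 1), here is a minimal sketch in Python. It assumes the nested e-mails are attached as message/rfc822 MIME parts; if they are pasted inline as plain text, this will not find them. The function names extract_nested_messages and write_messages, and the nested-NNNN.eml file naming, are made up for illustration and are not part of SpamAssassin.

```python
import email
from email import policy
from pathlib import Path


def extract_nested_messages(raw_bytes):
    """Return the raw bytes of every message/rfc822 part found in raw_bytes."""
    outer = email.message_from_bytes(raw_bytes, policy=policy.default)
    nested = []
    for part in outer.walk():
        if part.get_content_type() == "message/rfc822":
            # The payload of a message/rfc822 part is a one-element list
            # holding the embedded message object.
            inner = part.get_payload(0)
            nested.append(inner.as_bytes())
    return nested


def write_messages(messages, out_dir):
    """Write each extracted message to its own file and return the paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, data in enumerate(messages, start=1):
        # Hypothetical naming scheme: nested-0001.eml, nested-0002.eml, ...
        path = out / f"nested-{index:04d}.eml"
        path.write_bytes(data)
        paths.append(path)
    return paths
```

Each resulting file then has its own header block, so it can be learned as a separate message (for example by pointing sa-learn --spam or sa-learn --ham at the output directory), and bayes_ignore_header would apply to its headers. Note that a message nested inside another nested message is written out twice: once on its own and once as part of its parent.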
Re: Bayes classifier
>> 2) Evaluating whether an email is spam or not. Again, if I set bayes_ignore_header Some-header, will the bayes classifier ignore the header while evaluating an e-mail?
>
> Yes. That's what it's for.

So, the bayes classifier will ignore Some-header in both learning and spam detection phases. Did I understand it correctly?

>> 3) Evaluating whether an email is spam or not. Does the bayes classifier analyze headers if I have, for example, the following rule:
>>
>>   body BAYES_05 eval:check_bayes('0.00', '0.05')
>>
>> According to http://wiki.apache.org/spamassassin/WritingRules : "Body rules also include the Subject as the first line of the body content." So, any headers that precede subject header are not considered by the bayes classifier?
>
> I don't have an answer for you here, but just another question. Why do you want to mess with the bayes rules?

Maybe I am mistaken, but what is the point of training the bayes classifier on headers if headers (at least those that precede the Subject header) are not considered during the spam detection phase?

Thank you.

--
View this message in context: http://old.nabble.com/Bayes-classifier-tp29264841p29266978.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.
Re: Bayes classifier
On 7/26/2010 10:12 AM, andrij wrote:
>>> 2) Evaluating whether an email is spam or not. Again, if I set bayes_ignore_header Some-header, will the bayes classifier ignore the header while evaluating an e-mail?
>>
>> Yes. That's what it's for.
>
> So, the bayes classifier will ignore Some-header in both learning and spam detection phases. Did I understand it correctly?

I'm not an expert, just another user, but as I understand it, this config option causes Bayes to ignore that particular header in both learning and scoring modes.

>>> 3) Evaluating whether an email is spam or not. Does the bayes classifier analyze headers if I have, for example, the following rule:
>>>
>>>   body BAYES_05 eval:check_bayes('0.00', '0.05')
>>>
>>> According to http://wiki.apache.org/spamassassin/WritingRules : "Body rules also include the Subject as the first line of the body content." So, any headers that precede subject header are not considered by the bayes classifier?
>>
>> I don't have an answer for you here, but just another question. Why do you want to mess with the bayes rules?
>
> Maybe I am mistaken, but what is the sense to train the bayes classifier on headers if headers (at least those that precede a subject header) are not considered during the spam detection phase?

Bayes learns based on the entire message -- headers and all. (Otherwise, what would be the point of the bayes_ignore_header option?)

I can see where you might get that impression by looking at the rule, but if I understand it correctly, Bayes has already been run by the time the rule is evaluated, and the rule is just checking the result.

-- Bowie
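
To make the "the rule is just checking the result" point concrete, here is a toy sketch in Python. It is not SpamAssassin's tokenizer or scoring code; collect_tokens and check_range are hypothetical names, the token format is invented, and the inclusive boundary handling in check_range is an assumption. It only illustrates the shape of the idea described in this thread: tokens come from headers and body alike, ignored headers are skipped, and the range test sees nothing but a number that was computed earlier.

```python
def collect_tokens(headers, body, ignored_headers=()):
    """Gather tokens from headers and body, skipping ignored header names.

    headers is a list of (name, value) pairs. The result is a set, so the
    order in which headers appear has no effect on it.
    """
    ignored = {name.lower() for name in ignored_headers}
    tokens = set()
    for name, value in headers:
        if name.lower() in ignored:
            continue
        # Prefix header tokens with the header name so they stay distinct
        # from identical words in the body (an illustrative choice).
        tokens.update(f"{name.lower()}:{word}" for word in value.split())
    tokens.update(body.split())
    return tokens


def check_range(probability, low, high):
    """Test an already computed probability against a range.

    Boundary handling (inclusive on both ends) is an assumption here.
    """
    return low <= probability <= high
```

Because the same collect_tokens step would feed both learning and scoring in this sketch, a header listed in ignored_headers is left out of both, and check_range(0.03, 0.01, 0.05) only looks at the number 0.03, whatever parts of the message contributed to it.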
Re: Bayes classifier
On Mon, 26 Jul 2010, Bowie Bailey wrote:
>> 3) Evaluating whether an email is spam or not. Does the bayes classifier analyze headers if I have, for example, the following rule:
>>
>>   body BAYES_05 eval:check_bayes('0.00', '0.05')
>>
>> According to http://wiki.apache.org/spamassassin/WritingRules : "Body rules also include the Subject as the first line of the body content." So, any headers that precede subject header are not considered by the bayes classifier?
>
> I don't have an answer for you here, but just another question. Why do you want to mess with the bayes rules? They work very well as-is as long as you make sure the database is being fed properly (learning spam as spam and ham as ham with a decent mix of both being learned on a regular basis).

A better answer here would be the order of the headers doesn't matter.

-- John Hardin KA7OHZ http://www.impsec.org/~jhardin/
Re: Bayes classifier
> On Mon, 26 Jul 2010, Bowie Bailey wrote:
>>> 3) Evaluating whether an email is spam or not. Does the bayes classifier analyze headers if I have, for example, the following rule:
>>>
>>>   body BAYES_05 eval:check_bayes('0.00', '0.05')
>>>
>>> According to http://wiki.apache.org/spamassassin/WritingRules : "Body rules also include the Subject as the first line of the body content." So, any headers that precede subject header are not considered by the bayes classifier?
>>
>> I don't have an answer for you here, but just another question. Why do you want to mess with the bayes rules? They work very well as-is as long as you make sure the database is being fed properly (learning spam as spam and ham as ham with a decent mix of both being learned on a regular basis).

On 26.07.10 08:13, John Hardin wrote:
> A better answer here would be the order of the headers doesn't matter.

at least until we have a rule that scores by header order :)
(a bayes score probably)

-- Matus UHLAR - fantomas ; http://www.fantomas.sk/
Re: Bayes classifier
On Mon, 26 Jul 2010, Matus UHLAR - fantomas wrote:
>> On Mon, 26 Jul 2010, Bowie Bailey wrote:
>>>> 3) Evaluating whether an email is spam or not. Does the bayes classifier analyze headers if I have, for example, the following rule:
>>>>
>>>>   body BAYES_05 eval:check_bayes('0.00', '0.05')
>>>>
>>>> According to http://wiki.apache.org/spamassassin/WritingRules : "Body rules also include the Subject as the first line of the body content." So, any headers that precede subject header are not considered by the bayes classifier?
>>>
>>> I don't have an answer for you here, but just another question. Why do you want to mess with the bayes rules? They work very well as-is as long as you make sure the database is being fed properly (learning spam as spam and ham as ham with a decent mix of both being learned on a regular basis).
>
> On 26.07.10 08:13, John Hardin wrote:
>> A better answer here would be the order of the headers doesn't matter.
>
> at least until we have a rule that scores by header order :)
> (a bayes score probably)

The context of the question (as far as I can determine - it's a pretty rambling question) was within the Bayes classifier, not within general rules. There _are_ some rules where header order is significant and explicitly checked for.

So, let me amend my response: A better answer here would be the order of the headers doesn't matter to the bayes classifier.

-- John Hardin
Re: Bayes classifier
On Mon, 26 Jul 2010 09:47:24 -0400 Bowie Bailey bowie_bai...@buc.com wrote:
>> 3) Evaluating whether an email is spam or not. Does the bayes classifier analyze headers if I have, for example, the following rule:
>>
>>   body BAYES_05 eval:check_bayes('0.00', '0.05')
>>
>> According to http://wiki.apache.org/spamassassin/WritingRules : "Body rules also include the Subject as the first line of the body content." So, any headers that precede subject header are not considered by the bayes classifier?
>
> I don't have an answer for you here, but just another question. Why do you want to mess with the bayes rules?

That's actually the way BAYES rules are already set up (except that BAYES_05 has '0.01' not '0.00'), so they are already body rules. It doesn't mean they only run on the body and subject.
Re: Bayes classifier
Bowie Bailey wrote:
>>>> 3) Evaluating whether an email is spam or not. Does the bayes classifier analyze headers if I have, for example, the following rule:
>>>>
>>>>   body BAYES_05 eval:check_bayes('0.00', '0.05')
>>>>
>>>> According to http://wiki.apache.org/spamassassin/WritingRules : "Body rules also include the Subject as the first line of the body content." So, any headers that precede subject header are not considered by the bayes classifier?
>>>
>>> I don't have an answer for you here, but just another question. Why do you want to mess with the bayes rules?
>>
>> Maybe I am mistaken, but what is the sense to train the bayes classifier on headers if headers (at least those that precede a subject header) are not considered during the spam detection phase?
>
> Bayes learns based on the entire message -- headers and all. (Otherwise, what would be the point of the bayes_ignore_header option?)
>
> I can see where you might get that impression by looking at the rule, but if I understand it correctly, Bayes has already been run and the rule is just checking the result.

Thank you for the clarification. The word "body" at the beginning of the rule confused me. So, in general it does not matter which word (body or header) is put there -- the Bayes classifier analyzes both the headers (except those listed with bayes_ignore_header) and the body during both the learning and scoring phases. Right?
Re: Bayes classifier
On 7/26/2010 2:46 PM, andrij wrote:
> Bowie Bailey wrote:
>>>>> 3) Evaluating whether an email is spam or not. Does the bayes classifier analyze headers if I have, for example, the following rule:
>>>>>
>>>>>   body BAYES_05 eval:check_bayes('0.00', '0.05')
>>>>>
>>>>> According to http://wiki.apache.org/spamassassin/WritingRules : "Body rules also include the Subject as the first line of the body content." So, any headers that precede subject header are not considered by the bayes classifier?
>>>>
>>>> I don't have an answer for you here, but just another question. Why do you want to mess with the bayes rules?
>>>
>>> Maybe I am mistaken, but what is the sense to train the bayes classifier on headers if headers (at least those that precede a subject header) are not considered during the spam detection phase?
>>
>> Bayes learns based on the entire message -- headers and all. (Otherwise, what would be the point of the bayes_ignore_header option?)
>>
>> I can see where you might get that impression by looking at the rule, but if I understand it correctly, Bayes has already been run and the rule is just checking the result.
>
> Thank you for the clarification. The word "body" at the beginning of the rule confused me. So, in general it does not matter which word (body or header) is put there -- the Bayes classifier analyzes both headers (except those listed with bayes_ignore_header) and body during both learning and scoring phases. Right?

Right.

-- Bowie