Re: Bayes Filtering
On Wed, 22 Jul 2015 09:52:10 -0400 Bill Cole wrote:

> On 22 Jul 2015, at 8:18, RW wrote:
>> YMMV but personally I've never had a single ham hit BAYES_99. There's currently no evidence to suggest that the OP would have any problem with short-circuiting on it.
>
> Experiences with that absolutely do vary, widely.

That's rather my point.

> Keep in mind that Bayesian classification gives a statistical metric, not a human claim; the delta from 100% isn't a polite warning, it's as hard a fact as statistical prediction can provide, given a valid Bayes DB. 99.00% spam certainty from Bayes will be wrong 1% of the time, on average.

This is at best a naive executive summary. None of the above is really true.

> If you've actually NEVER had BAYES_99 hit on ham, you're quite lucky or don't get a lot of ham.

I get enough to know that for me the upper limit to the FP rate on BAYES_99 is negligible compared with the FP rate for SA as a whole.

What's important is to compare the FP rate increase that would be caused by raising the score of BAYES_99 with the SA FP rate caused by the rule rescoring and custom rules that were added to avoid the FNs in spam that hit BAYES_99. It's also useful to repeat that analysis without the FPs that no-one cares about.

Unless you have done this you don't really know whether increasing the score of BAYES_99 is a good idea or not.
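A minimal sketch of the comparison described above, assuming you have a hand-labelled corpus with the rule hits recorded per message. The corpus format, the rule name LOCAL_CUSTOM_RULE, and all scores except the threshold of 5.0 are made up for illustration; real numbers have to come from your own mail.

```python
from typing import Dict, List, Tuple

# (is_ham, names of the rules that hit); a hypothetical corpus format.
Message = Tuple[bool, List[str]]


def fp_rate(corpus: List[Message], scores: Dict[str, float], threshold: float = 5.0) -> float:
    """Return the fraction of ham messages whose total score reaches the threshold."""
    ham_hits = [hits for is_ham, hits in corpus if is_ham]
    if not ham_hits:
        return 0.0
    flagged = sum(
        1 for hits in ham_hits
        if sum(scores.get(rule, 0.0) for rule in hits) >= threshold
    )
    return flagged / len(ham_hits)


# Made-up toy corpus, for illustration only.
corpus: List[Message] = [
    (True, ["BAYES_99"]),
    (True, ["LOCAL_CUSTOM_RULE", "HTML_MESSAGE"]),
    (True, ["HTML_MESSAGE"]),
    (True, []),
    (False, ["BAYES_99", "LOCAL_CUSTOM_RULE"]),
]

# Option A: raise BAYES_99 and drop the custom rule (hypothetical scores).
raised_bayes = {"BAYES_99": 5.0, "HTML_MESSAGE": 0.1}
# Option B: keep BAYES_99 lower and rely on the custom rule (hypothetical scores).
custom_rules = {"BAYES_99": 3.5, "HTML_MESSAGE": 0.1, "LOCAL_CUSTOM_RULE": 5.0}

print("FP rate, raised BAYES_99:", fp_rate(corpus, raised_bayes))
print("FP rate, custom rules:   ", fp_rate(corpus, custom_rules))
```

To repeat the analysis without the FPs nobody cares about, filter those messages out of the corpus before calling `fp_rate`.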
Re: Bayes Filtering
>> On 22.07.2015 at 05:05, Roman Gelfand wrote:
>>> shortcircuit BAYES_99 spam
>>> shortcircuit BAYES_00 ham

On 22.07.15 10:09, Reindl Harald wrote:
> i doubt that you really want that and even if for sure not for BAYES_99 but BAYES_999, it makes no sense - bayes alone is not the only decision in a scoring system, it's one component
>
> that said from someone scoring BAYES_999 with 7.9 while milter-reject is 8.0 - the other rules are there to avoid false-positives and false-negatives for a good reason

So THIS explains, why you blame (us) for every single low-scoring rule for hitting something you don't like!

however, for the OP it is another reason not even to score high on BAYES_*

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Linux - It's now safe to turn on your computer.
Re: Report spam to Razor
On 21.07.15 21:31, Bill Shirley wrote:
> I'm looking into modifying my spam processing script so it will report spam to Razor.

IIRC Razor says it should only be fed manually (FYI)

> From the Spamassassin Wiki: https://wiki.apache.org/spamassassin/ReportingSpam
> I should use:
> spamassassin -r message.txt
> It states
> The message will also be submitted to SpamAssassin's learning systems.
>
> Looking at the parms for spamassassin there is not --dbpath like there is for sa-learn. Does it in fact train the Bayes DB and if so why is there no way to specify --dbpath?

that's because spamassassin is not sa-learn. you even should have your db_path in your SA config.

> I'm using per user Bayes and have some vmail accounts so the --dbpath is not /home/vmail/.spamassassin
>
> Also 'spamassassin --help' says:
> Usage: spamassassin [options] [ *mailmessage* | *path* ... ]
> Does that mean I can use a directory:
> spamassassin -r /home/bob/Maildir/.Spam/ ?

No: it explicitly says you can only use with message, you must specify path without the .

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
We are but packets in the Internet of life (userfriendly.org)
Re: Bayes Filtering
On Wed, 22 Jul 2015 13:40:12 +0200 Matus UHLAR - fantomas wrote:

>>> On 22.07.2015 at 05:05, Roman Gelfand wrote:
>>>> shortcircuit BAYES_99 spam
>>>> shortcircuit BAYES_00 ham
>
> On 22.07.15 10:09, Reindl Harald wrote:
>> i doubt that you really want that and even if for sure not for BAYES_99 but BAYES_999, it makes no sense - bayes alone is not the only decision in a scoring system, it's one component
>>
>> that said from someone scoring BAYES_999 with 7.9 while milter-reject is 8.0 - the other rules are there to avoid false-positives and false-negatives for a good reason
>
> So THIS explains, why you blame (us) for every single low-scoring rule for hitting something you don't like!

It really doesn't if you think about it. What does explain it is his increased score for BAYES_50, and an increase in some non-Bayes scores.

> however, for the OP it is another reason not even to score high on BAYES_*

YMMV but personally I've never had a single ham hit BAYES_99. There's currently no evidence to suggest that the OP would have any problem with short-circuiting on it.
Re: DKIM, SPF and Bayesian Learning
On 7/21/2015 8:55 PM, Roman Gelfand wrote:
> It seems that if DKIM or SPF is verified, the bayesian learning doesn't matter.
>
> X-Spam-Status: No, score=3.6 required=5.0 tests=BAYES_99,BAYES_999,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HTML_MESSAGE,SPF_PASS autolearn=no version=3.3.2

If you mean autolearn, it requires a mixture of body and header rules. Almost all the rules hit appear to be header rules.

Normally, SpamAssassin will require 3 points from the header and 3 points from the body to be auto-learned as spam. See perldoc for Mail::SpamAssassin::Plugin::AutoLearnThreshold and Mail::SpamAssassin::Conf

Regards,
KAM
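A rough sketch of the auto-learn-as-spam condition described above, not the plugin's actual code. It assumes the stock `bayes_auto_learn_threshold_spam` of 12.0 and treats the 3-point header and body minimums as fixed; the function name and the point breakdowns in the usage lines are hypothetical. Note that the score used for the auto-learn decision is computed separately from the reported score and leaves out the Bayes rules themselves, so the 3.6 in the header above is only a loose stand-in.

```python
def would_autolearn_as_spam(
    header_points: float,
    body_points: float,
    autolearn_score: float,
    threshold_spam: float = 12.0,  # assumed default of bayes_auto_learn_threshold_spam
    min_header: float = 3.0,
    min_body: float = 3.0,
) -> bool:
    """Sketch: learn as spam only if the score is high enough AND
    both header rules and body rules contributed at least 3 points."""
    return (
        autolearn_score >= threshold_spam
        and header_points >= min_header
        and body_points >= min_body
    )


# Hypothetical breakdowns, for illustration only.
print(would_autolearn_as_spam(header_points=3.5, body_points=0.1, autolearn_score=3.6))
print(would_autolearn_as_spam(header_points=10.0, body_points=0.5, autolearn_score=13.0))
print(would_autolearn_as_spam(header_points=6.0, body_points=7.0, autolearn_score=13.0))
```

The second call shows the case KAM points at: even a high score is not auto-learned when nearly all of the points come from header rules.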
Re: Report spam to Razor
On Tue, 21 Jul 2015 21:31:57 -0400 Bill Shirley wrote:

> I'm looking into modifying my spam processing script so it will report spam to Razor.
>
> From the Spamassassin Wiki: https://wiki.apache.org/spamassassin/ReportingSpam
> I should use:
> spamassassin -r message.txt
> It states
> The message will also be submitted to SpamAssassin's learning systems.
>
> Looking at the parms for spamassassin there is not --dbpath like there is for sa-learn. Does it in fact train the Bayes DB and if so why is there no way to specify --dbpath?
>
> I'm using per user Bayes and have some vmail accounts so the --dbpath is not /home/vmail/.spamassassin

I'm not sure what you mean by vmail, but if you are using virtual home directories you can probably work around it by setting HOME. That's how I use sa-learn, which looks in $HOME/.spamassassin/ rather than the actual unix home directory. I would expect the spamassassin script to do the same thing.
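A small sketch of the HOME workaround suggested above, for a processing script written in Python. The helper names and the example path /home/vmail/bob are hypothetical, and whether `spamassassin -r` honours HOME the same way sa-learn does is the untested expectation stated in the reply, not something this sketch verifies.

```python
import os
import subprocess
from typing import Dict


def report_env(virtual_home: str) -> Dict[str, str]:
    """Copy the current environment and point HOME at the virtual user's directory."""
    env = dict(os.environ)
    env["HOME"] = virtual_home
    return env


def report_spam(message_path: str, virtual_home: str) -> int:
    """Run `spamassassin -r` on one message file with HOME overridden.

    Returns the exit status of the spamassassin process.
    """
    result = subprocess.run(
        ["spamassassin", "-r", message_path],
        env=report_env(virtual_home),
        check=False,
    )
    return result.returncode


# Example call (hypothetical paths):
# report_spam("message.txt", "/home/vmail/bob")
```

With HOME set this way, the per-user data would be looked up under /home/vmail/bob/.spamassassin/ if the assumption holds.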
Re: Bayes Filtering
On 22 Jul 2015, at 8:18, RW wrote:

> YMMV but personally I've never had a single ham hit BAYES_99. There's currently no evidence to suggest that the OP would have any problem with short-circuiting on it.

Experiences with that absolutely do vary, widely.

Keep in mind that Bayesian classification gives a statistical metric, not a human claim; the delta from 100% isn't a polite warning, it's as hard a fact as statistical prediction can provide, given a valid Bayes DB. 99.00% spam certainty from Bayes will be wrong 1% of the time, on average.

If you've actually NEVER had BAYES_99 hit on ham, you're quite lucky or don't get a lot of ham. If you've never *noticed* it hit on ham because other SA rules fail to push the total score past your threshold, SA is working as designed.

FWIW, a large slice of the certain ham I saw hit BAYES_99 when I was watching a mailstream large enough to make detailed analysis useful was what one might call "boneless canned ham with artificial smoke flavoring, water added": mail addressed to people who had in fact willingly signed up to receive it and that they would never report as spam, but which they didn't really care much about receiving and which most people would believe to be spam if they saw it without knowing that it was intentionally requested. In many cases essentially identical mail *was* outright spam, e.g. social network invites.
Re: Bayes Filtering
>>> On 22.07.15 10:09, Reindl Harald wrote:
>>>> i doubt that you really want that and even if for sure not for BAYES_99 but BAYES_999, it makes no sense - bayes alone is not the only decision in a scoring system, it's one component
>>>>
>>>> that said from someone scoring BAYES_999 with 7.9 while milter-reject is 8.0 - the other rules are there to avoid false-positives and false-negatives for a good reason

>> On 22.07.2015 at 13:40, Matus UHLAR - fantomas wrote:
>>> So THIS explains, why you blame (us) for every single low-scoring rule for hitting something you don't like!

On 22.07.15 14:01, Reindl Harald wrote:
> completely untrue, if something hits BAYES_999 i expect it to get rejected by a corpus of currently 35000 spam samples, 25000 ham samples and a total of 2.5 million tokens handtrained, while a default autolearning/autoexpire setup purges anything above 15 tokens so that you are running in circles if already trained junk comes back after two months which happens regularly
>
> a FP is a FP and in doubt questionable, always

I'm talking about a few cases where you were complaining about low scoring rules, for example DCC (don't remember others).

> no idea why on this list any questions are blaming

because of the way you have asked about them ;-)

>>> however, for the OP it is another reason not even to score high on BAYES_*
>
> for the OP the shortcircuit is questionable because with low scoring BAYES_* and skip all other rules because shortcircuit he won't get useful results

the shortcircuiting on BAYES_00 and BAYES_99(9) is questionable no matter what score those rules have...

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
I'm not interested in your website anymore. If you need cookies, bake them yourself.
Re: Bayes Filtering
On 22.07.2015 at 05:05, Roman Gelfand wrote:
> shortcircuit BAYES_99 spam
> shortcircuit BAYES_00 ham

i doubt that you really want that and even if for sure not for BAYES_99 but BAYES_999, it makes no sense - bayes alone is not the only decision in a scoring system, it's one component

that said from someone scoring BAYES_999 with 7.9 while milter-reject is 8.0 - the other rules are there to avoid false-positives and false-negatives for a good reason
Re: Bayes Filtering
On 22.07.2015 at 13:40, Matus UHLAR - fantomas wrote:
>>> On 22.07.2015 at 05:05, Roman Gelfand wrote:
>>>> shortcircuit BAYES_99 spam
>>>> shortcircuit BAYES_00 ham
>
>> On 22.07.15 10:09, Reindl Harald wrote:
>>> i doubt that you really want that and even if for sure not for BAYES_99 but BAYES_999, it makes no sense - bayes alone is not the only decision in a scoring system, it's one component
>>>
>>> that said from someone scoring BAYES_999 with 7.9 while milter-reject is 8.0 - the other rules are there to avoid false-positives and false-negatives for a good reason
>
> So THIS explains, why you blame (us) for every single low-scoring rule for hitting something you don't like!

completely untrue, if something hits BAYES_999 i expect it to get rejected by a corpus of currently 35000 spam samples, 25000 ham samples and a total of 2.5 million tokens handtrained, while a default autolearning/autoexpire setup purges anything above 15 tokens so that you are running in circles if already trained junk comes back after two months which happens regularly

a FP is a FP and in doubt questionable, always

no idea why on this list any questions are blaming

> however, for the OP it is another reason not even to score high on BAYES_*

for the OP the shortcircuit is questionable because with low scoring BAYES_* and skip all other rules because shortcircuit he won't get useful results
Re: Bayes Filtering
On 22.07.2015 at 14:18, RW wrote:
> On Wed, 22 Jul 2015 13:40:12 +0200 Matus UHLAR - fantomas wrote:
>>>> On 22.07.2015 at 05:05, Roman Gelfand wrote:
>>>>> shortcircuit BAYES_99 spam
>>>>> shortcircuit BAYES_00 ham
>>
>>> On 22.07.15 10:09, Reindl Harald wrote:
>>>> i doubt that you really want that and even if for sure not for BAYES_99 but BAYES_999, it makes no sense - bayes alone is not the only decision in a scoring system, it's one component
>>>>
>>>> that said from someone scoring BAYES_999 with 7.9 while milter-reject is 8.0 - the other rules are there to avoid false-positives and false-negatives for a good reason
>>
>> So THIS explains, why you blame (us) for every single low-scoring rule for hitting something you don't like!
>
> It really doesn't if you think about it. What does explain it is his increased score for BAYES_50, and an increase in some non-Bayes scores

which doesn't change the fact that in cases where a rule hits more ham than spam, or around 50% in both directions, questions about it are legit, but that's a completely different topic

>> however, for the OP it is another reason not even to score high on BAYES_*
>
> YMMV but personally I've never had a single ham hit BAYES_99. There's currently no evidence to suggest that the OP would have any problem with short-circuiting on it

well, if someone would read the manuals before talking about scoring high on BAYES_* he would know that it does *not* matter at all in the context of the OP, because BAYES_99 would result in 100 points and BAYES_00 in -100 points by skipping all other non-dns rules, and so BAYES_00 and BAYES_999 become the final result

https://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Plugin_Shortcircuit.html
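A toy model of the short-circuit behaviour described above, not SpamAssassin's real evaluation loop. It assumes the Shortcircuit plugin's stock values of 100 and -100 for `shortcircuit_spam_score` and `shortcircuit_ham_score`, assumes short-circuited rules are looked at before everything else, and ignores the detail about DNS rules. The function name and the per-rule scores in the usage lines are hypothetical.

```python
from typing import Dict, List, Tuple

SHORTCIRCUIT_SPAM_SCORE = 100.0   # assumed plugin default
SHORTCIRCUIT_HAM_SCORE = -100.0   # assumed plugin default


def evaluate(
    hits: List[str],
    scores: Dict[str, float],
    shortcircuit: Dict[str, str],
) -> Tuple[float, List[str]]:
    """Return (total score, rules that were counted).

    Rules listed in `shortcircuit` are considered first; the first one that
    hits replaces its normal score with +/-100 and ends the evaluation.
    """
    # Stable sort: short-circuit rules move to the front, the rest keep their order.
    ordered = sorted(hits, key=lambda rule: rule not in shortcircuit)
    total = 0.0
    counted: List[str] = []
    for rule in ordered:
        kind = shortcircuit.get(rule)
        if kind == "spam":
            return total + SHORTCIRCUIT_SPAM_SCORE, counted + [rule]
        if kind == "ham":
            return total + SHORTCIRCUIT_HAM_SCORE, counted + [rule]
        total += scores.get(rule, 0.0)
        counted.append(rule)
    return total, counted


# Hypothetical scores; the rule names come from the X-Spam-Status example in this thread.
scores = {"BAYES_99": 3.5, "HTML_MESSAGE": 0.001, "DKIM_VALID": -0.1}
op_config = {"BAYES_99": "spam", "BAYES_00": "ham"}
hits = ["DKIM_VALID", "HTML_MESSAGE", "BAYES_99"]

print(evaluate(hits, scores, op_config))  # BAYES_99 alone decides the result
print(evaluate(hits, scores, {}))         # without shortcircuit every rule counts
```

In this model the score configured for BAYES_99 never reaches the total once the rule is short-circuited, which is the point being made about the OP's setup.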
Re: Bayes Filtering
On Wed, 22 Jul 2015 03:31:04 + Roman Gelfand wrote:

> I think the issue was because I never ran sa-learn --sync.

That only matters if you set bayes_learn_to_journal 1
Re: Bayes Filtering
On 22.07.2015 at 15:52, Matus UHLAR - fantomas wrote:
>>>> On 22.07.15 10:09, Reindl Harald wrote:
>>>>> i doubt that you really want that and even if for sure not for BAYES_99 but BAYES_999, it makes no sense - bayes alone is not the only decision in a scoring system, it's one component
>>>>>
>>>>> that said from someone scoring BAYES_999 with 7.9 while milter-reject is 8.0 - the other rules are there to avoid false-positives and false-negatives for a good reason
>
>>> On 22.07.2015 at 13:40, Matus UHLAR - fantomas wrote:
>>>> So THIS explains, why you blame (us) for every single low-scoring rule for hitting something you don't like!
>
>> On 22.07.15 14:01, Reindl Harald wrote:
>>> completely untrue, if something hits BAYES_999 i expect it to get rejected by a corpus of currently 35000 spam samples, 25000 ham samples and a total of 2.5 million tokens handtrained, while a default autolearning/autoexpire setup purges anything above 15 tokens so that you are running in circles if already trained junk comes back after two months which happens regularly
>>>
>>> a FP is a FP and in doubt questionable, always
>
> I'm talking about a few cases you were complaining about low scoring rules, for example DCC (don't remember others).

because there is no good justification to give a legit double-optin newsletter a penalty just because it is a newsletter, and i think that i've explained that well in the thread you refer to, and since i don't use DCC but wondered that it works that way in the thread

"So THIS explains, why you blame (us)" is completely wrong from the beginning in the context of "said from someone scoring BAYES_999 with 7.9"

>>> no idea why on this list any questions are blaming
>
> because of the way how you have asked about them ;-)

people often are hypersensitive..

>>>> however, for the OP it is another reason not even to score high on BAYES_*
>
>>> for the OP the shortcircuit is questionable because with low scoring BAYES_* and skip all other rules because shortcircuit he won't get useful results
>
> the shortcircuiting on BAYES_00 and BAYES_99(9) is questionable no matter what score those rules have...

and now explain to me what all that "why you blame (us)..." crap was about, when in fact your response could have been agreed with?