Re: Bayes Filtering

2015-07-22 Thread RW
On Wed, 22 Jul 2015 09:52:10 -0400
Bill Cole wrote:

 On 22 Jul 2015, at 8:18, RW wrote:
 
  YMMV but personally I've never had a single ham hit BAYES_99.
  There's currently no evidence to suggest that the OP would have any
  problem with short-circuiting on it.
 
 Experiences with that absolutely do vary, widely. 

That's rather my point.

 Keep in mind that 
 Bayesian classification gives a statistical metric, not a human
 claim; the delta from 100% isn't a polite warning, it's as hard a
 fact as statistical prediction can provide, given a valid Bayes DB.
 99.00% spam certainty from Bayes will be wrong 1% of the time, on
 average.

This is at best a naive executive summary. None of the above is really
true.

  If you've actually NEVER had BAYES_99 hit on ham, you're
 quite lucky or don't get a lot of ham.

I get enough to know that for me the upper limit to the FP rate on
BAYES_99 is negligible compared with the FP rate for SA as a whole. 

What's important is to compare the FP rate increase that would be caused
by raising the score of BAYES_99 with the SA FP rate caused by the rule
rescoring and custom rules that were added to avoid the FNs in spam
that hit BAYES_99. It's also useful to repeat that analysis without
the FPs that no-one cares about.

Unless you have done this you don't really know whether increasing the
score of BAYES_99 is a good idea or not.
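
The comparison described above could be sketched roughly as follows. This assumes a hand-classified corpus in which each message is recorded as a spam/ham label plus the rules it hit and their scores. The record layout, the function names, the rule CUSTOM_RULE and all scores are hypothetical illustrations, not something SpamAssassin produces in this form.

```python
from typing import Dict, Iterable, Optional, Tuple

# One record per message: (is_spam, {rule_name: score}). Hypothetical layout.
Message = Tuple[bool, Dict[str, float]]


def total_score(hits: Dict[str, float], overrides: Dict[str, float]) -> float:
    """Sum the rule scores, replacing the score of any rule listed in overrides."""
    return sum(overrides.get(rule, score) for rule, score in hits.items())


def fp_fn_rates(corpus: Iterable[Message], threshold: float,
                overrides: Optional[Dict[str, float]] = None) -> Tuple[float, float]:
    """Return (FP rate over ham, FN rate over spam) for one score set."""
    overrides = overrides or {}
    ham = spam = fp = fn = 0
    for is_spam, hits in corpus:
        flagged = total_score(hits, overrides) >= threshold
        if is_spam:
            spam += 1
            fn += not flagged   # spam that stayed below the threshold
        else:
            ham += 1
            fp += flagged       # ham that reached the threshold
    return (fp / ham if ham else 0.0, fn / spam if spam else 0.0)


# Tiny made-up corpus, only to show the shape of the comparison.
corpus = [
    (True, {"BAYES_99": 3.5, "HTML_MESSAGE": 0.001}),
    (True, {"BAYES_99": 3.5, "CUSTOM_RULE": 2.0}),
    (False, {"BAYES_00": -1.9}),
    (False, {"CUSTOM_RULE": 2.0, "OTHER_RULE": 3.2}),
]

# Current setup: low BAYES_99 score plus a custom rule added to catch FNs.
baseline = fp_fn_rates(corpus, threshold=5.0)
# Alternative: raise BAYES_99 and drop the custom rule.
alternative = fp_fn_rates(corpus, threshold=5.0,
                          overrides={"BAYES_99": 5.0, "CUSTOM_RULE": 0.0})
```

Running both variants over the same real corpus, and again with the uninteresting FPs filtered out, gives the two pairs of rates to compare. The numbers from the toy corpus above mean nothing in themselves.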




Re: Bayes Filtering

2015-07-22 Thread Matus UHLAR - fantomas

On 22.07.2015 at 05:05, Roman Gelfand wrote:

shortcircuit BAYES_99 spam
shortcircuit BAYES_00 ham


On 22.07.15 10:09, Reindl Harald wrote:
i doubt that you really want that and even if for sure not for 
BAYES_99 but BAYES_999, it makes no sense - bayes alone is not the 
only decision in a scoring system, it's one component


that said from someone scoring BAYES_999 with 7.9 while milter-reject 
is 8.0 - the other rules are there to avoid false-positives and 
false-negatives for a good reason


So THIS explains, why you blame (us) for every single low-scoring rule for
hitting something you don't like!

however, for the OP it is another reason not even to score high on BAYES_*

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Linux - It's now safe to turn on your computer.
Linux - Teraz mozete pocitac bez obav zapnut.


Re: Report spam to Razor

2015-07-22 Thread Matus UHLAR - fantomas

On 21.07.15 21:31, Bill Shirley wrote:

I'm looking into modifying my spam processing script so it will report spam to 
Razor.


IIRC Razor says it should only be fed manually (FYI)


From the Spamassassin Wiki: https://wiki.apache.org/spamassassin/ReportingSpam
I should use:
spamassassin -r < message.txt
It states "The message will also be submitted to SpamAssassin's learning
systems." Looking at the parms for spamassassin there is no --dbpath like
there is for sa-learn. Does it in fact train the Bayes DB and if so why is
there no way to specify --dbpath ?


that's because spamassassin is not sa-learn. you even should have your
db_path in your SA config.


I'm using per user Bayes and have some vmail accounts so the --dbpath is not
/home/vmail/.spamassassin

Also 'spamassassin --help' says:
Usage:
   spamassassin [options] [ < *mailmessage* | *path* ... ]

Does that mean I can use a directory: spamassassin -r <
/home/bob/Maildir/.Spam/ ?


No: it explicitly says you can only use < with a message; you must specify
a path without the <.
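
One way to report a whole folder while sticking to the one-message-on-stdin form quoted from the wiki is to loop over the files and start one `spamassassin -r` per message. The following is only a sketch: the helper names are made up, the Maildir layout (cur/ and new/ subdirectories) is an assumption about the folder, and the dry-run default is there so nothing gets reported by accident.

```python
import subprocess
from pathlib import Path
from typing import List


def message_files(maildir_folder: str) -> List[Path]:
    """List regular files under the cur/ and new/ subdirectories of a Maildir folder."""
    folder = Path(maildir_folder)
    files: List[Path] = []
    for sub in ("cur", "new"):
        subdir = folder / sub
        if subdir.is_dir():
            files.extend(p for p in sorted(subdir.iterdir()) if p.is_file())
    return files


def report_folder(maildir_folder: str, dry_run: bool = True) -> List[Path]:
    """Feed each message to `spamassassin -r` on stdin; return the files handled."""
    handled: List[Path] = []
    for path in message_files(maildir_folder):
        if not dry_run:
            with path.open("rb") as msg:
                # Equivalent to: spamassassin -r < path
                subprocess.run(["spamassassin", "-r"], stdin=msg, check=True)
        handled.append(path)
    return handled
```

Calling `report_folder("/home/bob/Maildir/.Spam")` only lists what would be reported; pass `dry_run=False` to actually run the command.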

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
We are but packets in the Internet of life (userfriendly.org)


Re: Bayes Filtering

2015-07-22 Thread RW
On Wed, 22 Jul 2015 13:40:12 +0200
Matus UHLAR - fantomas wrote:

 On 22.07.2015 at 05:05, Roman Gelfand wrote:
 shortcircuit BAYES_99 spam
 shortcircuit BAYES_00 ham
 
 On 22.07.15 10:09, Reindl Harald wrote:
 i doubt that you really want that and even if for sure not for 
 BAYES_99 but BAYES_999, it makes no sense - bayes alone is not the 
 only decision in a scoring system, it's one component
 
 that said from someone scoring BAYES_999 with 7.9 while
 milter-reject is 8.0 - the other rules are there to avoid
 false-positives and false-negatives for a good reason
 
 So THIS explains, why you blame (us) for every single low-scoring
 rule for hitting something you don't like!

It really doesn't if you think about it. What does explain it is his 
increased score for BAYES_50, and an increase in some non-Bayes scores.

 however, for the OP it is another reason not even to score high on
 BAYES_*


YMMV but personally I've never had a single ham hit BAYES_99. There's
currently no evidence to suggest that the OP would have any problem
with short-circuiting on it. 


Re: DKIM, SPF and Bayesian Learning

2015-07-22 Thread Kevin A. McGrail

On 7/21/2015 8:55 PM, Roman Gelfand wrote:
It seems that if DKIM or SPF is verified, the bayesian learning 
doesn't matter.


X-Spam-Status: No, score=3.6 required=5.0 tests=BAYES_99,BAYES_999,DKIM_SIGNED,
DKIM_VALID,DKIM_VALID_AU,HTML_MESSAGE,SPF_PASS autolearn=no
version=3.3.2

If you mean autolearn, it requires a mixture of body and header rules.
Almost all of the rules hit appear to be header rules.


Normally, SpamAssassin will require 3 points from the header and 3 
points from the body to be auto-learned as spam. 
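
As a rough illustration of that rule, here is a simplified model of the decision. The thresholds 0.1 and 12.0 are the documented defaults of bayes_auto_learn_threshold_nonspam and bayes_auto_learn_threshold_spam. The function name and its inputs are made up for this sketch, and the real plugin computes the learning score differently from the final message score, so treat this as an approximation rather than the plugin's actual logic.

```python
def autolearn_decision(learn_score: float, header_points: float, body_points: float,
                       threshold_nonspam: float = 0.1,
                       threshold_spam: float = 12.0) -> str:
    """Return 'spam', 'ham' or 'no' for a simplified autolearn check."""
    if learn_score >= threshold_spam:
        # Spam is only learned with enough evidence from both header and body.
        if header_points >= 3.0 and body_points >= 3.0:
            return "spam"
        return "no"
    if learn_score < threshold_nonspam:
        return "ham"
    return "no"
```

With these defaults, a message whose learning score sits between the two thresholds is never auto-learned, and a high-scoring message that hits almost only header rules is not learned as spam either.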



See perldoc for Mail::SpamAssassin::Plugin::AutoLearnThreshold and 
Mail::SpamAssassin::Conf


Regards,
KAM


Re: Report spam to Razor

2015-07-22 Thread RW
On Tue, 21 Jul 2015 21:31:57 -0400
Bill Shirley wrote:

 I'm looking into modifying my spam processing script so it will
 report spam to Razor. From the Spamassassin Wiki:
 https://wiki.apache.org/spamassassin/ReportingSpam I should use:
   spamassassin -r < message.txt
 It states "The message will also be submitted to SpamAssassin's
 learning systems." Looking at the parms for spamassassin there is
 no --dbpath like there is for sa-learn.
 
 Does it in fact train the Bayes DB and if so why is there no way to
 specify --dbpath ?  I'm using per user Bayes and have some vmail
 accounts so the --dbpath is not /home/vmail/.spamassassin

I'm not sure what you mean by vmail, but if you are using virtual home
directories you can probably work around it by setting HOME.

That's how I use sa-learn, which looks in $HOME/.spamassassin/ rather
than the actual unix home directory. I would expect the spamassassin
script to do the same thing.
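
A minimal sketch of that workaround, for a script that calls sa-learn on behalf of a virtual user. The helper name and the example directory are hypothetical; the only thing taken from the above is that HOME is overridden for the child process so that $HOME/.spamassassin/ points at the virtual user's data.

```python
import os
import subprocess
from typing import Dict, List, Tuple


def sa_learn_call(virtual_home: str, message_path: str,
                  as_spam: bool = True) -> Tuple[List[str], Dict[str, str]]:
    """Build the argv and environment for sa-learn with HOME overridden."""
    env = dict(os.environ)       # copy, so the parent environment is untouched
    env["HOME"] = virtual_home   # sa-learn then looks in $HOME/.spamassassin/
    argv = ["sa-learn", "--spam" if as_spam else "--ham", message_path]
    return argv, env


def run_sa_learn(virtual_home: str, message_path: str, as_spam: bool = True) -> None:
    """Run sa-learn for one message under the given virtual home."""
    argv, env = sa_learn_call(virtual_home, message_path, as_spam)
    subprocess.run(argv, env=env, check=True)
```

For example, `run_sa_learn("/home/vmail/bob", "message.txt")` would train as spam under that (assumed) directory. Whether `spamassassin -r` honours HOME in the same way is the expectation stated above, not something this sketch verifies.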


Re: Bayes Filtering

2015-07-22 Thread Bill Cole

On 22 Jul 2015, at 8:18, RW wrote:


YMMV but personally I've never had a single ham hit BAYES_99. There's
currently no evidence to suggest that the OP would have any problem
with short-circuiting on it.


Experiences with that absolutely do vary, widely. Keep in mind that 
Bayesian classification gives a statistical metric, not a human claim; 
the delta from 100% isn't a polite warning, it's as hard a fact as 
statistical prediction can provide, given a valid Bayes DB. 99.00% spam 
certainty from Bayes will be wrong 1% of the time, on average. If you've 
actually NEVER had BAYES_99 hit on ham, you're quite lucky or don't get 
a lot of ham. If you've never *noticed* it hit on ham because other SA 
rules fail to push the total score past your threshold, SA is working as 
designed.


FWIW, a large slice of the certain ham I saw hit BAYES_99 when I was 
watching a mailstream large enough to make detailed analysis useful was 
what one might call boneless canned ham with artificial smoke 
flavoring, water added: mail addressed to people who had in fact willingly
signed up to receive it and that they would never report as spam, but
which they didn't really care much about receiving and which most people 
would believe to be spam if they saw it without knowing the fact that it 
was intentionally requested. In many cases essentially identical mail 
*was* outright spam, e.g. social network invites.


Re: Bayes Filtering

2015-07-22 Thread Matus UHLAR - fantomas

On 22.07.15 10:09, Reindl Harald wrote:

i doubt that you really want that and even if for sure not for
BAYES_99 but BAYES_999, it makes no sense - bayes alone is not the
only decision in a scoring system, it's one component

that said from someone scoring BAYES_999 with 7.9 while milter-reject
is 8.0 - the other rules are there to avoid false-positives and
false-negatives for a good reason



On 22.07.2015 at 13:40, Matus UHLAR - fantomas wrote:

So THIS explains, why you blame (us) for every single low-scoring rule for
hitting something you don't like!


On 22.07.15 14:01, Reindl Harald wrote:
completely untrue, if something hits BAYES_999 i expect it to get
rejected by a corpus of currently 35000 spam samples, 25000 ham
samples and a total of 2.5 million tokens


handtrained, while a default autolearning/autoexpire setup purges
anything above 15 tokens so that you are running in circles if
already trained junk comes back after two months which happens
regularly


a FP is a FP and in doubt questionable, always


I'm talking about a few cases you were complaining about low scoring rules,
for example DCC (don't remember others).


no idea why on this list any questions are blaming


because of the way how you have asked about them ;-)


however, for the OP it is another reason not even to score high on BAYES_*


for the OP the shortcircuit is questionable because with low scoring 
BAYES_* and skip all other rules because shortcircuit he won't get 
useful results


the shortcircuiting on BAYES_00 and BAYES_99(9) is questionable no matter
what score those rules have...

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
I'm not interested in your website anymore.
If you need cookies, bake them yourself.


Re: Bayes Filtering

2015-07-22 Thread Reindl Harald


On 22.07.2015 at 05:05, Roman Gelfand wrote:

shortcircuit BAYES_99 spam
shortcircuit BAYES_00 ham


i doubt that you really want that and even if for sure not for BAYES_99 
but BAYES_999, it makes no sense - bayes alone is not the only decision 
in a scoring system, it's one component


that said from someone scoring BAYES_999 with 7.9 while milter-reject is 
8.0 - the other rules are there to avoid false-positives and 
false-negatives for a good reason







Re: Bayes Filtering

2015-07-22 Thread Reindl Harald


On 22.07.2015 at 13:40, Matus UHLAR - fantomas wrote:

On 22.07.2015 at 05:05, Roman Gelfand wrote:

shortcircuit BAYES_99 spam
shortcircuit BAYES_00 ham


On 22.07.15 10:09, Reindl Harald wrote:

i doubt that you really want that and even if for sure not for
BAYES_99 but BAYES_999, it makes no sense - bayes alone is not the
only decision in a scoring system, it's one component

that said from someone scoring BAYES_999 with 7.9 while milter-reject
is 8.0 - the other rules are there to avoid false-positives and
false-negatives for a good reason


So THIS explains, why you blame (us) for every single low-scoring rule for
hitting something you don't like!


completely untrue, if something hits BAYES_999 i expect it to get
rejected by a corpus of currently 35000 spam samples, 25000 ham samples
and a total of 2.5 million tokens


handtrained, while a default autolearning/autoexpire setup purges
anything above 15 tokens so that you are running in circles if
already trained junk comes back after two months which happens regularly


a FP is a FP and in doubt questionable, always
no idea why on this list any questions are blaming


however, for the OP it is another reason not even to score high on BAYES_*


for the OP the shortcircuit is questionable because with low scoring 
BAYES_* and skip all other rules because shortcircuit he won't get 
useful results






Re: Bayes Filtering

2015-07-22 Thread Reindl Harald



On 22.07.2015 at 14:18, RW wrote:

On Wed, 22 Jul 2015 13:40:12 +0200
Matus UHLAR - fantomas wrote:


On 22.07.2015 at 05:05, Roman Gelfand wrote:

shortcircuit BAYES_99 spam
shortcircuit BAYES_00 ham


On 22.07.15 10:09, Reindl Harald wrote:

i doubt that you really want that and even if for sure not for
BAYES_99 but BAYES_999, it makes no sense - bayes alone is not the
only decision in a scoring system, it's one component

that said from someone scoring BAYES_999 with 7.9 while
milter-reject is 8.0 - the other rules are there to avoid
false-positives and false-negatives for a good reason


So THIS explains, why you blame (us) for every single low-scoring
rule for hitting something you don't like!


It really doesn't if you think about it. What does explain it is his
increased score for BAYES_50, and an increase in some non-Bayes scores


which doesn't change the fact that when a rule hits more ham than spam,
or around 50% in both directions, questions about it are legit


but that's a completely different topic


however, for the OP it is another reason not even to score high on
BAYES_*


YMMV but personally I've never had a single ham hit BAYES_99. There's
currently no evidence to suggest that the OP would have any problem
with short-circuiting on it


well, if someone would read the manuals before talking about scoring high
on BAYES_* he would know that it does *not* matter at all in the context
of the OP, because BAYES_99 would result in 100 points and BAYES_00 in -100
points by skipping all other non-dns rules, and so BAYES_00 and BAYES_999
become the final result


https://spamassassin.apache.org/full/3.4.x/doc/Mail_SpamAssassin_Plugin_Shortcircuit.html
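
To make that point concrete, here is a toy model of the effect. It is not the plugin's implementation: the function and rule scores are invented for illustration, and it only assumes what the linked documentation describes, namely that a rule marked `spam` or `ham` for shortcircuiting has its score overridden by shortcircuit_spam_score (default 100) or shortcircuit_ham_score (default -100), and that the remaining rules are skipped once it hits.

```python
from typing import Dict, List, Optional, Tuple

SHORTCIRCUIT_SPAM_SCORE = 100.0   # documented default of shortcircuit_spam_score
SHORTCIRCUIT_HAM_SCORE = -100.0   # documented default of shortcircuit_ham_score


def evaluate(hits_in_order: List[str], scores: Dict[str, float],
             shortcircuit: Dict[str, str]) -> Tuple[float, Optional[str]]:
    """Sum scores in evaluation order, stopping at the first shortcircuit rule.

    Returns (total score, name of the rule that shortcircuited or None).
    """
    total = 0.0
    for rule in hits_in_order:
        kind = shortcircuit.get(rule)
        if kind == "spam":
            return total + SHORTCIRCUIT_SPAM_SCORE, rule
        if kind == "ham":
            return total + SHORTCIRCUIT_HAM_SCORE, rule
        total += scores.get(rule, 0.0)
    return total, None


# The configured BAYES_99 score (3.5 here, made up) never enters the result.
result = evaluate(
    ["BAYES_99", "HTML_MESSAGE", "SPF_PASS"],
    {"BAYES_99": 3.5, "HTML_MESSAGE": 0.001, "SPF_PASS": -0.001},
    {"BAYES_99": "spam", "BAYES_00": "ham"},
)
```

In this model the score assigned to BAYES_99 or BAYES_00 in the normal score set is irrelevant once the rule is shortcircuited, which is the argument being made above.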






Re: Bayes Filtering

2015-07-22 Thread RW
On Wed, 22 Jul 2015 03:31:04 +
Roman Gelfand wrote:

 I think the issue was because I never ran sa-learn --sync.

That only matters if you set 

bayes_learn_to_journal 1





Re: Bayes Filtering

2015-07-22 Thread Reindl Harald



On 22.07.2015 at 15:52, Matus UHLAR - fantomas wrote:

On 22.07.15 10:09, Reindl Harald wrote:

i doubt that you really want that and even if for sure not for
BAYES_99 but BAYES_999, it makes no sense - bayes alone is not the
only decision in a scoring system, it's one component

that said from someone scoring BAYES_999 with 7.9 while milter-reject
is 8.0 - the other rules are there to avoid false-positives and
false-negatives for a good reason



On 22.07.2015 at 13:40, Matus UHLAR - fantomas wrote:

So THIS explains, why you blame (us) for every single low-scoring
rule for
hitting something you don't like!


On 22.07.15 14:01, Reindl Harald wrote:

completely untrue, if something hits BAYES_999 i expect it to get
rejected by a corpus of currently 35000 spam samples, 25000 ham
samples and a total of 2.5 million tokens

handtrained, while a default autolearning/autoexpire setup purges
anything above 15 tokens so that you are running in circles if
already trained junk comes back after two months which happens regularly

a FP is a FP and in doubt questionable, always


I'm talking about a few cases you were complaining about low scoring rules,
for example DCC (don't remember others).


because there is no good justification to give a legit double-optin
newsletter a penalty just because it is a newsletter and i think that
i've explained that well in the thread you refer to


and since i don't use DCC but wondered in that thread why it works that
way, "So THIS explains, why you blame (us)" is completely wrong from
the beginning in the context of "said from someone scoring BAYES_999 with 7.9"



no idea why on this list any questions are blaming


because of the way how you have asked about them ;-)


people often are hypersensitive..


however, for the OP it is another reason not even to score high on
BAYES_*


for the OP the shortcircuit is questionable because with low scoring
BAYES_* and skip all other rules because shortcircuit he won't get
useful results


the shortcircuiting on BAYES_00 and BAYES_99(9) is questionable no matter
what score those rules have...


and now explain me what was all that crap why you blame (us)... about 
when in fact your response could have been agreed?



