Re: URIBL_PH_SURBL

2011-12-08 Thread Jeff Chan
On Thursday, December 1, 2011, 10:11:35 AM, Darxus Darxus wrote:
> On 12/01, Jeff Chan wrote:
> > Also keep in mind that PH has a generally low score even for net
> > + bayes since it doesn't hit a large portion of spam in the SA
> > corpus.
>
> No.  Scores are not determined by how many spams a rule hits.  Scores are
> automatically generated to correctly flag as many spams as possible
> without exceeding 1 false positive in every 2500 hams (with a
> required_score of 5).
>
> Stated in
> http://svn.apache.org/repos/asf/spamassassin/trunk/rules/50_scores.cf
> (a file you get via sa-update)
>
> So it's entirely possible to have a rule that hits a very small percentage
> of spam with a very large score.

Thanks for the correction.  I actually knew that but remembered
incorrectly.  :(

Cheers,

Jeff C.
-- 
Jeff Chan
mailto:je...@surbl.org
http://www.surbl.org/



Re: URIBL_PH_SURBL

2011-12-01 Thread Ned Slider

On 01/12/11 08:29, Tom Kinghorn wrote:
> Good morning list.
>
> Could someone possibly explain how the scoring for ph.surbl.org works?
>
> I see the following in my spam logs:
>
> spam-1DSMgl4+-YFV.gz: TO_NO_BRKTS_HTML_ONLY=1.258, URIBL_PH_SURBL=0.001]
> spam-1DSMgl4+-YFV.gz: * 0.0 URIBL_PH_SURBL Contains an URL listed in the PH SURBL blocklist
>
> Why does ph.surbl.org score so low?
>
> I see the rule is defined as:
>
> urirhssub URIBL_PH_SURBL multi.surbl.org. A 8
> body URIBL_PH_SURBL eval:check_uridnsbl('URIBL_PH_SURBL')
> describe URIBL_PH_SURBL Contains an URL listed in the PH SURBL blocklist
> tflags URIBL_PH_SURBL net
> reuse URIBL_PH_SURBL
>
> How does this work?
>
> Thanks
>
> Tom



and the score is defined in 50_scores.cf:

score URIBL_PH_SURBL 0 0.001 0 0.610 # n=0 n=2

These four scores apply, in order, to four configurations: local only (no
network tests, no bayes), net without bayes, bayes without net, and
bayes plus net.

Net means you have network tests enabled, local means you don't have 
network tests enabled.


So because you are showing a score of 0.001, you appear to be using the 
net score set - network tests enabled but no bayes. If you were using 
net and bayes, then this rule would have scored 0.610.


You can override scores locally in local.cf if you want.
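
As a rough sketch of what such an override could look like (the value 2.5
is purely illustrative, not a recommendation; pick one that suits your own
false-positive tolerance):

```
# local.cf
# A single value applies to all four score sets.
# 2.5 is an assumed example value, not taken from this thread.
score URIBL_PH_SURBL 2.5

# Alternatively, list all four values in the same order as 50_scores.cf:
# local, net, bayes, bayes+net. A score of 0 leaves the rule disabled
# in the two sets without network tests, which fits a net rule.
# score URIBL_PH_SURBL 0 2.5 0 2.5
```

Running "spamassassin --lint" afterwards should catch syntax errors, and
if you use spamd it needs a restart before the new score takes effect.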

The scores are generated automatically from the nightly masschecks:

http://wiki.apache.org/spamassassin/NightlyMassCheck

This is obviously dependent upon people contributing data for their spam 
and ham.




Re: URIBL_PH_SURBL

2011-12-01 Thread Jeff Chan
Also keep in mind that PH has a generally low score even for net
+ bayes since it doesn't hit a large portion of spam in the SA
corpus.  (In other words phishing and malware unsolicited
messages are a relatively small subset of unsolicited messages in
general.)  However the unsolicited messages it does hit are
generally going to be phishing or malware, so IMO it should have
a much higher score.  Unless people want to get phishing and
malware 

Cheers,

Jeff C.
-- 
Jeff Chan
mailto:je...@surbl.org
http://www.surbl.org/



Re: URIBL_PH_SURBL

2011-12-01 Thread darxus
On 12/01, Jeff Chan wrote:
> Also keep in mind that PH has a generally low score even for net
> + bayes since it doesn't hit a large portion of spam in the SA
> corpus.

No.  Scores are not determined by how many spams a rule hits.  Scores are
automatically generated to correctly flag as many spams as possible
without exceeding 1 false positive in every 2500 hams (with a
required_score of 5).

Stated in
http://svn.apache.org/repos/asf/spamassassin/trunk/rules/50_scores.cf
(a file you get via sa-update)

So it's entirely possible to have a rule that hits a very small percentage
of spam with a very large score.

-- 
This hurts quite a bit. Very painful.
Think of the sensation as reassurance that you are not dead yet. What
you are feeling is life in you! - Johnny The Homicidal Maniac
http://www.ChaosReigns.com


Re: URIBL_PH_SURBL

2011-12-01 Thread Tom Kinghorn

On 01/12/2011 20:11, dar...@chaosreigns.com wrote:
> So it's entirely possible to have a rule that hits a very small percentage
> of spam with a very large score.



Thank you to all who replied.

It is much clearer now.

regards

Tom