Re: Re: HTML_IMAGE_ONLY_* generating too many FP's

2017-12-06 Thread Mark London

On 12/5/2017 5:28 AM, Sebastian Arcus wrote:

On 02/12/17 18:45, David Jones wrote:

On 12/02/2017 11:22 AM, Sebastian Arcus wrote:

On 02/12/17 13:06, Matus UHLAR - fantomas wrote:

On 12/01/2017 11:17 AM, Sebastian Arcus wrote:

-0.2 RCVD_IN_MSPIKE_H2  RBL: Average reputation (+2)
                        [212.227.126.131 listed in wl.mailspike.net]
 0.4 MIME_HTML_MOSTLY   BODY: Multipart message mostly text/html MIME
 1.6 HTML_IMAGE_ONLY_24 BODY: HTML: images with 2000-2400 bytes of words
 2.0 BAYES_50           BODY: Bayes spam probability is 40 to 60%
                        [score: 0.4808]
 0.8 MPART_ALT_DIFF     BODY: HTML and text parts are different
 0.0 HTML_MESSAGE       BODY: HTML included in message
 2.5 PYZOR_CHECK        Listed in Pyzor (http://pyzor.sf.net/)
-0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, no trust
                        [212.227.126.131 listed in list.dnswl.org]

On 01/12/17 10:54, Axb wrote:
you've changed SA default scores and now complain about one which 
hasn't been touched as cause for FPs?

compare the defaults with yours...
score PYZOR_CHECK 0 1.985 0 1.392 # n=0 n=2
score BAYES_50  0  0  2.0    0.8

Hmm, maybe you should rethink those changes.

On 01.12.17 12:23, Sebastian Arcus wrote:
Indeed, I did amend some of the default SA scores, to catch more 
spam for the type of email received at this particular site. That 
doesn't change the fact that 1.6 seems to me a pretty high score 
for a rule which would be triggered on such a large number of ham 
emails. Just saying.
You should understand that when you start tuning scores, you can 
get to hell very fast, unless you do your own mass-checks and tune 
according to them.
I'm not too sure I understand this attitude. The whole reason I 
started to tweak the scores for certain rules is that too much spam 
was going through. The false negatives have gone down considerably 
since I have altered the scores - and yes, I do keep an eye on them 
constantly and adjust depending on the number of false positives and 
negatives, and what triggers what. I also use network tests / RBL's 
as well and Bayes. The simple fact of the matter is that on plenty 
of spam emails, only one significant rule might get triggered - be 
it a high bayes score, one of the DNS RBL's or something else. If 
the rule doesn't have a high enough score, the email passes through.


Spammers change their tactics and content of their emails all the 
time - and the rule scores haven't been updated in months - because 
of the problems with the updating system (which is not a criticism - 
I understand the situation). So for people to advise sticking 
religiously to the default scores, well, frankly I don't get it.
The rulesets and dynamic scores in 72_scores.cf have been updating 
again for the past 2 weeks.
I recommend only changing a few of the default scores and make meta 
rules that combine the hits to add points when you see a pattern of 2 
or more rules being hit.
If you add enough add-ons to your SA instance, then you shouldn't be 
impacted too much by the default scores.  SA has to be generic out of 
the box to cover all types of mail flow.  You have to tune it a bit 
for your particular recipients, language, and location.  See my email 
moments ago about tuning suggestions.
I used to constantly adjust scores to react to new spam campaigns but 
found I was always behind the spammers.  The more RBLs and meta rules 
you can set up, the more you can stay ahead of them.  Compromised 
accounts are the exception to this, with zero-hour spam that is very 
difficult to block, so try to keep that separate in your mind and not 
chase after those with score adjustments. These tend to stop 
automatically after 30 minutes or so, when RBLs and DCC catch up to 
them or the account gets locked or its password changed.  I report 
these to Spamcop as quickly as I can.

Thank you David. Those are useful tips.


I have also encountered FPs due to the scores of all the 
HTML_IMAGE_ONLY_* rules.  I have changed their score to be 0.001. I have 
meta rules that combine __HTML_IMG_ONLY with the RBLs, and I've found 
that to be useful.   But for some reason, __HTML_IMG_ONLY does not 
include HTML_IMAGE_ONLY_32.   Is there any reason that this was left out?


- Mark
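
A minimal local.cf sketch of the approach described above: keep the HTML_IMAGE_ONLY_* rules near zero and add points only when an image-heavy message also hits a blocklist. The rule name LOCAL_IMG_ONLY_RBL, the choice of RCVD_IN_XBL as the blocklist rule, and the 1.5 score are illustrative assumptions, not values from this thread. HTML_IMAGE_ONLY_32 is OR-ed in explicitly because, as noted above, __HTML_IMG_ONLY does not cover it.

```
# Near-zero rather than 0: a rule scored exactly 0 is disabled,
# so a meta rule that depends on it would no longer see it hit.
# Repeat for each HTML_IMAGE_ONLY_* rule in use.
score HTML_IMAGE_ONLY_24  0.001
score HTML_IMAGE_ONLY_32  0.001

# Hypothetical combined rule: image-heavy HTML plus a blocklist hit.
meta      LOCAL_IMG_ONLY_RBL  ((__HTML_IMG_ONLY || HTML_IMAGE_ONLY_32) && RCVD_IN_XBL)
describe  LOCAL_IMG_ONLY_RBL  Image-heavy HTML from a blocklisted relay
score     LOCAL_IMG_ONLY_RBL  1.5
```

Running `spamassassin --lint` after editing should flag a misspelled rule name in the meta expression.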



Re: HTML_IMAGE_ONLY_* generating too many FP's

2017-12-05 Thread Sebastian Arcus


On 02/12/17 18:45, David Jones wrote:

On 12/02/2017 11:22 AM, Sebastian Arcus wrote:


On 02/12/17 13:06, Matus UHLAR - fantomas wrote:

On 12/01/2017 11:17 AM, Sebastian Arcus wrote:

-0.2 RCVD_IN_MSPIKE_H2  RBL: Average reputation (+2)
                        [212.227.126.131 listed in wl.mailspike.net]
 0.4 MIME_HTML_MOSTLY   BODY: Multipart message mostly text/html MIME
 1.6 HTML_IMAGE_ONLY_24 BODY: HTML: images with 2000-2400 bytes of words
 2.0 BAYES_50           BODY: Bayes spam probability is 40 to 60%
                        [score: 0.4808]
 0.8 MPART_ALT_DIFF     BODY: HTML and text parts are different
 0.0 HTML_MESSAGE       BODY: HTML included in message
 2.5 PYZOR_CHECK        Listed in Pyzor (http://pyzor.sf.net/)
-0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, no trust
                        [212.227.126.131 listed in list.dnswl.org]



On 01/12/17 10:54, Axb wrote:
you've changed SA default scores and now complain about one which 
hasn't been touched as cause for FPs?


compare the defaults with yours...
score PYZOR_CHECK 0 1.985 0 1.392 # n=0 n=2
score BAYES_50  0  0  2.0    0.8

Hmm, maybe you should rethink those changes.


On 01.12.17 12:23, Sebastian Arcus wrote:
Indeed, I did amend some of the default SA scores, to catch more 
spam for the type of email received at this particular site. That 
doesn't change the fact that 1.6 seems to me a pretty high score for 
a rule which would be triggered on such a large number of ham 
emails. Just saying.


You should understand that when you start tuning scores, you can get 
to hell very fast, unless you do your own mass-checks and tune 
according to them.


I'm not too sure I understand this attitude. The whole reason I 
started to tweak the scores for certain rules is that too much spam 
was going through. The false negatives have gone down considerably 
since I have altered the scores - and yes, I do keep an eye on them 
constantly and adjust depending on the number of false positives and 
negatives, and what triggers what. I also use network tests / RBL's as 
well and Bayes. The simple fact of the matter is that on plenty of 
spam emails, only one significant rule might get triggered - be it a 
high bayes score, one of the DNS RBL's or something else. If the rule 
doesn't have a high enough score, the email passes through.


Spammers change their tactics and content of their emails all the time 
- and the rule scores haven't been updated in months - because of the 
problems with the updating system (which is not a criticism - I 
understand the situation). So for people to advise sticking 
religiously to the default scores, well, frankly I don't get it.


The rulesets and dynamic scores in 72_scores.cf have been updating 
again for the past 2 weeks.


I recommend only changing a few of the default scores and make meta 
rules that combine the hits to add points when you see a pattern of 2 or 
more rules being hit.


If you add enough add-ons to your SA instance, then you shouldn't be 
impacted too much by the default scores.  SA has to be generic out of 
the box to cover all types of mail flow.  You have to tune it a bit for 
your particular recipients, language, and location.  See my email 
moments ago about tuning suggestions.


I used to constantly adjust scores to react to new spam campaigns but 
found I was always behind the spammers.  The more RBLs and meta rules 
you can set up, the more you can stay ahead of them.  Compromised 
accounts are the exception to this, with zero-hour spam that is very 
difficult to block, so try to keep that separate in your mind and not 
chase after those with score adjustments. These tend to stop 
automatically after 30 minutes or so, when RBLs and DCC catch up to them 
or the account gets locked or its password changed.  I report these to 
Spamcop as quickly as I can.


Thank you David. Those are useful tips.


Re: HTML_IMAGE_ONLY_* generating too many FP's

2017-12-02 Thread David Jones

On 12/02/2017 11:22 AM, Sebastian Arcus wrote:


On 02/12/17 13:06, Matus UHLAR - fantomas wrote:

On 12/01/2017 11:17 AM, Sebastian Arcus wrote:

-0.2 RCVD_IN_MSPIKE_H2  RBL: Average reputation (+2)
                        [212.227.126.131 listed in wl.mailspike.net]
 0.4 MIME_HTML_MOSTLY   BODY: Multipart message mostly text/html MIME
 1.6 HTML_IMAGE_ONLY_24 BODY: HTML: images with 2000-2400 bytes of words
 2.0 BAYES_50           BODY: Bayes spam probability is 40 to 60%
                        [score: 0.4808]
 0.8 MPART_ALT_DIFF     BODY: HTML and text parts are different
 0.0 HTML_MESSAGE       BODY: HTML included in message
 2.5 PYZOR_CHECK        Listed in Pyzor (http://pyzor.sf.net/)
-0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, no trust
                        [212.227.126.131 listed in list.dnswl.org]



On 01/12/17 10:54, Axb wrote:
you've changed SA default scores and now complain about one which 
hasn't been touched as cause for FPs?


compare the defaults with yours...
score PYZOR_CHECK 0 1.985 0 1.392 # n=0 n=2
score BAYES_50  0  0  2.0    0.8

Hmm, maybe you should rethink those changes.


On 01.12.17 12:23, Sebastian Arcus wrote:
Indeed, I did amend some of the default SA scores, to catch more spam 
for the type of email received at this particular site. That doesn't 
change the fact that 1.6 seems to me a pretty high score for a rule 
which would be triggered on such a large number of ham emails. Just 
saying.


You should understand that when you start tuning scores, you can get 
to hell very fast, unless you do your own mass-checks and tune 
according to them.


I'm not too sure I understand this attitude. The whole reason I started 
to tweak the scores for certain rules is that too much spam was going 
through. The false negatives have gone down considerably since I have 
altered the scores - and yes, I do keep an eye on them constantly and 
adjust depending on the number of false positives and negatives, and what 
triggers what. I also use network tests / RBL's as well and Bayes. The 
simple fact of the matter is that on plenty of spam emails, only one 
significant rule might get triggered - be it a high bayes score, one of 
the DNS RBL's or something else. If the rule doesn't have a high enough 
score, the email passes through.


Spammers change their tactics and content of their emails all the time - 
and the rule scores haven't been updated in months - because of the 
problems with the updating system (which is not a criticism - I 
understand the situation). So for people to advise sticking religiously 
to the default scores, well, frankly I don't get it.


The rulesets and dynamic scores in 72_scores.cf have been updating 
again for the past 2 weeks.


I recommend only changing a few of the default scores and make meta 
rules that combine the hits to add points when you see a pattern of 2 or 
more rules being hit.


If you add enough add-ons to your SA instance, then you shouldn't be 
impacted too much by the default scores.  SA has to be generic out of 
the box to cover all types of mail flow.  You have to tune it a bit for 
your particular recipients, language, and location.  See my email 
moments ago about tuning suggestions.


I used to constantly adjust scores to react to new spam campaigns but 
found I was always behind the spammers.  The more RBLs and meta rules 
you can set up, the more you can stay ahead of them.  Compromised 
accounts are the exception to this, with zero-hour spam that is very 
difficult to block, so try to keep that separate in your mind and not 
chase after those with score adjustments. These tend to stop 
automatically after 30 minutes or so, when RBLs and DCC catch up to them 
or the account gets locked or its password changed.  I report these to 
Spamcop as quickly as I can.


--
David Jones
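
The "pattern of 2 or more rules" advice above could be sketched in local.cf roughly as follows. The rule name LOCAL_TWO_OF_THREE and its 1.0 score are hypothetical, the three component rules are simply ones that appear in the report quoted in this thread, and the sketch assumes each of them contributes a value of 1 to the expression when it hits.

```
# Hypothetical meta rule: fires only when at least two of the three
# listed rules hit on the same message. Meta expressions may use
# arithmetic, so the hits can be added and compared with a threshold.
meta      LOCAL_TWO_OF_THREE  ((HTML_IMAGE_ONLY_24 + MPART_ALT_DIFF + PYZOR_CHECK) >= 2)
describe  LOCAL_TWO_OF_THREE  At least two weak indicators hit together
score     LOCAL_TWO_OF_THREE  1.0
```

The point of this design is that the individual scores can stay at or near their defaults, while the combination, which should be rarer in ham, carries the extra weight.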


Re: HTML_IMAGE_ONLY_* generating too many FP's

2017-12-02 Thread Sebastian Arcus


On 02/12/17 13:06, Matus UHLAR - fantomas wrote:

On 12/01/2017 11:17 AM, Sebastian Arcus wrote:

-0.2 RCVD_IN_MSPIKE_H2  RBL: Average reputation (+2)
                        [212.227.126.131 listed in wl.mailspike.net]
 0.4 MIME_HTML_MOSTLY   BODY: Multipart message mostly text/html MIME
 1.6 HTML_IMAGE_ONLY_24 BODY: HTML: images with 2000-2400 bytes of words
 2.0 BAYES_50           BODY: Bayes spam probability is 40 to 60%
                        [score: 0.4808]
 0.8 MPART_ALT_DIFF     BODY: HTML and text parts are different
 0.0 HTML_MESSAGE       BODY: HTML included in message
 2.5 PYZOR_CHECK        Listed in Pyzor (http://pyzor.sf.net/)
-0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, no trust
                        [212.227.126.131 listed in list.dnswl.org]



On 01/12/17 10:54, Axb wrote:
you've changed SA default scores and now complain about one which 
hasn't been touched as cause for FPs?


compare the defaults with yours...
score PYZOR_CHECK 0 1.985 0 1.392 # n=0 n=2
score BAYES_50  0  0  2.0    0.8

Hmm, maybe you should rethink those changes.


On 01.12.17 12:23, Sebastian Arcus wrote:
Indeed, I did amend some of the default SA scores, to catch more spam 
for the type of email received at this particular site. That doesn't 
change the fact that 1.6 seems to me a pretty high score for a rule 
which would be triggered on such a large number of ham emails. Just 
saying.


You should understand that when you start tuning scores, you can get 
to hell very fast, unless you do your own mass-checks and tune 
according to them.


I'm not too sure I understand this attitude. The whole reason I started 
to tweak the scores for certain rules is that too much spam was going 
through. The false negatives have gone down considerably since I have 
altered the scores - and yes, I do keep an eye on them constantly and 
adjust depending on the number of false positives and negatives, and what 
triggers what. I also use network tests / RBL's as well and Bayes. The 
simple fact of the matter is that on plenty of spam emails, only one 
significant rule might get triggered - be it a high bayes score, one of 
the DNS RBL's or something else. If the rule doesn't have a high enough 
score, the email passes through.


Spammers change their tactics and content of their emails all the time - 
and the rule scores haven't been updated in months - because of the 
problems with the updating system (which is not a criticism - I 
understand the situation). So for people to advise sticking religiously 
to the default scores, well, frankly I don't get it.


Re: HTML_IMAGE_ONLY_* generating too many FP's

2017-12-02 Thread Matus UHLAR - fantomas

On 12/01/2017 11:17 AM, Sebastian Arcus wrote:

-0.2 RCVD_IN_MSPIKE_H2  RBL: Average reputation (+2)
                        [212.227.126.131 listed in wl.mailspike.net]
 0.4 MIME_HTML_MOSTLY   BODY: Multipart message mostly text/html MIME
 1.6 HTML_IMAGE_ONLY_24 BODY: HTML: images with 2000-2400 bytes of words
 2.0 BAYES_50           BODY: Bayes spam probability is 40 to 60%
                        [score: 0.4808]
 0.8 MPART_ALT_DIFF     BODY: HTML and text parts are different
 0.0 HTML_MESSAGE       BODY: HTML included in message
 2.5 PYZOR_CHECK        Listed in Pyzor (http://pyzor.sf.net/)
-0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, no trust
                        [212.227.126.131 listed in list.dnswl.org]



On 01/12/17 10:54, Axb wrote:
you've changed SA default scores and now complain about one which 
hasn't been touched as cause for FPs?


compare the defaults with yours...
score PYZOR_CHECK 0 1.985 0 1.392 # n=0 n=2
score BAYES_50  0  0  2.0    0.8

Hmm, maybe you should rethink those changes.


On 01.12.17 12:23, Sebastian Arcus wrote:
Indeed, I did amend some of the default SA scores, to catch more spam 
for the type of email received at this particular site. That doesn't 
change the fact that 1.6 seems to me a pretty high score for a rule 
which would be triggered on such a large number of ham emails. Just 
saying.


You should understand that when you start tuning scores, you can get to hell
very fast, unless you do your own mass-checks and tune according to them.

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
42.7 percent of all statistics are made up on the spot. 


Re: HTML_IMAGE_ONLY_* generating too many FP's

2017-12-02 Thread Matus UHLAR - fantomas

On 28.11.17 19:39, Sebastian Arcus wrote:
I'm having more and more problems with the HTML_IMAGE_ONLY_* set 
of rules recently generating false positives.



On 30/11/17 12:45, Matus UHLAR - fantomas wrote:

those have lower scores with BAYES and network rules enabled.
configure BAYES and enable network rules...


On 01.12.17 10:17, Sebastian Arcus wrote:
Hi. I have BAYES enabled and DNSBL's enabled (I assume that's what 
you mean by network rules?). I still think that a score of 1.6 is 
quite a lot, considering that so many emails nowadays contain either 
an embedded logo in the signature, with just a few words (in a quick 
email reply, for example), or even images inserted, instead of 
attached to the email. Please see below an example of a SA report:



1.6 HTML_IMAGE_ONLY_24 BODY: HTML: images with 2000-2400 bytes of words
2.0 BAYES_50   BODY: Bayes spam probability is 40 to 60%
[score: 0.4808]


configuring BAYES includes training it, so your mail doesn't get a 0.48 score.


2.5 PYZOR_CHECK    Listed in Pyzor (http://pyzor.sf.net/)


now I really wonder why you blame HTML_IMAGE_ONLY_24, when BAYES_50 and
PYZOR_CHECK gave you higher score each?

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
I just got lost in thought. It was unfamiliar territory. 
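
The arithmetic behind the question above can be laid out as an annotated local.cf fragment. It takes the eight scores from the report posted in this thread and the stock values quoted elsewhere in the thread for BAYES_50 and PYZOR_CHECK with Bayes and network tests enabled. Two assumptions are made here: that the other six scores in the report were left unchanged, and that the site uses the SpamAssassin default threshold of 5.0, which the thread does not state.

```
# Report as posted (locally raised BAYES_50 and PYZOR_CHECK):
#   -0.2 + 0.4 + 1.6 + 2.0 + 0.8 + 0.0 + 2.5 - 0.0 = 7.1
#
# Same hits with the stock scores (BAYES_50 0.8, PYZOR_CHECK 1.392):
#   -0.2 + 0.4 + 1.6 + 0.8 + 0.8 + 0.0 + 1.392 - 0.0 = 4.792
#
# Posted scores with HTML_IMAGE_ONLY_24 taken out entirely:
#   7.1 - 1.6 = 5.5
#
# Assumed threshold (the SpamAssassin default; not stated in the thread).
required_score 5.0
```

Under these assumptions the message stays below 5.0 with the stock scores even though HTML_IMAGE_ONLY_24 hits, and stays above 5.0 with the raised scores even if HTML_IMAGE_ONLY_24 is removed.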


Re: HTML_IMAGE_ONLY_* generating too many FP's

2017-12-01 Thread Sebastian Arcus


On 01/12/17 10:54, Axb wrote:

On 12/01/2017 11:17 AM, Sebastian Arcus wrote:


On 30/11/17 12:45, Matus UHLAR - fantomas wrote:

On 28.11.17 19:39, Sebastian Arcus wrote:
I'm having more and more problems with the HTML_IMAGE_ONLY_* set of 
rules recently generating false positives.


Plenty of business emails will include a logo at the bottom - and 
not everybody is a graphics expert to make their logo a tiny 
optimised gif or png - so some of these are slightly bigger than 
they should be.


However, this seems to be sufficiently widespread. Also, many 
business emails can be just a few words reply - so the ratio of 
words to images triggers the filter in SA. Could the scores on 
HTML_IMAGE_ONLY_* set of rules be lowered a bit - or is there 
anything else to be done - aside from educating all the internet on 
optimising logos in the email signatures? :-)


those have lower scores with BAYES and network rules enabled.
configure BAYES and enable network rules...


Hi. I have BAYES enabled and DNSBL's enabled (I assume that's what you 
mean by network rules?). I still think that a score of 1.6 is quite a 
lot, considering that so many emails nowadays contain either an 
embedded logo in the signature, with just a few words (in a quick 
email reply, for example), or even images inserted, instead of 
attached to the email. Please see below an example of a SA report:


-0.2 RCVD_IN_MSPIKE_H2  RBL: Average reputation (+2)
                        [212.227.126.131 listed in wl.mailspike.net]
 0.4 MIME_HTML_MOSTLY   BODY: Multipart message mostly text/html MIME
 1.6 HTML_IMAGE_ONLY_24 BODY: HTML: images with 2000-2400 bytes of words
 2.0 BAYES_50           BODY: Bayes spam probability is 40 to 60%
                        [score: 0.4808]
 0.8 MPART_ALT_DIFF     BODY: HTML and text parts are different
 0.0 HTML_MESSAGE       BODY: HTML included in message
 2.5 PYZOR_CHECK        Listed in Pyzor (http://pyzor.sf.net/)
-0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, no trust
                        [212.227.126.131 listed in list.dnswl.org]


you've changed SA default scores and now complain about one which hasn't 
been touched as cause for FPs?


compare the defaults with yours...
score PYZOR_CHECK 0 1.985 0 1.392 # n=0 n=2
score BAYES_50  0  0  2.0    0.8

Hmm, maybe you should rethink those changes.


Indeed, I did amend some of the default SA scores, to catch more spam 
for the type of email received at this particular site. That doesn't 
change the fact that 1.6 seems to me a pretty high score for a rule 
which would be triggered on such a large number of ham emails. Just saying.


Re: HTML_IMAGE_ONLY_* generating too many FP's

2017-12-01 Thread Axb

On 12/01/2017 11:17 AM, Sebastian Arcus wrote:


On 30/11/17 12:45, Matus UHLAR - fantomas wrote:

On 28.11.17 19:39, Sebastian Arcus wrote:
I'm having more and more problems with the HTML_IMAGE_ONLY_* set of 
rules recently generating false positives.


Plenty of business emails will include a logo at the bottom - and not 
everybody is a graphics expert to make their logo a tiny optimised 
gif or png - so some of these are slightly bigger than they should be.


However, this seems to be sufficiently widespread. Also, many 
business emails can be just a few words reply - so the ratio of words 
to images triggers the filter in SA. Could the scores on 
HTML_IMAGE_ONLY_* set of rules be lowered a bit - or is there 
anything else to be done - aside from educating all the internet on 
optimising logos in the email signatures? :-)


those have lower scores with BAYES and network rules enabled.
configure BAYES and enable network rules...


Hi. I have BAYES enabled and DNSBL's enabled (I assume that's what you 
mean by network rules?). I still think that a score of 1.6 is quite a 
lot, considering that so many emails nowadays contain either an embedded 
logo in the signature, with just a few words (in a quick email reply, 
for example), or even images inserted, instead of attached to the email. 
Please see below an example of a SA report:


-0.2 RCVD_IN_MSPIKE_H2  RBL: Average reputation (+2)
     [212.227.126.131 listed in wl.mailspike.net]
0.4 MIME_HTML_MOSTLY   BODY: Multipart message mostly text/html MIME
1.6 HTML_IMAGE_ONLY_24 BODY: HTML: images with 2000-2400 bytes of words
2.0 BAYES_50   BODY: Bayes spam probability is 40 to 60%
  [score: 0.4808]
0.8 MPART_ALT_DIFF BODY: HTML and text parts are different
0.0 HTML_MESSAGE   BODY: HTML included in message
2.5 PYZOR_CHECK    Listed in Pyzor (http://pyzor.sf.net/)
-0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, no
  trust
  [212.227.126.131 listed in list.dnswl.org]


you've changed SA default scores and now complain about one which hasn't 
been touched as cause for FPs?


compare the defaults with yours...
score PYZOR_CHECK 0 1.985 0 1.392 # n=0 n=2
score BAYES_50  0  0  2.0    0.8

Hmm, maybe you should rethink those changes.
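
For readers unfamiliar with the four-column form of the `score` lines quoted above, the annotated fragment below spells out which column applies in which configuration. The column meanings follow the SpamAssassin configuration documentation; the two `score` lines are the defaults quoted in this message, not new values.

```
# score NAME  <set0>  <set1>  <set2>  <set3>
#   set0: Bayes disabled, network tests disabled
#   set1: Bayes disabled, network tests enabled
#   set2: Bayes enabled,  network tests disabled
#   set3: Bayes enabled,  network tests enabled
score PYZOR_CHECK  0  1.985  0    1.392
score BAYES_50     0  0      2.0  0.8

# A single value, as in a local override like "score BAYES_50 2.0",
# applies to all four sets at once.
```

With both Bayes and network tests enabled, the last column applies, so the stock values are 1.392 for PYZOR_CHECK and 0.8 for BAYES_50, against the 2.5 and 2.0 shown in the posted report.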



Re: HTML_IMAGE_ONLY_* generating too many FP's

2017-12-01 Thread Sebastian Arcus


On 30/11/17 12:45, Matus UHLAR - fantomas wrote:

On 28.11.17 19:39, Sebastian Arcus wrote:
I'm having more and more problems with the HTML_IMAGE_ONLY_* set of 
rules recently generating false positives.


Plenty of business emails will include a logo at the bottom - and not 
everybody is a graphics expert to make their logo a tiny optimised gif 
or png - so some of these are slightly bigger than they should be.


However, this seems to be sufficiently widespread. Also, many 
business emails can be just a few words reply - so the ratio of words 
to images triggers the filter in SA. Could the scores on 
HTML_IMAGE_ONLY_* set of rules be lowered a bit - or is there anything 
else to be done - aside from educating all the internet on optimising 
logos in the email signatures? :-)


those have lower scores with BAYES and network rules enabled.
configure BAYES and enable network rules...


Hi. I have BAYES enabled and DNSBL's enabled (I assume that's what you 
mean by network rules?). I still think that a score of 1.6 is quite a 
lot, considering that so many emails nowadays contain either an embedded 
logo in the signature, with just a few words (in a quick email reply, 
for example), or even images inserted, instead of attached to the email. 
Please see below an example of a SA report:


-0.2 RCVD_IN_MSPIKE_H2  RBL: Average reputation (+2)
[212.227.126.131 listed in wl.mailspike.net]
0.4 MIME_HTML_MOSTLY   BODY: Multipart message mostly text/html MIME
1.6 HTML_IMAGE_ONLY_24 BODY: HTML: images with 2000-2400 bytes of words
2.0 BAYES_50   BODY: Bayes spam probability is 40 to 60%
 [score: 0.4808]
0.8 MPART_ALT_DIFF BODY: HTML and text parts are different
0.0 HTML_MESSAGE   BODY: HTML included in message
2.5 PYZOR_CHECK    Listed in Pyzor (http://pyzor.sf.net/)
-0.0 RCVD_IN_DNSWL_NONE RBL: Sender listed at http://www.dnswl.org/, no
 trust
 [212.227.126.131 listed in list.dnswl.org]


Re: HTML_IMAGE_ONLY_* generating too many FP's

2017-11-30 Thread Matus UHLAR - fantomas

On 28.11.17 19:39, Sebastian Arcus wrote:
I'm having more and more problems with the HTML_IMAGE_ONLY_* set of 
rules recently generating false positives.


Plenty of business emails will include a logo at the bottom - and not 
everybody is a graphics expert to make their logo a tiny optimised 
gif or png - so some of these are slightly bigger than they should 
be.


However, this seems to be sufficiently widespread. Also, many 
business emails can be just a few words reply - so the ratio of words 
to images triggers the filter in SA. Could the scores on 
HTML_IMAGE_ONLY_* set of rules be lowered a bit - or is there 
anything else to be done - aside from educating all the internet on 
optimising logos in the email signatures? :-)


those have lower scores with BAYES and network rules enabled.
configure BAYES and enable network rules...
--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Windows 2000: 640 MB ought to be enough for anybody