Re: Re: HTML_IMAGE_ONLY_* generating too many FP's
On 12/5/2017 5:28 AM, Sebastian Arcus wrote:
> On 02/12/17 18:45, David Jones wrote:
>> [...] I recommend only changing a few of the default scores and making meta rules that combine the hits to add points when you see a pattern of 2 or more rules being hit. [...] The more RBLs and meta rules you can set up, the more you can stay ahead of them. [...]
>
> Thank you David. Those are useful tips.

I have also encountered FPs due to the scores of the HTML_IMAGE_ONLY_* rules, so I have changed all their scores to 0.001. I have meta rules that combine __HTML_IMG_ONLY with the RBLs, and I've found that to be useful.

But for some reason, __HTML_IMG_ONLY does not include HTML_IMAGE_ONLY_32. Is there any reason it was left out?

- Mark
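A minimal local.cf sketch of the approach Mark describes, assuming the stock HTML_IMAGE_ONLY_* rules and the __HTML_IMG_ONLY sub-rule are present in your ruleset. The LOCAL_* rule names, the choice of RCVD_IN_SBL as the RBL, and the 2.0 score are hypothetical placeholders; substitute the network rules you trust and scores based on your own hits. As far as I know, a score of 0 disables a rule in SpamAssassin, which is presumably why a tiny non-zero score such as 0.001 is used: the rule keeps firing for metas and reports without adding meaningful points.

```
# local.cf sketch: LOCAL_* names, RCVD_IN_SBL and the 2.0 score are illustrative.

# Keep the image-only rules firing but make them nearly weightless
# (0.001 rather than 0, since a score of 0 disables a rule).
score HTML_IMAGE_ONLY_24  0.001
score HTML_IMAGE_ONLY_32  0.001
# ...repeat for the other HTML_IMAGE_ONLY_* rules you want to neutralise

# Local sub-rule that also covers HTML_IMAGE_ONLY_32, which
# __HTML_IMG_ONLY reportedly does not include.
meta     __LOCAL_IMG_ONLY_ANY  (__HTML_IMG_ONLY || HTML_IMAGE_ONLY_32)

# Only add real points when the image-only pattern coincides with an RBL hit.
meta     LOCAL_IMG_ONLY_RBL    (__LOCAL_IMG_ONLY_ANY && RCVD_IN_SBL)
describe LOCAL_IMG_ONLY_RBL    Image-only HTML from a source listed in an RBL
score    LOCAL_IMG_ONLY_RBL    2.0
```

The design choice is to move the weight from a pattern that is common in ham (a signature logo plus a short reply) onto the combination of that pattern with an independent reputation signal.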
Re: HTML_IMAGE_ONLY_* generating too many FP's
On 02/12/17 18:45, David Jones wrote:
> On 12/02/2017 11:22 AM, Sebastian Arcus wrote:
>> [...] So when people advise sticking religiously to the default scores, well, frankly I don't get it.
>
> The rulesets and dynamic scores in 72_scores.cf have been updating again for the past 2 weeks.
>
> I recommend changing only a few of the default scores and making meta rules that combine the hits to add points when you see a pattern of 2 or more rules being hit. [...]
>
> I used to constantly adjust scores to react to new spam campaigns, but found I was always behind the spammers. The more RBLs and meta rules you can set up, the more you can stay ahead of them. [...]

Thank you David. Those are useful tips.
Re: HTML_IMAGE_ONLY_* generating too many FP's
On 12/02/2017 11:22 AM, Sebastian Arcus wrote:
> On 02/12/17 13:06, Matus UHLAR - fantomas wrote:
>> You should understand that when you start tuning scores, you can get to hell very fast, unless you do your own mass-checks and tune according to them.
>
> I'm not too sure I understand this attitude. [...] The simple fact of the matter is that on plenty of spam emails, only one significant rule might get triggered - be it a high Bayes score, one of the DNS RBLs or something else. If that rule doesn't have a high enough score, the email passes through. [...] So when people advise sticking religiously to the default scores, well, frankly I don't get it.

The rulesets and dynamic scores in 72_scores.cf have been updating again for the past 2 weeks.

I recommend changing only a few of the default scores and making meta rules that combine the hits to add points when you see a pattern of 2 or more rules being hit. If you add enough add-ons to your SA instance, then you shouldn't be impacted too much by the default scores. SA has to be generic out of the box to cover all types of mail flow. You have to tune it a bit for your particular recipients, language, and location. See my email from a few moments ago about tuning suggestions.

I used to constantly adjust scores to react to new spam campaigns, but found I was always behind the spammers. The more RBLs and meta rules you can set up, the more you can stay ahead of them.

Compromised accounts are the exception to this: their zero-hour spam is very difficult to block, so try to keep that separate in your mind and don't chase after it with score adjustments. These runs tend to stop automatically after 30 minutes or so, when RBLs and DCC catch up to them or the account gets locked or its password changed. I report these to Spamcop as quickly as I can.

-- 
David Jones
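A sketch of David's suggestion to add points when two or more independent signals coincide, instead of raising any single rule's score. SpamAssassin meta expressions can combine rule hits arithmetically (a hit counts as 1), so a threshold on the sum expresses "at least 2 of these". The rule name LOCAL_MULTI_SIGNAL, the four rules combined, and the 1.5 score are illustrative assumptions, not recommendations; pick signals that are genuinely independent at your site.

```
# Hypothetical local rule: fires when at least two of the listed signals hit.
meta     LOCAL_MULTI_SIGNAL  (RCVD_IN_SBL + PYZOR_CHECK + BAYES_60 + __HTML_IMG_ONLY) >= 2
describe LOCAL_MULTI_SIGNAL  Two or more independent spam signals coincide
score    LOCAL_MULTI_SIGNAL  1.5
```

This keeps each base rule at its default score, so a single hit (for example, only an image-heavy signature) does not push ham over the threshold, while spam that trips several signals still gets the extra points.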
Re: HTML_IMAGE_ONLY_* generating too many FP's
On 02/12/17 13:06, Matus UHLAR - fantomas wrote:
> You should understand that when you start tuning scores, you can get to hell very fast, unless you do your own mass-checks and tune according to them.

I'm not too sure I understand this attitude. The whole reason I started to tweak the scores for certain rules is that too much spam was going through. The false negatives have gone down considerably since I altered the scores - and yes, I do keep an eye on them constantly and adjust depending on the number of false positives and negatives, and what triggers what. I also use network tests/RBLs as well as Bayes.

The simple fact of the matter is that on plenty of spam emails, only one significant rule might get triggered - be it a high Bayes score, one of the DNS RBLs or something else. If that rule doesn't have a high enough score, the email passes through. Spammers change the tactics and content of their emails all the time - and the rule scores haven't been updated in months because of the problems with the updating system (which is not a criticism - I understand the situation). So when people advise sticking religiously to the default scores, well, frankly I don't get it.
Re: HTML_IMAGE_ONLY_* generating too many FP's
On 01.12.17 12:23, Sebastian Arcus wrote:
> On 01/12/17 10:54, Axb wrote:
>> you've changed SA default scores and now complain about one which hasn't been touched as cause for FPs?
>> compare the defaults with yours...
>>
>> score PYZOR_CHECK 0 1.985 0 1.392 # n=0 n=2
>> score BAYES_50 0 0 2.0 0.8
>>
>> maybe you should rethink those changes.
>
> Indeed, I did amend some of the default SA scores, to catch more spam for the type of email received at this particular site. That doesn't change the fact that 1.6 seems to me a pretty high score for a rule which would be triggered on such a large number of ham emails. Just saying.

You should understand that when you start tuning scores, you can get to hell very fast, unless you do your own mass-checks and tune according to them.

-- 
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
42.7 percent of all statistics are made up on the spot.
Re: HTML_IMAGE_ONLY_* generating too many FP's
>> On 28.11.17 19:39, Sebastian Arcus wrote:
>>> I'm having more and more problems with the HTML_IMAGE_ONLY_* set of rules recently generating false positives.
> On 30/11/17 12:45, Matus UHLAR - fantomas wrote:
>> those have lower scores with BAYES and network rules enabled.
>> configure BAYES and enable network rules...
On 01.12.17 10:17, Sebastian Arcus wrote:
> Hi. I have BAYES enabled and DNSBLs enabled (I assume that's what you mean by network rules?). I still think that a score of 1.6 is quite a lot, considering that so many emails nowadays contain either an embedded logo in the signature with just a few words (in a quick email reply, for example), or even images inserted inline instead of attached to the email.
> Please see below an example of a SA report:
> 1.6 HTML_IMAGE_ONLY_24 BODY: HTML: images with 2000-2400 bytes of words
> 2.0 BAYES_50 BODY: Bayes spam probability is 40 to 60% [score: 0.4808]

configuring BAYES includes training it, so your mail doesn't get a 0.48 score.

> 2.5 PYZOR_CHECK Listed in Pyzor (http://pyzor.sf.net/)

now I really wonder why you blame HTML_IMAGE_ONLY_24, when BAYES_50 and PYZOR_CHECK each gave you a higher score?

-- 
Matus UHLAR - fantomas
I just got lost in thought. It was unfamiliar territory.
Re: HTML_IMAGE_ONLY_* generating too many FP's
On 01/12/17 10:54, Axb wrote:
> On 12/01/2017 11:17 AM, Sebastian Arcus wrote:
>> [...] I still think that a score of 1.6 is quite a lot [...]
>> Please see below an example of a SA report:
>> [report snipped]
>
> you've changed SA default scores and now complain about one which hasn't been touched as cause for FPs?
> compare the defaults with yours...
>
> score PYZOR_CHECK 0 1.985 0 1.392 # n=0 n=2
> score BAYES_50 0 0 2.0 0.8
>
> maybe you should rethink those changes.

Indeed, I did amend some of the default SA scores, to catch more spam for the type of email received at this particular site. That doesn't change the fact that 1.6 seems to me a pretty high score for a rule which would be triggered on such a large number of ham emails. Just saying.
Re: HTML_IMAGE_ONLY_* generating too many FP's
On 12/01/2017 11:17 AM, Sebastian Arcus wrote:
> On 30/11/17 12:45, Matus UHLAR - fantomas wrote:
>> those have lower scores with BAYES and network rules enabled. configure BAYES and enable network rules...
> Hi. I have BAYES enabled and DNSBLs enabled (I assume that's what you mean by network rules?). I still think that a score of 1.6 is quite a lot, [...]
> Please see below an example of a SA report:
>
> -0.2 RCVD_IN_MSPIKE_H2    RBL: Average reputation (+2)
>                           [212.227.126.131 listed in wl.mailspike.net]
>  0.4 MIME_HTML_MOSTLY     BODY: Multipart message mostly text/html MIME
>  1.6 HTML_IMAGE_ONLY_24   BODY: HTML: images with 2000-2400 bytes of words
>  2.0 BAYES_50             BODY: Bayes spam probability is 40 to 60%
>                           [score: 0.4808]
>  0.8 MPART_ALT_DIFF       BODY: HTML and text parts are different
>  0.0 HTML_MESSAGE         BODY: HTML included in message
>  2.5 PYZOR_CHECK          Listed in Pyzor (http://pyzor.sf.net/)
> -0.0 RCVD_IN_DNSWL_NONE   RBL: Sender listed at http://www.dnswl.org/, no trust
>                           [212.227.126.131 listed in list.dnswl.org]

you've changed SA default scores and now complain about one which hasn't been touched as cause for FPs?
compare the defaults with yours...

score PYZOR_CHECK 0 1.985 0 1.392 # n=0 n=2
score BAYES_50 0 0 2.0 0.8

maybe you should rethink those changes.
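To make Axb's point concrete, here is the arithmetic on the report above, assuming SpamAssassin's default spam threshold (required_score) of 5.0, that the site runs in score set 3 (Bayes and network tests both enabled, i.e. the fourth column of the score lines), and that every other rule in the report carries its default score. Terms are listed in report order: RCVD_IN_MSPIKE_H2, MIME_HTML_MOSTLY, HTML_IMAGE_ONLY_24, BAYES_50, MPART_ALT_DIFF, HTML_MESSAGE, PYZOR_CHECK, RCVD_IN_DNSWL_NONE.

```latex
\begin{aligned}
\text{as reported:}\quad
  & -0.2 + 0.4 + 1.6 + 2.0 + 0.8 + 0.0 + 2.5 - 0.0 = 7.1 \;\ge\; 5.0 \\
\text{stock set-3 scores:}\quad
  & -0.2 + 0.4 + 1.6 + 0.8 + 0.8 + 0.0 + 1.392 - 0.0 = 4.792 \;<\; 5.0 \\
\text{difference:}\quad
  & (2.0 - 0.8) + (2.5 - 1.392) = 1.2 + 1.108 = 2.308 = 7.1 - 4.792
\end{aligned}
```

Under these assumptions the message would have stayed below the threshold with the stock scores, even with HTML_IMAGE_ONLY_24 at 1.6; the 2.308 points that tipped it over come from the two locally raised scores.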
Re: HTML_IMAGE_ONLY_* generating too many FP's
On 30/11/17 12:45, Matus UHLAR - fantomas wrote:
> On 28.11.17 19:39, Sebastian Arcus wrote:
>> I'm having more and more problems with the HTML_IMAGE_ONLY_* set of rules recently generating false positives. [...]
> those have lower scores with BAYES and network rules enabled. configure BAYES and enable network rules...

Hi. I have BAYES enabled and DNSBLs enabled (I assume that's what you mean by network rules?). I still think that a score of 1.6 is quite a lot, considering that so many emails nowadays contain either an embedded logo in the signature with just a few words (in a quick email reply, for example), or even images inserted inline instead of attached to the email. Please see below an example of a SA report:

-0.2 RCVD_IN_MSPIKE_H2    RBL: Average reputation (+2)
                          [212.227.126.131 listed in wl.mailspike.net]
 0.4 MIME_HTML_MOSTLY     BODY: Multipart message mostly text/html MIME
 1.6 HTML_IMAGE_ONLY_24   BODY: HTML: images with 2000-2400 bytes of words
 2.0 BAYES_50             BODY: Bayes spam probability is 40 to 60%
                          [score: 0.4808]
 0.8 MPART_ALT_DIFF       BODY: HTML and text parts are different
 0.0 HTML_MESSAGE         BODY: HTML included in message
 2.5 PYZOR_CHECK          Listed in Pyzor (http://pyzor.sf.net/)
-0.0 RCVD_IN_DNSWL_NONE   RBL: Sender listed at http://www.dnswl.org/, no trust
                          [212.227.126.131 listed in list.dnswl.org]
Re: HTML_IMAGE_ONLY_* generating too many FP's
On 28.11.17 19:39, Sebastian Arcus wrote:
> I'm having more and more problems with the HTML_IMAGE_ONLY_* set of rules recently generating false positives. Plenty of business emails will include a logo at the bottom - and not everybody is a graphics expert who can make their logo a tiny optimised gif or png - so some of these are slightly bigger than they should be. However, this seems to be sufficiently widespread. Also, many business emails are just a reply of a few words - so the ratio of words to images triggers the filter in SA. Could the scores on the HTML_IMAGE_ONLY_* set of rules be lowered a bit - or is there anything else to be done - aside from educating the whole internet on optimising logos in email signatures? :-)

those have lower scores with BAYES and network rules enabled.
configure BAYES and enable network rules...

-- 
Matus UHLAR - fantomas
Windows 2000: 640 MB ought to be enough for anybody
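Matus's point rests on SpamAssassin's four score sets. A `score` line with four values gives the score used with (set 0) neither Bayes nor network tests, (set 1) network tests only, (set 2) Bayes only, and (set 3) both; a line with a single value applies that value in every set. The PYZOR_CHECK line below is the default quoted by Axb in this thread, with comments added for illustration; the HTML_IMAGE_ONLY_24 override and its 1.0 value are hypothetical.

```
# score RULE  set0  set1   set2  set3
#   set0: no Bayes, no network tests
#   set1: network tests, no Bayes
#   set2: Bayes, no network tests
#   set3: Bayes and network tests
score PYZOR_CHECK  0  1.985  0  1.392   # default, as quoted by Axb

# A single value overrides all four sets at once (illustrative local override):
score HTML_IMAGE_ONLY_24  1.0
```

This is why enabling and training Bayes and turning on network tests changes the effective score of many rules: SpamAssassin switches to a different column, whose values were tuned assuming those extra signals are available.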