Re: 1.72 SUBJECT_ENCODED_TWICE Subject: MIME encoded twice
On Thu, Apr 13, 2006 at 01:35:19PM +0200, Mark Martinec wrote:
> Agreed, this rule is completely inappropriate, it penalizes valid
> encoding according to RFC 2047 and fires on any lengthier Subject
> line in non-English language. It should disappear or have a
> much reduced default score.

Says you. ;)

  1.047   1.4619   0.0792   0.949   0.58   0.89  SUBJECT_ENCODED_TWICE

So in the results used to generate scores, that rule is ~94.9% accurate, and hits ~1.46% of all spam. In a recent nightly mass-check run:

  1.153   1.4173   0.1151   0.925   0.73   0.89  SUBJECT_ENCODED_TWICE

So more ham seems to use encoding twice in the subject, and a little less spam uses it. Based on this, my guess is the generated score would go down.

The thing to remember about rules is that they neither necessarily look for RFC non-compliance, nor do they avoid RFC compliant mails. They look for features that hit spam and try to avoid hitting ham.

The key there is that rule development occurs with the results people make available. If the people generating results don't receive ham mails that, for instance, use multiple encodings in a Subject header, the results won't indicate that it occurs in ham very much.

--
Randomly Generated Tagline:
"I protect home plate like a mormon girl on prom night."
 - Mimi on the Drew Carey show
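As a rough check on the accuracy figures quoted in this message, here is a minimal Python sketch. It assumes the second and third columns of each result line are the percentage of spam and of ham hit by the rule, and that the fourth column (the ~94.9% figure) is spam% / (spam% + ham%). That formula is an assumption on my part, though it does reproduce both quoted values; the function name is hypothetical.

```python
def spam_over_overall(spam_pct: float, ham_pct: float) -> float:
    """Share of a rule's hits that land on spam, from per-corpus hit percentages.

    Assumed formula: spam% / (spam% + ham%). Not taken from the mass-check tools.
    """
    return spam_pct / (spam_pct + ham_pct)


# (spam%, ham%) pairs from the two result lines quoted in the message
score_run = spam_over_overall(1.4619, 0.0792)    # results used to generate scores
nightly_run = spam_over_overall(1.4173, 0.1151)  # recent nightly mass-check run

print(f"score run:   {score_run:.3f}")
print(f"nightly run: {nightly_run:.3f}")
```

The drop between the two runs comes almost entirely from the ham side: the ham hit rate rises from 0.0792% to 0.1151%, while the spam hit rate barely moves.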
Re: 1.72 SUBJECT_ENCODED_TWICE Subject: MIME encoded twice
On Thursday, 13 April 2006 13:35 Mark Martinec wrote:
> Agreed, this rule is completely inappropriate, it penalizes valid
> encoding according to RFC 2047 and fires on any lengthier Subject
> line in non-English language. It should disappear or have a
> much reduced default score.

The problem seems to be that
1) most spam is English
2) most people contributing mass-checks are English speaking
3) therefore most ham+spam tested in mass-checks is English

In order to improve the situation, more mass-check testers with non-English ham+spam should contribute, see
http://wiki.apache.org/spamassassin/MassCheck?highlight=%28mass%29

I'm not an SA dev, but I think they once wrote that more supporters would be nice. I do mass-checks, and if somebody wants to help, I have a working script you can have in order to contribute to testing. It's a simple setup, and then your server has some work to do overnight. On mine, it's about 1 hour per night, so no problem.

Best regards,
zmi
--
// Michael Monnerie, Ing.BSc - http://it-management.at
// Tel: 0660/4156531 .network.your.ideas.
// PGP Key: "lynx -source http://zmi.at/zmi3.asc | gpg --import"
// Fingerprint: 44A3 C1EC B71E C71A B4C2 9AA6 C818 847C 55CB A4EE
// Keyserver: www.keyserver.net Key-ID: 0x55CBA4EE
Re: 1.72 SUBJECT_ENCODED_TWICE Subject: MIME encoded twice
Kai Schaetzl wrote:
> > I just saw that a normal Ebay outbid notice hit two high-score rules. One
> > is from sare-spoof and I already contacted the maintainer. But one is in
> > the default 3.1.1 ruleset and I think this rule should get completely
> > removed or get a score of 0. It's
> > 1.72 SUBJECT_ENCODED_TWICE Subject: MIME encoded twice

Alan Premselaar:
> This utterly wreaks havoc on just about all Japanese email, so I dropped
> the score to nearly nothing.

Agreed, this rule is completely inappropriate, it penalizes valid encoding according to RFC 2047 and fires on any lengthier Subject line in non-English language. It should disappear or have a much reduced default score.

Mark
Re: 1.72 SUBJECT_ENCODED_TWICE Subject: MIME encoded twice
Kai Schaetzl wrote:
> I just saw that a normal Ebay outbid notice hit two high-score rules. One
> is from sare-spoof and I already contacted the maintainer. But one is in
> the default 3.1.1 ruleset and I think this rule should get completely
> removed or get a score of 0. It's
>
> 1.72 SUBJECT_ENCODED_TWICE Subject: MIME encoded twice
>
> From grepping the rules it does what it says: it checks if there are two
> B/Q encoding identifiers in the subject. Why is this scoring with 1.72 or
> at all? This is absolutely valid Q/B encoding and actually *required* by
> RFC if your subject line is longer than 80 (or was it 72?) characters
> (minus the encoding, so it's actually more like a 60 raw character limit).
> This rule will hit on *lots* of non-ASCII mail and on almost all mail
> coming from Ebay Germany.
>
> There are also the rules SUBJECT_EXCESS_QP and SUBJECT_EXCESS_BASE64 which
> are "similar". QP scores 0 and BASE64 scores 0.449. This is much more
> reasonable.
>
> Kai

This utterly wreaks havoc on just about all Japanese email, so I dropped the score to nearly nothing.

alan
Re: 1.72 SUBJECT_ENCODED_TWICE Subject: MIME encoded twice
Kai Schaetzl wrote:
> I just saw that a normal Ebay outbid notice hit two high-score rules. One
> is from sare-spoof and I already contacted the maintainer. But one is in
> the default 3.1.1 ruleset and I think this rule should get completely
> removed or get a score of 0. It's
>
> 1.72 SUBJECT_ENCODED_TWICE Subject: MIME encoded twice
>
> From grepping the rules it does what it says: it checks if there are two
> B/Q encoding identifiers in the subject. Why is this scoring with 1.72 or
> at all? This is absolutely valid Q/B encoding and actually *required* by
> RFC if your subject line is longer than 80 (or was it 72?) characters
> (minus the encoding, so it's actually more like a 60 raw character limit).
> This rule will hit on *lots* of non-ASCII mail and on almost all mail
> coming from Ebay Germany.
>
> There are also the rules SUBJECT_EXCESS_QP and SUBJECT_EXCESS_BASE64 which
> are "similar". QP scores 0 and BASE64 scores 0.449. This is much more
> reasonable.

Same here (multiple FPs). I disabled these rules. Many popular MSPs here in .fr use software that triggers them. On days when I feel angry I can block caramail, but I can never block laposte.net or wanadoo...

For similar reasons, I had to disable (after lowering the score incrementally) some *bl lists. For now, these are sorbs, rfci, bad_whois, spamcops. The list seems to be growing :)
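To illustrate the point made in this thread that a long non-ASCII Subject legitimately ends up with more than one encoded-word, here is a minimal Python sketch. RFC 2047 caps each encoded-word at 75 characters, so a compliant encoder has to split a long subject. The sample subject, the `ENCODED_WORD` pattern and the `count_encoded_words` helper are illustrative assumptions; the pattern is not the actual SpamAssassin rule, only a rough stand-in for "two B/Q encoding identifiers in the subject".

```python
import re
from email.header import Header

# Rough pattern for an RFC 2047 encoded-word: =?charset?B-or-Q?payload?=
# (hypothetical helper pattern, not copied from the SpamAssassin ruleset)
ENCODED_WORD = re.compile(r"=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=")


def count_encoded_words(raw_subject: str) -> int:
    """Count encoded-words in a raw (still encoded) Subject header value."""
    return len(ENCODED_WORD.findall(raw_subject))


# A made-up, perfectly ordinary German subject that is too long for one encoded-word.
subject = (
    "Sie wurden überboten: schöne große Küchenmaschine mit Zubehör "
    "für Bäcker und Köche, fast neu, Versand möglich"
)

# The standard library folds the header and emits several encoded-words.
raw = Header(subject, charset="utf-8", header_name="Subject").encode()

print(raw)
print("encoded-words:", count_encoded_words(raw))
```

Any check that only counts encoded-words will flag a header like this one, even though it was produced by a standards-following encoder, which is the false-positive pattern reported above for German, French and Japanese mail.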