[EMAIL PROTECTED] writes:
> On Tue, 16 Jan 2007 14:06:14 -0500, Theo Van Dinter <[EMAIL PROTECTED]>
> posted to spamassassin-devel:
> > On Tue, Jan 16, 2007 at 10:49:36AM -0800, Karl Chen wrote:
> >> As I understand it, this rule is intended to match subject lines
> >> that were encoded, then RE-encoded recursively, or perhaps with
> >> two different encodings in the same subject line.
> >
> > It looks for Subject headers which have multiple encodings in it.
> >
> >> However, this regexp also matches singly-encoded long subject
> >> lines, since (from what I've seen) the subject string is broken up
> >> and encoded per line:
> >> Subject: =?iso-8859-1?Q?Automatisk_svar_n=E5r_du_er_borte_fra_kontoret=3A_and_the_?=
> >>  =?iso-8859-1?Q?poor=2C_of_the_innocent_person_shall_in?=
> >
> > This isn't a single encoding, there are two independent encodings.
>
> It's already been discussed before in SpamAssassin bug #5026
> <http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5026>
> but I think it should be emphasized again:
>
>  * This rule is prone to produce a lot of false positives in any
>    locale where RFC2047 Subject encoding is a necessity (basically
>    any long header line which is split up like in the example above)
>
>  * The rule's name and description are somewhat inaccurate
>
> The first problem can be fixed by more international messages fed into
> the mass checks, I suppose.
>
> The second will become moot if it is proven by mass-check results that
> the rule is flawed (-, but in the meantime, perhaps a bug should be
> filed about that?

hi Era -- fwiw: SUBJECT_ENCODED_TWICE has dropped in accuracy enough that
it was removed in SA 3.2.0.

--j.
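
For anyone reading this in the archives, here is a rough Python sketch of the distinction being argued over in the quoted text: a long Subject folded into several RFC 2047 encoded-words versus a Subject that really mixes charsets. This is not the SpamAssassin rule (that was a Perl regexp, not reproduced here); the ENCODED_WORD pattern and the helper names encoded_word_charsets and uses_multiple_charsets are made up for illustration, and "flag only when charsets differ" is just one assumed way to avoid the false positive.

```python
import re
from email.header import decode_header, make_header

# The folded Subject from the quoted example: two RFC 2047 encoded-words,
# both in iso-8859-1, split only because the header line is long.
raw_subject = (
    "=?iso-8859-1?Q?Automatisk_svar_n=E5r_du_er_borte_fra_kontoret=3A_and_the_?=\n"
    " =?iso-8859-1?Q?poor=2C_of_the_innocent_person_shall_in?="
)

# Hypothetical pattern for a single encoded-word: =?charset?encoding?text?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([bBqQ])\?([^?\s]*)\?=")


def encoded_word_charsets(header):
    """Return the (lowercased) charset of every encoded-word in the header."""
    return [m.group(1).lower() for m in ENCODED_WORD.finditer(header)]


def uses_multiple_charsets(header):
    """True only if the encoded-words disagree about the charset."""
    return len(set(encoded_word_charsets(header))) > 1


charsets = encoded_word_charsets(raw_subject)

# Decode with the standard library; adjacent encoded-words are joined.
decoded = str(make_header(decode_header(raw_subject)))

# A made-up Subject that really does mix two charsets.
mixed_subject = "=?iso-8859-1?Q?a?= =?utf-8?B?Yg==?="

print(len(charsets), "encoded-words, charsets:", sorted(set(charsets)))
print("multiple charsets (folded example):", uses_multiple_charsets(raw_subject))
print("multiple charsets (mixed example): ", uses_multiple_charsets(mixed_subject))
print("decoded:", decoded)
```

Counting encoded-words alone flags the folded example (it has two), while comparing charsets does not. Whether a charset comparison would have been accurate enough against real spam is a question for mass-check results, not for this sketch.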
