Re: SUBJECT_ENCODED_TWICE

Justin Mason Tue, 23 Jan 2007 07:48:02 -0800

[EMAIL PROTECTED] writes:
> On Tue, 16 Jan 2007 14:06:14 -0500, Theo Van Dinter <[EMAIL PROTECTED]>
> posted to spamassassin-devel:
>  > On Tue, Jan 16, 2007 at 10:49:36AM -0800, Karl Chen wrote:
>  >> As I understand it, this rule is intended to match subject lines
>  >> that were encoded, then RE-encoded recursively, or perhaps with
>  >> two different encodings in the same subject line.
>  >
>  > It looks for Subject headers which have multiple encodings in it.
>  >
>  >> However, this regexp also matches singly-encoded long subject
>  >> lines, since (from what I've seen) the subject string is broken up
>  >> and encoded per line:
>  >> Subject: =?iso-8859-1?Q?Automatisk_svar_n=E5r_du_er_borte_fra_kontoret=3A_and_the_?=
>  >>  =?iso-8859-1?Q?poor=2C_of_the_innocent_person_shall_in?=
>  >
>  > This isn't a single encoding, there are two independent encodings.
> 
> It has already been discussed in SpamAssassin bug #5026
> <http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5026>
> but I think it should be emphasized again:
> 
>   * This rule is prone to produce a lot of false positives in any
>     locale where RFC2047 Subject encoding is a necessity (basically
>     any long header line which is split up like in the example above)
> 
>   * The rule's name and description are somewhat inaccurate
> 
> The first problem can be fixed by feeding more international messages
> into the mass checks, I suppose.
> 
> The second will become moot if it is proven by mass-check results that
> the rule is flawed (-, but in the meantime, perhaps a bug should be
> filed about that?


hi Era --

fwiw: SUBJECT_ENCODED_TWICE has dropped in accuracy enough that it was
removed in SA 3.2.0.

--j.
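The false-positive pattern discussed above can be illustrated with a small Python sketch. This is not SpamAssassin's actual rule (its regex is not reproduced here); the function names and the encoded-word regex are hypothetical. `naive_multiple_encodings` flags any Subject containing two or more RFC 2047 encoded-words, which also fires on a single long subject that was folded and encoded per line, as in the quoted example. `mixed_or_nested_encodings` flags only subjects whose encoded-words disagree on charset or encoding, or whose decoded text still contains an encoded-word (i.e. it was encoded twice).

```python
import re
from email.errors import HeaderParseError
from email.header import decode_header

# Hypothetical pattern for an RFC 2047 encoded-word: =?charset?encoding?text?=
ENCODED_WORD = re.compile(r"=\?([^?\s]+)\?([QqBb])\?([^?\s]*)\?=")


def naive_multiple_encodings(raw_subject: str) -> bool:
    """Hypothetical naive check: two or more encoded-words anywhere.

    A long subject folded across lines and encoded per line matches too,
    which is the false positive described in the thread.
    """
    return len(ENCODED_WORD.findall(raw_subject)) >= 2


def _to_text(part, charset):
    """Turn one decode_header() chunk into str."""
    if isinstance(part, str):
        return part
    try:
        return part.decode(charset or "ascii", errors="replace")
    except LookupError:
        # Unknown charset label: Latin-1 maps every byte to some character.
        return part.decode("latin-1")


def mixed_or_nested_encodings(raw_subject: str) -> bool:
    """Hypothetical stricter check: mixed encodings or recursive encoding."""
    words = ENCODED_WORD.findall(raw_subject)
    # Two or more distinct (charset, encoding) pairs in the same header.
    kinds = {(charset.lower(), enc.upper()) for charset, enc, _ in words}
    if len(kinds) > 1:
        return True
    # Decode once; if an encoded-word is still present, it was encoded twice.
    try:
        chunks = decode_header(raw_subject)
    except HeaderParseError:
        # Undecodable (e.g. broken base64): treat as not nested in this sketch.
        return False
    decoded = "".join(_to_text(part, charset) for part, charset in chunks)
    return ENCODED_WORD.search(decoded) is not None
```

Applied to the folded subject quoted above, the naive check returns `True` while the stricter one returns `False`; a subject like `=?utf-8?Q?=3D=3Futf-8=3FQ=3Fcaf=C3=A9=3F=3D?=`, which decodes to another encoded-word, is caught only by the stricter check. Even the stricter version is a heuristic: legitimate mail can mix charsets in one Subject, so it would still need mass-check validation.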
