On 6/7/2011 7:22 PM, Adam Katz wrote:
> From SA's perspective, since we don't use $& or its kin, a regex that
> starts with an optional portion is the same as not having it.  Therefore:
>
> header EXTRA_MPART_TYPE         Content-Type =~ /(?:\s*multipart\/)?.* type=/i
>
> is functionally equivalent to the faster/simpler:
>
> header EXTRA_MPART_TYPE         Content-Type =~ / type=/i
>
> I could just remove that frivolous portion, but I wonder if something
> else was intended...

I agree the two are equivalent. The leading "multipart" clause is made
optional with a ?, so it can match nothing and can be eliminated. Once
it is removed, the .* becomes a leading optional clause (in an unanchored
match it can also match the empty string), and can be dropped too.
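
As a quick sanity check of that equivalence, here is a minimal sketch.
It uses Python's re module rather than Perl, on the assumption that the
two engines behave the same for patterns this simple, and the
Content-Type values are made-up samples, not taken from any corpus:

```python
import re

# The rule's pattern as written, and the proposed simplification.
# Both are unanchored searches, like SA's header =~ tests.
ORIGINAL = re.compile(r"(?:\s*multipart\/)?.* type=", re.IGNORECASE)
SIMPLIFIED = re.compile(r" type=", re.IGNORECASE)

# Hypothetical Content-Type header values, for illustration only.
SAMPLES = [
    'multipart/related; type="multipart/alternative"; boundary="x"',
    'multipart/mixed; boundary="x"',
    'text/plain; charset=us-ascii',
    'application/octet-stream; TYPE=foo',
    'text/html;type=bar',
]


def same_verdict(value):
    """True if both patterns agree on whether the value matches."""
    return bool(ORIGINAL.search(value)) == bool(SIMPLIFIED.search(value))


for value in SAMPLES:
    print(same_verdict(value), value)
```

This only compares hit/no-hit, which is all SA looks at here. The
matched text itself differs between the two patterns, which would
matter only if something used $& or a capture.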

A lot of times rules start off with the writer thinking about the whole
line, and after they put it down as a rule, there may be frivolous
patterns... I've caught myself doing it dozens of times. I suspect
(without really knowing) that happened here.

Nice catch.

+1 on making the change for efficiency.

That said, I propose we remove this rule entirely, or zero its score.

The S/O of this rule is now 0.100 in ruleqa, and was 0.283 in the 3.3.0
corpus for set0.  Clearly it is a lousy indicator of spam, but has a
force-assigned score of 1.0.

It was force-set to 1.0 due to FP problems back in the 3.1.x days, but
the S/O of this rule is well under 0.5, meaning it hits ham more often
than spam, so any significant positive score is counter-productive to
overall accuracy. Sure, 1.0 isn't much of a score, but with an S/O that
low it is still very misplaced.
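
To illustrate why an S/O under 0.5 argues against a positive score, here
is a rough sketch. It assumes S/O is computed as spam% / (spam% + ham%),
which is my understanding of the hit-frequencies output, and the hit
percentages below are invented to land near 0.1, not real ruleqa
numbers:

```python
def so_ratio(spam_pct, ham_pct):
    """Spam/overall ratio from the percentage of spam and of ham hit.

    Assumed formula: spam% / (spam% + ham%). 1.0 means the rule only
    hits spam, 0.0 means it only hits ham, and 0.5 is a coin flip.
    """
    total = spam_pct + ham_pct
    if total == 0:
        raise ValueError("rule never fired")
    return spam_pct / total


# Hypothetical hit rates: 0.1% of spam and 0.9% of ham.
ratio = so_ratio(0.1, 0.9)
print(f"S/O = {ratio:.3f}")
print("better as a ham indicator" if ratio < 0.5 else "spam indicator")
```

With numbers like these the rule fires on ham about nine times as often
as on spam, so a positive score mostly penalizes legitimate mail.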


See history at:
https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5110



