On 6/7/2011 7:22 PM, Adam Katz wrote:
> From SA's perspective, since we don't use $& or its kin, a regex that
> starts with an optional portion is the same as not having it. Therefore:
>
> header EXTRA_MPART_TYPE Content-Type =~ /(?:\s*multipart\/)?.* type=/i
>
> is functionally equivalent to the faster/simpler:
>
> header EXTRA_MPART_TYPE Content-Type =~ / type=/i
>
> I could just remove that frivolous portion, but I wonder if something
> else was intended...
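The equivalence above can be spot-checked with a minimal sketch. It is written in Python rather than Perl (SpamAssassin itself evaluates these patterns in Perl), assuming Python's `re` behaves the same as Perl for an unanchored hit/no-hit test on these two simple patterns. The `\/` escape is dropped because `/` only needs escaping when it is the Perl delimiter. The sample header values and the `rule_hits` helper are hypothetical, made up for illustration.

```python
import re

# The original rule body and the proposed simplification (case-insensitive, like /i).
ORIGINAL = re.compile(r"(?:\s*multipart/)?.* type=", re.IGNORECASE)
SIMPLIFIED = re.compile(r" type=", re.IGNORECASE)

# Hypothetical Content-Type values, for illustration only.
SAMPLES = [
    'multipart/related; type="text/html"; boundary="xyz"',
    'multipart/alternative; boundary="xyz"',
    'text/plain; charset=us-ascii',
    'text/html; TYPE=foo',
    'application/octet-stream; name="type=1"',
    'multipart/mixed;type=x',
    'multipart/related;\n type="text/html"',  # folded header, continuation starts with a space
]


def rule_hits(pattern, value):
    """True if the pattern matches anywhere in the value (unanchored, like =~).

    Only hit/no-hit matters here; what text was matched ($& in Perl) is ignored.
    """
    return pattern.search(value) is not None


for value in SAMPLES:
    original_hit = rule_hits(ORIGINAL, value)
    simplified_hit = rule_hits(SIMPLIFIED, value)
    print(f"{original_hit!s:5} {simplified_hit!s:5} {value!r}")
```

Both columns agree on every sample: because the search is unanchored, the optional group and the `.*` can always match the empty string right before " type=", so they never change whether the rule fires.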
I agree the two are equivalent. The leading "multipart" clause is made optional by the ?, so it can be eliminated. Once it is removed, the .* becomes a leading clause that can match the empty string, so it can be dropped too.

A lot of times rules start off with the writer thinking about the whole line, and after they put it down as a rule, frivolous patterns are left behind... I've caught myself doing it dozens of times. I suspect (without really knowing) that's what happened here. Nice catch. +1 on making the change for efficiency.

That said, I propose we remove this rule entirely, or zero its score. The S/O of this rule is now 0.100 in ruleqa, and was 0.283 in the 3.3.0 corpus for set0. Clearly it is a lousy indicator of spam, yet it has a force-assigned score of 1.0. It was force-set to 1.0 due to FP problems back in the 3.1.x days, but its S/O is well under 0.5, meaning it fires on a larger share of ham than of spam, so any significant positive score is counter-productive to overall accuracy. Sure, 1.0 isn't much of a score, but with an S/O that low it is still badly misplaced.

See history at: https://issues.apache.org/SpamAssassin/show_bug.cgi?id=5110
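To make "well under 0.5" concrete, here is a small arithmetic sketch. It assumes S/O is computed from per-corpus hit rates as spam% / (spam% + ham%), which is my understanding of how the hit-frequencies/ruleqa reports normalize it; the helper names are hypothetical. Under that assumption an S/O of `so` means the rule hits ham (1 - so) / so times as often as spam: about 9x for 0.100 and about 2.5x for 0.283.

```python
# Sketch assuming S/O = spam_rate / (spam_rate + ham_rate), where each rate is the
# fraction of that corpus the rule hits, so corpus size does not skew the ratio.


def spam_over_all(spam_rate: float, ham_rate: float) -> float:
    """S/O given the fraction of spam and the fraction of ham a rule hits."""
    total = spam_rate + ham_rate
    if total == 0:
        raise ValueError("rule never hit")
    return spam_rate / total


def ham_to_spam_ratio(so: float) -> float:
    """How many times more often the rule hits ham than spam, given its S/O.

    From so = s / (s + h) it follows that h / s = (1 - so) / so.
    """
    if not 0 < so <= 1:
        raise ValueError("S/O must be in (0, 1]")
    return (1 - so) / so


# The two S/O figures quoted in the message.
for label, so in [("ruleqa (current)", 0.100), ("3.3.0 set0", 0.283)]:
    print(f"{label}: S/O={so:.3f}, ham hit rate is "
          f"{ham_to_spam_ratio(so):.2f}x the spam hit rate")
```

Any S/O below 0.5 makes that ratio greater than 1, which is why even a modest positive score pushes more ham than spam toward the threshold.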