http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4822
Summary: make 'rawbody' rules match the full message string, not
line-by-line
Product: Spamassassin
Version: SVN Trunk (Latest Devel Version)
Platform: Other
OS/Version: other
Status: NEW
Severity: major
Priority: P5
Component: Libraries
AssignedTo: [email protected]
ReportedBy: [EMAIL PROTECTED]
let's make a bug!
Loren: (?)
> > If nothing else, I am for simply changing the way rawbody rules are
> > evaluated... Because the current line by line evaluation is too
> > restrictive, and using a handful of rules and meta'ing them together to
> > match something that wraps across multiple lines is kludgy at best.
Justin:
> That is definitely a good idea.
Justin:
> Are there any rawbody rules left anywhere that this would break? I think
> it's likely to be only an improvement.
Theo: 'Hard to say, though I tend to agree. In our case, there are few
rawbody rules (26), and fewer which aren't evals (18). There's only one
(HTML_TINY_FONT) which has a ".*" which would need some help, and via
discussion about the HTML*TINY* rules it could either be replaced or
removed without issue.'
....
Just so we're all clear... It seems like the proposal would be to change
M::SA::Message::get_decoded_body_text_array() such that:
  push(@{$self->{text_decoded}},
       split_into_array_of_short_lines($parts[$pt]->decode()));
becomes (corrected by Justin):
  my $text = $parts[$pt]->decode();
  $text =~ tr/ \t\n\r\x0b\xa0/ /s; # whitespace => space
  $self->{text_decoded} = [ $text ];
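
To illustrate why the change matters, here is a minimal standalone sketch (not SpamAssassin code) comparing the two behaviours. The sample text and the pattern are made up for illustration, and splitting on line starts only approximates what split_into_array_of_short_lines() does.

```perl
use strict;
use warnings;

# Hypothetical decoded part: the HTML tag wraps across two lines.
my $decoded = "<font\n  size=\"1\">tiny text</font>\n";

# Hypothetical rawbody-style pattern that needs both halves of the tag.
my $rule = qr/<font\s+size="1">/i;

# Line-by-line evaluation (approximated): the rule sees one line at a time.
my @lines = split /^/m, $decoded;
my $hits_by_line = grep { $_ =~ $rule } @lines;

# Proposed evaluation: map each whitespace character to a space, squeeze
# runs of spaces, and offer the whole body as a single string.
(my $text = $decoded) =~ tr/ \t\n\r\x0b\xa0/ /s;
my $hits_whole = grep { $_ =~ $rule } ($text);

print "line-by-line hits: $hits_by_line\n";
print "whole-string hits: $hits_whole\n";
```

The wrapped tag is missed when each line is tested on its own, and found once the body is a single whitespace-normalized string.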
...
Justin:
> It does introduce the danger of algorithmic complexity attacks
> if .* is used instead of .{0,20} though -- but we may be able to help
> this if we spot that kind of thing in --lint.
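
As a rough idea of what spotting "that kind of thing" could look like, here is a small sketch. The helper name has_unbounded_wildcard is hypothetical and this is not how --lint is implemented; it only flags an unescaped "." followed by "*", "+" or an open-ended "{n,}" in a rule's pattern source, and it will give false positives for dots inside character classes.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of SpamAssassin: returns 1 if the pattern
# source contains an unbounded wildcard such as ".*", ".+" or ".{5,}".
sub has_unbounded_wildcard {
    my ($pattern) = @_;

    # Drop escaped characters first so that a literal "\." is not
    # mistaken for the "any character" metacharacter.
    (my $p = $pattern) =~ s/\\.//gs;

    return $p =~ /\.(?:[*+]|\{\d*,\})/ ? 1 : 0;
}

for my $src ('font.*size', 'font.{0,20}size', '3\.*5', 'a.{2,}b') {
    printf "%-18s %s\n", $src,
        has_unbounded_wildcard($src) ? 'unbounded' : 'ok';
}
```

A bounded form such as .{0,20} passes, which matches the replacement suggested above.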
Theo: '<shrug> I worry more about full than rawbody in this case since the
full text is always going to be larger than rawbody, so the potential
for problems is greater. Even with the above code, the decoded portion
is split to be under 1k, full is the size of the message.'
On Wed, Mar 08, 2006 at 11:24:59PM +0000, Justin Mason wrote:
> I think it'd be without split_into_array_of_short_lines() -- we
> want to offer the entire body as a string, not split at all.
Theo: 'Well, here's the rule differences from original to the non-split version:
< 0.149 0.1540 0.1215 0.559 BLANK_LINES_70_80
> 0.006 0.0072 0.0000 1.000 BLANK_LINES_70_80
< 0.003 0.0036 0.0000 1.000 BLANK_LINES_80_90
> 0.000 0.0000 0.0000 0.500 BLANK_LINES_80_90
< 0.021 0.0107 0.0810 0.117 HIDE_WIN_STATUS
> 0.024 0.0143 0.0810 0.150 HIDE_WIN_STATUS
< 1.041 1.2213 0.0202 0.984 INTERRUPTUS
> 1.272 1.4936 0.0202 0.987 INTERRUPTUS
< 0.030 0.0251 0.0607 0.292 OBFUSCATING_COMMENT
> 0.043 0.0322 0.1012 0.242 OBFUSCATING_COMMENT
< 0.003 0.0000 0.0202 0.000 __HS_FUNNY_BODY_FROM
> 0.000 0.0000 0.0000 0.500 __HS_FUNNY_BODY_FROM
< 0.003 0.0036 0.0000 1.000 __HS_NONDEFAULT_OE_QUOTE
> 0.055 0.0036 0.3441 0.010 __HS_NONDEFAULT_OE_QUOTE
< 0.009 0.0000 0.0607 0.000 __HS_ODD_ORIGINAL_MESSAGE
> 0.000 0.0000 0.0000 0.500 __HS_ODD_ORIGINAL_MESSAGE
< 3.055 0.3080 18.5830 0.016 __HS_QUOTE
> 0.222 0.1110 0.8502 0.116 __HS_QUOTE
< 0.015 0.0179 0.0000 1.000 __OBFUSCATING_COMMENT_A
> 0.030 0.0358 0.0000 1.000 __OBFUSCATING_COMMENT_A
< 0.186 0.0537 0.9312 0.055 __OBFUSCATING_COMMENT_B
> 0.332 0.2185 0.9717 0.184 __OBFUSCATING_COMMENT_B
In general, good rules are improved and bad rules stay bad rules. Timing is
basically unchanged. So it's fine with me to move forward on making
the change for 3.2.'