[Bug 4822] New: make 'rawbody' rules match the full message string, not line-by-line

bugzilla-daemon Thu, 09 Mar 2006 03:52:45 -0800

http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4822


           Summary: make 'rawbody' rules match the full message string, not
                    line-by-line
           Product: Spamassassin
           Version: SVN Trunk (Latest Devel Version)
          Platform: Other
        OS/Version: other
            Status: NEW
          Severity: major
          Priority: P5
         Component: Libraries
        AssignedTo: [email protected]
        ReportedBy: [EMAIL PROTECTED]


let's make a bug!

Loren: (?)
> > If nothing else, I am for simply changing the way rawbody rules are
> > evaluated... Because the current line by line evaluation is too
> > restrictive, and using a handful of rules and meta'ing them together to
> > match something that wraps across multiple lines is kludgy at best.
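
To make the limitation concrete, here is a minimal Perl sketch. The sample body and the pattern are made up for illustration, and this is not SpamAssassin's actual matching code; it only contrasts testing each line separately with testing the whole string.

```perl
use strict;
use warnings;

# Hypothetical raw body in which the phrase of interest wraps across a line break.
my $body = "Please visit our\nonline pharmacy today\n";

# Hypothetical rule pattern; \s+ tolerates any whitespace between the words.
my $re = qr/our\s+online\s+pharmacy/;

# Line-by-line evaluation: each line is tested on its own.
my $line_hits = grep { $_ =~ $re } split /^/m, $body;

# Whole-string evaluation: the pattern sees the line break as whitespace.
my $full_hit = ($body =~ $re) ? 1 : 0;

print "line-by-line hits: $line_hits\n";
print "whole-string hit:  $full_hit\n";
```

The line-by-line pass finds nothing because neither line contains the full phrase, while the whole-string pass matches across the break.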

Justin:
> That is definitely a good idea.

Justin:
> Are there any rawbody rules left anywhere that this would break? I think
> it's likely to be only an improvement.

Theo: 'Hard to say, though I tend to agree.  In our case, there are few
rawbody rules (26), and fewer which aren't evals (18).  There's only one
(HTML_TINY_FONT) which has a ".*" which would need some help, and via
discussion about the HTML*TINY* rules it could either be replaced or
removed without issue.'


....

Just so we're all clear...  It seems like the proposal would be to change
M::SA::Message::get_decoded_body_text_array() such that:

    push(@{$self->{text_decoded}},
         split_into_array_of_short_lines($parts[$pt]->decode()));

becomes (corrected by Justin):

    my $text = $parts[$pt]->decode();
    $text =~ tr/ \t\n\r\x0b\xa0/ /s;    # whitespace => space
    $self->{text_decoded} = [ $text ];
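
As a rough illustration of what the whitespace normalization does, the following standalone sketch applies the same tr/// to a made-up string. The sample text and the @text_decoded variable are assumptions standing in for $parts[$pt]->decode() and $self->{text_decoded}.

```perl
use strict;
use warnings;

# Hypothetical decoded body part, standing in for $parts[$pt]->decode().
my $text = "Hello,\r\n\r\n\tworld \xa0 again\n";

# Same transliteration as in the proposal: map each listed whitespace
# character to a space and squeeze runs of them (the /s flag).
$text =~ tr/ \t\n\r\x0b\xa0/ /s;

# A single-element array, mirroring $self->{text_decoded} = [ $text ].
my @text_decoded = ($text);

print scalar(@text_decoded), " element: [$text_decoded[0]]\n";
```

Every run of spaces, tabs, line breaks, and \xa0 collapses to one space, so a rule sees "Hello, world again " as a single string (note the trailing space left by the final newline).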

...

Justin:
> It does introduce the danger of algorithmic complexity attacks
> if .* is used instead of .{0,20} though -- but we may be able to help
> this if we spot that kind of thing in --lint.
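
One way such a --lint check might look is sketched below. The has_unbounded_dot helper and the sample patterns are hypothetical and not part of SpamAssassin; the check is a simple heuristic that only looks for an unescaped dot followed by * or +.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of SpamAssassin: returns true when a rule's
# regexp source contains an unbounded "any character" quantifier (.* or .+).
sub has_unbounded_dot {
    my ($pattern) = @_;
    # (?<!\\) skips an escaped dot such as "\.*", which matches literal dots.
    return $pattern =~ /(?<!\\)\.[*+]/ ? 1 : 0;
}

# Made-up patterns for illustration.
my @patterns = (
    'font-size:.*px',        # unbounded
    'font-size:.{0,20}px',   # bounded
    'www\.example\.com',     # escaped dots only
);

for my $p (@patterns) {
    printf "%-22s %s\n", $p, has_unbounded_dot($p) ? 'WARN: unbounded' : 'ok';
}
```

Being a heuristic, it can misjudge cases such as a dot inside a character class or a dot following an escaped backslash, so a real check would probably need to parse the pattern more carefully.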

Theo: '<shrug>  I worry more about full than rawbody in this case since the
full text is always going to be larger than rawbody, so the potential
for problems is greater.  Even with the above code, the decoded portion
is split to be under 1k, full is the size of the message.'


On Wed, Mar 08, 2006 at 11:24:59PM +0000, Justin Mason wrote:
> I think it'd be without split_into_array_of_short_lines() -- we
> want to offer the entire body as a string, not split at all.

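Theo: 'Well, here are the rule differences from the original to the non-split version
[columns: OVERALL%, SPAM%, HAM%, S/O, rule name; "<" is the original, ">" the non-split version]: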

<   0.149   0.1540   0.1215    0.559       BLANK_LINES_70_80
>   0.006   0.0072   0.0000    1.000       BLANK_LINES_70_80
<   0.003   0.0036   0.0000    1.000       BLANK_LINES_80_90
>   0.000   0.0000   0.0000    0.500       BLANK_LINES_80_90
<   0.021   0.0107   0.0810    0.117       HIDE_WIN_STATUS
>   0.024   0.0143   0.0810    0.150       HIDE_WIN_STATUS
<   1.041   1.2213   0.0202    0.984       INTERRUPTUS
>   1.272   1.4936   0.0202    0.987       INTERRUPTUS
<   0.030   0.0251   0.0607    0.292       OBFUSCATING_COMMENT
>   0.043   0.0322   0.1012    0.242       OBFUSCATING_COMMENT
<   0.003   0.0000   0.0202    0.000       __HS_FUNNY_BODY_FROM
>   0.000   0.0000   0.0000    0.500       __HS_FUNNY_BODY_FROM
<   0.003   0.0036   0.0000    1.000       __HS_NONDEFAULT_OE_QUOTE
>   0.055   0.0036   0.3441    0.010       __HS_NONDEFAULT_OE_QUOTE
<   0.009   0.0000   0.0607    0.000       __HS_ODD_ORIGINAL_MESSAGE
>   0.000   0.0000   0.0000    0.500       __HS_ODD_ORIGINAL_MESSAGE
<   3.055   0.3080  18.5830    0.016       __HS_QUOTE
>   0.222   0.1110   0.8502    0.116       __HS_QUOTE
<   0.015   0.0179   0.0000    1.000       __OBFUSCATING_COMMENT_A
>   0.030   0.0358   0.0000    1.000       __OBFUSCATING_COMMENT_A
<   0.186   0.0537   0.9312    0.055       __OBFUSCATING_COMMENT_B
>   0.332   0.2185   0.9717    0.184       __OBFUSCATING_COMMENT_B

in general, good rules are improved, bad rules stay bad rules.  Timing is
basically unchanged.  So it's fine with me to move forward on making
the change for 3.2.'
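
For readers checking the figures, the last numeric column appears to be the S/O ratio, that is, the spam hit percentage divided by the sum of the spam and ham hit percentages. The sketch below assumes that relationship, and assumes 0.500 is reported when a rule hits nothing; the s_over_o helper is hypothetical.

```perl
use strict;
use warnings;

# Assumed relationship between the columns: S/O = SPAM% / (SPAM% + HAM%),
# with 0.500 used when a rule hits nothing at all.
sub s_over_o {
    my ($spam_pct, $ham_pct) = @_;
    my $total = $spam_pct + $ham_pct;
    return $total > 0 ? $spam_pct / $total : 0.5;
}

# SPAM% and HAM% copied from the BLANK_LINES_70_80 and __HS_QUOTE rows above.
printf "%.3f\n", s_over_o(0.1540, 0.1215);   # original BLANK_LINES_70_80
printf "%.3f\n", s_over_o(0.3080, 18.5830);  # original __HS_QUOTE
```

These print 0.559 and 0.016, which agree with the values listed for those two rows. Small discrepancies on other rows are to be expected because the listed percentages are already rounded.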



