> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, March 08, 2006 17:25
> To: Theo Van Dinter
> Cc: dev@spamassassin.apache.org
> Subject: Re: move "full" rule functionality into a default-off plugin
>
>
> Theo Van Dinter writes:
> > On Wed, Mar 08, 2006 at 09:36:04PM +0000, Justin Mason wrote:
> > > > If nothing else, I am for simply changing the way rawbody rules
> > > > are evaluated... Because the current line by line evaluation is
> > > > too restrictive, and using a handfull of rules and meta'ing them
> > > > together to match something that wraps across multiple lines is
> > > > kludgly at best.
> > >
> > > That is definitely a good idea.
> > >
> > > Are there any rawbody rules left anywhere that this would break? I
> > > think it's likely to be only an improvement.
> >
> > Hard to say, though I tend to agree. In our case, there are few
> > rawbody rules (26), and fewer which aren't evals (18). There's only
> > one (HTML_TINY_FONT) which has a ".*" which would need some help,
> > and via discussion about the HTML*TINY* rules it could either be
> > replaced or removed without issue.
> >
> > Just so we're all clear... It seems like the proposal would be to
> > change M::SA::Message::get_decoded_body_text_array() such that:
> >
> >     push(@{$self->{text_decoded}},
> >          split_into_array_of_short_lines($parts[$pt]->decode()));
> >
> > becomes
> >
> >     my $text = $parts[$pt]->decode();
> >     $text =~ tr/ \t\n\r\x0b\xa0/ /s; # whitespace => space
> >     push(@{$self->{text_decoded}},
> >          split_into_array_of_short_lines($text));
> >
> > Yes?
>
> I think it'd be without split_into_array_of_short_lines() --
> we want to offer the entire body as a string, not split at all.
>
> --j.
>
> > > It does introduce the danger of algorithmic complexity attacks
> > > if .* is used instead of .{0,20} though -- but we may be able to
> > > help this if we spot that kind of thing in --lint.
> >
> > <shrug> I worry more about full than rawbody in this case since the
> > full text is always going to be larger than rawbody, so the potential
> > for problems is greater. Even with the above code, the decoded
> > portion is split to be under 1k, full is the size of the message.
> >
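As a rough illustration of the rawbody change discussed in the quoted thread, the sketch below contrasts line-by-line matching with matching against the whole body after whitespace has been collapsed. It is written in Python rather than SpamAssassin's Perl, and the names `collapse_whitespace` and `pattern`, as well as the sample body, are made up for the example. The character class simply mirrors the quoted `tr/ \t\n\r\x0b\xa0/ /s` call.

```python
import re

# Same characters as the quoted tr/ \t\n\r\x0b\xa0/ /s; the "+" plays the
# role of the /s (squeeze) flag, so each run becomes a single space.
_WS_RUN = re.compile(r"[ \t\n\r\x0b\xa0]+")


def collapse_whitespace(text: str) -> str:
    """Replace every run of whitespace with one space (hypothetical helper)."""
    return _WS_RUN.sub(" ", text)


# Hypothetical body in which an HTML tag wraps across a line break.
body = "Buy this <font\nsize=1>stock</font> now"
pattern = re.compile(r"<font size=1>")  # hypothetical rule pattern

# Line-by-line evaluation: neither line contains the complete tag.
per_line_hit = any(pattern.search(line) for line in body.splitlines())

# Whole-body evaluation: after collapsing, the tag is contiguous.
whole_body_hit = pattern.search(collapse_whitespace(body)) is not None
```

With the whole body offered as one string, a single rule can catch what would otherwise need several line-level rules combined in a meta rule. The flip side is the one raised in the thread: an unbounded `.*` now has far more text to scan than a bounded `.{0,20}`.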
This is a little off-topic, but it is related to the removal of full rule types and to what we would need in order to replace or supplement them.

My idea is to allow evaluation of the full MIME structure, minus the content found in those structures. I'm looking to evaluate the data you would normally see if you right-click a message in Outlook and look at the header info.

For example, to determine whether there is an inline GIF (recent stock spam), we currently have to use a full rule, which as we know can be very inefficient:

    full SARE_GIF_ATTACH /name=\"[a-z]{3,18}\.gif\"/

What I really want is to run this evaluation against the entire MIME info, minus the content found within each MIME part. Something like:

    mimeheader SARE_GIF_ATTACH /name=\"[a-z]{3,18}\.gif\"/

where mimeheader would contain this:

    Microsoft Mail Internet Headers Version 2.0
    Received: from mailgw.nmgi.com ([172.17.1.100]) by exchange.nmgi.com
        with Microsoft SMTPSVC(6.0.3790.1830); Sun, 26 Feb 2006 12:32:52 -0600
    Received: from host81.nmgi.com (HELO asp1.doublecheckemail.com) (64.218.27.81)
        by mailgw.nmgi.com with SMTP; 26 Feb 2006 18:32:47 -0000
    Received: from p54981da9.dip0.t-ipconnect.de (84.152.29.169)
        by asp1.doublecheckemail.com with SMTP; 26 Feb 2006 18:31:12 -0000
    Message-ID: <[EMAIL PROTECTED]>
    From: "qzntequmdxk" <[EMAIL PROTECTED]>
    To: [EMAIL PROTECTED]
    Subject: [SPAM-8.5]- [SPAM-5.2]- Fw: 290004
    Date: Sun, 26 Feb 2006 19:32:43 -0000
    MIME-Version: 1.0
    Content-Type: multipart/related;
        type="multipart/alternative";
        boundary="----=_NextPart_000_0007_01C63B0B.69D62A00"
    X-Priority: 3
    X-MSMail-Priority: Normal
    X-Mailer: Microsoft Outlook Express 6.00.2900.2180
    X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2180
    X-Spam-Prev-Subject: Fw: 290004
    X-Spam-Prev-Subject: [SPAM-5.2]- Fw: 290004
    Return-Path: [EMAIL PROTECTED]
    X-OriginalArrivalTime: 26 Feb 2006 18:32:52.0102 (UTC) FILETIME=[0D517260:01C63B03]

    ------=_NextPart_000_0007_01C63B0B.69D62A00
    Content-Type: multipart/alternative;
        boundary="----=_NextPart_001_0008_01C63B0B.69D62A00"

    ------=_NextPart_001_0008_01C63B0B.69D62A00
    Content-Type: text/plain;
        charset="Windows-1252"
    Content-Transfer-Encoding: quoted-printable

    ------=_NextPart_001_0008_01C63B0B.69D62A00
    Content-Type: text/html;
        charset="Windows-1252"
    Content-Transfer-Encoding: quoted-printable

    ------=_NextPart_001_0008_01C63B0B.69D62A00--

    ------=_NextPart_000_0007_01C63B0B.69D62A00
    Content-Type: image/gif;
        name="tkvsumcgojm.gif"
    Content-Transfer-Encoding: base64
    Content-ID: <[EMAIL PROTECTED]>

    ------=_NextPart_000_0007_01C63B0B.69D62A00--

I think either changing the full rule type to do the above, or adding a new rule type that contains this data, would be a great thing for rule writers, and much more efficient, since the pattern would only ever be run against the headers of each part rather than the whole message.

Cya,
Dallas
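To make the proposal a bit more concrete, here is a minimal sketch of how the "MIME structure minus content" text could be built from a raw message. It is written in Python using the standard `email` package rather than SpamAssassin's Perl, so it only illustrates the idea and is not how SpamAssassin would implement it. The function name `mime_skeleton`, the sample message, and the variable names are all hypothetical. The sketch keeps the top-level headers plus the headers of every MIME part and drops the part bodies; unlike the listing above, it does not reproduce the boundary delimiter lines.

```python
import re
from email import message_from_string


def mime_skeleton(raw: str) -> str:
    """Return the message headers and every MIME part's headers, without bodies.

    Hypothetical helper: a stand-in for the data the proposed rule type would see.
    """
    msg = message_from_string(raw)
    chunks = []
    # walk() yields the top-level message first, then each nested part.
    for part in msg.walk():
        lines = []
        for name, value in part.items():
            # Unfold continuation lines so a pattern sees one header per line.
            unfolded = re.sub(r"\r?\n[ \t]+", " ", value)
            lines.append(f"{name}: {unfolded}")
        chunks.append("\n".join(lines))
    return "\n\n".join(chunks)


# Small made-up message: the text part mentions a decoy file name in its body,
# and the image part carries the real attachment name in its headers.
raw_message = (
    "From: sender@example.com\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/related; boundary="XX"\n'
    "\n"
    "--XX\n"
    "Content-Type: text/plain\n"
    "\n"
    'see attached name="decoy.gif"\n'
    "--XX\n"
    'Content-Type: image/gif;\n\tname="tkvsumcgojm.gif"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "R0lGODlhAQABAAAAACw=\n"
    "--XX--\n"
)

skeleton = mime_skeleton(raw_message)

# The same pattern as SARE_GIF_ATTACH, applied to the skeleton only.
gif_rule = re.compile(r'name="[a-z]{3,18}\.gif"')
match = gif_rule.search(skeleton)
```

Because the part bodies are never included, the pattern is run against a few hundred bytes of headers instead of the whole message, and text inside a body (such as the decoy name in the sample) cannot trigger the rule.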