> -----Original Message-----
> From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] 
> Sent: Wednesday, March 08, 2006 17:25
> To: Theo Van Dinter
> Cc: dev@spamassassin.apache.org
> Subject: Re: move "full" rule functionality into a default-off plugin 
> 
> 
> Theo Van Dinter writes:
> > On Wed, Mar 08, 2006 at 09:36:04PM +0000, Justin Mason wrote:
> > > > If nothing else, I am for simply changing the way rawbody rules
> > > > are evaluated... Because the current line-by-line evaluation is
> > > > too restrictive, and using a handful of rules and meta'ing them
> > > > together to match something that wraps across multiple lines is
> > > > kludgy at best.
> > > 
> > > That is definitely a good idea.
> > > 
> > > Are there any rawbody rules left anywhere that this would break?
> > > I think it's likely to be only an improvement.
> > 
> > Hard to say, though I tend to agree.  In our case, there are few
> > rawbody rules (26), and fewer which aren't evals (18).  There's
> > only one (HTML_TINY_FONT) which has a ".*" which would need some
> > help, and via discussion about the HTML*TINY* rules it could either
> > be replaced or removed without issue.
> > 
> > Just so we're all clear...  It seems like the proposal would be to
> > change M::SA::Message::get_decoded_body_text_array() such that:
> > 
> >     push(@{$self->{text_decoded}},
> >          split_into_array_of_short_lines($parts[$pt]->decode()));
> > 
> > becomes
> > 
> >     my $text = $parts[$pt]->decode();
> >     $text =~ tr/ \t\n\r\x0b\xa0/ /s;    # whitespace => space
> >     push(@{$self->{text_decoded}},
> >          split_into_array_of_short_lines($text));
> > 
> > Yes?
> 
> I think it'd be without split_into_array_of_short_lines() -- 
> we want to offer the entire body as a string, not split at all.
> 
> --j.
> 
> > > It does introduce the danger of algorithmic complexity attacks if
> > > .* is used instead of .{0,20} though -- but we may be able to help
> > > this if we spot that kind of thing in --lint.
> > 
> > <shrug>  I worry more about full than rawbody in this case since
> > the full text is always going to be larger than rawbody, so the
> > potential for problems is greater.  Even with the above code, the
> > decoded portion is split to be under 1k, full is the size of the
> > message.
> > 

This is a little off-topic, but it relates to the removal of full rule
types and to what we would need in order to replace or supplement them.

My idea is to allow evaluation of the full MIME structure, minus the
content found in those structures.  I'm looking to evaluate the data
that you would normally see if you right-click a message in Outlook and
look at the header info.

For example, to determine whether there is an inline GIF (recent stock
spam), we currently have to use a full rule, which as we know can be
very inefficient because it runs against the entire message.

full         SARE_GIF_ATTACH   /name=\"[a-z]{3,18}\.gif\"/

What I really want is to run this evaluation against all of the MIME
info, minus the content found within each MIME part.  Something like:

mimeheader   SARE_GIF_ATTACH  /name=\"[a-z]{3,18}\.gif\"/

Where mimeheader would contain this:

Microsoft Mail Internet Headers Version 2.0
Received: from mailgw.nmgi.com ([172.17.1.100]) by exchange.nmgi.com
         with Microsoft SMTPSVC(6.0.3790.1830);
         Sun, 26 Feb 2006 12:32:52 -0600
Received: from host81.nmgi.com (HELO asp1.doublecheckemail.com)
  (64.218.27.81)
  by mailgw.nmgi.com with SMTP; 26 Feb 2006 18:32:47 -0000
Received: from p54981da9.dip0.t-ipconnect.de (84.152.29.169)
  by asp1.doublecheckemail.com with SMTP; 26 Feb 2006 18:31:12 -0000
Message-ID: <[EMAIL PROTECTED]>
From:   "qzntequmdxk" <[EMAIL PROTECTED]>
To: [EMAIL PROTECTED]
Subject: [SPAM-8.5]- [SPAM-5.2]- Fw: 290004
Date:   Sun, 26 Feb 2006 19:32:43 -0000
MIME-Version: 1.0
Content-Type: multipart/related;
        type="multipart/alternative";
        boundary="----=_NextPart_000_0007_01C63B0B.69D62A00"
X-Priority: 3
X-MSMail-Priority: Normal
X-Mailer: Microsoft Outlook Express 6.00.2900.2180
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.2180
X-Spam-Prev-Subject: Fw: 290004
X-Spam-Prev-Subject: [SPAM-5.2]- Fw: 290004
Return-Path: [EMAIL PROTECTED]
X-OriginalArrivalTime: 26 Feb 2006 18:32:52.0102 (UTC)
        FILETIME=[0D517260:01C63B03]

------=_NextPart_000_0007_01C63B0B.69D62A00
Content-Type: multipart/alternative;
        boundary="----=_NextPart_001_0008_01C63B0B.69D62A00"

------=_NextPart_001_0008_01C63B0B.69D62A00
Content-Type: text/plain;
        charset="Windows-1252"
Content-Transfer-Encoding: quoted-printable

------=_NextPart_001_0008_01C63B0B.69D62A00
Content-Type: text/html;
        charset="Windows-1252"
Content-Transfer-Encoding: quoted-printable


------=_NextPart_001_0008_01C63B0B.69D62A00--
------=_NextPart_000_0007_01C63B0B.69D62A00
Content-Type: image/gif;
        name="tkvsumcgojm.gif"
Content-Transfer-Encoding: base64
Content-ID: <[EMAIL PROTECTED]>


------=_NextPart_000_0007_01C63B0B.69D62A00--


I think either changing the full rule type to do the above, or adding a
new rule type that contains this data, would be a great thing for rule
writers, and much more efficient.
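
As a rough illustration of the idea (not how SpamAssassin does or would
implement it), here is a minimal sketch using Python's standard email
package.  It keeps only the headers of the message and of each MIME
part, drops every part body, and then runs the GIF pattern from the rule
above against the result.  The names mime_skeleton, SAMPLE and
GIF_ATTACH are made up for this example, and the sample message is a
cut-down stand-in for the one shown earlier.

```python
import re
from email import message_from_string

# Cut-down, hypothetical stand-in for the sample message above.
SAMPLE = (
    "From: sender@example.com\n"
    "Subject: Fw: 290004\n"
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/related; boundary="outer"\n'
    "\n"
    "--outer\n"
    'Content-Type: text/plain; charset="Windows-1252"\n'
    "\n"
    "body text that a header-only rule should never see\n"
    "--outer\n"
    "Content-Type: image/gif;\n"
    '        name="tkvsumcgojm.gif"\n'
    "Content-Transfer-Encoding: base64\n"
    "\n"
    "R0lGODlhAQABAAAAACw=\n"
    "--outer--\n"
)

# Same pattern as the SARE_GIF_ATTACH rule, minus the unneeded backslashes.
GIF_ATTACH = re.compile(r'name="[a-z]{3,18}\.gif"')


def mime_skeleton(raw: str) -> str:
    """Return the headers of the message and of every MIME part, no bodies."""
    msg = message_from_string(raw)
    blocks = []
    for part in msg.walk():  # yields the top-level message, then each subpart
        headers = "\n".join(f"{name}: {value}" for name, value in part.items())
        blocks.append(headers)
    return "\n\n".join(blocks)


if __name__ == "__main__":
    skeleton = mime_skeleton(SAMPLE)
    print(skeleton)
    print("GIF attachment found:", bool(GIF_ATTACH.search(skeleton)))
```

The point of the sketch is only that the text being searched is a few
hundred bytes of headers rather than the whole message, so the regex
never touches the base64 image data or the text parts.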

Cya,
Dallas


