Re: [sa] regex anchor for start of line in body
On Wed, July 8, 2009 06:41, Charles Gregory wrote:

> So the desired test is:

do you have a dual quad core that idles ? :)

> rawbody LOC_09070702 /^Assets of my deceased Client/m

rawbody takes more CPU power than

body LOC09070702 /\bAssets of my deceased Client\b/

Why is /i missing? And why an exact match at the beginning of the line?

Another way to catch it:

body __A1 /\bassets\b/i
body __A2 /\bof\b/i
body __A3 /\bmy\b/i
body __A4 /\bdeceased\b/i
body __A5 /\bclient\b/i
meta LOC09070702 (__A1 && __A2 && __A3 && __A4 && __A5)

In my example, the rule hits if all 5 words are found anywhere in the body.

-- xpoint
Re: regex anchor for start of line in body
On Wed, 8 Jul 2009, Benny Pedersen wrote:

> do you have a dual quad core that idles ? :)

I have a dual Pentium-III that idles 99% of the time, yes.

> rawbody takes more cpu power then (body)

I wouldn't think that it takes much more, as the only difference is whether HTML is still present.

> why missing /i ? and why exact match on begin of line ?

I use these rules as quick 'poison pill' rules, added as needed and removed a few weeks later. The case-sensitive, exact-line matching is intended to match the spam as exactly as possible and minimize the possibility of FPs. Someone could very well have a deceased client of some kind, but it's not likely that ham will use that exact phrase, with that capitalization, all alone on a single line (the original regex matches beginning to END of the line). Also, anchoring tests to the beginning or end of lines should improve efficiency, as the only places it will check the regex are at line breaks.

> body __A1 /\bassets\b/i
> body __A2 /\bof\b/i
> body __A3 /\bmy\b/i
> body __A4 /\bdeceased\b/i
> body __A5 /\bclient\b/i
> meta LOC09070702 (__A1 && __A2 && __A3 && __A4 && __A5)

Far too much chance of FPs. Given that 'of' and 'my' occur in many e-mails, you are really basing this on 'deceased', 'client' and 'assets'.

- C
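The trade-off discussed above can be sketched outside SpamAssassin. The following is only an illustration using Python's re module as a stand-in for the Perl regexes; the sample "spam" and "ham" texts are invented for the example, and nothing here reproduces how SpamAssassin prepares the message text.

```python
import re

# Exact-line, case-sensitive test (beginning to end of one line).
exact_line = re.compile(r"^Assets of my deceased Client$", re.MULTILINE)

# Five independent, case-insensitive word tests combined with AND,
# mirroring the __A1..__A5 meta rule idea.
words = ["assets", "of", "my", "deceased", "client"]
word_tests = [re.compile(rf"\b{w}\b", re.IGNORECASE) for w in words]


def all_words_hit(text):
    """True when every word occurs somewhere in the text."""
    return all(t.search(text) for t in word_tests)


# Invented sample texts (assumptions for illustration only).
spam = "Dear friend,\nAssets of my deceased Client\nare waiting for you.\n"
ham = (
    "The client asked about the assets held in trust.\n"
    "One of my colleagues confirmed the deceased left a will.\n"
)

print(bool(exact_line.search(spam)), all_words_hit(spam))  # True True
print(bool(exact_line.search(ham)), all_words_hit(ham))    # False True
```

The word-by-word test also fires on the unrelated legal e-mail, which is the false-positive risk described above; the exact-line test does not.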
Re: [sa] regex anchor for start of line in body
On Mon, 6 Jul 2009, info-spamassassin-t...@cs.utexas.edu wrote:

> I seem to be having a hard time writing rules which anchor a string to the start of the line in the body of a text message.

What the...? So am I!

I have tried all combinations of:

body LOC_09070701 /^Assets of my deceased Client/
body LOC_09070702 /^Assets of my deceased Client/m
body LOC_09070703 /^Assets of my deceased Client/ms

And NONE of them match the beginning of line!

- Charles
Re: [sa] regex anchor for start of line in body
On Tue, 7 Jul 2009, Charles Gregory wrote:

> I have tried all combinations of:
>
> body LOC_09070701 /^Assets of my deceased Client/
> body LOC_09070702 /^Assets of my deceased Client/m
> body LOC_09070703 /^Assets of my deceased Client/ms
>
> And NONE of them match the beginning of line!

Just for interest's sake, I am putting my 'test line' here

Assets of my deceased Client

...just to see if it is my testing method that is broken

- Charles
Re: [sa] regex anchor for start of line in body
On Tue, 7 Jul 2009, Charles Gregory wrote:

X-Spam-Status: No, hits=-2004.0 required=10.0 autolearn=disabled
    tests=LOC_SAUSERS_RCVD_WL=-1000,LOC_SAUSERS_TO_WL=-1000,
    RCVD_IN_DNSWL_MED=-4

> On Tue, 7 Jul 2009, Charles Gregory wrote:
>> I have tried all combinations of:
>>
>> body LOC_09070701 /^Assets of my deceased Client/
>> body LOC_09070702 /^Assets of my deceased Client/m
>> body LOC_09070703 /^Assets of my deceased Client/ms
>>
>> And NONE of them match the beginning of line!
>
> Just for interest's sake, I am putting my 'test line' here
>
> Assets of my deceased Client
>
> ...just to see if it is my testing method that is broken

And no, it doesn't (sigh)

- C
Re: [sa] regex anchor for start of line in body
On Tue, 7 Jul 2009, Charles Gregory wrote:

> On Mon, 6 Jul 2009, info-spamassassin-t...@cs.utexas.edu wrote:
>> I seem to be having a hard time writing rules which anchor a string to the start of the line in the body of a text message.
>
> What the...? So am I! I have tried all combinations of:
>
> body LOC_09070701 /^Assets of my deceased Client/
> body LOC_09070702 /^Assets of my deceased Client/m
> body LOC_09070703 /^Assets of my deceased Client/ms
>
> And NONE of them match the beginning of line!

Post a sample email that you're trying to match.

Bear in mind, body rules work on modified body text. The fact that text appears at the beginning of a line when displayed in your mail client (or even in a text editor editing the raw message file) does not reliably imply it's at the beginning of a line in the text that body rules are matching against. See the ALL_BODY troubleshooting rule I suggested for test use.

--
 John Hardin KA7OHZ  http://www.impsec.org/~jhardin/
 jhar...@impsec.org  FALaholic #11174  pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
 There is no doubt in my mind that millions of lives could have been saved if the people were not brainwashed about gun ownership and had been well armed. ... Gun haters always want to forget the Warsaw Ghetto uprising, which is a perfect example of how a ragtag, half-starved group of Jews took 10 handguns and made asses out of the Nazis.
 -- Theodore Haas, Dachau survivor
---
 Today: Robert Heinlein's 102nd birthday
Re: [sa] regex anchor for start of line in body
On Tue, 7 Jul 2009, Charles Gregory wrote:

> Just for interest's sake, I am putting my 'test line' here
>
> Assets of my deceased Client
>
> ...just to see if it is my testing method that is broken

The body rule is comparing against a cleaned-up paragraph where those lines are joined. Otherwise inserting line breaks would be a trivial way to avoid many SA rules.

--
 John Hardin
Re: [sa] regex anchor for start of line in body
On Tue, 7 Jul 2009 14:57:59 -0400 (EDT) Charles Gregory cgreg...@hwcn.org wrote:

> On Mon, 6 Jul 2009, info-spamassassin-t...@cs.utexas.edu wrote:
>> I seem to be having a hard time writing rules which anchor a string to the start of the line in the body of a text message.
>
> What the...? So am I! I have tried all combinations of:
>
> body LOC_09070701 /^Assets of my deceased Client/
> body LOC_09070702 /^Assets of my deceased Client/m
> body LOC_09070703 /^Assets of my deceased Client/ms
>
> And NONE of them match the beginning of line!

From man Mail::SpamAssassin::Conf:

body SYMBOLIC_TEST_NAME /pattern/modifiers

    Define a body pattern test. pattern is a Perl regular expression. Note: as per the header tests, # must be escaped (\#) or else it is considered the beginning of a comment.

    The 'body' in this case is the textual parts of the message body; any non-text MIME parts are stripped, and the message decoded from Quoted-Printable or Base-64-encoded format if necessary. The message Subject header is considered part of the body and becomes the first paragraph when running the rules. All HTML tags and line breaks will be removed before matching.
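To see why removing line breaks defeats the ^ anchor, here is a rough sketch in Python. The render_like_body function is a hypothetical, greatly simplified stand-in (split on blank lines, join each paragraph, collapse whitespace); it is an assumption for illustration and not SpamAssassin's actual rendering code.

```python
import re


def render_like_body(raw):
    """Hypothetical approximation of a 'cleaned-up' body:
    one string per paragraph, internal line breaks replaced by spaces."""
    paragraphs = re.split(r"\n\s*\n", raw.strip())
    return [" ".join(p.split()) for p in paragraphs]


raw = (
    "Just for interest's sake, I am putting my 'test line' here\n"
    "Assets of my deceased Client\n"
    "...just to see if it is my testing method that is broken\n"
)

rule = re.compile(r"^Assets of my deceased Client", re.MULTILINE)

# Against the raw text the phrase starts a line, so ^ can match there.
raw_hit = rule.search(raw) is not None

# After joining, the phrase sits in the middle of one long paragraph,
# so ^ (even with the multiline flag) has no line start to anchor to.
body_hit = any(rule.search(p) for p in render_like_body(raw))

print(raw_hit, body_hit)  # True False
```

Under this simplified model, only text at the very start of a paragraph can satisfy a ^ anchor in a body-style test.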
Re: [sa] regex anchor for start of line in body
On Tue, 7 Jul 2009, Charles Gregory wrote:

> On Tue, 7 Jul 2009, Charles Gregory wrote:
>> I have tried all combinations of:
>>
>> body LOC_09070701 /^Assets of my deceased Client/
>> body LOC_09070702 /^Assets of my deceased Client/m
>> body LOC_09070703 /^Assets of my deceased Client/ms
>>
>> And NONE of them match the beginning of line!

Sorry. I started typing this in the afternoon, then got called away from the keyboard. Hope I didn't waste too many people's time.

Bottom line: I need to RTFM more *literally*. The man page itself says, of the 'body' test, that all line breaks are removed before matching. So strictly speaking, there is no way to make the 'body' test match a string anchored to the beginning of a line.

To achieve the desired result, we need to use a 'rawbody' test with the m option (but NOT the s option!). Yes, this means that we might have to code the regex to handle some HTML (sigh)

So the desired test is:

rawbody LOC_09070702 /^Assets of my deceased Client/m

- Charles
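The effect of the two modifiers can be checked in isolation. This is a minimal sketch using Python's re module, whose re.MULTILINE and re.DOTALL flags correspond to Perl's /m and /s; the sample texts are invented, and the second pattern (with .* and $) is a hypothetical variant added only to show where s makes a difference.

```python
import re

raw = "Dear friend,\nAssets of my deceased Client\nReply soon.\n"
pattern = r"^Assets of my deceased Client"

# m lets ^ match after each newline; s only changes what '.' matches.
no_flags = re.search(pattern, raw) is not None
with_m = re.search(pattern, raw, re.MULTILINE) is not None
with_s = re.search(pattern, raw, re.DOTALL) is not None

print(no_flags, with_m, with_s)  # False True False

# Hypothetical variant with a wildcard, to show the risk of adding s:
# '.' then crosses line breaks and the match is no longer one line.
spanning = "Assets are frozen.\nPlease contact my Client\n"
wild = r"^Assets.*Client$"

m_only = re.search(wild, spanning, re.MULTILINE) is not None
m_and_s = re.search(wild, spanning, re.MULTILINE | re.DOTALL) is not None

print(m_only, m_and_s)  # False True
```

For a pattern without a dot, s changes nothing; once a wildcard is involved, leaving s off keeps the match confined to a single line.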
regex anchor for start of line in body
I seem to be having a hard time writing rules which anchor a string to the start of the line in the body of a text message.

e.g., suppose I get a lot of phish which contain text (not html) like this:

Username:..
Password:..

I try what seemed intuitively easy:

body __PHISH1 /^Password\b/i
body __PHISH0 /^Username\b/i
meta PHISH __PHISH1 && __PHISH0

But the rule does not hit unless I remove the '^' from the above regex. What am I missing?

Thanks,
Fletcher
fletcher at cs.utexas.edu
Re: regex anchor for start of line in body
Fletcher,

> I seem to be having a hard time writing rules which anchor a string to the start of the line in the body of a text message.
>
> e.g., suppose I get a lot of phish which contain text (not html) like this:
>
> Username:..
> Password:..
>
> I try what seemed intuitively easy:
>
> body __PHISH1 /^Password\b/i
> body __PHISH0 /^Username\b/i
> meta PHISH __PHISH1 && __PHISH0
>
> But the rule does not hit unless I remove the '^' from the above regex. What am I missing?

The /m flag probably.

It is almost always wrong (or irrelevant) to leave out the /m flag on regexp rules which contain anchors like ^ and $ (especially on header rules). Try:

body __PHISH1 /^Password\b/im

Mark
Re: regex anchor for start of line in body
On Mon, 6 Jul 2009, info-spamassassin-t...@cs.utexas.edu wrote:

> I seem to be having a hard time writing rules which anchor a string to the start of the line in the body of a text message.
>
> e.g., suppose I get a lot of phish which contain text (not html) like this:
>
> Username:..
> Password:..

You might want to look at the FILL_THIS_FORM stuff I posted in the last few days.

> I try what seemed intuitively easy:
>
> body __PHISH1 /^Password\b/i
> body __PHISH0 /^Username\b/i
> meta PHISH __PHISH1 && __PHISH0
>
> But the rule does not hit unless I remove the '^' from the above regex. What am I missing?

...that body rules work on a cleaned-up body. Lines that look like they should make up a paragraph are joined together and whitespace is collapsed.

Add this to your testbed and run with --debug area=all,rules to see what it's _really_ comparing to for body rules:

body ALL_BODY /.+/

You also need an m flag. :)

--
 John Hardin
Re: regex anchor for start of line in body
On Mon, 6 Jul 2009 17:58:59 -0500 info-spamassassin-t...@cs.utexas.edu wrote:

> I seem to be having a hard time writing rules which anchor a string to the start of the line in the body of a text message.
>
> e.g., suppose I get a lot of phish which contain text (not html) like this:
>
> Username:..
> Password:..
>
> I try what seemed intuitively easy:
>
> body __PHISH1 /^Password\b/i
> body __PHISH0 /^Username\b/i
> meta PHISH __PHISH1 && __PHISH0

As has already been said, line breaks are removed in body tests, but even if they weren't, the test would be likely to FP on website sign-up replies. It might be better to look for username and password separated by a suitable pattern of whitespace and punctuation.
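One way to read that suggestion, sketched in Python rather than as a SpamAssassin rule. The pattern below is a hypothetical example, not a tested rule from this thread: it assumes the form fields end up on one joined line, and it allows only whitespace and a few punctuation characters between the two words. The sample texts are invented.

```python
import re

# Hypothetical pattern: "username", then 1-40 characters of filler
# limited to whitespace, colons, dots and hyphens, then "password".
form_fields = re.compile(
    r"\busername\b[\s:.\-]{1,40}\bpassword\b",
    re.IGNORECASE,
)

# Phish-style blank form, as it might look after lines are joined.
phish = "Username:.. Password:.."

# Sign-up reply: ordinary words sit between the two terms.
signup = "Your username is fletcher and your password is hunter2."

print(bool(form_fields.search(phish)))   # True
print(bool(form_fields.search(signup)))  # False
```

Because letters are not allowed in the gap, prose that merely mentions both words does not match, which addresses the sign-up reply case; the exact filler set and length limit would need tuning against real mail.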
Re: regex anchor for start of line in body
On Tue, July 7, 2009 00:58, info-spamassassin-t...@cs.utexas.edu wrote:

> body __PHISH1 /^Password\b/i
> body __PHISH0 /^Username\b/i
> meta PHISH __PHISH1 && __PHISH0
>
> But the rule does not hit unless I remove the '^' from the above regex. What am I missing?

Replace ^ with \b. With /i, case is not important, so you can also have a lowercase u and p :)

-- xpoint
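The difference between the two anchors on already-joined text can be sketched as follows, again with Python's re module standing in for the Perl regex and with invented sample strings.

```python
import re

# Text as it might look once the form lines have been joined.
joined = "Please verify your account. Username:.. Password:.."

anchored = re.compile(r"^Password\b", re.IGNORECASE | re.MULTILINE)
word_bounded = re.compile(r"\bPassword\b", re.IGNORECASE)

print(bool(anchored.search(joined)))      # False
print(bool(word_bounded.search(joined)))  # True

# The \b version is broader: it also hits ordinary prose.
prose = "I forgot my password again."
print(bool(word_bounded.search(prose)))   # True
```

Swapping ^ for \b makes the rule hit, at the cost of matching the word anywhere it appears, so it is best combined with other conditions.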