Re: [sa] regex anchor for start of line in body

2009-07-08 Thread Benny Pedersen

On Wed, July 8, 2009 06:41, Charles Gregory wrote:

> So the desired test is:

do you have a dual quad core that idles ? :)

> rawbody  LOC_09070702 /^Assets of my deceased Client/m

rawbody takes more cpu power than

body LOC09070702 /\bAssets of my deceased Client\b/

why missing /i ?
and why exact match on begin of line ?

another way to catch it

body __A1 /\bassets\b/i
body __A2 /\bof\b/i
body __A3 /\bmy\b/i
body __A4 /\bdeceased\b/i
body __A5 /\bclient\b/i
meta LOC09070702 (__A1 && __A2 && __A3 && __A4 && __A5)
...
...

in my example the meta rule will hit if all 5 words are found anywhere in the body


-- 
xpoint



Re: regex anchor for start of line in body

2009-07-08 Thread Charles Gregory

On Wed, 8 Jul 2009, Benny Pedersen wrote:

> do you have a dual quad core that idles ? :)


I have a dual Pentium-III that idles 99% of the time, yes.


> rawbody takes more cpu power than (body)


I wouldn't think that it takes much more as the only difference
is whether HTML is still present


> why missing /i ?
> and why exact match on begin of line ?


I use these rules as quick 'poison pill' rules added as needed, then
remove them a few weeks later.

The use of case-sensitive matching and exact line matching are intended to 
match the spam as exactly as possible and minimize the possibility of 
FP's. Someone could very well have a deceased client of some kind, but 
it's not likely that ham will use that exact phrase, with that 
capitalization, all alone on a single line (the original regex matches 
beginning to END of the line).


Also, anchoring tests to the beginning or end of lines should improve 
efficiency, as the only places it will check the regex are at line breaks.



> body __A1 /\bassets\b/i
> body __A2 /\bof\b/i
> body __A3 /\bmy\b/i
> body __A4 /\bdeceased\b/i
> body __A5 /\bclient\b/i
> meta LOC09070702 (__A1 && __A2 && __A3 && __A4 && __A5)


Far too much chance of FP's. Given that 'of' and 'my' occur in many 
e-mails, you are really basing this on 'deceased', 'client' and 'assets'.
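
To make the FP concern concrete, here is a small Python sketch (not SpamAssassin code). The helper names and both sample messages are made up for illustration, and the exact-line pattern assumes the "beginning to END of the line" form described above. It compares "all five words anywhere" against "exact phrase alone on a line":

```python
import re

# Hypothetical stand-ins for the five sub-rules; this is not SpamAssassin code.
WORDS = ["assets", "of", "my", "deceased", "client"]
SUBRULES = [re.compile(rf"\b{w}\b", re.IGNORECASE) for w in WORDS]

# Assumed form of the strict rule: exact phrase, exact case, alone on a line.
EXACT = re.compile(r"^Assets of my deceased Client$", re.MULTILINE)


def meta_hits(text: str) -> bool:
    """True when every word appears somewhere in the text, in any order."""
    return all(rule.search(text) for rule in SUBRULES)


def exact_hits(text: str) -> bool:
    """True only when the exact phrase fills a whole line."""
    return EXACT.search(text) is not None


# Made-up sample messages.
spam = "Dear friend,\nAssets of my deceased Client\nare waiting for you."
ham = ("My client asked about the assets of the estate.\n"
       "The deceased left no will.")

print("spam:", meta_hits(spam), exact_hits(spam))
print("ham: ", meta_hits(ham), exact_hits(ham))
```

In this sketch the word-by-word check fires on both messages, while the exact-line check fires only on the spam sample.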


- C


Re: [sa] regex anchor for start of line in body

2009-07-07 Thread Charles Gregory

On Mon, 6 Jul 2009, info-spamassassin-t...@cs.utexas.edu wrote:

> I seem to be having a hard time writing rules which anchor
> a string to the start of the line in the body of a text message.


What the.? So am I!

I have tried all combinations of:
body   LOC_09070701 /^Assets of my deceased Client/
body   LOC_09070702 /^Assets of my deceased Client/m
body   LOC_09070703 /^Assets of my deceased Client/ms

And NONE of them match the beginning of line!

- Charles


Re: [sa] regex anchor for start of line in body

2009-07-07 Thread Charles Gregory

On Tue, 7 Jul 2009, Charles Gregory wrote:

> I have tried all combinations of:
> body   LOC_09070701 /^Assets of my deceased Client/
> body   LOC_09070702 /^Assets of my deceased Client/m
> body   LOC_09070703 /^Assets of my deceased Client/ms
>
> And NONE of them match the beginning of line!


Just for interest sake, I am putting my 'test line' here
Assets of my deceased Client
...just to see if it is my testing method that is broken

- Charles


Re: [sa] regex anchor for start of line in body

2009-07-07 Thread Charles Gregory

On Tue, 7 Jul 2009, Charles Gregory wrote:

X-Spam-Status: No, hits=-2004.0 required=10.0 autolearn=disabled
tests=LOC_SAUSERS_RCVD_WL=-1000,LOC_SAUSERS_TO_WL=-1000,
RCVD_IN_DNSWL_MED=-4
> On Tue, 7 Jul 2009, Charles Gregory wrote:
> > I have tried all combinations of:
> > body   LOC_09070701 /^Assets of my deceased Client/
> > body   LOC_09070702 /^Assets of my deceased Client/m
> > body   LOC_09070703 /^Assets of my deceased Client/ms
> >
> > And NONE of them match the beginning of line!
>
> Just for interest sake, I am putting my 'test line' here
>
> Assets of my deceased Client
>
> ...just to see if it is my testing method that is broken


And no, it doesn't (sigh)

- C


Re: [sa] regex anchor for start of line in body

2009-07-07 Thread John Hardin

On Tue, 7 Jul 2009, Charles Gregory wrote:


> On Mon, 6 Jul 2009, info-spamassassin-t...@cs.utexas.edu wrote:
>
> > I seem to be having a hard time writing rules which anchor
> > a string to the start of the line in the body of a text message.
>
> What the.? So am I!
>
> I have tried all combinations of:
> body   LOC_09070701 /^Assets of my deceased Client/
> body   LOC_09070702 /^Assets of my deceased Client/m
> body   LOC_09070703 /^Assets of my deceased Client/ms
>
> And NONE of them match the beginning of line!


Post a sample email that you're trying to match. Bear in mind, body rules 
work on modified body text. The fact that text appears at the beginning of 
a line when displayed in your mail client (or even in a text editor 
editing the raw message file) does not reliably imply it's at the 
beginning of a line in the text body rules are matching against. See the 
ALL_BODY troubleshooting rule I suggested for test use.


--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  There is no doubt in my mind that millions of lives could have been
  saved if the people were not brainwashed about gun ownership and
  had been well armed. ... Gun haters always want to forget the Warsaw
  Ghetto uprising, which is a perfect example of how a ragtag,
  half-starved group of Jews took 10 handguns and made asses out of
  the Nazis.  -- Theodore Haas, Dachau survivor
---
 Today: Robert Heinlein's 102nd birthday


Re: [sa] regex anchor for start of line in body

2009-07-07 Thread John Hardin

On Tue, 7 Jul 2009, Charles Gregory wrote:


> Just for interest sake, I am putting my 'test line' here
> Assets of my deceased Client
> ...just to see if it is my testing method that is broken


The body rule is comparing against a cleaned up paragraph where those 
lines are joined. Otherwise inserting line breaks would be a trivial way 
to avoid many SA rules.
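
A rough Python sketch of that effect, for illustration only: `join_paragraph` is a made-up approximation (collapse all whitespace runs to one space), not SpamAssassin's actual body rendering, and `re.MULTILINE` stands in for Perl's /m.

```python
import re

# Same pattern as the rule under test, with the multi-line flag set.
PATTERN = re.compile(r"^Assets of my deceased Client", re.MULTILINE)


def join_paragraph(raw: str) -> str:
    """Crude approximation: collapse every whitespace run, newlines included."""
    return " ".join(raw.split())


raw = ("Just for interest sake, I am putting my 'test line' here\n"
       "Assets of my deceased Client\n"
       "...just to see if it is my testing method that is broken")

joined = join_paragraph(raw)

# The raw text still has a line break right before "Assets", so ^ can match.
print("raw text matches:   ", PATTERN.search(raw) is not None)
# After joining there is no line start before "Assets" any more.
print("joined text matches:", PATTERN.search(joined) is not None)
```

Once the lines are joined, the phrase sits in the middle of one long line, so the anchored pattern has nothing to anchor to.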




Re: [sa] regex anchor for start of line in body

2009-07-07 Thread RW
On Tue, 7 Jul 2009 14:57:59 -0400 (EDT)
Charles Gregory cgreg...@hwcn.org wrote:

> On Mon, 6 Jul 2009, info-spamassassin-t...@cs.utexas.edu wrote:
> > I seem to be having a hard time writing rules which anchor
> > a string to the start of the line in the body of a text message.
>
> What the.? So am I!
>
> I have tried all combinations of:
> body   LOC_09070701 /^Assets of my deceased Client/
> body   LOC_09070702 /^Assets of my deceased Client/m
> body   LOC_09070703 /^Assets of my deceased Client/ms
>
> And NONE of them match the beginning of line!
 


From man Mail::SpamAssassin::Conf   

body SYMBOLIC_TEST_NAME /pattern/modifiers
   Define a body pattern test.  pattern is a Perl regular
   expression.  Note: as per the header tests, # must be
   escaped (\#) or else it is considered the beginning of a
   comment.

   The 'body' in this case is the textual parts of the message
   body; any non-text MIME parts are stripped, and the message
   decoded from Quoted-Printable or Base-64-encoded format if
   necessary.  The message Subject header is considered part of
   the body and becomes the first paragraph when running the
   rules.  All HTML tags and line breaks will be removed before
   matching.


Re: [sa] regex anchor for start of line in body

2009-07-07 Thread Charles Gregory

On Tue, 7 Jul 2009, Charles Gregory wrote:

> On Tue, 7 Jul 2009, Charles Gregory wrote:
>
> > I have tried all combinations of:
> > body   LOC_09070701 /^Assets of my deceased Client/
> > body   LOC_09070702 /^Assets of my deceased Client/m
> > body   LOC_09070703 /^Assets of my deceased Client/ms
> > And NONE of them match the beginning of line!


Sorry. I started typing this in the afternoon then got called away from 
the keyboard. Hope I didn't waste too many people's time


Bottom line: I need to RTFM more *literally*. The man page itself says,
of the 'body' test, that all line breaks are removed before matching. 
So strictly speaking, there is no way to make the 'body' test match
a string anchored to the beginning of a line. To achieve the desired 
result, we need to use a 'rawbody' test with the m option (but NOT
the s option!). Yes, this means that we might have to code the 
regex to handle some HTML (sigh).


So the desired test is:
rawbody  LOC_09070702 /^Assets of my deceased Client/m

- Charles


regex anchor for start of line in body

2009-07-06 Thread info-spamassassin-talk
I seem to be having a hard time writing rules which anchor
a string to the start of the line in the body of a text message.

e.g., suppose I get a lot of phish which contain text (not html)
like this:

Username:..
Password:..

I try what seemed intuitively easy:

body __PHISH1 /^Password\b/i
body __PHISH0 /^Username\b/i
meta PHISH    __PHISH1 && __PHISH0

But the rule does not hit unless I remove the '^' from the above regex.
What am I missing?

Thanks,
Fletcher

fletcher at cs.utexas.edu


Re: regex anchor for start of line in body

2009-07-06 Thread Mark Martinec
Fletcher,

> I seem to be having a hard time writing rules which anchor
> a string to the start of the line in the body of a text message.
>
> e.g., suppose I get a lot of phish which contain text (not html)
> like this:
>
> Username:..
> Password:..
>
> I try what seemed intuitively easy:
>
> body __PHISH1 /^Password\b/i
> body __PHISH0 /^Username\b/i
> meta PHISH    __PHISH1 && __PHISH0
>
> But the rule does not hit unless I remove the '^' from the above regex.
>
> What am I missing?

The /m flag probably.

It is almost always wrong (or irrelevant) to leave out the /m flag
on regexp rules which contain anchors like ^ and $
(especially on header rules).

Try:  body __PHISH1 /^Password\b/im
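
What the flag changes can be seen in a small Python sketch (illustration only: `re.MULTILINE` plays the role of Perl's /m, and the sample is the two-line text from the question). It only applies to text that still contains line breaks, and says nothing about what SpamAssassin actually hands to a body rule:

```python
import re

# Two-line sample from the question, with the line break still present.
text = "Username:..\nPassword:.."

# Without the multi-line flag, ^ matches only at the very start of the string.
no_m = re.search(r"^Password\b", text, re.IGNORECASE)

# With the multi-line flag, ^ also matches right after each newline.
with_m = re.search(r"^Password\b", text, re.IGNORECASE | re.MULTILINE)

print("without /m:", no_m is not None)
print("with /m:   ", with_m is not None)
```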

  Mark


Re: regex anchor for start of line in body

2009-07-06 Thread John Hardin

On Mon, 6 Jul 2009, info-spamassassin-t...@cs.utexas.edu wrote:


> I seem to be having a hard time writing rules which anchor
> a string to the start of the line in the body of a text message.
>
> e.g., suppose I get a lot of phish which contain text (not html)
> like this:
>
> Username:..
> Password:..


You might want to look at the FILL_THIS_FORM stuff I posted in the last 
few days.



> I try what seemed intuitively easy:
>
> body __PHISH1 /^Password\b/i
> body __PHISH0 /^Username\b/i
> meta PHISH    __PHISH1 && __PHISH0
>
> But the rule does not hit unless I remove the '^' from the above regex.
> What am I missing?


...that body rules work on a cleaned-up body. Lines that look like they 
should make up a paragraph are joined together and whitespace is 
collapsed.


Add this to your testbed and run with --debug area=all,rules to see what 
it's _really_ comparing to for body rules:


   body ALL_BODY /.+/

You also need the /m flag. :)



Re: regex anchor for start of line in body

2009-07-06 Thread RW
On Mon, 6 Jul 2009 17:58:59 -0500
info-spamassassin-t...@cs.utexas.edu wrote:

> I seem to be having a hard time writing rules which anchor
> a string to the start of the line in the body of a text message.
>
> e.g., suppose I get a lot of phish which contain text (not html)
> like this:
>
> Username:..
> Password:..
>
> I try what seemed intuitively easy:
>
> body __PHISH1 /^Password\b/i
> body __PHISH0 /^Username\b/i
> meta PHISH    __PHISH1 && __PHISH0

As has already been said, line-breaks are removed in body tests, but
even if they weren't, the test would be likely to FP on website sign-up
replies.

It might be better to look for username and password separated by
a suitable pattern of whitespace and punctuation.
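
A minimal Python sketch of that idea, purely as an illustration: the pattern, the 20-character limit and the helper name `looks_like_form` are assumptions, not a tested SpamAssassin rule.

```python
import re

# Hypothetical pattern: "username", then a short run of whitespace and
# punctuation only, then "password". The run limit of 20 is an arbitrary guess.
FORM = re.compile(r"\busername\b[\s:.,;-]{1,20}password\b", re.IGNORECASE)


def looks_like_form(text: str) -> bool:
    """True when the two labels are separated only by whitespace/punctuation."""
    return FORM.search(text) is not None


# Works whether or not the line break survives, because \s covers both cases.
print(looks_like_form("Username:..\nPassword:.."))
print(looks_like_form("Username:.. Password:.."))

# Ordinary prose has words between the two labels, so it does not match.
print(looks_like_form("Your username is bob and your password is hunter2"))
```

Because the separator class allows both spaces and newlines, this form does not depend on a start-of-line anchor at all.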


Re: regex anchor for start of line in body

2009-07-06 Thread Benny Pedersen

On Tue, July 7, 2009 00:58, info-spamassassin-t...@cs.utexas.edu wrote:

> body __PHISH1 /^Password\b/i
> body __PHISH0 /^Username\b/i
> meta PHISH    __PHISH1 && __PHISH0
>
> But the rule does not hit unless I remove the '^' from the above regex.
> What am I missing?

replace ^ with \b

/i case is not important so you can also have lowercase U and P :)

-- 
xpoint