Re: Blank line rules

2014-05-26 Thread James B. Byrne

On Thu, May 22, 2014 17:50, Karsten Bräckelmann wrote:



 There's another issue with your approach of different rules matching up
 to n occurrences and more than n. The first will always match in
 addition, if the latter matches.

 If the desired behavior is mutually exclusive matching, you need meta
 rules actually encoding the math / logic.


The rules are meant to 'stack'.  The scores need adjusting to suit but the
effect is intentional. Whether or not it makes sense I will discover once I
get it working.  Which apparently is not going to happen any time soon.

On a related note, what is the difference between 'body' and 'rawbody' rules?


-- 
***  E-Mail is NOT a SECURE channel  ***
James B. Byrne                mailto:byrn...@harte-lyne.ca
Harte & Lyne Limited          http://www.harte-lyne.ca
9 Brockley Drive  vox: +1 905 561 1241
Hamilton, Ontario fax: +1 905 561 0757
Canada  L8E 3C3



Re: Blank line rules

2014-05-26 Thread Axb

On 05/26/2014 09:20 PM, James B. Byrne wrote:

On a related note, what is the difference between 'body' and 'rawbody' rules?


http://wiki.apache.org/spamassassin/WritingRules


Re: Blank line rules

2014-05-23 Thread Alex
Hi,

On Thu, May 22, 2014 at 8:44 PM, John Hardin jhar...@impsec.org wrote:

 On Thu, 22 May 2014, Karsten Bräckelmann wrote:

  On Thu, 2014-05-22 at 15:49 -0400, James B. Byrne wrote:

 I am clearly missing something with these rules but I lack the
 experience to
 see what it is:

 score RAW_BLANK_LINES_05 0.5
 rawbody RAW_BLANK_LINES_05 /(\r?\n){5,9}/i


 Why is everyone trying to match empty lines these days? Must be spam I'm
 missing out on. ;)


 Heh. Something similar just plopped into my spam quarantine.

 You might want to do this:

   rawbody  MANY_BLANK_LINES  /(?:(?:<br>)?\r?\n){9}/mi


I tried this for a while in my corpus. Have you combined this into a meta?
I'm finding this matches far too much ham to even remotely be considered.
Was it the intention to only match fn's?

Thanks,
Alex


Re: Blank line rules

2014-05-23 Thread Martin Gregorie
On Fri, 2014-05-23 at 19:36 -0400, Alex wrote:
 Hi,
 
 On Thu, May 22, 2014 at 8:44 PM, John Hardin jhar...@impsec.org wrote:
 
  On Thu, 22 May 2014, Karsten Bräckelmann wrote:
 
   On Thu, 2014-05-22 at 15:49 -0400, James B. Byrne wrote:
 
  I am clearly missing something with these rules but I lack the
  experience to
  see what it is:
 
  score RAW_BLANK_LINES_05 0.5
  rawbody RAW_BLANK_LINES_05 /(\r?\n){5,9}/i
 
 
  Why is everyone trying to match empty lines these days? Must be spam I'm
  missing out on. ;)
 
 
  Heh. Something similar just plopped into my spam quarantine.
 
  You might want to do this:
 
rawbody  MANY_BLANK_LINES  /(?:(?:<br>)?\r?\n){9}/mi
 
 
 I tried this for a while in my corpus. Have you combined this into a meta?
 I'm finding this matches far too much ham to even remotely be considered.
 Was it the intention to only match fn's?
 
If you're going to write rules to reliably match HTML spam, it's a good
idea to start by reading enough of the HTML generated by the more
popular MUAs, especially the MS ones, to be familiar with the tag
sequences they generate because a lot of them are quite unlike anything
you'd expect a rationally designed program to produce. IOW you need some
familiarity with the tangled strings of tags that can be found in *ham*
so you can avoid matching them by mistake.
  

Martin

 Thanks,
 Alex





Re: Blank line rules

2014-05-23 Thread John Hardin

On Fri, 23 May 2014, Alex wrote:


On Thu, May 22, 2014 at 8:44 PM, John Hardin jhar...@impsec.org wrote:

On Thu, 22 May 2014, Karsten Bräckelmann wrote:
 On Thu, 2014-05-22 at 15:49 -0400, James B. Byrne wrote:



rawbody RAW_BLANK_LINES_05 /(\r?\n){5,9}/i


Why is everyone trying to match empty lines these days? Must be spam I'm
missing out on. ;)


Heh. Something similar just plopped into my spam quarantine.

You might want to do this:

  rawbody  MANY_BLANK_LINES  /(?:(?:<br>)?\r?\n){9}/mi


I tried this for a while in my corpus. Have you combined this into a meta?
I'm finding this matches far too much ham to even remotely be considered.
Was it the intention to only match fn's?


It was more to illustrate a possible variant. The {9} is probably *way* 
too low for production use, and yes, it should probably be a subrule used 
in metas rather than being directly scored.


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Christian martyrs don't explode. -- Marisol
---
 3 days until Memorial Day - honor those who sacrificed for our liberty

Blank line rules

2014-05-22 Thread James B. Byrne
I am clearly missing something with these rules but I lack the experience to
see what it is:

score RAW_BLANK_LINES_05 0.5
rawbody RAW_BLANK_LINES_05 /(\r?\n){5,9}/i
describe RAW_BLANK_LINES_05 Raw body contains 5 or more consecutive empty lines
score RAW_BLANK_LINES_10 1.0
rawbody RAW_BLANK_LINES_10 /(\r?\n){10,24}/i
describe RAW_BLANK_LINES_10 Raw body contains 10 or more consecutive empty lines
score RAW_BLANK_LINES_15 1.5
rawbody RAW_BLANK_LINES_15 /(\r?\n){25}/
describe RAW_BLANK_LINES_15 Raw body contains 25 or more consecutive empty lines

I created a test file that consisted of nought but newlines (shown as $
characters using vim set list).

I passed it to spamassassin from the command line with the above rules in
/etc/mail/spamassassin/local.cf and nothing was reported.  I used an actual
message body from a spam message received and only the RAW_BLANK_LINES_05 test
is tripped even though the body of that message has 18 consecutive blank
lines, also consisting of nothing but \n characters.

So what is it about the regexp I am using that I evidently do not understand?

-- 
***  E-Mail is NOT a SECURE channel  ***
James B. Byrne                mailto:byrn...@harte-lyne.ca
Harte & Lyne Limited          http://www.harte-lyne.ca
9 Brockley Drive  vox: +1 905 561 1241
Hamilton, Ontario fax: +1 905 561 0757
Canada  L8E 3C3



Re: Blank line rules

2014-05-22 Thread John Hardin

On Thu, 22 May 2014, James B. Byrne wrote:


I am clearly missing something with these rules but I lack the experience to
see what it is:

score RAW_BLANK_LINES_05 0.5
rawbody RAW_BLANK_LINES_05 /(\r?\n){5,9}/i
describe RAW_BLANK_LINES_05 Raw body contains 5 or more consecutive empty lines
score RAW_BLANK_LINES_10 1.0
rawbody RAW_BLANK_LINES_10 /(\r?\n){10,24}/i
describe RAW_BLANK_LINES_10 Raw body contains 10 or more consecutive empty lines
score RAW_BLANK_LINES_15 1.5
rawbody RAW_BLANK_LINES_15 /(\r?\n){25}/
describe RAW_BLANK_LINES_15 Raw body contains 25 or more consecutive empty lines


Regular expressions by default only consider a single line of text. You 
need to provide an option to say treat multiple lines as a single line.

Try this:

  rawbody RAW_BLANK_LINES_05 /(?:\r?\n){5,9}/m
  rawbody RAW_BLANK_LINES_10 /(?:\r?\n){10,24}/m
  rawbody RAW_BLANK_LINES_15 /(?:\r?\n){25}/m

The case-insensitive flag is not meaningful for these rules as there's no 
attempt to match text, and I added the ?: to make the groups 
non-capturing, which is a bit more efficient.
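
The two side points above (the case-insensitive flag and the non-capturing group) can be checked outside SpamAssassin. The following is a minimal sketch in Python's re module, used here only as a stand-in for Perl regexes; the sample string is made up, and SpamAssassin's own rawbody handling is not modelled.

```python
import re

# Hypothetical sample: seven consecutive newlines.
blank_run = "\n" * 7

# Original form: capturing group.
capturing = re.search(r"(\r?\n){5,9}", blank_run)
# Suggested form: non-capturing group.
non_capturing = re.search(r"(?:\r?\n){5,9}", blank_run)
# Same pattern with the case-insensitive flag added.
ignore_case = re.search(r"(?:\r?\n){5,9}", blank_run, re.IGNORECASE)

# All three match the same stretch of text ...
print(capturing.span(), non_capturing.span(), ignore_case.span())
# ... and only the capturing form remembers a sub-match.
print(capturing.groups(), non_capturing.groups())
```

The pattern contains no letters, so the case-insensitive flag has nothing to act on, and the only visible difference between the two groupings is whether a sub-match is stored.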


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
 Windows and its users got mentioned at home today, after my wife the
 psych major brought up Seligman's theory of learned helplessness.
 -- Dan Birchall in a.s.r
---
 4 days until Memorial Day - honor those who sacrificed for our liberty


Re: Blank line rules

2014-05-22 Thread Ian Zimmerman
On Thu, 22 May 2014 13:47:04 -0700 (PDT)
John Hardin jhar...@impsec.org wrote:

John> Regular expressions by default only consider a single line of
John> text.  You need to provide an option to say treat multiple lines
John> as a single line. Try this:

rawbody RAW_BLANK_LINES_05 /(?:\r?\n){5,9}/m
rawbody RAW_BLANK_LINES_10 /(?:\r?\n){10,24}/m
rawbody RAW_BLANK_LINES_15 /(?:\r?\n){25}/m

James, see also the "Bayes refinement" thread, where I posted about doing
the exact same thing.  Somehow John's multiline rules don't work for me,
either.  Karsten was looking at it, last I know.

-- 
Please *no* private copies of mailing list or newsgroup messages.


Re: Blank line rules

2014-05-22 Thread Karsten Bräckelmann
On Thu, 2014-05-22 at 15:49 -0400, James B. Byrne wrote:
 I am clearly missing something with these rules but I lack the experience to
 see what it is:
 
 score RAW_BLANK_LINES_05 0.5
 rawbody RAW_BLANK_LINES_05 /(\r?\n){5,9}/i

Why is everyone trying to match empty lines these days? Must be spam I'm
missing out on. ;)

 I passed it to spamassassin from the command line with the above rules in
 /etc/mail/spamassassin/local.cf and nothing was reported.  I used an actual
 message body from a spam message received and only the RAW_BLANK_LINES_05 test
 is tripped even though the body of that message has 18 consecutive blank
 lines, also consisting of nothing but \n characters.
 
 So what is it about the regexp I am using that I evidently do not understand?

See the post "Consecutive Newlines in Rawbody Rules" as of a few minutes
ago, follow-up to the "Bayes refinement" thread.

In a nutshell: 12 or more consecutive newlines cannot be matched with
rawbody rules. They get replaced by 2 newlines.


There's another issue with your approach of different rules matching up
to n occurrences and more than n. The first will always match in
addition, if the latter matches.

If the desired behavior is mutually exclusive matching, you need meta
rules actually encoding the math / logic.
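
The stacking effect and the meta-style logic can be sketched as follows. This is an illustration in Python's re module rather than SpamAssassin itself: the three patterns are the ones from the original post, the ONLY_* names are hypothetical, and the test string is a plain run of 18 newlines that ignores whatever SpamAssassin does to consecutive newlines before rawbody rules see them.

```python
import re

# Stand-in for a body with 18 consecutive blank lines.
run = "\n" * 18

# The three tiers from the original post, minus the flags.
tiers = {
    "RAW_BLANK_LINES_05": re.compile(r"(?:\r?\n){5,9}"),
    "RAW_BLANK_LINES_10": re.compile(r"(?:\r?\n){10,24}"),
    "RAW_BLANK_LINES_15": re.compile(r"(?:\r?\n){25}"),
}

# The patterns are unanchored, so {5,9} happily matches the first
# nine newlines of a longer run: the lower tier hits as well.
hits = {name: bool(rx.search(run)) for name, rx in tiers.items()}

# Mutually exclusive tiers: a tier counts only if the next one up
# did not hit. This is the logic a meta rule would have to encode.
exclusive = {
    "ONLY_05": hits["RAW_BLANK_LINES_05"] and not hits["RAW_BLANK_LINES_10"],
    "ONLY_10": hits["RAW_BLANK_LINES_10"] and not hits["RAW_BLANK_LINES_15"],
    "ONLY_15": hits["RAW_BLANK_LINES_15"],
}

print(hits)
print(exclusive)
```

In SpamAssassin the same idea would be expressed with unscored sub-rules combined in meta expressions of the form A && !B.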


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Blank line rules

2014-05-22 Thread Karsten Bräckelmann
On Thu, 2014-05-22 at 13:47 -0700, John Hardin wrote:
 On Thu, 22 May 2014, James B. Byrne wrote:

  rawbody RAW_BLANK_LINES_05 /(\r?\n){5,9}/i

 Regular expressions by default only consider a single line of text. You 

Nope. You're thinking about ^ and $ by default only matching the
beginning and end of the string. A \n newline is just an ordinary char.

REs don't know the concept of lines, they operate on a string.


 need to provide an option to say treat multiple lines as a single line.
 Try this:
 
rawbody RAW_BLANK_LINES_05 /(?:\r?\n){5,9}/m

The /m modifier only changes ^ and $ so that they also match at line
boundaries inside the string, not just at its beginning and end.
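
Both points can be checked in a few lines. This is a sketch in Python's re module, whose MULTILINE flag corresponds to Perl's /m for this purpose; the sample strings are made up and nothing here models SpamAssassin's rawbody processing.

```python
import re

# Hypothetical text with six consecutive newlines in the middle.
text = "some text" + "\n" * 6 + "more text"

# No flags at all: "\n" is matched like any other character.
plain = re.search(r"(?:\r?\n){5,9}", text)
print(plain is not None, len(plain.group(0)))

# The multiline flag only affects where ^ (and $) may match.
lines = "one\ntwo\nthree"
starts_default = re.findall(r"^\w+", lines)
starts_multiline = re.findall(r"^\w+", lines, re.MULTILINE)
print(starts_default)
print(starts_multiline)
```

Without the flag, ^ matches only at the start of the string; with it, ^ also matches after every newline. The newline pattern itself behaves the same either way.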


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: Blank line rules

2014-05-22 Thread John Hardin

On Thu, 22 May 2014, Karsten Bräckelmann wrote:


On Thu, 2014-05-22 at 15:49 -0400, James B. Byrne wrote:

I am clearly missing something with these rules but I lack the experience to
see what it is:

score RAW_BLANK_LINES_05 0.5
rawbody RAW_BLANK_LINES_05 /(\r?\n){5,9}/i


Why is everyone trying to match empty lines these days? Must be spam I'm
missing out on. ;)


Heh. Something similar just plopped into my spam quarantine.

You might want to do this:

  rawbody  MANY_BLANK_LINES  /(?:(?:<br>)?\r?\n){9}/mi
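
A rough way to see what this variant accepts, sketched in Python's re module as a stand-in for Perl regexes. It assumes the br in the pattern stands for a literal HTML <br> tag; the sample strings are invented, and SpamAssassin's own treatment of newlines in rawbody is not modelled.

```python
import re

# Assumed reading of the pattern: optional <br> tag, then a line break,
# repeated nine times, case-insensitive.
many_blank = re.compile(r"(?:(?:<br>)?\r?\n){9}", re.MULTILINE | re.IGNORECASE)

html_run = "<BR>\r\n" * 9    # nine upper-case tags with CRLF line ends
plain_run = "\n" * 9         # nine bare newlines, no HTML at all
mixed_run = "<br>\n\n" * 5   # ten line breaks, every other one tagged
short_run = "<br>\n" * 8     # only eight line breaks

for name, sample in [("html", html_run), ("plain", plain_run),
                     ("mixed", mixed_run), ("short", short_run)]:
    print(name, bool(many_blank.search(sample)))
```

Because the tag is optional, any nine consecutive line breaks qualify, with or without HTML.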


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  ...intellectuals have no interest in what _creates_ wealth, and
  what _inhibits_ the creation of wealth. They are very concerned
  about the _distribution_ of it, but they act as if wealth just
  exists somehow. It's like manna from heaven, it's only a
  question of how we split it up.-- Thomas Sowell
---
 4 days until Memorial Day - honor those who sacrificed for our liberty

Re: Blank line rules

2014-05-22 Thread Amir Caspi
On May 22, 2014, at 6:44 PM, John Hardin jhar...@impsec.org wrote:
 
 You might want to do this:
 
  rawbody  MANY_BLANK_LINES  /(?:(?:<br>)?\r?\n){9}/mi

AC_BR_BONANZA should cover the HTML case. It could be easily extended to match 
standard LF or CR per above. (In my case I am matching something like 20 
newlines for the HTML case, to try to prevent FPs.)

--- Amir
thumbed via iPhone



Re: Blank line rules

2014-05-22 Thread Kevin A. McGrail

On 5/22/2014 5:50 PM, Karsten Bräckelmann wrote:
Why is everyone trying to match empty lines these days? Must be spam 
I'm missing out on. ;) 

Who here has seen Pootietang and is laughing about this?  Just me, likely...


Re: Blank line rules

2014-05-22 Thread Karsten Bräckelmann
On Thu, 2014-05-22 at 20:56 -0400, Kevin A. McGrail wrote:
 On 5/22/2014 5:50 PM, Karsten Bräckelmann wrote:
  Why is everyone trying to match empty lines these days? Must be spam 
  I'm missing out on. ;)
 
 Who here has seen Pootietang and is laughing about this?  Just me, likely...

The fact I just googled that word should sufficiently answer it as far
as I am concerned. ;)  Good thing it amused you, but that reference was
certainly unintended.


-- 
char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i<l;i++){ i%8? c<<=1:
(c=*++x); c&128 && (s+=h); if (!(h>>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



OFF-TOPIC: The Brilliance of PootieTang was Re: Blank line rules

2014-05-22 Thread Kevin A. McGrail

On 5/22/2014 9:17 PM, Karsten Bräckelmann wrote:

On Thu, 2014-05-22 at 20:56 -0400, Kevin A. McGrail wrote:

On 5/22/2014 5:50 PM, Karsten Bräckelmann wrote:

Why is everyone trying to match empty lines these days? Must be spam
I'm missing out on. ;)

Who here has seen Pootietang and is laughing about this?  Just me, likely...

The fact I just googled that word should sufficiently answer it as far
as I am concerned. ;)  Good thing it amused you, but that reference was
certainly unintended.

https://www.youtube.com/watch?v=RtCxvv8Y3Bs

2:54 is classic.

This movie is one of the real hit or miss comedies.  I think it's 
brilliant on a lot of levels.  Others don't get it.


regards,
KAM