Re: Blank line rules
On 05/26/2014 09:20 PM, James B. Byrne wrote: On a related note, what is the difference between 'body' and 'rawbody' rules? http://wiki.apache.org/spamassassin/WritingRules
Re: Blank line rules
On Thu, May 22, 2014 17:50, Karsten Bräckelmann wrote: > > > There's another issue with your approach of different rules matching "up > to n" occurrences and "more than n". The first will always match in > addition, if the latter matches. > > If the desired behavior is mutually exclusive matching, you need meta > rules actually encoding the math / logic. > The rules are meant to 'stack'. The scores need adjusting to suit but the effect is intentional. Whether or not it makes sense I will discover once I get it working. Which apparently is not going to happen any time soon. On a related note, what is the difference between 'body' and 'rawbody' rules? -- *** E-Mail is NOT a SECURE channel *** James B. Byrne mailto:byrn...@harte-lyne.ca Harte & Lyne Limited http://www.harte-lyne.ca 9 Brockley Drive vox: +1 905 561 1241 Hamilton, Ontario fax: +1 905 561 0757 Canada L8E 3C3
Re: Blank line rules
On Fri, 23 May 2014, Alex wrote: On Thu, May 22, 2014 at 8:44 PM, John Hardin wrote: On Thu, 22 May 2014, Karsten Bräckelmann wrote: On Thu, 2014-05-22 at 15:49 -0400, James B. Byrne wrote: rawbody RAW_BLANK_LINES_05 /(\r?\n){5,9}/i Why is everyone trying to match empty lines these days? Must be spam I'm missing out on. ;) Heh. Something similar just plopped into my spam quarantine. You might want to do this: rawbody MANY_BLANK_LINES /(?:(?:)?\r?\n){9}/mi I tried this for a while in my corpus. Have you combined this into a meta? I'm finding this matches far too much ham to even remotely be considered. Was it the intention to only match fn's? It was more to illustrate a possible variant. The {9} is probably *way* too low for production use, and yes, it should probably be a subrule used in metas rather than being directly scored. -- John Hardin KA7OHZ http://www.impsec.org/~jhardin/ jhar...@impsec.org FALaholic #11174 pgpk -a jhar...@impsec.org key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79 --- Christian martyrs don't explode. -- Marisol --- 3 days until Memorial Day - honor those who sacrificed for our liberty
Re: Blank line rules
On Fri, 2014-05-23 at 19:36 -0400, Alex wrote: > Hi, > > On Thu, May 22, 2014 at 8:44 PM, John Hardin wrote: > > > On Thu, 22 May 2014, Karsten Bräckelmann wrote: > > > > On Thu, 2014-05-22 at 15:49 -0400, James B. Byrne wrote: > >> > >>> I am clearly missing something with these rules but I lack the > >>> experience to > >>> see what it is: > >>> > >>> score RAW_BLANK_LINES_05 0.5 > >>> rawbody RAW_BLANK_LINES_05 /(\r?\n){5,9}/i > >>> > >> > >> Why is everyone trying to match empty lines these days? Must be spam I'm > >> missing out on. ;) > >> > > > > Heh. Something similar just plopped into my spam quarantine. > > > > You might want to do this: > > > > rawbody MANY_BLANK_LINES /(?:(?:)?\r?\n){9}/mi > > > I tried this for a while in my corpus. Have you combined this into a meta? > I'm finding this matches far too much ham to even remotely be considered. > Was it the intention to only match fn's? > If you're going to write rules to reliably match HTML spam, it's a good idea to start by reading enough of the HTML generated by the more popular MUAs, especially the MS ones, to be familiar with the tag sequences they generate, because a lot of them are quite unlike anything you'd expect a rationally designed program to produce. IOW you need some familiarity with the tangled strings of tags that can be found in *ham* so you can avoid matching them by mistake. Martin > Thanks, > Alex
Re: Blank line rules
Hi, On Thu, May 22, 2014 at 8:44 PM, John Hardin wrote: > On Thu, 22 May 2014, Karsten Bräckelmann wrote: > > On Thu, 2014-05-22 at 15:49 -0400, James B. Byrne wrote: >> >>> I am clearly missing something with these rules but I lack the >>> experience to >>> see what it is: >>> >>> score RAW_BLANK_LINES_05 0.5 >>> rawbody RAW_BLANK_LINES_05 /(\r?\n){5,9}/i >>> >> >> Why is everyone trying to match empty lines these days? Must be spam I'm >> missing out on. ;) >> > > Heh. Something similar just plopped into my spam quarantine. > > You might want to do this: > > rawbody MANY_BLANK_LINES /(?:(?:)?\r?\n){9}/mi I tried this for a while in my corpus. Have you combined this into a meta? I'm finding this matches far too much ham to even remotely be considered. Was it the intention to only match fn's? Thanks, Alex
OFF-TOPIC: The Brilliance of PootieTang was Re: Blank line rules
On 5/22/2014 9:17 PM, Karsten Bräckelmann wrote: On Thu, 2014-05-22 at 20:56 -0400, Kevin A. McGrail wrote: On 5/22/2014 5:50 PM, Karsten Bräckelmann wrote: Why is everyone trying to match empty lines these days? Must be spam I'm missing out on. ;) Who here has seen Pootietang and is laughing about this? Just me, likely... The fact I just googled that word should sufficiently answer it as far as I am concerned. ;) Good thing it amused you, but that reference was certainly unintended. https://www.youtube.com/watch?v=RtCxvv8Y3Bs 2:54 is classic. This movie is one of the real hit-or-miss comedies. I think it's brilliant on a lot of levels. Others don't get it. regards, KAM
Re: Blank line rules
On Thu, 2014-05-22 at 20:56 -0400, Kevin A. McGrail wrote: > On 5/22/2014 5:50 PM, Karsten Bräckelmann wrote: > > Why is everyone trying to match empty lines these days? Must be spam > > I'm missing out on. ;) > > Who here has seen Pootietang and is laughing about this? Just me, likely... The fact I just googled that word should sufficiently answer it as far as I am concerned. ;) Good thing it amused you, but that reference was certainly unintended. -- char *t="\10pse\0r\0dtu\0.@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4"; main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}
Re: Blank line rules
On 5/22/2014 5:50 PM, Karsten Bräckelmann wrote: Why is everyone trying to match empty lines these days? Must be spam I'm missing out on. ;) Who here has seen Pootietang and is laughing about this? Just me, likely...
Re: Blank line rules
On May 22, 2014, at 6:44 PM, John Hardin wrote: > > You might want to do this: > > rawbody MANY_BLANK_LINES /(?:(?:)?\r?\n){9}/mi AC_BR_BONANZA should cover the HTML case. It could be easily extended to match standard LF or CR per above. (In my case I am matching something like 20 newlines for the HTML case, to try to prevent FPs.) --- Amir thumbed via iPhone
Re: Blank line rules
On Thu, 22 May 2014, Karsten Bräckelmann wrote: On Thu, 2014-05-22 at 15:49 -0400, James B. Byrne wrote: I am clearly missing something with these rules but I lack the experience to see what it is: score RAW_BLANK_LINES_05 0.5 rawbody RAW_BLANK_LINES_05 /(\r?\n){5,9}/i Why is everyone trying to match empty lines these days? Must be spam I'm missing out on. ;) Heh. Something similar just plopped into my spam quarantine. You might want to do this: rawbody MANY_BLANK_LINES /(?:(?:)?\r?\n){9}/mi -- John Hardin
Re: Blank line rules
On Thu, 2014-05-22 at 13:47 -0700, John Hardin wrote: > On Thu, 22 May 2014, James B. Byrne wrote: > > rawbody RAW_BLANK_LINES_05 /(\r?\n){5,9}/i > Regular expressions by default only consider a single line of text. You Nope. You're thinking about ^ and $ by default only matching the beginning and end of the string. A \n newline is just an ordinary char. REs don't know the concept of lines; they operate on a string. > need to provide an option to say "treat multiple lines as a single line". > Try this: > >rawbody RAW_BLANK_LINES_05 /(?:\r?\n){5,9}/m The /m modifier changes ^ and $ to match at the start and end of any line in the string, not only at the start and end of the whole string.
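The reply above makes the point that \n is an ordinary character and that /m only changes what ^ and $ mean. A minimal sketch of that in the thread's own rule syntax follows; the rule names are hypothetical, and all three are unscored __ subrules, so loading them changes no scores. The first two match exactly the same text; /m only starts to matter once the pattern anchors on line starts, as in the third rule, which looks for five empty lines in a row.

```
# \n is an ordinary character: these two rules match exactly the same text,
# because neither pattern contains ^ or $.
rawbody __NL_RUN_PLAIN   /(?:\r?\n){5}/
rawbody __NL_RUN_MFLAG   /(?:\r?\n){5}/m

# /m matters only with anchors: under /m, ^ also matches right after every \n;
# without /m it would match only at the very start of the string.
rawbody __EMPTY_LINES_5  /(?:^\r?\n){5}/m
```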
Re: Blank line rules
On Thu, 2014-05-22 at 15:49 -0400, James B. Byrne wrote: > I am clearly missing something with these rules but I lack the experience to > see what it is: > > score RAW_BLANK_LINES_05 0.5 > rawbody RAW_BLANK_LINES_05 /(\r?\n){5,9}/i Why is everyone trying to match empty lines these days? Must be spam I'm missing out on. ;) > I passed it to spamassassin from the command line with the above rules in > /etc/mail/spamassassin/local.cf and nothing was reported. I used an actual > message body from a spam message received and only the RAW_BLANK_LINES_05 test > is tripped even though the body of that message has 18 consecutive blank > lines, also consisting of nothing but \n characters. > > So what is it about the regexp I am using that I evidently do not understand? See the post Consecutive Newlines in Rawbody Rules as of a few minutes ago, follow-up to the Bayes refinement thread. In a nutshell: 12 or more consecutive newlines cannot be matched with rawbody rules. They get replaced by 2 newlines. There's another issue with your approach of different rules matching "up to n" occurrences and "more than n". The first will always match in addition, if the latter matches. If the desired behavior is mutually exclusive matching, you need meta rules actually encoding the math / logic.
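A minimal sketch of the meta-rule approach suggested above, replacing the original three rules. An upper bound such as {5,9} cannot make a rule exclusive by itself, because an unanchored pattern happily matches the first five newlines of an 18-newline run. Here the unscored __ subrules only say "a run of at least N newlines exists", and the metas encode the logic. The rule names, the band edges and the scores are illustrative assumptions; the bands stay below 12 on the assumption, stated above, that longer runs never reach rawbody rules intact.

```
# Unscored building blocks: "at least N consecutive newlines somewhere".
rawbody  __RAW_NL_05  /(?:\r?\n){5}/
rawbody  __RAW_NL_10  /(?:\r?\n){10}/

# Longest run is 5 to 9: the short subrule matches, the long one does not.
meta     RAW_BLANK_LINES_05  (__RAW_NL_05 && !__RAW_NL_10)
describe RAW_BLANK_LINES_05  Longest newline run in raw body is 5 to 9
score    RAW_BLANK_LINES_05  0.5

# Longest run is 10 or more (in practice 10 or 11, if longer runs are
# collapsed as described above).
meta     RAW_BLANK_LINES_10  __RAW_NL_10
describe RAW_BLANK_LINES_10  Longest newline run in raw body is 10 or more
score    RAW_BLANK_LINES_10  1.0
```

Because RAW_BLANK_LINES_05 explicitly excludes __RAW_NL_10, at most one of the two scored rules fires per message, which is the mutually exclusive behavior; if stacking is wanted instead, as James says it is, the original overlapping rules already provide it.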
Re: Blank line rules
On Thu, 22 May 2014 13:47:04 -0700 (PDT) John Hardin wrote: John> Regular expressions by default only consider a single line of John> text. You need to provide an option to say "treat multiple lines John> as a single line". Try this: >rawbody RAW_BLANK_LINES_05 /(?:\r?\n){5,9}/m >rawbody RAW_BLANK_LINES_10 /(?:\r?\n){10,24}/m >rawbody RAW_BLANK_LINES_15 /(?:\r?\n){25}/m James, see also the Bayes refinement thread where I posted about doing the exact same thing. Somehow John's multiline rules don't work for me, either. Karsten was looking at it, last I knew. -- Please *no* private copies of mailing list or newsgroup messages.
Re: Blank line rules
On Thu, 22 May 2014, James B. Byrne wrote: I am clearly missing something with these rules but I lack the experience to see what it is:

    score RAW_BLANK_LINES_05 0.5
    rawbody RAW_BLANK_LINES_05 /(\r?\n){5,9}/i
    describe RAW_BLANK_LINES_05 Raw body contains 5 or more consecutive empty lines
    score RAW_BLANK_LINES_10 1.0
    rawbody RAW_BLANK_LINES_10 /(\r?\n){10,24}/i
    describe RAW_BLANK_LINES_10 Raw body contains 10 or more consecutive empty lines
    score RAW_BLANK_LINES_15 1.5
    rawbody RAW_BLANK_LINES_15 /(\r?\n){25}/
    describe RAW_BLANK_LINES_15 Raw body contains 25 or more consecutive empty lines

Regular expressions by default only consider a single line of text. You need to provide an option to say "treat multiple lines as a single line". Try this:

    rawbody RAW_BLANK_LINES_05 /(?:\r?\n){5,9}/m
    rawbody RAW_BLANK_LINES_10 /(?:\r?\n){10,24}/m
    rawbody RAW_BLANK_LINES_15 /(?:\r?\n){25}/m

The case-insensitive flag is not meaningful for these rules as there's no attempt to match text, and I added the ?: to make the groups non-capturing, which is a bit more efficient.

-- John Hardin
Blank line rules
I am clearly missing something with these rules but I lack the experience to see what it is:

    score RAW_BLANK_LINES_05 0.5
    rawbody RAW_BLANK_LINES_05 /(\r?\n){5,9}/i
    describe RAW_BLANK_LINES_05 Raw body contains 5 or more consecutive empty lines
    score RAW_BLANK_LINES_10 1.0
    rawbody RAW_BLANK_LINES_10 /(\r?\n){10,24}/i
    describe RAW_BLANK_LINES_10 Raw body contains 10 or more consecutive empty lines
    score RAW_BLANK_LINES_15 1.5
    rawbody RAW_BLANK_LINES_15 /(\r?\n){25}/
    describe RAW_BLANK_LINES_15 Raw body contains 25 or more consecutive empty lines

I created a test file that consisted of nought but newlines (shown as $ characters with vim's :set list). I passed it to spamassassin from the command line with the above rules in /etc/mail/spamassassin/local.cf and nothing was reported. I used an actual message body from a spam message received and only the RAW_BLANK_LINES_05 test was tripped, even though the body of that message has 18 consecutive blank lines, also consisting of nothing but \n characters.

So what is it about the regexp I am using that I evidently do not understand?

-- James B. Byrne