Re: Why single periods in regex in spamassassin rules?

2021-04-25 Thread Kevin A. McGrail
Completely agree with Joe. Normally, when we did that, it was because we had
seen cases where they used something other than a space: perhaps a pipe, a
plus, a non-printable character, or something else. So we wrote the rest of the
rule the same way to future-proof it against other variants of the same spam.

On Sun, Apr 25, 2021, 08:51 Joe Quinn wrote:

> On 4/23/21 2:52 PM, David B Funk wrote:
> > On Fri, 23 Apr 2021, Steve Dondley wrote:
> >
> >> I'm looking at KAM.cf. There is this rule:
> >>
> >> body __KAM_WEB2 /INDIA based IT|indian.based.website|certified.it.company/i
> >>
> >> I'm wondering if there is a good reason why a single period is used
> >> instead of something like \s+ which would catch multiple spaces
> >> whereas a single period doesn't.
> >
> > Because '/indian.based.website/' will match 'indian-based_website' but
> > \s will not.
> >
> >
> This is the real reason (or at least, it was for all of my contributions
> to KAM.cf). I was also concerned about tricks like  , which is
> visibly a space but has all the technical characteristics of
> non-whitespace. Using "." was easier than knowing everything about
> unicode codepoints.
>
>


Re: Why single periods in regex in spamassassin rules?

2021-04-25 Thread Joe Quinn

On 4/23/21 2:52 PM, David B Funk wrote:

On Fri, 23 Apr 2021, Steve Dondley wrote:


I'm looking at KAM.cf. There is this rule:

body __KAM_WEB2 /INDIA based IT|indian.based.website|certified.it.company/i


I'm wondering if there is a good reason why a single period is used
instead of something like \s+ which would catch multiple spaces
whereas a single period doesn't.


Because '/indian.based.website/' will match 'indian-based_website' but
\s will not.



This is the real reason (or at least, it was for all of my contributions 
to KAM.cf). I was also concerned about tricks like  , which is 
visibly a space but has all the technical characteristics of 
non-whitespace. Using "." was easier than knowing everything about 
unicode codepoints.
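
A minimal sketch of the look-alike concern, using Python's `re` as a stand-in for the Perl regexes in KAM.cf. The character Joe named did not survive in this archive, so the example substitutes U+2800 (BRAILLE PATTERN BLANK) purely as an assumption for illustration: it renders as a blank but is not classified as whitespace. The sketch works on decoded text (code points); whether SpamAssassin hands the regex characters or raw bytes depends on its configuration and is not modeled here.

```python
import re

# Assumption for illustration only: U+2800 stands in for the look-alike
# character, which is missing from the archived message.
blank = "\u2800"
text = f"indian{blank}based{blank}website"

dot_rule = re.compile(r"indian.based.website", re.IGNORECASE)
space_rule = re.compile(r"indian\s+based\s+website", re.IGNORECASE)

print("is whitespace:", blank.isspace())
print("dot rule matches:", bool(dot_rule.search(text)))
print("\\s+ rule matches:", bool(space_rule.search(text)))
```

The dot version needs no list of space-like code points, which is the convenience Joe describes.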




Re: Why single periods in regex in spamassassin rules?

2021-04-24 Thread John Hardin

On Fri, 23 Apr 2021, RW wrote:


On Fri, 23 Apr 2021 13:52:40 -0500 (CDT)
David B Funk wrote:


On Fri, 23 Apr 2021, Steve Dondley wrote:


I'm looking at KAM.cf. There is this rule:

body __KAM_WEB2 /INDIA based IT|indian.based.website|certified.it.company/i

I'm wondering if there is a good reason why a single period is used
instead of something like \s+ which would catch multiple spaces
whereas a single period doesn't.


Because '/indian.based.website/' will match 'indian-based_website'
but \s will not.


\W+ might be better though


Not unbounded it isn't. \W{1,5} might be better without being runaway.
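
To make the difference concrete, here is a small sketch in Python's `re` (standing in for Perl; `\W` and `{1,5}` behave the same for these ASCII inputs). The sample strings are invented for illustration. It also shows one thing to keep in mind with `\W`: the underscore counts as a word character, so neither form matches 'indian-based_website'.

```python
import re

FLAGS = re.IGNORECASE
unbounded = re.compile(r"indian\W+based\W+website", FLAGS)
bounded = re.compile(r"indian\W{1,5}based\W{1,5}website", FLAGS)

# Invented sample inputs.
short_gap = "indian - based | website"                 # 3 separators per gap
long_gap = "indian" + " ." * 20 + " based website"     # 41 separators in first gap
underscore = "indian-based_website"                    # "_" is not matched by \W

for label, text in [("short", short_gap), ("long", long_gap),
                    ("underscore", underscore)]:
    print(label,
          "unbounded:", bool(unbounded.search(text)),
          "bounded:", bool(bounded.search(text)))
```

The bounded pattern caps how many separator characters each gap may consume, so a long run of junk between words no longer matches.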

--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  Are you a mildly tech-literate politico horrified by the level of
  ignorance demonstrated by lawmakers gearing up to regulate online
  technology they don't even begin to grasp? Cool. Now you have a
  tiny glimpse into a day in the life of a gun owner.   -- Sean Davis
---
 329 days since the first private commercial manned orbital mission (SpaceX)


Re: Why single periods in regex in spamassassin rules?

2021-04-23 Thread RW
On Fri, 23 Apr 2021 13:52:40 -0500 (CDT)
David B Funk wrote:

> On Fri, 23 Apr 2021, Steve Dondley wrote:
> 
> > I'm looking at KAM.cf. There is this rule:
> >
> > body __KAM_WEB2 /INDIA based IT|indian.based.website|certified.it.company/i
> >
> > I'm wondering if there is a good reason why a single period is used
> > instead of something like \s+ which would catch multiple spaces
> > whereas a single period doesn't.
> 
> Because '/indian.based.website/' will match 'indian-based_website'
> but \s will not.

\W+ might be better though


Re: Why single periods in regex in spamassassin rules?

2021-04-23 Thread David B Funk

On Fri, 23 Apr 2021, Steve Dondley wrote:


I'm looking at KAM.cf. There is this rule:

body __KAM_WEB2 /INDIA based IT|indian.based.website|certified.it.company/i


I'm wondering if there is a good reason why a single period is used instead of
something like \s+ which would catch multiple spaces whereas a single period
doesn't.


Because '/indian.based.website/' will match 'indian-based_website' but \s will
not.
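
A quick way to check this, sketched with Python's `re` in place of Perl (for these simple ASCII patterns `.`, `\s` and case-insensitive matching behave the same). The sample strings other than 'indian-based_website' are made up for illustration.

```python
import re

dot_rule = re.compile(r"indian.based.website", re.IGNORECASE)
space_rule = re.compile(r"indian\s+based\s+website", re.IGNORECASE)

samples = [
    "Indian based website",    # plain single spaces
    "indian-based_website",    # hyphen and underscore as separators
    "indian|based+website",    # pipe and plus as separators
    "indian  based website",   # two raw spaces, before any normalization
]

for text in samples:
    dot_hit = bool(dot_rule.search(text))
    ws_hit = bool(space_rule.search(text))
    print(f"{text!r:26} dot={dot_hit} whitespace={ws_hit}")
```

The dot version catches any single separator character; the `\s+` version catches only whitespace, though it does tolerate runs of it.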



--
Dave Funk   University of Iowa
 College of Engineering
319/335-5751   FAX: 319/384-0549    1256 Seamans Center, 103 S Capitol St.
Sys_admin/Postmaster/cell_admin Iowa City, IA 52242-1527
#include 
Better is not better, 'standard' is better. B{


Re: Why single periods in regex in spamassassin rules?

2021-04-23 Thread Steve Dondley

On 2021-04-23 01:37 PM, Henrik K wrote:

On Fri, Apr 23, 2021 at 01:03:33PM -0400, Steve Dondley wrote:

I'm looking at KAM.cf. There is this rule:

body __KAM_WEB2 /INDIA based IT|indian.based.website|certified.it.company/i

I'm wondering if there is a good reason why a single period is used instead
of something like \s+ which would catch multiple spaces whereas a single
period doesn't.


It would make no difference, because body is normalized from consecutive
spaces into single spaces.

https://cwiki.apache.org/confluence/display/SPAMASSASSIN/WritingRulesAdvanced


Makes sense. And thanks for the link. I was looking for some kind of
guidance on writing rules. Google didn't help much.


Re: Why single periods in regex in spamassassin rules?

2021-04-23 Thread Henrik K
On Fri, Apr 23, 2021 at 01:03:33PM -0400, Steve Dondley wrote:
> I'm looking at KAM.cf. There is this rule:
> 
> body __KAM_WEB2 /INDIA based IT|indian.based.website|certified.it.company/i
> 
> I'm wondering if there is a good reason why a single period is used instead
> of something like \s+ which would catch multiple spaces whereas a single
> period doesn't.

It would make no difference, because body is normalized from consecutive
spaces into single spaces.

https://cwiki.apache.org/confluence/display/SPAMASSASSIN/WritingRulesAdvanced
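
A rough sketch of why normalization makes the two forms equivalent for plain spaces. `collapse_whitespace` is a hypothetical helper written for this example, not SpamAssassin's actual code, and Python's `re` stands in for Perl.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Hypothetical stand-in for body normalization: each run of
    whitespace becomes a single space."""
    return re.sub(r"\s+", " ", text)


dot_rule = re.compile(r"certified.it.company", re.IGNORECASE)

raw = "Certified   IT \n company"          # invented input with messy spacing
normalized = collapse_whitespace(raw)

print(repr(normalized))
print("raw matches:", bool(dot_rule.search(raw)))
print("normalized matches:", bool(dot_rule.search(normalized)))
```

Once runs of whitespace are collapsed before the rule is applied, a single `.` is enough to cover what `\s+` would have matched.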



Re: Why single periods in regex in spamassassin rules?

2021-04-23 Thread Matus UHLAR - fantomas

On 23.04.21 13:03, Steve Dondley wrote:

I'm looking at KAM.cf. There is this rule:

body __KAM_WEB2 /INDIA based IT|indian.based.website|certified.it.company/i


I'm wondering if there is a good reason why a single period is used
instead of something like \s+ which would catch multiple spaces
whereas a single period doesn't.


Generally, it's safer not to allow regular expressions unlimited range; bound the quantifier instead, e.g.

\s{1,3}


--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Varovanie: na tuto adresu chcem NEDOSTAVAT akukolvek reklamnu postu.
Support bacteria - they're the only culture some people have.


Why single periods in regex in spamassassin rules?

2021-04-23 Thread Steve Dondley

I'm looking at KAM.cf. There is this rule:

body __KAM_WEB2 /INDIA based IT|indian.based.website|certified.it.company/i


I'm wondering if there is a good reason why a single period is used
instead of something like \s+ which would catch multiple spaces whereas
a single period doesn't.