Re: Why single periods in regex in spamassassin rules?
Completely agree with Joe. Normally, if we did that, it was because we had seen a case where they were using something other than a space: perhaps a pipe, a plus, a non-printable character, or something else. So we wrote the rest of the rule the same way, to future-proof it against other variants of the same spam.

On Sun, Apr 25, 2021, 08:51 Joe Quinn wrote:
> On 4/23/21 2:52 PM, David B Funk wrote:
>> On Fri, 23 Apr 2021, Steve Dondley wrote:
>>
>>> I'm looking at KAM.cf. There is this rule:
>>>
>>> body __KAM_WEB2 /INDIA based IT|indian.based.website|certified.it.company/i
>>>
>>> I'm wondering if there is a good reason why a single period is used
>>> instead of something like \s+ which would catch multiple spaces
>>> whereas a single period doesn't.
>>
>> Because /indian.based.website/ will match 'indian-based_website' but
>> \s will not.
>
> This is the real reason (or at least, it was for all of my contributions
> to KAM.cf). I was also concerned about tricks like , which is
> visibly a space but has all the technical characteristics of
> non-whitespace. Using "." was easier than knowing everything about
> Unicode code points.
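To see how the single periods absorb the kinds of substitutions described above, here is a sketch that runs the __KAM_WEB2 pattern through Python's `re` module. Python is only a convenient stand-in for the Perl regex engine SpamAssassin uses (an assumption, though for `.`, `|` and case-insensitive matching the two agree), and the sample strings are invented for illustration.

```python
import re

# The __KAM_WEB2 body pattern from KAM.cf; Perl's /i flag becomes re.IGNORECASE.
KAM_WEB2 = re.compile(
    r"INDIA based IT|indian.based.website|certified.it.company",
    re.IGNORECASE,
)

# Invented spam variants that swap the spaces for other characters.
variants = [
    "Indian based website",
    "Indian|based|website",        # pipe
    "Indian+based+website",        # plus
    "Indian\x01based\x01website",  # non-printable control character
    "Certified-IT-Company",        # hyphens in the third alternative
]

for text in variants:
    # Each "." consumes exactly one character, whatever it is (except newline).
    print(repr(text), bool(KAM_WEB2.search(text)))
```

Every variant matches, which is the future-proofing effect: one rule covers substitutions nobody has seen yet, as long as each is a single character.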
Re: Why single periods in regex in spamassassin rules?
On 4/23/21 2:52 PM, David B Funk wrote:
> On Fri, 23 Apr 2021, Steve Dondley wrote:
>
>> I'm looking at KAM.cf. There is this rule:
>>
>> body __KAM_WEB2 /INDIA based IT|indian.based.website|certified.it.company/i
>>
>> I'm wondering if there is a good reason why a single period is used
>> instead of something like \s+ which would catch multiple spaces
>> whereas a single period doesn't.
>
> Because /indian.based.website/ will match 'indian-based_website' but
> \s will not.

This is the real reason (or at least, it was for all of my contributions to KAM.cf). I was also concerned about tricks like , which is visibly a space but has all the technical characteristics of non-whitespace. Using "." was easier than knowing everything about Unicode code points.
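Joe's point about characters that look like a space but are not whitespace can be illustrated with U+2800 BRAILLE PATTERN BLANK. That character is chosen here purely as an example and is not necessarily the one Joe had in mind; Python's `re` again stands in for Perl. Because Unicode does not classify U+2800 as whitespace, `\s` skips it while `.` does not.

```python
import re

# U+2800 BRAILLE PATTERN BLANK: renders as blank space in many fonts, but
# Unicode does not classify it as whitespace.
BLANK = "\u2800"
text = f"Indian{BLANK}based{BLANK}website"

with_space_class = re.search(r"indian\sbased\swebsite", text, re.IGNORECASE)
with_dot = re.search(r"indian.based.website", text, re.IGNORECASE)

print(BLANK.isspace())           # False: not whitespace
print(with_space_class is None)  # True: \s does not match it
print(with_dot is not None)      # True: . matches any character but newline
```

With `.`, the rule author does not have to enumerate which lookalike code points exist; any single character in that position is accepted.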
Re: Why single periods in regex in spamassassin rules?
On Fri, 23 Apr 2021, RW wrote:
> On Fri, 23 Apr 2021 13:52:40 -0500 (CDT) David B Funk wrote:
>> On Fri, 23 Apr 2021, Steve Dondley wrote:
>>
>>> I'm looking at KAM.cf. There is this rule:
>>>
>>> body __KAM_WEB2 /INDIA based IT|indian.based.website|certified.it.company/i
>>>
>>> I'm wondering if there is a good reason why a single period is used
>>> instead of something like \s+ which would catch multiple spaces
>>> whereas a single period doesn't.
>>
>> Because /indian.based.website/ will match 'indian-based_website' but
>> \s will not.
>
> \W+ might be better though

Not unbounded, it isn't. \W{1,5} might be better without being runaway.

--
John Hardin KA7OHZ http://www.impsec.org/~jhardin/
jhar...@impsec.org pgpk -a jhar...@impsec.org
key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C AF76 D822 E6E6 B873 2E79
---
Are you a mildly tech-literate politico horrified by the level of ignorance demonstrated by lawmakers gearing up to regulate online technology they don't even begin to grasp? Cool. Now you have a tiny glimpse into a day in the life of a gun owner. -- Sean Davis
---
329 days since the first private commercial manned orbital mission (SpaceX)
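The trade-off John describes, sketched with Python's `re` as a stand-in for Perl: the bounded `\W{1,5}` still accepts short runs of spaces and punctuation but stops at five characters, whereas `\W+` keeps consuming however long the run is. The separators are made-up samples, and the sketch shows matching behaviour only; it does not measure performance.

```python
import re

# Unbounded and bounded forms of the separator between two words.
UNBOUNDED = re.compile(r"indian\W+based", re.IGNORECASE)
BOUNDED = re.compile(r"indian\W{1,5}based", re.IGNORECASE)

# Invented separator runs of increasing length.
for sep in [" ", " - ", "  --  ", " " * 40]:
    text = f"Indian{sep}based"
    print(f"{len(sep):>2} chars  "
          f"unbounded={bool(UNBOUNDED.search(text))}  "
          f"bounded={bool(BOUNDED.search(text))}")
```

The 6- and 40-character runs only match the unbounded form. The bound gives up those rare cases in exchange for capping how far the engine has to scan for each candidate match.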
Re: Why single periods in regex in spamassassin rules?
On Fri, 23 Apr 2021 13:52:40 -0500 (CDT) David B Funk wrote:
> On Fri, 23 Apr 2021, Steve Dondley wrote:
>
>> I'm looking at KAM.cf. There is this rule:
>>
>> body __KAM_WEB2 /INDIA based IT|indian.based.website|certified.it.company/i
>>
>> I'm wondering if there is a good reason why a single period is used
>> instead of something like \s+ which would catch multiple spaces
>> whereas a single period doesn't.
>
> Because /indian.based.website/ will match 'indian-based_website'
> but \s will not.

\W+ might be better, though.
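RW's `\W+` handles multi-character separators that a single `.` cannot, but one detail is worth checking: `_` counts as a word character, so `\W` does not match it. The sketch below compares both patterns on the example from the thread, using Python's `re` as a stand-in for Perl (both treat `_` as part of `\w`) and invented sample strings.

```python
import re

# RW's suggestion: one or more non-word characters between the words.
NONWORD = re.compile(r"indian\W+based\W+website", re.IGNORECASE)
# The original KAM.cf approach: exactly one character of any kind.
DOT = re.compile(r"indian.based.website", re.IGNORECASE)

for text in ["Indian - based - website", "Indian-based_website"]:
    print(f"{text!r}: nonword={bool(NONWORD.search(text))} "
          f"dot={bool(DOT.search(text))}")
```

Each pattern catches a case the other misses: `\W+` matches the spaced-out hyphens, while only `.` matches `indian-based_website`, because the underscore is a word character.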
Re: Why single periods in regex in spamassassin rules?
On Fri, 23 Apr 2021, Steve Dondley wrote:

> I'm looking at KAM.cf. There is this rule:
>
> body __KAM_WEB2 /INDIA based IT|indian.based.website|certified.it.company/i
>
> I'm wondering if there is a good reason why a single period is used
> instead of something like \s+ which would catch multiple spaces
> whereas a single period doesn't.

Because /indian.based.website/ will match 'indian-based_website' but \s will not.

--
Dave Funk, University of Iowa College of Engineering
319/335-5751 FAX: 319/384-0549
1256 Seamans Center, 103 S Capitol St.
Iowa City, IA 52242-1527
Sys_admin/Postmaster/cell_admin
#include
Better is not better, 'standard' is better. B{
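Dave's point, spelled out: the sketch swaps each candidate token into an `indian<token>based` fragment and lists which single-character separators it accepts. `accepted_by()` and `SEPARATORS` are illustrative names, not anything from KAM.cf, and Python's `re` stands in for Perl's engine.

```python
import re

# Hypothetical sample of single-character separators a spammer might use.
SEPARATORS = [" ", "\t", "-", "_", ".", "/"]

def accepted_by(token: str) -> list[str]:
    """Return the separators for which 'indian<token>based' matches
    the text 'Indian<sep>based'."""
    pattern = re.compile(rf"indian{token}based", re.IGNORECASE)
    return [sep for sep in SEPARATORS if pattern.search(f"Indian{sep}based")]

print(accepted_by("."))    # every separator in the list
print(accepted_by(r"\s"))  # only the space and the tab
```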
Re: Why single periods in regex in spamassassin rules?
On 2021-04-23 01:37 PM, Henrik K wrote:
> On Fri, Apr 23, 2021 at 01:03:33PM -0400, Steve Dondley wrote:
>> I'm looking at KAM.cf. There is this rule:
>>
>> body __KAM_WEB2 /INDIA based IT|indian.based.website|certified.it.company/i
>>
>> I'm wondering if there is a good reason why a single period is used
>> instead of something like \s+ which would catch multiple spaces
>> whereas a single period doesn't.
>
> It would make no difference, because the body is normalized: runs of
> consecutive spaces are collapsed into single spaces.
>
> https://cwiki.apache.org/confluence/display/SPAMASSASSIN/WritingRulesAdvanced

Makes sense. And thanks for the link. I was looking for some kind of guidance on writing rules. Google didn't help much.
Re: Why single periods in regex in spamassassin rules?
On Fri, Apr 23, 2021 at 01:03:33PM -0400, Steve Dondley wrote:
> I'm looking at KAM.cf. There is this rule:
>
> body __KAM_WEB2 /INDIA based IT|indian.based.website|certified.it.company/i
>
> I'm wondering if there is a good reason why a single period is used instead
> of something like \s+ which would catch multiple spaces whereas a single
> period doesn't.

It would make no difference, because the body is normalized: runs of consecutive spaces are collapsed into single spaces.

https://cwiki.apache.org/confluence/display/SPAMASSASSIN/WritingRulesAdvanced
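Henrik's point is that `body` rules are matched against a normalized body, so in practice a single space and `\s+` behave alike there. The sketch imitates that with a hypothetical `normalize_body()` that collapses whitespace runs. It is a rough approximation for illustration, not SpamAssassin's actual rendering code (the linked wiki page describes the real behaviour), and Python's `re` stands in for Perl.

```python
import re

def normalize_body(text: str) -> str:
    """Hypothetical stand-in for body normalization: collapse every run of
    whitespace (spaces, tabs, newlines) into a single space."""
    return re.sub(r"\s+", " ", text).strip()

DOT = re.compile(r"indian.based.website", re.IGNORECASE)

raw = "Indian   based\n\twebsite"
print(bool(DOT.search(raw)))                  # False: runs of whitespace
print(bool(DOT.search(normalize_body(raw))))  # True after normalization
```

Once the runs are collapsed, a single `.` is all that is needed in each gap, which is why `\s+` would buy nothing in a `body` rule.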
Re: Why single periods in regex in spamassassin rules?
On 23.04.21 13:03, Steve Dondley wrote:
> I'm looking at KAM.cf. There is this rule:
>
> body __KAM_WEB2 /INDIA based IT|indian.based.website|certified.it.company/i
>
> I'm wondering if there is a good reason why a single period is used
> instead of something like \s+ which would catch multiple spaces
> whereas a single period doesn't.

Generally, it's safer not to give regular expressions an unlimited range, e.g. use \s{1,3} rather than \s+.

--
Matus UHLAR - fantomas, uh...@fantomas.sk ; http://www.fantomas.sk/
Warning: I wish NOT to receive e-mail advertising to this address.
Support bacteria - they're the only culture some people have.
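Matus's suggestion applied to the rule fragment, sketched with Python's `re` in place of Perl's engine. `BOUNDED` and `matches()` are hypothetical names, and the pattern is an illustrative variant of the middle alternative of __KAM_WEB2, not something taken from KAM.cf.

```python
import re

# Hypothetical bounded variant: between one and three whitespace
# characters in each gap, never an unlimited run.
BOUNDED = re.compile(r"indian\s{1,3}based\s{1,3}website", re.IGNORECASE)

def matches(text: str) -> bool:
    """Return True if the bounded pattern occurs anywhere in text."""
    return BOUNDED.search(text) is not None

print(matches("Indian based website"))        # True: one space each
print(matches("Indian   based   website"))    # True: three spaces each
print(matches("Indian    based    website"))  # False: four spaces is too many
```

The upper bound limits how much text the engine will consume for each gap, at the cost of missing a separator run longer than three characters.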
Why single periods in regex in spamassassin rules?
I'm looking at KAM.cf. There is this rule:

body __KAM_WEB2 /INDIA based IT|indian.based.website|certified.it.company/i

I'm wondering if there is a good reason why a single period is used instead of something like \s+, which would catch multiple spaces, whereas a single period doesn't.
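A minimal sketch of the difference the question asks about, using Python's `re` module as a stand-in for the Perl regex engine SpamAssassin uses (an assumption; for these simple patterns the two behave alike). The sample strings are invented.

```python
import re

# The rule's single period: exactly one character of any kind (except newline).
DOT = re.compile(r"indian.based.website", re.IGNORECASE)
# The alternative proposed in the question: one or more whitespace characters.
SPACES = re.compile(r"indian\s+based\s+website", re.IGNORECASE)

samples = [
    "Indian based website",    # single spaces
    "Indian  based  website",  # doubled spaces
    "Indian-based_website",    # punctuation instead of spaces
]

for text in samples:
    print(f"{text!r}: dot={bool(DOT.search(text))} "
          f"spaces={bool(SPACES.search(text))}")
```

On raw text, each pattern catches one case the other misses: `\s+` handles the doubled spaces, and `.` handles the punctuation.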