Re: Underscores

2009-07-18 Thread John Hardin


On Sat, 18 Jul 2009, twofers wrote:

> I am mainly using the rule to check the header subject; I haven't added
> it to a body check.
>
> So, between the 3 choices:
> 1. /(?:[^_]{1,30}_+){5}/
> 2. /\S+_+\S+_+\S+/
> 3. R02 /^\S{30,}$/m
>
> Which covers the most territory given the example I submitted? I'm
> basically interested in identifying those garbage subject lines laced
> with characters like underscores, periods, hyphens, semicolons, etc., so
> rather than use several rules to trap each of those characters
> individually, maybe there is a more effective way to handle this.


Your original example only included underscores.

Try this:

  header XX Subject =~ /(?:[[:alnum:]]{1,30}[^[:alnum:]\s]{1,5}){5}/i
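A rough way to try the rule above outside SpamAssassin is the Python sketch below. It is an approximation, not SpamAssassin itself: Python's `re` module has no POSIX classes, so `[[:alnum:]]` is assumed here to mean ASCII `[A-Za-z0-9]` (Perl may also count accented letters as alphanumeric), and the sample subjects are made up for illustration.

```python
import re

# ASCII stand-in for /(?:[[:alnum:]]{1,30}[^[:alnum:]\s]{1,5}){5}/i.
# Assumption: [[:alnum:]] approximated as [A-Za-z0-9], since Python's re lacks POSIX classes.
PUNCT_LACED = re.compile(r"(?:[A-Za-z0-9]{1,30}[^A-Za-z0-9\s]{1,5}){5}", re.I)

def fires(subject):
    """Return True if the approximated header rule would hit this Subject."""
    return PUNCT_LACED.search(subject) is not None

for subject in (
    "This_sentenance_has_an_underscore_after_every_word",
    "Cheap.meds-now;buy.today.from.our-online;shop",
    "Re: LD_LIBRARY_PATH question",
    "Meeting notes - Q3 budget, updated",
    "See www.example.com/path-to_page",
):
    print(fires(subject), subject)
```

The first two samples hit and the two ordinary subjects do not. The last sample shows that a bare URL in a subject can also reach five punctuation-separated runs, which may be worth checking against real ham before giving the rule a score.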

--
 John Hardin KA7OHZ    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174    pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79

Re: Underscores

2009-07-18 Thread twofers

I am mainly using the rule to check the header subject; I haven't added it to a
body check.
 
So, between the 3 choices:
1. /(?:[^_]{1,30}_+){5}/
2. /\S+_+\S+_+\S+/
3. R02 /^\S{30,}$/m
 
Which covers the most territory given the example I submitted? I'm basically
interested in identifying those garbage subject lines laced with characters
like underscores, periods, hyphens, semicolons, etc., so rather than use
several rules to trap each of those characters individually, maybe there is a
more effective way to handle this.
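To see how the three candidates differ, the Python sketch below runs them unchanged (they use no Perl-only syntax) against three made-up subjects: the original underscore example, a subject laced with other punctuation, and an ordinary technical subject. The subjects are illustrative assumptions, not real corpus data.

```python
import re

# The three candidate patterns from the question, keyed by their list number.
CHOICES = {
    1: re.compile(r"(?:[^_]{1,30}_+){5}"),
    2: re.compile(r"\S+_+\S+_+\S+"),
    3: re.compile(r"^\S{30,}$", re.M),
}

def hits(subject):
    """Return the numbers of the candidate patterns that match the subject."""
    return [n for n, rx in CHOICES.items() if rx.search(subject)]

for subject in (
    "This_sentenance_has_an_underscore_after_every_word",
    "Cheap.meds-now;buy.today.from.our-online;shop",
    "Re: LD_LIBRARY_PATH question",
):
    print(hits(subject), subject)
```

Choices 1 and 2 only react to underscores, choice 3 only to a long whitespace-free subject, and choice 2 also hits the harmless technical subject.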
 
Thanks, Wes

Re: Underscores

2009-07-16 Thread John Hardin


On Thu, 16 Jul 2009, Karsten Bräckelmann wrote:


> > Whoops! Make that:
> >
> >   /(?:[^_]{1,30}_+){5}/
>
> Better. ;)  However, while that does eliminate the excessive backtracking
> that \S or \w cause (since both include the underscore), it doesn't
> restrict the match to "words ending in underscores". A non-underscore [^_]
> includes whitespace, punctuation, and any other unwanted character.
>
> Exactly _five_ occurrences of an '_' underscore, with up to 30 _random_
> chars in between. This paragraph matches. :)


Sorry. I lost sight of that part...

  /(?:[^_\s]{1,30}_+){5}/
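The difference between the `[^_]` version and the `[^_\s]` correction can be checked in Python, whose `re` syntax agrees with Perl for these particular patterns. The sample strings come from this thread, with Karsten's test paragraph written as the single line a body rule would see after rendering.

```python
import re

LOOSE = re.compile(r"(?:[^_]{1,30}_+){5}")     # gaps may contain spaces and punctuation
STRICT = re.compile(r"(?:[^_\s]{1,30}_+){5}")  # gaps must be whitespace-free "words"

underscored = "This_sentenance_has_an_underscore_after_every_word"
karsten = ("Exactly _five_ occurrences of an '_' underscore, with up to 30 "
           "_random_ chars in between. This paragraph matches. :)")

for name, rx in (("loose", LOOSE), ("strict", STRICT)):
    print(name, bool(rx.search(underscored)), bool(rx.search(karsten)))
```

Both patterns hit the underscore sentence, but only the loose one also hits Karsten's ordinary paragraph.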

--
---
  You know things are bad when Pravda says we [the USA] have gone
  too far to the left. -- Joe Huffman
---
 Today: the 64th anniversary of the dawn of the Atomic Age

Re: [sa] Re: Underscores

2009-07-16 Thread Karsten Bräckelmann

On Thu, 2009-07-16 at 11:08 -0400, Charles Gregory wrote:
> Given that OP said the entire *line* was word-underscore-word-underscore,
> then why not just:
> 
> body R01 /^\w{30,}$/m

Indeed, it really depends on what *exactly* the rule should match.

> Or perhaps the OP wasn't clear on whether 'word' might contain other 
> punctuation, and so we might simply use:
> 
> body R02 /^\S{30,}$/m

This one also matches a long-ish URL on a line of its own.

> I might add \s* at the end of the rule, just in case of trailing spaces...

Keep in mind that with body rules, the body is *rendered*: whitespace is
normalized, and *paragraphs* are re-flowed into a single string with embedded
newlines stripped. For instance, this very paragraph is a single ^line$
as far as body REs are concerned.
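The rendering point can be illustrated with a deliberately simplified model. The `render_body` function below is hypothetical and is not SpamAssassin's renderer: it only assumes that paragraphs are separated by blank lines and collapses each one onto a single line. The URL is a made-up placeholder.

```python
import re

R01 = re.compile(r"^\w{30,}$", re.M)
R02 = re.compile(r"^\S{30,}$", re.M)

def render_body(text):
    """Hypothetical, simplified rendering: one line per blank-line-separated paragraph."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return "\n".join(" ".join(p.split()) for p in paragraphs)

URL = "https://www.example.com/some/long/path/to/a/page.html"
raw = (
    "Hello,\n\n"
    "This_sentenance_has_an_underscore_after_every_word\n\n"
    "See\n" + URL + "\nfor details.\n\n"
    + URL + "\n"
)
rendered = render_body(raw)
print(R01.findall(rendered))  # only the underscore sentence
print(R02.findall(rendered))  # the underscore sentence and the stand-alone URL
```

Under this model, R02 hits the URL that sits in a paragraph of its own but not the one embedded in a sentence, and because each paragraph's whitespace is collapsed there are no trailing spaces left for an extra `\s*` to absorb.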

-- 
char *t="\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4";
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;i>=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}

Re: [sa] Re: Underscores

2009-07-16 Thread Charles Gregory


On Thu, 16 Jul 2009, Karsten Bräckelmann wrote:

> >   /(?:[^_]{1,30}_+){5}/
>
> Better. ;)  However, while that does eliminate the excessive backtracking
> that \S or \w cause (since both include the underscore), it doesn't
> restrict the match to "words ending in underscores". A non-underscore [^_]
> includes whitespace, punctuation, and any other unwanted character.


Given that OP said the entire *line* was word-underscore-word-underscore,
then why not just:

body R01 /^\w{30,}$/m

Or perhaps the OP wasn't clear on whether 'word' might contain other 
punctuation, and so we might simply use:


body R02 /^\S{30,}$/m

I might add \s* at the end of the rule, just in case of trailing spaces...

- C

Re: Underscores

2009-07-16 Thread Karsten Bräckelmann

> Whoops! Make that:
> 
>   /(?:[^_]{1,30}_+){5}/

Better. ;)  However, while that does eliminate the excessive backtracking
that \S or \w cause (since both include the underscore), it doesn't
restrict the match to "words ending in underscores". A non-underscore [^_]
includes whitespace, punctuation, and any other unwanted character.

Exactly _five_ occurrences of an '_' underscore, with up to 30 _random_
chars in between. This paragraph matches. :)



Re: Underscores

2009-07-16 Thread John Hardin

On Thu, 2009-07-16 at 06:27 -0700, John Hardin wrote:

> How about:
> 
>   /(?:[^_]{1,30}_+){1,5}/

Whoops! Make that:

  /(?:[^_]{1,30}_+){5}/
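The correction matters because of the point Matt makes elsewhere in this thread about trailing quantifiers: for a simple match/no-match test, `{1,5}` is already satisfied by one repetition, so it behaves like `{1}`. A quick Python check with made-up inputs:

```python
import re

ONE_TO_FIVE = re.compile(r"(?:[^_]{1,30}_+){1,5}")
EXACTLY_FIVE = re.compile(r"(?:[^_]{1,30}_+){5}")
SINGLE = re.compile(r"[^_]{1,30}_+")  # one repetition only

samples = [
    "see my_file.txt for details",
    "This_sentenance_has_an_underscore_after_every_word",
    "no underscores at all",
]
for s in samples:
    print(bool(ONE_TO_FIVE.search(s)), bool(SINGLE.search(s)),
          bool(EXACTLY_FIVE.search(s)), s)
```

`{1,5}` agrees with the single-repetition pattern on every sample, so it fires on any text containing one underscore; only `{5}` demands five.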


Re: Underscores

2009-07-16 Thread Jeff Mincy

   From: Matt Kettler 
   Date: Thu, 16 Jul 2009 08:52:50 -0400

   twofers wrote:
   > How can I pattern match when every word has an underscore after it?
   > Example:
   > This_sentenance_has_an_underscore_after_every_word
   >
   > I'm not really good at Perl pattern matching, but \w and \W see an
   > underscore as a word character, so I'm just not sure what might work.
   >
   > body =~ /^([a-z]+_+)+/i
   >
   > Is that something that will work effectively?

Is this for a spam rule?

   I'd do something like this:

   body  MY_UNDERSCORES  /\S+_+\S+_+\S+/

   Unless you really want to restrict it to A-Z.

   Regardless, a + at the end of a regex in an SA rule is redundant. Since +
   allows a one-instance match, it will devolve to that. You don't need to
   match the entire line with your rule, so the extra matches are
   redundant. It will match the first instance, and that's all it needs to
   be a match.

   Also, any regex ending in * should just have its last element removed,
   as that will devolve to a zero-count match.

The /\S+_+\S+_+\S+/ rule will match lots of technical email, for example
discussions of shell environment variables like LD_LIBRARY_PATH.
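This false-positive concern is easy to reproduce. The Python sketch below (the sample lines are invented) compares Matt's pattern with the stricter five-repetition pattern /(?:[^_\s]{1,30}_+){5}/ that John Hardin posts elsewhere in this thread.

```python
import re

MATT = re.compile(r"\S+_+\S+_+\S+")
JOHN = re.compile(r"(?:[^_\s]{1,30}_+){5}")

technical = "Try exporting LD_LIBRARY_PATH before running make."
spammy = "This_sentenance_has_an_underscore_after_every_word"

for label, text in (("technical", technical), ("spammy", spammy)):
    print(label, bool(MATT.search(text)), bool(JOHN.search(text)))
```

Two underscores are enough for Matt's pattern, so the environment variable name hits; the five-repetition pattern needs five underscore-terminated runs and only fires on the spammy line.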

-jeff

Re: Underscores

2009-07-16 Thread John Hardin

On Thu, 2009-07-16 at 08:52 -0400, Matt Kettler wrote:
> 
> twofers wrote:
> > How can I pattern match when every word has an underscore after it?
> > Example:
> > This_sentenance_has_an_underscore_after_every_word
> >
> > body =~ /^([a-z]+_+)+/i
>
> I'd do something like this:
> 
> body  MY_UNDERSCORES  /\S+_+\S+_+\S+/

That's quite a lot of backtracking, no?

How about:

  /(?:[^_]{1,30}_+){1,5}/


Re: Underscores

2009-07-16 Thread Matt Kettler

twofers wrote:
> How can I pattern match when every word has an underscore after it?
> Example:
> This_sentenance_has_an_underscore_after_every_word
>
> I'm not really good at Perl pattern matching, but \w and \W see an
> underscore as a word character, so I'm just not sure what might work.
>
> body =~ /^([a-z]+_+)+/i
>
> Is that something that will work effectively?
>
> Thanks.
>
> Wes
>
>

I'd do something like this:

body  MY_UNDERSCORES  /\S+_+\S+_+\S+/

Unless you really want to restrict it to A-Z.

Regardless, a + at the end of a regex in an SA rule is redundant. Since +
allows a one-instance match, it will devolve to that. You don't need to
match the entire line with your rule, so the extra matches are
redundant. It will match the first instance, and that's all it needs to
be a match.

Also, any regex ending in * should just have its last element removed,
as that will devolve to a zero-count match.
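The point about trailing quantifiers can be checked mechanically: a rule only asks whether a match exists, and for that yes/no question a trailing `+` group can be cut to one repetition and a trailing `x*` element dropped. A Python sketch with made-up samples (Python's `re.search`, like Perl's `m//`, looks for a match anywhere in the string):

```python
import re

def fires(pattern, text):
    """Match/no-match only, which is all a SpamAssassin rule uses."""
    return re.search(pattern, text, re.I) is not None

samples = [
    "This_sentenance_has_an_underscore_after_every_word",
    "LD_LIBRARY_PATH is set",
    "no underscores here",
    "_leading underscore",
]

# Trailing "+" on the group: ^([a-z]+_+)+ versus ^[a-z]+_+
plus_same = all(fires(r"^([a-z]+_+)+", s) == fires(r"^[a-z]+_+", s) for s in samples)
# Trailing "*" element: [a-z]+_* versus [a-z]+
star_same = all(fires(r"[a-z]+_*", s) == fires(r"[a-z]+", s) for s in samples)
print(plus_same, star_same)
```

Both comparisons agree on every sample. The second sample also shows that the original `^([a-z]+_+)+` fires on any text that merely starts with a word followed by an underscore.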

Underscores

2009-07-16 Thread twofers

How can I pattern match when every word has an underscore after it?
Example:
This_sentenance_has_an_underscore_after_every_word

I'm not really good at Perl pattern matching, but \w and \W see an underscore 
as a word character, so I'm just not sure what might work.
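The behaviour described here is easy to confirm: Python's `re` treats `_` as a word character just as Perl does. A common workaround in both languages is the negated class `[^\W_]`, meaning "a word character other than underscore"; the sketch below contrasts it with `\w`.

```python
import re

subject = "This_sentenance_has_an_underscore_after_every_word"

print(re.findall(r"\w+", subject))      # one token, because "_" counts as a word character
print(re.findall(r"[^\W_]+", subject))  # word characters minus "_", so the words come apart
```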

body =~ /^([a-z]+_+)+/i

Is that something that will work effectively?

Thanks.

Wes

Re: 2.64 - SUBJ_HAS_UNIQ_ID - incorrect interpretation of underscores??

2004-12-11 Thread Loren Wilton

I just finally turned this rule off.  For some reason it has started
triggering on a whole lot of my normal mail, which isn't useful and is
creating a bunch of FPs.  I don't think I've ever seen it trigger on spam...
:-)

Loren

Re: 2.64 - SUBJ_HAS_UNIQ_ID - incorrect interpretation of underscores??

2004-12-11 Thread Matt Kettler

At 08:30 PM 12/10/2004, Matt Kettler wrote:
>The rule doesn't do very well anyway:
>
>   1.039   1.1433   0.1190   0.906   0.73   0.90  SUBJ_HAS_UNIQ_ID
>
>Hence the <1 score it receives.
>Perhaps this is a decent chunk of why the rule doesn't perform well. It
>might be worth looking into modifying that regex in the eval to try to get
>better performance, or splitting them up so you can test each separately...
Never mind. Looking at my most recent 300 spams, only one matched, and that
one didn't have a UNIQ_ID:

Subject: {SPAM} 0rder your meds" today`

It doesn't look like spammers use UNIQ IDs in subject lines very often
anymore.

The only one I did find doesn't match the rule:
Subject: {SPAM} STOP_PAYING_FOR YOUR Cable_Movies e6pgu5

There are others posing as shipment notices, but the rule tries to skip
them on purpose:

Subject: {SPAM} Fedex Ship Notification, Tracking Number : VBN24530946 - 40352TZLP
Subject: {SPAM} Fedex Delivery Confirmation, Tracking Number : ITZ65070066405343DJCK

Re: 2.64 - SUBJ_HAS_UNIQ_ID - incorrect interpretation of underscores??

2004-12-11 Thread Matt Kettler

At 04:06 PM 12/10/2004, Theo Van Dinter wrote:
>It's not simply a hyphenated word.  It looks like two long sets of characters
>with a hyphen in the middle, which is the exact same thing as a unique id.
>The rule doesn't do very well anyway:
>  1.039   1.1433   0.1190   0.906   0.73   0.90  SUBJ_HAS_UNIQ_ID
>Hence the <1 score it receives.

Perhaps this is a decent chunk of why the rule doesn't perform well. It
might be worth looking into modifying that regex in the eval to try to get
better performance, or splitting them up so you can test each separately...

Re: 2.64 - SUBJ_HAS_UNIQ_ID - incorrect interpretation of underscores??

2004-12-10 Thread Theo Van Dinter

On Fri, Dec 10, 2004 at 08:31:57PM +0100, Per Jessen wrote:
> > No.  The issue is that "cumulus-bonuspunkten" looks like an ID tag.
> 
> Should SUBJ_HAS_UNIQ_ID really fire on that - simply a hyphenated word?  There
> are plenty of those around (although fewer in German than in English).

It's not simply a hyphenated word.  It looks like two long sets of characters
with a hyphen in the middle, which is the exact same thing as a unique id.

The rule doesn't do very well anyway:

  OVERALL%   SPAM%     HAM%     S/O    RANK   SCORE  NAME
     1.039   1.1433   0.1190   0.906   0.73   0.90  SUBJ_HAS_UNIQ_ID

Hence the <1 score it receives.
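Assuming the figures follow the usual mass-check hit-frequencies layout (OVERALL%, SPAM%, HAM%, S/O, RANK, SCORE), the S/O column is the rule's spam hit rate divided by its combined spam and ham hit rates, which the quoted numbers bear out:

```python
spam_pct = 1.1433  # % of spam messages the rule hit
ham_pct = 0.1190   # % of ham messages the rule hit

s_o = spam_pct / (spam_pct + ham_pct)
print(round(s_o, 3))  # 0.906, matching the S/O column
```

A rule that hits only about 1.1% of spam while still hitting some ham is consistent with the sub-1 score it was given.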

-- 
Randomly Generated Tagline:
"Linux poses a real challenge for those with a taste for late-night
 hacking (and/or conversations with God)."
 (By Matt Welsh)



Re: 2.64 - SUBJ_HAS_UNIQ_ID - incorrect interpretation of underscores??

2004-12-10 Thread Per Jessen

Theo Van Dinter wrote:

> On Fri, Dec 10, 2004 at 01:25:43PM +0100, Per Jessen wrote:
>> Why does SUBJ_HAS_UNIQ_ID fire on this subject:
>> 
>> Subject: =?iso-8859-1?Q?MIGROL_Heiz=F6l-Angebot_mit_Cumulus-Bonuspunkten?=
>> 
>> Is this a bug in the RFC2047 decoding in SA 2.64?
> 
> No.  The issue is that "cumulus-bonuspunkten" looks like an ID tag.

Should SUBJ_HAS_UNIQ_ID really fire on that - simply a hyphenated word?  There
are plenty of those around (although fewer in German than in English).
 

-- 
Per Jessen, Zurich
Let your spam stop here -- http://www.spamchek.com

Re: 2.64 - SUBJ_HAS_UNIQ_ID - incorrect interpretation of underscores??

2004-12-10 Thread Theo Van Dinter

On Fri, Dec 10, 2004 at 01:25:43PM +0100, Per Jessen wrote:
> Why does SUBJ_HAS_UNIQ_ID fire on this subject:
> 
> Subject: =?iso-8859-1?Q?MIGROL_Heiz=F6l-Angebot_mit_Cumulus-Bonuspunkten?=
> 
> Is this a bug in the RFC2047 decoding in SA 2.64? 

No.  The issue is that "cumulus-bonuspunkten" looks like an ID tag.

-- 
Randomly Generated Tagline:
"Any similarity to person/persons now living to anyone or thing, dead or 
 undead, is entirely accidental and just one more irrefutable proof of the 
 paranormal."  - From the 7th Guest



2.64 - SUBJ_HAS_UNIQ_ID - incorrect interpretation of underscores??

2004-12-10 Thread Per Jessen

Why does SUBJ_HAS_UNIQ_ID fire on this subject:

Subject: =?iso-8859-1?Q?MIGROL_Heiz=F6l-Angebot_mit_Cumulus-Bonuspunkten?=

It looks as if SA mistakenly interprets the underscores as literal underscores,
which they're not in an RFC 2047 encoded string: in the "Q" encoding an
underscore stands for a space (see http://rfc.net/rfc2047.html).

Is this a bug in the RFC2047 decoding in SA 2.64? 
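For reference, a decoder that follows RFC 2047 turns every `_` inside a Q-encoded word into a space. The sketch below uses Python's standard `email.header` module, not SpamAssassin's own decoder, on the subject from this report.

```python
from email.header import decode_header, make_header

raw = "=?iso-8859-1?Q?MIGROL_Heiz=F6l-Angebot_mit_Cumulus-Bonuspunkten?="

# decode_header() undoes the Q encoding ("_" -> space, "=F6" -> byte 0xF6);
# make_header() plus str() then decode those bytes as iso-8859-1 text.
decoded = str(make_header(decode_header(raw)))
print(decoded)  # MIGROL Heizöl-Angebot mit Cumulus-Bonuspunkten
```

After decoding, the underscores are gone and what remains is the hyphenated "Cumulus-Bonuspunkten", the token identified in the replies as looking like an ID tag.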


