List of urls

2010-10-26 Thread Richard Smits

Hello,

Does anyone know if it's possible to have a list of URLs and define a
score for all of them in one line?



Currently I do it like this:

uri url_1 /www.domain1.com/
uri url_2 /www.domain2.com/
uri url_3 /www.domain3.com/
uri url_4 /www.domain4.com/

score url_1 10
score url_2 10
score url_3 10
score url_4 10


But I want just one line to define the score. Are there other ways to do
this?


Greetings .. Richard



Re: List of urls

2010-10-26 Thread Martin Gregorie
On Tue, 2010-10-26 at 08:07 +0200, Richard Smits wrote:
> Hello,
>
> Does anyone know if it's possible to have a list of URLs and define a
> score for all of them in one line?

I developed a similar system for my own purposes that you might want to
look at.

The idea is that you define this type of rule in an easily edited file
which contains header lines that set the rule name, score, description,
whether it ignores case, etc. These are followed by one or more
sections, each consisting of a line saying which part of the message it
applies to (body, uri, etc) and a list of match terms. A shell script,
which uses gawk for the heavy lifting, converts one or more definition
files into rules (one rule per definition) and outputs a single .cf file
containing them all. There's even a man page.

It's all available in a GPLed tarball:
http://www.libelle-systems.com/free/portmanteau/portmanteau.tgz


Martin




Re: List of urls

2010-10-26 Thread Raymond Dijkxhoorn

Hi!


> Currently I do it like this:
>
> uri url_1 /www.domain1.com/
> uri url_2 /www.domain2.com/
> uri url_3 /www.domain3.com/
> uri url_4 /www.domain4.com/
>
> score url_1 10
> score url_2 10
> score url_3 10
> score url_4 10


Isn't this a bit expensive? Report them to SURBL or a similar list and they
will get added ;) (send a mail to raym...@surbl.org)


As for your question, why don't you use a regexp?

uri url_1 /www.domain(1|2|3|4).com/

The exact regexp naturally depends on the domains, but you don't need a
separate check for each one.
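
A minimal sketch of that consolidation, reusing the placeholder domains from the original post. The rule name URL_LIST and the describe text are made up for illustration, and the dots are escaped here so that they match only a literal dot:

```
# One uri rule covering all four placeholder domains, with a single score line.
# URL_LIST is a hypothetical rule name.
uri      URL_LIST  /www\.domain(?:1|2|3|4)\.com/
describe URL_LIST  URI matches one of the locally listed domains
score    URL_LIST  10
```

Written this way, a message scores 10 points once, no matter how many of the four domains it links to.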


The best way to handle domains is to put them in a small RBL, or to get them
added to an existing RBL.


Bye,
Raymond.


Re: List of urls

2010-10-26 Thread Karsten Bräckelmann
On Tue, 2010-10-26 at 10:53 +0200, Raymond Dijkxhoorn wrote:
> As for your question, why don't you use a regexp?
>
> uri url_1 /www.domain(1|2|3|4).com/
>
> The exact regexp naturally depends on the domains, but you don't need a
> separate check for each one.

One way to consolidate them, yes -- depending on the nature of the
strings to match it can be very intuitive and natural.

The other technique you can use is a meta rule, together with
non-scoring sub-rules (names starting with a double underscore) to
prevent the individual parts from scoring (default of 1, if not set
explicitly).

  uri __MY_BL_001 /example.(com|net)/
  uri __MY_BL_002 /example.org/

  meta  MY_BL  __MY_BL_001 || __MY_BL_002
  score MY_BL  10.0

Note, though, that the above uri matches are not sufficiently strict
(similar to the OP's example) and might result in false positives (FPs).

The dot in an RE matches any char, and must be escaped to match a
literal dot. Also, the REs should be anchored, either at the left or
right end, to prevent matching innocent bystanders. Since parsed URIs
are guaranteed to have a protocol (prepended by SA, if none), the
following would be much safer than the simple example above.

  uri __MY_BL_000  m~^https?://(www\.)?example\.org(/|$)~

It is anchored at the beginning of the URI, allows an optional www
host name, and is anchored at the end to further prevent FPs. Oh, and it
also uses m// with an alternative delimiter, so I don't have to escape
the slash in the RE.

How strict you want your uri rule REs depends on your level of paranoia
and the domains to match.
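
Combining both points, the meta rule from above with stricter sub-rules could look roughly like this. It is only a sketch: the domains are placeholders, and the non-capturing (?:...) groups and the case-insensitive i modifier are additions made for this example.

```
# Non-scoring sub-rules (names start with a double underscore),
# anchored at both ends and with the dots escaped.
uri   __MY_BL_001  m~^https?://(?:www\.)?example\.(?:com|net)(?:/|$)~i
uri   __MY_BL_002  m~^https?://(?:www\.)?example\.org(?:/|$)~i

# Only the meta rule carries a score: 10 points if any sub-rule hits.
meta  MY_BL  (__MY_BL_001 || __MY_BL_002)
score MY_BL  10.0
```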


> The best way to handle domains is to put them in a small RBL, or to get them
> added to an existing RBL.

Well, it certainly depends on the amount of URIs, and how frequently the
list may change. SA config is not suitable for frequent changes, but
would be way easier to set up than a local RBL, if the list isn't too
large and mostly static.

Adding to existing URI DNSBLs isn't always an option, btw. URL
shorteners may have a place in severely size-constrained messages of
sorts, but have no business in mail. They won't be blacklisted by the
major players out there, though. ;)


-- 
char *t=\10pse\0r\0dtu...@ghno\x4e\xc8\x79\xf4\xab\x51\x8a\x10\xf4\xf4\xc4;
main(){ char h,m=h=*t++,*x=t+2*h,c,i,l=*x,s=0; for (i=0;il;i++){ i%8? c=1:
(c=*++x); c128  (s+=h); if (!(h=1)||!t[s+h]){ putchar(t[s]);h=m;s=0; }}}



Re: List of urls

2010-10-26 Thread John Hardin

On Tue, 26 Oct 2010, Karsten Bräckelmann wrote:

> On Tue, 2010-10-26 at 10:53 +0200, Raymond Dijkxhoorn wrote:
>
>> As for your question, why don't you use a regexp?
>>
>> uri url_1 /www.domain(1|2|3|4).com/
>
> The other technique you can use is a meta rule
>
>  uri __MY_BL_001 /example.(com|net)/
>  uri __MY_BL_002 /example.org/
>
>  meta  MY_BL  __MY_BL_001 || __MY_BL_002
>  score MY_BL  10.0


The OP wasn't clear whether he wanted ten points _per URI hit_. If that's 
the case, the regex alternatives and meta solutions aren't appropriate and 
there's no way to avoid one score line per URI rule.


--
 John Hardin KA7OHZ                    http://www.impsec.org/~jhardin/
 jhar...@impsec.org    FALaholic #11174     pgpk -a jhar...@impsec.org
 key: 0xB8732E79 -- 2D8C 34F4 6411 F507 136C  AF76 D822 E6E6 B873 2E79
---
  ...the Fates notice those who buy chainsaws...
  -- www.darwinawards.com
---
 5 days until Halloween

Re: List of urls

2010-10-26 Thread John Hardin

On Tue, 26 Oct 2010, Richard Smits wrote:

> Does anyone know if it's possible to have a list of URLs and define a
> score for all of them in one line?
>
> Currently I do it like this:
>
> uri url_1 /www.domain1.com/
> uri url_2 /www.domain2.com/
> uri url_3 /www.domain3.com/
> uri url_4 /www.domain4.com/
>
> score url_1 10
> score url_2 10
> score url_3 10
> score url_4 10
>
> But I want just one line to define the score. Are there other ways to do
> this?


Do you want ten points total if _any_ targeted URI hits, or ten points for 
each targeted URI that hits regardless of how many hit?


The latter is what you are doing above.
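
To make the difference concrete, here is a sketch of both variants using two of the placeholder domains. The names __URL_1, __URL_2 and URL_ANY are invented for this example, and the dots are escaped so that they match only a literal dot.

```
# Variant A: one scored rule per URI.
# A message linking to both domains gets 10 + 10 = 20 points.
uri   url_1  /www\.domain1\.com/
uri   url_2  /www\.domain2\.com/
score url_1  10
score url_2  10

# Variant B: non-scoring sub-rules plus one scored meta rule.
# The same message gets 10 points in total, however many sub-rules hit.
uri   __URL_1  /www\.domain1\.com/
uri   __URL_2  /www\.domain2\.com/
meta  URL_ANY  (__URL_1 || __URL_2)
score URL_ANY  10
```

The two variants are alternatives; a real configuration would use only one of them.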



Re: List of urls

2010-10-26 Thread Martin Gregorie
On Tue, 2010-10-26 at 10:37 -0700, John Hardin wrote:
> On Tue, 26 Oct 2010, Karsten Bräckelmann wrote:
>
>> On Tue, 2010-10-26 at 10:53 +0200, Raymond Dijkxhoorn wrote:
>>> As for your question, why don't you use a regexp?
>>>
>>> uri url_1 /www.domain(1|2|3|4).com/
>>
>> The other technique you can use is a meta rule
>>
>>   uri __MY_BL_001 /example.(com|net)/
>>   uri __MY_BL_002 /example.org/
>>
>>   meta  MY_BL  __MY_BL_001 || __MY_BL_002
>>   score MY_BL  10.0
>
> The OP wasn't clear whether he wanted ten points _per URI hit_. If that's
> the case, the regex alternatives and meta solutions aren't appropriate and
> there's no way to avoid one score line per URI rule.

What about 'tflags multiple', as in:

uri    RULE /(example.(com|net)|example.org|...)/
tflags RULE multiple
score  RULE 10

The only (minor) drawback I've found is that the list of firing rules
can be filled with RULE, RULE, RULE by the type of spam that contains
nothing but tens of lines pushing variations on a theme, such as:

Buy FAMOUS SHOE basketMax
Buy FAMOUS SHOE basketSuper
Buy FAMOUS SHOE basketWimp
Buy FAMOUS SHOE runningMax
 


Martin





Re: List of urls

2010-10-26 Thread John Hardin

On Tue, 26 Oct 2010, Martin Gregorie wrote:


> On Tue, 2010-10-26 at 10:37 -0700, John Hardin wrote:
>
>> The OP wasn't clear whether he wanted ten points _per URI hit_. If that's
>> the case, the regex alternatives and meta solutions aren't appropriate and
>> there's no way to avoid one score line per URI rule.
>
> What about 'tflags multiple', as in:
>
> uri    RULE /(example.(com|net)|example.org|...)/
> tflags RULE multiple
> score  RULE 10


You're right. I didn't think of that.




Re: List of urls

2010-10-26 Thread Karsten Bräckelmann
On Tue, 2010-10-26 at 20:10 +0100, Martin Gregorie wrote:
> On Tue, 2010-10-26 at 10:37 -0700, John Hardin wrote:
>
>> The OP wasn't clear whether he wanted ten points _per URI hit_. If that's
>> the case, the regex alternatives and meta solutions aren't appropriate and
>> there's no way to avoid one score line per URI rule.
>
> What about 'tflags multiple', as in:
>
> uri    RULE /(example.(com|net)|example.org|...)/
> tflags RULE multiple
> score  RULE 10
>
> The only (minor) drawback I've found is that the list of firing rules
> can be filled with RULE, RULE, RULE by the type of spam that contains
> nothing but tens of lines pushing variations on a theme, such as:

tflags multiple can be quite dangerous, though, if it directly results
in a scoring hit, as in your example. Besides possibly flooding the
report, it can also easily bias the overall score.

URI DNSBL hits, for example, do not count how often a domain is in the
spam, but hit once only.

The safest approach for tflags multiple rules is to trigger other rules
based on the number of hits. meta rules explicitly support this.

  meta FOO_4  __TFLAGS_MULTIPLE_SUB >= 4
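
A fuller sketch of that pattern, assuming placeholder domains and a made-up score of 2.0 (FOO_4 and __TFLAGS_MULTIPLE_SUB are the names from the line above):

```
# Sub-rule: counts matching URIs, but never scores on its own
# because its name starts with a double underscore.
uri     __TFLAGS_MULTIPLE_SUB  m~^https?://(?:www\.)?example\.(?:com|net|org)(?:/|$)~i
tflags  __TFLAGS_MULTIPLE_SUB  multiple

# Meta rule: fires once, and only if the sub-rule hit at least four times.
meta    FOO_4  (__TFLAGS_MULTIPLE_SUB >= 4)
score   FOO_4  2.0
```

However many extra hits the sub-rule collects, FOO_4 contributes its score only once, so the total cannot run away.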





Re: List of urls

2010-10-26 Thread Martin Gregorie
On Tue, 2010-10-26 at 23:59 +0200, Karsten Bräckelmann wrote:
> The safest approach for tflags multiple rules is to trigger other rules
> based on the number of hits. meta rules explicitly support this.
>
>   meta FOO_4  __TFLAGS_MULTIPLE_SUB >= 4

Yes, I agree. Equally important is to avoid using giant-killing scores.
I'd think 1.0 per hit would be as high as you'd ever want to use.

FWIW I have only two multiples: one scores 0.1 per hit and the other
uses 1.0. The second one scans for relatively complex phrases that are
unlikely to be seen outside advertising blurb or the speech of a
sales-droid, and as a consequence multiple hits are fairly rare. It's
only 'multiple' to punish outbreaks of salesorrhea and is only used in
metas (often with the other multiple, which tags product names and
descriptions of stuff I'd never buy). I'm a private user, not an ISP: can
you tell?  :-)
 

Martin