Re: Spamassassin default SHORT_URI list obsolete/outdated

2016-07-01 Thread Axb

On 07/01/2016 10:13 AM, Groach wrote:


On 01/07/2016 09:56, Axb wrote:



I then informed him that SA already has a URL_SHORTENER checking rule found
in 72_active.cf.  I was currently using this as a META rule thus:

meta MY_URI_URLSHORT __URL_SHORTENER  # defined in 72_active.cf


ATM it seems there is no such rule; please verify the name after running
sa-update.


As quoted, it is "__URL_SHORTENER".

The entry reads as follows:

uri __URL_SHORTENER /^http:\/\/(?:bit\.ly|tinyurl\.com|ow\.ly|is\.gd|tumblr\.com|formspring\.me|ff\.im|youtu\.be|tl\.gd|plurk\.com|migre\.me|j\.mp|cli\.gs|goo\.gl|yfrog\.com|lnk\.ms|su\.pr|fb\.me|alturl\.com|wp\.me|ping\.fm|chatter\.com|post\.ly|twurl\.nl|tiny\.cc|4sq\.com|ustre\.am|short\.to|u\.nu|flic\.kr|budurl\.com|digg\.com|twitvid\.com|gowal\.la|om\.ly|justin\.tv|icio\.us|p\.gs|loopt\.us|tcrn\.ch|xrl\.us|wpo\.st|bkite\.com)\/[^\/]{3}\/?/
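To see concretely what this pattern does and does not match, here is a minimal Python sketch that compiles the regex exactly as quoted and probes a few sample URIs. It assumes Python's `re` module interprets this pattern the same way Perl (which SpamAssassin uses) does; the pattern uses no Perl-only syntax, so that seems reasonable, but it is an assumption. The sample URIs are made up for illustration.

```python
import re

# The __URL_SHORTENER regex as quoted above, split into adjacent raw strings
# (Python concatenates them). Case-sensitive, like the original /.../ rule.
URL_SHORTENER = re.compile(
    r"^http:\/\/(?:bit\.ly|tinyurl\.com|ow\.ly|is\.gd|tumblr\.com|formspring\.me"
    r"|ff\.im|youtu\.be|tl\.gd|plurk\.com|migre\.me|j\.mp|cli\.gs|goo\.gl"
    r"|yfrog\.com|lnk\.ms|su\.pr|fb\.me|alturl\.com|wp\.me|ping\.fm|chatter\.com"
    r"|post\.ly|twurl\.nl|tiny\.cc|4sq\.com|ustre\.am|short\.to|u\.nu|flic\.kr"
    r"|budurl\.com|digg\.com|twitvid\.com|gowal\.la|om\.ly|justin\.tv|icio\.us"
    r"|p\.gs|loopt\.us|tcrn\.ch|xrl\.us|wpo\.st|bkite\.com)\/[^\/]{3}\/?"
)


def probe(url: str) -> bool:
    """True if the pattern hits this URI (Perl's =~ behaves like re.search)."""
    return URL_SHORTENER.search(url) is not None


if __name__ == "__main__":
    for url in [
        "http://bit.ly/abc123",         # plain http short link
        "https://bit.ly/abc123",        # https is not covered by ^http:
        "http://bit.ly/ab",             # path shorter than 3 characters
        "http://www.tinyurl.com/abc",   # no subdomain prefix is allowed
        "http://tumblr.com/dashboard",  # listed, but a blog host, not a shortener
    ]:
        print(f"{probe(url)!s:5}  {url}")
```

Read this way, the pattern misses `https://` links and any `www.` prefix, and because it has no end anchor, `[^\/]{3}\/?` only requires the path to start with three non-slash characters, so any page on a listed host (such as tumblr.com) can hit.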


Ok, found it... and I must say this rule is pretty sloppy and should
probably be deprecated. I hope whoever compiled this list takes a look
into it.
It includes domains which are clearly not URI shorteners, or which are never
used in spam, etc.


IMO, this rule can probably be deprecated in favour of network (URI DNSBL) lookups.
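For readers unfamiliar with the mechanism: a URI DNSBL lookup takes the domain found in a link, prepends it to the blocklist's DNS zone, and resolves that name; an answer in 127.0.0.0/8 conventionally means "listed", and NXDOMAIN means "not listed". A minimal Python sketch of that idea follows. The zone `shorteners.example.org` is a hypothetical placeholder, not the zone used by the list discussed in this thread, and the helper names are illustrative only.

```python
import socket
from typing import Optional

# Hypothetical zone used only for illustration; substitute the real zone of
# whichever URI blocklist you actually query.
DEFAULT_ZONE = "shorteners.example.org"


def query_name(domain: str, zone: str = DEFAULT_ZONE) -> str:
    """Build the name to resolve: the lowercased domain prepended to the zone."""
    return f"{domain.strip('.').lower()}.{zone}"


def lookup(domain: str, zone: str = DEFAULT_ZONE) -> Optional[str]:
    """Resolve the query name; return the A record, or None if it does not resolve.

    This performs a real DNS query through the system resolver.
    """
    try:
        return socket.gethostbyname(query_name(domain, zone))
    except OSError:
        # socket.gaierror (e.g. NXDOMAIN) is a subclass of OSError: treat as "not listed".
        return None


def is_listed(answer: Optional[str]) -> bool:
    """DNS blocklists conventionally answer with an address in 127.0.0.0/8."""
    return answer is not None and answer.startswith("127.")
```

Each check costs a DNS round trip instead of regex matching, but the data lives with the list maintainer, so a domain that is added or dropped takes effect without a rule update.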


and is used in other META rules such as MONEY_FRAUD_5 (which is why its
name is prefixed with "__").



URL shorteners aren't bad per se, so it makes little sense to waste
cycles processing a long list which may or may not be abused. Many of
these sites won't be around in 6 months; some have zero abuse, and some
may even be NXDOMAIN.
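The point that some listed domains may already be NXDOMAIN can be checked mechanically. A minimal sketch, assuming you only want to flag list entries whose names no longer resolve; `resolves` and `stale_entries` are hypothetical helpers, not part of SpamAssassin, and the resolver is injectable so the logic can be exercised without network access. "Does not resolve" is only a hint: a domain can resolve and still be dead as a shortener, or fail to resolve temporarily.

```python
import socket
from typing import Callable, Iterable, List


def resolves(domain: str) -> bool:
    """Hypothetical helper: True if the system resolver returns any address."""
    try:
        socket.getaddrinfo(domain, None)
        return True
    except OSError:  # socket.gaierror, raised for NXDOMAIN, is a subclass of OSError
        return False


def stale_entries(domains: Iterable[str],
                  check: Callable[[str], bool] = resolves) -> List[str]:
    """Return the domains for which `check` reports no resolution, in input order."""
    return [d for d in domains if not check(d)]
```

Called with the default `check`, this issues one real DNS query per domain; passing a stub makes the filtering logic testable offline.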


You can see from 72_active.cf that using a URL shortener isn't considered
bad by itself, and that SA rules do use it in conjunction with other
'more likely' positive matches (such as MONEY_FRAUD_5).
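The idea of combining a weak signal with stronger ones can be sketched in Python: a `__`-prefixed subrule scores nothing by itself and only matters through meta rules built on it. This is a simplified boolean model, not SpamAssassin's implementation. Apart from `__URL_SHORTENER` and the poster's own `MY_URI_URLSHORT`, the rule names and the expression are hypothetical; the real definition of MONEY_FRAUD_5 in 72_active.cf is not reproduced here.

```python
from typing import Callable, Dict, Set

# A meta rule is modelled as a boolean expression over the set of subrule hits.
MetaExpr = Callable[[Set[str]], bool]

META_RULES: Dict[str, MetaExpr] = {
    # Mirrors the poster's rule: fires whenever the subrule fires.
    "MY_URI_URLSHORT": lambda hits: "__URL_SHORTENER" in hits,
    # Hypothetical combination in the spirit of MONEY_FRAUD_5: a shortener
    # alone is not enough; it must coincide with other (made-up) fraud signals.
    "EXAMPLE_FRAUD_META": lambda hits: (
        "__URL_SHORTENER" in hits
        and {"__HYPOTHETICAL_MONEY", "__HYPOTHETICAL_URGENCY"} <= hits
    ),
}


def fired_metas(subrule_hits: Set[str]) -> Set[str]:
    """Return the names of meta rules whose expression is true for these hits."""
    return {name for name, expr in META_RULES.items() if expr(subrule_hits)}
```

With only `__URL_SHORTENER` hitting, just the poster's catch-all rule fires; the hypothetical fraud meta needs all three signals at once.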


Such rules are best maintained/provided by interested third parties,
which may or may not commit to keeping them up to date.
SA devs don't really have the time to chase sites/domains, and loading
the default rule set with extra bloat doesn't sound very wise.
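If an interested third party did take this on, one low-effort approach would be to regenerate the `uri` rule from a maintained domain list instead of editing the regex by hand. A hypothetical sketch: the rule name `__MY_URL_SHORTENER`, accepting both `http` and `https`, and the case-insensitive `/i` flag are all assumptions, not anything SpamAssassin ships.

```python
import re
from typing import Iterable


def build_uri_rule(domains: Iterable[str], name: str = "__MY_URL_SHORTENER") -> str:
    """Render a single-line SpamAssassin-style `uri` rule matching any listed domain.

    Hypothetical generator: domains are lowercased, de-duplicated, sorted and
    regex-escaped; the scheme may be http or https; a slash must follow the host.
    """
    cleaned = sorted({d.strip().lower() for d in domains if d.strip()})
    alternation = "|".join(re.escape(d) for d in cleaned)
    # Slashes are escaped because rule regexes are written as /.../ literals.
    return f"uri {name} /^https?:\\/\\/(?:{alternation})\\//i"
```

For example, `build_uri_rule(["goo.gl", "bit.ly", "Bit.ly "])` yields `uri __MY_URL_SHORTENER /^https?:\/\/(?:bit\.ly|goo\.gl)\//i`. Emitting the whole rule on one line matters: SpamAssassin's configuration is line-based, so a rule wrapped across two lines by a mail client is not read as one rule.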

Why not make this YOUR project?


Ok, well, I will leave it as HIS project ;-)  (the guy who has already
applied his research to provide this SURBL lookup).  He has also stated
that many of these sites come and go (as you imply).


His project is to maintain a domain list, similar to the Spamhaus DBL's
"127.0.1.103 abused spammed redirector domain" category.
Maintaining an SA rule with that data seems like a redundant effort, but if
someone needs this it would be wiser to tackle it at the source to avoid
stale data.
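The Spamhaus DBL signals its listing category through the address it returns in 127.0.1.0/24; 127.0.1.103, quoted above, is the "abused spammed redirector domain" code. A minimal sketch of decoding such an answer. Only the code quoted in this thread is mapped by name; any other 127.0.1.x answer is reported generically rather than guessed, so consult Spamhaus's own documentation for the rest.

```python
from typing import Optional

# Only the return code quoted in this thread is mapped by name.
DBL_CODES = {
    "127.0.1.103": "abused spammed redirector domain",
}


def describe_dbl_answer(answer: Optional[str]) -> str:
    """Interpret the A record returned for a DBL query (None means NXDOMAIN)."""
    if answer is None:
        return "not listed"
    if answer in DBL_CODES:
        return DBL_CODES[answer]
    if answer.startswith("127.0.1."):
        # Other DBL codes exist; they are deliberately not named here.
        return "other DBL return code"
    return "unexpected answer"
```

A redirector-focused rule would only need to act on the 127.0.1.103 case, which is the overlap with a dedicated shortener list.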






Re: Spamassassin default SHORT_URI list obsolete/outdated

2016-07-01 Thread Groach


On 01/07/2016 09:56, Axb wrote:


I then informed him that SA already has a URL_SHORTENER checking rule
found in 72_active.cf.  I was currently using this as a META rule thus:

meta MY_URI_URLSHORT __URL_SHORTENER  # defined in 72_active.cf


ATM it seems there is no such rule; please verify the name after running
sa-update.


As quoted, it is "__URL_SHORTENER".

The entry reads as follows:

uri __URL_SHORTENER /^http:\/\/(?:bit\.ly|tinyurl\.com|ow\.ly|is\.gd|tumblr\.com|formspring\.me|ff\.im|youtu\.be|tl\.gd|plurk\.com|migre\.me|j\.mp|cli\.gs|goo\.gl|yfrog\.com|lnk\.ms|su\.pr|fb\.me|alturl\.com|wp\.me|ping\.fm|chatter\.com|post\.ly|twurl\.nl|tiny\.cc|4sq\.com|ustre\.am|short\.to|u\.nu|flic\.kr|budurl\.com|digg\.com|twitvid\.com|gowal\.la|om\.ly|justin\.tv|icio\.us|p\.gs|loopt\.us|tcrn\.ch|xrl\.us|wpo\.st|bkite\.com)\/[^\/]{3}\/?/


and is used in other META rules such as MONEY_FRAUD_5 (which is why its
name is prefixed with "__").



URL shorteners aren't bad per se, so it makes little sense to waste
cycles processing a long list which may or may not be abused. Many of
these sites won't be around in 6 months; some have zero abuse, and some
may even be NXDOMAIN.


You can see from 72_active.cf that using a URL shortener isn't considered
bad by itself, and that SA rules do use it in conjunction with other
'more likely' positive matches (such as MONEY_FRAUD_5).


Such rules are best maintained/provided by interested third parties,
which may or may not commit to keeping them up to date.
SA devs don't really have the time to chase sites/domains, and loading
the default rule set with extra bloat doesn't sound very wise.


Why not make this YOUR project?


Ok, well, I will leave it as HIS project ;-)  (the guy who has already
applied his research to provide this SURBL lookup).  He has also stated
that many of these sites come and go (as you imply).


Thanks


Re: Spamassassin default SHORT_URI list obsolete/outdated

2016-07-01 Thread Axb

On 07/01/2016 09:35 AM, jimimaseye wrote:

Recently I was in discussion with the maintainer of a URI_SHORTENER blacklist,
who created a list of domains handling short URLs.  (You can
find his full rule and details here:
http://snork.ca/posts/2016-06-24-surbl-of-url-shorteners-for-spamassassin/).
He has identified over 200 current URL shorteners and maintains them
accordingly (viewable here:
http://snork.ca/posts/2016-06-24-surbl-of-url-shorteners-for-spamassassin/url_shorteners.txt).
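A maintained list like the one linked above is easiest to consume as a set of domains. A minimal sketch, assuming the text file holds one domain per line, possibly with blank lines or `#` comments (the actual file format is an assumption here, not something checked); a URI's host counts as a hit if it equals a listed domain or is a subdomain of one. The function names are illustrative, and the sketch takes the file contents as a string rather than fetching the URL.

```python
from typing import Set
from urllib.parse import urlsplit


def parse_domain_list(text: str) -> Set[str]:
    """Parse one-domain-per-line text, ignoring blanks and '#' comments (assumed format)."""
    domains = set()
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip().lower()
        if entry:
            domains.add(entry)
    return domains


def host_is_listed(url: str, domains: Set[str]) -> bool:
    """True if the URL's host, or any parent domain of it, is in `domains`."""
    # urlsplit(...).hostname is already lowercased; it is None for scheme-less input.
    host = (urlsplit(url).hostname or "").rstrip(".")
    labels = host.split(".")
    # Check "a.b.c", then "b.c", then "c".
    return any(".".join(labels[i:]) in domains for i in range(len(labels)))
```

Unlike the default regex, this treats `https://` links and `www.` prefixes the same as plain `http://` links.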

I then informed him that SA already has a URL_SHORTENER checking rule found
in 72_active.cf.  I was currently using this as a META rule thus:

meta MY_URI_URLSHORT __URL_SHORTENER  # defined in 72_active.cf


ATM it seems there is no such rule; please verify the name after running
sa-update.



He quite rightly pointed out that the default rule's list of 43 shortener
domains is drastically short and outdated (some of them don't even exist
anymore) compared to his recently researched list of over 200.
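The figure of 43 can be checked directly against the rule quoted earlier in the thread by pulling the alternation out of the pattern. A minimal sketch that does that and compares the result with a newer list; the function names are illustrative, and any "maintained" set you pass in stands in for the real 200-entry list, which is not reproduced here.

```python
import re
from typing import Set, Tuple

# Regex from the default __URL_SHORTENER rule, as quoted in this thread.
DEFAULT_PATTERN = (
    r"^http:\/\/(?:bit\.ly|tinyurl\.com|ow\.ly|is\.gd|tumblr\.com|formspring\.me"
    r"|ff\.im|youtu\.be|tl\.gd|plurk\.com|migre\.me|j\.mp|cli\.gs|goo\.gl"
    r"|yfrog\.com|lnk\.ms|su\.pr|fb\.me|alturl\.com|wp\.me|ping\.fm|chatter\.com"
    r"|post\.ly|twurl\.nl|tiny\.cc|4sq\.com|ustre\.am|short\.to|u\.nu|flic\.kr"
    r"|budurl\.com|digg\.com|twitvid\.com|gowal\.la|om\.ly|justin\.tv|icio\.us"
    r"|p\.gs|loopt\.us|tcrn\.ch|xrl\.us|wpo\.st|bkite\.com)\/[^\/]{3}\/?"
)


def domains_in_rule(pattern: str) -> Set[str]:
    """Extract the domains from the first (?:a|b|...) group, un-escaping '\\.'."""
    m = re.search(r"\(\?:([^)]*)\)", pattern)
    if m is None:
        return set()
    return {alt.replace("\\.", ".") for alt in m.group(1).split("|")}


def compare(old: Set[str], new: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (domains only in the old rule, domains only in the new list)."""
    return old - new, new - old
```

`len(domains_in_rule(DEFAULT_PATTERN))` gives 43, matching the count quoted above; `compare` then shows which entries a newer list adds and which old ones it no longer carries.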


URL shorteners aren't bad per se, so it makes little sense to waste
cycles processing a long list which may or may not be abused. Many of these
sites won't be around in 6 months; some have zero abuse, and some may even
be NXDOMAIN.


Such rules are best maintained/provided by interested third parties, which
may or may not commit to keeping them up to date.
SA devs don't really have the time to chase sites/domains, and loading
the default rule set with extra bloat doesn't sound very wise.


Why not make this YOUR project?


Is there any way the default list that SA checks in 72_active.cf
can be updated, and how is such a request made or implemented?  (Forgive me, I
don't know how these things work.)


See above.




Spamassassin default SHORT_URI list obsolete/outdated

2016-07-01 Thread jimimaseye
Recently I was in discussion with the maintainer of a URI_SHORTENER blacklist,
who created a list of domains handling short URLs.  (You can
find his full rule and details here:
http://snork.ca/posts/2016-06-24-surbl-of-url-shorteners-for-spamassassin/).
He has identified over 200 current URL shorteners and maintains them
accordingly (viewable here:
http://snork.ca/posts/2016-06-24-surbl-of-url-shorteners-for-spamassassin/url_shorteners.txt).

I then informed him that SA already has a URL_SHORTENER checking rule found
in 72_active.cf.  I was currently using this as a META rule thus:

meta MY_URI_URLSHORT __URL_SHORTENER  # defined in 72_active.cf

He quite rightly pointed out that the default rule's list of 43 shortener
domains is drastically short and outdated (some of them don't even exist
anymore) compared to his recently researched list of over 200.

Is there any way the default list that SA checks in 72_active.cf
can be updated, and how is such a request made or implemented?  (Forgive me, I
don't know how these things work.)



--
View this message in context: 
http://spamassassin.1065346.n5.nabble.com/Spamassassin-default-SHORT-URI-list-obsolete-outdated-tp121584.html
Sent from the SpamAssassin - Users mailing list archive at Nabble.com.