Re: [WSG] RE: OT? - spam in forms: use Bayesian filters!

Tim Fri, 16 Feb 2007 01:48:27 -0800

I've been watching this thread, thinking I must be like a pirate in the group.

I kick bad bots off my ship as soon as I see them, and I don't let them poke around twice or use any Bayesian filters.

I try to make my site very unattractive to most harvesting bots, whether they are collecting pages or email addresses or trying to exploit mail scripts. I'm on a Linux server. My .htaccess file is 100 KB, which unfortunately adds extra time to every hit while this list is checked for banned IPs or user agents. But it also lets me ban any file type from being used by an external IP or domain, so it seems to stop hotlinking as well as exploits of CGI mail scripts.


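RewriteEngine On

# Refuse requests that send neither a referer nor a user agent
RewriteCond %{HTTP_REFERER} ^-?$
RewriteCond %{HTTP_USER_AGENT} ^-?$
RewriteRule .* - [F]

# Send off-site requests for .cgi and .wav files to a decoy image
RewriteCond %{HTTP_REFERER} !^$
RewriteCond %{HTTP_REFERER} !^http://(www\.)?yoursite\.com(/.*)?$ [NC]
RewriteCond %{HTTP_REFERER} !^http://(www\.)?43\.223\.82\.45(/.*)?$
RewriteRule \.(cgi|wav)$ http://www.yourdomain.com/Stolen.jpe [R,L]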
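The intent of these rules (refuse a request that carries neither a referer nor a user agent, and send off-site requests for .cgi and .wav files to a decoy image) can be sketched as a plain JavaScript function, which makes the per-request logic easy to test. This is only an illustration, not how Apache evaluates rules internally; yoursite.com, 43.223.82.45 and yourdomain.com/Stolen.jpe are the placeholders from the post, and decide() and the sample path are hypothetical.

```javascript
// Sketch of the intended decision logic of the rewrite rules, per request.
// yoursite.com, 43.223.82.45 and yourdomain.com/Stolen.jpe are placeholders.
const ALLOWED_REFERERS = [
  /^http:\/\/(www\.)?yoursite\.com(\/.*)?$/i,
  /^http:\/\/(www\.)?43\.223\.82\.45(\/.*)?$/,
];
const PROTECTED_FILES = /\.(cgi|wav)$/;
const DECOY_URL = "http://www.yourdomain.com/Stolen.jpe";

// True for a missing or empty header, or a literal "-" (what ^-?$ matches).
function isBlank(header) {
  return !header || header === "-";
}

function decide(path, referer, userAgent) {
  // Rule 1: no referer and no user agent at all -> 403 Forbidden.
  if (isBlank(referer) && isBlank(userAgent)) return { action: "forbid" };

  // Rule 2: a non-empty referer from another site asking for a .cgi or .wav file.
  const offSite = Boolean(referer) && !ALLOWED_REFERERS.some((re) => re.test(referer));
  if (PROTECTED_FILES.test(path) && offSite) return { action: "redirect", to: DECOY_URL };

  // Everything else is served normally.
  return { action: "serve" };
}

console.log(decide("/cgi-bin/formmail.cgi", "http://spammer.example/form.html", "Mozilla/5.0"));
```

An empty referer is let through by the second rule because many browsers and proxies strip it; only a referer that names some other site triggers the decoy.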

I can email my .htaccess file of banned IPs and banned bots to anyone off list. There are some really well-known bad ones: rima-tide.net comes to mind, and the user agent Wget is really suspect as well, though it can also be used honestly.
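The ban list itself isn't shown, so here is a minimal sketch of the kind of check it performs, covering exact IPs, IP blocks, domains and user agents. Only rima-tide.net and Wget come from the post; the IP entries are placeholders from the reserved documentation ranges, and isBanned() is a hypothetical name. The host name is assumed to have been resolved by reverse DNS already, as Apache does for host-based rules.

```javascript
// Hypothetical ban list; only rima-tide.net and Wget come from the post,
// the IP entries are placeholders from the documentation ranges.
const BANNED_IPS = new Set(["198.51.100.17"]);
const BANNED_IP_PREFIXES = ["192.0.2."]; // trailing dot avoids matching 192.0.20.x
const BANNED_DOMAINS = ["rima-tide.net"];
const BANNED_AGENTS = [/^wget\//i];

// `host` is the client's reverse-DNS name, assumed to be resolved already.
function isBanned({ ip, host = "", userAgent = "" }) {
  if (BANNED_IPS.has(ip)) return true;
  if (BANNED_IP_PREFIXES.some((prefix) => ip.startsWith(prefix))) return true;

  // Match the domain itself or any subdomain, but not e.g. "notrima-tide.net".
  const name = host.toLowerCase();
  if (BANNED_DOMAINS.some((d) => name === d || name.endsWith("." + d))) return true;

  return BANNED_AGENTS.some((re) => re.test(userAgent));
}

console.log(isBanned({ ip: "203.0.113.9", userAgent: "Wget/1.10.2" }));
```

Exact IPs go in a Set so lookups stay cheap as the list grows, whereas a large .htaccess is checked entry by entry on every hit. Banning Wget outright also turns away people who use it honestly.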

I also use a CGI script that generates endless links. It is only referenced in a couple of spots, with the "noindex" and "nofollow" tags, which all the good bots seem to respect. On every few pages I put a dozen or so fake email addresses just for the bots that do get through, hidden from visitors with display: none in the CSS and also marked "noindex" and "nofollow".
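The script itself isn't shown, so this is a minimal sketch of the kind of page such a trap could return, written as a Node function that builds the HTML. The /cgi-bin/poison.cgi path, the p parameter, the link and address counts, and trapPage() itself are assumptions. Seeding a small generator from the page number keeps each page stable between visits.

```javascript
// Hypothetical "endless links" trap page. Every page links to more generated
// pages; the fake addresses use the reserved .invalid TLD so none is deliverable.

// Small linear congruential generator so page N always renders the same content.
function makeRng(seed) {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 2 ** 32;
  };
}

function randomWord(rng, length) {
  const letters = "abcdefghijklmnopqrstuvwxyz";
  let word = "";
  for (let i = 0; i < length; i++) word += letters[Math.floor(rng() * letters.length)];
  return word;
}

function trapPage(pageNumber, { links = 10, addresses = 12, base = "/cgi-bin/poison.cgi" } = {}) {
  const rng = makeRng(pageNumber + 1);
  const hrefs = [];
  for (let i = 0; i < links; i++) hrefs.push(`${base}?p=${Math.floor(rng() * 1e9)}`);
  const emails = [];
  for (let i = 0; i < addresses; i++) emails.push(`${randomWord(rng, 6)}@${randomWord(rng, 8)}.invalid`);

  return [
    "<html><head>",
    '<meta name="robots" content="noindex,nofollow">', // polite crawlers stop here
    "</head><body>",
    ...hrefs.map((h) => `<a href="${h}" rel="nofollow">${h}</a><br>`),
    '<div style="display:none">', // hidden from human visitors
    ...emails.map((e) => `<a href="mailto:${e}">${e}</a><br>`),
    "</div></body></html>",
  ].join("\n");
}

console.log(trapPage(0, { links: 3, addresses: 2 }));
```

The .invalid domain is a safety choice: randomly generated addresses on real domains could land on real people. A harvester could filter .invalid out easily, so this trades some of the pollution effect for safety.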


http://www.hereticpress.com/Private/members.foo

Just for today I will ignore hits on the above page. The CGI is called poison, but it is not used much anymore: most bots do not follow the links, though a few have been trapped in a loop for a day or so until I see the logs and ban them by IP or user agent. I also have a full page of junk email addresses, about 500 KB of them; occasionally I change a few details and update the page modification date. A few bots and humans have gone for it, and then I see them in the server logs and ban them. I reckon the fake addresses will pollute the spammers' databases, making them less valuable.
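Reviewing the logs for trap hits can be partly automated. This is a minimal sketch, assuming Apache's common or combined log format and treating the two URLs given in this post as the trap pages. It prints old-style Apache access directives (`Deny from`) for a human to check before adding them to .htaccess. trappedClients(), denyLines() and the sample log lines are hypothetical.

```javascript
// Hypothetical log scan: which clients requested a trap page?
// Assumes Apache common/combined log lines; the trap paths are the two
// URLs given in the post.
const TRAP_PATHS = ["/Private/members.foo", "/Bots.html"];

// client IP, ident, user, [timestamp], "METHOD path PROTOCOL"
const LOG_LINE = /^(\S+) \S+ \S+ \[[^\]]+\] "(?:GET|HEAD|POST) (\S+)[^"]*"/;

// Count trap hits per client IP.
function trappedClients(logText, trapPaths = TRAP_PATHS) {
  const hits = new Map();
  for (const line of logText.split("\n")) {
    const match = LOG_LINE.exec(line);
    if (!match) continue; // skip blank or malformed lines
    const [, ip, url] = match;
    const path = url.split("?")[0]; // ignore any query string
    if (trapPaths.includes(path)) hits.set(ip, (hits.get(ip) || 0) + 1);
  }
  return hits;
}

// Old-style Apache access directives, one per client, for review before use.
function denyLines(hits) {
  return [...hits.keys()].sort().map((ip) => `Deny from ${ip}`);
}

const sample = [
  '203.0.113.5 - - [16/Feb/2007:01:02:03 -0800] "GET /Private/members.foo HTTP/1.1" 200 512 "-" "Harvester/1.0"',
  '198.51.100.7 - - [16/Feb/2007:01:02:04 -0800] "GET /index.html HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
  '203.0.113.5 - - [16/Feb/2007:01:02:05 -0800] "GET /Bots.html?x=1 HTTP/1.1" 200 900 "-" "Harvester/1.0"',
  '192.0.2.9 - - [16/Feb/2007:01:02:06 -0800] "GET /Bots.html HTTP/1.0" 200 900 "-" "-"',
].join("\n");

console.log(denyLines(trappedClients(sample)).join("\n"));
```

Humans reading a thread like this one will follow the links too, so the output is a list to review rather than something to append to .htaccess blindly.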


Junk email addresses for your favourite spam bots:
http://www.hereticpress.com/Bots.html

Lastly, when I put an email address in forms or on pages, I hide it in some JavaScript and encode the individual letters. I know JavaScript is not the best for accessibility, but you can break an email address into parts with JavaScript and encode individual letters in ASCII. The address below is not a real one.

<a href="#" tabindex="131" onclick="window.location='m'+'ail'+'to:'+'r'+'@'+'hereticpssm'; return false;" accesskey="M" class="LinkItems" title="Contact the webmaster Tim at Heretic Press Ctrl+M">Tim</a><br />
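Both tricks described here, splitting the address into string fragments that a script joins at click time and writing letters as ASCII codes, can be generated when the page is built rather than by hand. This is a minimal sketch of such helpers. toCharRefs(), chunk(), mailtoOnclick() and the someone@example.com address are hypothetical, and the address is assumed to contain no quote characters.

```javascript
// Hypothetical helpers for generating an obfuscated mailto link when the page
// is built. The address passed in is assumed to contain no quote characters.

// Write each character as a decimal HTML character reference: "a" -> "&#97;".
function toCharRefs(text) {
  return [...text].map((ch) => `&#${ch.codePointAt(0)};`).join("");
}

// Split a string into pieces of at most `size` characters.
function chunk(text, size) {
  const pieces = [];
  for (let i = 0; i < text.length; i += size) pieces.push(text.slice(i, i + size));
  return pieces;
}

// Build an onclick body that joins the pieces at click time, so the address is
// not stored as one contiguous string in the page source.
// "return false" stops the browser from also following href="#".
function mailtoOnclick(address, size = 2) {
  const quoted = chunk("mailto:" + address, size).map((piece) => `'${piece}'`);
  return `window.location=${quoted.join("+")}; return false;`;
}

console.log(toCharRefs("someone@example.com"));
console.log(`<a href="#" onclick="${mailtoOnclick("someone@example.com", 3)}">Contact</a>`);
```

The character-reference form can also go straight into an href, which works without JavaScript. However, any HTML parser decodes it, so it only deters the crudest harvesters.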

I have used the same email address for years, and I do one more thing. Sorry, Windows users: with Mac Mail you can bounce a message back to the sender, so anyone with a valid address will think you don't exist. I discovered that some spammers leave a small window during which you can bounce email back to them, but after some time their return address stops working.

That's most of what I know about preventing spam and exploits of my mail system.


Tim



On 16/02/2007, at 7:49 PM, James Crooke wrote:

but if I don't find a good alternative soon I might also be forced to use them as the spambots out there get smarter and more capable of getting around basic obstacles like form fields being named differently or checks on IPs.
This was me 6 months ago ^
 
I just had to give in to using a CAPTCHA until a proven solution comes along.
On 2/16/07, Michael MD <[EMAIL PROTECTED]> wrote:
> SilverStripe Newsletter
> > I personally get very frustrated with captchas, especially really
> > awkwardly hard-to-interpret ones. And the questions below are novel for a
> > while but wear you down after 10->20 a day!
> >
> > One reason I get frustrated with them is that there are great Bayesian
> > filters out there that just "know" if a comment is spam or not. When you
> > submit something, it asks a global webservice if the text seems human or
> > not, and it's very accurate. I only realised these existed late last year,
> > but they've been a godsend for the sites we build.
>
> I don't really think that is a good solution...
> Look at email spam and how much of it gets through spam filtering... and the
> risk of false positives is too high for my liking.
> ... we need a better way.
>
> I don't like captchas either and have so far avoided using them, but if I
> don't find a good alternative soon I might also be forced to use them as the
> spambots out there get smarter and more capable of getting around basic
> obstacles like form fields being named differently or checks on IPs. (There
> are even some spambots blatantly using real IP addresses, e.g. the RBN network.)
>
> I had to disable trackbacks on my site because the submission process for
> them is too open to spambots... (the standard process for submitting
> trackbacks is fundamentally flawed: it lacks an extra step that asks the
> client for a response to check whether the IP is real!)
>
> *******************************************************************
> List Guidelines: http://webstandardsgroup.org/mail/guidelines.cfm
> Unsubscribe: http://webstandardsgroup.org/join/unsubscribe.cfm
> Help: [EMAIL PROTECTED]
> *******************************************************************


--
James
 

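The quoted message describes filters that decide whether a comment is spam by sending its text to a remote service. The thread doesn't name that service or describe how it works, so the following is only a local toy that illustrates Bayesian filtering: a naive Bayes classifier with add-one smoothing, trained on four made-up comments. NaiveBayes, tokenize() and the training data are all hypothetical.

```javascript
// Toy naive Bayes spam scorer. It only illustrates the idea of Bayesian
// filtering; it is not the hosted web service mentioned in the thread.
// Both classes need at least one training message before scoring.

// Lower-case the text and split it into word tokens.
function tokenize(text) {
  return text.toLowerCase().match(/[a-z0-9']+/g) || [];
}

class NaiveBayes {
  constructor() {
    this.docCounts = { spam: 0, ham: 0 };   // training messages per class
    this.wordCounts = { spam: new Map(), ham: new Map() };
    this.wordTotals = { spam: 0, ham: 0 };  // tokens seen per class
    this.vocabulary = new Set();
  }

  train(text, label) {
    this.docCounts[label] += 1;
    for (const word of tokenize(text)) {
      const counts = this.wordCounts[label];
      counts.set(word, (counts.get(word) || 0) + 1);
      this.wordTotals[label] += 1;
      this.vocabulary.add(word);
    }
  }

  // log P(label) + sum of log P(word | label), with add-one (Laplace) smoothing
  logScore(text, label) {
    const totalDocs = this.docCounts.spam + this.docCounts.ham;
    const denominator = this.wordTotals[label] + this.vocabulary.size;
    let score = Math.log(this.docCounts[label] / totalDocs);
    for (const word of tokenize(text)) {
      score += Math.log(((this.wordCounts[label].get(word) || 0) + 1) / denominator);
    }
    return score;
  }

  // P(spam | text) = 1 / (1 + exp(logScore(ham) - logScore(spam)))
  spamProbability(text) {
    const spam = this.logScore(text, "spam");
    const ham = this.logScore(text, "ham");
    return 1 / (1 + Math.exp(ham - spam));
  }
}

// Made-up training comments, for illustration only.
const filter = new NaiveBayes();
filter.train("cheap pills buy now cheap viagra", "spam");
filter.train("win money now click here", "spam");
filter.train("thanks for the article on css layouts", "ham");
filter.train("great post about accessible forms and captchas", "ham");

console.log(filter.spamProbability("buy cheap pills now"));
console.log(filter.spamProbability("nice article about css forms"));
```

A site would hold comments above some probability threshold for review. Where to set it trades missed spam against the false positives Michael worries about, and a filter trained on four sentences is nowhere near the accuracy claimed for the hosted service.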

The Editor
Heretic Press
http://www.hereticpress.com
Email [EMAIL PROTECTED]



