Hey all.  A few weeks back I proposed putting some archived messages
up on a page as part of Mark Reda's "Winter Project Page".  There was
concern that spiders or spammers might slurp up email addresses and spam
the list--a very real fear.

Someone put forth a robots.txt file to keep spiders from sweeping
the web site, and someone else proposed password-protecting it.  I
think passwords are kind of prohibitive and more work to maintain and
distribute.  I would still do the robots.txt file, but I also came up
with another possible solution that might make everyone feel more
comfortable.
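
For reference, a robots.txt along the lines suggested might look roughly
like the fragment below.  The path is an assumption taken from the archive
URL, and the file only has an effect when served from the top level of the
web server.  It is also only a request: well-behaved spiders honor it, but
an address harvester is free to ignore it.

```
# Hypothetical robots.txt; the Disallow path is assumed from the archive URL
User-agent: *
Disallow: /vwbrd/a2/
```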

I wrote a Perl script to replace all email addresses in a web page
(.html file) with an empty string "". Then I ran it on all the pages in
the list archive I've been playing with.  All the links still work, but
there are no email addresses anywhere in the files that a spammer could
possibly grab.
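
The approach described above can be sketched roughly as follows.  This is
a Python illustration, not the Perl script itself, which is not shown in
this message.  The pattern EMAIL_RE and the names strip_emails and
strip_file are assumptions made for the example, and the pattern is a
simplified approximation of an email address rather than a full parser.

```python
import re
import sys
from pathlib import Path

# Assumed, simplified pattern for an email address: a local part, "@",
# and a domain with at least one dot.  Not a complete RFC 5322 matcher.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def strip_emails(html: str) -> str:
    """Return html with every match of EMAIL_RE replaced by an empty string."""
    return EMAIL_RE.sub("", html)


def strip_file(path: Path) -> None:
    """Rewrite one .html file in place with the addresses removed."""
    text = path.read_text(encoding="latin-1")
    path.write_text(strip_emails(text), encoding="latin-1")


if __name__ == "__main__":
    # Hypothetical usage: python strip_emails.py archive_directory
    for page in Path(sys.argv[1]).glob("*.html"):
        strip_file(page)
```

Because only the address text is removed, links between archive pages are
left alone, while a mailto: link keeps its tag but loses its target.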

The downside is that someone looking at the archive will not be able to
email the original poster directly.  The other downside is that people
whose From: field has no real name (seems to only be AOL addresses so
far) were listed by email address alone, so their name gets wiped off
the index pages.

Check out what I mean:
http://members.home.net/vwbrd/a2/

Thoughts?

Brian
1989 Jetta GLi Wolfsburg 16v
-- 
The 21st century begins on January 1, 2001.
_____________
List Sponsor: http://www.netsville.com
To remove yourself from this list, send mail to [email protected] with 
'unsubscribe a2_16v' in the body of your message
See us on the web at http://www.a2-16v.com
Visit the 16V Homepage at http://www.gti16v.org
