On Thu, Mar 11, 2004 at 10:29:24AM -0700, Monique Y. Herman ([EMAIL PROTECTED]) wrote:
> Hi all!
>
> I've recently developed an interest in preventing spiders from accessing
> certain areas of my site ... but as near as I can tell, robots.txt is
> pretty stupid. It only lets you *disallow*, whereas it would be a lot
> more sensible for me to specify what I want to *allow*.
>
> I was thinking I might hack up a little script to generate a robots.txt
> file that disallows everything except the files I've listed, but first,
> has anyone already done this or seen this done? I'd hate to reinvent
> the wheel =)
>
> In my ideal world, robots.txt wouldn't require you to call out all of
> the "hidden" directories on your site ... *sigh* ...
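The script Monique describes (turn an allow-list into a Disallow-only robots.txt) could look roughly like the sketch below. It is a hypothetical illustration, not an existing tool: `generate_robots_txt` and its arguments are invented here, and it assumes you already have a list of every URL path the site serves. Because the original robots.txt convention only offers `Disallow` lines, which match by path prefix, the sketch blocks a whole directory when nothing inside it is allowed and falls back to listing individual files otherwise.

```python
# Hypothetical sketch: emit a robots.txt that blocks every path on a site
# except an explicit allow-list, using only "Disallow" lines.
from pathlib import PurePosixPath


def generate_robots_txt(site_files, allowed, user_agent="*"):
    """site_files: every URL path the site serves (assumed known in advance),
    e.g. "/docs/a.html". allowed: the subset robots may fetch."""
    allowed = set(allowed)
    disallow = []

    def visit(directory, files):
        # Group the files under `directory` by their first path component.
        children = {}
        for f in files:
            first = PurePosixPath(f).relative_to(directory).parts[0]
            children.setdefault(first, []).append(f)
        for name, under in sorted(children.items()):
            child = str(PurePosixPath(directory) / name)
            if under == [child]:
                # A plain file: block it unless it is on the allow-list.
                if child not in allowed:
                    disallow.append(child)
            elif not allowed.intersection(under):
                # Nothing below this directory is allowed, so one prefix
                # line (with a trailing slash) covers all of it.
                disallow.append(child + "/")
            else:
                # Mixed directory: descend and decide entry by entry.
                visit(child, under)

    visit("/", site_files)
    lines = [f"User-agent: {user_agent}"] + [f"Disallow: {p}" for p in disallow]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    site = ["/index.html", "/about.html", "/private/a.html",
            "/private/b.html", "/pub/x.html", "/pub/drafts/y.html"]
    print(generate_robots_txt(site, ["/index.html", "/pub/x.html"]), end="")
```

For the example site this prints a `User-agent: *` line followed by `Disallow: /about.html`, `Disallow: /private/` and `Disallow: /pub/drafts/`. Keep in mind that prefix matching means `Disallow: /about.html` also blocks something like `/about.html.bak`, and that, as the reply below points out, the file only restrains robots that choose to honour it.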
RTFM WRT htaccess.

As a note: I found recently that a site I wanted to reference was no longer online (actually, I'd known this for a while), but *also* had a robots.txt prohibiting access. Which meant that the Internet Archive (http://www.archive.org/) didn't provide access to old views of the site.

I asked the site owner if he'd modify the robots.txt to allow for display. Well, it turns out IA says that having a robots.txt will remove the site from the archive... But we ran with it anyway, and he removed robots.txt from the site. IA had the full archived site online the same day.

Short form of the story: don't trust robots.txt for keeping people out of your site. At best, it will restrict well-behaved robots from trawling potentially large parts of your site (with associated bandwidth costs). It's not going to assure that _no_ robots crawl, or that they don't keep copies.

For that, you need access control. And people you trust with access. There's an old saying about how three people can keep a secret that I'd tell you here, but both the other people who knew it are dead.

Peace.

-- 
Karsten M. Self <[EMAIL PROTECTED]>        http://kmself.home.netcom.com/
 What Part of "Gestalt" don't you understand?
   Americans [...] need to watch what they say.
     -- Ari Fleischer, White House Press Secretary
        http://www.whitehouse.gov/news/releases/2001/09/20010926-5.html