As the original robots.txt standard is 7 years old, I agree it is time for a review, and maybe for extending the functionality of the robots.txt file.

Maybe something along the lines of the following additions:

Version: [number]
    1 - original robots.txt specification syntax
    2 - future extended robots.txt specification
    blank - falls back to version
  Suggested in case, for example, Version 2 extends the already existing Allow: line.

Interval: [number] [interval]
    [number] - numerical value
    [interval] - h[ours], d[ays], w[eeks], m[onths], and y[ears].
  As suggested by Fred Atkinson.

AllowTypes: [mimetypes]
  A list of MIME types the crawler is allowed to retrieve. We are no longer indexing only text/html pages, but also PDF, MS Word, etc.
  e.g.
    # Allow only the following document types
    AllowTypes: text/html, text/xml, text/plain, image/jpeg

BlockTypes: [mimetypes]
  A list of MIME types the crawler is not allowed to retrieve.
  e.g.
    # Do not index PDF files and MS Word files. All others are
    # allowed.
    BlockTypes: application/pdf, application/msword

AllowExtension: [extensionlist]
  A list of filename extensions we should include (maybe MIME types have not been configured correctly on the server, or we use .exe files to display HTML pages with CGI, etc.)
  e.g.
    # Allow the following extensions to be indexed.
    AllowExtension: .html, .php, .pl

BlockExtension: [extensionlist]
  A list of filename extensions we should exclude.

The possibilities are endless, but each would give site owners better control over how their sites are indexed.

/PT

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Fred Atkinson
Sent: 10 January 2004 17:52
To: Robots
Subject: [Robots] Robots.txt Evolution?

Hi,

I've just subscribed to the robots.txt list. I read the messages posted through December.

My question is whether there is going to be any evolution of the robots.txt coding. It is very limited at present. I can think of a few things they could incorporate that would make it better, and I'm sure the rest of you could, too.

I've had robots.txt files on my sites for years.
When I recently researched to see what had changed, it didn't appear that anything new was on the horizon.

I've got two robots completely blocked out of my system. One is Scooter, which is Alta Vista's robot. When I initially put it in as disallowed, it was because Scooter was hitting my site several times a day. I don't mind them scanning me once in a while to get listings for search engines, but I do object to them hammering my site that frequently.

Shouldn't there be coding to tell either a particular robot or a group of robots how often they are allowed to scan my site? Maybe a line like (and this is arbitrary):

    User-agent: Scooter
    Interval: 30d
    Disallow: /whatever you want them not to scan when they do come in.

This would tell Scooter that he is not to scan again until thirty days after the last scan. There could be codes like h[ours], d[ays], w[eeks], m[onths], and y[ears].

Just an idea I had. Feedback?

Fred

_______________________________________________
Robots mailing list
[EMAIL PROTECTED]
http://www.mccmedia.com/mailman/listinfo/robots
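
To make the Interval and AllowTypes/BlockTypes ideas from this thread a little more concrete, here is a rough Python sketch of how a crawler might interpret them. None of these directives exist in the robots.txt standard; the function names (parse_interval, parse_type_list, type_allowed) are made up for illustration, and the lengths used for m[onths] and y[ears] (30 and 365 days) are assumptions, since the thread does not define them. Letting BlockTypes take precedence over AllowTypes is also just one possible choice.

```python
import re
from datetime import timedelta

# Hours per unit letter. The month and year lengths are assumptions;
# the proposal in the thread does not say how long they are.
_UNIT_HOURS = {"h": 1, "d": 24, "w": 24 * 7, "m": 24 * 30, "y": 24 * 365}


def parse_interval(value):
    """Parse a hypothetical 'Interval:' value such as '30d' or '2 w'."""
    match = re.fullmatch(r"\s*(\d+)\s*([hdwmy])\s*", value.lower())
    if match is None:
        raise ValueError(f"unrecognised interval: {value!r}")
    number, unit = match.groups()
    return timedelta(hours=int(number) * _UNIT_HOURS[unit])


def parse_type_list(value):
    """Split a comma-separated 'AllowTypes:' or 'BlockTypes:' value into a set."""
    return {item.strip().lower() for item in value.split(",") if item.strip()}


def type_allowed(mime_type, allow_types=None, block_types=None):
    """Decide whether a MIME type may be fetched under the hypothetical lists.

    A blocked type is always refused. If an allow list is given, only the
    types it names are accepted. With neither list, everything is accepted.
    """
    mime_type = mime_type.strip().lower()
    if block_types and mime_type in block_types:
        return False
    if allow_types:
        return mime_type in allow_types
    return True
```

For example, parse_interval("30d") gives a 30-day timedelta that a crawler could add to the time of its last visit before scanning again, and type_allowed("application/pdf", block_types=parse_type_list("application/pdf, application/msword")) returns False.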