On Fri, Feb 11, 2005 at 07:21:00PM -0000, Aengus wrote:
> On Friday, February 11, 2005 7:14 PM [GMT],
> Ken Schweigert <[EMAIL PROTECTED]> wrote:
> 
> > Or ... to regenerate it at your convenience:
> > 
> > 
> > [EMAIL PROTECTED] tmp]$ wget http://www.robotstxt.org/wc/active/all.txt
> > [EMAIL PROTECTED] tmp]$ grep "robot-name:" all.txt | awk -F: '{print $2}' |
> > sed 's/^ *//g' | sort | awk '{print "ROBOTINCLUDE \"" $1 "*\""}' 
> 
> grep "robot-name:" or grep "robot-useragent:"?
> 

I used robot-name because there were entries for robot-useragent that
had stuff like:

robot-useragent:                Due to a deficiency in Java it's not currently possible to set the User-Agent.
robot-useragent:None
robot-useragent: no
robot-useragent:

Entries like these messed up the list, whereas using robot-name
produces a list more like Jeremy's.  Maybe he can chime in and let us
know the correct way.
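
If robot-useragent turns out to be the right field after all, the junk
values shown above could probably be filtered out rather than avoided.
What follows is only a rough sketch under a few assumptions, not a
tested recipe: the function name robot_includes is made up for
illustration, the "more than 6 words means prose" cutoff is an
arbitrary guess, and it keeps the whole value instead of just the first
word, which may or may not be what ROBOTINCLUDE wants. It strips the
field name with sed instead of splitting on ":" so that values
containing a colon (a URL, say) are not cut short.

```shell
# Hypothetical helper: read records in the all.txt format on stdin and
# print one ROBOTINCLUDE line per usable robot-useragent value.
robot_includes() {
    grep '^robot-useragent:' |
    # Drop the field name plus leading and trailing whitespace.
    sed -e 's/^robot-useragent:[[:space:]]*//' -e 's/[[:space:]]*$//' |
    awk '
        { v = tolower($0) }
        # Skip the placeholder values: empty, "no" and "None".
        $0 == "" || v == "no" || v == "none" { next }
        # Assumption: anything longer than 6 words is a comment in
        # prose, not a real User-Agent string.
        NF > 6 { next }
        { print "ROBOTINCLUDE \"" $0 "*\"" }
    ' |
    sort -u
}

# Example: robot_includes < all.txt
```

The word-count cutoff is the weakest part; a real User-Agent with many
words would be dropped, so the output still needs a look by eye.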

-- 
Ken Schweigert, Network Administrator
Byte Productions, LLC
http://www.byte-productions.com
