I still cannot index GeoCities pages. Here's what they put in their
robots.txt file:
# htdig knows where to go.
User-agent: htdig/3.1.0b1
Disallow: /admin/ # all paths except neighborhoods and members section are disallowed
Disallow: /auditor/
Disallow: /cgi_emails/
Disallow: /cgi_html/
Disallow: /cgi-bin/
Disallow: /chat/
Disallow: /classes/
Disallow: /companies/
Disallow: /dbm_files/
Disallow: /demos/
Disallow: /error_messages/
Disallow: /errors/
Disallow: /features/
Disallow: /GeoPartners/
Disallow: /geoplus/
Disallow: /geoshops/
Disallow: /geostore/
Disallow: /geoworld/
Disallow:/GreetingCards/
Disallow: /guide/
Disallow: /homestead/
Disallow: /hoodpages/
Disallow: /htmlfrag/
Disallow: /images/
Disallow: /include/
Disallow: /index.html
Disallow: /java/
Disallow: /join/
Disallow: /LunarAwards/
Disallow: /main/
Disallow: /marketplace/
Disallow: /mediakit/
Disallow: /pictures/
Disallow: /portfolio/
Disallow: /ProgrammersPavilion/
Disallow: /pv/
Disallow: /realmedia/
Disallow: /search/
Disallow: /server-errors/
Disallow: /thread-images/
Disallow: geobook.html
I'm not too familiar with how this is all supposed to work, but it appears
this doesn't cut it. I'm trying to index various neighborhoods at the request
of the folks running those neighborhoods, in case you were wondering.
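
One way to sanity-check rules like these is to apply the plain prefix
matching from the original robots.txt convention by hand. What follows is a
rough Python sketch under stated assumptions, not what htdig itself does: the
function names and the /SiliconValley/ test path are made up for illustration,
only a few of the Disallow lines are copied in, and User-agent grouping is
ignored entirely.

```python
def disallowed_prefixes(robots_txt):
    """Collect the Disallow path prefixes from robots.txt text."""
    prefixes = []
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop trailing comments
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "disallow":
            value = value.strip()
            if value:                            # an empty Disallow blocks nothing
                prefixes.append(value)
    return prefixes


def is_blocked(path, prefixes):
    """A path counts as blocked if it starts with any Disallow prefix."""
    return any(path.startswith(p) for p in prefixes)


# A few lines copied from the file quoted above (not the full list).
SAMPLE = """\
User-agent: htdig/3.1.0b1
Disallow: /admin/ # all paths except neighborhoods and members
Disallow: /cgi-bin/
Disallow:/GreetingCards/
Disallow: /index.html
Disallow: geobook.html
"""

prefixes = disallowed_prefixes(SAMPLE)

# /SiliconValley/ is only an illustrative neighborhood-style path.
for path in ("/admin/", "/index.html", "/SiliconValley/", "/geobook.html"):
    print(path, "blocked" if is_blocked(path, prefixes) else "allowed")
```

Under simple prefix matching, a neighborhood-style path is not covered by any
of the Disallow lines, so the rules by themselves would not seem to explain
the failure. Note also that the last rule has no leading slash, so it never
matches a path that starts with "/".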