On Thursday, June 19, 2003, at 09:59 AM, Dan Muey wrote:

1)
If I have some .cgi scripts that output HTML, will htdig index those too?

Yes. Where the HTML comes from and how it is generated doesn't matter. If the server returns it as part of a valid HTTP response with a text/html Content-Type, all should be good.
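As a rough illustration, here is a hypothetical CGI script (not something shipped with htdig). A CGI program only has to print a Content-Type header, a blank line and then the HTML. The web server wraps that in an ordinary HTTP response, so htdig receives it exactly as it would a static page. Since htdig chooses its parser based on the Content-Type it is given, text/html is what gets the output parsed as HTML.

```python
import sys


def cgi_response(html: str) -> str:
    """Build a minimal CGI response: header line(s), a blank line, then the body."""
    # The web server turns this into a normal HTTP response; the
    # Content-Type header tells the client (a browser or htdig) what
    # kind of document the body is.
    return "Content-Type: text/html\n\n" + html


if __name__ == "__main__":
    page = "<html><head><title>Catalog</title></head><body><p>Hello</p></body></html>"
    sys.stdout.write(cgi_response(page))
```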


2)
If so, will it index URLs that have query strings if I give them to it in a .url file?

Yes. In general, the URLs you provide are simply passed to the server as part of an HTTP request, just as your browser would pass them.


3)
Do the URLs I put in a .url file for rundig to index need to be encoded or unencoded?

Encoded, just as if you were using them for a link in an HTML file. For example, spaces and other reserved characters in the query string must be percent-encoded.
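A minimal sketch of producing such encoded URLs with Python's standard urllib.parse.urlencode. The host, script name and parameters are made up for illustration. Each key and value is percent-encoded (a space becomes +, and a literal & inside a value becomes %26) before the pairs are joined with &, which is the form a browser would send to the server.

```python
from urllib.parse import urlencode


def build_start_url(base: str, params: dict) -> str:
    """Append an encoded query string to a (hypothetical) CGI URL."""
    # urlencode percent-encodes every key and value, then joins the
    # pairs with '&'; dict insertion order is preserved.
    return base + "?" + urlencode(params)


if __name__ == "__main__":
    url = build_start_url("http://www.example.com/catalog.cgi",
                          {"q": "red shoes", "cat": "a&b"})
    print(url)
    # http://www.example.com/catalog.cgi?q=red+shoes&cat=a%26b
```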


4)
Will it follow links in html it is indexing that have query strings?

Yes, assuming of course that the link is not explicitly excluded by an attribute setting such as exclude_urls or bad_querystr.
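To show how such settings can keep a link from being followed, here is a rough model of substring-based filtering in the spirit of exclude_urls (patterns matched anywhere in the URL) and bad_querystr (patterns matched in the query string). This is an assumption-level illustration, not htdig's code. The function name and the pattern values are hypothetical, and htdig's actual matching rules may differ between versions.

```python
from urllib.parse import urlsplit


def should_follow(url, exclude_patterns, bad_querystr):
    """Rough, hypothetical model of exclude_urls / bad_querystr filtering."""
    # exclude_urls-style check: reject if any pattern appears anywhere in the URL.
    if any(pattern in url for pattern in exclude_patterns):
        return False
    # bad_querystr-style check: reject if any pattern appears in the query string.
    query = urlsplit(url).query
    if any(pattern in query for pattern in bad_querystr):
        return False
    return True


if __name__ == "__main__":
    excludes = ["/private/"]   # hypothetical exclude_urls values
    bad_queries = ["sort="]    # hypothetical bad_querystr values
    for link in ("http://www.example.com/list.cgi?page=2",
                 "http://www.example.com/list.cgi?page=2&sort=price",
                 "http://www.example.com/private/list.cgi?page=2"):
        print(should_follow(link, excludes, bad_queries), link)
```

With these made-up settings, only the first link would be followed.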


5)
If my main robots.txt file says to not scan /member_area since it's a top secret

I assume you are being facetious? At least I hope so ;) Honoring robots.txt is voluntary: the robot's author must choose to implement support for it, and a robot that doesn't is free to fetch and do whatever it wants with the content. And of course, if the content is even vaguely secret, it should not be publicly accessible in the first place.


area and I set up a separate htdig database for /member_area with the authentication
in the config file, will htdig, when trying to index /member_area, see the robots.txt file
in its parent directory and refuse to index it?

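I am not sure that I understand the question. htdig adheres to the robots specification. If you write and place the file according to the specification, htdig should behave accordingly. Note that the specification only recognizes a single /robots.txt at the root of the server, so a "Disallow: /member_area" line there applies to every URL whose path begins with /member_area.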


If 'Yes', how can I forbid a directory in the robots.txt file and still have a separate database for that directory?

You might want to take a look at the robotstxt_name attribute. It lets you change the name htdig looks for in the User-agent lines of robots.txt. You could set an alternate name in the config file used for /member_area, and then add a record to your robots.txt file that allows robots with that name. See http://www.htdig.org/attrs.html#robotstxt_name for more information on this attribute.
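To illustrate, suppose the /member_area config file sets "robotstxt_name: memberdig" (memberdig is a made-up name; htdig's default is htdig). The robots.txt below gives that name an empty Disallow, which means full access, while every other robot is kept out of /member_area. Python's urllib.robotparser is used here only as a stand-in for a spec-following robot. It is not htdig's own parser, and htdig's User-agent matching may differ in detail.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the alternate name "memberdig" may fetch
# everything (empty Disallow); all other robots, including htdig
# under its default name, are kept out of /member_area.
ROBOTS_TXT = """\
User-agent: memberdig
Disallow:

User-agent: *
Disallow: /member_area
"""


def allowed(agent: str, url: str) -> bool:
    """Return whether a robot calling itself `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    url = "http://www.example.com/member_area/index.html"
    print("htdig:", allowed("htdig", url))          # htdig: False
    print("memberdig:", allowed("memberdig", url))  # memberdig: True
```

Putting the specific record before the "User-agent: *" record is the safer ordering, since a robot may simply obey the first record that matches its name.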


Jim



_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
