> I'm new to HtDig and this list, but one crude approach could be to
> temporarily take the site down, remove .htaccess or whatever
> authentication mechanism is in place, and just index everything.
>
> Another crude approach might be to index temporary copies of these files
> located somewhere else, then fiddle the URLs.
>
> Maybe a cron job could do all this automatically for you.
>
> I don't know how feasible this is in your case. Maybe HtDig can do
> something much more sophisticated.
>
> Passing usernames and passwords around the net in plain text isn't a
> great solution.
>
>
> On Sat, 2004-04-03 at 14:34, Terry Allen wrote:
> > Hi all on the list,
> > I have been running HTDig for a few years now, with never a
> > problem - however, I do have one question which has been asked by a
> > client & I'd like to do it if possible.
> > Currently, I'm running HTDig 3.16 & the client needs to index
> > a password-protected site - is there any way I can do this?
> > Suggestions have been made to enter the start_url in this
> > format: http://username:[EMAIL PROTECTED]/
> >
> > All this seems to do is return an error relating to htmerge -
> > can anyone tell me what I've done wrong? I have searched the FAQ &
> > also done a general Google search.
The approach of temporary copies has potential. You could copy the
password-protected pages into a location served under a URL that is not
linked from any Web page. When the site is to be indexed, refresh those
copies with commands that operate on the files rather than on the Web
pages, then run ht://Dig on them using that special URL. Since that URL
is not itself used in any Web pages, the pages it leads to won't be
indexed or found otherwise on the Web.

If you don't want users of the systems which have access to the files to
be able to read them, you could use file protections to bar access and
give the ht://Dig processes access through user or group permissions.
This could all be done with a cron job under Unix.

To apply the username and password from a cron job, they would have to
be in plain text in a file somewhere, and that file would have to be
updated every time the real password was changed.

I don't know whether the username and password would be passed around
the net in use. ht://Dig will have to pass them to the Web server on
access to the protected pages, but is that a single operation, or does
it have to pass the username and password for every Web page in a
protected tree?

Douglas

========
Douglas Kline
[EMAIL PROTECTED]

_______________________________________________
ht://Dig general mailing list: <[EMAIL PROTECTED]>
ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-general
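[Editor's note] On the original question of indexing the protected site
directly: ht://Dig's configuration documentation describes an
`authorization` attribute that sends a username and password to the Web
server using HTTP Basic authentication, which may be simpler than the
user:password form of start_url. The values below are placeholders, and
support should be checked against the attribute list for your particular
version before relying on it:

```
# htdig.conf fragment -- placeholder values, verify the `authorization`
# attribute against your ht://Dig version's documented attribute list
start_url:      http://www.example.com/protected/
authorization:  username:password
```

Note that this stores the credentials in plain text in the config file,
so the same file-permission cautions discussed above apply.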
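[Editor's sketch] The copy-refresh-index cycle described above could look
roughly like the script below. This is a minimal illustration, not a tested
recipe: the paths are placeholders, a throwaway demo tree stands in for the
real protected document root, and rundig (ht://Dig's wrapper around htdig
and htmerge) is only invoked if it is actually installed.

```shell
#!/bin/sh
# Sketch of the cron-job approach: refresh a staging copy of the
# protected tree, lock down file permissions, re-index the copy.
# All paths and the config file name are assumptions for illustration.

SRC=$(mktemp -d)      # stands in for the password-protected tree
STAGE=$(mktemp -d)    # staging copy, served at an unadvertised URL

printf '<html>secret page</html>\n' > "$SRC/index.html"

# Refresh the staging copy using file commands, not Web requests.
cp -R "$SRC/." "$STAGE/"

# File-level protection: remove all world access; the group the
# ht://Dig processes run under keeps read/traverse rights.
chmod -R o-rwx,g+rX "$STAGE"

# Re-index with a config whose start_url is the unlinked staging URL,
# e.g.  start_url: http://www.example.com/htdig-staging/
# Guarded so the sketch degrades gracefully where ht://Dig is absent.
if command -v rundig >/dev/null 2>&1; then
    rundig -c /etc/htdig/staging.conf
fi
```

Installed from cron (e.g. nightly), this keeps the indexed copies current
without ever sending credentials over the network.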

