Richard,

I have never done it myself, but I believe it is fairly easy to modify an Apache listings page. By inserting a robots meta tag with the content "noindex,follow", you instruct htdig to follow the hyperlinks but not to index the page itself, which is exactly what you want.
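As a rough illustration of the tag itself (not of how Apache generates its listings), the Python sketch below inserts a robots meta tag into the head of an HTML page. The function name `add_robots_meta` and the sample page are made up for illustration, and how something like this would be hooked into your listing pages depends on your setup.

```python
import re

# The robots meta tag discussed above: do not index this page, but follow its links.
ROBOTS_META = '<meta name="robots" content="noindex,follow">'


def add_robots_meta(html: str) -> str:
    """Return html with a robots meta tag inserted just after the opening <head> tag."""
    # Leave the page alone if it already carries a robots meta tag.
    if re.search(r'<meta\s+name=["\']robots["\']', html, flags=re.IGNORECASE):
        return html
    # Match <head> or <head ...attributes>, but not <header>.
    new_html, count = re.subn(
        r'(<head(?:\s[^>]*)?>)',
        lambda m: m.group(1) + "\n  " + ROBOTS_META,
        html,
        count=1,
        flags=re.IGNORECASE,
    )
    if count == 0:
        raise ValueError("no <head> element found")
    return new_html


if __name__ == "__main__":
    # Hypothetical listing page, for demonstration only.
    page = "<html><head><title>Index of /docs</title></head><body></body></html>"
    print(add_robots_meta(page))
```

The function is idempotent: running it over a page that already has a robots meta tag returns the page unchanged.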
For (ii), there is little reason why such a script should look any different to the user than an htdig search does. All of the HTML code for the results pages is available in your installation to copy into the new script, and 'triggering' the script can be as simple as an additional button within your existing search form. That way you get to use the same 'restrict' and 'exclude' drop-downs that you already use.

Mike

PS One of these days I will permanently remember the way that this list works - my reply was meant to go to the list, not just to yourself.

> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf
> Of Richard Guthrie
> Sent: Monday, January 16, 2006 2:31 PM
> To: [email protected]
> Subject: RE: [htdig] Two queries
>
> Mike,
>
> Thanks a lot for getting back to me and providing some useful info
> here. You are right in assuming I am working on an intranet site.
>
> For (i), there doesn't seem to be an easy way of inserting robots tags
> as I think these are the automatically generated files by Apache to
> give the directory structure that htdig can follow. If I stop htdig
> looking at the files, will I also lose the indexing for the directories
> they represent? Is there a way of telling Apache to enter a relevant
> code in the file as it generated to get htdig to not display it as part
> of the results?
>
> With regard to (ii), I think I may pursue the options you suggest with
> a non-htdig solution. However, giving users just one interface to check
> for files may be an advantage, which is why I think I would still like
> to try an htdig approach if it were easily possible. Unfortunately too
> few documents would have a consistent string that could be used to
> search for.
>
> Cheers,
> Richard
>
> At 12:37 2006-01-16 +0000, you wrote:
> >Richard,
> >I think that both of these are solvable.
> >(i) If you have access to modify these indexes, then try adding a
> ><robots noindex,follow> tag (check the syntax, I'm sure that's not
> >correct) or try adding <!-- htdig no-index --> tags, provided you have
> >links elsewhere.
> >I believe that such indexes are served as a redirect to:
> >whatever/folder/index.html in which case you should be able to set up
> >an exclude for them, again assuming that you don't need to index their
> >links.
> >
> >(ii) The first thing that comes to mind is to implement this without
> >htdig. PERL, PHP or Java would all be able to produce what is
> >effectively just a directory listing, and if carefully written should be
> >more efficient than an htdig search.
> >Second idea would be to try and find a common term, such as your site
> >name, that appears on every page. If this is a public site then that
> >should normally be added to the bad-words list, but this sounds like an
> >Intranet?
> >
> >
> >Good luck,
> >Mike
> >
>
> -------------------------------------------------------
> This SF.net email is sponsored by: Splunk Inc. Do you grep through log files
> for problems? Stop! Download the new AJAX search engine that makes
> searching your log files as easy as surfing the web. DOWNLOAD SPLUNK!
> http://ads.osdn.com/?ad_id=7637&alloc_id=16865&op=click
> _______________________________________________
> ht://Dig general mailing list: <[email protected]>
> ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
> List information (subscribe/unsubscribe, etc.)
> https://lists.sourceforge.net/lists/listinfo/htdig-general

