Richard,
I have never done it myself, but I believe it is fairly easy to modify
an Apache listings page.
By inserting a robots meta tag with the content "noindex,follow" you are
instructing htdig to follow the hyperlinks but not to record the page
itself, which is exactly what you want.
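
To show what such a page would look like, here is a minimal Python sketch that builds a listing with the tag in its head. It is an illustration only, not anything taken from Apache or htdig: the names render_listing and ROBOTS_META are made up for this example, and it assumes you are free to serve a generated page in place of the automatic listing.

```python
import html
import os
from urllib.parse import quote

# Asks robots such as htdig not to index this page, but to follow its links.
ROBOTS_META = '<meta name="robots" content="noindex,follow">'


def render_listing(directory, title):
    """Build an HTML listing of `directory` that carries the robots meta tag."""
    items = []
    for name in sorted(os.listdir(directory)):
        # Give subdirectories a trailing slash, as Apache listings do.
        if os.path.isdir(os.path.join(directory, name)):
            name += "/"
        href = html.escape(quote(name), quote=True)
        items.append('<li><a href="%s">%s</a></li>' % (href, html.escape(name)))
    return "\n".join([
        "<html><head>",
        "<title>%s</title>" % html.escape(title),
        ROBOTS_META,
        "</head><body><ul>",
    ] + items + ["</ul></body></html>"])
```

Calling render_listing(".", "Index of /docs") returns the page as a string; the only part that matters to htdig is the meta line inside the head.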

For (ii), there is little reason why such a script should look any
different to the user than doing an htdig search. All of the html code
for results pages is available to you in your installation to copy into
the new script, and 'triggering' the script can be as simple as an
additional button within your existing search form - that way you get to
use the same 'restrict' and 'exclude' drop-downs that you already use.
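
The core of such a script could be as small as the Python sketch below. It is only a rough outline under stated assumptions: list_files is a made-up name, and the restrict and exclude values are treated as plain substrings of the relative path, which is a simplification and not necessarily how htdig itself matches them.

```python
import os


def list_files(root, restrict="", exclude=""):
    """Return sorted relative paths under `root`, filtered by two substrings."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for filename in filenames:
            rel = os.path.relpath(os.path.join(dirpath, filename), root)
            rel = rel.replace(os.sep, "/")  # use URL-style separators
            if restrict and restrict not in rel:
                continue  # outside the part of the tree the user asked for
            if exclude and exclude in rel:
                continue  # explicitly excluded by the user
            matches.append(rel)
    return sorted(matches)
```

The returned paths would then be wrapped in the same results-page html that your htdig installation already uses.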

Mike
PS One of these days I will permanently remember the way that this list
works - my reply was meant to go to the list, not just to yourself.

> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf 
> Of Richard Guthrie
> Sent: Monday, January 16, 2006 2:31 PM
> To: [email protected]
> Subject: RE: [htdig] Two queries
> 
> Mike,
> 
> Thanks a lot for getting back to me and providing some useful info 
> here.  You are right in assuming I am working on an intranet site.
> 
> For (i), there doesn't seem to be an easy way of inserting robots tags,
> as I think these are the files automatically generated by Apache to give
> the directory structure that htdig can follow.  If I stop htdig looking
> at the files, will I also lose the indexing for the directories they
> represent?  Is there a way of telling Apache to enter a relevant code in
> the file as it is generated, to get htdig to not display it as part of
> the results?
> 
> With regard to (ii), I think I may pursue the options you suggest with a
> non-htdig solution.  However, giving users just one interface to check
> for files may be an advantage, which is why I think I would still like
> to try an htdig approach if it were easily possible.  Unfortunately too
> few documents would have a consistent string that could be used to
> search for.
> 
> Cheers,
> Richard
> 
> At 12:37 2006-01-16 +0000, you wrote:
> >Richard,
> >I think that both of these are solvable.
> >(i) If you have access to modify these indexes, then try adding a
> ><robots noindex,follow> tag (check the syntax, I'm sure that's not
> >correct)
> >or try adding <!-- htdig no-index --> tags, provided you have links
> >elsewhere.
> >I believe that such indexes are served as a redirect to:
> >whatever/folder/index.html  in which case you should be able to set up
> >an exclude for them, again assuming that you don't need to index their
> >links.
> >
> >(ii) The first thing that comes to mind is to implement this without
> >htdig. PERL, PHP or Java would all be able to produce what is
> >effectively just a directory listing, and if carefully written should be
> >more efficient than an htdig search.
> >Second idea would be to try and find a common term, such as your site
> >name, that appears on every page. If this is a public site then that
> >should normally be added to the bad-words list, but this sounds like an
> >Intranet?
> >
> >
> >Good luck,
> >Mike
> 
> 
> 
> -------------------------------------------------------
> This SF.net email is sponsored by: Splunk Inc. Do you grep 
> through log files
> for problems?  Stop!  Download the new AJAX search engine that makes
> searching your log files as easy as surfing the  web.  
> DOWNLOAD SPLUNK!
> http://ads.osdn.com/?ad_id=7637&alloc_id=16865&op=click
> _______________________________________________
> ht://Dig general mailing list: <[email protected]>
> ht://Dig FAQ: http://htdig.sourceforge.net/FAQ.html
> List information (subscribe/unsubscribe, etc.)
> https://lists.sourceforge.net/lists/listinfo/htdig-general
> 

