Ah, OK, that's different. I once had a somewhat similar problem with a big customer. Most of the files in his directory structure (several thousand of them) were not referred to by any other file/href. I wrote a little shell script that reads all directories recursively and builds one big index.html file with nothing but empty links and no further content whatsoever:

<html><body>
<a href=/dir1/dir2/file1.html></a>
<a href=/dir3/file2.html></a>
<a href=/dir4/dir5/dir6/file3.html></a>
</body></html>

Basically the script pipes the result of a find operation through a simple regex to rewrite the files found into URLs. Then I configured htdig to use this file as the start point for indexing. The index file itself does not show up in any htsearch results, since it has no content, but htdig _does_ index every file referred to. The script needs to be run again right before every htdig run, of course.
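For what it's worth, a minimal sketch of such a script could look like the one below. The DOCROOT path, the output location and the "exclude index.html" rule are placeholders for illustration, not the exact script I used, and it assumes file names without spaces or HTML-special characters:

#!/bin/sh
# Sketch: build one big index.html full of empty links so htdig has a
# start page that refers to every file in the exported tree.
# DOCROOT and OUT are assumptions; adjust them to the actual export.
DOCROOT=/mnt/novell/docs
OUT="$DOCROOT/index.html"

{
  echo '<html><body>'
  # List every regular file, strip the docroot prefix so the href becomes
  # a server-relative URL, and wrap each path in an empty anchor.
  find "$DOCROOT" -type f ! -name index.html |
    sed -e "s|^$DOCROOT||" -e 's|.*|<a href="&"></a>|'
  echo '</body></html>'
} > "$OUT"

In htdig.conf the start_url attribute would then point at the generated file (something like start_url: http://yourserver/index.html, with yourserver as a placeholder), and the script can be run from cron right before each rundig.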
Hope it helps.
Marcel


--On Tuesday, 18 June 2002 17:26 +0200 "Albl, Thomas" <[EMAIL PROTECTED]> wrote:

> Hi Marcel,
>
> In short, I want the dig to crawl over a mounted Novell share (our
> filesystem for the docs) but to exclude the pages that Apache
> auto-generates for the directory listings (the "Index of" pages with all
> the files and dirs inside). The Novell share reaches the dig via an
> Apache webserver which has its document root at the start of the
> exported Novell volume.
>
> But when the dig crawls through the directory tree it treats the
> auto-generated "Index of" pages as "real" documents, with the files and
> dirs as match words. For now we don't want to index the dirs, only the
> files :-| Seems tricky to solve, I think... If that doesn't work, we
> would like to exclude the dir names and file names shown and only index
> all files by full text...
>
> --
> With kind regards
> Thomas Albl
> Deutscher Städtetag
> Tel. : 0221/3771-210
> FAX  : 0221/3771-128
> eMail: mailto:[EMAIL PROTECTED]
> Web  : http://www.staedtetag.de
>
>
>> -----Original Message-----
>> From: Marcel Hicking [mailto:[EMAIL PROTECTED]]
>> Sent: Tuesday, 18 June 2002 16:50
>> To: [EMAIL PROTECTED]
>> Cc: Albl, Thomas
>> Subject: Re: AW: [htdig] How to set the sorting order of a
>> webserver-export of a plain filesystem?
>>
>>
>> Not sure exactly what you are trying to do (missed the prior posting),
>> but maybe Apache/PHP's "autoprepend" and "autoappend" might help.
>> They include a static or PHP file at the top/bottom of other files.
>> They can be configured through the Apache conf, IIRC within <Directory>
>> or <Files> etc. sections as well.
>>
>> HIH,
>> Marcel
>>
>>
>> --On Tuesday, 18 June 2002 16:36 +0200 "Albl, Thomas"
>> <[EMAIL PROTECTED]> wrote:
>>
>> > Dear Geoff,
>> >
>> > Thanks for your help. It solves my problem halfway, but one should
>> > look ahead anyway :*)
>> >
>> > Since the filesystem is a mounted Novell share with all our company's
>> > docs, we can't put a file in each directory (a massive number of
>> > directories), so the part about transforming the HTTP header is the
>> > first step.
>> >
>> > Is it possible to use *one* htrobots file for *all* directories? My
>> > Apache docs say that the HeaderName directive can be used even in
>> > virtual-host statements and that the file should be placed relative
>> > to the directories. But I haven't managed to access one central file
>> > as /htrobots.html.
>> >
>> > I think the key to solving my problem is to get the line
>> > <META NAME="robots" CONTENT="noindex, follow"> into the header
>> > somehow, but how, without hundreds of htrobots files?
>> >
>> > :) I'll try a few more experiments and, beyond that, hope for help :)
>> >
>> > Thanks a lot for your help so far!
>> >
>> > --
>> > With kind regards
>> > Thomas Albl
>> > Deutscher Städtetag
>> > Tel. : 0221/3771-210
>> > FAX  : 0221/3771-128
>> > eMail: mailto:[EMAIL PROTECTED]
>> > Web  : http://www.staedtetag.de
>> >
>> >
>> >> -----Original Message-----
>> >> From: Geoff Hutchison [mailto:[EMAIL PROTECTED]]
>> >> Sent: Tuesday, 18 June 2002 15:40
>> >> To: Albl, Thomas
>> >> Cc: [EMAIL PROTECTED]
>> >> Subject: Re: [htdig] How to set the sorting order of a
>> >> webserver-export of a plain filesystem?
>> >>
>> >>
>> >> > If the dig crawls through this exported filesystem it finds often
>> >> > used search words in these generated pages from file names or dir
>> >> > names. Though
>> >>
>> >> See the FAQ:
>> >> <http://www.htdig.org/FAQ.html#q4.23>
>> >>
>> >> Regards,
>> >>
>> >> --
>> >> -Geoff Hutchison
>> >> Williams Students Online
>> >> http://wso.williams.edu/


--
Marcel Hicking
VIA NET.WORKS Deutschland GmbH
www.vianetworks.de
Bismarckstrasse 120, 47057 Duisburg
-----------------------------------------------------------------
Management: Matt Nydell, HRB 7672
All offers are non-binding. We carry out orders under our
general terms and conditions.

