HtDig 3.1.6, FreeBSD 4.7

I want to index the words in a set of files, along with the names of the files in which certain words appear.
It would appear that htdig MUST have an index.html file, even though this technically isn't web content. So I wrote a PHP script that goes through the directory and creates an index.html file. However, I want this index.html file to reside OUTSIDE of the directory structure of the files being indexed. More to the point, here's the real example.

htdig.conf:

    database_dir: /u1/index/database/hdc
    local_urls_only: true
    local_urls: http://localhost/=/u1/index/html/
    local_default_doc: hdc.index.html
    start_url: http://localhost/

So the HTML file being indexed is in /u1/index/html and is called hdc.index.html. Here is a fragment of the hdc.index.html file; note that the hrefs use absolute paths, not relative ones:

    <html><body>
    <a href="/u1/xfer/hdc/cat_copy/00251.txt">00251</a><br />
    <a href="/u1/xfer/hdc/cat_copy/00278.txt">00278</a><br />
    <a href="/u1/xfer/hdc/cat_copy/00279.txt">00279</a><br />

The actual text files above (.txt) contain the words to be indexed, so I can find all the .txt files which contain the word "rugs", for example.

Now for the error message. Running htdig on the hdc.index.html file with -vv gives lots of messages like this:

    pick: localhost, # servers = 1
    66:66:1:http://localhost/u1/xfer/hdc/cat_copy/02128.txt: Trying local files
    tried local file /u1/index/html/u1/xfer/hdc/cat_copy/02128.txt
    not found

So what is happening is that htdig takes the "documentroot" file path (/u1/index/html/) and prepends it to the absolute paths referenced in the HTML file (/u1/xfer/hdc/cat_copy/.....), thus coming up with a totally bogus path, hence the failure.

This confuses me. If the path in the href didn't begin with a /, I would expect htdig to do exactly what it's doing. But if the path in the href DOES begin with a /, I wouldn't think htdig should be sticking the "documentroot" path in front.

I've beat my head against this for a long time...
can someone offer a suggestion? I could move the index.html file to sit above the *.txt files and use relative paths in the hrefs, but for other reasons I'd prefer not to do this.

Please reply to this email address directly ([EMAIL PROTECTED]) as I'm not on this list.

Thanks!

Jay West

_______________________________________________
htdig-general mailing list
FAQ: http://htdig.sourceforge.net/FAQ.html
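
For reference, the path seen in the -vv output is consistent with a plain prefix substitution: the href is first resolved against start_url (so a leading / refers to the root of the URL space, not the filesystem), and the local_urls prefix is then swapped for its directory. The Python below is only a rough model of that observed behaviour, not htdig's actual code. The function name map_url_to_local, the first-match rule, and the second, more specific mapping in the last example are all assumptions made for illustration; whether htdig 3.1.6 honours several local_urls entries that way would need to be checked against its documentation.

```python
from urllib.parse import urljoin


def map_url_to_local(url, local_urls):
    """Hypothetical model: replace the first matching URL prefix with its directory."""
    for prefix, directory in local_urls:
        if url.startswith(prefix):
            return directory + url[len(prefix):]
    return None  # no mapping applies


# The single mapping from the htdig.conf quoted in the post.
LOCAL_URLS = [("http://localhost/", "/u1/index/html/")]

# An href beginning with "/" is resolved against the server root of start_url.
url = urljoin("http://localhost/", "/u1/xfer/hdc/cat_copy/02128.txt")
print(url)

# Prefix substitution then yields the path reported as "not found".
print(map_url_to_local(url, LOCAL_URLS))

# Assumption for illustration only: a more specific mapping listed first.
TWO_MAPPINGS = [
    ("http://localhost/u1/xfer/", "/u1/xfer/"),
    ("http://localhost/", "/u1/index/html/"),
]
print(map_url_to_local(url, TWO_MAPPINGS))
```

Under this model the href is never treated as a filesystem path at all, which would explain why the "documentroot" ends up in front of it even though the href starts with a /.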

