HtDig 3.1.6, FreeBSD 4.7

I want to index the words in a set of files, so that I can look up the
names of the files in which certain words appear.

It would appear that htdig MUST have an index.html file, even though this
technically isn't web content. So I wrote a php script that goes through the
directory, and creates an index.html file. However, I want this index.html
file to reside OUTSIDE of the directory structure of the files being
indexed. More to the point, here's the real example.

htdig.conf:
database_dir:   /u1/index/database/hdc
local_urls_only: true
local_urls:             http://localhost/=/u1/index/html/
local_default_doc:      hdc.index.html
start_url:              http://localhost/

So the HTML file being indexed is in /u1/index/html and is called
hdc.index.html.

Here is a fragment of the hdc.index.html file. Note that it doesn't use
relative paths; it uses absolute paths:
<html><body>
<a href="/u1/xfer/hdc/cat_copy/00251.txt">00251</a><br />
<a href="/u1/xfer/hdc/cat_copy/00278.txt">00278</a><br />
<a href="/u1/xfer/hdc/cat_copy/00279.txt">00279</a><br />

Note that the actual text files above (.txt) contain the words to be
indexed, so I can find all the .txt files which contain the word "rugs" for
example. Note the href is an absolute path, not a relative one.
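
For reference, the index page described above could be generated along
these lines. This is only a minimal sketch in Python, not the PHP script
actually used; the function name build_index is made up for illustration,
and it assumes all the .txt files sit directly in one directory.

```python
import html
from pathlib import Path


def build_index(txt_dir: str) -> str:
    """Return an HTML page linking to every .txt file in txt_dir.

    Each href is the file's absolute filesystem path, matching the
    fragment shown above. (Hypothetical helper, for illustration only.)
    """
    lines = ["<html><body>"]
    for path in sorted(Path(txt_dir).resolve().glob("*.txt")):
        href = html.escape(str(path), quote=True)
        label = html.escape(path.stem)  # e.g. "00251" for 00251.txt
        lines.append(f'<a href="{href}">{label}</a><br />')
    lines.append("</body></html>")
    return "\n".join(lines) + "\n"
```

Calling build_index("/u1/xfer/hdc/cat_copy") and writing the result to
/u1/index/html/hdc.index.html would give a page of the shape shown above.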

Now for the error message. Running htdig on the hdc.index.html file with -vv
gives lots of messages like this:
pick: localhost, # servers = 1
66:66:1:http://localhost/u1/xfer/hdc/cat_copy/02128.txt: Trying local files
  tried local file /u1/index/html/u1/xfer/hdc/cat_copy/02128.txt
 not found

So what is happening is, htdig is taking the "documentroot" file path
(/u1/index/html/) and prepending it to the absolute paths referenced in the
html file (/u1/xfer/hdc/cat_copy/.....), thus coming up with a totally
bogus path, and hence the failure.
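
The mapping seen in the -vv output can be reproduced with a small model.
This is a sketch in Python of the behaviour the log suggests, not htdig's
actual code; the function name to_local_path and the dictionary form of
local_urls are assumptions made for illustration. It resolves the href
against the page URL first, then swaps the URL prefix for the directory
given in local_urls.

```python
from urllib.parse import urljoin


def to_local_path(page_url, href, local_urls):
    """Model of the URL-to-file mapping suggested by the htdig -vv log.

    local_urls maps a URL prefix to a directory prefix, mirroring the
    "http://localhost/=/u1/index/html/" setting in htdig.conf.
    (Hypothetical helper; not taken from the htdig source.)
    """
    # Step 1: the href is resolved as a URL. A leading "/" means the
    # root of the web server, not the root of the filesystem.
    absolute_url = urljoin(page_url, href)
    # Step 2: the URL prefix is replaced by the configured directory.
    for url_prefix, dir_prefix in local_urls.items():
        if absolute_url.startswith(url_prefix):
            return dir_prefix + absolute_url[len(url_prefix):]
    return None  # no local mapping applies to this URL


local_urls = {"http://localhost/": "/u1/index/html/"}
print(to_local_path("http://localhost/",
                    "/u1/xfer/hdc/cat_copy/02128.txt",
                    local_urls))
# /u1/index/html/u1/xfer/hdc/cat_copy/02128.txt
```

Under this model the leading / in the href only anchors the link at the
server root (http://localhost/), so it still passes through the local_urls
mapping and picks up the /u1/index/html/ prefix, which matches the path in
the "tried local file" line.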

This confuses me. I would think that if the path in the href in the html
didn't begin with a /, htdig would do exactly what it's doing. But if
the path in the href in the html DOES begin with a /, I wouldn't think htdig
should be sticking the path to the "documentroot" in front.

I've beaten my head against this for a long time... can someone offer a
suggestion? I could move the index.html file to sit above the *.txt files
and use relative paths in the hrefs, but for other reasons I'd prefer not to
do this. Please reply to this email address directly ([EMAIL PROTECTED]) as
I'm not on this list.

Thanks!

Jay West

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
