Re: [htdig] Problems indexing non-web content with absolute paths

Jim Cole Mon, 09 Dec 2002 14:00:44 -0800

On Monday, December 9, 2002, at 07:39 AM, Jay West wrote:

> Jim wrote...
>> An index file is not a requirement. You can specify a list of
>> individual URLs in the start_url attribute. Or you can use backticks
>> with start_url to provide a regular file containing a list of
>> individual URLs (e.g. start_url: `/path/to/list_file`).
> Right, but... using backticks to point start_url at a file STILL wants
> you to point (via the file's list of URLs) to URLs. If I point to the
> directory where all the text files are that I want to index, htdig STILL
> looks for a startup file (index.html) to decide "what is on the website"
> and thus what should be indexed. Is there something I'm missing here?

You don't have to point to just a directory. You can list individual files using the same URLs you would otherwise have put in your index file. I was just suggesting this as a potential way to get rid of the extra index file you mentioned. I have never actually done this with local files, so perhaps I am the one missing something.
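For a larger tree of files, the list file for start_url need not be written by hand. Below is a minimal sketch in Python, assuming the files live under /u1/index/html/ and that this directory is reachable (directly or through local_urls) at the hypothetical prefix http://www.example.com/. The helpers make_url_list and write_url_list are illustrative only, not part of htdig, and the one-URL-per-line output assumes htdig treats the newlines in a backquoted file as separators.

```python
import os
from urllib.parse import quote


def make_url_list(doc_root, url_prefix, extensions=(".txt", ".html", ".htm")):
    """Return sorted URLs for the files under doc_root, mapped onto url_prefix.

    Assumes doc_root is what the server (or a local_urls mapping) exposes
    at url_prefix. Only files with the given extensions are listed.
    """
    urls = []
    for dirpath, _dirnames, filenames in os.walk(doc_root):
        for name in filenames:
            if not name.lower().endswith(extensions):
                continue
            full = os.path.join(dirpath, name)
            # Path relative to the document root, with forward slashes for URLs.
            rel = os.path.relpath(full, doc_root).replace(os.sep, "/")
            # Percent-encode unsafe characters but keep the "/" separators.
            urls.append(url_prefix.rstrip("/") + "/" + quote(rel))
    return sorted(urls)


def write_url_list(urls, list_path):
    """Write one URL per line to list_path."""
    with open(list_path, "w") as f:
        for url in urls:
            f.write(url + "\n")
```

For example, write_url_list(make_url_list("/u1/index/html", "http://www.example.com/"), "/path/to/list_file") would produce a file that start_url: `/path/to/list_file` can pick up, so no extra index.html is needed just to seed the dig.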

>> Though not what you want, I think this is the proper behavior. Even
>> though htdig is going directly to the file system, it is still
>> happening within a web server context. In that context, the document
>> root is by definition the root from which all other paths are built. In
>> essence, you have explicitly defined / to be equivalent to
>> /u1/index/html/.

> I would disagree with that... htdig is not acting like a webserver to do
> its job. It's acting like a web client. The concept you mention about all
You are of course correct that htdig itself is acting in the role of a client. However, I believe the intent of all the local_urls stuff is, in essence, to masquerade the file system as a web server: you make the file system look like a web server so that htdig can communicate with it as a web client.
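To make the masquerade concrete, here is a rough sketch of the prefix mapping a local_urls entry expresses, assuming an entry of the form http://www.example.com/=/u1/index/html/ (the hostname is hypothetical). This illustrates the idea only and is not htdig's actual code: a URL becomes a file path by swapping the matching URL prefix for its directory, so under this single mapping "/" in URL space is by construction /u1/index/html/ on disk, and files outside that directory have no URL at all.

```python
def url_to_local_path(url, mappings):
    """Translate a URL to a local file path via URL-prefix -> directory rules.

    mappings is a dict such as {"http://www.example.com/": "/u1/index/html/"}
    (hypothetical values). The longest matching prefix wins; None is
    returned when no prefix matches. Percent-decoding is not handled.
    """
    best = None
    for prefix in mappings:
        if url.startswith(prefix) and (best is None or len(prefix) > len(best)):
            best = prefix
    if best is None:
        return None
    # Replace the URL prefix with the directory; keep the rest of the path.
    return mappings[best] + url[len(best):]
```

With that mapping, http://www.example.com/docs/a.txt resolves to /u1/index/html/docs/a.txt, which is why paths end up relative to the document root rather than to the real file-system root.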

> not a client concept. For example, Lynx, IE, Netscape, and Mozilla ALL
> do this correctly if I point them at my "content" area using "file://".
> Only HtDig does it differently. And HtDig does it "like a webserver" would with

If you are looking for support of the file:// protocol, and are dealing with an environment that can tolerate beta code, you might try one of the 3.2.x snapshots. They support file:///path/to/files/ URLs. I believe there are some file-extension requirements so that the MIME type can be determined, but I don't recall the exact details. I think .txt, .html, and .htm just work.

Jim

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
