Jim wrote...

> An index file is not a requirement. You can specify a list of
> individual URLs in the start_url attribute. Or you can use backticks
> with start_url to provide a regular file containing a list of
> individual URLs (e.g. start_url: `/path/to/list_file`).

Right, but... the backtick form that points start_url at a file STILL wants that file to contain a list of URLs. If I point to the directory holding all the text files I want to index, htdig STILL looks for a startup file (index.html) to decide "what is on the website" and thus what should be indexed. Is there something I'm missing here?

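For what it's worth, one way to get such a list file without maintaining an index page would be to generate it from the directory itself. The following is only a rough sketch in Python, assuming that htdig accepts file:// URLs in the list and that the files of interest match *.txt; the function name write_url_list and both paths in the usage part are hypothetical.

```python
import sys
from pathlib import Path


def write_url_list(doc_dir, list_file, pattern="*.txt"):
    """Write one file:// URL per matching file under doc_dir to list_file.

    Returns the list of URLs that were written, in sorted order.
    """
    # resolve() yields an absolute path, which as_uri() requires.
    root = Path(doc_dir).resolve()
    # rglob() walks subdirectories too; as_uri() percent-encodes odd characters.
    urls = sorted(p.as_uri() for p in root.rglob(pattern) if p.is_file())
    Path(list_file).write_text("".join(u + "\n" for u in urls), encoding="utf-8")
    return urls


if __name__ == "__main__":
    # Hypothetical usage: python make_list.py /path/to/text_files /path/to/list_file
    written = write_url_list(sys.argv[1], sys.argv[2])
    print(f"wrote {len(written)} URLs")
```

The resulting file could then be referenced with the backtick form quoted above (start_url: `/path/to/list_file`). Whether htdig then indexes each listed file directly, without looking for index.html, is exactly the open question here, so treat this as something to try rather than a confirmed fix.
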
> Though not what you want, I think this is the proper behavior. Even
> though htdig is going directly to the file system, it is still
> happening within a web server context. In that context, the document
> root is by definition the root from which all other paths are built. In
> essence, you have explicitly defined / to be equivalent to
> /u1/index/html/.

I would disagree with that... htdig is not acting like a web server to do its job. It's acting like a web client. The idea that all references are relative to a "DocumentRoot" is a web server concept, not a client concept. For example, Lynx, IE, Netscape, and Mozilla ALL do this correctly if I point them at my "content" area using "file://". Only htdig does it differently, and it does it "like a web server" would by automatically prepending the document root.

So I agree that htdig acts "logically" if you think of it as a web server. However, I believe that it is acting like a client browser in practice. I think htdig should let the web server put the document root on the front, and when it is going through local files, it shouldn't do this.

> If you want to stick with some sort of index page and keep everything
> where it is, the only thing I can think of is defining
> http://localhost/ to map to / and changing start_url accordingly.
> Depending on your environment, this might be a bad idea in terms of
> security.
>
> If your primary goal here is just full path names for the files into the
> database, you might also want to take a look at the following two
> attributes that support manipulation of the URLs.
>
> http://www.htdig.org/attrs.html#url_rewrite_rules
> http://www.htdig.org/attrs.html#url_part_aliases

I'll play around with those suggestions today and see if it gets me closer to what I want. Thanks a million!

Jay West

