According to Paul Keck:
> On Wed, Sep 01, 1999 at 09:36:57AM -0500, Gilles Detillieux wrote:
> > The local_urls processing only works on files with .html or .htm suffixes.
> > If you're trying to index anything else, htdig will go to the server, so
> > it can determine the Content-Type.  Could this be the problem you're running
> > into?  Your local_urls line looks fine to me, provided that you did specify
> > the correct directory.
> 
> Hmmm.  Most of my URLs in documents point to directories (so the index.html
> gets read), e.g. <a href="/some/subdir"> .

With an href like that, htdig will ask the HTTP server for that URL, and
the server will give it a redirect to "/some/subdir/" (i.e. it adds the
trailing slash, to make it clear this is a directory and not a file).
Then, htdig will look in that directory for an index.html file, and if
it finds it, it will read it locally.  If it doesn't find it, it asks the
HTTP server, which will then automatically generate an HTML index from
the directory contents.
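
To make that lookup order concrete, here is a rough sketch in Python (not
htdig's actual C++ code).  The function name resolve_local() and its
url_prefix/local_dir arguments are hypothetical stand-ins for a single
local_urls mapping; a return value of None means "ask the HTTP server".

```python
import os


def resolve_local(url, url_prefix, local_dir):
    """Return a local file path for url, or None if HTTP must be used.

    url_prefix and local_dir are hypothetical stand-ins for one
    local_urls mapping (URL prefix -> directory on disk).
    """
    if not url.startswith(url_prefix):
        return None
    rel = url[len(url_prefix):].lstrip("/")
    if url.endswith("/"):
        # Directory URL: only an index.html inside it can be read locally.
        candidate = os.path.join(local_dir, rel, "index.html")
    elif url.endswith((".html", ".htm")):
        candidate = os.path.join(local_dir, rel)
    else:
        # Any other name (including a directory without the trailing
        # slash) goes to the server, which may answer with a redirect.
        return None
    return candidate if os.path.isfile(candidate) else None
```

Note that "/some/subdir" without the trailing slash falls into the last
branch, which is why the first request for such an href goes over HTTP.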

> Are you saying that ht://Dig
> reads the first index.html from the filesystem, then sees all these
> non-html URLs and starts using HTTP requests?  If so, is there any way
> around this besides changing my URLs (or search engines)?  A patch maybe?

Yes, if any href points to a file whose name doesn't end in .html or
.htm, or to a directory which doesn't contain an index.html, then htdig
passes the request on to the HTTP server.  We've wanted for some time
to expand this capability to handle other file types, but this would
require either adding individual mappings of suffix to content-type in
Document::RetrieveLocal() on an ad hoc basis (which would get ugly very
quickly), or adding support for mime.types handling to the code (preferable,
but no one has picked up the ball to actually implement it).
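
For anyone considering the mime.types route, the idea is roughly as
follows.  This is only an illustrative sketch in Python, not a patch for
htdig: the function names and the SAMPLE entries are made up, and it
assumes the common mime.types layout of a content type followed by
whitespace-separated suffixes, with "#" starting a comment.

```python
def parse_mime_types(text):
    """Build a suffix -> content-type map from mime.types-style text."""
    mapping = {}
    for line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        content_type, *suffixes = line.split()
        for suffix in suffixes:
            mapping[suffix.lower()] = content_type
    return mapping


def content_type_for(filename, mapping):
    """Return the mapped content type, or None if the suffix is unknown."""
    _, dot, suffix = filename.rpartition(".")
    if not dot:
        return None  # no suffix at all
    return mapping.get(suffix.lower())


# A few illustrative entries; a real file would be much longer.
SAMPLE = """\
# content-type    suffixes
text/html         html htm
text/plain        txt
application/pdf   pdf
"""
```

A lookup that returns None would still have to fall back to the HTTP
server, so the existing behaviour is kept for unknown suffixes.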

If you'd like to take a crack at it, please do.  It would be a big help
to a number of users.  I can't justify the time to do it myself, because
our small web site has only a dozen or so non-HTML documents, so it's really
not a priority for us.

> And if I decide to go ahead and let it make all the HTTP requests, how can I
> get it to authenticate itself?  We have many, many areas, some with different
> usernames and passwords.  Is this possible?  This is why I was leaning
> toward the filesystem approach.

Yes, this is a problem.  There are plans to add per-site authentication
configuration options, but until then, it would require separate digs of
each site that uses a unique username/password combination, each with its
own -u option, then merging together the results of all digs.
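
As a rough illustration of the separate-digs workaround, the sketch below
(Python) only builds one htdig command line per protected area.  The
config file names and credentials are hypothetical, and it assumes each
area has its own config file passed with -c and that -u takes a
username:password pair.  The merge step is left out on purpose, since how
you merge depends on the version you run.

```python
def dig_commands(sites):
    """Return one htdig argument list per (config, user, password) entry."""
    commands = []
    for config_file, username, password in sites:
        # One dig per area, each with its own credentials.
        commands.append(
            ["htdig", "-c", config_file, "-u", f"{username}:{password}"]
        )
    return commands


# Hypothetical per-area config files and credentials, for illustration only.
SITES = [
    ("area1.conf", "alice", "secret1"),
    ("area2.conf", "bob", "secret2"),
]

if __name__ == "__main__":
    for cmd in dig_commands(SITES):
        print(" ".join(cmd))
```

Each config file would need its own start_url and database location so
the digs don't overwrite one another before they are merged.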

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
