According to Dan Langille:
> On 1 Oct 2001 at 17:53, Geoff Hutchison wrote:
> > On Mon, 1 Oct 2001, Dan Langille wrote:
> > > redirect: http://www.unixathome.org/adsl/archives/2001_06/
> > > 
> > >    Rejected: URL not in the limits!
> > 
> > Right. This is what I suspected. In your config file, the limit_urls_to
> > attribute is restricting the indexing from looking at these URLs. So it
> > would help if you could post from your configuration things like:
> > 
> > limit_urls_to:
> > exclude_urls:
> > max_hop_count:
> 
> limit_urls_to:          ${start_url}
> exclude_urls:           /cgi-bin/ .cgi /phorum/
> max_hop_count: <== not found in config file.

That's the problem.  Your start_url is something like

   http://unixathome.org/

but the redirect goes to http://www.unixathome.org/adsl/..., which doesn't
match limit_urls_to, because limit_urls_to has simply taken on the value
of start_url.  You should probably set the following in your htdig.conf:

limit_urls_to:  http://unixathome.org/ http://www.unixathome.org/
server_aliases: www.unixathome.org:80=unixathome.org:80

The limit_urls_to setting will allow URLs with or without the "www.",
and the server_aliases setting will strip off the "www." so that you
don't get duplicate entries in the database, one with and one without
the prefix.  If you prefer, you could also set limit_urls_to as...

limit_urls_to:  ${start_url} http://www.unixathome.org/

so any subsequent additions to start_url won't be excluded by limit_urls_to.
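
To make the reasoning above concrete, here is a rough Python sketch of the
two checks.  It is not htdig's actual code: it assumes limit_urls_to is
treated as a list of plain substrings that a URL must contain, and that
server_aliases rewrites one host:port to another.  The function names
within_limits and canonicalize are made up for illustration.

```python
from urllib.parse import urlsplit, urlunsplit

# Values taken from the thread; start_url is the guess made above.
START_URL = "http://unixathome.org/"
LIMIT_URLS_TO = [START_URL, "http://www.unixathome.org/"]
SERVER_ALIASES = {"www.unixathome.org:80": "unixathome.org:80"}


def within_limits(url, patterns):
    """Assumed rule: a URL is accepted if it contains any of the patterns."""
    return any(pattern in url for pattern in patterns)


def canonicalize(url, aliases):
    """Rewrite the host of url according to an alias table (hypothetical helper)."""
    parts = urlsplit(url)
    port = parts.port or 80  # this sketch only considers plain http
    target = aliases.get(f"{parts.hostname}:{port}")
    if target is None:
        return url
    host, _, target_port = target.partition(":")
    # Leave the default port out of the rewritten URL.
    netloc = host if target_port == "80" else target
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )


redirect = "http://www.unixathome.org/adsl/archives/2001_06/"

# With limit_urls_to equal to start_url only, the redirect is rejected.
print(within_limits(redirect, [START_URL]))

# With the "www." form added, the same redirect is accepted.
print(within_limits(redirect, LIMIT_URLS_TO))

# The alias then folds the "www." form back onto the bare host name.
print(canonicalize(redirect, SERVER_ALIASES))
```

The first check fails because "http://unixathome.org/" does not occur
anywhere inside the "www." URL, which is the "not in the limits" rejection
from the log.  The alias step is what keeps the two spellings of the host
from ending up as separate documents.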

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-general mailing list <[EMAIL PROTECTED]>
To unsubscribe, send a message to <[EMAIL PROTECTED]> with a 
subject of unsubscribe
FAQ: http://htdig.sourceforge.net/FAQ.html
