According to Toxik - Dann Cohen:
> Hi Gilles,
>
> If I set max_hop_count to 0, it will only fetch the first page. I
> want it to fetch one page further, so max_hop_count needs to be 1.
> But what's happening is that the fetch goes beyond the 1800 domains,
> when it's supposed to reject the domains that are not in start_url...
>
> Any suggestions? By the way, it works fine when there are fewer
> domains, say 1500. Very strange...
Hmmm. I imagine that the very long list in start_url, which gets
transferred to limit_urls_to by default, is overflowing the StringMatch
state table for the limits matching. I don't know that there's an easy
fix for this. The 3.2 code will be using regular expression handling
rather than StringMatch for the limit_urls_to attribute, but I don't know
for a fact that it too won't have problems with a huge list like this.
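
To make the mechanism concrete, here is a rough sketch in Python (not
ht://Dig code) of how a limit list defaulted from start_url behaves: a
URL is kept only if it contains one of the patterns as a substring. The
function names and the example.test hosts are made up for illustration.
The last function is a hypothetical alternative that compares only the
host name against a set; it is not something htdig provides.

```python
from urllib.parse import urlsplit


def default_limits(start_urls):
    # Assumption for illustration: the limit list is a plain copy of the
    # start URLs, mirroring limit_urls_to defaulting to start_url.
    return list(start_urls)


def within_limits(url, limits):
    # Keep the URL only if at least one limit pattern occurs in it as a
    # substring. This is a linear scan, not a StringMatch state table.
    return any(pattern in url for pattern in limits)


def host_within_limits(url, allowed_hosts):
    # Hypothetical alternative: compare just the host name against a set,
    # so the cost of a check does not grow with the number of domains.
    return urlsplit(url).hostname in allowed_hosts


# Build a start list of 1800 made-up domains, as in the report above.
start_urls = [f"http://site{i}.example.test/" for i in range(1800)]
limits = default_limits(start_urls)
allowed_hosts = {urlsplit(u).hostname for u in start_urls}

inside = "http://site1799.example.test/about.html"
outside = "http://elsewhere.example.test/index.html"

print(within_limits(inside, limits), within_limits(outside, limits))
print(host_within_limits(inside, allowed_hosts),
      host_within_limits(outside, allowed_hosts))
```

With a list built this way, a URL one hop away on a domain outside the
list should be rejected by either check. The fetch going past the 1800
domains is what you would expect to see if the matcher silently stopped
honouring the patterns once the list got too large.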
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930