According to Toxik - Dann Cohen:
> I'm already using htdig-3.2.0b2. Do you know by any chance where in
> the code the validation is made?
> 
> Thanks in advance

Yes, my mistake.  I should have realized this was 3.2 from the HTTP
statistics you included in your original message.

The list gets converted into a long regular expression, which is then
handled by the HtRegex::set() method in htlib/HtRegex.cc.  This method
in turn passes the string to regcomp(), which I believe comes either from
the local C library or from the bundled htlib/regex.c code.  regcomp()
should return an error code if it can't handle the expression, but it
seems that an error only causes the limit-checking code to assume there
are no limits, so every URL gets through.  I think we need better error
checking and reporting for the HtRegex class!
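For illustration only, here is a minimal C sketch of the kind of checking
I mean, written against the plain POSIX <regex.h> API rather than the
actual HtRegex code.  The helpers build_alternation(), compile_limits()
and url_allowed() are hypothetical, and joining the list as a simple
"a|b|c" extended regex is an assumption about how the pattern is built.
The point is that a non-zero regcomp() result gets reported through
regerror() instead of being silently treated as "no limits".

```c
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not htdig code): join URL prefixes into one
   POSIX ERE alternation "a|b|c", backslash-escaping ERE special
   characters so '.' etc. match literally.  Caller must free(). */
static char *build_alternation(const char *const *items, size_t n)
{
    const char *meta = ".[\\()*+?{|^$"; /* ERE specials outside brackets */
    size_t len = 1;                      /* terminating NUL */
    for (size_t i = 0; i < n; i++) {
        for (const char *p = items[i]; *p; p++)
            len += strchr(meta, *p) ? 2 : 1;
        len += 1;                        /* room for a '|' separator */
    }
    char *out = malloc(len);
    if (!out)
        return NULL;
    char *q = out;
    for (size_t i = 0; i < n; i++) {
        if (i > 0)
            *q++ = '|';
        for (const char *p = items[i]; *p; p++) {
            if (strchr(meta, *p))
                *q++ = '\\';
            *q++ = *p;
        }
    }
    *q = '\0';
    return out;
}

/* Compile the limits pattern.  On failure, print regerror()'s message
   instead of carrying on as if no limits were set.  Returns regcomp()'s
   result: 0 on success (caller must then regfree()), non-zero on error. */
static int compile_limits(regex_t *re, const char *pattern)
{
    int rc = regcomp(re, pattern, REG_EXTENDED | REG_NOSUB);
    if (rc != 0) {
        char msg[256];
        regerror(rc, re, msg, sizeof msg);
        fprintf(stderr, "limit_urls_to: regcomp failed (%d): %s\n", rc, msg);
    }
    return rc;
}

/* Unanchored match: true if any listed prefix occurs in the URL. */
static int url_allowed(const regex_t *re, const char *url)
{
    return regexec(re, url, 0, NULL, 0) == 0;
}
```

With a list of 1800 domains, a failure in regcomp() (for example
REG_ESPACE if it runs out of memory) would at least show up on stderr,
and the caller could then refuse to dig rather than fetch everything.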

> -----Original Message-----
> From: Gilles Detillieux [mailto:[EMAIL PROTECTED]]
> Sent: 4 January 2001 13:21
> To: Toxik - Dann Cohen
> Cc: Gilles Detillieux; [EMAIL PROTECTED]
> Subject: Re: [htdig3-dev] Fetching outside of domain list (not supposed
> to)
> 
> According to Toxik - Dann Cohen:
> > Hi Gilles,
> > 
> > If I set the max_hop_count to 0, it will only fetch the first page,
> > and I want it to fetch 1 page further, so max_hop_count needs to be 1,
> > but what's happening is that the fetch goes beyond the 1800 domains,
> > when it's supposed to reject the domains that are not in the start_url...
> > 
> > Any suggestion?  By the way, it works fine when there are fewer domains,
> > say 1500 domains ???  Very strange...
> 
> Hmmm.  I imagine that the very long list in start_url, which gets
> transferred to limit_urls_to by default, is overflowing the StringMatch
> state table for the limits matching.  I don't know that there's an easy
> fix for this.  The 3.2 code will be using regular expression handling
> rather than StringMatch for the limit_urls_to attribute, but I don't know
> for a fact that it too won't have problems with a huge list like this.


-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930
