According to Katherine Porter:
> Summary:
> 
>   bad_querystr -- when it's blank, Retriever.cc regex matches ANY string
> with a query string (any URL that contains a "?") thus rejecting
> the URLs.

Right you are.  This seems to have broken when the HtRegexList class was
added.  HtRegexList::setEscaped() checks for an empty pattern list, and
sets "compiled" to TRUE if it finds one.  This breaks HtRegexList::match()'s
test for a null pattern.  Geoff, is it safe to just change that TRUE to
FALSE in setEscaped(), or will this break something else?
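
To make the intended behaviour concrete, here is a minimal sketch, not the
actual ht://Dig code. PatternList is a hypothetical stand-in for HtRegexList,
it uses std::regex rather than ht://Dig's own regex classes, and it assumes
that an empty pattern list should simply never match:

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical stand-in for HtRegexList; names and internals are assumptions.
class PatternList {
public:
    // Escape each literal pattern and join them into one alternation.
    // With an empty list there is nothing to compile, so "compiled"
    // stays false (the bug discussed above corresponds to setting it
    // to true on this path).
    void setEscaped(const std::vector<std::string>& patterns) {
        compiled = false;
        if (patterns.empty())
            return;
        std::string joined;
        for (const auto& p : patterns) {
            if (!joined.empty())
                joined += '|';
            joined += escape(p);
        }
        re = std::regex(joined);
        compiled = true;
    }

    // A list with no compiled pattern matches nothing.
    bool match(const std::string& s) const {
        if (!compiled)
            return false;
        return std::regex_search(s, re);
    }

private:
    // Backslash-escape regex metacharacters so patterns match literally.
    static std::string escape(const std::string& s) {
        static const std::string special = ".^$|()[]{}*+?\\";
        std::string out;
        for (char c : s) {
            if (special.find(c) != std::string::npos)
                out += '\\';
            out += c;
        }
        return out;
    }

    std::regex re;
    bool compiled = false;
};
```

In this model a blank bad_querystr leaves "compiled" false, so match()
rejects nothing.  Whether the real class relies on "compiled" being TRUE
elsewhere is exactly the open question above.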

>   Retriever.cc bad_querystr matching code segment is comparing
> bad_querystr with the entire URL, not just the query string.
> It should be looking past the "?" only.

Correct again.  The call to match() should be using "ext", not "url".
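
For illustration only, a small sketch of testing just the query portion.
The helpers queryPart() and hasBadQuery() are hypothetical and not taken
from Retriever.cc; the sketch assumes the query string is everything after
the first "?", and it uses a plain substring test in place of the regex
match:

```cpp
#include <string>

// Return the part of the URL after the first '?', or an empty string
// when the URL has no query string. (Hypothetical helper.)
std::string queryPart(const std::string& url) {
    const std::string::size_type pos = url.find('?');
    return pos == std::string::npos ? std::string() : url.substr(pos + 1);
}

// Check a bad_querystr-style pattern against the query string only.
// A blank pattern rejects nothing.
bool hasBadQuery(const std::string& url, const std::string& bad) {
    if (bad.empty())
        return false;
    return queryPart(url).find(bad) != std::string::npos;
}
```

With this, a pattern such as "id=" rejects "http://example.com/page?id=5"
but leaves "http://example.com/id=5/page?x=1" alone, since the match no
longer sees the path.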

-- 
Gilles R. Detillieux              E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre       WWW:    http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba  Phone:  (204)789-3766
Winnipeg, MB  R3E 3J7  (Canada)   Fax:    (204)789-3930

_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
http://lists.sourceforge.net/lists/listinfo/htdig-dev
