According to Katherine Porter:
> Summary:
>
> bad_querystr -- when it's blank, the Retriever.cc regex matches ANY URL
> with a query string (any URL that contains a "?"), thus rejecting
> those URLs.
Right you are. This seems to have been broken when the HtRegexList class was
added. HtRegexList::setEscaped() checks for an empty pattern list, and
sets "compiled" to TRUE if it finds one. This breaks HtRegexList::match()'s
test for a null pattern. Geoff, is it safe to just change that TRUE to
FALSE in setEscaped(), or will this break something else?
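The interaction can be illustrated with a small, self-contained model. RegexListModel, its `buggy` switch, and escapeRegex() are hypothetical stand-ins, not ht://Dig's actual HtRegexList code. The model also assumes, consistent with the reported symptom, that a list flagged as compiled but holding no patterns falls through to a match-everything result, so match() never reaches the caller's null-pattern value.

```cpp
#include <cassert>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: escape regex metacharacters so each entry is
// matched literally (assumed to be what "escaped" means here).
static std::string escapeRegex(const std::string& s) {
    static const std::string special = R"(\^$.|?*+()[]{})";
    std::string out;
    for (char c : s) {
        if (special.find(c) != std::string::npos) out += '\\';
        out += c;
    }
    return out;
}

// Hypothetical, simplified model of a regex list; NOT the real HtRegexList.
class RegexListModel {
public:
    // buggy = true reproduces the reported behaviour for an empty list.
    bool setEscaped(const std::vector<std::string>& patterns, bool buggy = false) {
        regexes.clear();
        if (patterns.empty()) {
            // Reported bug: compiled set TRUE here. Proposed fix: FALSE.
            compiled = buggy;
            return compiled;
        }
        for (const auto& p : patterns) regexes.emplace_back(escapeRegex(p));
        compiled = true;
        return true;
    }

    bool match(const std::string& str, bool nullpattern, bool nullstr) const {
        if (!compiled) return nullpattern;   // the null-pattern test
        if (str.empty()) return nullstr;
        // Assumption (matches the symptom): a "compiled" but empty list
        // matches everything.
        if (regexes.empty()) return true;
        for (const auto& re : regexes)
            if (std::regex_search(str, re)) return true;
        return false;
    }

private:
    bool compiled = false;
    std::vector<std::regex> regexes;
};
```

With the fix, a blank bad_querystr leaves the list uncompiled, so match() returns whatever null-pattern result the caller passes (false here) instead of rejecting every URL with a "?".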
> Retriever.cc bad_querystr matching code segment is comparing
> bad_querystr with the entire URL, not just the query string.
> It should be looking past the "?" only.
Correct again. The call to match() should be using "ext", not "url".
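A minimal sketch of matching only the query string follows. queryPart() and hasBadQuery() are hypothetical names, and whether the real `ext` in Retriever.cc includes the leading "?" is an assumption. The point is only that the pattern is tested against the text from the "?" onward, so a bad_querystr entry can no longer match something in the host or path.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: the query part of a URL, starting at the '?'
// (empty if there is none). Including the '?' is an assumption.
std::string queryPart(const std::string& url) {
    std::string::size_type q = url.find('?');
    return q == std::string::npos ? std::string() : url.substr(q);
}

// Hypothetical check: reject only when a bad_querystr pattern occurs
// in the query string, never in the scheme, host, or path.
bool hasBadQuery(const std::string& url, const std::regex& badQuery) {
    std::string ext = queryPart(url);
    if (ext.empty()) return false;   // no query string: nothing to reject
    return std::regex_search(ext, badQuery);
}
```

With a pattern such as `sid=`, a URL like `http://example.com/sid=docs/index.html` would be rejected if the whole URL were matched, but it passes here because it has no query string.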
--
Gilles R. Detillieux E-mail: <[EMAIL PROTECTED]>
Spinal Cord Research Centre WWW: http://www.scrc.umanitoba.ca/~grdetil
Dept. Physiology, U. of Manitoba Phone: (204)789-3766
Winnipeg, MB R3E 3J7 (Canada) Fax: (204)789-3930
_______________________________________________
htdig-dev mailing list
[EMAIL PROTECTED]
http://lists.sourceforge.net/lists/listinfo/htdig-dev