OK, I've solved my problem: I can now have a working exclude_urls list
without the serious performance decrease. Here are the changes I made
(sorry I'm not sending a proper diff file; I need guidance on how to do
that properly):
htdig/htdig.h
--------------------
added:
extern int exclude_checked;
extern int badquerystr_checked;
extern HtRegexList excludes;
extern HtRegexList badquerystr;
htdig/htdig.cc
----------------------
added these as global variable definitions:
int exclude_checked = 0;
int badquerystr_checked = 0;
HtRegexList excludes;
HtRegexList badquerystr;
htdig/Retriever.cc
----------------------
added these conditionals and removed the previous tmpList.Create() and
.setEscaped() calls:
    if (!exclude_checked) {
        // only parse this once and store into global variable
        tmpList.Destroy();
        tmpList.Create(config->Find(&aUrl, "exclude_urls"), " \t");
        excludes.setEscaped(tmpList, config->Boolean("case_sensitive"));
        exclude_checked = 1;
    }
    if (!badquerystr_checked) {
        // only parse this once and store into global variable
        tmpList.Destroy();
        tmpList.Create(config->Find(&aUrl, "bad_querystr"), " \t");
        badquerystr.setEscaped(tmpList, config->Boolean("case_sensitive"));
        badquerystr_checked = 1;
    }
The difference in performance is night and day: the exclude_urls and
bad_querystr lists are now parsed only once per dig rather than at
*every* URL found.
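For anyone who wants to see the parse-once idea in isolation, here is a
minimal standalone sketch. It is not the ht://Dig code: PatternList,
isExcluded() and parse_count are hypothetical names made up for the
illustration, and plain substring matching stands in for HtRegexList.
Only the flag-plus-global caching mirrors the change above.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the parsed pattern list (NOT ht://Dig's HtRegexList).
struct PatternList {
    std::vector<std::string> patterns;

    // Split a whitespace-separated config value into individual patterns.
    void set(const std::string& configValue) {
        patterns.clear();
        std::istringstream in(configValue);
        std::string token;
        while (in >> token) patterns.push_back(token);
    }

    // True if any pattern occurs as a substring of the URL.
    bool matches(const std::string& url) const {
        for (const std::string& p : patterns)
            if (url.find(p) != std::string::npos) return true;
        return false;
    }
};

// Globals mirroring the patch: the parsed list plus a "done" flag.
static int exclude_checked = 0;
static PatternList excludes;
static int parse_count = 0;  // instrumentation for this sketch only

// Hypothetical per-URL check, called once for every URL found.
bool isExcluded(const std::string& url, const std::string& excludeUrlsValue) {
    if (!exclude_checked) {
        // Only parse this once and keep the result for the rest of the run.
        excludes.set(excludeUrlsValue);
        ++parse_count;
        exclude_checked = 1;
    }
    return excludes.matches(url);
}
```

However many URLs go through isExcluded(), the config value is split
only on the first call; later calls reuse the stored list. The catch is
that whatever value the first call sees is kept for the whole run, so
this (like my change) assumes the setting does not differ from one URL
to the next. If it can, the cache would have to be keyed on something
more than a single flag.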
If this is at all useful to anyone, let me know. I can send the files,
or if someone would enlighten me (even RTFM me), I can send
diffs/patches.
Cheers,
Chris
--
Christopher Murtagh
Enterprise Systems Administrator
ISR / Web Communications Group
McGill University
Montreal, Quebec
Canada
Tel.: (514) 398-3122
Fax: (514) 398-2017