On Wed, 21 Apr 2004, Lachlan Andrew wrote:
> Date: Wed, 21 Apr 2004 23:13:27 +1000
> From: Lachlan Andrew <[EMAIL PROTECTED]>
> To: Gilles Detillieux <[EMAIL PROTECTED]>,
>     Christopher Murtagh <[EMAIL PROTECTED]>
> Cc: [EMAIL PROTECTED]
> Subject: [htdig-dev] Re: Performance issue with exclude_urls
>
> Greetings Gilles + all,
>
> Yes, I agree that we need a more "polished" patch for the
> distribution. I still like my intermediate path: If *any* server
> blocks or URL blocks are used, then the user takes the performance
> hit and re-parses each time. If *no* server/URL blocks are used, we
> use Chris's patch. This should be just as fast as Chris's patch (in
> the "3.1-compatible mode" without server/URL blocks), and just as
> flexible as the current status (if blocks are used). If that can get
> ht://Dig fast enough to get into sarge, then I suggest we implement
> it first, and then work on Gilles's more complete solution at more
> leisure.
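For anyone following along, here is a rough sketch of how I read the intermediate path described above: compile the exclude pattern once when no server/URL blocks exist, and fall back to re-parsing per URL when they do. It is only an illustration under stated assumptions. `Config`, `block_excludes`, `exclude_for` and `UrlFilter` are hypothetical names, not ht://Dig's real classes, and `std::regex` stands in for HtRegEx.

```cpp
#include <map>
#include <regex>
#include <string>

// Hypothetical stand-in for the parsed configuration (not ht://Dig's API).
struct Config {
    std::string global_exclude;                         // global exclude_urls pattern
    std::map<std::string, std::string> block_excludes;  // URL prefix -> pattern override

    bool has_blocks() const { return !block_excludes.empty(); }

    // Pattern that applies to `url`: the first block whose prefix matches,
    // otherwise the global pattern.
    std::string exclude_for(const std::string& url) const {
        for (const auto& entry : block_excludes) {
            const std::string& prefix = entry.first;
            if (url.compare(0, prefix.size(), prefix) == 0) return entry.second;
        }
        return global_exclude;
    }
};

class UrlFilter {
public:
    explicit UrlFilter(const Config& cfg)
        : cfg_(cfg), cached_(cfg.global_exclude) {}

    bool is_excluded(const std::string& url) const {
        if (!cfg_.has_blocks()) {
            // Fast path: no blocks, so the pattern compiled once in the
            // constructor is valid for every URL.
            return !cfg_.global_exclude.empty() && std::regex_search(url, cached_);
        }
        // Slow path: blocks may override the pattern, so look it up and
        // recompile for this URL (the performance hit the user accepts).
        const std::string pattern = cfg_.exclude_for(url);
        return !pattern.empty() && std::regex_search(url, std::regex(pattern));
    }

private:
    Config cfg_;         // copied so the filter does not depend on caller lifetime
    std::regex cached_;  // compiled once from the global pattern
};
```

The empty-pattern check is there because an empty `std::regex` matches every string, which would exclude everything. A config with no blocks never pays for recompilation; a config with even one block pays it on every URL, which matches the trade-off as stated.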
I applied Chris's patch and profiled htdig on the same site as before;
it ran ~40% faster than last time ;) Here is the profile:
ftp://ftp.ccsf.org/htdig-patches/3.2.0b5/htdig.gmon.exclude_perform.gz
> A first hack at this (not even compile-tested) is attached, patched
> relative to Chris's patched version, so you can see what I mean. If
> people are in favour, I'll try to work on it over the weekend.
The "slightly-better.0" patch applies, but it does not compile:
Retriever.cc: In method `int Retriever::IsValidURL(const String &)':
Retriever.cc:998: `config_server_URL_blocks' undeclared (first use this function)
Retriever.cc:998: (Each undeclared identifier is reported only once
Retriever.cc:998: for each function it appears in.)
gmake[1]: *** [Retriever.o] Error 1
Regards,
Joe
--
_/ _/_/_/ _/ ____________ __o
_/ _/ _/ _/ ______________ _-\<,_
_/ _/ _/_/_/ _/ _/ ......(_)/ (_)
_/_/ oe _/ _/. _/_/ ah [EMAIL PROTECTED]
> One issue with caching input strings is that we would have to have
> some sort of cache-flushing, or just let the storage grow as HtRegEx
> is called repeatedly.
>
> Cheers,
> Lachlan
>
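On the cache-flushing concern raised above, one crude option is to cap the number of cached patterns and drop the whole cache when the cap is reached. The sketch below assumes compiled patterns are keyed by their source string; `RegexCache` and `max_entries` are hypothetical names, and `std::regex` again stands in for HtRegEx.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <unordered_map>

// Hypothetical bounded cache of compiled patterns, keyed by pattern text.
class RegexCache {
public:
    explicit RegexCache(std::size_t max_entries) : max_entries_(max_entries) {}

    // Returns the compiled regex for `pattern`, compiling only on a miss.
    // The reference is only guaranteed valid until the next call to get(),
    // because a flush destroys all cached entries.
    const std::regex& get(const std::string& pattern) {
        auto it = cache_.find(pattern);
        if (it != cache_.end()) return it->second;

        // Crude flush: once the cap is reached, throw everything away
        // rather than tracking which entries were used most recently.
        if (cache_.size() >= max_entries_) cache_.clear();

        ++compiles_;
        return cache_.emplace(pattern, std::regex(pattern)).first->second;
    }

    std::size_t size() const { return cache_.size(); }
    std::size_t compiles() const { return compiles_; }  // number of cache misses

private:
    std::size_t max_entries_;
    std::size_t compiles_ = 0;
    std::unordered_map<std::string, std::regex> cache_;
};
```

Flushing everything is simpler than an LRU list and keeps storage bounded, at the cost of recompiling hot patterns after each flush. Whether that is acceptable depends on how many distinct patterns a real config produces, which I have not measured.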
> On Wed, 21 Apr 2004 07:45 am, Gilles Detillieux wrote:
> > Hi, Chris and other developers. The problem with this fix is that
> > exclude_urls and bad_querystr can no longer be used in server
> > blocks or URL blocks, as they'll only be parsed once regardless of
> > how they're used.
_______________________________________________
ht://Dig Developer mailing list:
[EMAIL PROTECTED]
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-dev