On Sat, 17 Apr 2004, Lachlan Andrew wrote:
> Date: Sat, 17 Apr 2004 00:20:07 +1000
> From: Lachlan Andrew <[EMAIL PROTECTED]>
> To: Joe R. Jah <[EMAIL PROTECTED]>
> Cc: [EMAIL PROTECTED]
> Subject: Profile
>
> Thanks, Joe :)
>
> There are some unexpected things about this profile. In particular,
> it seems that some files are not being profiled.
>
> * regcomp is taking *much* too much time! As I understand it, regular
> expressions are being used for things which really "should" be just
> straight string compares. I vote that we go back to doing standard
> compares rather than creating an escaped string and then calling an
> expensive regex function...
>
> * Rather than 'fork'ing a separate external parser each time, could
> we look at having a "persistent parser", like a persistent TCP
> connection? It would require a way of specifying the end of one
> file and the start of the next, but it looks like the performance
> gain might be worthwhile.
>
> * regcomp, calloc and gethostbyname all seem to be called a lot
> more often than gprof recognises.
>
> * gethostbyname seems to be too expensive. Joe, were you using
> "persistent connections"? Gabriele, could we reduce the number
> of times gethostbyname is called? Perhaps we could cache the
> names?
Persistent connections are the default, and I did not override that in my configuration file.
By the way, all indexed documents were on the same server as htdig.
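
The name caching suggested above could look roughly like the sketch below. It is only an illustration, not ht://Dig code: the class name HostCache and its Resolver callback are hypothetical, and in a real build the callback would wrap gethostbyname (or getaddrinfo). Addresses are kept as strings to keep the example short, and entries never expire.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical helper, not part of ht://Dig: memoizes hostname lookups so
// that each distinct name reaches the resolver at most once.
class HostCache {
public:
    // Maps a hostname to an address string; assumed to wrap the real
    // resolver call (e.g. gethostbyname) in production use.
    using Resolver = std::function<std::string(const std::string&)>;

    explicit HostCache(Resolver resolver) : resolver_(std::move(resolver)) {}

    // Returns the cached address for `host`, calling the resolver only
    // when the name has not been seen before.
    const std::string& lookup(const std::string& host) {
        auto it = cache_.find(host);
        if (it == cache_.end()) {
            ++misses_;
            it = cache_.emplace(host, resolver_(host)).first;
        }
        return it->second;
    }

    // Number of lookups that actually reached the resolver.
    std::size_t misses() const { return misses_; }

private:
    Resolver resolver_;
    std::unordered_map<std::string, std::string> cache_;
    std::size_t misses_ = 0;
};
```

When every document lives on one server, as in this run, such a cache would call the resolver once for the whole dig. Whatever the resolver returns is cached as-is, including a failed lookup, so a real version would probably want to skip caching failures.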
Regards,
Joe
--
_/ _/_/_/ _/ ____________ __o
_/ _/ _/ _/ ______________ _-\<,_
_/ _/ _/_/_/ _/ _/ ......(_)/ (_)
_/_/ oe _/ _/. _/_/ ah [EMAIL PROTECTED]
> * None of the functions in Connection.cc seem to be profiled.
> Any ideas why?
>
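
To illustrate the regcomp point in the list above, here is a minimal sketch of the two approaches side by side. The function names are hypothetical and this is not the ht://Dig code; it only assumes the POSIX <regex.h> interface. The first path escapes a literal and compiles it on every call, while the second does a plain substring search with no compile step.

```cpp
#include <regex.h>   // POSIX regcomp / regexec / regfree
#include <string>

// Escapes the POSIX extended-regex special characters so that `literal`
// matches only itself.
std::string escape_for_regex(const std::string& literal) {
    static const std::string meta = "\\^$.|?*+()[{";
    std::string out;
    for (char c : literal) {
        if (meta.find(c) != std::string::npos) out += '\\';
        out += c;
    }
    return out;
}

// Regex path: build an escaped pattern, compile it, match, free it.
// Returns false if the pattern fails to compile.
bool contains_via_regex(const std::string& text, const std::string& literal) {
    regex_t re;
    const std::string pattern = escape_for_regex(literal);
    if (regcomp(&re, pattern.c_str(), REG_EXTENDED | REG_NOSUB) != 0) {
        return false;
    }
    const bool found = regexec(&re, text.c_str(), 0, nullptr, 0) == 0;
    regfree(&re);
    return found;
}

// Plain path: a straight substring search.
bool contains_via_find(const std::string& text, const std::string& literal) {
    return text.find(literal) != std::string::npos;
}
```

For a non-empty literal both functions should give the same answer, so the plain version could stand in wherever the pattern is known to contain no real regex syntax. How much time that saves would need to be confirmed with a fresh profile.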
> Thanks all,
> Lachlan
>
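
For the "persistent parser" idea in the message above, one possible way to mark where one file ends and the next starts is a length prefix. The sketch below is an assumption for illustration only: frame_document and split_frames are hypothetical names, and nothing here reflects how ht://Dig actually talks to external parsers.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical framing for a long-lived parser process: each document is
// sent as "<decimal byte count>\n<payload>", so the reader knows exactly
// where one file ends and the next begins.
std::string frame_document(const std::string& payload) {
    return std::to_string(payload.size()) + "\n" + payload;
}

// Splits a byte stream produced by frame_document back into documents.
// Stops at the first incomplete or malformed frame. Overflow of very
// large length headers is not checked in this sketch.
std::vector<std::string> split_frames(const std::string& stream) {
    std::vector<std::string> docs;
    std::size_t pos = 0;
    while (pos < stream.size()) {
        const std::size_t nl = stream.find('\n', pos);
        if (nl == std::string::npos || nl == pos) break;  // missing header
        std::size_t len = 0;
        bool ok = true;
        for (std::size_t i = pos; i < nl; ++i) {
            if (stream[i] < '0' || stream[i] > '9') { ok = false; break; }
            len = len * 10 + static_cast<std::size_t>(stream[i] - '0');
        }
        // Reject non-numeric headers and payloads that are not fully present.
        if (!ok || stream.size() - (nl + 1) < len) break;
        docs.push_back(stream.substr(nl + 1, len));
        pos = nl + 1 + len;
    }
    return docs;
}
```

A length prefix avoids having to reserve an end-of-file marker that might also appear inside a document. The parser process would read the header line, then exactly that many bytes, and then wait for the next header instead of exiting.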
> On Wed, 14 Apr 2004 14:36, Joe R. Jah wrote:
>
> > I compiled htdig-3.2.0b5 with -pg; the following patches applied:
> > DESTDIR.0 TMPFILE.0 extension_filter.0 fileSpace.0 operator[].0 and
> > robots.0
> >
> > I ran htdig on ~13k documents; it ran about 40% slower than my
> > regular htdig (without -pg). I ran "gprof htdig > htdig.gmon; gzip
> > htdig.gmon" and put the profile on the patch site, although it's
> > not a patch ;)
> >
> > ftp://ftp.ccsf.org/htdig-patches/3.2.0b5/htdig.gmon.gz
> >
> > Hope it can help in improving htdig performance.
> >
> > Regards,
> >
> > Joe
_______________________________________________
ht://Dig Developer mailing list:
[EMAIL PROTECTED]
List information (subscribe/unsubscribe, etc.)
https://lists.sourceforge.net/lists/listinfo/htdig-dev