Geoff Hutchison writes:
> Anyway, here are the gprof results. The attachment has the full
> results and here are the top 25 calls. Is there anything we can do
> about making some of these faster? I haven't looked at the WordKey
> code, but it's being called a *lot*, which is causing a big slowdown.
> The slowdown in Retriever doesn't even look very important from these
> results. (I'm also amazed by the sheer number of calls--2 billion?)
>
> -Geoff
>
> Each sample counts as 0.01 seconds.
> % cumulative self self total
> time seconds seconds calls ms/call ms/call name
> 11.49 125.99 125.99 2046356275 0.00 0.00 WordKey::Info(void)
> 10.80 244.43 118.44 2420354574 0.00 0.00
> WordKeyInfo::Instance(void)
> 10.20 356.27 111.84 1613865799 0.00 0.00 WordKey::NFields(void)
> 6.84 431.24 74.97 461307291 0.00 0.00
> WordKey::Set(int, unsigned int)
> 5.29 489.28 58.04 99298535 0.00 0.00 WordKey::Clear(void)
> 5.27 547.08 57.80 30122303 0.00 0.00 Object::~Object(void)
> 4.27 593.93 46.85 25878449 0.00 0.01
> WordKey::CopyFrom(WordKey const &)
...
I was first amazed to see this. It looks really bad. In a way it is.
The NFields() method is used by the compression without restriction or
cache, hence the huge number of calls. However, NFields() and Info() are
inlined and only show when compiling with -g -p. Even inlined, given the
number of calls, a cache of the result is more than necessary.
I've added a cache for the NFields() method and checked with the following
(profiling the t_htdig test):
before:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls us/call us/call name
11.43 0.04 0.04 __bam_search
11.43 0.08 0.04 memp_fget
8.57 0.11 0.03 __bam_cmp
5.71 0.13 0.02 112949 0.18 0.18 WordKey::Set(int, unsigned int)
5.71 0.15 0.02 24381 0.82 1.96 WordKey::CopyFrom(WordKey const
&)
5.71 0.17 0.02 Dbt::Dbt(void *, unsigned int)
5.71 0.19 0.02 __bam_c_get
5.71 0.21 0.02 memp_fset
2.86 0.22 0.01 766407 0.01 0.01 WordKeyInfo::Instance(void)
2.86 0.23 0.01 499098 0.02 0.04 WordKey::Info(void)
2.86 0.24 0.01 418869 0.02 0.07 WordKey::NFields(void)
2.86 0.25 0.01 41513 0.24 0.62 WordKey::Clear(void)
2.86 0.26 0.01 34738 0.29 0.29 Object::~Object(void)
after:
Each sample counts as 0.01 seconds.
% cumulative self self total
time seconds seconds calls us/call us/call name
10.26 0.04 0.04 45146 0.89 0.89 stat
10.26 0.08 0.04 __bam_search
7.69 0.11 0.03 __bam_c_get
7.69 0.14 0.03 __bam_c_put
7.69 0.17 0.03 memp_fget
5.13 0.19 0.02 83026 0.24 0.24 WordKeyInfo::Instance(void)
5.13 0.21 0.02 5333 3.75 17.02 Retriever::got_word(char const
*, int, int)
5.13 0.23 0.02 7 2857.14 15820.53 HTML::parse(Retriever &, URL &)
5.13 0.25 0.02 __bam_defpfx
5.13 0.27 0.02 __db_rdonly
2.56 0.28 0.01 282611 0.04 0.04 WordKey::Clear(void)
2.56 0.29 0.01 76884 0.13 0.13 WordKey::WordKey(WordKey const &)
2.56 0.30 0.01 22045 0.45 0.45 WordKey::Info(void)
And WordKey::NFields does not even show in the list of functions reported
by gprof. I will try to run profiling without -g to see how it performs,
this will use inlining and provide a basis for comparison.
Thanks a lot for doing this profiling and apologies for this performance
bottleneck.
Cheers,
--
Loic Dachary
24 av Secretan
75019 Paris
Tel: 33 1 42 45 09 16
e-mail: [EMAIL PROTECTED]
URL: http://www.senga.org/
------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED]
You will receive a message to confirm this.