Geoff Hutchison writes:
 > Anyway, here are the gprof results. The attachment has the full 
 > results and here are the top 25 calls. Is there anything we can do 
 > about making some of these faster? I haven't looked at the WordKey 
 > code, but it's being called a *lot*, which is causing a big slowdown. 
 > The slowdown in Retriever doesn't even look very important from these 
 > results. (I'm also amazed by the sheer number of calls--2 billion?)
 > 
 > -Geoff
 > 
 > Each sample counts as 0.01 seconds.
 >    %   cumulative   self              self     total
 >   time   seconds   seconds    calls  ms/call  ms/call  name
 >   11.49    125.99   125.99 2046356275     0.00     0.00  WordKey::Info(void)
 >   10.80    244.43   118.44 2420354574     0.00     0.00 
 > WordKeyInfo::Instance(void)
 >   10.20    356.27   111.84 1613865799     0.00     0.00  WordKey::NFields(void)
 >    6.84    431.24    74.97 461307291     0.00     0.00 
 > WordKey::Set(int, unsigned int)
 >    5.29    489.28    58.04 99298535     0.00     0.00  WordKey::Clear(void)
 >    5.27    547.08    57.80 30122303     0.00     0.00  Object::~Object(void)
 >    4.27    593.93    46.85 25878449     0.00     0.01 
 > WordKey::CopyFrom(WordKey const &)

...

 I was first amazed to see this. It looks really bad. In a way it is. 
The NFields() method is used by the compression without restriction or
cache, hence the huge number of calls. However, NFields() and Info() are
inlined and only show when compiling with -g -p. Even inlined, given the
number of calls, a cache of the result is more than necessary.

 I've added a cache for the NFields() method and checked with the following
(profiling the t_htdig test):

before:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
 11.43      0.04     0.04                             __bam_search
 11.43      0.08     0.04                             memp_fget
  8.57      0.11     0.03                             __bam_cmp
  5.71      0.13     0.02   112949     0.18     0.18  WordKey::Set(int, unsigned int)
  5.71      0.15     0.02    24381     0.82     1.96  WordKey::CopyFrom(WordKey const 
&)
  5.71      0.17     0.02                             Dbt::Dbt(void *, unsigned int)
  5.71      0.19     0.02                             __bam_c_get
  5.71      0.21     0.02                             memp_fset
  2.86      0.22     0.01   766407     0.01     0.01  WordKeyInfo::Instance(void)
  2.86      0.23     0.01   499098     0.02     0.04  WordKey::Info(void)
  2.86      0.24     0.01   418869     0.02     0.07  WordKey::NFields(void)
  2.86      0.25     0.01    41513     0.24     0.62  WordKey::Clear(void)
  2.86      0.26     0.01    34738     0.29     0.29  Object::~Object(void)

after:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  us/call  us/call  name    
 10.26      0.04     0.04    45146     0.89     0.89  stat
 10.26      0.08     0.04                             __bam_search
  7.69      0.11     0.03                             __bam_c_get
  7.69      0.14     0.03                             __bam_c_put
  7.69      0.17     0.03                             memp_fget
  5.13      0.19     0.02    83026     0.24     0.24  WordKeyInfo::Instance(void)
  5.13      0.21     0.02     5333     3.75    17.02  Retriever::got_word(char const 
*, int, int)
  5.13      0.23     0.02        7  2857.14 15820.53  HTML::parse(Retriever &, URL &)
  5.13      0.25     0.02                             __bam_defpfx
  5.13      0.27     0.02                             __db_rdonly
  2.56      0.28     0.01   282611     0.04     0.04  WordKey::Clear(void)
  2.56      0.29     0.01    76884     0.13     0.13  WordKey::WordKey(WordKey const &)
  2.56      0.30     0.01    22045     0.45     0.45  WordKey::Info(void)

 And WordKey::NFields does not even show in the list of functions reported
by gprof. I will try to run profiling without -g to see how it performs,
this will use inlining and provide a basis for comparison.

 Thanks a lot for doing this profiling and apologies for this performance
bottleneck.

 Cheers,

-- 
                Loic Dachary

                24 av Secretan
                75019 Paris
                Tel: 33 1 42 45 09 16
                e-mail: [EMAIL PROTECTED]
                URL: http://www.senga.org/


------------------------------------
To unsubscribe from the htdig3-dev mailing list, send a message to
[EMAIL PROTECTED] 
You will receive a message to confirm this. 

Reply via email to