[Rewritten/remeasured as suggested by Andy Kleen]

  Hello,

I've measured cache misses of the 4.0.1 and 4.1.0 C++ compilers using
oprofile on an amd64 box while compiling the MICO sources, and found
that:

0) the compiler options used were:
   -I../include  -Wall -D_REENTRANT -D_GNU_SOURCE   -DPIC -fPIC  -c

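1) the most expensive function seems to be comptypes, at least from the
   data L2 refill point of view (~17% of all refills in 4.0.1, ~16% in
   4.1.0)

2) comptypes is also the most CPU-intensive function: it has the
   largest share of CPU_CLK_UNHALTED samples of any symbol (~4.8% in
   4.0.1, ~4.5% in 4.1.0)

3) other functions that are expensive in terms of data L2 refills seem
   to be: push_to_top_level (~6%), compparms (~4%, only in the 4.1.0
   list), htab_find_slot_with_hash (~3%) and ggc_alloc_stat (~3%)

4) for 4.0.1, one data L2 refill happens every ~774 CPU cycles
   (CPU_CLK_UNHALTED samples * 100 / DATA_CACHE_REFILLS_FROM_SYSTEM
   samples; the factor 100 is the ratio of the sampling counts,
   100000 / 1000)

5) for 4.1.0, one data L2 refill happens every ~765 CPU cycles

6) 4.1.0 is a _bit_ faster than 4.0.1 (5892854 vs. 5937586
   CPU_CLK_UNHALTED samples, i.e. ~0.75% fewer cycles)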

7) the tables were produced after three cycles of
   "make; find . -name '*.o' -exec rm \{} \;"
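The arithmetic behind items 4 to 6 can be reproduced with a small
Python sketch. It assumes each oprofile sample stands for exactly the
configured count of events (100000 cycles or 1000 refills), which is
only an approximation for statistical sampling; the helper name
cycles_per_refill is hypothetical.

```python
# Sampling periods as configured for the runs below: one
# CPU_CLK_UNHALTED sample per 100000 cycles and one
# DATA_CACHE_REFILLS_FROM_SYSTEM sample per 1000 refills.
CLK_PERIOD = 100_000
REFILL_PERIOD = 1_000


def cycles_per_refill(clk_samples: int, refill_samples: int) -> float:
    """Approximate unhalted cycles per data cache refill from system."""
    cycles = clk_samples * CLK_PERIOD
    refills = refill_samples * REFILL_PERIOD
    return cycles / refills  # same as clk_samples * 100 / refill_samples


# Overall cc1plus totals from the first table of each run.
runs = {
    "4.0.1": (5937586, 767082),
    "4.1.0": (5892854, 769938),
}

for version, (clk, refills) in runs.items():
    print(f"{version}: one refill every {cycles_per_refill(clk, refills):.0f} cycles")

# Fraction of unhalted cycles saved by 4.1.0 relative to 4.0.1.
saving = 1 - runs["4.1.0"][0] / runs["4.0.1"][0]
print(f"4.1.0 needs {saving:.2%} fewer unhalted cycles than 4.0.1")
```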

I'm not sure ICACHE_MISSES is that important, since I think it counts
L1 instruction cache misses rather than L2 misses. If I'm wrong, please
correct me.

The first few lines of the produced tables are below. For each
compiler, one table covers the overall cc1plus run and the other is the
per-symbol listing.

Please let me know whether you find this kind of data useful, in which
case I will continue to provide it from time to time, or whether it is
completely useless, in which case I will try to help somewhere else.

Thanks!
Karel

GCC 4.0.1 20050514 (prerelease):
silence:~$ ~/usr/local/gcc-4_0-branch-20050514-mt-allocator-amd64-linux-gnu/bin/c++ -v
Using built-in specs.
Target: amd64-linux-gnu
Configured with: ../gcc-4_0-branch/configure --prefix=/home/karel/usr/local/gcc-4_0-branch-20050514-mt-allocator-amd64-linux-gnu --enable-shared --enable-threads --enable-languages=c++ --disable-checking --enable-__cxa_atexit --disable-multilib --enable-libstdcxx-allocator=mt amd64-linux-gnu
Thread model: posix
Thread model: posix

CPU: AMD64 processors, speed 1802.33 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
Counted ICACHE_MISSES events (Instruction cache misses) with a unit mask of 0x00 (No unit mask) count 1000
Counted DATA_CACHE_REFILLS_FROM_SYSTEM events (Data cache refills from system) with a unit mask of 0x1f (All cache states) count 1000
CPU_CLK_UNHALT...|ICACHE_MISSES:...|DATA_CACHE_REF...|
  samples|      %|  samples|      %|  samples|      %|
------------------------------------------------------
  5937586 100.000   4068766 100.000    767082 100.000 cc1plus

samples  %        samples  %        samples  %        symbol name
282129    4.7516  84062     2.0660  133054   17.3455  comptypes
222187    3.7420  35072     0.8620  14406     1.8780  lookup_fnfields_1
189661    3.1942  99075     2.4350  22870     2.9814  ggc_alloc_stat
163945    2.7611  10867     0.2671  1238      0.1614  dfs_walk_all
129072    2.1738  6189      0.1521  1649      0.2150  record_reg_classes
115945    1.9527  11575     0.2845  6508      0.8484  walk_tree
104466    1.7594  34266     0.8422  1044      0.1361  find_reloads
78529     1.3226  11466     0.2818  4045      0.5273  splay_tree_splay_helper
71485     1.2039  1881      0.0462  1164      0.1517  _cpp_lex_direct
66814     1.1253  52100     1.2805  23340     3.0427  htab_find_slot_with_hash
66042     1.1123  16046     0.3944  5365      0.6994  lookup_field_1
64969     1.0942  16433     0.4039  19151     2.4966  ht_lookup_with_hash
63059     1.0620  29488     0.7247  20545     2.6783  tsubst
60314     1.0158  124283    3.0546  1902      0.2480  grokdeclarator
59543     1.0028  5354      0.1316  3547      0.4624  cp_walk_subtrees
58087     0.9783  518       0.0127  398       0.0519  _cpp_clean_line
57753     0.9727  372       0.0091  63        0.0082  dfs_find_final_overrider_pre
50981     0.8586  3283      0.0807  47105     6.1408  push_to_top_level


GCC 4.1.0 20050514 (experimental):
silence:~$ ~/usr/local/gcc-main-20050514/bin/c++ -v
Using built-in specs.
Target: amd64-unknown-linux-gnu
Configured with: ../gcc-main/configure --prefix=/home/karel/usr/local/gcc-main-20050514 --enable-shared --enable-threads --enable-languages=c++ --disable-checking --enable-__cxa_atexit --disable-multilib amd64-unknown-linux-gnu
Thread model: posix
gcc version 4.1.0 20050514 (experimental)

CPU: AMD64 processors, speed 1802.33 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
Counted ICACHE_MISSES events (Instruction cache misses) with a unit mask of 0x00 (No unit mask) count 1000
Counted DATA_CACHE_REFILLS_FROM_SYSTEM events (Data cache refills from system) with a unit mask of 0x1f (All cache states) count 1000
CPU_CLK_UNHALT...|ICACHE_MISSES:...|DATA_CACHE_REF...|
  samples|      %|  samples|      %|  samples|      %|
------------------------------------------------------
  5892854 100.000   3907118 100.000    769938 100.000 cc1plus

samples  %        samples  %        samples  %        symbol name
264029    4.4805  61866     1.5834  119923   15.5757  comptypes
209962    3.5630  35886     0.9185  15013     1.9499  lookup_fnfields_1
204992    3.4787  87966     2.2514  23110     3.0015  ggc_alloc_stat
168846    2.8653  17736     0.4539  1303      0.1692  dfs_walk_all
124715    2.1164  5806      0.1486  1771      0.2300  record_reg_classes
123015    2.0875  13427     0.3437  7191      0.9340  walk_tree
97145     1.6485  40692     1.0415  1079      0.1401  find_reloads
81300     1.3796  802       0.0205  631       0.0820  _cpp_lex_direct
74550     1.2651  9374      0.2399  3920      0.5091  splay_tree_splay_helper
69103     1.1727  1888      0.0483  31028     4.0299  compparms
67387     1.1435  14429     0.3693  5538      0.7193  lookup_field_1
67245     1.1411  27061     0.6926  21805     2.8320  tsubst
63820     1.0830  25820     0.6608  23317     3.0284  htab_find_slot_with_hash
62961     1.0684  5905      0.1511  18892     2.4537  ht_lookup_with_hash
61731     1.0476  143774    3.6798  1811      0.2352  grokdeclarator
61177     1.0382  6439      0.1648  3442      0.4470  cp_walk_subtrees
57836     0.9815  1432      0.0367  138       0.0179  dfs_find_final_overrider_pre
57303     0.9724  335       0.0086  445       0.0578  _cpp_clean_line
50819     0.8624  2938      0.0752  48274     6.2699  push_to_top_level
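To see which functions refill the data cache unusually often relative
to their CPU time, the same cycles-per-refill ratio can be computed per
symbol. The sketch below is a hypothetical helper (parse_listing is not
part of oprofile) that parses a few rows copied from the 4.0.1 symbol
listing; it assumes the column order of the listing above
(CPU_CLK_UNHALTED, ICACHE_MISSES, DATA_CACHE_REFILLS_FROM_SYSTEM) and
the same sampling counts.

```python
# Sampling periods assumed from the oprofile headers above.
CLK_PERIOD = 100_000
REFILL_PERIOD = 1_000

# A few rows copied from the 4.0.1 symbol listing.
LISTING_401 = """\
samples  %        samples  %        samples  %        symbol name
282129    4.7516  84062     2.0660  133054   17.3455  comptypes
222187    3.7420  35072     0.8620  14406     1.8780  lookup_fnfields_1
50981     0.8586  3283      0.0807  47105     6.1408  push_to_top_level
"""


def parse_listing(text):
    """Map symbol name -> (clk_samples, icache_samples, refill_samples)."""
    rows = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7:
            continue  # skip the header, blank lines and wrapped rows
        clk, _, icache, _, refill, _, name = fields
        rows[name] = (int(clk), int(icache), int(refill))
    return rows


def cycles_per_refill(clk, refill):
    """Approximate unhalted cycles per data refill for one symbol."""
    return clk * CLK_PERIOD / (refill * REFILL_PERIOD)


rows = parse_listing(LISTING_401)
# Most refill-heavy symbols (fewest cycles per refill) first.
ranked = sorted(rows.items(), key=lambda kv: cycles_per_refill(kv[1][0], kv[1][2]))
for name, (clk, _, refill) in ranked:
    print(f"{name:20s} {cycles_per_refill(clk, refill):8.0f} cycles/refill")
```

On these 4.0.1 rows it ranks push_to_top_level (~108 cycles per refill)
and comptypes (~212) well below the ~774 cycle average of the whole
cc1plus run, while lookup_fnfields_1 (~1542) is mostly CPU time.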

--
Karel Gardas                  [EMAIL PROTECTED]
ObjectSecurity Ltd.           http://www.objectsecurity.com
