[rewritten/remeasured as per suggestion by Andy Kleen]
Hello,
I've measured cache misses of the 4.0.1 and 4.1.0 C++ compilers using oprofile on an amd64 box while compiling the MICO sources, and found the following:
0) the compiler options used were: -I../include -Wall -D_REENTRANT -D_GNU_SOURCE -DPIC -fPIC -c
1) the most expensive function seems to be comptypes, at least from the data L2 refill point of view (~17% of refill samples in 4.0.1, ~16% in 4.1.0)
2) comptypes is also the most CPU-intensive function, since the largest share of time is spent there (~5% of CPU_CLK_UNHALTED samples)
3) other functions that are expensive in terms of data L2 refills seem to be: push_to_top_level (~6%), compparms (~4%), htab_find_slot_with_hash (~3%), ggc_alloc_stat (~3%)
4) for 4.0.1, a data L2 refill happens once every 774 clock cycles (CPU_CLK_UNHALTED * 100 / DATA_CACHE_REFILLS_FROM_SYSTEM; the factor of 100 comes from the different sampling counts, 100000 vs. 1000)
5) for 4.1.0, a data L2 refill happens once every 765 clock cycles
6) 4.1.0 is a _bit_ faster than 4.0.1 (about 0.75% fewer CPU_CLK_UNHALTED samples)
7) the tables were produced after three cycles of "make; find . -name '*.o' -exec rm \{} \;"
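As a quick check of the figures in 4) and 5), the ratio can be recomputed from the cc1plus totals in the tables below. This is only a sketch: the function and constant names are mine and not part of oprofile, and it assumes that each sample stands for "count" events, which is where the factor of 100 comes from.

```python
# Events represented by one sample, taken from the "count" values in the
# oprofile headers (assumption: one sample == "count" events).
CLK_EVENTS_PER_SAMPLE = 100_000     # CPU_CLK_UNHALTED
REFILL_EVENTS_PER_SAMPLE = 1_000    # DATA_CACHE_REFILLS_FROM_SYSTEM


def cycles_per_refill(clk_samples, refill_samples):
    """Estimate the number of unhalted clock cycles per data cache refill."""
    cycles = clk_samples * CLK_EVENTS_PER_SAMPLE
    refills = refill_samples * REFILL_EVENTS_PER_SAMPLE
    return cycles / refills


# (CPU_CLK_UNHALTED samples, DATA_CACHE_REFILLS_FROM_SYSTEM samples)
# from the cc1plus summary rows.
TOTALS = {
    "4.0.1": (5937586, 767082),
    "4.1.0": (5892854, 769938),
}

if __name__ == "__main__":
    for version, (clk, refills) in TOTALS.items():
        print(version, round(cycles_per_refill(clk, refills)))
```

Rounded to whole cycles, this gives 774 for 4.0.1 and 765 for 4.1.0.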
I don't know whether ICACHE_MISSES is that important, since I think it measures L1 instruction cache misses rather than L2 misses. Please correct me if I'm wrong.
The first few lines of the produced tables are below. For each compiler, one table covers the overall cc1plus run and one is the per-symbol listing.
Please let me know whether you find this kind of data useful. If so, I will continue to provide it from time to time; if it is completely useless, I will try to help somewhere else.
Thanks!
Karel

GCC 4.0.1 20050514 (prerelease):

silence:~$ ~/usr/local/gcc-4_0-branch-20050514-mt-allocator-amd64-linux-gnu/bin/c++ -v
Using built-in specs.
Target: amd64-linux-gnu
Configured with: ../gcc-4_0-branch/configure --prefix=/home/karel/usr/local/gcc-4_0-branch-20050514-mt-allocator-amd64-linux-gnu --enable-shared --enable-threads --enable-languages=c++ --disable-checking --enable-__cxa_atexit --disable-multilib --enable-libstdcxx-allocator=mt amd64-linux-gnu
Thread model: posix

CPU: AMD64 processors, speed 1802.33 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
Counted ICACHE_MISSES events (Instruction cache misses) with a unit mask of 0x00 (No unit mask) count 1000
Counted DATA_CACHE_REFILLS_FROM_SYSTEM events (Data cache refills from system) with a unit mask of 0x1f (All cache states) count 1000
CPU_CLK_UNHALT...|ICACHE_MISSES:...|DATA_CACHE_REF...|
  samples|      %|  samples|      %|  samples|      %|
------------------------------------------------------
  5937586 100.000  4068766 100.000   767082 100.000 cc1plus

CPU: AMD64 processors, speed 1802.33 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
Counted ICACHE_MISSES events (Instruction cache misses) with a unit mask of 0x00 (No unit mask) count 1000
Counted DATA_CACHE_REFILLS_FROM_SYSTEM events (Data cache refills from system) with a unit mask of 0x1f (All cache states) count 1000
samples  %        samples  %        samples  %        symbol name
282129   4.7516   84062    2.0660   133054   17.3455  comptypes
222187   3.7420   35072    0.8620   14406    1.8780   lookup_fnfields_1
189661   3.1942   99075    2.4350   22870    2.9814   ggc_alloc_stat
163945   2.7611   10867    0.2671   1238     0.1614   dfs_walk_all
129072   2.1738   6189     0.1521   1649     0.2150   record_reg_classes
115945   1.9527   11575    0.2845   6508     0.8484   walk_tree
104466   1.7594   34266    0.8422   1044     0.1361   find_reloads
78529    1.3226   11466    0.2818   4045     0.5273   splay_tree_splay_helper
71485    1.2039   1881     0.0462   1164     0.1517   _cpp_lex_direct
66814    1.1253   52100    1.2805   23340    3.0427   htab_find_slot_with_hash
66042    1.1123   16046    0.3944   5365     0.6994   lookup_field_1
64969    1.0942   16433    0.4039   19151    2.4966   ht_lookup_with_hash
63059    1.0620   29488    0.7247   20545    2.6783   tsubst
60314    1.0158   124283   3.0546   1902     0.2480   grokdeclarator
59543    1.0028   5354     0.1316   3547     0.4624   cp_walk_subtrees
58087    0.9783   518      0.0127   398      0.0519   _cpp_clean_line
57753    0.9727   372      0.0091   63       0.0082   dfs_find_final_overrider_pre
50981    0.8586   3283     0.0807   47105    6.1408   push_to_top_level

GCC 4.1.0 20050514 (experimental):

silence:~$ ~/usr/local/gcc-main-20050514/bin/c++ -v
Using built-in specs.
Target: amd64-unknown-linux-gnu
Configured with: ../gcc-main/configure --prefix=/home/karel/usr/local/gcc-main-20050514 --enable-shared --enable-threads --enable-languages=c++ --disable-checking --enable-__cxa_atexit --disable-multilib amd64-unknown-linux-gnu
Thread model: posix
gcc version 4.1.0 20050514 (experimental)

CPU: AMD64 processors, speed 1802.33 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
Counted ICACHE_MISSES events (Instruction cache misses) with a unit mask of 0x00 (No unit mask) count 1000
Counted DATA_CACHE_REFILLS_FROM_SYSTEM events (Data cache refills from system) with a unit mask of 0x1f (All cache states) count 1000
CPU_CLK_UNHALT...|ICACHE_MISSES:...|DATA_CACHE_REF...|
  samples|      %|  samples|      %|  samples|      %|
------------------------------------------------------
  5892854 100.000  3907118 100.000   769938 100.000 cc1plus

CPU: AMD64 processors, speed 1802.33 MHz (estimated)
Counted CPU_CLK_UNHALTED events (Cycles outside of halt state) with a unit mask of 0x00 (No unit mask) count 100000
Counted ICACHE_MISSES events (Instruction cache misses) with a unit mask of 0x00 (No unit mask) count 1000
Counted DATA_CACHE_REFILLS_FROM_SYSTEM events (Data cache refills from system) with a unit mask of 0x1f (All cache states) count 1000
samples  %        samples  %        samples  %        symbol name
264029   4.4805   61866    1.5834   119923   15.5757  comptypes
209962   3.5630   35886    0.9185   15013    1.9499   lookup_fnfields_1
204992   3.4787   87966    2.2514   23110    3.0015   ggc_alloc_stat
168846   2.8653   17736    0.4539   1303     0.1692   dfs_walk_all
124715   2.1164   5806     0.1486   1771     0.2300   record_reg_classes
123015   2.0875   13427    0.3437   7191     0.9340   walk_tree
97145    1.6485   40692    1.0415   1079     0.1401   find_reloads
81300    1.3796   802      0.0205   631      0.0820   _cpp_lex_direct
74550    1.2651   9374     0.2399   3920     0.5091   splay_tree_splay_helper
69103    1.1727   1888     0.0483   31028    4.0299   compparms
67387    1.1435   14429    0.3693   5538     0.7193   lookup_field_1
67245    1.1411   27061    0.6926   21805    2.8320   tsubst
63820    1.0830   25820    0.6608   23317    3.0284   htab_find_slot_with_hash
62961    1.0684   5905     0.1511   18892    2.4537   ht_lookup_with_hash
61731    1.0476   143774   3.6798   1811     0.2352   grokdeclarator
61177    1.0382   6439     0.1648   3442     0.4470   cp_walk_subtrees
57836    0.9815   1432     0.0367   138      0.0179   dfs_find_final_overrider_pre
57303    0.9724   335      0.0086   445      0.0578   _cpp_clean_line
50819    0.8624   2938     0.0752   48274    6.2699   push_to_top_level
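
For comparing the two symbol listings side by side, a small script along these lines could help. It is a minimal sketch that assumes every data row has exactly seven whitespace-separated fields (three samples/percent pairs followed by the symbol name), as in the listings above. parse_symbol_rows is a hypothetical helper of mine, and only three rows from each listing are pasted in for brevity.

```python
def parse_symbol_rows(text):
    """Map symbol name -> (clk %, icache miss %, data refill %)."""
    rows = {}
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) != 7:
            continue  # skips the "symbol name" header and any other lines
        try:
            percents = (float(fields[1]), float(fields[3]), float(fields[5]))
        except ValueError:
            continue  # not a data row
        rows[fields[6]] = percents
    return rows


# Three rows copied from the 4.0.1 and 4.1.0 symbol listings.
GCC_401 = """
samples % samples % samples % symbol name
282129 4.7516 84062 2.0660 133054 17.3455 comptypes
189661 3.1942 99075 2.4350 22870 2.9814 ggc_alloc_stat
50981 0.8586 3283 0.0807 47105 6.1408 push_to_top_level
"""
GCC_410 = """
samples % samples % samples % symbol name
264029 4.4805 61866 1.5834 119923 15.5757 comptypes
204992 3.4787 87966 2.2514 23110 3.0015 ggc_alloc_stat
50819 0.8624 2938 0.0752 48274 6.2699 push_to_top_level
"""

old = parse_symbol_rows(GCC_401)
new = parse_symbol_rows(GCC_410)

if __name__ == "__main__":
    # Symbols present in both listings, ordered by 4.0.1 data refill share.
    for name in sorted(old.keys() & new.keys(), key=lambda n: -old[n][2]):
        print(f"{name:20s} {old[name][2]:8.4f} -> {new[name][2]:8.4f}")
```

Symbols that appear in only one listing (compparms, for example, is only in the 4.1.0 excerpt) are left out of the comparison by the set intersection.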

--
Karel Gardas [EMAIL PROTECTED]
ObjectSecurity Ltd. http://www.objectsecurity.com