------- Comment #43 from amonakov at gcc dot gnu dot org  2008-08-29 13:12 -------
Checking original testcase times on x86_64 prescott with gentoo 4.2, 4.3 and
today's trunk:

2.960s  g++-4.2.4 (GCC) 4.2.4 (Gentoo 4.2.4 p1.0)
2.916s  g++-4.3.1 (Gentoo 4.3.1-r1 p1.1) 4.3.1
3.993s  g++ (GCC) 4.4.0 20080829 (experimental)
2.796s  g++ (GCC) 4.4.0 20080829 (experimental) with --param max-inline-insns-auto=126

So I believe lack of inlining is 4.4's biggest problem.  We do not inline the
3x3 matrix multiplication in the benchmark loop.

While looking at it I found that the einline2 dump does not always show the
reason for not inlining.  I would like to propose the following patch:

--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -1494,6 +1494,8 @@ cgraph_decide_inlining_incrementally (struct cgraph_node *node,
          }
        if (cgraph_default_inline_p (e->callee, &failed_reason))
          inlined |= try_inline (e, mode, depth);
+       else if (dump_file)
+         fprintf (dump_file, "Not inlining: %s.\n", failed_reason);
       }
   node->aux = (void *)(size_t) old_mode;
   return inlined;


-- 

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=33604
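
For readers without the GCC tree at hand, the pattern the patch relies on can be sketched in isolation: a predicate reports why it refused through an out-parameter, and the caller prints that reason only when a dump stream is open. This is a rough illustration under stated assumptions, not GCC code. The names struct callee, small_enough_p and decide_inlining, the size estimates, and the reason string are all hypothetical.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for a call-graph node; not GCC's struct cgraph_node.  */
struct callee
{
  const char *name;
  int insns;			/* made-up size estimate */
};

/* Return nonzero if C fits under LIMIT.  Otherwise return zero and store
   a human-readable explanation in *REASON.  */
static int
small_enough_p (const struct callee *c, int limit, const char **reason)
{
  if (c->insns > limit)
    {
      *reason = "function exceeds the auto-inline size limit";
      return 0;
    }
  return 1;
}

/* Count the callees that pass the size check.  For each one that fails,
   print the reason to DUMP, but only when DUMP is non-NULL.  */
static int
decide_inlining (const struct callee *callees, size_t n, int limit,
		 FILE *dump)
{
  int inlined = 0;
  for (size_t i = 0; i < n; i++)
    {
      const char *failed_reason = NULL;
      if (small_enough_p (&callees[i], limit, &failed_reason))
	inlined++;
      else if (dump)
	fprintf (dump, "Not inlining %s: %s.\n", callees[i].name,
		 failed_reason);
    }
  return inlined;
}
```

Called with a made-up callee of size 120 and a limit of 90, decide_inlining skips it and writes one "Not inlining" line to the dump stream. Raising the limit to 126 accepts it, which loosely mirrors the --param experiment in the timings. Passing NULL as the dump stream silences the message without changing the decision.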