On Wed, Apr 25, 2012 at 8:06 AM, Maxim Kuvyrkov <ma...@codesourcery.com> wrote: > On 18/04/2012, at 9:17 PM, Richard Guenther wrote: > >> On Wed, Apr 18, 2012 at 4:15 AM, Maxim Kuvyrkov <ma...@codesourcery.com> >> wrote: >>> Steven, >>> J"orn, >>> >>> I am looking into fixing performance regression on EEMBC's bitmnp01, and a >>> version of your combined patch attached to PR38785 still works very well. >>> Would you mind me getting it through upstream review, or are there any >>> issues with contributing this patch to GCC mainline? >>> >>> We (CodeSourcery/Mentor) were carrying this patch in our toolchains since >>> GCC 4.4, and it didn't show any performance or correctness problems on x86, >>> ARM, MIPS, and other architectures. It also reliably fixes bitmnp01 >>> regression, which is still present in current mainline. >>> >>> I have tested this patch on recent mainline on i686-linux-gnu with no >>> regressions. Unless I hear from you to the contrary, I will push this >>> patch for upstream review and, hopefully, get it checked in. >>> >>> Previous discussion of this patch is at >>> http://gcc.gnu.org/ml/gcc-patches/2009-03/msg00250.html >> >> The addition of -ftree-pre-partial-partial is ok if you change its name to >> -ftree-partial-pre and add documentation to invoke.texi. > > OK. Gerald, does the patch for gcc-4.8/changes.html look OK? > >> >> + /* Assuming the expression is 50% anticipatable, we have >> + to multiply the number of insertions needed by two for a >> cost >> + comparison. */ >> >> why assume 50% anticipatibility if you can compute the exact >> anticipatibility? > > Do you mean partial anticipatibility as in the updated patch? To compute > exact anticipatibility we would need to traverse the bottom part of CFG, to > get the numbers right. > >> >> + if (!optimize_function_for_speed_p (cfun) >> >> please look at how I changed regular PRE to look at whether we want to >> optimize the path which has the redundancy for speed via >> optimize_edge_for_speed_p. The same surgerly should be applied to >> PPRE. > > OK. In the updated patch we require at least one of the successors the > partially anticipates the expression to be on the speed path. > > Any other comments?
+ppre_n_insert_for_speed_p (pre_expr expr, basic_block block, + unsigned int inserts_needed) +{ + /* The more expensive EXPR is, the more we should be prepared to insert + in the predecessors of BLOCK to make EXPR fully redundant. + For now, only recognize AND, OR, XOR, PLUS and MINUS of a multiple-use + SSA_NAME with a constant as cheap. */ the function implementation is odd. cost is always 1 when used, and both memory loads and calls are always cheap, but for example casts are not? Isn't return EDGE_COUNT (block->preds) * cost >= inserts_needed; always true? Or is inserts_needed not what it suggests? + /* Insert only if we can remove a later expression on a path + that we want to optimize for speed. + The phi node that we will be inserting in BLOCK is not free, + and inserting it for the sake of !optimize_for_speed successor + may cause regressions on the speed path. */ + FOR_EACH_EDGE (succ, ei, block->succs) + { + if (bitmap_set_contains_value (PA_IN (succ->dest), val)) + { + if (optimize_edge_for_speed_p (succ)) + do_insertion = true; + + ++pa_succs; + } + } the change up to here looks good to me - can you factor out the command-line switch plus this optimize_edge_for_speed_p test into a separate patch (hereby approved)? Thanks, Richard. > Thank you, > > -- > Maxim Kuvyrkov > CodeSourcery / Mentor Graphics >