Re: [PATCH, PR38785] Throttle PRE at -O3

Richard Guenther Thu, 26 Apr 2012 06:18:14 -0700

On Wed, Apr 25, 2012 at 8:06 AM, Maxim Kuvyrkov <ma...@codesourcery.com> wrote:
> On 18/04/2012, at 9:17 PM, Richard Guenther wrote:
>
>> On Wed, Apr 18, 2012 at 4:15 AM, Maxim Kuvyrkov <ma...@codesourcery.com> 
>> wrote:
>>> Steven,
>>> J"orn,
>>>
>>> I am looking into fixing performance regression on EEMBC's bitmnp01, and a 
>>> version of your combined patch attached to PR38785 still works very well.  
>>> Would you mind me getting it through upstream review, or are there any 
>>> issues with contributing this patch to GCC mainline?
>>>
>>> We (CodeSourcery/Mentor) were carrying this patch in our toolchains since 
>>> GCC 4.4, and it didn't show any performance or correctness problems on x86, 
>>> ARM, MIPS, and other architectures.  It also reliably fixes bitmnp01 
>>> regression, which is still present in current mainline.
>>>
>>> I have tested this patch on recent mainline on i686-linux-gnu with no 
>>> regressions.  Unless I hear from you to the contrary, I will push this 
>>> patch for upstream review and, hopefully, get it checked in.
>>>
>>> Previous discussion of this patch is at 
>>> http://gcc.gnu.org/ml/gcc-patches/2009-03/msg00250.html
>>
>> The addition of -ftree-pre-partial-partial is ok if you change its name to
>> -ftree-partial-pre and add documentation to invoke.texi.
>
> OK.  Gerald, does the patch for gcc-4.8/changes.html look OK?
>
>>
>> +             /* Assuming the expression is 50% anticipatable, we have
>> +                to multiply the number of insertions needed by two for a 
>> cost
>> +                comparison.  */
>>
>> why assume 50% anticipatibility if you can compute the exact
>> anticipatibility?
>
> Do you mean partial anticipatibility as in the updated patch?  To compute 
> exact anticipatibility we would need to traverse the bottom part of CFG, to 
> get the numbers right.
>
>>
>> +             if (!optimize_function_for_speed_p (cfun)
>>
>> please look at how I changed regular PRE to look at whether we want to
>> optimize the path which has the redundancy for speed via
>> optimize_edge_for_speed_p.  The same surgerly should be applied to
>> PPRE.
>
> OK.  In the updated patch we require at least one of the successors the 
> partially anticipates the expression to be on the speed path.
>
> Any other comments?


+ppre_n_insert_for_speed_p (pre_expr expr, basic_block block,
+                          unsigned int inserts_needed)
+{
+  /* The more expensive EXPR is, the more we should be prepared to insert
+     in the predecessors of BLOCK to make EXPR fully redundant.
+     For now, only recognize AND, OR, XOR, PLUS and MINUS of a multiple-use
+     SSA_NAME with a constant as cheap.  */

the function implementation is odd.  cost is always 1 when used, and
both memory loads and calls are always cheap, but for example casts are not?
Isn't

return EDGE_COUNT (block->preds) * cost >= inserts_needed;

always true?  Or is inserts_needed not what it suggests?

+             /* Insert only if we can remove a later expression on a path
+                that we want to optimize for speed.
+                The phi node that we will be inserting in BLOCK is not free,
+                and inserting it for the sake of !optimize_for_speed successor
+                may cause regressions on the speed path.  */
+             FOR_EACH_EDGE (succ, ei, block->succs)
+               {
+                 if (bitmap_set_contains_value (PA_IN (succ->dest), val))
+                   {
+                     if (optimize_edge_for_speed_p (succ))
+                       do_insertion = true;
+
+                     ++pa_succs;
+                   }
+               }

the change up to here looks good to me - can you factor out the command-line
switch plus this optimize_edge_for_speed_p test into a separate patch
(hereby approved)?

Thanks,
Richard.

> Thank you,
>
> --
> Maxim Kuvyrkov
> CodeSourcery / Mentor Graphics
>

Re: [PATCH, PR38785] Throttle PRE at -O3

Reply via email to