Re: [PATCH, PR38785] Throttle PRE at -O3

2012-04-27 Thread Maxim Kuvyrkov
On 27/04/2012, at 1:17 AM, Richard Guenther wrote:

> On Wed, Apr 25, 2012 at 8:06 AM, Maxim Kuvyrkov  
> wrote:
> 
...
> +ppre_n_insert_for_speed_p (pre_expr expr, basic_block block,
> +  unsigned int inserts_needed)
> +{
> +  /* The more expensive EXPR is, the more we should be prepared to insert
> + in the predecessors of BLOCK to make EXPR fully redundant.
> + For now, only recognize AND, OR, XOR, PLUS and MINUS of a multiple-use
> + SSA_NAME with a constant as cheap.  */
> 
> the function implementation is odd.  cost is always 1 when used, and
> both memory loads and calls are always cheap, but for example casts are not?
> Isn't
> 
> return EDGE_COUNT (block->preds) * cost >= inserts_needed;
> 
> always true?  Or is inserts_needed not what it suggests?

It /almost/ is what it suggests.  To account for the fact that inserts will not 
benefit some of the paths (50% in the initial version of the patch, and a more 
precise estimate in a latter version) we tell/lie ppre_n_insert_for_speed_p 
that we need a greater number of the inserts (e.g., 2x the real number) that we 
will perform.

> 
> + /* Insert only if we can remove a later expression on a path
> +that we want to optimize for speed.
> +The phi node that we will be inserting in BLOCK is not free,
> +and inserting it for the sake of !optimize_for_speed 
> successor
> +may cause regressions on the speed path.  */
> + FOR_EACH_EDGE (succ, ei, block->succs)
> +   {
> + if (bitmap_set_contains_value (PA_IN (succ->dest), val))
> +   {
> + if (optimize_edge_for_speed_p (succ))
> +   do_insertion = true;
> +
> + ++pa_succs;
> +   }
> +   }
> 
> the change up to here looks good to me - can you factor out the command-line
> switch plus this optimize_edge_for_speed_p test into a separate patch
> (hereby approved)?

Attached is what I checked in after retesting on a fresh mainline.  This part 
of the patch by itself turns out to fix the bitmnp01 regression in what seems 
to be a reliable way.  Therefore, I'll mark PR38785 as fixed.

Thank you,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics




pr38785-2.ChangeLog
Description: Binary data


pr38785-2.patch
Description: Binary data


Re: [PATCH, PR38785] Throttle PRE at -O3

2012-04-26 Thread Joern Rennecke

Quoting Richard Guenther :


the function implementation is odd.  cost is always 1 when used, and
both memory loads and calls are always cheap, but for example casts are not?
Isn't

return EDGE_COUNT (block->preds) * cost >= inserts_needed;

always true?  Or is inserts_needed not what it suggests?


The cost calculation and use is not an exact model of anything, but only
a rough approximation, and thus, like any heuristic, should be considered
subject to potential change when it can be shown that a change is beneficial
in general.
The idea was to write the code that changes to the cost calculation and use
could be applied with ease.
OTOH such changes shouldn't be adopted just because it looks like they'll
make a more realistic mode, there should be some evidence that they make
the compiler better in general.


Re: [PATCH, PR38785] Throttle PRE at -O3

2012-04-26 Thread Richard Guenther
On Wed, Apr 25, 2012 at 8:06 AM, Maxim Kuvyrkov  wrote:
> On 18/04/2012, at 9:17 PM, Richard Guenther wrote:
>
>> On Wed, Apr 18, 2012 at 4:15 AM, Maxim Kuvyrkov  
>> wrote:
>>> Steven,
>>> J"orn,
>>>
>>> I am looking into fixing performance regression on EEMBC's bitmnp01, and a 
>>> version of your combined patch attached to PR38785 still works very well.  
>>> Would you mind me getting it through upstream review, or are there any 
>>> issues with contributing this patch to GCC mainline?
>>>
>>> We (CodeSourcery/Mentor) were carrying this patch in our toolchains since 
>>> GCC 4.4, and it didn't show any performance or correctness problems on x86, 
>>> ARM, MIPS, and other architectures.  It also reliably fixes bitmnp01 
>>> regression, which is still present in current mainline.
>>>
>>> I have tested this patch on recent mainline on i686-linux-gnu with no 
>>> regressions.  Unless I hear from you to the contrary, I will push this 
>>> patch for upstream review and, hopefully, get it checked in.
>>>
>>> Previous discussion of this patch is at 
>>> http://gcc.gnu.org/ml/gcc-patches/2009-03/msg00250.html
>>
>> The addition of -ftree-pre-partial-partial is ok if you change its name to
>> -ftree-partial-pre and add documentation to invoke.texi.
>
> OK.  Gerald, does the patch for gcc-4.8/changes.html look OK?
>
>>
>> +             /* Assuming the expression is 50% anticipatable, we have
>> +                to multiply the number of insertions needed by two for a 
>> cost
>> +                comparison.  */
>>
>> why assume 50% anticipatibility if you can compute the exact
>> anticipatibility?
>
> Do you mean partial anticipatibility as in the updated patch?  To compute 
> exact anticipatibility we would need to traverse the bottom part of CFG, to 
> get the numbers right.
>
>>
>> +             if (!optimize_function_for_speed_p (cfun)
>>
>> please look at how I changed regular PRE to look at whether we want to
>> optimize the path which has the redundancy for speed via
>> optimize_edge_for_speed_p.  The same surgerly should be applied to
>> PPRE.
>
> OK.  In the updated patch we require at least one of the successors the 
> partially anticipates the expression to be on the speed path.
>
> Any other comments?

+ppre_n_insert_for_speed_p (pre_expr expr, basic_block block,
+  unsigned int inserts_needed)
+{
+  /* The more expensive EXPR is, the more we should be prepared to insert
+ in the predecessors of BLOCK to make EXPR fully redundant.
+ For now, only recognize AND, OR, XOR, PLUS and MINUS of a multiple-use
+ SSA_NAME with a constant as cheap.  */

the function implementation is odd.  cost is always 1 when used, and
both memory loads and calls are always cheap, but for example casts are not?
Isn't

return EDGE_COUNT (block->preds) * cost >= inserts_needed;

always true?  Or is inserts_needed not what it suggests?

+ /* Insert only if we can remove a later expression on a path
+that we want to optimize for speed.
+The phi node that we will be inserting in BLOCK is not free,
+and inserting it for the sake of !optimize_for_speed successor
+may cause regressions on the speed path.  */
+ FOR_EACH_EDGE (succ, ei, block->succs)
+   {
+ if (bitmap_set_contains_value (PA_IN (succ->dest), val))
+   {
+ if (optimize_edge_for_speed_p (succ))
+   do_insertion = true;
+
+ ++pa_succs;
+   }
+   }

the change up to here looks good to me - can you factor out the command-line
switch plus this optimize_edge_for_speed_p test into a separate patch
(hereby approved)?

Thanks,
Richard.

> Thank you,
>
> --
> Maxim Kuvyrkov
> CodeSourcery / Mentor Graphics
>


Re: [PATCH, PR38785] Throttle PRE at -O3

2012-04-25 Thread Gerald Pfeifer
On Wed, 25 Apr 2012, Maxim Kuvyrkov wrote:
> OK.  Gerald, does the patch for gcc-4.8/changes.html look OK?

Yes, it does, thank you.  Just add "a" and "the" in three places,
as in "A new option", "the...optimization" and "at the...level"
when committing it, please.

Cheers,
Gerald


Re: [PATCH, PR38785] Throttle PRE at -O3

2012-04-24 Thread Maxim Kuvyrkov
On 18/04/2012, at 9:17 PM, Richard Guenther wrote:

> On Wed, Apr 18, 2012 at 4:15 AM, Maxim Kuvyrkov  
> wrote:
>> Steven,
>> J"orn,
>> 
>> I am looking into fixing performance regression on EEMBC's bitmnp01, and a 
>> version of your combined patch attached to PR38785 still works very well.  
>> Would you mind me getting it through upstream review, or are there any 
>> issues with contributing this patch to GCC mainline?
>> 
>> We (CodeSourcery/Mentor) were carrying this patch in our toolchains since 
>> GCC 4.4, and it didn't show any performance or correctness problems on x86, 
>> ARM, MIPS, and other architectures.  It also reliably fixes bitmnp01 
>> regression, which is still present in current mainline.
>> 
>> I have tested this patch on recent mainline on i686-linux-gnu with no 
>> regressions.  Unless I hear from you to the contrary, I will push this patch 
>> for upstream review and, hopefully, get it checked in.
>> 
>> Previous discussion of this patch is at 
>> http://gcc.gnu.org/ml/gcc-patches/2009-03/msg00250.html
> 
> The addition of -ftree-pre-partial-partial is ok if you change its name to
> -ftree-partial-pre and add documentation to invoke.texi.

OK.  Gerald, does the patch for gcc-4.8/changes.html look OK?

> 
> + /* Assuming the expression is 50% anticipatable, we have
> +to multiply the number of insertions needed by two for a cost
> +comparison.  */
> 
> why assume 50% anticipatibility if you can compute the exact
> anticipatibility?

Do you mean partial anticipatibility as in the updated patch?  To compute exact 
anticipatibility we would need to traverse the bottom part of CFG, to get the 
numbers right.

> 
> + if (!optimize_function_for_speed_p (cfun)
> 
> please look at how I changed regular PRE to look at whether we want to
> optimize the path which has the redundancy for speed via
> optimize_edge_for_speed_p.  The same surgerly should be applied to
> PPRE.

OK.  In the updated patch we require at least one of the successors the 
partially anticipates the expression to be on the speed path.

Any other comments?

Thank you,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics



pr38785-rel-note.patch
Description: Binary data


pr38785.ChangeLog
Description: Binary data


pr38785.patch
Description: Binary data


Re: [PATCH, PR38785] Throttle PRE at -O3

2012-04-23 Thread Gerald Pfeifer
On Wed, 18 Apr 2012, Richard Guenther wrote:
> The addition of -ftree-pre-partial-partial is ok if you change its name to
> -ftree-partial-pre and add documentation to invoke.texi.

...and the GCC 4.8 release notes (gcc-4.8/changes.html). ;-)

Gerald


Re: [PATCH, PR38785] Throttle PRE at -O3

2012-04-18 Thread Steven Bosscher
On Wed, Apr 18, 2012 at 4:15 AM, Maxim Kuvyrkov  wrote:
> I am looking into fixing performance regression on EEMBC's bitmnp01, and a 
> version of your combined patch attached to PR38785 still works very well.  
> Would you mind me getting it through upstream review, or are there any issues 
> with contributing this patch to GCC mainline?

Not at all, please go ahead! :-)


Re: [PATCH, PR38785] Throttle PRE at -O3

2012-04-18 Thread Richard Guenther
On Wed, Apr 18, 2012 at 4:15 AM, Maxim Kuvyrkov  wrote:
> Steven,
> J"orn,
>
> I am looking into fixing performance regression on EEMBC's bitmnp01, and a 
> version of your combined patch attached to PR38785 still works very well.  
> Would you mind me getting it through upstream review, or are there any issues 
> with contributing this patch to GCC mainline?
>
> We (CodeSourcery/Mentor) were carrying this patch in our toolchains since GCC 
> 4.4, and it didn't show any performance or correctness problems on x86, ARM, 
> MIPS, and other architectures.  It also reliably fixes bitmnp01 regression, 
> which is still present in current mainline.
>
> I have tested this patch on recent mainline on i686-linux-gnu with no 
> regressions.  Unless I hear from you to the contrary, I will push this patch 
> for upstream review and, hopefully, get it checked in.
>
> Previous discussion of this patch is at 
> http://gcc.gnu.org/ml/gcc-patches/2009-03/msg00250.html

The addition of -ftree-pre-partial-partial is ok if you change its name to
-ftree-partial-pre and add documentation to invoke.texi.

+ /* Assuming the expression is 50% anticipatable, we have
+to multiply the number of insertions needed by two for a cost
+comparison.  */

why assume 50% anticipatibility if you can compute the exact
anticipatibility?

+ if (!optimize_function_for_speed_p (cfun)

please look at how I changed regular PRE to look at whether we want to
optimize the path which has the redundancy for speed via
optimize_edge_for_speed_p.  The same surgerly should be applied to
PPRE.

Thanks,
Richard.


> Thank you,
>
> --
> Maxim Kuvyrkov
> CodeSourcery / Mentor Graphics
>


[PATCH, PR38785] Throttle PRE at -O3

2012-04-17 Thread Maxim Kuvyrkov
Steven,
J"orn,

I am looking into fixing performance regression on EEMBC's bitmnp01, and a 
version of your combined patch attached to PR38785 still works very well.  
Would you mind me getting it through upstream review, or are there any issues 
with contributing this patch to GCC mainline?

We (CodeSourcery/Mentor) were carrying this patch in our toolchains since GCC 
4.4, and it didn't show any performance or correctness problems on x86, ARM, 
MIPS, and other architectures.  It also reliably fixes bitmnp01 regression, 
which is still present in current mainline.

I have tested this patch on recent mainline on i686-linux-gnu with no 
regressions.  Unless I hear from you to the contrary, I will push this patch 
for upstream review and, hopefully, get it checked in.

Previous discussion of this patch is at 
http://gcc.gnu.org/ml/gcc-patches/2009-03/msg00250.html

Thank you,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics



pr38785.ChangeLog
Description: Binary data


pr38785.patch
Description: Binary data