https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|12.0                        |13.0
           Priority|P1                          |P2

--- Comment #35 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #34)
> As noted the effect of
> 
>   if(...) {
>    ux = 0.005;
>    uy = 0.002;
>    uz = 0.000;
>   }
> 
> is PRE of most(!) dependent instructions, creating
> 
>   # prephitmp_1099 = PHI <_1098(6), 6.49971724999999889149648879538290202617645263671875e-1(5)>
>   # prephitmp_1111 = PHI <_1110(6), 1.089805708333333178483570691241766326129436492919921875e-1(5)>
> ...
> 
> we successfully coalesce the non-constant incoming register with the result
> but have to emit copies for all constants on the other edge, where we have
> quite a number of duplicate constants to deal with.
> 
> I've experimented with ensuring we get _full_ PRE of the dependent
> expressions by more aggressively re-associating (giving PHIs with a
> constant incoming operand on at least one edge a rank similar to
> constants, 1).
> 
> This increases the number of PHIs further but reduces the follow-up
> computations more.  We still fail to simply tail-duplicate the merge
> block - another possibility to eventually save some of the overhead;
> our tail duplication code (gimple-ssa-split-paths.cc) doesn't handle
> this case since the diamond is not the one immediately preceding the
> loop exit/latch.
> 
> The result of "full PRE" is a little bit worse than the current state (so
> it's not a full solution here).
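
To make the quoted scenario concrete, here is a minimal C sketch of the
shape involved (all names, the loop structure and the dependent
arithmetic are illustrative stand-ins, not the actual benchmark source):

  #define ACCEL 1   /* illustrative flag bit */

  void
  kernel (double *restrict dst, const double *restrict src,
          const int *restrict flags, double rho, int n)
  {
    for (int i = 0; i < n; i++)
      {
        double ux = src[3 * i], uy = src[3 * i + 1], uz = src[3 * i + 2];

        if (flags[i] & ACCEL)           /* the conditional constant stores */
          {
            ux = 0.005;
            uy = 0.002;
            uz = 0.000;
          }

        /* Dependent uses.  PRE turns each partially redundant expression
           into a PHI at the join whose incoming value on the constant edge
           is the expression folded with the constants above - hence the
           many constant PHI arguments that need copies on that edge.
           Tail-duplicating the join would instead clone these uses into
           both arms, letting the constant arm fold completely.  */
        double u2 = 1.5 * (ux * ux + uy * uy + uz * uz);
        dst[i] = rho * (1.0 + 3.0 * ux + 4.5 * ux * ux - u2);
      }
  }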

Btw, looking at coverage, the constant case is only an unimportant fraction
of the runtime, so the register pressure increase caused by the PRE dominates
(yet statically the branch is predicted to be 50/50):

3562383000:  241:               if( TEST_FLAG_SWEEP( srcGrid, ACCEL )) {
 55296000:  242:                        ux = 0.005;
 55296000:  243:                        uy = 0.002;
 55296000:  244:                        uz = 0.000;
        -:  245:                }

We can also see that PGO notices this, and with profile feedback we do _not_
perform the PRE.
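
Without profile feedback, annotating the branch as unlikely would convey
the same information statically; this is only a sketch of a possible
experiment (not something tried for this report), and whether the
non-PGO heuristics would then avoid the PRE is unverified:

  /* Hypothetical annotation of the benchmark source: tell the compiler
     the accelerator branch is cold, matching what PGO measures.  */
  if (__builtin_expect (TEST_FLAG_SWEEP (srcGrid, ACCEL), 0))
    {
      ux = 0.005;
      uy = 0.002;
      uz = 0.000;
    }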

So the root cause is nothing we can fix for GCC 12; tuning to avoid
spilling to GPRs can recover part of the regression but will definitely
have effects elsewhere.

Re-targeting to GCC 13.
