Hi Andrew,
Given your latest patch set, and based on my understanding of LLVM's
sink-common-insts transformation, I wonder whether there are two possible
directions worth considering here.
The first would be a more general common-operation sinking/factoring pass
in GCC, perhaps placed in or near the existing sink / phiopt machinery.
Such a pass could try to match identical operations across control-flow
joins, build the required PHIs for differing operands, and then recreate
the common operation after the merge. This would be closer in spirit to
LLVM's approach of expression matching plus a profitability model.
I think some heuristic cost model would be important here, since sinking
is not always profitable and can sometimes hurt performance, for example
by extending live ranges or by moving computation to a less favorable
place in the CFG.
A second, more targeted direction might be to do a sinking/factoring step
specifically before the "No-Conditional Execution" loop conversion in
ifcvt. Since that path is already preparing loops for if-conversion and
eventual vectorization, a local pre-pass there could expose more cases to
the existing ifcvt factor_* logic before the conditional structure is
flattened.
This second approach seems especially relevant for the repeated or chained
factoring cases discussed above: if the current ifcvt machinery can already
handle simple matching-based sinking, then an earlier local sinking step
might make more complex chains visible in a form that ifcvt can consume.
So I was wondering whether the community would prefer to explore:
* a more general common-operation sinking/factoring pass in the middle end,
likely requiring an explicit profitability model, or
* a narrower pre-ifcvt sinking step for ifcvt'able loops, aimed at improving
the existing No-Conditional Execution conversion path.
Thanks again for the detailed explanation and pointers.
Best regards,
Guoce Feng
---- Replied Message ----
From Guoce Feng<[email protected]><mailto:[email protected]>
Date 5/19/2026 12:47
To Andrew
Pinski<[email protected]><mailto:[email protected]>
Cc [email protected]<[email protected]>,
<mailto:[email protected]>Richard Biener<[email protected]>,
<mailto:[email protected]>Andrew Pinski<[email protected]>,
<mailto:[email protected]>Jeff Law<[email protected]>,
<mailto:[email protected]>Kewen
Lin<[email protected]><mailto:[email protected]>
Subject Re: [RFC] Factoring common pure operations across PHIs / branches to
reduce duplicated sqrt calls
Hi Andrew,
Thanks for checking the previous testcase directly.
That clarifies things a lot. With your PR123113 patch set, the example I
sent is handled as desired under
-O3 -fno-math-errno -fno-trapping-math
and the ifcvt result with a single __builtin_sqrt after the conditional
selection is exactly the shape I was hoping for.
So, for that testcase, your patch series already addresses the issue.
More generally, it seems that the current ifcvt factoring machinery can
already handle some relatively simple matching-and-sinking cases.
I am now more interested in the broader extension you mentioned for the
factor_* logic in ifcvt.
As I understand it, the current implementation is mainly able to factor
operations when the matching structure is relatively simple, for example
when one operand already matches across the alternatives. A more general
form would be to recognize the same operation even when two or three
operands differ, and then materialize the corresponding 2/3 PHIs before
rebuilding the operation.
Conceptually, for example,
op1 (op2(a1, b1),c1)
op1 (op2(a2, b2),c2)
could become
a = PHI <a1, a2>
b = PHI <b1, b2>
c = PHI <c1,c2>
op1(op2 (a, b),c)
and similarly for operations with three varying inputs.
It also looks like there may still be a gap for repeated or chained
sinking/factoring opportunities. For example, this reduced case does not
appear to be fully handled:
https://godbolt.org/z/n3ezj1MPq
So, from these examples, my current impression is that ifcvt can already
perform simple matching-based sinking/factoring merges, but there may still
be limitations for multi-step chained sinking/factoring cases.
Is there community interest in extending the ifcvt factor_* machinery in
that direction, namely:
* matching identical operations with multiple differing operands,
* producing the required 2/3 PHIs, and
* handling repeated or chained factoring opportunities when profitable?
This seems related to the more general expression-matching direction you
mentioned, rather than only the current limited factoring cases.
Thanks again for the clarification and the pointers.
Best regards,
Guoce Feng
---- Replied Message ----
From Andrew
Pinski<[email protected]><mailto:[email protected]>
Date 5/19/2026 00:54
To Guoce Feng<[email protected]><mailto:[email protected]>
Cc [email protected]<[email protected]>,
<mailto:[email protected]>Richard Biener<[email protected]>,
<mailto:[email protected]>Andrew Pinski<[email protected]>,
<mailto:[email protected]>Jeff
Law<[email protected]><mailto:[email protected]>
Subject Re: [RFC] Factoring common pure operations across PHIs / branches to
reduce duplicated sqrt calls
On Mon, May 18, 2026 at 2:24 AM Guoce Feng
<[email protected]<mailto:[email protected]>> wrote:
Hi Andrew,
Thanks for the detailed explanation and for pointing me to PR123113 and
the related patch series.
I checked the scope of PR123113 more carefully. My understanding is that
those patches mainly relax existing restrictions in phiopt / cselim /
factor_out_conditional_operation around merge blocks with more than two
predecessors. That certainly seems relevant to part of the problem.
(Inbox<https://inbox.sourceware.org/gcc-bugs/?t=20251215070155&utm_source=chatgpt.com>)
However, I do not think those patches fully cover the motivating case I
had in mind.
The vectorized example you showed has the form
sqrt(expr1[i])
sqrt(expr2[i])
sqrt(expr3[i])
where expr1/expr2/expr3 are already available values, so ifcvt can form
select(select(...))
sqrt(selected_value)
quite naturally.
In my motivating kernel, the operands of sqrt are themselves computed
inside the individual control-flow paths. In simplified form, the three
arms look more like
sqrt(complex_expr1)
sqrt(complex_expr2)
sqrt(complex_expr3)
where the third arm also contains extra intermediate computations before
the final squared-distance expression is formed.
Here is a reduced Compiler Explorer example that still does not get
converted into the desired “blend operands first, then perform one sqrt”
shape:
https://godbolt.org/z/savsxzaWz
With my patch set with the above testcase with -O3 -fno-math-errno
-fno-trapping-math, we get in ifcvt:
```
_3 = _66 ? _21 : _16;
_65 = _122 ? _11 : _3;
distance_24 = __builtin_sqrt (_65);
total_49 = distance_24 + total_53;
```
So it is working like you want it to work. There is only one sqrt in the
vectorized code.
As for the other expressions that could/should be factor out we definitely need
something better than just this.
LLVM has a pass which matching up the expressions and then doing some cost
model.
The factor_* code in ifcvt could be improved to see if it is the same operation
and produce 2/3 phis.
Right now the factor code is only designed to match if one of the operands of
the operation matches because that was easiest at the time of writing it.
We could do this as a "pre"-pass for the ifcvt'able loops too.
Thanks,
Andrea
So the transformation I am interested in is still closer to:
PHI <sqrt(a), sqrt(b), sqrt(c)>
=>
sqrt(PHI <a, b, c>)
in cases where a/b/c may be branch-local computed expressions, not just
already-materialized values.
Put differently:
PR123113 improves factoring around multi-pred PHIs,
but does GCC still lack a stronger common pure-operation
factoring/sinking transformation for branch-local computed operands?
I would be interested in your thoughts on whether this stronger form is
considered worthwhile in GCC, and if so, whether you would expect it to
belong in phiopt, ifcvt, or some other middle-end transformation.
Thanks again for the clarification.
Best regards,
Guoce Feng
---- Replied Message ----
From Andrew
Pinski<[email protected]><mailto:[email protected]>
Date 5/18/2026 16:49
To Guoce Feng<[email protected]><mailto:[email protected]>
Cc [email protected]<[email protected]>,
<mailto:[email protected]>Richard Biener<[email protected]>,
<mailto:[email protected]>Andrew Pinski<[email protected]>,
<mailto:[email protected]>Jeff
Law<[email protected]><mailto:[email protected]>
Subject Re: [RFC] Factoring common pure operations across PHIs / branches to
reduce duplicated sqrt calls
On Mon, May 18, 2026 at 1:16 AM Guoce Feng via Gcc
<[email protected]<mailto:[email protected]>> wrote:
Hi,
I am investigating an optimization opportunity in GCC related to
factoring common pure operations out of control-flow joins.
Consider code of this general form:
if (c1 <= 0.0)
distance = sqrt(expr1);
else if (c2 <= c1)
distance = sqrt(expr2);
else
distance = sqrt(expr3);
After control-flow simplification / if-conversion, this can effectively
lead to a form resembling:
distance = select(cond1,
sqrt(expr1),
select(cond2, sqrt(expr2), sqrt(expr3)));
For vectorized code, this may become "three vector sqrt operations plus
blends", while a more profitable form would be:
merged_expr = select(cond1,
expr1,
select(cond2, expr2, expr3));
distance = sqrt(merged_expr);
That is, transform a conceptual pattern like
phi(sqrt(a), sqrt(b), sqrt(c))
into
sqrt(phi(a, b, c))
when the operation is safe to factor out and doing so is profitable.
I noticed that LLVM has a related transformation in SimplifyCFG under
the sink-common-insts machinery. In GCC, I am trying to understand what
the most appropriate place would be for a similar optimization.
Two possible implementation directions seem plausible:
1. Extend the existing PHI factoring logic in
tree-ssa-phiopt.cc, especially around
factor_out_conditional_operation, to also handle suitable
side-effect-free builtins / calls such as sqrt under the
appropriate semantic restrictions.
So part of the problem with the above case at -Ofast is really
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=123113 .
I have a patch set which is mostly approved but I need to double check
on the order of basic-blocks being looked at.
The patches are:
https://inbox.sourceware.org/gcc-patches/[email protected]/
https://inbox.sourceware.org/gcc-patches/[email protected]/
https://patchwork.sourceware.org/project/gcc/patch/[email protected]/
In that order. (I can't seen to find the 3rd patch on inbox for some reason).
With those patches we get:
```
<bb 2> [local count: 1073741824]:
if (c1_2(D) <= 0.0)
goto <bb 3>; [41.00%]
else
goto <bb 4>; [59.00%]
<bb 3> [local count: 440234144]:
distance_7 = __builtin_sqrt (expr1_6(D)); [tail call]
goto <bb 7>; [100.00%]
<bb 4> [local count: 633507680]:
if (c1_2(D) >= c2_3(D))
goto <bb 5>; [50.00%]
else
goto <bb 6>; [50.00%]
<bb 5> [local count: 316753840]:
<bb 6> [local count: 633507680]:
# _8 = PHI <expr3_4(D)(4), expr2_5(D)(5)>
distance_10 = __builtin_sqrt (_8); [tail call]
<bb 7> [local count: 1073741824]:
# distance_1 = PHI <distance_10(6), distance_7(3)>
return distance_1;
```
Extending factor_* to handle the above could be done but I am not sure
phiopt is the correct place.
Note there is code in the ifcvt that does a similar thing as factor_*
in phiopt which then does handle the rest of it and for:
```
double f(double *expr1, double *expr2, double *expr3, double *c1, double *c2)
{
double t = 0.0;
for(int i =0 ;i < 1024; i++)
{
double distance = 0.0;
if (c1[i] <= 0.0)
distance = __builtin_sqrt(expr1[i]);
else if (c2[i] <= c1[i])
distance = __builtin_sqrt(expr2[i]);
else
distance = __builtin_sqrt(expr3[i]);
t+=distance;
}
return t;
}
```
With -mavx512f (since I was too lazy to figure out what expr* was and
too lazy to test non-x86_64 [since that is what I have built]) we get:
```
_7 = _34 ? _14 : _12;
_63 = _42 ? _6 : _7;
distance_17 = __builtin_sqrt (_63);
t_25 = distance_17 + t_29;
```
In ifcvt.
Which I assume is exactly what you were expecting.
2. Add a more general "common instruction sinking" transformation to
tree-ssa-sink.cc, analogous in spirit to the existing common-store
sinking support, but for a restricted class of pure scalar
computations.
My current intuition is that the PHIOPT direction may be a better fit
for this specific pattern, because the optimization is fundamentally
about factoring a common operation across PHI/merge structure rather
than ordinary code sinking. However, I would appreciate guidance from
the GCC community on the preferred design.
I am especially interested in feedback on:
* Whether tree-ssa-phiopt.cc is indeed the right place for this form of
transformation.
* Whether handling builtins such as sqrt is acceptable under the
relevant floating-point / errno / exception semantics.
* Whether this should be restricted to cases where the operation is
known to be side-effect-free and safe to move.
* Whether profitability should be guided primarily by code size,
vectorization opportunities, or target-independent cost heuristics.
* Whether there is existing work or a PR in this area that I should
build upon.
See above.
A motivating reduced example comes from a geometric distance kernel in
which three branch-local sqrt computations could ideally be turned into
one sqrt after value merging, improving the shape of the vectorized code.
See above.
Thanks,
Andrea
If this direction seems reasonable, I am interested in preparing a
prototype patch and test cases.
Thanks,
Guoce Feng