https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113682
Tamar Christina changed:
What|Removed |Added
Status|UNCONFIRMED |NEW
Ever confirmed|0 |1
Component|middle-end |rtl-optimization
Last reconfirmed||2024-04-02
--- Comment #7 from Tamar Christina ---
Ok, so for now I think we should separate out the issue of whether/when we
should perform the ifconversion and instead for a bit look at why we miss these
conversions:
The first problem is that GCC does not support if-converting blocks which
terminate into another branch. In other words, the destination of 1 of the
arms must have a single successor per noce_find_if_block which requires the
final block to be a JOIN block.
In
size_t bsearch2(ITEM* user, ITEM** tree, size_t treeSize) {
auto pos = tree;
size_t low = 0;
size_t high = treeSize;
while (low < high) {
size_t i = (low + high) / 2;
int res = cmp(user, tree[i]);
// These should be cmp + csinc + csel on arm
// and lea + test + cmov + cmov on x86.
low = res > 0 ? i + 1 : low; // csinc
high = res < 0 ? i: high; // csel
if (res == 0) return i;
}
return -1;
}
this fails because of the:
if (res == 0) return i;
which means that the final block doesn't adhere to this. so the cbranch blocks
the ifconversion.
if we simplify the testcase:
typedef unsigned long size_t;
extern int cmp(size_t *);
size_t bsearch2(size_t high, size_t low, size_t i) {
while (low < high) {
int res = cmp();
low = res > 0 ? i + 1 : low;
high = res < 0 ? i: high;
}
return -1;
}
we now detect the csel but not the csinc. This has two reasons, we visit the BB
in entry->exit order, but we probably should do exit->entry, such that when we
if-convert blocks and delete the branches, the preceeding blocks become valid
to if-convert.
but the main reason the csinc is missed is because of code sinking. The update
of the reference is sank into the res > 0 branch.
[local count: 955630224]:
# high_18 = PHI
# low_19 = PHI
res_11 = cmp ();
if (res_11 > 0)
goto ; [59.00%]
else
goto ; [41.00%]
[local count: 563821836]:
i.1_1 = i;
iftmp.0_12 = i.1_1 + 1;
goto ; [100.00%]
This is because the compiler (righfully) decides that on the dereference should
only happen in the branch that needs it. Because of this block 4 is no longer
if-convertable. However since the load is to the stack we are allowed to move
it around technically speaking. But this is only beneficial is the branch is
really unpredictable.
To show that it is the sinking, we can avoid it by using -fdisable-tree-sink2
and this example:
size_t bsearch2(size_t high, size_t low, size_t *iv, int *resv) {
while (low < high) {
int res = cmp (iv);
size_t i = *iv;
low = res > 0 ? i + 1 : low;
}
return -1;
}
which generates the csinc.
However the following example again doesn't generate the csinc for GCC but does
for LLVM:
size_t bsearch3(size_t high, size_t low, size_t i, int res) {
while (low < high) {
asm volatile ("" : "=w"(i));
asm volatile ("" : "=w"(res));
low = res > 0 ? i + 1 : low;
}
return -1;
}
which is because of the change header pass. GCC optimizes this scalar code
from:
size_t bsearch3 (size_t high, size_t low, size_t i, int res)
{
size_t iftmp.0_7;
[local count: 59055799]:
goto ; [100.00%]
[local count: 1014686024]:
__asm__ __volatile__("" : "=w" i_5);
__asm__ __volatile__("" : "=w" res_6);
if (res_6 > 0)
goto ; [5.50%]
else
goto ; [94.50%]
[local count: 55807731]:
iftmp.0_7 = i_5 + 1;
[local count: 114863530]:
# low_1 = PHI
[local count: 1073741824]:
if (low_1 < high_3(D))
goto ; [94.50%]
else
goto ; [5.50%]
[local count: 59055800]:
return 18446744073709551615;
}
into
size_t bsearch3 (size_t high, size_t low, size_t i, int res)
{
size_t iftmp.0_7;
[local count: 59055799]:
if (low_2(D) < high_3(D))
goto ; [48.59%]
else
goto ; [51.41%]
[local count: 958878294]:
__asm__ __volatile__("" : "=w" i_5);
__asm__ __volatile__("" : "=w" res_6);
if (res_6 > 0)
goto ; [5.50%]
else
goto ; [94.50%]
[local count: 55807731]:
# i_10 = PHI
iftmp.0_7 = i_10 + 1;
if (high_3(D) > iftmp.0_7)
goto ; [48.59%]
else
goto ; [51.41%]
[local count: 55807730]:
__asm__ __volatile__("" : "=w" i_8);
__asm__ __volatile__("" : "=w" res_9);
if (res_9 > 0)
goto ; [5.50%]
else
goto ; [94.50%]
[local count: 59055800]:
return 18446744073709551615;
}
which then hits the limitation of the single pred I mentioned before.
using -fdisable-tree-ch2 we get:
add x2, x2, 1
cselx1,