[Bug rtl-optimization/115506] Possible but missed "cmp" instruction merging (x86 & ARM, optimization)

2024-06-17 Thread Explorer09 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115506

--- Comment #3 from Kang-Che Sung ---
I'm not sure if this helps, but the idea is to recognize a three-way comparison
as a special case.

My code was originally written in this ordering:

```c
if (x < c) {
  do_action_a();
} else if (x == c) {
  do_action_b();
} else {
  do_action_c();
}
```

But it should work no differently from this:

```c
if (x == c) {
  do_action_b();
} else if (x < c) {
  do_action_a();
} else {
  do_action_c();
}
```

Or this:

```c
if (x == c) {
  do_action_b();
} else if (x <= c) {
  do_action_a();
} else {
  do_action_c();
}
```

Or even this:

```c
if (x >= c) {
  if (x == c) {
    do_action_b();
  } else {
    do_action_c();
  }
} else {
  do_action_a();
}
```
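
To check that the four orderings above really are interchangeable, here is a minimal sketch that replaces the `do_action_*()` calls with a returned letter and compares all four forms over a small range. The function names and the `int` operand type are assumptions made for illustration; they are not from the original testcase.

```c
#include <assert.h>

/* Each helper returns 'a', 'b' or 'c' instead of calling do_action_a/b/c. */

/* Original ordering: less-than first, then equality. */
char form_lt_first(int x, int c) {
  if (x < c) return 'a';
  else if (x == c) return 'b';
  else return 'c';
}

/* Equality first, then strict less-than. */
char form_eq_then_lt(int x, int c) {
  if (x == c) return 'b';
  else if (x < c) return 'a';
  else return 'c';
}

/* Equality first, then less-or-equal (equality is already excluded). */
char form_eq_then_le(int x, int c) {
  if (x == c) return 'b';
  else if (x <= c) return 'a';
  else return 'c';
}

/* Greater-or-equal first, with the equality test nested inside. */
char form_ge_nested(int x, int c) {
  if (x >= c) {
    if (x == c) return 'b';
    else return 'c';
  } else {
    return 'a';
  }
}

/* Compare every form against the original ordering over a small range. */
void check_forms_agree(void) {
  for (int c = -4; c <= 4; c++) {
    for (int x = -8; x <= 8; x++) {
      char expected = form_lt_first(x, c);
      assert(form_eq_then_lt(x, c) == expected);
      assert(form_eq_then_le(x, c) == expected);
      assert(form_ge_nested(x, c) == expected);
    }
  }
}
```

Calling `check_forms_agree()` from a test driver should complete without tripping an assertion, which is the sense in which the compiler is free to treat all four as the same three-way comparison.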

2024-06-17 Thread ubizjak at gmail dot com via Gcc-bugs

--- Comment #2 from Uroš Bizjak ---
For the original testcase, the tree optimizers produce:

   [local count: 114863530]:
  _30 = _2 & 240;
  if (_30 == 224)
    goto ; [34.00%]
  else
    goto ; [66.00%]

   [local count: 75809929]:
  if (_30 <= 223)
    goto ; [50.00%]
  else
    goto ; [50.00%]


and for the /* NOTE 1 */ workaround:

   [local count: 114863530]:
  _30 = _2 & 240;
  if (_30 == 224)
    goto ; [34.00%]
  else
    goto ; [66.00%]

   [local count: 75809929]:
  if (_30 <= 224)
    goto ; [50.00%]
  else
    goto ; [50.00%]


If the tree optimizer didn't over-optimize the original case and left:

   [local count: 75809929]:
  if (_30 < 224)
    goto ; [50.00%]
  else
    goto ; [50.00%]

then the RTL CSE2 pass would be able to merge:

(insn 31 30 32 4 (set (reg:CCZ 17 flags)
        (compare:CCZ (reg:QI 111 [ _30 ])
            (const_int -32 [0xffe0]))) "pr115506.c":11:8 9 {*cmpqi_1}
     (nil))

and

(insn 36 33 37 5 (set (reg:CC 17 flags)
        (compare:CC (reg:QI 111 [ _30 ])
            (const_int -33 [0xffdf]))) "pr115506.c":14:15 9 {*cmpqi_1}
     (expr_list:REG_DEAD (reg:QI 111 [ _30 ])
        (nil)))

Is there a way to avoid this over-optimization in the tree optimizers? The RTL
part has no way to update the flags user during the CSE2 pass:

(jump_insn 37 36 38 5 (set (pc)
        (if_then_else (gtu (reg:CC 17 flags)
                (const_int 0 [0]))
            (label_ref:DI 90)
            (pc))) "pr115506.c":14:15 1130 {*jcc}
     (expr_list:REG_DEAD (reg:CC 17 flags)
        (int_list:REG_BR_PROB 536870916 (nil))))
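
As a sanity check on the GIMPLE forms above, here is a minimal sketch (not GCC code) showing that `_30 <= 223` and `_30 < 224` select the same values, and that the workaround's `_30 <= 224` agrees with them once the `_30 == 224` case has been taken out. The forms differ only in the constant handed to the compare. The sketch assumes `_2` is an 8-bit unsigned value, which the QImode registers in the RTL suggest but the dumps do not state; the function names are made up for illustration.

```c
#include <assert.h>

/* Second-branch conditions as they appear in the GIMPLE dumps. */
int below_original(unsigned char v)   { return v <= 223; } /* original testcase */
int below_workaround(unsigned char v) { return v <= 224; } /* NOTE 1 workaround */
int below_wanted(unsigned char v)     { return v < 224; }  /* form CSE2 could merge */

/* Walk every 8-bit value of _2 (assumed width) through the mask. */
void check_constants_equivalent(void) {
  for (int raw = 0; raw <= 255; raw++) {
    unsigned char masked = (unsigned char)(raw & 240); /* _30 = _2 & 240 */

    /* <= 223 and < 224 agree for every integer value. */
    assert(below_original(masked) == below_wanted(masked));

    /* <= 224 agrees only because the == 224 case is handled first. */
    if (masked != 224)
      assert(below_workaround(masked) == below_wanted(masked));
  }
}
```

So the rewrite to `<= 223` is correct as far as values go; the cost is that the second compare no longer uses the same constant as the first, which is what stops the two compares from being recognized as one.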

2024-06-16 Thread rguenth at gcc dot gnu.org via Gcc-bugs

Richard Biener changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|middle-end                  |rtl-optimization

--- Comment #1 from Richard Biener ---
On GIMPLE this cannot be realized, since we do not represent condition codes
there, so this is ideally dealt with on RTL.
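
To make "condition codes" concrete, the sketch below models an 8-bit `cmp` as a function that returns two flags, then dispatches three ways from that single result. This is a hypothetical model for illustration only: `struct flags`, `cmp8` and `three_way_one_cmp` are invented names, and only ZF and CF (set the way x86 sets them for an unsigned compare) are modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the two x86 flags that matter for this case. */
struct flags {
  int zf; /* zero flag: operands were equal */
  int cf; /* carry flag: unsigned borrow, i.e. a < b */
};

/* Model of `cmp a, b` on 8-bit unsigned operands. */
struct flags cmp8(uint8_t a, uint8_t b) {
  struct flags f = { a == b, a < b };
  return f;
}

/* Three-way dispatch from one compare: 'a' (<), 'b' (==), 'c' (>). */
char three_way_one_cmp(uint8_t x, uint8_t c) {
  struct flags f = cmp8(x, c); /* the only compare */
  if (f.zf) return 'b';        /* like je */
  if (f.cf) return 'a';        /* like jb */
  return 'c';                  /* fall through: above */
}

/* Check the flag-based dispatch against plain C comparisons. */
void check_three_way(void) {
  for (int x = 0; x <= 255; x++) {
    for (int c = 0; c <= 255; c++) {
      char expected = (x < c) ? 'a' : (x == c) ? 'b' : 'c';
      assert(three_way_one_cmp((uint8_t)x, (uint8_t)c) == expected);
    }
  }
}
```

The point of the model is that both branch decisions read the same flags value, so a representation without flags has no single object that the two conditions can share.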