On 2025/12/4 22:23, Richard Biener wrote:
On Thu, Dec 4, 2025 at 9:56 AM Dongyan Chen
<[email protected]> wrote:
Hi,
Following the previous discussion, I implemented the transformation in expr.cc.
This location allows access to tree-level type information while still
enabling queries to target-specific costs. However, I have some concerns
regarding the cost comparison logic. I am currently comparing the cost
of the multiplication directly against the sum of the decomposed
logical operations.
Does this cost heuristic seem reasonable to you?

Thanks and regards,
Dongyan

This patch implements an optimization to transform (a * b) == 0 to
(a == 0) || (b == 0) and (a * b != 0) to (a != 0) && (b != 0)
for signed and unsigned integers.
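
As a sanity check of the identity above, here is a minimal sketch in C that compares both forms over a small operand range. The helper name `count_mismatches` and the cap of 1000 are hypothetical and chosen only for illustration; the range is deliberately small so that `a * b` never overflows, which means the sketch says nothing about wrapping cases.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Count operand pairs in [-limit, limit] where the product test and the
   decomposed test disagree.  LIMIT is capped so a * b cannot overflow.  */
static int
count_mismatches (int32_t limit)
{
  assert (limit >= 0 && limit <= 1000);
  int mismatches = 0;
  for (int32_t a = -limit; a <= limit; a++)
    for (int32_t b = -limit; b <= limit; b++)
      {
        bool eq_mul = (a * b == 0);
        bool eq_split = (a == 0 || b == 0);
        bool ne_mul = (a * b != 0);
        bool ne_split = (a != 0 && b != 0);
        if (eq_mul != eq_split || ne_mul != ne_split)
          mismatches++;
      }
  return mismatches;
}
```

A return value of zero means the two forms agreed on every pair in the tested range.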

         PR target/122935

+  machine_mode mode = TYPE_MODE (type);
+  rtx reg = gen_raw_REG (mode, LAST_VIRTUAL_REGISTER + 1);
+  rtx mult_rtx = gen_rtx_MULT (mode, reg, reg);
+  int mult_cost = set_src_cost (mult_rtx, mode, speed_p);
+
+  int logic_cost = 0;
+  int cmp_cost = 0;
+  int logic_op_cost = 0;
+
+  if (comp_code == EQ_EXPR)
+    {
+      rtx eq_rtx = gen_rtx_EQ (mode, reg, const0_rtx);
+      cmp_cost = set_src_cost (eq_rtx, mode, speed_p);
+      rtx ior_rtx = gen_rtx_IOR (mode, reg, reg);
+      logic_op_cost = set_src_cost (ior_rtx, mode, speed_p);
+      logic_cost = 2 * cmp_cost + logic_op_cost;
+    }
+  else /* NE_EXPR */
+    {
+      rtx ne_rtx = gen_rtx_NE (mode, reg, const0_rtx);
+      cmp_cost = set_src_cost (ne_rtx, mode, speed_p);
+      rtx and_rtx = gen_rtx_AND (mode, reg, reg);
+      logic_op_cost = set_src_cost (and_rtx, mode, speed_p);
+      logic_cost = 2 * cmp_cost + logic_op_cost;
+    }
Can you check what AVR does for the above?  Esp. when
mode is bigger than word_mode.


I tested the patch on AVR as requested.

With -O3, the optimization triggers, replacing the slow __mulsi3 library call with the faster logical check sequence. With -Oz, the optimization is correctly rejected to prioritize code size.

This indicates that the cost model query via set_src_cost is working as intended.
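
The decision itself can be sketched in plain C as follows. This is only an illustration under assumptions: the struct `zero_test_costs` and the function `prefer_decomposed` are hypothetical names, not part of GCC, and the strict less-than comparison is an assumption, since the quoted hunk stops before the final comparison. The fields stand in for the three values the patch obtains from set_src_cost.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the three costs the patch queries via
   set_src_cost; the field names are illustrative, not GCC's.  */
struct zero_test_costs
{
  int mult;      /* cost of the MULT rtx */
  int cmp;       /* cost of one EQ/NE comparison against zero */
  int logic_op;  /* cost of the combining IOR/AND */
};

/* Return true when two comparisons plus one logical operation are
   strictly cheaper than the multiplication (assumed tie-breaking).  */
static bool
prefer_decomposed (const struct zero_test_costs *c)
{
  int logic_cost = 2 * c->cmp + c->logic_op;
  return logic_cost < c->mult;
}
```

With made-up costs of 20 for the multiplication and 4 for each of the other operations, the decomposed form costs 12 and would be chosen; with a multiplication cost of 4 it would not. These numbers are placeholders, not the costs any target actually reports.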

I am currently reviewing the other comments and will address them in the next patch. It might take a little time.

```c
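#include <stdbool.h>
#include <stdint.h>

/* Test case: the product of two 32-bit operands compared against zero.  */
bool foo1(int32_t a, int32_t b) { return ((int32_t)a * b) == 0; }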
```

```-O3
foo1:
    push r28
    push r29
/* prologue: function */
/* frame size = 0 */
/* stack size = 2 */
.L__stack_usage = 2
    mov r28,r22
    mov r29,r23
    mov r30,r24
    mov r31,r25
    ldi r25,lo8(1)
    or r28,r29
    or r28,r30
    or r28,r31
    breq .L2
    ldi r25,0
.L2:
    ldi r24,lo8(1)
    or r18,r19
    or r18,r20
    or r18,r21
    breq .L3
    ldi r24,0
.L3:
    or r24,r25
/* epilogue start */
    pop r29
    pop r28
    ret
    .size   foo1, .-foo1

```

```-Oz
foo1:
    push r28
    push r29
/* prologue: function */
/* frame size = 0 */
/* stack size = 2 */
.L__stack_usage = 2
    mov r28,r22
    mov r29,r23
    mov r30,r24
    mov r31,r25
    mov r22,r18
    mov r23,r19
    mov r24,r20
    mov r25,r21
    mov r18,r28
    mov r19,r29
    mov r20,r30
    mov r21,r31
    rcall __mulsi3
    mov r20,r22
    mov r21,r23
    mov r22,r24
    mov r23,r25
    ldi r24,lo8(1)
    or r20,r21
    or r20,r22
    or r20,r23
    breq .L2
    ldi r24,0
.L2:
/* epilogue start */
    pop r29
    pop r28
    ret
    .size   foo1, .-foo1

```

Thanks,

Dongyan


