On Mon, Dec 15, 2025 at 8:30 AM Kishan Parmar <[email protected]> wrote:
>
> Hi Andrew,
>
> Thanks for the feedback.
>
> Here is a reduced testcase derived from a bitfield insertion scenario.
>
> static inline unsigned
> rot_insert (unsigned x, unsigned y, unsigned n, unsigned mb, unsigned me)
> {
> if (n)
> x = (x << n) | (x >> (32 - n));
>
> unsigned s = -1;
> if (n)
> s = (s << n) | (s >> (32 - n));
>
> unsigned mask = 0;
> mask += 1U << (31 - mb);
> mask += 1U << (31 - mb);
> mask -= 1U << (31 - me);
> mask -= (mb > me);
>
> if (mask & ~s)
> return 12345 * y;
>
> return (x & mask) | (y & ~mask);
> }
>
> unsigned foo (unsigned x, unsigned y)
> {
> return rot_insert (x, y, 1, 31, 31);
> }
>
> I’ve tested on Power10 and on my local Apple M3 pro, which i assume has Arm
> v8.6-A ISA.
> If we allow constants (remove INTEGER_CST check), we regress on both.
>
> Assembly Comparison
> ————————————
>
> I we include INTEGER_CST check and don’t allow constants, we will have XOR
> AND XOR form.. which will produce below asms.
Note you have the same issue with the following C example before this
patch (just after you get with the original form too):
```
unsigned foo0 (unsigned x, unsigned y)
{
unsigned x1 = x >> 31;
unsigned yy = y & ~1;
return x1 | yy;
}
```
That was something I was trying to point out.
And why having the check for INTEGER_CST is just exchanging one issue
for another. It does not solve the new issue here.
Wait, I think we are losing/not using the SUBREG_PROMOTED_UNSIGNED_P
fact here which is causing the issues really.
We lost it when we did the original 2->7 combine (for foo0 above):
```
Trying 2 -> 7:
2: r121:DI=r128:DI
REG_DEAD r128:DI
REG_EQUIV [ap:DI+0x30]
7: r124:SI=r121:DI#4 0>>0x1f
REG_DEAD r121:DI
Failed to match this instruction:
(set (subreg:DI (reg:SI 124 [ x1_2 ]) 0)
(zero_extract:DI (reg:DI 128 [ x+-4 ])
(const_int 32 [0x20])
(const_int 1 [0x1])))
Successfully matched this instruction:
(set (subreg:DI (reg:SI 124 [ x1_2 ]) 0)
(and:DI (lshiftrt:DI (reg:DI 128 [ x+-4 ])
(const_int 31 [0x1f]))
(const_int 4294967295 [0xffffffff])))
```
insn 7 was:
```
(insn 7 4 9 2 (set (reg:SI 124 [ x1_2 ])
(lshiftrt:SI (subreg/s/v:SI (reg/v:DI 121 [ x+-4 ]) 4)
(const_int 31 [0x1f]))) "/app/example.cpp":30:12 292 {lshrsi3}
(expr_list:REG_DEAD (reg/v:DI 121 [ x+-4 ])
(nil)))
```
insn 2 was just move.
Note /s/v on the subreg there.
If we didn't lose the SUBREG_PROMOTED_UNSIGNED_P, then things would just work.
So you need to figure out how to get
So again the INTEGER_CST is just working around an issue down the line
and not really a good idea; in the case of ppc, the losing of the
SUBREG_PROMOTED_UNSIGNED_P.
Thanks,
Andrew Pinski
>
> Power10
>
> foo:
> .LFB1:
> .cfi_startproc
> .localentry foo,1
> rlwimi 4,3,32-31,31,31
> rldicl 3,4,0,32
> blr
>
> Aarch64
>
> foo:
> LFB1:
> bfxil w1, w0, 31, 1
> mov w0, w1
> ret
>
> And if we remove INTEGER_CST check and allow constants, we will have IOR AND
> ANDN form., which produces below asm.
>
> Power10
> +2 extra instructions
> foo:
> .LFB1:
> .cfi_startproc
> .localentry foo,1
> rlwinm 4,4,0,0,30
> srwi 3,3,31
> or 3,3,4
> rldicl 3,3,0,32
> blr
>
> Aarch64:
> Does not match Bitfield extract and insert semantics.
> foo:
> LFB1:
> and w1, w1, -2
> orr w0, w1, w0, lsr 31
> ret
>
>
> For Aarch64, i noticed combine generating below pattern failed to match,
>
> (set (reg/i:SI 0 x0)
> (ior:SI (and:SI (reg:SI 112 [ y ])
> (const_int -2 [0xfffffffffffffffe]))
> (lshiftrt:SI (reg:SI 111 [ x ])
> (const_int 31 [0x1f]))))
>
> For which we can write a pattern for bfxil which can emit “bfxil w0, w112,
> #31, #1” maybe?
>
> But for RS6000, We hit another issue.
> We try to merge 3 instructions,
>
> i1 : (set (reg:SI 126 [ _8 ])
> (lshiftrt:SI (subreg:SI (reg:DI 130 [ x ]) 0)
> (const_int 31 [0x1f])))
>
> i2 : (set (reg:SI 127)
> (and:SI (subreg:SI (reg:DI 131 [ y ]) 0)
> (const_int -2 [0xfffffffffffffffe])))
>
> to,
> i3 : (set (reg:SI 124 [ _7 ])
> (ior:SI (reg:SI 126 [ _8 ])
> (reg:SI 127)))
>
> This could have matched just fine, if and only if i1 was the same it was.
> But here source of i1,
>
> (lshiftrt:SI (subreg:SI (reg:DI 130 [ x ]) 0)
> (const_int 31 [0x1f]))
>
> gets transformed to
>
> (subreg:SI (lshiftrt:DI (reg:DI 130 [ x ])
> (const_int 31 [0x1f])) 0).
>
> Because reason being combine performs shift in wider mode.
> And later simplify-rtx tries to simplify further and the rtx we endup having,
> (set (reg:SI 124 [ _7 ])
> (ior:SI (and:SI (subreg:SI (reg:DI 131 [ y ]) 0)
> (const_int -2 [0xfffffffffffffffe]))
> (subreg:SI (zero_extract:DI (reg:DI 130 [ x ])
> (const_int 32 [0x20])
> (const_int 1 [0x1])) 0)))
>
> zero_extract, is not allowed in rs6000 backend,
> After we try to revert/replace compound zero_extract to normal basic rtx
> operations,
> we endup having non-matchable code.
> (set (reg:SI 124 [ _7 ])
> (ior:SI (and:SI (subreg:SI (reg:DI 131 [ y ]) 0)
> (const_int -2 [0xfffffffffffffffe]))
> (subreg:SI (and:DI (lshiftrt:DI (reg:DI 130 [ x ])
> (const_int 31 [0x1f]))
> (const_int 4294967295 [0xffffffff])) 0))).
>
> Because the shift logic is now wrapped in (subreg:SI (and:DI (lshiftrt:DI
> ...))),
> the backend patterns for rlwimi (which expect clear SImode operations) no
> longer match.
>
> >> This is the part I am not getting. Specifically the whole `+` part and
> >> which instructions are being combined.
> >>
> >> Is it that we originally had:
> >> A = RRotate(P, N)
> >> T = (A & C)
> >> T1 = (B & ~C)
> >> R = T | T1
> >>
> >> And then we get:
> >> T = lshiftrt (P, N)
> >> T1 = B & ~C
> >> R = T | T1
> >>
> >> Which then is not recognized as inserting P into B starting at N for
> >> the bitmask ?
> >> But can't that show up even without the andn pattern?
> >> Which means there might be a missing rs6000 pattern for this case?
> >>
> >> Thanks,
> >> Andrew
> We do have a pattern for that,
> 4359 (define_insn "*rotlsi3_insert_4"
> 4360 [(set (match_operand:SI 0 "gpc_reg_operand" "=r")
> 4361 (ior:SI (and:SI (match_operand:SI 3 "gpc_reg_operand" "0")
> 4362 (match_operand:SI 4 "const_int_operand" "n"))
> 4363 (lshiftrt:SI (match_operand:SI 1 "gpc_reg_operand" "r")
> 4364 (match_operand:SI 2 "const_int_operand"
> "n"))))]
> 4365 "INTVAL (operands[2]) + exact_log2 (-UINTVAL (operands[4])) == 32"
> 4366 "rlwimi %0,%1,32-%h2,%h2,31"
> 4367 [(set_attr "type" "insert")])
>
> If we have a form XOR-AND-XOR, which gets transformed to above pattern just
> fine via simplify-rtx.cc.
> 4052 /* If we have (xor (and (xor A B) C) A) with C a constant we can
> instead
> 4053 do (ior (and A ~C) (and B C)) which is a machine instruction on
> some
> 4054 machines, and also has shorter instruction path length. */
>
>
> Since simplify-rtx handles the XOR-form correctly, restricting this GIMPLE
> patch to variables seemed the safest path to avoid these specific Combine
> side-effects.
>
> I have attached the combine dumps for reference. I will also work on adding
> the GIMPLE testcases and target-supports entries as you requested.
>
> Thanks,
> Kishan
>
>