https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106585
--- Comment #19 from GCC Commits <cvs-commit at gcc dot gnu.org> --- The master branch has been updated by Jeff Law <[email protected]>: https://gcc.gnu.org/g:576fead3848b9288a914d9d10333de36ec791a57 commit r17-1855-g576fead3848b9288a914d9d10333de36ec791a57 Author: Milan Tripkovic <[email protected]> Date: Thu Jun 25 07:34:14 2026 -0600 [RISC-V][PR target/123884] Generate more Zbs instructions for RISC-V So between Milan's work and my own realization earlier this week, we can finally get resolution on pr123884 and pr106585. The core issue is for what appear to be natural looking single bit manipulations we often struggle to generate the bset, bclr or binv on riscv64. A few items get in the way. First, rv64 (of course) promotes sub-word objects (ie ints) into word sized objects. Second, we do have *some* instructions that work on sub-objects (such as shifts which sign extend out to 64 bits). Third, a 32-bit object on rv64 is supposed to be sign extended out to 64 bits when it "escapes" or gets used in a comparison. Finally, the Zbs instructions do not have forms which sign extend out to 64 bits if the SI sign bit is changed. Consider this reduced fragment from x264: int x264_macroblock_encode_p8x8(int dst, int a) { dst &= ~(1 << a); return dst; } The code going into ext-dce looks like this: > (insn 7 4 8 2 (set (reg:SI 141) > (const_int 1 [0x1])) "j.c":3:16 276 {*movsi_internal} > (nil)) > (insn 8 7 10 2 (set (reg:DI 142) > (sign_extend:DI (ashift:SI (reg:SI 141) > (subreg:QI (reg/v:DI 138 [ a ]) 0)))) "j.c":3:16 312 {ashlsi3_extend} > (expr_list:REG_DEAD (reg:SI 141) > (expr_list:REG_DEAD (reg/v:DI 138 [ a ]) > (expr_list:REG_EQUAL (sign_extend:DI (ashift:SI (const_int 1 [0x1]) > (subreg:QI (reg/v:DI 138 [ a ]) 0))) > (nil))))) > (insn 10 8 12 2 (set (reg:DI 143) > (not:DI (reg:DI 142))) "j.c":3:12 113 {one_cmpldi2} > (expr_list:REG_DEAD (reg:DI 142) > (nil))) > (insn 12 10 13 2 (set (reg:DI 145) > (and:DI (reg/v:DI 137 [ dst ]) > (reg:DI 143))) "j.c":3:9 104 {*anddi3} > (expr_list:REG_DEAD (reg:DI 143) > (expr_list:REG_DEAD (reg/v:DI 137 [ dst ]) > (nil)))) > (insn 13 12 18 2 (set (reg:DI 146) > (sign_extend:DI (subreg:SI (reg:DI 145) 0))) "j.c":4:12 discrim 1 125 {*extendsidi2_internal} > (expr_list:REG_DEAD (reg:DI 145) > (nil))) That'll end up generating something like: > li a5,1 # 7 [c=4 l=4] *movsi_internal/1 > sllw a1,a5,a1 # 8 [c=8 l=4] ashlsi3_extend > not a1,a1 # 10 [c=4 l=4] one_cmpldi2 > and a0,a0,a1 # 18 [c=4 l=4] *anddi3/0 That seems almost tailor made for bclr. Except for the possibility that we're clearing bit 31. THe RTL above would clear bits 31..63. bclr just clears one bit. ext-dce comes along and realizes that insn 13 is redundant and removes it. But we still have the problem that the RTL would clear bits 31..63 if shift count in (reg:DI 138) is 31 and thus trying to combine the result and generate a bclr is wrong. We had the idea that we could follow the bclr with a sign extension. That works for this testcase, but is wrong in general. The insight from earlier this week is the semantics of bclr+sext work when (reg:DI 137) has 33 or more sign bit copies. In that limited, but common, case the semantics of bclr+sext match the RTL exactly. Just as important the output is 2 insns, so we can model it as a simple define_split rather than a define_insn_and_split. So I've adjusted Milan's patch to check the number of sign bit copies in the object where we want to clear a single bit. If the number of sign bit copies is 33 or more, then we allow the splitter to trigger. The net is we get: > bclr a0,a0,a1 # 10 [c=4 l=4] *bclrdi > sext.w a0,a0 # 18 [c=4 l=4] *extendsidi2_internal/0 The same idea works for bset and binv. This has been bootstrapped and regression tested on both the c920 (where it should do nothing, no Zbs extension) and the K3 (where I turn on Zbs by default). It's also been regression tested on riscv32-elf and riscv64-elf. I'll obviously wait for pre-commit CI to run before moving forward. But I think we've finally got this issue nailed down. PR target/123884 PR target/106585 gcc/ * config/riscv/bitmanip.md: Add new splits for single bit manipulation cases followed by a sign extension. gcc/testsuite * gcc.target/riscv/pr123884-a.c: New test. * gcc.target/riscv/pr123884-b.c: New test. * gcc.target/riscv/pr123884-c.c: New test. Co-authored-by: Jeff Law <[email protected]>
