https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106585

--- Comment #19 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jeff Law <[email protected]>:

https://gcc.gnu.org/g:576fead3848b9288a914d9d10333de36ec791a57

commit r17-1855-g576fead3848b9288a914d9d10333de36ec791a57
Author: Milan Tripkovic <[email protected]>
Date:   Thu Jun 25 07:34:14 2026 -0600

    [RISC-V][PR target/123884] Generate more Zbs instructions for RISC-V

    So between Milan's work and my own realization earlier this week, we can
    finally get resolution on pr123884 and pr106585.

    The core issue is for what appear to be natural looking single bit
    manipulations we often struggle to generate the bset, bclr or binv on
riscv64.

    A few items get in the way.  First, rv64 (of course) promotes sub-word
objects
    (ie ints) into word sized objects.  Second, we do have *some* instructions
that
    work on sub-objects (such as shifts which sign extend out to 64 bits). 
Third,
    a 32-bit object on rv64 is supposed to be sign extended out to 64 bits when
it
    "escapes" or gets used in a comparison.  Finally, the Zbs instructions do
not
    have forms which sign extend out to 64 bits if the SI sign bit is changed.

    Consider this reduced fragment from x264:

    int x264_macroblock_encode_p8x8(int dst, int a)
    {
        dst &= ~(1 << a);
        return dst;
    }

    The code going into ext-dce looks like this:

    > (insn 7 4 8 2 (set (reg:SI 141)
    >         (const_int 1 [0x1])) "j.c":3:16 276 {*movsi_internal}
    >      (nil))
    > (insn 8 7 10 2 (set (reg:DI 142)
    >         (sign_extend:DI (ashift:SI (reg:SI 141)
    >                 (subreg:QI (reg/v:DI 138 [ a ]) 0)))) "j.c":3:16 312
{ashlsi3_extend}
    >      (expr_list:REG_DEAD (reg:SI 141)
    >         (expr_list:REG_DEAD (reg/v:DI 138 [ a ])
    >             (expr_list:REG_EQUAL (sign_extend:DI (ashift:SI (const_int 1
[0x1])
    >                         (subreg:QI (reg/v:DI 138 [ a ]) 0)))
    >                 (nil)))))
    > (insn 10 8 12 2 (set (reg:DI 143)
    >         (not:DI (reg:DI 142))) "j.c":3:12 113 {one_cmpldi2}
    >      (expr_list:REG_DEAD (reg:DI 142)
    >         (nil)))
    > (insn 12 10 13 2 (set (reg:DI 145)
    >         (and:DI (reg/v:DI 137 [ dst ])
    >             (reg:DI 143))) "j.c":3:9 104 {*anddi3}
    >      (expr_list:REG_DEAD (reg:DI 143)
    >         (expr_list:REG_DEAD (reg/v:DI 137 [ dst ])
    >             (nil))))
    > (insn 13 12 18 2 (set (reg:DI 146)
    >         (sign_extend:DI (subreg:SI (reg:DI 145) 0))) "j.c":4:12 discrim 1
125 {*extendsidi2_internal}
    >      (expr_list:REG_DEAD (reg:DI 145)
    >         (nil)))
    That'll end up generating something like:

    >         li      a5,1            # 7  [c=4 l=4]  *movsi_internal/1
    >         sllw    a1,a5,a1        # 8     [c=8 l=4] ashlsi3_extend
    >         not     a1,a1   # 10    [c=4 l=4]  one_cmpldi2
    >         and     a0,a0,a1        # 18    [c=4 l=4]  *anddi3/0

    That seems almost tailor made for bclr.  Except for the possibility that
we're
    clearing bit 31.  THe RTL above would clear bits 31..63.  bclr just clears
one
    bit.

    ext-dce comes along and realizes that insn 13 is redundant and removes it. 
But
    we still have the problem that the RTL would clear bits 31..63 if shift
count
    in (reg:DI 138) is 31 and thus trying to combine the result and generate a
bclr
    is wrong.

    We had the idea that we could follow the bclr with a sign extension.  That
    works for this testcase, but is wrong in general.

    The insight from earlier this week is the semantics of bclr+sext work when
    (reg:DI 137) has 33 or more sign bit copies.  In that limited, but common,
case
    the semantics of bclr+sext match the RTL exactly.  Just as important the
output
    is 2 insns, so we can model it as a simple define_split rather than a
    define_insn_and_split.

    So I've adjusted Milan's patch to check the number of sign bit copies in
the
    object where we want to clear a single bit.  If the number of sign bit
copies
    is 33 or more, then we allow the splitter to trigger.  The net is we get:

    >         bclr    a0,a0,a1        # 10 [c=4 l=4]  *bclrdi
    >         sext.w  a0,a0   # 18    [c=4 l=4] *extendsidi2_internal/0

    The same idea works for bset and binv.

    This has been bootstrapped and regression tested on both the c920 (where it
    should do nothing, no Zbs extension) and the K3 (where I turn on Zbs by
    default).  It's also been regression tested on riscv32-elf and riscv64-elf.
    I'll obviously wait for pre-commit CI to run before moving forward.  But I
    think we've finally got this issue nailed down.

            PR target/123884
            PR target/106585
    gcc/
            * config/riscv/bitmanip.md: Add new splits for single bit
manipulation
            cases followed by a sign extension.

    gcc/testsuite
            * gcc.target/riscv/pr123884-a.c: New test.
            * gcc.target/riscv/pr123884-b.c: New test.
            * gcc.target/riscv/pr123884-c.c: New test.

    Co-authored-by: Jeff Law  <[email protected]>

Reply via email to