On 10/23/2025 10:43 AM, Nikita Biryukov wrote:
While investigating Zicond extension code generation on RISC-V, I identified
several cases where GCC (trunk) generates suboptimal code due to premature 
if-conversion.

Consider the following test case:
CFLAGS: -march=rv64gc_zicond -mabi=lp64d -O2

int test_IOR_ceqz_x (int x, int z, int c)
{
   if (c)
     x = x | z;
   return x;
}

Before the patch:
   or      a1,a0,a1
   czero.eqz a1,a1,a2
   czero.nez a0,a0,a2
   add     a0,a0,a1
   ret

The issue occurs when ifcvt encounters the following RTL pattern:
   (set reg1 (ior:DI (reg2:DI) (reg3:DI)))
   (set reg4 (sign_extend:DI (subreg:SI (reg1:DI))))

When reg1 is no longer used, this expression could be simplified. However,
noce_convert_multiple_sets converts the block early, preventing combine from
optimizing the pattern.

This patch adds checks to bb_ok_for_noce_convert_multiple_sets to detect
such sign/zero extension patterns and reject noce_convert_multiple_sets when
combine has not yet run. This allows combine to simplify the expressions,
resulting in better code generation during the second ifcvt pass.

To minimize false positives, the additional checks only apply before the
combine pass.

Generated code for test_IOR_ceqz_x after the patch:
   czero.eqz a2,a1,a2
   or       a0,a0,a2
   ret
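
For readers less familiar with Zicond, the two sequences above can be modelled in plain C to check that they compute the same value. This is only a sketch: the helper names (czero_eqz, czero_nez, seq_before, seq_after) are made up for illustration, and registers are modelled as 64-bit integers.

```c
#include <stdint.h>

/* czero.eqz rd, rs1, rs2: rd = (rs2 == 0) ? 0 : rs1 */
static int64_t czero_eqz(int64_t rs1, int64_t rs2)
{
    return rs2 == 0 ? 0 : rs1;
}

/* czero.nez rd, rs1, rs2: rd = (rs2 != 0) ? 0 : rs1 */
static int64_t czero_nez(int64_t rs1, int64_t rs2)
{
    return rs2 != 0 ? 0 : rs1;
}

/* Model of the five-instruction sequence (a0 = x, a1 = z, a2 = c). */
static int64_t seq_before(int64_t a0, int64_t a1, int64_t a2)
{
    a1 = a0 | a1;            /* or        a1,a0,a1 */
    a1 = czero_eqz(a1, a2);  /* czero.eqz a1,a1,a2 */
    a0 = czero_nez(a0, a2);  /* czero.nez a0,a0,a2 */
    return a0 + a1;          /* add       a0,a0,a1 (one operand is always 0) */
}

/* Model of the three-instruction sequence (a0 = x, a1 = z, a2 = c). */
static int64_t seq_after(int64_t a0, int64_t a1, int64_t a2)
{
    a2 = czero_eqz(a1, a2);  /* czero.eqz a2,a1,a2 */
    return a0 | a2;          /* or        a0,a0,a2 */
}
```

The shorter form works because x | 0 == x, so zeroing z when c is false leaves x unchanged without needing a second conditional-zero and an add.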

The patch has been bootstrapped and tested on riscv64-unknown-linux-gnu.

gcc/
        * ifcvt.cc (noce_extended_and_dead_set_p): New function.
        (bb_ok_for_noce_convert_multiple_sets): Use
        noce_extended_and_dead_set_p.

gcc/testsuite/
        * gcc.target/riscv/zicond_ifcvt_opt_int.c: New test.
So this feels fairly hackish to me.  I don't think we have any data showing that this particular class of extensions will typically be eliminated, and elimination would depend on ABI requirements as well as target behavior.  It's also fairly sensitive to targets where we implicitly promote to WORD_MODE because the target doesn't have sub-word logical operations.

I suspect, but have not confirmed, that combine can eliminate the extension because it realizes the two inputs are already sign extended, so the result is necessarily sign extended as well and the explicit sign extension is redundant.
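
To illustrate the arithmetic behind that suspicion, here is a small standalone C sketch. It is not GCC code, and the helper names (sext32, is_sext32, ior_extension_redundant) are hypothetical. It assumes both int arguments arrive sign extended in 64-bit registers, as in the rv64 test case.

```c
#include <stdint.h>

/* Sign-extend the low 32 bits of a 64-bit value (portable form). */
static int64_t sext32(int64_t v)
{
    int64_t low = v & INT64_C(0xffffffff);
    return (low ^ INT64_C(0x80000000)) - INT64_C(0x80000000);
}

/* Nonzero when v already equals the sign extension of its low 32 bits. */
static int is_sext32(int64_t v)
{
    return sext32(v) == v;
}

/* Nonzero when the explicit extension after the OR would be a no-op. */
static int ior_extension_redundant(int32_t x, int32_t z)
{
    int64_t a = x;      /* assumed: argument is sign extended on entry */
    int64_t b = z;      /* assumed: argument is sign extended on entry */
    int64_t r = a | b;  /* (ior:DI a b) */
    return sext32(r) == r;
}
```

Bits 63..31 of each input are copies of that input's bit 31, so the OR has bits 63..31 all equal as well. That is why the check holds for any pair of sign-extended inputs, while a value such as 0x80000000 held zero extended in 64 bits fails is_sext32.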

That points at another approach: can we eliminate the extension earlier, never generate it to begin with, or generate it at a different location?  fwprop seems like a potential candidate, as it will try to simplify the sign extension of this object:

(subreg:SI (ior:DI (reg/v:DI 135 [ a ])
        (reg/v:DI 136 [ b ])) 0)

Unfortunately num_sign_bit_copies for those objects is not returning anything useful in this context.
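
For reference, the kind of answer one would hope for can be sketched on concrete values. This is a standalone model, not the RTL analysis itself (which reasons about what is known of a register rather than about a runtime value), and both function names below are hypothetical.

```c
#include <stdint.h>

/* Count the high-order bits of v that equal its sign bit, including the
   sign bit itself.  The result is in the range 1..64. */
static int sign_bit_copies(int64_t v)
{
    uint64_t u = (uint64_t)v;
    unsigned sign = (unsigned)(u >> 63);
    int n = 1;

    while (n < 64 && ((unsigned)(u >> (63 - n)) & 1u) == sign)
        n++;
    return n;
}

/* Lower bound on the sign-bit copies of (a | b), given the counts for
   a and b: a bitwise OR keeps at least the smaller of the two. */
static int ior_sign_bit_copies_bound(int copies_a, int copies_b)
{
    return copies_a < copies_b ? copies_a : copies_b;
}
```

In this model a DImode value is a sign-extended SImode value exactly when it has at least 33 sign-bit copies, so a bound of 33 for the ior:DI result would be enough to drop the extension of the subreg:SI.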

I want to think about this a bit more.  It really feels like we should have a better solution than special-casing this in ifcvt.

Jeff
