On Sun, Nov 24, 2024 at 10:02:22PM +0100, Uros Bizjak wrote:
> PR target/36503
>
> gcc/ChangeLog:
>
> * config/i386/i386.md (*ashl<mode>3_negcnt):
> New define_insn_and_split pattern.
> (*ashl<mode>3_negcnt_1): Ditto.
> (*<insn><mode>3_negcnt): Ditto.
> (*<insn><mode>3_negcnt_1): Ditto.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/pr36503-1.c: New test.
> * gcc.target/i386/pr36503-2.c: New test.
> +(define_insn_and_split "*ashl<mode>3_negcnt"
> + [(set (match_operand:SWI48 0 "nonimmediate_operand")
> + (ashift:SWI48
> + (match_operand:SWI48 1 "nonimmediate_operand")
> + (subreg:QI
> + (minus
> + (match_operand 3 "const_int_operand")
> + (match_operand 2 "int248_register_operand" "c,r")) 0)))
> + (clobber (reg:CC FLAGS_REG))]
> + "ix86_binary_operator_ok (ASHIFT, <MODE>mode, operands)
> + && INTVAL (operands[3]) == <MODE_SIZE> * BITS_PER_UNIT
Any reason for an exact comparison rather than
&& (INTVAL (operands[3]) & (<MODE_SIZE> * BITS_PER_UNIT - 1)) == 0
?
I mean, we can optimize this way 1U << (32 - x) or
1U << (1504 - x) or any other multiple of 32.
Similarly, we can optimize 1U << (32 + x) to 1U << x, and
again do that for any other multiple of 32.
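
E.g., with a multiple-of-32 check the pattern could also handle cases
like the following (illustrative C only; the function names and cases
are mine, not taken from the patch's testcases):

unsigned int
f1 (unsigned int x)
{
  /* (32 - x) mod 32 == (-x) mod 32, and i386 shifts truncate the
     count, so this can become neg + shl just like the plain
     32 - x case the patch already handles.  */
  return 1U << (32 - x);
}

unsigned int
f2 (unsigned int x)
{
  /* 1504 == 47 * 32, so (1504 - x) mod 32 == (-x) mod 32 as well.  */
  return 1U << (1504 - x);
}

unsigned int
f3 (unsigned int x)
{
  /* (32 + x) mod 32 == x mod 32, so the addition can be dropped
     entirely and the shift count used as-is.  */
  return 1U << (32 + x);
}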
Jakub