> Can you add a testcase for this? I don't mind if it's x86 specific and > does a bit of asm scanning. > > Also note that the context for this patch has changed, so it won't > automatically apply. So be extra careful when updating so that it goes > into the right place (all the more reason to have a testcase validating > that the optimization works correctly). > > > I think the patch itself is fine. So further review is just for the > testcase and should be easy. rebased and add a testcase.
Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}. Ok for trunk? When mask is (1 << (prec - imm) - 1) which is used to clear upper bits of A, then it can be simplified to LSHIFTRT. i.e Simplify (and:v8hi (ashifrt:v8hi A 8) (const_vector 0xff x8)) to (lshifrt:v8hi A 8) gcc/ChangeLog: PR target/114428 * simplify-rtx.cc (simplify_context::simplify_binary_operation_1): Simplify (AND (ASHIFTRT A imm) mask) to (LSHIFTRT A imm) for specific mask. gcc/testsuite/ChangeLog: * gcc.target/i386/pr114428-1.c: New test. --- gcc/simplify-rtx.cc | 25 ++++++++++++++ gcc/testsuite/gcc.target/i386/pr114428-1.c | 39 ++++++++++++++++++++++ 2 files changed, 64 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/pr114428-1.c diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc index 5caf1dfd957..05d410898b3 100644 --- a/gcc/simplify-rtx.cc +++ b/gcc/simplify-rtx.cc @@ -4050,6 +4050,31 @@ simplify_context::simplify_binary_operation_1 (rtx_code code, return tem; } + /* (and:v4si + (ashiftrt:v4si A 16) + (const_vector: 0xffff x4)) + is just (lshiftrt:v4si A 16). */ + if (VECTOR_MODE_P (mode) && GET_CODE (op0) == ASHIFTRT + && (CONST_INT_P (XEXP (op0, 1)) + || (GET_CODE (XEXP (op0, 1)) == CONST_VECTOR + && CONST_VECTOR_DUPLICATE_P (XEXP (op0, 1)))) + && GET_CODE (op1) == CONST_VECTOR + && CONST_VECTOR_DUPLICATE_P (op1)) + { + unsigned HOST_WIDE_INT shift_count + = (CONST_INT_P (XEXP (op0, 1)) + ? UINTVAL (XEXP (op0, 1)) + : UINTVAL (XVECEXP (XEXP (op0, 1), 0, 0))); + unsigned HOST_WIDE_INT inner_prec + = GET_MODE_PRECISION (GET_MODE_INNER (mode)); + + /* Avoid UD shift count. */ + if (shift_count < inner_prec + && (UINTVAL (XVECEXP (op1, 0, 0)) + == (HOST_WIDE_INT_1U << (inner_prec - shift_count)) - 1)) + return simplify_gen_binary (LSHIFTRT, mode, XEXP (op0, 0), XEXP (op0, 1)); + } + tem = simplify_byte_swapping_operation (code, mode, op0, op1); if (tem) return tem; diff --git a/gcc/testsuite/gcc.target/i386/pr114428-1.c b/gcc/testsuite/gcc.target/i386/pr114428-1.c new file mode 100644 index 00000000000..927476f2269 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr114428-1.c @@ -0,0 +1,39 @@ +/* { dg-do compile } */ +/* { dg-options "-O2 -msse2" } */ +/* { dg-final { scan-assembler-times "psrlw" 1 } } */ +/* { dg-final { scan-assembler-times "psrld" 1 } } */ +/* { dg-final { scan-assembler-times "psrlq" 1 { target { ! ia32 } } } } */ + + +#define SHIFTC 12 + +typedef int v4si __attribute__((vector_size(16))); +typedef short v8hi __attribute__((vector_size(16))); +typedef long long v2di __attribute__((vector_size(16))); + +v8hi +foo1 (v8hi a) +{ + return + (a >> (16 - SHIFTC)) & (__extension__(v8hi){(1<<SHIFTC)-1, (1<<SHIFTC)-1, + (1<<SHIFTC)-1,(1<<SHIFTC)-1, + (1<<SHIFTC)-1, (1<<SHIFTC)-1, + (1<<SHIFTC)-1,(1<<SHIFTC)-1}); +} + +v4si +foo2 (v4si a) +{ + return + (a >> (32 - SHIFTC)) & (__extension__(v4si){(1<<SHIFTC)-1, (1<<SHIFTC)-1, + (1<<SHIFTC)-1,(1<<SHIFTC)-1}); +} + +v2di +__attribute__((target("avx512vl"))) +foo3 (v2di a) +{ + return + (a >> (long long)(64 - SHIFTC)) & (__extension__(v2di){(1ULL<<SHIFTC)-1, + (1ULL<<SHIFTC)-1}); +} -- 2.31.1