[PATCH][ARM] Fix more -Wreturn-type fallout
Hi

This patch fixes a couple more tests that give warnings with -Wreturn-type:
- g++.dg/ext/pr57735.C
- gcc.target/arm/pr54300.C

*** gcc/testsuite/ChangeLog ***

2017-11-10  Sudakshina Das

	* g++.dg/ext/pr57735.C: Add -Wno-return-type for test.
	* gcc.target/arm/pr54300.C (main): Add return type and
	return a value.

diff --git a/gcc/testsuite/g++.dg/ext/pr57735.C b/gcc/testsuite/g++.dg/ext/pr57735.C
index a8f7d05712c9a823564d58e2a149c7bc356a5e9e..d9fc9e4aa5e4f47a77c2f9266456e274a9f06d68 100644
--- a/gcc/testsuite/g++.dg/ext/pr57735.C
+++ b/gcc/testsuite/g++.dg/ext/pr57735.C
@@ -2,7 +2,7 @@
 /* { dg-require-effective-target arm_arch_v5te_ok } */
 /* { dg-require-effective-target arm_arm_ok } */
 /* { dg-skip-if "do not override -mfloat-abi" { *-*-* } { "-mfloat-abi=*" } {"-mfloat-abi=soft" } } */
-/* { dg-options "-march=armv5te -marm -mtune=xscale -mfloat-abi=soft -O1" } */
+/* { dg-options "-march=armv5te -marm -mtune=xscale -mfloat-abi=soft -O1 -Wno-return-type" } */
 
 typedef unsigned int size_t;
 __extension__
diff --git a/gcc/testsuite/gcc.target/arm/pr54300.C b/gcc/testsuite/gcc.target/arm/pr54300.C
index eb1a74e36cff6d57f7444e1ca34b704c80efbf54..9105e279b331180aed8c5cadef2194cfe5b446ea 100644
--- a/gcc/testsuite/gcc.target/arm/pr54300.C
+++ b/gcc/testsuite/gcc.target/arm/pr54300.C
@@ -51,6 +51,7 @@ test(unsigned short *_Inp, int32_t *_Out,
   vst1q_s32( _Out, c );
 }
 
+int
 main()
 {
   unsigned short a[4] = {1, 2, 3, 4};
@@ -58,4 +59,5 @@ main()
   test(a, b, 1, 1, ~0);
   if (b[0] != 1 || b[1] != 2 || b[2] != 3 || b[3] != 4)
     abort();
+  return 0;
 }
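For context, a minimal sketch (not the actual testsuite sources) of the two
situations -Wreturn-type complains about and that these fixes address:

/* pr54300.C-style fix: give main an explicit return type and return a value.  */
int
main ()
{
  return 0;
}

/* pr57735.C-style case: a path can fall off the end of a non-void function,
   giving "control reaches end of non-void function"; in that test the
   warning is silenced with -Wno-return-type instead.  */
int
missing_return (int x)
{
  if (x > 0)
    return x;
}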
[PATCH][ARM] Fix test armv8_2-fp16-move-1.c
Hi

This patch fixes the test case armv8_2-fp16-move-1.c for arm-none-linux-gnueabihf,
where 2 of the scan-assembler directives were failing. We now generate fewer vmov
instructions between core and VFP registers, so those directives are updated to
reflect that.

Is this ok for trunk?
If yes could someone commit it on my behalf?

Sudi

*** gcc/testsuite/ChangeLog ***

2017-11-16  Sudakshina Das

	* gcc.target/arm/armv8_2-fp16-move-1.c: Edit vmov scan-assembler
	directives.

diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
index bb4e68f..0ed8560 100644
--- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c
@@ -101,8 +101,8 @@ test_select_8 (__fp16 a, __fp16 b, __fp16 c)
 /* { dg-final { scan-assembler-times {vselgt\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */
 /* { dg-final { scan-assembler-times {vselge\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */
 
-/* { dg-final { scan-assembler-times {vmov\.f16\ts[0-9]+, r[0-9]+} 4 } } */
-/* { dg-final { scan-assembler-times {vmov\.f16\tr[0-9]+, s[0-9]+} 4 } } */
+/* { dg-final { scan-assembler-times {vmov\.f16\ts[0-9]+, r[0-9]+} 2 } } */
+/* { dg-final { scan-assembler-times {vmov\ts[0-9]+, s[0-9]+} 4 } } */
 
 int
 test_compare_1 (__fp16 a, __fp16 b)
Re: [PATCH][ARM] Fix test armv8_2-fp16-move-1.c
Hi Kyrill Thanks I have made the change. Sudi From: Kyrill Tkachov Sent: Thursday, November 16, 2017 5:03 PM To: Sudi Das; gcc-patches@gcc.gnu.org Cc: nd; Ramana Radhakrishnan; Richard Earnshaw Subject: Re: [PATCH][ARM] Fix test armv8_2-fp16-move-1.c Hi Sudi, On 16/11/17 16:37, Sudi Das wrote: > Hi > > This patch fixes the test case armv8_2-fp16-move-1.c for > arm-none-linux-gnueabihf where 2 of the scan-assembler directives were > failing. We now generate less vmov between core and VFP registers. > Thus changing those directives to reflect that. > > Is this ok for trunk? > If yes could someone commit it on my behalf? > > Sudi > > > *** gcc/testsuite/ChangeLog *** > > 2017-11-16 Sudakshina Das > > * gcc.target/arm/armv8_2-fp16-move-1.c: Edit vmov scan-assembler > directives. > diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c index bb4e68f..0ed8560 100644 --- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c +++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c @@ -101,8 +101,8 @@ test_select_8 (__fp16 a, __fp16 b, __fp16 c) /* { dg-final { scan-assembler-times {vselgt\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */ /* { dg-final { scan-assembler-times {vselge\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */ -/* { dg-final { scan-assembler-times {vmov\.f16\ts[0-9]+, r[0-9]+} 4 } } */ -/* { dg-final { scan-assembler-times {vmov\.f16\tr[0-9]+, s[0-9]+} 4 } } */ +/* { dg-final { scan-assembler-times {vmov\.f16\ts[0-9]+, r[0-9]+} 2 } } */ +/* { dg-final { scan-assembler-times {vmov\ts[0-9]+, s[0-9]+} 4 } } */ Some of the moves between core and fp registers were the result of inefficient codegen and in hindsight scanning for them was not very useful. Now that we emit only the required ones I think scanning for the plain vmovs between two S-registers doesn't test anything useful. So can you please just remove the second scan-assembler directive here? Thanks, Kyrill diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c index bb4e68f..8c0a53c 100644 --- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c +++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c @@ -101,8 +101,7 @@ test_select_8 (__fp16 a, __fp16 b, __fp16 c) /* { dg-final { scan-assembler-times {vselgt\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */ /* { dg-final { scan-assembler-times {vselge\.f16\ts[0-9]+, s[0-9]+, s[0-9]+} 1 } } */ -/* { dg-final { scan-assembler-times {vmov\.f16\ts[0-9]+, r[0-9]+} 4 } } */ -/* { dg-final { scan-assembler-times {vmov\.f16\tr[0-9]+, s[0-9]+} 4 } } */ +/* { dg-final { scan-assembler-times {vmov\.f16\ts[0-9]+, r[0-9]+} 2 } } */ int test_compare_1 (__fp16 a, __fp16 b)
Re: [PATCH][ARM] Fix test armv8_2-fp16-move-1.c
Hi Kyrill and Christophe In case of soft fp testing, there are other assembly directives apart from the vmov one which are also failing. The directives probably make more sense in the hard fp context so instead of removing the vmov, I have added the -mfloat-abi=hard option. Is this ok for trunk? If yes could someone post it on my behalf? Sudi *** gcc/testsuite/ChangeLog *** 2017-11-22 Sudakshina Das * gcc.target/arm/armv8_2-fp16-move-1.c: Add -mfloat-abi=hard option. From: Kyrill Tkachov Sent: Monday, November 20, 2017 2:20 PM To: Christophe Lyon Cc: Sudi Das; gcc-patches@gcc.gnu.org; nd; Ramana Radhakrishnan; Richard Earnshaw Subject: Re: [PATCH][ARM] Fix test armv8_2-fp16-move-1.c On 20/11/17 14:14, Christophe Lyon wrote: > Hi, > > On 17 November 2017 at 12:12, Kyrill Tkachov > wrote: >> On 17/11/17 10:45, Sudi Das wrote: >>> Hi Kyrill >>> >>> Thanks I have made the change. >> >> Thanks Sudi, I've committed this on your behalf with r254863. >> >> Kyrill >> >> >>> Sudi >>> >>> >>> >>> From: Kyrill Tkachov >>> Sent: Thursday, November 16, 2017 5:03 PM >>> To: Sudi Das; gcc-patches@gcc.gnu.org >>> Cc: nd; Ramana Radhakrishnan; Richard Earnshaw >>> Subject: Re: [PATCH][ARM] Fix test armv8_2-fp16-move-1.c >>> >>> Hi Sudi, >>> >>> On 16/11/17 16:37, Sudi Das wrote: >>>> Hi >>>> >>>> This patch fixes the test case armv8_2-fp16-move-1.c for >>>> arm-none-linux-gnueabihf where 2 of the scan-assembler directives were >>>> failing. We now generate less vmov between core and VFP registers. >>>> Thus changing those directives to reflect that. >>>> >>>> Is this ok for trunk? >>>> If yes could someone commit it on my behalf? >>>> >>>> Sudi >>>> >>>> >>>> *** gcc/testsuite/ChangeLog *** >>>> >>>> 2017-11-16 Sudakshina Das >>>> >>>> * gcc.target/arm/armv8_2-fp16-move-1.c: Edit vmov >>>> scan-assembler >>>> directives. >>>> >>> diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c >>> b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c >>> index bb4e68f..0ed8560 100644 >>> --- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c >>> +++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c >>> @@ -101,8 +101,8 @@ test_select_8 (__fp16 a, __fp16 b, __fp16 c) >>> /* { dg-final { scan-assembler-times {vselgt\.f16\ts[0-9]+, s[0-9]+, >>> s[0-9]+} 1 } } */ >>> /* { dg-final { scan-assembler-times {vselge\.f16\ts[0-9]+, s[0-9]+, >>> s[0-9]+} 1 } } */ >>> -/* { dg-final { scan-assembler-times {vmov\.f16\ts[0-9]+, r[0-9]+} 4 } >>> } */ >>> -/* { dg-final { scan-assembler-times {vmov\.f16\tr[0-9]+, s[0-9]+} 4 } } >>> */ >>> +/* { dg-final { scan-assembler-times {vmov\.f16\ts[0-9]+, r[0-9]+} 2 } } >>> */ >>> +/* { dg-final { scan-assembler-times {vmov\ts[0-9]+, s[0-9]+} 4 } } */ >>> Some of the moves between core and fp registers were the result of >>> inefficient codegen and in hindsight >>> scanning for them was not very useful. Now that we emit only the required >>> ones I think scanning for the plain >>> vmovs between two S-registers doesn't test anything useful. >>> So can you please just remove the second scan-assembler directive here? >>> > You are probably already aware of that: the tests fail on > arm-none-linux-gnueabi/arm-none-eabi > FAIL: gcc.target/arm/armv8_2-fp16-move-1.c scan-assembler-times > vmov\\.f16\\ts[0-9]+, r[0-9]+ 2 (found 38 times) > > but this is not a regression, the previous version of the test had the > same problem. 
Grrr, that's because the softfp ABI necessitates moves between core and FP registers, so scanning for a particular number of vmovs between them is just not gonna be stable across soft-float ABIs. At this point I'd just remove the scan for vmovs completely as it doesn't seem to check anything useful. A patch to remove that scan for VMOV is pre-approved. Thanks for reminding me of this Christophe, softfp tends to slip my mind :( Kyrill > Christophe > >>> Thanks, >>> Kyrill >>> >>> >> diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c index 8c0a53c..167286d 100644 --- a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c +++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-1.c @@ -1,6 +1,6 @@ /* { dg-do compile } */ /* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok } */ -/* { dg-options "-O2" } */ +/* { dg-options "-O2 -mfloat-abi=hard" } */ /* { dg-add-options arm_v8_2a_fp16_scalar } */ __fp16
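As a side note, a minimal sketch (not one of the functions in
armv8_2-fp16-move-1.c) of why a fixed vmov count cannot be stable across
float ABIs:

/* Under -mfloat-abi=softfp the __fp16 arguments arrive in core registers
   (r0/r1), so vmov instructions are needed to move them into S-registers
   before any VFP arithmetic and to move the result back.  Under
   -mfloat-abi=hard they are already passed in S-registers, so far fewer
   core/VFP moves appear.  */
__fp16
add_fp16 (__fp16 a, __fp16 b)
{
  return a + b;
}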
[COMMITTED, AARCH64, SVE TESTSUITE] Remove a couple of xfail from slp_5.c
Hi

Since Richard Biener's commit from 16th May:

commit 2f05c3c7324cd293b7b2ba197e0a88d9749361cc
Author: rguenth
Date:   Wed May 16 13:08:04 2018 +

the test case slp_5.c had started showing a couple of XPASS failures.

XPASS: gcc.target/aarch64/sve/slp_5.c -march=armv8.2-a+sve scan-assembler-not \\tld2d\\t
XPASS: gcc.target/aarch64/sve/slp_5.c -march=armv8.2-a+sve scan-assembler-times \\tld1d\\t 3

Richard Sandiford helped to confirm that these xfails can now be removed, since the said
commit fixed the issue that was preventing the expected behavior. This patch does the same.

Testing done: Only test case change, so only ran the testcase on aarch64-none-linux-gnu.

Committed as an obvious fix.

Thanks
Sudi

*** gcc/testsuite/ChangeLog ***

2018-05-18  Sudakshina Das

	* gcc.target/aarch64/sve/slp_5.c: Remove xfail for tld1d
	and tld2d.

diff --git a/gcc/testsuite/gcc.target/aarch64/sve/slp_5.c b/gcc/testsuite/gcc.target/aarch64/sve/slp_5.c
index 7ff12c5..b75edc69e 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/slp_5.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/slp_5.c
@@ -42,11 +42,11 @@ TEST_ALL (VEC_PERM)
 /* { dg-final { scan-assembler-times {\tld1b\t} 1 } } */
 /* { dg-final { scan-assembler-times {\tld1h\t} 2 } } */
 /* { dg-final { scan-assembler-times {\tld1w\t} 3 } } */
-/* { dg-final { scan-assembler-times {\tld1d\t} 3 { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-times {\tld1d\t} 3 } } */
 /* { dg-final { scan-assembler-not {\tld2b\t} } } */
 /* { dg-final { scan-assembler-not {\tld2h\t} } } */
 /* { dg-final { scan-assembler-not {\tld2w\t} } } */
-/* { dg-final { scan-assembler-not {\tld2d\t} { xfail *-*-* } } } */
+/* { dg-final { scan-assembler-not {\tld2d\t} } } */
 /* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.b} 4 { xfail *-*-* } } } */
 /* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.h} 4 { xfail *-*-* } } } */
 /* { dg-final { scan-assembler-times {\tuaddv\td[0-9]+, p[0-7], z[0-9]+\.b} 2 } } */
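For readers unfamiliar with these scans, a rough sketch only (not the real
slp_5.c source, which is generated from the TEST_ALL/VEC_PERM macros) of the
kind of even/odd access pattern where the vectorizer chooses between
contiguous ld1d loads plus permutes and ld2d structure loads; the removed
xfails covered the 64-bit element variant of that choice:

void
sum_pairs (double *restrict out, double *restrict in, int n)
{
  for (int i = 0; i < n; ++i)
    out[i] = in[i * 2] + in[i * 2 + 1];
}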
Re: [PATCH][GCC] Simplification of 1U << (31 - x)
Sorry about the delayed response but looking at the above discussion, should I conclude that this is a valid tree simplification?

I am pasting the diff of the assembly that AArch64 generates with the test case that I added. I see fewer instructions generated with the patch.

--- pr80131-1.s	2017-08-01 10:02:43.243374174 +0100
+++ pr80131-1.s-patched	2017-08-01 10:00:54.776455630 +0100
@@ -24,10 +24,8 @@
 	str	x0, [sp, 8]
 	ldr	x0, [sp, 8]
 	mov	w1, w0
-	mov	w0, 63
-	sub	w0, w0, w1
-	mov	x1, 1
-	lsl	x0, x1, x0
+	mov	x0, -9223372036854775808
+	lsr	x0, x0, x1
 	add	sp, sp, 16
 	ret
 	.size	f2, .-f2
@@ -39,10 +37,8 @@
 	str	x0, [sp, 8]
 	ldr	x0, [sp, 8]
 	mov	w1, w0
-	mov	w0, 63
-	sub	w0, w0, w1
-	mov	x1, 1
-	lsl	x0, x1, x0
+	mov	x0, -9223372036854775808
+	lsr	x0, x0, x1
 	add	sp, sp, 16
 	ret
 	.size	f3, .-f3
@@ -52,11 +48,9 @@
 f4:
 	sub	sp, sp, #16
 	str	w0, [sp, 12]
-	mov	w1, 31
 	ldr	w0, [sp, 12]
-	sub	w0, w1, w0
-	mov	w1, 1
-	lsl	w0, w1, w0
+	mov	w1, -2147483648
+	lsr	w0, w1, w0
 	add	sp, sp, 16
 	ret
 	.size	f4, .-f4

Thanks

Sudi

From: Wilco Dijkstra
Sent: Thursday, April 13, 2017 1:01 PM
To: Richard Biener; Jakub Jelinek
Cc: Sudi Das; GCC Patches; nd; Richard Earnshaw; James Greenhalgh
Subject: Re: [PATCH][GCC] Simplification of 1U << (31 - x)

Richard Biener wrote:
> It is IMHO a valid GIMPLE optimization / canonicalization.
>
>   movabsq $-9223372036854775808, %rax
>
> so this should then have been generated as 1<<63?
>
> At some point variable shifts were quite expensive as well..

Yes I don't see a major difference between movabsq and

  movl	$1, %eax
  salq	$63, %rax

on my Sandy Bridge, but if the above is faster then that is what the x64
backend should emit - it's 1 byte smaller as well, so probably better in
all cases.

Wilco
Re: [PATCH][AArch64] Add BIC-imm and ORR-imm SIMD pattern
Hi Richard I have updated the patch according to your comments. Thanks for pointing it out and sorry for the delay! Sudi 2017-08-07 Sudakshina Das * config/aarch64/aarch64-protos.h (enum simd_immediate_check): New check type for aarch64_simd_valid_immediate. (aarch64_output_simd_general_immediate): New declaration. (aarch64_simd_valid_immediate): Update prototype. * config/aarch64/aarch64-simd.md (orr3): modified pattern to add support for ORR-immediate. (and3): modified pattern to add support for BIC-immediate. * config/aarch64/aarch64.c (aarch64_simd_valid_immediate): Function now checks for valid immediate for BIC and ORR based on new enum argument. (aarch64_output_simd_general_immediate): New function to output new BIC/ORR. * config/aarch64/constraints.md (Do): New vector immediate constraint. (Db): Likewise. 2017-08-07 Sudakshina Das * gcc.target/aarch64/bic_imm_1.c: New test. * gcc.target/aarch64/orr_imm_1.c: Likewise. From: Richard Earnshaw (lists) Sent: Friday, May 5, 2017 2:30 PM To: Sudi Das; gcc-patches@gcc.gnu.org Cc: nd; Marcus Shawcroft; James Greenhalgh Subject: Re: [PATCH][AArch64] Add BIC-imm and ORR-imm SIMD pattern On 18/04/17 17:39, Sudi Das wrote: > > Hello all > > This patch adds the support for BIC (vector, immediate) and ORR (vector, > immediate) SIMD patterns to the AArch64 backend. > One of the examples of this is : (with -O2 -ftree-vectorize) > > void > bic_s (short *a) > { > for (int i = 0; i < 1024; i++) > a[i] &= ~(0xff); > } > > which now produces : > bic_s: > add x1, x0, 2048 > .p2align 2 > .L2: > ldr q0, [x0] > bic v0.8h, #255 > str q0, [x0], 16 > cmp x1, x0 > bne .L2 > ret > > instead of > bic_s: > movi v1.8h, 0xff, lsl 8 > add x1, x0, 2048 > .p2align 2 > .L2: > ldr q0, [x0] > and v0.16b, v0.16b, v1.16b > str q0, [x0], 16 > cmp x1, x0 > bne .L2 > ret > > Added new tests and checked for regressions on bootstrapped > aarch64-none-linux-gnu > Ok for stage 1? > > Thanks > Sudi > > 2017-04-04 Sudakshina Das > > * config/aarch64/aarch64-protos.h (enum simd_immediate_check): New >check type > for aarch64_simd_valid_immediate. > (aarch64_output_simd_general_immediate): New declaration. > (aarch64_simd_valid_immediate): Update prototype. > > * config/aarch64/aarch64-simd.md (*bic_imm_3): New pattern. > (*ior_imm_3): Likewise. > > * config/aarch64/aarch64.c (aarch64_simd_valid_immediate): Function >now checks > for valid immediate for BIC and ORR based on new enum argument. > (aarch64_output_simd_general_immediate): New function to output new >BIC/ORR. > > * config/aarch64/predicates.md (aarch64_simd_valid_bic_imm_p) : New. > (aarch64_simd_valid_orr_imm_p) : Likewise. > > 2017-04-04 Sudakshina Das > > * gcc.target/aarch64/bic_imm_1.c: New test. > * gcc.target/aarch64/orr_imm_1.c: Likewise. > > > patch-7260-2.diff > > > diff --git a/gcc/config/aarch64/aarch64-protos.h > b/gcc/config/aarch64/aarch64-protos.h > index 9543f8c..89cc455 100644 > --- a/gcc/config/aarch64/aarch64-protos.h > +++ b/gcc/config/aarch64/aarch64-protos.h > @@ -297,6 +297,15 @@ enum aarch64_parse_opt_result > AARCH64_PARSE_INVALID_ARG /* Invalid arch, tune, cpu arg. */ > }; > > +/* Enum to distinguish which type of check is to be done in > + aarch64_simd_valid_immediate. This is used as a bitmask where CHECK_ALL > + has both bits set. Adding new types would require changes accordingly. > */ > +enum simd_immediate_check { > + CHECK_I = 1, /* Perform only non-inverted immediate checks (ORR). */ > + CHECK_NI = 2, /* Perform only inverted immediate checks (BIC). 
*/ > + CHECK_ALL = 3 /* Perform all checks (MOVI/MNVI). */ > +}; > + > extern struct tune_params aarch64_tune_params; > > HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned); > @@ -334,6 +343,8 @@ rtx aarch64_reverse_mask (enum machine_mode); > bool aarch64_offset_7bit_signed_scaled_p (machine_mode, HOST_WIDE_INT); > char *aarch64_output_scalar_simd_mov_immediate (rtx, machine_mode); > char *aarch64_output_simd_mov_immediate (rtx, machine_mode, unsigned); > +char *aarch64_output_simd_general_immediate (rtx, machine_mode, unsigned, > + const char*); > bool aarch64_pad_arg_upward (machi
[PATCH][AArch64] Fix missing optimization for CMP+AND
Hi all During combine GCC tries to merge CMP (with zero) and AND into a TST. However, in cases where an ANDS operand is not compatible, this was being missed. Adding a define_split where this operand was moved to a register seems to help out. For example for a test : int f (unsigned char *p) { return p[0] == 50 || p[0] == 52; } int g (unsigned char *p) { return (p[0] >> 4 & 0xfd) == 0; } we are now generating f: ldrbw0, [x0] mov w1, 253 sub w0, w0, #50 tst w0, w1 csetw0, eq ret .size f, .-f .align 2 .p2align 3,,7 .global g .type g, %function g: ldrbw1, [x0] mov w0, 13 tst w0, w1, lsr 4 csetw0, eq ret instead of f: ldrbw0, [x0] sub w0, w0, #50 and w0, w0, -3 and w0, w0, 255 cmp w0, 0 csetw0, eq ret .size f, .-f .align 2 .p2align 3,,7 .global g .type g, %function g: ldrbw0, [x0] lsr w0, w0, 4 and w0, w0, -3 cmp w0, 0 csetw0, eq ret Added this new test and checked for regressions on bootstrapped aarch64-none-linux-gnu Ok for stage 1? Thanks Sudi 2017-03-17 Kyrylo Tkachov Sudakshina Das * config/aarch64/aarch64.md (define_split for and3nr_compare0): Move non aarch64_logical_operand to a register. (define_split for and_3nr_compare0): Move non register immediate operand to a register. * config/aarch64/predicates.md (aarch64_mov_imm_operand): New. 2017-03-17 Kyrylo Tkachov Sudakshina Das * gcc.target/aarch64/tst_imm_split_1.c: New Test.diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 5adc5ed..5e5dbff 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -3881,6 +3881,22 @@ [(set_attr "type" "logics_reg,logics_imm")] ) +(define_split + [(set (reg:CC_NZ CC_REGNUM) + (compare:CC_NZ + (and:GPI (match_operand:GPI 0 "register_operand") + (match_operand:GPI 1 "aarch64_mov_imm_operand")) + (const_int 0))) + (clobber (match_operand:SI 2 "register_operand"))] + "" + [(set (match_dup 2) (match_dup 1)) + (set (reg:CC_NZ CC_REGNUM) + (compare:CC_NZ + (and:GPI (match_dup 0) + (match_dup 2)) + (const_int 0)))] +) + (define_insn "*and3nr_compare0_zextract" [(set (reg:CC_NZ CC_REGNUM) (compare:CC_NZ @@ -3916,6 +3932,26 @@ [(set_attr "type" "logics_shift_imm")] ) +(define_split + [(set (reg:CC_NZ CC_REGNUM) + (compare:CC_NZ + (and:GPI (SHIFT:GPI + (match_operand:GPI 0 "register_operand") + (match_operand:QI 1 "aarch64_shift_imm_")) + (match_operand:GPI 2 "aarch64_mov_imm_operand")) + (const_int 0))) +(clobber (match_operand:SI 3 "register_operand"))] + "" + [(set (match_dup 3) (match_dup 2)) + (set (reg:CC_NZ CC_REGNUM) + (compare:CC_NZ + (and:GPI (SHIFT:GPI + (match_dup 0) + (match_dup 1)) + (match_dup 3)) + (const_int 0)))] +) + ;; --- ;; Shifts ;; --- diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md index e83d45b..ed62b8e 100644 --- a/gcc/config/aarch64/predicates.md +++ b/gcc/config/aarch64/predicates.md @@ -106,6 +106,10 @@ (ior (match_operand 0 "register_operand") (match_operand 0 "aarch64_logical_immediate"))) +(define_predicate "aarch64_mov_imm_operand" + (and (match_code "const_int") + (match_test "aarch64_move_imm (INTVAL (op), mode)"))) + (define_predicate "aarch64_logical_and_immediate" (and (match_code "const_int") (match_test "aarch64_and_bitmask_imm (INTVAL (op), mode)"))) diff --git a/gcc/testsuite/gcc.target/aarch64/tst_imm_split_1.c b/gcc/testsuite/gcc.target/aarch64/tst_imm_split_1.c new file mode 100644 index 000..33a2c0f --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/tst_imm_split_1.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options "-O2" } */ + +int +f (unsigned char *p) +{ + return 
p[0] == 50 || p[0] == 52; +} + +int +g (unsigned char *p) +{ + return (p[0] >> 4 & 0xfd) == 0; +} + +/* { dg-final { scan-assembler-not "and\\t\[xw\]\[0-9\]+, \[xw\]\[0-9\]+.*" } } */ +/* { dg-final { scan-assembler "tst\\t\[xw\]\[0-9\]+, \[xw\]\[0-9\]+" } } */ +/* { dg-final { scan-assembler "tst\\t\[xw\]\[0-9\]+, \[xw\]\[0-9\]+, lsr 4" } } */
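For readers wondering where the 253 mask in f comes from: the two equality
tests collapse into one masked compare, and the new split is what lets that
compare become a SUB followed by a single TST. A small stand-alone self-check
of that arithmetic (not part of the patch):

#include <assert.h>

int
main (void)
{
  for (unsigned int x = 0; x < 256; x++)
    {
      int slow = (x == 50) || (x == 52);
      /* 0xfd is ~2 as a byte: after subtracting 50, only the values 0 and 2
         survive the mask, i.e. x was 50 or 52.  */
      int fast = (((x - 50) & 0xfdu) == 0);
      assert (slow == fast);
    }
  return 0;
}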
[PATCH][GCC] Simplification of 1U << (31 - x)
Hi all

This is a fix for PR 80131.

Currently the code A << (B - C) is not simplified. However, at least the more
specific case of 1U << (C - x), where C = precision(type) - 1, can be simplified
to (1 << C) >> x. This is done by adding a new simplification rule in match.pd.

So for a test case:

unsigned int f1 (unsigned int i)
{
  return 1U << (31 - i);
}

We see a gimple dump of

f1 (unsigned int i)
{
  unsigned int D.3121;

  D.3121 = 2147483648 >> i;
  return D.3121;
}

instead of

f1 (unsigned int i)
{
  unsigned int D.3121;

  _1 = 31 - i;
  D.3121 = 1 << _1;
  return D.3121;
}

Added a new test case and checked for regressions on bootstrapped aarch64-none-linux-gnu.
Ok for stage 1?

Thanks
Sudi

2017-03-23  Sudakshina Das

	PR middle-end/80131
	* match.pd: Simplify 1 << (C - x) where C = precision (type) - 1.

2017-03-23  Sudakshina Das

	PR middle-end/80131
	* testsuite/gcc.dg/pr80131-1.c: New Test.

diff --git a/gcc/match.pd b/gcc/match.pd
index 7b96800..be20fb7 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -508,6 +508,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
       && tree_nop_conversion_p (type, TREE_TYPE (@1)))
   (lshift @0 @2)))
 
+/* Fold (1 << (C - x)) where C = precision(type) - 1
+   into ((1 << C) >> x). */
+(simplify
+ (lshift integer_onep@0 (minus INTEGER_CST@1 @2))
+ (if (INTEGRAL_TYPE_P (type)
+      && TYPE_PRECISION (type) <= HOST_BITS_PER_WIDE_INT
+      && tree_to_uhwi (@1) == (unsigned)(TYPE_PRECISION (type) - 1))
+  (if (TYPE_UNSIGNED(type))
+   (rshift (lshift @0 @1) @2)
+   (with
+    { tree utype = unsigned_type_for (type); }
+    (convert:type (rshift (lshift (convert:utype @0) @1) @2))))))
+
 /* Fold (C1/X)*C2 into (C1*C2)/X. */
 (simplify
  (mult (rdiv@3 REAL_CST@0 @1) REAL_CST@2)
diff --git a/gcc/testsuite/gcc.dg/pr80131-1.c b/gcc/testsuite/gcc.dg/pr80131-1.c
new file mode 100644
index 000..2bb6ff3
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr80131-1.c
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+/* { dg-options "-fdump-tree-gimple" } */
+
+/* Checks the simplification of:
+   1 << (C - x) to (1 << C) >> x, where C = precision (type) - 1
+   f1 is not simplified but f2, f3 and f4 are. */
+
+__INT64_TYPE__ f1 (__INT64_TYPE__ i)
+{
+  return (__INT64_TYPE__)1 << (31 - i);
+}
+
+__INT64_TYPE__ f2 (__INT64_TYPE__ i)
+{
+  return (__INT64_TYPE__)1 << (63 - i);
+}
+
+__UINT64_TYPE__ f3 (__INT64_TYPE__ i)
+{
+  return (__UINT64_TYPE__)1 << (63 - i);
+}
+
+__INT32_TYPE__ f4 (__INT32_TYPE__ i)
+{
+  return (__INT32_TYPE__)1 << (31 - i);
+}
+
+/* { dg-final { scan-tree-dump-times "= 31 -" 1 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "9223372036854775808 >>" 2 "gimple" } } */
+/* { dg-final { scan-tree-dump "2147483648 >>" "gimple" } } */
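As a quick aside, a stand-alone sanity check (not part of the patch or the
testsuite) of the identity being folded, for the unsigned 32-bit case:

#include <assert.h>

int
main (void)
{
  /* 1U << (31 - x) and (1U << 31) >> x pick out the same single bit for
     every valid shift amount x.  */
  for (unsigned int x = 0; x < 32; x++)
    assert ((1U << (31 - x)) == (0x80000000U >> x));
  return 0;
}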
[PATCH][AArch64] Add BIC-imm and ORR-imm SIMD pattern
Hello all This patch adds the support for BIC (vector, immediate) and ORR (vector, immediate) SIMD patterns to the AArch64 backend. One of the examples of this is : (with -O2 -ftree-vectorize) void bic_s (short *a) { for (int i = 0; i < 1024; i++) a[i] &= ~(0xff); } which now produces : bic_s: add x1, x0, 2048 .p2align 2 .L2: ldr q0, [x0] bic v0.8h, #255 str q0, [x0], 16 cmp x1, x0 bne .L2 ret instead of bic_s: moviv1.8h, 0xff, lsl 8 add x1, x0, 2048 .p2align 2 .L2: ldr q0, [x0] and v0.16b, v0.16b, v1.16b str q0, [x0], 16 cmp x1, x0 bne .L2 ret Added new tests and checked for regressions on bootstrapped aarch64-none-linux-gnu Ok for stage 1? Thanks Sudi 2017-04-04 Sudakshina Das * config/aarch64/aarch64-protos.h (enum simd_immediate_check): New check type for aarch64_simd_valid_immediate. (aarch64_output_simd_general_immediate): New declaration. (aarch64_simd_valid_immediate): Update prototype. * config/aarch64/aarch64-simd.md (*bic_imm_3): New pattern. (*ior_imm_3): Likewise. * config/aarch64/aarch64.c (aarch64_simd_valid_immediate): Function now checks for valid immediate for BIC and ORR based on new enum argument. (aarch64_output_simd_general_immediate): New function to output new BIC/ORR. * config/aarch64/predicates.md (aarch64_simd_valid_bic_imm_p) : New. (aarch64_simd_valid_orr_imm_p) : Likewise. 2017-04-04 Sudakshina Das * gcc.target/aarch64/bic_imm_1.c: New test. * gcc.target/aarch64/orr_imm_1.c: Likewise.diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 9543f8c..89cc455 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -297,6 +297,15 @@ enum aarch64_parse_opt_result AARCH64_PARSE_INVALID_ARG /* Invalid arch, tune, cpu arg. */ }; +/* Enum to distinguish which type of check is to be done in + aarch64_simd_valid_immediate. This is used as a bitmask where CHECK_ALL + has both bits set. Adding new types would require changes accordingly. */ +enum simd_immediate_check { + CHECK_I = 1, /* Perform only non-inverted immediate checks (ORR). */ + CHECK_NI = 2, /* Perform only inverted immediate checks (BIC). */ + CHECK_ALL = 3 /* Perform all checks (MOVI/MNVI). 
*/ +}; + extern struct tune_params aarch64_tune_params; HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned); @@ -334,6 +343,8 @@ rtx aarch64_reverse_mask (enum machine_mode); bool aarch64_offset_7bit_signed_scaled_p (machine_mode, HOST_WIDE_INT); char *aarch64_output_scalar_simd_mov_immediate (rtx, machine_mode); char *aarch64_output_simd_mov_immediate (rtx, machine_mode, unsigned); +char *aarch64_output_simd_general_immediate (rtx, machine_mode, unsigned, + const char*); bool aarch64_pad_arg_upward (machine_mode, const_tree); bool aarch64_pad_reg_upward (machine_mode, const_tree, bool); bool aarch64_regno_ok_for_base_p (int, bool); @@ -345,7 +356,8 @@ bool aarch64_simd_imm_zero_p (rtx, machine_mode); bool aarch64_simd_scalar_immediate_valid_for_move (rtx, machine_mode); bool aarch64_simd_shift_imm_p (rtx, machine_mode, bool); bool aarch64_simd_valid_immediate (rtx, machine_mode, bool, - struct simd_immediate_info *); + struct simd_immediate_info *, + enum simd_immediate_check w = CHECK_ALL); bool aarch64_split_dimode_const_store (rtx, rtx); bool aarch64_symbolic_address_p (rtx); bool aarch64_uimm12_shift (HOST_WIDE_INT); diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index c462164..92275dc 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -280,6 +280,26 @@ [(set_attr "type" "neon_logic")] ) +(define_insn "*bic_imm_3" + [(set (match_operand:VDQ_I 0 "register_operand" "=w") + (and:VDQ_I (match_operand:VDQ_I 1 "register_operand" "0") + (match_operand:VDQ_I 2 "aarch64_simd_valid_bic_imm_p" "")))] + "TARGET_SIMD" + { return aarch64_output_simd_general_immediate (operands[2], + mode, GET_MODE_BITSIZE (mode), "bic"); } + [(set_attr "type" "neon_logic")] +) + +(define_insn "*ior_imm_3" + [(set (match_operand:VDQ_I 0 "register_operand" "=w") + (ior:VDQ_I (match_operand:VDQ_I 1 "register_operand" "0") + (match_operand:VDQ_I 2 "aarch64_simd_valid_orr_imm_p" "")))] + "TARGET_SIMD" + { return aarch64_output_simd_general_immediate (operands[2], + mode, GET_MODE_BITSIZE (mode), "orr"); } + [(set_attr "type" "neon_logic")] +) + (define_insn "add3" [(set (match_operand:VDQ_I 0 "register_operand" "=w") (plus:VDQ_I (match_operand:VDQ_I 1 "register_operand" "w") diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 4f769a4..450c42d 100644 ---
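For completeness, an illustrative OR-immediate counterpart to the bic_s
example above (a sketch in the spirit of the new orr_imm_1.c test, not
claimed to be its exact source):

void
orr_s (short *a)
{
  for (int i = 0; i < 1024; i++)
    a[i] |= 0xff;
}

With the new pattern one would expect the loop body to use a single
ORR-immediate (orr v0.8h, #255 style) instead of materialising the mask in a
register and using a register-register ORR.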
Re: [PATCH][AArch64] Add BIC-imm and ORR-imm SIMD pattern
Hi James I put aarch64_output_simd_general_immediate looking at the similarities of the immediates for mov/mvni and orr/bic. The CHECK macro in aarch64_simd_valid_immediate both checks and converts the immediates in a manner that are needed for the instructions. Having said that, I agree that maybe I could have refactored aarch64_output_simd_mov_immediate to do the work rather than creating a new functions to do similar things. I have done so in this patch. I have also changed the names of the enum simd_immediate_check to be better indicative of what they are doing. Lastly I have added more cases in the tests (according to all the possible CHECKs) and made them dg-do assemble (although I had to add --save-temps so that the scan-assembler would work). Do you think I should not put that option and rather create separate tests? Thanks, Sudi From: James Greenhalgh Sent: Wednesday, September 20, 2017 11:39 AM To: Sudi Das Cc: Richard Earnshaw; gcc-patches@gcc.gnu.org; nd; Marcus Shawcroft Subject: Re: [PATCH][AArch64] Add BIC-imm and ORR-imm SIMD pattern On Mon, Aug 07, 2017 at 02:56:09PM +0100, Sudi Das wrote: > > Hi Richard > > I have updated the patch according to your comments. Thanks for pointing it > out and sorry for the delay! Hi Sudi, I've taken a look at your patch - at a high level, I think that adding aarch64_output_simd_general_immediate seems like an over-complication, and I'm not sure I follow why we need some of the logic you add. I think we might be able to really simplify this patch, by adding a new character to aarch64_print_operand for an (optionally) shifted immmediate to bic/orr. > diff --git a/gcc/config/aarch64/aarch64-protos.h > b/gcc/config/aarch64/aarch64-protos.h > index beff28e..5ae73c7 100644 > --- a/gcc/config/aarch64/aarch64-protos.h > +++ b/gcc/config/aarch64/aarch64-protos.h > @@ -308,6 +308,15 @@ enum aarch64_parse_opt_result > AARCH64_PARSE_INVALID_ARG /* Invalid arch, tune, cpu arg. */ > }; > > +/* Enum to distinguish which type of check is to be done in > + aarch64_simd_valid_immediate. This is used as a bitmask where CHECK_ALL > + has both bits set. Adding new types would require changes accordingly. > */ > +enum simd_immediate_check { > + CHECK_I = 1, /* Perform only non-inverted immediate checks (ORR). */ > + CHECK_NI = 2, /* Perform only inverted immediate checks (BIC). */ Are these comments the right way round? If so, why is I non-inverted and NI inverted - those names seem back to front to me. These should probably all be AARCH64_CHECK_* rather than just CHECK_* > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index 055ebaf..2cce5af 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -13029,6 +13041,47 @@ aarch64_output_simd_mov_immediate (rtx const_vector, > return templ; > } > > +/* This function is similar to aarch64_output_simd_mov_immediate, used for > + immediate versions of 'bic' or 'orr'. */ > +char* > +aarch64_output_simd_general_immediate (rtx const_vector, > + machine_mode mode, > + unsigned width, You can drop this parameter - it is always GET_MODE_BITSIZE (mode) so just calculate that in the function body. > + const char *mnemonic) > +{ > + bool is_valid; > + static char templ[40]; > + unsigned int lane_count = 0; > + char element_char; > + > + struct simd_immediate_info info = { NULL_RTX, 0, 0, false, false }; > + > + if (strcmp (mnemonic, "orr") == 0) For all the difference it makes, I'd just pass a bool and save save yourself the short string comparison. 
> + is_valid = aarch64_simd_valid_immediate (const_vector, mode, false, > + &info, CHECK_I); > + else > + is_valid = aarch64_simd_valid_immediate (const_vector, mode, false, > + &info, CHECK_NI); > + > + gcc_assert (is_valid); > + gcc_assert (CONST_INT_P (info.value)); > + element_char = sizetochar (info.element_width); > + lane_count = width / info.element_width; Width is always GET_MODE_BITSIZE (mode), and mode is one of V8QI V16QI V4HI V8HI V2SI V4SI V2DI - so width is either 64 or 128. in the CHECK_I and CHECK_NI cases of aarch64_simd_valid_immediate the elsize (used to set info.element_width) is only ever 16 or 32. That means lane count is either going to be 2, 4 or 8... > + if (lane_count == 1) > + sprintf (templ, "%s\t%%d0, #" HOST_WIDE_INT_PRINT_DEC, > + mnemonic, UINTVAL (info.value)); Which means this case never fires -- Which is a good thing, as my copy of the Arm Architecture Reference Manual does
Re: [PATCH][GCC] Simplification of 1U << (31 - x)
Still waiting on Jakub's comment on whether there are more things needed at the backend. But I have updated the patch according to Richard's comments. Thanks Sudi From: Richard Biener Sent: Friday, August 4, 2017 11:16 AM To: Sudi Das Cc: Wilco Dijkstra; Jakub Jelinek; GCC Patches; nd; Richard Earnshaw; James Greenhalgh Subject: Re: [PATCH][GCC] Simplification of 1U << (31 - x) On Tue, Aug 1, 2017 at 11:14 AM, Sudi Das wrote: > > > > > Sorry about the delayed response but looking at the above discussion, should > I conclude that this is a valid tree simplification? Yes, I think so. Jakub requested code to undo this at RTL expansion based on target costs, not sure if we really should require that from you given the user could have written the target sequence himself. Few comments about the patch: +/* Fold (1 << (C - x)) where C = precision(type) - 1 + into ((1 << C) >> x). */ +(simplify + (lshift integer_onep@0 (minus INTEGER_CST@1 @2)) I think this warrants a single_use check on the minus (note :s isn't enough as with the unsigned case we'd happily ignore it by design). + (if (INTEGRAL_TYPE_P (type) + && TYPE_PRECISION (type) <= HOST_BITS_PER_WIDE_INT + && tree_to_uhwi (@1) == (unsigned)(TYPE_PRECISION (type) - 1)) You can relax this with using && wi::eq_p (@1, TYPE_PRECISION (type) - 1) + (if (TYPE_UNSIGNED(type)) + (rshift (lshift @0 @1) @2) + (with + { tree utype = unsigned_type_for (type); } + (convert:type (rshift (lshift (convert:utype @0) @1) @2)) + You can write (convert (rshift ...)), without the :type. I'm leaving it to Jakub whether you need to write that RTL expansion tweak. Thanks, Richard. > I am pasting the diff of the assembly that AArch64 generates with the test > case that I added. I see fewer instructions generated with the patch. > > --- pr80131-1.s 2017-08-01 10:02:43.243374174 +0100 > +++ pr80131-1.s-patched 2017-08-01 10:00:54.776455630 +0100 > @@ -24,10 +24,8 @@ > str x0, [sp, 8] > ldr x0, [sp, 8] > mov w1, w0 > - mov w0, 63 > - sub w0, w0, w1 > - mov x1, 1 > - lsl x0, x1, x0 > + mov x0, -9223372036854775808 > + lsr x0, x0, x1 > add sp, sp, 16 > ret > .size f2, .-f2 > @@ -39,10 +37,8 @@ > str x0, [sp, 8] > ldr x0, [sp, 8] > mov w1, w0 > - mov w0, 63 > - sub w0, w0, w1 > - mov x1, 1 > - lsl x0, x1, x0 > + mov x0, -9223372036854775808 > + lsr x0, x0, x1 > add sp, sp, 16 > ret > .size f3, .-f3 > @@ -52,11 +48,9 @@ > f4: > sub sp, sp, #16 > str w0, [sp, 12] > - mov w1, 31 > ldr w0, [sp, 12] > - sub w0, w1, w0 > - mov w1, 1 > - lsl w0, w1, w0 > + mov w1, -2147483648 > + lsr w0, w1, w0 > add sp, sp, 16 > ret > .size f4, .-f4 > > > Thanks > > Sudi > > > > > From: Wilco Dijkstra > Sent: Thursday, April 13, 2017 1:01 PM > To: Richard Biener; Jakub Jelinek > Cc: Sudi Das; GCC Patches; nd; Richard Earnshaw; James Greenhalgh > Subject: Re: [PATCH][GCC] Simplification of 1U << (31 - x) > > Richard Biener wrote: >> It is IMHO a valid GIMPLE optimization / canonicalization. >> >> movabsq $-9223372036854775808, %rax >> >> so this should then have been generated as 1<<63? >> >> At some point variable shifts were quite expensive as well.. > > Yes I don't see a major difference between movabsq and > > movl $1, %eax > salq $63, %rax > > on my Sandy Bridge, but if the above is faster then that is what the x64 > backend should emit - it's 1 byte smaller as well, so probably better in all > cases. 
> > Wilco diff --git a/gcc/match.pd b/gcc/match.pd index e9017e4..160c12d 100644 --- a/gcc/match.pd +++ b/gcc/match.pd @@ -600,6 +600,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) && tree_nop_conversion_p (type, TREE_TYPE (@1))) (lshift @0 @2))) +/* Fold (1 << (C - x)) where C = precision(type) - 1 + into ((1 << C) >> x). */ +(simplify + (lshift integer_onep@0 (minus@1 INTEGER_CST@2 @3)) + (if (INTEGRAL_TYPE_P (type) + && wi::eq_p (@2, TYPE_PRECISION (type) - 1) + && single_use (@1)) + (if (TYPE_UNSIGNED(type)) + (rshift (lshift @0 @2) @3) + (with +{ tree utype = unsigned_type_for (type); } +(convert (rshift (lshift (convert:utype @0) @2) @3)) + /* Fold (C1/X)*C2 into (C1*C2)/X. */ (simplify (mult (rdiv@3 REAL_CST@0 @1) REAL_CST@2) diff --git a/gcc/testsuite/gcc.dg/pr80131-1.c b/gcc/testsuite/gcc.dg/p
Re: [PATCH][AArch64] Add BIC-imm and ORR-imm SIMD pattern
Hi James I have made the requested changes to the patch. 2017-09-27 Sudakshina Das * config/aarch64/aarch64-protos.h (enum simd_immediate_check): New check type for aarch64_simd_valid_immediate. (aarch64_output_simd_mov_immediate): Update prototype. (aarch64_simd_valid_immediate): Update prototype. * config/aarch64/aarch64-simd.md (orr3): modified pattern to add support for ORR-immediate. (and3): modified pattern to add support for BIC-immediate. * config/aarch64/aarch64.c (aarch64_simd_valid_immediate): Function now checks for valid immediate for BIC and ORR based on new enum argument. (aarch64_output_simd_mov_immediate): Function now used to output BIC/ORR imm as well based on new enum argument. * config/aarch64/constraints.md (Do): New vector immediate constraint. (Db): Likewise. 2017-09-27 Sudakshina Das * gcc.target/aarch64/bic_imm_1.c: New test. * gcc.target/aarch64/orr_imm_1.c: Likewise. Thanks Sudi From: James Greenhalgh Sent: Tuesday, September 26, 2017 8:04:38 PM To: Sudi Das Cc: Richard Earnshaw; gcc-patches@gcc.gnu.org; nd; Marcus Shawcroft Subject: Re: [PATCH][AArch64] Add BIC-imm and ORR-imm SIMD pattern On Mon, Sep 25, 2017 at 11:13:57AM +0100, Sudi Das wrote: > > Hi James > > I put aarch64_output_simd_general_immediate looking at the similarities of > the immediates for mov/mvni and orr/bic. The CHECK macro in > aarch64_simd_valid_immediate both checks > and converts the immediates in a manner that are needed for the instructions. > > Having said that, I agree that maybe I could have refactored > aarch64_output_simd_mov_immediate to do the work rather than creating a new > functions to do similar things. I have done so in this patch. Thanks, this looks much neater. > I have also changed the names of the enum simd_immediate_check to be better > indicative of what they are doing. Thanks, I'd tweak them to look more like the bitmasks you use them as, but that is a small change for my personal preference. > Lastly I have added more cases in the tests (according to all the possible > CHECKs) and made them dg-do assemble (although I had to add --save-temps so > that the scan-assembler would work). Do you think I should not put that > option and rather create separate tests? This is good - thanks. I think clean up the enum definitions and this patch will be good. > @@ -308,6 +308,16 @@ enum aarch64_parse_opt_result > AARCH64_PARSE_INVALID_ARG /* Invalid arch, tune, cpu arg. */ > }; > > +/* Enum to distinguish which type of check is to be done in > + aarch64_simd_valid_immediate. This is used as a bitmask where > + AARCH64_CHECK_MOV has both bits set. Thus AARCH64_CHECK_MOV will > + perform all checks. Adding new types would require changes accordingly. > */ > +enum simd_immediate_check { > + AARCH64_CHECK_ORR = 1, /* Perform immediate checks for ORR. */ > + AARCH64_CHECK_BIC = 2, /* Perform immediate checks for BIC. */ > + AARCH64_CHECK_MOV = 3 /* Perform all checks (used for MOVI/MNVI). */ These are used in bit-mask style, so how about: AARCH64_CHECK_ORR = 1 << 0, AARCH64_CHECK_BIC = 1 << 1, AARCH64_CHECK_MOV = AARCH64_CHECK_ORR | AARCH64_CHECK_BIC Which is more self-documenting. > @@ -13001,7 +13013,8 @@ aarch64_float_const_representable_p (rtx x) > char* > aarch64_output_simd_mov_immediate (rtx const_vector, > machine_mode mode, > - unsigned width) > + unsigned width, > + enum simd_immediate_check which) This function is sorely missing a comment explaining the parameters - it would be very helpful if you could add one as part of this patch. 
Thanks, James diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index e67c2ed..5d7c5df 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -308,6 +308,16 @@ enum aarch64_parse_opt_result AARCH64_PARSE_INVALID_ARG /* Invalid arch, tune, cpu arg. */ }; +/* Enum to distinguish which type of check is to be done in + aarch64_simd_valid_immediate. This is used as a bitmask where + AARCH64_CHECK_MOV has both bits set. Thus AARCH64_CHECK_MOV will + perform all checks. Adding new types would require changes accordingly. */ +enum simd_immediate_check { + AARCH64_CHECK_ORR = 1 << 0, + AARCH64_CHECK_BIC = 1 << 1, + AARCH64_CHECK_MOV = AARCH64_CHECK_ORR | AARCH64_CHECK_BIC +}; + extern struct tune_params aarch64_tune_params; HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned); @@ -345,7 +355,8 @@ bool aarch64_mov_operand_p (
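As an aside on the enum shape suggested above, a minimal sketch (not the
actual GCC implementation) of how such a bitmask-style check enum gets
consumed: AARCH64_CHECK_MOV has both bits set, so a MOV query runs both the
non-inverted (ORR) and inverted (BIC) immediate checks. The encoding helper
below is a deliberately simplified stand-in for the real
aarch64_simd_valid_immediate logic.

enum simd_immediate_check {
  AARCH64_CHECK_ORR = 1 << 0,
  AARCH64_CHECK_BIC = 1 << 1,
  AARCH64_CHECK_MOV = AARCH64_CHECK_ORR | AARCH64_CHECK_BIC
};

/* Stand-in: accept an 8-bit value placed in any byte of a 32-bit pattern.  */
static int
encodable_p (unsigned int imm)
{
  for (int shift = 0; shift < 32; shift += 8)
    if ((imm & ~(0xffu << shift)) == 0)
      return 1;
  return 0;
}

static int
simd_valid_immediate_sketch (unsigned int imm, enum simd_immediate_check which)
{
  int ok = 0;
  if (which & AARCH64_CHECK_ORR)   /* non-inverted forms: ORR, MOVI.  */
    ok |= encodable_p (imm);
  if (which & AARCH64_CHECK_BIC)   /* inverted forms: BIC, MVNI.  */
    ok |= encodable_p (~imm);
  return ok;
}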
Re: [PATCH][AArch64] Add BIC-imm and ORR-imm SIMD pattern
Hi Richard Thanks, I have made the change to the patch. 2017-10-02 Sudakshina Das * config/aarch64/aarch64-protos.h (enum simd_immediate_check): New check type for aarch64_simd_valid_immediate. (aarch64_output_simd_mov_immediate): Update prototype. (aarch64_simd_valid_immediate): Update prototype. * config/aarch64/aarch64-simd.md (orr3): modified pattern to add support for ORR-immediate. (and3): modified pattern to add support for BIC-immediate. * config/aarch64/aarch64.c (aarch64_simd_valid_immediate): Function now checks for valid immediate for BIC and ORR based on new enum argument. (aarch64_output_simd_mov_immediate): Function now used to output BIC/ORR imm as well based on new enum argument. * config/aarch64/constraints.md (Do): New vector immediate constraint. (Db) : Likewise. * config/aarch64/predicates.md (aarch64_reg_or_orr_imm): New predicate. (aarch64_reg_or_bic_imm): Likewise. 2017-10-02 Sudakshina Das * gcc.target/aarch64/bic_imm_1.c: New test. * gcc.target/aarch64/orr_imm_1.c: Likewise. From: Richard Earnshaw (lists) Sent: Thursday, September 28, 2017 9:55 AM To: Sudi Das; James Greenhalgh Cc: gcc-patches@gcc.gnu.org; nd; Marcus Shawcroft Subject: Re: [PATCH][AArch64] Add BIC-imm and ORR-imm SIMD pattern On 27/09/17 18:57, Sudi Das wrote: > > > Hi James > > I have made the requested changes to the patch. > > > 2017-09-27 Sudakshina Das > > * config/aarch64/aarch64-protos.h (enum simd_immediate_check): New >check type > for aarch64_simd_valid_immediate. > (aarch64_output_simd_mov_immediate): Update prototype. > (aarch64_simd_valid_immediate): Update prototype. > > * config/aarch64/aarch64-simd.md (orr3): modified pattern to add > support for ORR-immediate. > (and3): modified pattern to add support for BIC-immediate. > > * config/aarch64/aarch64.c (aarch64_simd_valid_immediate): Function >now checks > for valid immediate for BIC and ORR based on new enum argument. > (aarch64_output_simd_mov_immediate): Function now used to output >BIC/ORR imm > as well based on new enum argument. > > * config/aarch64/constraints.md (Do): New vector immediate constraint. > (Db): Likewise. > > 2017-09-27 Sudakshina Das > > * gcc.target/aarch64/bic_imm_1.c: New test. > * gcc.target/aarch64/orr_imm_1.c: Likewise. > > > Thanks > Sudi > > > From: James Greenhalgh > Sent: Tuesday, September 26, 2017 8:04:38 PM > To: Sudi Das > Cc: Richard Earnshaw; gcc-patches@gcc.gnu.org; nd; Marcus Shawcroft > Subject: Re: [PATCH][AArch64] Add BIC-imm and ORR-imm SIMD pattern > > On Mon, Sep 25, 2017 at 11:13:57AM +0100, Sudi Das wrote: >> >> Hi James >> >> I put aarch64_output_simd_general_immediate looking at the similarities of >> the immediates for mov/mvni and orr/bic. The CHECK macro in >> aarch64_simd_valid_immediate both checks >> and converts the immediates in a manner that are needed for the instructions. >> >> Having said that, I agree that maybe I could have refactored >> aarch64_output_simd_mov_immediate to do the work rather than creating a new >> functions to do similar things. I have done so in this patch. > > Thanks, this looks much neater. > >> I have also changed the names of the enum simd_immediate_check to be better >> indicative of what they are doing. > > Thanks, I'd tweak them to look more like the bitmasks you use them as, but > that is a small change for my personal preference. > >> Lastly I have added more cases in the tests (according to all the possible >> CHECKs) and made them dg-do assemble (although I had to add --save-temps so >> that the scan-assembler would work). 
Do you think I should not put that >> option and rather create separate tests? > > This is good - thanks. > > I think clean up the enum definitions and this patch will be good. > >> @@ -308,6 +308,16 @@ enum aarch64_parse_opt_result >> AARCH64_PARSE_INVALID_ARG /* Invalid arch, tune, cpu arg. */ >> }; >> >> +/* Enum to distinguish which type of check is to be done in >> + aarch64_simd_valid_immediate. This is used as a bitmask where >> + AARCH64_CHECK_MOV has both bits set. Thus AARCH64_CHECK_MOV will >> + perform all checks. Adding new types would require changes accordingly. >> */ >> +enum simd_immediate_check { >> + AARCH64_CHECK_ORR = 1, /* Perform immediate checks for ORR. */ >> + AARCH64_CHECK_BIC = 2, /* Perform imm
Re: [PATCH][AArch64] Add BIC-imm and ORR-imm SIMD pattern
Hi Steve Sorry about this. I am on it. I have a fix and I am running tests on it right now. Sudi From: Steve Ellcey Sent: Thursday, October 5, 2017 12:05 AM To: Richard Earnshaw; Sudi Das; James Greenhalgh Cc: gcc-patches@gcc.gnu.org; nd; Marcus Shawcroft Subject: Re: [PATCH][AArch64] Add BIC-imm and ORR-imm SIMD pattern On Wed, 2017-10-04 at 16:41 +0100, Richard Earnshaw (lists) wrote: > On 02/10/17 10:05, Sudi Das wrote: > > > > 2017-10-02 Sudakshina Das > > > > * config/aarch64/aarch64-protos.h (enum simd_immediate_check): New > >check type > > for aarch64_simd_valid_immediate. > > (aarch64_output_simd_mov_immediate): Update prototype. > > (aarch64_simd_valid_immediate): Update prototype. > > > > * config/aarch64/aarch64-simd.md (orr3): modified pattern to add > > support for ORR-immediate. > > (and3): modified pattern to add support for BIC-immediate. > > > > * config/aarch64/aarch64.c (aarch64_simd_valid_immediate): Function > >now checks > > for valid immediate for BIC and ORR based on new enum argument. > > (aarch64_output_simd_mov_immediate): Function now used to output > >BIC/ORR imm > > as well based on new enum argument. > > > > * config/aarch64/constraints.md (Do): New vector immediate constraint. > > (Db) : Likewise. > > > > * config/aarch64/predicates.md (aarch64_reg_or_orr_imm): New > > predicate. > > (aarch64_reg_or_bic_imm): Likewise. I think this patch is causing a bunch of test failures on aarch64. I had to apply the patch for PR82396 (that was reverted) in order to build ToT GCC, but when I did that and ran the testsuite I got a bunch of failures like: /home/sellcey/cavium-pr-27386/src/gcc/gcc/testsuite/gcc.c- torture/compile/pr54713-1.c:45:18: internal compiler error: in aarch64_simd_valid_immediate, at config/aarch64/aarch64.c:11539 0xf2227b aarch64_simd_valid_immediate(rtx_def*, machine_mode, bool, simd_immediate_info*, simd_immediate_check) ../../../src/gcc/gcc/config/aarch64/aarch64.c:11539 0x11047b3 aarch64_reg_or_bic_imm(rtx_def*, machine_mode) ../../../src/gcc/gcc/config/aarch64/predicates.md:79 0xab29ab insn_operand_matches(insn_code, unsigned int, rtx_def*) ../../../src/gcc/gcc/optabs.c:6891 0xab29ab maybe_legitimize_operand_same_code ../../../src/gcc/gcc/optabs.c:6919 0xab545f maybe_legitimize_operand ../../../src/gcc/gcc/optabs.c:6990 0xab545f maybe_legitimize_operands(insn_code, unsigned int, unsigned int, expand_operand*) ../../../src/gcc/gcc/optabs.c:7055 0xab5a8f maybe_gen_insn(insn_code, unsigned int, expand_operand*) ../../../src/gcc/gcc/optabs.c:7073 0xab8503 expand_binop_directly ../../../src/gcc/gcc/optabs.c:1075 0xab87af expand_binop(machine_mode, optab_tag, rtx_def*, rtx_def*, rtx_def*, int, optab_methods) ../../../src/gcc/gcc/optabs.c:1156 0x8736d7 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier) ../../../src/gcc/gcc/expr.c:9582 0x749027 expand_gimple_stmt_1 ../../../src/gcc/gcc/cfgexpand.c:3691 0x749027 expand_gimple_stmt ../../../src/gcc/gcc/cfgexpand.c:3751 0x750387 expand_gimple_basic_block ../../../src/gcc/gcc/cfgexpand.c:5750 0x751ef7 execute ../../../src/gcc/gcc/cfgexpand.c:6357
[PATCH][AArch64] Fix ICE caused in aarch64_simd_valid_immediate
Hi This patch is a fix for PR 82440. The predicates aarch64_reg_or_bic_imm and aarch64_reg_or_orr_imm missed out on checking for a CONST_VECTOR before calling aarch64_simd_valid_immediate function. Also I think James forgot to add the test cases in the original patch submitted. Testing done : Checked for regressions on bootstrapped aarch64-none-linux-gnu. Ok for trunk? Thanks Sudi The ChangeLog entry is as follows: *** gcc/ChangeLog *** 2017-10-06 Sudakshina Das PR target/82440 * config/aarch64/predicates.md (aarch64_reg_or_orr_imm): Modified. (aarch64_reg_or_bic_imm): Likewise. *** gcc/testsuite/ChangeLog *** 2017-10-06 Sudakshina Das * gcc.target/aarch64/bic_imm_1.c: New test. * gcc.target/aarch64/orr_imm_1.c: Likewise.diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md index 887a13e..bf23b88 100644 --- a/gcc/config/aarch64/predicates.md +++ b/gcc/config/aarch64/predicates.md @@ -71,13 +71,15 @@ (define_predicate "aarch64_reg_or_orr_imm" (ior (match_operand 0 "register_operand") - (match_test "aarch64_simd_valid_immediate (op, mode, false, - NULL, AARCH64_CHECK_ORR)"))) + (and (match_code "const_vector") + (match_test "aarch64_simd_valid_immediate (op, mode, false, + NULL, AARCH64_CHECK_ORR)" (define_predicate "aarch64_reg_or_bic_imm" (ior (match_operand 0 "register_operand") - (match_test "aarch64_simd_valid_immediate (op, mode, false, - NULL, AARCH64_CHECK_BIC)"))) + (and (match_code "const_vector") + (match_test "aarch64_simd_valid_immediate (op, mode, false, + NULL, AARCH64_CHECK_BIC)" (define_predicate "aarch64_fp_compare_operand" (ior (match_operand 0 "register_operand") diff --git a/gcc/testsuite/gcc.target/aarch64/bic_imm_1.c b/gcc/testsuite/gcc.target/aarch64/bic_imm_1.c new file mode 100644 index 000..b14f009 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/bic_imm_1.c @@ -0,0 +1,56 @@ +/* { dg-do assemble } */ +/* { dg-options "-O2 --save-temps -ftree-vectorize" } */ + +/* Each function uses the correspoding 'CLASS' in + Marco CHECK (aarch64_simd_valid_immediate). */ + +void +bic_6 (int *a) +{ + for (int i = 0; i < 1024; i++) +a[i] &= ~(0xab); +} + +void +bic_7 (int *a) +{ + for (int i = 0; i < 1024; i++) +a[i] &= ~(0xcd00); +} + +void +bic_8 (int *a) +{ + for (int i = 0; i < 1024; i++) +a[i] &= ~(0xef); +} + +void +bic_9 (int *a) +{ + for (int i = 0; i < 1024; i++) +a[i] &= ~(0x1200); +} + +void +bic_10 (short *a) +{ + for (int i = 0; i < 1024; i++) +a[i] &= ~(0x34); +} + + +void +bic_11 (short *a) +{ + for (int i = 0; i < 1024; i++) +a[i] &= ~(0x5600); +} + + +/* { dg-final { scan-assembler "bic\\tv\[0-9\]+.4s, #171" } } */ +/* { dg-final { scan-assembler "bic\\tv\[0-9\]+.4s, #205, lsl #8" } } */ +/* { dg-final { scan-assembler "bic\\tv\[0-9\]+.4s, #239, lsl #16" } } */ +/* { dg-final { scan-assembler "bic\\tv\[0-9\]+.4s, #18, lsl #24" } } */ +/* { dg-final { scan-assembler "bic\\tv\[0-9\]+.8h, #52" } } */ +/* { dg-final { scan-assembler "bic\\tv\[0-9\]+.8h, #86, lsl #8" } } */ diff --git a/gcc/testsuite/gcc.target/aarch64/orr_imm_1.c b/gcc/testsuite/gcc.target/aarch64/orr_imm_1.c new file mode 100644 index 000..ff6f683 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/orr_imm_1.c @@ -0,0 +1,54 @@ +/* { dg-do assemble } */ +/* { dg-options "-O2 --save-temps -ftree-vectorize" } */ + +/* Each function uses the correspoding 'CLASS' in + Marco CHECK (aarch64_simd_valid_immediate). 
*/ + +void +orr_0 (int *a) +{ + for (int i = 0; i < 1024; i++) +a[i] |= 0xab; +} + +void +orr_1 (int *a) +{ + for (int i = 0; i < 1024; i++) +a[i] |= 0xcd00; +} + +void +orr_2 (int *a) +{ + for (int i = 0; i < 1024; i++) +a[i] |= 0x00ef; +} + +void +orr_3 (int *a) +{ + for (int i = 0; i < 1024; i++) +a[i] |= 0x1200; +} + +void +orr_4 (short *a) +{ + for (int i = 0; i < 1024; i++) +a[i] |= 0x00340034; +} + +void +orr_5 (int *a) +{ + for (int i = 0; i < 1024; i++) +a[i] |= 0x56005600; +} + +/* { dg-final { scan-assembler "orr\\tv\[0-9\]+.4s, #171" } } */ +/* { dg-final { scan-assembler "orr\\tv\[0-9\]+.4s, #205, lsl #8" } } */ +/* { dg-final { scan-assembler "orr\\tv\[0-9\]+.4s, #239, lsl #16" } } */ +/* { dg-final { scan-assembler "orr\\tv\[0-9\]+.4s, #18, lsl #24" } } */ +/* { dg-final { scan-assembler "orr\\tv\[0-9\]+.8h, #52" } } */ +/* { dg-final { scan-assembler "orr\\tv\[0-9\]+.8h, #86, lsl #8" } } */
Re: [PATCH][AArch64] Fix ICE caused in aarch64_simd_valid_immediate
Hi Jakub I have modified the entries: *** gcc/ChangeLog *** 2017-10-05 Sudakshina Das PR target/82440 * config/aarch64/predicates.md (aarch64_reg_or_orr_imm): Changed to to only call aarch64_simd_valid_immediate on CONST_VECTORs. (aarch64_reg_or_bic_imm): Likewise. *** gcc/testsuite/ChangeLog *** 2017-10-05 Sudakshina Das * gcc.target/aarch64/bic_imm_1.c: New test. * gcc.target/aarch64/orr_imm_1.c: Likewise.. Thanks Sudi From: Jakub Jelinek Sent: Friday, October 6, 2017 11:11 AM To: Sudi Das Cc: gcc-patches@gcc.gnu.org; nd; sell...@cavium.com; Marcus Shawcroft; Richard Earnshaw; James Greenhalgh Subject: Re: [PATCH][AArch64] Fix ICE caused in aarch64_simd_valid_immediate On Fri, Oct 06, 2017 at 09:52:35AM +, Sudi Das wrote: > This patch is a fix for PR 82440. > The predicates aarch64_reg_or_bic_imm and aarch64_reg_or_orr_imm missed out on > checking for a CONST_VECTOR before calling aarch64_simd_valid_immediate > function. > Also I think James forgot to add the test cases in the original patch > submitted. > > Testing done : Checked for regressions on bootstrapped aarch64-none-linux-gnu. > Ok for trunk? > > Thanks > Sudi > > The ChangeLog entry is as follows: > > *** gcc/ChangeLog *** > > 2017-10-06 Sudakshina Das > > PR target/82440 > * config/aarch64/predicates.md (aarch64_reg_or_orr_imm): Modified. > (aarch64_reg_or_bic_imm): Likewise. I'll defer the actual review to aarch64 maintainers, just want to say that this is not a correct ChangeLog entry. You should say what has changed, not just that something has changed. Something like Only call aarch64_simd_valid_immediate on CONST_VECTORs. or similar. Jakub
Re: [PATCH][AArch64] Fix ICE caused in aarch64_simd_valid_immediate
Hi Richard *** gcc/ChangeLog *** 2017-10-05 Sudakshina Das PR target/82440 * config/aarch64/predicates.md (aarch64_reg_or_orr_imm): Only call aarch64_simd_valid_immediate on CONST_VECTORs. (aarch64_reg_or_bic_imm): Likewise. *** gcc/testsuite/ChangeLog *** 2017-10-05 Sudakshina Das * gcc.target/aarch64/bic_imm_1.c: New test. * gcc.target/aarch64/orr_imm_1.c: Likewise. Also can someone please apply it for me. I do not have commit access. Thanks Sudi From: Richard Earnshaw (lists) Sent: Friday, October 6, 2017 2:01 PM To: Sudi Das; Jakub Jelinek Cc: gcc-patches@gcc.gnu.org; nd; sell...@cavium.com; Marcus Shawcroft; James Greenhalgh Subject: Re: [PATCH][AArch64] Fix ICE caused in aarch64_simd_valid_immediate On 06/10/17 12:01, Sudi Das wrote: > > Hi Jakub > > I have modified the entries: > > *** gcc/ChangeLog *** > > 2017-10-05 Sudakshina Das > > PR target/82440 > * config/aarch64/predicates.md (aarch64_reg_or_orr_imm): Changed to > to only call aarch64_simd_valid_immediate on CONST_VECTORs. You don't need to say 'Changed to' (or even 'Changed to to' :-). Simply say 'Only call ...'. > (aarch64_reg_or_bic_imm): Likewise. > > *** gcc/testsuite/ChangeLog *** > > 2017-10-05 Sudakshina Das > > * gcc.target/aarch64/bic_imm_1.c: New test. > * gcc.target/aarch64/orr_imm_1.c: Likewise.. too many full stops. OK with those nits fixed. R. > > > Thanks > Sudi > > > From: Jakub Jelinek > Sent: Friday, October 6, 2017 11:11 AM > To: Sudi Das > Cc: gcc-patches@gcc.gnu.org; nd; sell...@cavium.com; Marcus Shawcroft; > Richard Earnshaw; James Greenhalgh > Subject: Re: [PATCH][AArch64] Fix ICE caused in aarch64_simd_valid_immediate > > On Fri, Oct 06, 2017 at 09:52:35AM +, Sudi Das wrote: >> This patch is a fix for PR 82440. >> The predicates aarch64_reg_or_bic_imm and aarch64_reg_or_orr_imm missed out >> on >> checking for a CONST_VECTOR before calling aarch64_simd_valid_immediate >> function. >> Also I think James forgot to add the test cases in the original patch >> submitted. >> >> Testing done : Checked for regressions on bootstrapped >> aarch64-none-linux-gnu. >> Ok for trunk? >> >> Thanks >> Sudi >> >> The ChangeLog entry is as follows: >> >> *** gcc/ChangeLog *** >> >> 2017-10-06 Sudakshina Das >> >> PR target/82440 >> * config/aarch64/predicates.md (aarch64_reg_or_orr_imm): Modified. >> (aarch64_reg_or_bic_imm): Likewise. > > I'll defer the actual review to aarch64 maintainers, just want to say that > this is not a correct ChangeLog entry. You should say what has changed, not > just that something has changed. Something like > Only call aarch64_simd_valid_immediate on CONST_VECTORs. > or similar. > > Jakub > >
Re: [PATCH][GCC] Simplification of 1U << (31 - x)
Hi Jakub

As per the discussions, I have created a bug report for the possible regression this may cause:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82454

Sudi

From: Wilco Dijkstra
Sent: Tuesday, September 26, 2017 2:20 PM
To: Sudi Das; Jakub Jelinek
Cc: Richard Biener; GCC Patches; nd; Richard Earnshaw; James Greenhalgh
Subject: Re: [PATCH][GCC] Simplification of 1U << (31 - x)

Jakub Jelinek wrote:
> Well, we don't want to regress performance wise on one of the most important
> primary targets.  I don't care that much if the RTL/backend work is done
> together with the patch, or as a follow-up during stage1/3, but it should be
> done, the testcases I've posted can be used as a basis of a P1 runtime
> performance regression.

It should be sufficient to file a bug about inefficient 64-bit constant
expansions on x64.  I didn't see a significant difference in my benchmarking
of it on x64, so I'd say it's only a performance regression if large
benchmarks regress measurably (quite unlikely).

Wilco
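For context, the source-level shape of the trade-off being discussed is roughly the following. This is a sketch for illustration only, not code taken from the patch or from PR 82454, and the cost remark about x86-64 is the general property of 64-bit immediates rather than a measurement.

    /* The two expressions are equivalent for 0 <= x <= 63, but the rewritten
       form needs a full 64-bit constant, which on x86-64 typically has to be
       materialised with a separate mov (movabs) instead of shifting a cheap 1.  */
    unsigned long long before (unsigned long long x)
    {
      return 1ULL << (63 - x);            /* constant 1 is trivial to materialise */
    }

    unsigned long long after (unsigned long long x)
    {
      return 0x8000000000000000ULL >> x;  /* requires the 64-bit immediate */
    }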
Re: [PATCH][GCC] Simplification of 1U << (31 - x)
Thanks, I have made the changes to the patch.
Also can someone please apply it for me. I do not have commit access.

2017-10-10  Sudakshina Das

	PR middle-end/80131
	* match.pd: Simplify 1 << (C - x) where C = precision (x) - 1.

2017-10-10  Sudakshina Das

	PR middle-end/80131
	* testsuite/gcc.dg/pr80131-1.c: New test.

With regards to the existing missed optimizations in the x86 RTL expansion, I think the discussion can take place on the bug report that I created and maybe someone will pick it up.

Thanks
Sudi

From: Wilco Dijkstra
Sent: Monday, October 9, 2017 2:02 PM
To: Richard Biener; Sudi Das
Cc: Jakub Jelinek; GCC Patches; nd; Richard Earnshaw; James Greenhalgh
Subject: Re: [PATCH][GCC] Simplification of 1U << (31 - x)

Richard Biener wrote:
> I think the patch is ok with these changes but obviously we should try
> to address
> the code-generation issue on x86 at RTL expansion time.  They are sort-of
> existing missing optimizations.

Note the only x64 specific issue is the backend expansion of 64-bit
immediates which could be improved like I suggested.

However what we're really missing is a generic optimization pass that tries
to simplify immediates using accurate target costs.  In eg. x >= C or x > C
we can use either C or C-1 - on many targets one option may be a single
instruction, while the other might take 2 or even 3.

Wilco

diff --git a/gcc/match.pd b/gcc/match.pd
index e58a65a..7a25a1b 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -600,6 +600,19 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
       && tree_nop_conversion_p (type, TREE_TYPE (@1)))
   (lshift @0 @2)))
 
+/* Fold (1 << (C - x)) where C = precision(type) - 1
+   into ((1 << C) >> x).  */
+(simplify
+ (lshift integer_onep@0 (minus@1 INTEGER_CST@2 @3))
+ (if (INTEGRAL_TYPE_P (type)
+      && wi::eq_p (@2, TYPE_PRECISION (type) - 1)
+      && single_use (@1))
+  (if (TYPE_UNSIGNED (type))
+   (rshift (lshift @0 @2) @3)
+   (with
+    { tree utype = unsigned_type_for (type); }
+    (convert (rshift (lshift (convert:utype @0) @2) @3))))))
+
 /* Fold (C1/X)*C2 into (C1*C2)/X.  */
 (simplify
  (mult (rdiv@3 REAL_CST@0 @1) REAL_CST@2)
diff --git a/gcc/testsuite/gcc.dg/pr80131-1.c b/gcc/testsuite/gcc.dg/pr80131-1.c
new file mode 100644
index 000..0bfe1f4
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr80131-1.c
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target int32plus } */
+/* { dg-options "-fdump-tree-gimple" } */
+
+/* Checks the simplification of:
+   1 << (C - x) to (1 << C) >> x, where C = precision (type) - 1
+   f1 is not simplified but f2, f3 and f4 are.  */
+
+__INT64_TYPE__ f1 (__INT64_TYPE__ i)
+{
+  return (__INT64_TYPE__)1 << (31 - i);
+}
+
+__INT64_TYPE__ f2 (__INT64_TYPE__ i)
+{
+  return (__INT64_TYPE__)1 << (63 - i);
+}
+
+__UINT64_TYPE__ f3 (__INT64_TYPE__ i)
+{
+  return (__UINT64_TYPE__)1 << (63 - i);
+}
+
+__INT32_TYPE__ f4 (__INT32_TYPE__ i)
+{
+  return (__INT32_TYPE__)1 << (31 - i);
+}
+
+/* { dg-final { scan-tree-dump-times "= 31 -" 1 "gimple" } } */
+/* { dg-final { scan-tree-dump-times "9223372036854775808 >>" 2 "gimple" } } */
+/* { dg-final { scan-tree-dump "2147483648 >>" "gimple" } } */
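The identity the new match.pd rule relies on can be sanity-checked with a few lines of plain C. This snippet was written for this note and is not part of the patch or the testsuite: for an unsigned type of precision P and 0 <= x < P, 1 << ((P - 1) - x) equals (1 << (P - 1)) >> x.

    /* Self-contained check of the identity behind the simplification.  */
    #include <assert.h>

    int main (void)
    {
      for (unsigned x = 0; x < 64; x++)
        {
          unsigned long long lhs = 1ULL << (63 - x);
          unsigned long long rhs = 0x8000000000000000ULL >> x;
          assert (lhs == rhs);          /* 64-bit case (f2/f3 above) */
        }
      for (unsigned x = 0; x < 32; x++)
        assert ((1U << (31 - x)) == (0x80000000U >> x));  /* 32-bit case (f4) */
      return 0;
    }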
[PATCH] [ARM] Cleanup macro TARGET_EITHER
Hi all

This is a cleanup patch to remove the macro TARGET_EITHER. The macro has become irrelevant: its original expansion, (TARGET_ARM | TARGET_THUMB), was long ago commented out and replaced by the constant 1, so every condition that uses it is trivially true.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Sudi

2017-03-10  Sudakshina Das

	* config/arm/arm.h (TARGET_EITHER): Delete.
	* config/arm/arm.md: Delete instances of TARGET_EITHER.

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index e95eda3..6afb54b 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -130,7 +130,6 @@ extern tree arm_fp16_type_node;
 #define TARGET_REALLY_IWMMXT2	(TARGET_IWMMXT2 && TARGET_32BIT)
 #define TARGET_IWMMXT_ABI (TARGET_32BIT && arm_abi == ARM_ABI_IWMMXT)
 #define TARGET_ARM                      (! TARGET_THUMB)
-#define TARGET_EITHER			1 /* (TARGET_ARM | TARGET_THUMB) */
 #define TARGET_BACKTRACE	        (crtl->is_leaf \
 				         ? TARGET_TPCS_LEAF_FRAME \
 				         : TARGET_TPCS_FRAME)
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 8720a71..3d9a09f 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -448,7 +448,7 @@
 	  (plus:DI (match_operand:DI 1 "s_register_operand" "")
 	           (match_operand:DI 2 "arm_adddi_operand" "")))
    (clobber (reg:CC CC_REGNUM))])]
-  "TARGET_EITHER"
+  ""
   "
  if (TARGET_THUMB1)
    {
@@ -577,7 +577,7 @@
   [(set (match_operand:SI 0 "s_register_operand" "")
 	(plus:SI (match_operand:SI 1 "s_register_operand" "")
 		 (match_operand:SI 2 "reg_or_int_operand" "")))]
-  "TARGET_EITHER"
+  ""
   "
   if (TARGET_32BIT && CONST_INT_P (operands[2]))
     {
@@ -1254,7 +1254,7 @@
 	  (minus:DI (match_operand:DI 1 "s_register_operand" "")
 	            (match_operand:DI 2 "s_register_operand" "")))
    (clobber (reg:CC CC_REGNUM))])]
-  "TARGET_EITHER"
+  ""
   "
   if (TARGET_THUMB1)
     {
@@ -1430,7 +1430,7 @@
   [(set (match_operand:SI 0 "s_register_operand" "")
 	(minus:SI (match_operand:SI 1 "reg_or_int_operand" "")
 		  (match_operand:SI 2 "s_register_operand" "")))]
-  "TARGET_EITHER"
+  ""
   "
   if (CONST_INT_P (operands[1]))
     {
@@ -1566,7 +1566,7 @@
   [(set (match_operand:SI 0 "s_register_operand" "")
 	(mult:SI (match_operand:SI 2 "s_register_operand" "")
 		 (match_operand:SI 1 "s_register_operand" "")))]
-  "TARGET_EITHER"
+  ""
   ""
 )
 
@@ -2345,7 +2345,7 @@
   [(set (match_operand:SI 0 "s_register_operand" "")
 	(and:SI (match_operand:SI 1 "s_register_operand" "")
 		(match_operand:SI 2 "reg_or_int_operand" "")))]
-  "TARGET_EITHER"
+  ""
   "
   if (TARGET_32BIT)
     {
@@ -3210,7 +3210,7 @@
   [(set (match_operand:SI 0 "s_register_operand" "")
 	(ior:SI (match_operand:SI 1 "s_register_operand" "")
 		(match_operand:SI 2 "reg_or_int_operand" "")))]
-  "TARGET_EITHER"
+  ""
   "
   if (CONST_INT_P (operands[2]))
     {
@@ -3394,7 +3394,7 @@
   [(set (match_operand:SI 0 "s_register_operand" "")
 	(xor:SI (match_operand:SI 1 "s_register_operand" "")
 		(match_operand:SI 2 "reg_or_int_operand" "")))]
-  "TARGET_EITHER"
+  ""
   "if (CONST_INT_P (operands[2]))
     {
       if (TARGET_32BIT)
@@ -4052,7 +4052,7 @@
   [(set (match_operand:SI 0 "s_register_operand" "")
 	(ashift:SI (match_operand:SI 1 "s_register_operand" "")
 		   (match_operand:SI 2 "arm_rhs_operand" "")))]
-  "TARGET_EITHER"
+  ""
   "
   if (CONST_INT_P (operands[2])
       && (UINTVAL (operands[2])) > 31)
@@ -4121,7 +4121,7 @@
   [(set (match_operand:SI 0 "s_register_operand" "")
 	(ashiftrt:SI (match_operand:SI 1 "s_register_operand" "")
 		     (match_operand:SI 2 "arm_rhs_operand" "")))]
-  "TARGET_EITHER"
+  ""
   "
   if (CONST_INT_P (operands[2])
      && UINTVAL (operands[2]) > 31)
@@ -4187,7 +4187,7 @@
   [(set (match_operand:SI 0 "s_register_operand" "")
 	(lshiftrt:SI (match_operand:SI 1 "s_register_operand" "")
 		     (match_operand:SI 2 "arm_rhs_operand" "")))]
-  "TARGET_EITHER"
+  ""
   "
   if (CONST_INT_P (operands[2])
      && (UINTVAL (operands[2])) > 31)
@@ -4219,7 +4219,7 @@
   [(set (match_operand:SI 0 "s_register_operand" "")
 	(rotatert:SI (match_operand:SI 1 "s_register_operand" "")
 		     (match_operand:SI 2 "arm_rhs_operand" "")))]
-  "TARGET_EITHER"
+  ""
   "
   if (TARGET_32BIT)
     {
@@ -4677,7 +4677,7 @@
   [(set (match_operand:DI 0 "s_register_operand" "")
 	(neg:DI (match_operand:DI 1 "s_register_operand" "")))
    (clobber (reg:CC CC_REGNUM))])]
-  "TARGET_EITHER"
+  ""
 {
   if (TARGET_NEON)
     {
@@ -4730,7 +4730,7 @@
 (define_expand "negsi2"
   [(set (match_operand:SI 0 "s_register_operand" "")
 	(neg:SI (match_operand:SI 1 "s_register_operand" "")))]
-  "TARGET_EITHER"
+  ""
   ""
 )
 
@@ -4882,7 +4882,7 @@
   [(set (match_operand:SI 0 "s_register_operand" "")
 	(abs:SI (match_operand:SI 1 "s_register_operand" "")))
    (clobber (match_dup 2))])]
-  "TARGET_EITHER"
+  ""
   "
   if (TARGET_THUMB1)
[PATCH][AArch64] Allow CMP+SHIFT when comparing with zero
Hi all

The backend pattern for combining a CMP+SHIFT was missing a case when comparing with zero. This happened because aarch64_select_cc_mode (SELECT_CC_MODE) did not return the mode needed to identify the combine (CC_SWP in this case). This patch adds the missing case.

For the test case:

int f3 (int x, int y)
{
  int res = x << 3;
  return res != 0;
}

we now generate (at -O2):

f3:
	cmp	wzr, w0, lsl 3
	cset	w0, ne
	ret

instead of:

f3:
	lsl	w0, w0, 3
	cmp	w0, 0
	cset	w0, ne
	ret

Added this new test and checked for regressions on bootstrapped aarch64-none-linux-gnu.

Ok for stage 1?

Thanks
Sudi

2017-03-10  Sudakshina Das

	* config/aarch64/aarch64.c (aarch64_select_cc_mode): Return CC_SWP
	for comparison with zero.

2017-03-10  Sudakshina Das

	* gcc.target/aarch64/cmp_shifted_reg_1.c: New test.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 714bb79..01af2a7 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4707,7 +4707,7 @@ aarch64_select_cc_mode (RTX_CODE code, rtx x, rtx y)
      the comparison will have to be swapped when we emit the assembly
      code.  */
   if ((GET_MODE (x) == SImode || GET_MODE (x) == DImode)
-      && (REG_P (y) || GET_CODE (y) == SUBREG)
+      && (REG_P (y) || GET_CODE (y) == SUBREG || y == const0_rtx)
       && (GET_CODE (x) == ASHIFT || GET_CODE (x) == ASHIFTRT
 	  || GET_CODE (x) == LSHIFTRT
 	  || GET_CODE (x) == ZERO_EXTEND || GET_CODE (x) == SIGN_EXTEND))
diff --git a/gcc/testsuite/gcc.target/aarch64/cmp_shifted_reg_1.c b/gcc/testsuite/gcc.target/aarch64/cmp_shifted_reg_1.c
new file mode 100644
index 000..cacecf4
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/cmp_shifted_reg_1.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 " } */
+
+int f3 (int x, int y)
+{
+  int res = x << 3;
+  return res != 0;
+}
+
+/* We should combine the shift and compare */
+/* { dg-final { scan-assembler "cmp\.*\twzr, w\[0-9\]+, lsl 3" } } */
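One way to read the CC_SWP detail, as a paraphrase rather than the author's own words: on AArch64 only the second operand of cmp may carry a shift, so comparing a shifted value against zero has to place the zero register first, and the resulting flags describe the operands in swapped order.

    /* Sketch for illustration, not part of the patch or its testsuite.
       For a function of the same shape as the f3 test above:
    
           cmp  wzr, w0, lsl 3    // flags describe  0 <=> (x << 3)
    
       CC_SWP tells the rest of the compiler to read the condition codes with
       the operands reversed (lt/gt and similar pairs trade places), while
       eq/ne, as used here, are unaffected by the swap.  */
    int f3_equivalent (int x)
    {
      return (x << 3) != 0;
    }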