RE: [PATCH, AArch64] Fix macro in vdup_lane_2 test case
From: Marcus Shawcroft [mailto:marcus.shawcr...@gmail.com]
Sent: 19 May 2014 11:45
To: Ian Bolton
Cc: gcc-patches
Subject: Re: [PATCH, AArch64] Fix macro in vdup_lane_2 test case

> On 8 May 2014 18:41, Ian Bolton ian.bol...@arm.com wrote:
>
>> gcc/testsuite
>> 	* gcc.target/aarch64/vdup_lane_2.c (force_simd): Emit an actual
>> 	instruction to move into the allocated register.
>
> This macro is attempting to force a value to a particular class of
> register; we don't need or want the mov instruction at all.  Isn't
> something like this sufficient:
>
> #define force_simd(V1)  asm volatile (""	\
> 	   : "+w"(V1)				\
> 	   :					\
> 	   : /* No clobbers */)
>
> ?
>
> /Marcus

Thanks for the review, Marcus.  I did not think of that, and it looks
sane, but your suggested approach leads to some of the dup instructions
being optimised away.  Ordinarily, that would be great, but these test
cases are trying to force the dups to occur.

Cheers,
Ian
RE: Using particular register class (like floating point registers) as spill register class
> Please can you try that on trunk and report back.

OK, this is trunk, and I'm no longer seeing that happen.  However, I am
seeing:

   0x007fb76dc82c <+160>:	adrp	x25, 0x7fb7c8
   0x007fb76dc830 <+164>:	add	x25, x25, #0x480
   0x007fb76dc834 <+168>:	fmov	d8, x0
   0x007fb76dc838 <+172>:	add	x0, x29, #0x160
   0x007fb76dc83c <+176>:	fmov	d9, x0
   0x007fb76dc840 <+180>:	add	x0, x29, #0xd8
   0x007fb76dc844 <+184>:	fmov	d10, x0
   0x007fb76dc848 <+188>:	add	x0, x29, #0xf8
   0x007fb76dc84c <+192>:	fmov	d11, x0

followed later by:

   0x007fb76dd224 <+2712>:	fmov	x0, d9
   0x007fb76dd228 <+2716>:	add	x6, x29, #0x118
   0x007fb76dd22c <+2720>:	str	x20, [x0,w27,sxtw #3]
   0x007fb76dd230 <+2724>:	fmov	x0, d10
   0x007fb76dd234 <+2728>:	str	w28, [x0,w27,sxtw #2]
   0x007fb76dd238 <+2732>:	fmov	x0, d11
   0x007fb76dd23c <+2736>:	str	w19, [x0,w27,sxtw #2]

which seems a bit suboptimal, given that these double registers now have
to be saved in the prologue.

Thanks for doing that.  Many AArch64 improvements have gone in since 4.8
was released.  I think we'd have to see the output for the whole
function to determine whether that code is sane.  I don't suppose the
source code is shareable, or that you have a testcase for this you can
share?

Cheers,
Ian
RE: Using particular register class (like floating point registers) as spill register class
On 05/16/2014 12:05 PM, Kugan wrote:
> On 16/05/14 20:40, pins...@gmail.com wrote:
>> On May 16, 2014, at 3:23 AM, Kugan kugan.vivekanandara...@linaro.org wrote:
>>> I would like to know if there is any way we can use registers from a
>>> particular register class just as spill registers (in places where
>>> the register allocator would normally spill to stack and nothing
>>> more), when it can be useful.
>>>
>>> In AArch64, in some cases, compiling with -mgeneral-regs-only
>>> produces better performance compared to not using it.  The difference
>>> here is that when -mgeneral-regs-only is not used, floating-point
>>> registers are also used in register allocation.  IRA/LRA then has to
>>> move them to core registers before performing operations, as shown
>>> below.
>>
>> Can you show the code with fp registers disabled?  Does it use the
>> stack to spill?  Normally this is due to register-to-register-class
>> move costs compared to register-to-memory move cost.  Also I think it
>> depends on the processor rather than the target.  For Thunder, using
>> the fp registers might actually be better than using the stack,
>> depending on whether the stack was in L1.
>
> Not all the LDR/STR combinations match to fmov.  In the testcase I have:
>
>   aarch64-none-linux-gnu-gcc sha_dgst.c -O2 -S -mgeneral-regs-only
>   grep -c ldr sha_dgst.s
>   50
>   grep -c str sha_dgst.s
>   42
>   grep -c fmov sha_dgst.s
>   0
>
>   aarch64-none-linux-gnu-gcc sha_dgst.c -O2 -S
>   grep -c ldr sha_dgst.s
>   42
>   grep -c str sha_dgst.s
>   31
>   grep -c fmov sha_dgst.s
>   105
>
> I am not saying that we shouldn't use floating-point registers here.
> But from the above, it seems like the register allocator is using them
> more like core registers (even though the cost model has higher cost)
> and then moving the values to core registers before operations.  If
> that is the case, my question is: how do we make this just a spill
> register class, so that we replace ldr/str with an equal number of
> fmov instructions where possible?
I'm also seeing stuff like this:

   0x7fb72a0928 <ClassFileParser::parse_constant_pool_entries(int, Thread*)+2500>:	add	x21, x4, x21, lsl #3
   0x7fb72a092c <ClassFileParser::parse_constant_pool_entries(int, Thread*)+2504>:	fmov	w2, s8
   0x7fb72a0930 <ClassFileParser::parse_constant_pool_entries(int, Thread*)+2508>:	str	w2, [x21,#88]

I guess GCC doesn't know how to store an SImode value held in an FP
register into memory?  This is 4.8.1.

Please can you try that on trunk and report back.

Thanks,
Ian
RE: soft-fp functions support without using libgcc
> On Fri, May 16, 2014 at 6:34 AM, Sheheryar Zahoor Qazi
> sheheryar.zahoor.q...@gmail.com wrote:
>> I am trying to provide soft-fp support to an 18-bit soft-core
>> processor architecture at my university.  But the problem is that
>> libgcc has not been cross-compiled for my target architecture and some
>> functions are missing, so I cannot build libgcc.  I believe soft-fp is
>> compiled in libgcc, so I am unable to invoke soft-fp functions from
>> libgcc.
>>
>> Is it possible for me to provide soft-fp support without using libgcc?
>> How should I proceed in defining the functions?  Any idea?  And does
>> any architecture provide floating-point support without using libgcc?
>
> I'm sorry, I don't understand the premise of your question.  It is not
> necessary to build libgcc before building libgcc.  That would not make
> sense.  If you have a working compiler that is missing some functions
> provided by libgcc, that should be sufficient to build libgcc.

If you replace "cross-compiled" with "ported", I think it makes sense.
Can one provide soft-fp support without porting libgcc for their
architecture?

Cheers,
Ian
RE: [PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible
Ping.  This should be relatively simple to review.  Many thanks.

-----Original Message-----
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
On Behalf Of Ian Bolton
Sent: 08 May 2014 18:36
To: gcc-patches
Subject: [PATCH, AArch64] Use MOVN to generate 64-bit negative
immediates where sensible

Hi,

It currently takes 4 instructions to generate certain immediates on
AArch64 (unless we put them in the constant pool).

For example ...

  long long
  beefcafebabe ()
  {
    return 0xFFFFBEEFCAFEBABEll;
  }

leads to ...

  mov	x0, 0x47806
  mov	x0, 0xcafe, lsl 16
  mov	x0, 0xbeef, lsl 32
  orr	x0, x0, -281474976710656

The above case is tackled in this patch by employing MOVN to generate
the top 32 bits in a single instruction ...

  mov	x0, -71536975282177
  movk	x0, 0xcafe, lsl 16
  movk	x0, 0xbabe, lsl 0

(Note that where at least two half-words are 0xffff, existing code that
does the immediate in two instructions is still used.)

Tested on standard gcc regressions and the attached test case.

OK for commit?

Cheers,
Ian

2014-05-08  Ian Bolton  <ian.bol...@arm.com>

gcc/
	* config/aarch64/aarch64.c (aarch64_expand_mov_immediate):
	Use MOVN when top-most half-word (and only that half-word)
	is 0xffff.
gcc/testsuite/
	* gcc.target/aarch64/movn_1.c: New test.
RE: [PATCH, AArch64] Fix macro in vdup_lane_2 test case
Ping.  This may well be classed as obvious, but that's not obvious to
me, so I request a review.  Many thanks.

-----Original Message-----
From: gcc-patches-ow...@gcc.gnu.org [mailto:gcc-patches-ow...@gcc.gnu.org]
On Behalf Of Ian Bolton
Sent: 08 May 2014 18:42
To: gcc-patches
Subject: [PATCH, AArch64] Fix macro in vdup_lane_2 test case

This patch fixes a defective macro definition, based on the correct
definition in similar testcases.  The test currently passes through
luck rather than correctness.

OK for commit?

Cheers,
Ian

2014-05-08  Ian Bolton  <ian.bol...@arm.com>

gcc/testsuite
	* gcc.target/aarch64/vdup_lane_2.c (force_simd): Emit an actual
	instruction to move into the allocated register.
GCC driver to Compile twice, score the assembly, choose the best?
Hi, fellow GCC developers!

I was wondering if the gcc driver could be made to invoke cc1 twice,
with different flags, and then just keep the better of the two .s files
that come out?

I'm sure this is not a new idea, but I'm not aware of anything being
done in this area, so I've made this post to gather your views. :)

The kinds of flags I am thinking could be toggled are register
allocation and instruction scheduling ones, since it's very hard to
find a one-size-fits-all setting there, and we don't really want the
user to have to depend on knowing the right one.

Obviously, compilation time will go up, but the run-time benefits could
be huge.

What are your thoughts?  What work in this area have I failed to dig up
in my limited research?

Many thanks,
Ian
RE: GCC driver to Compile twice, score the assembly, choose the best?
Thanks for the quick response.

> On Thu, May 15, 2014 at 1:46 PM, Ian Bolton ian.bol...@arm.com wrote:
>> Hi, fellow GCC developers!
>>
>> I was wondering if the gcc driver could be made to invoke cc1 twice,
>> with different flags, and then just keep the better of the two .s
>> files that come out?
>
> I'd be interested in your .s comparison tool that decides which one is
> better!

Well, yes, that's the hard part, and it could be done a number of ways.
I think it would make a good contest actually, or a summer coding
project.  Or at least a nice subject for brainstorming at the GNU
Cauldron. :)

Cheers,
Ian
[PATCH, AArch64] Implement HARD_REGNO_CALLER_SAVE_MODE
Currently, on AArch64, when a caller-save register is saved/restored,
GCC is accessing the maximum size of the hard register.  So an SImode
integer (4 bytes) value is being stored as DImode (8 bytes) because the
int registers are 8 bytes wide, and an SFmode float (4 bytes) and
DFmode double (8 bytes) are being stored as TImode (16 bytes) to
capture the full 128 bits of the vector register.

This patch corrects this by implementing the
HARD_REGNO_CALLER_SAVE_MODE hook, which is called by LRA to determine
the minimum-size mode it might need to save/restore.

Tested on the GCC regression suite and verified impact on a number of
examples.

OK for trunk?

Cheers,
Ian

2014-05-12  Ian Bolton  <ian.bol...@arm.com>

	* config/aarch64/aarch64-protos.h
	(aarch64_hard_regno_caller_save_mode): New prototype.
	* config/aarch64/aarch64.c (aarch64_hard_regno_caller_save_mode):
	New function.
	* config/aarch64/aarch64.h (HARD_REGNO_CALLER_SAVE_MODE): New
	macro.

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 04cbc78..7cf7d9f 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -202,6 +202,8 @@ enum aarch64_symbol_type aarch64_classify_symbol (rtx,
 enum aarch64_symbol_type aarch64_classify_tls_symbol (rtx);
 enum reg_class aarch64_regno_regclass (unsigned);
 int aarch64_asm_preferred_eh_data_format (int, int);
+enum machine_mode aarch64_hard_regno_caller_save_mode (unsigned, unsigned,
+						       enum machine_mode);
 int aarch64_hard_regno_mode_ok (unsigned, enum machine_mode);
 int aarch64_hard_regno_nregs (unsigned, enum machine_mode);
 int aarch64_simd_attr_length_move (rtx);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 8655f04..c2cc81b 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -424,6 +424,24 @@ aarch64_hard_regno_mode_ok (unsigned regno, enum machine_mode mode)
   return 0;
 }
 
+/* Implement HARD_REGNO_CALLER_SAVE_MODE.  */
+enum machine_mode
+aarch64_hard_regno_caller_save_mode (unsigned regno, unsigned nregs,
+				     enum machine_mode mode)
+{
+  /* Handle modes that fit within single registers.  */
+  if (nregs == 1 && GET_MODE_SIZE (mode) <= 16)
+    {
+      if (GET_MODE_SIZE (mode) >= 4)
+	return mode;
+      else
+	return SImode;
+    }
+  /* Fall back to generic for multi-reg and very large modes.  */
+  else
+    return choose_hard_reg_mode (regno, nregs, false);
+}
+
 /* Return true if calls to DECL should be treated as
    long-calls (ie called via a register).  */
 static bool
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index c9b30d0..0574593 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -824,6 +824,11 @@ do {									     \
 
 #define SHIFT_COUNT_TRUNCATED !TARGET_SIMD
 
+/* Choose appropriate mode for caller saves, so we do the minimum
+   required size of load/store.  */
+#define HARD_REGNO_CALLER_SAVE_MODE(REGNO, NREGS, MODE) \
+  aarch64_hard_regno_caller_save_mode ((REGNO), (NREGS), (MODE))
+
 /* Callee only saves lower 64-bits of a 128-bit register.  Tell the
    compiler the callee clobbers the top 64-bits when restoring the
    bottom 64-bits.  */
[PATCH, AArch64] Use MOVN to generate 64-bit negative immediates where sensible
Hi,

It currently takes 4 instructions to generate certain immediates on
AArch64 (unless we put them in the constant pool).

For example ...

  long long
  beefcafebabe ()
  {
    return 0xFFFFBEEFCAFEBABEll;
  }

leads to ...

  mov	x0, 0x47806
  mov	x0, 0xcafe, lsl 16
  mov	x0, 0xbeef, lsl 32
  orr	x0, x0, -281474976710656

The above case is tackled in this patch by employing MOVN to generate
the top 32 bits in a single instruction ...

  mov	x0, -71536975282177
  movk	x0, 0xcafe, lsl 16
  movk	x0, 0xbabe, lsl 0

(Note that where at least two half-words are 0xffff, existing code that
does the immediate in two instructions is still used.)

Tested on standard gcc regressions and the attached test case.

OK for commit?

Cheers,
Ian

2014-05-08  Ian Bolton  <ian.bol...@arm.com>

gcc/
	* config/aarch64/aarch64.c (aarch64_expand_mov_immediate):
	Use MOVN when top-most half-word (and only that half-word)
	is 0xffff.
gcc/testsuite/
	* gcc.target/aarch64/movn_1.c: New test.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 43a83566..a8e504e 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1177,6 +1177,18 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm)
 	}
     }
 
+  /* Look for case where upper 16 bits are set, so we can use MOVN.  */
+  if ((val & 0xffff000000000000ll) == 0xffff000000000000ll)
+    {
+      emit_insn (gen_rtx_SET (VOIDmode, dest,
+			      GEN_INT (~ (~val & (0xffffll << 32)))));
+      emit_insn (gen_insv_immdi (dest, GEN_INT (16),
+				 GEN_INT ((val >> 16) & 0xffff)));
+      emit_insn (gen_insv_immdi (dest, GEN_INT (0),
+				 GEN_INT (val & 0xffff)));
+      return;
+    }
+
 simple_sequence:
   first = true;
   mask = 0xffff;
diff --git a/gcc/testsuite/gcc.target/aarch64/movn_1.c b/gcc/testsuite/gcc.target/aarch64/movn_1.c
new file mode 100644
index 000..cc11ade
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movn_1.c
@@ -0,0 +1,27 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-inline --save-temps" } */
+
+extern void abort (void);
+
+long long
+foo ()
+{
+  /* { dg-final { scan-assembler "mov\tx\[0-9\]+, -71536975282177" } } */
+  return 0xffffbeefcafebabell;
+}
+
+long long
+merge4 (int a, int b, int c, int d)
+{
+  return ((long long) a << 48 | (long long) b << 32
+	  | (long long) c << 16 | (long long) d);
+}
+
+int main ()
+{
+  if (foo () != merge4 (0xffff, 0xbeef, 0xcafe, 0xbabe))
+    abort ();
+  return 0;
+}
+
+/* { dg-final { cleanup-saved-temps } } */
[PATCH, AArch64] Fix macro in vdup_lane_2 test case
This patch fixes a defective macro definition, based on the correct
definition in similar testcases.  The test currently passes through
luck rather than correctness.

OK for commit?

Cheers,
Ian

2014-05-08  Ian Bolton  <ian.bol...@arm.com>

gcc/testsuite
	* gcc.target/aarch64/vdup_lane_2.c (force_simd): Emit an actual
	instruction to move into the allocated register.

diff --git a/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c b/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c
index 7c04e75..2072c79 100644
--- a/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c
+++ b/gcc/testsuite/gcc.target/aarch64/vdup_lane_2.c
@@ -4,10 +4,11 @@
 
 #include <arm_neon.h>
 
-#define force_simd(V1) asm volatile (""	\
-	   : "=w"(V1)				\
-	   : "w"(V1)				\
-	   : /* No clobbers */)
+/* Used to force a variable to a SIMD register.  */
+#define force_simd(V1) asm volatile ("orr %0.16b, %1.16b, %1.16b"	\
+	   : "=w"(V1)							\
+	   : "w"(V1)							\
+	   : /* No clobbers */);
 
 extern void abort (void);
RE: [PATCH, ARM] Suppress Redundant Flag Setting for Cortex-A15
Hi,

> On 28 January 2014 13:10, Ramana Radhakrishnan
> ramana@googlemail.com wrote:
>> On Fri, Jan 24, 2014 at 5:16 PM, Ian Bolton ian.bol...@arm.com wrote:
>>> Hi there!
>>>
>>> An existing optimisation for Thumb-2 converts t32 encodings to t16
>>> encodings to reduce codesize, at the expense of causing redundant
>>> flag setting for ADD, AND, etc.  This redundant flag setting can have
>>> a negative performance impact on Cortex-A15.
>>>
>>> This patch introduces two new tuning options so that the conversion
>>> from t32 to t16, which takes place in thumb2_reorg, can be suppressed
>>> for Cortex-A15.  To maintain some of the original benefit (reduced
>>> codesize), the suppression is only done where the enclosing basic
>>> block is deemed worthy of optimising for speed.
>>>
>>> This tested with no regressions, and performance has improved for the
>>> workloads tested on Cortex-A15.  (It might be beneficial to other
>>> processors too, but that has not been investigated yet.)
>>>
>>> OK for stage 1?
>>
>> This is OK for stage1.
>>
>> Ramana
>>
>>> Cheers,
>>> Ian
>>>
>>> 2014-01-24  Ian Bolton  <ian.bol...@arm.com>
>>>
>>> gcc/
>>> 	* config/arm/arm-protos.h (tune_params): New struct members.
>>> 	* config/arm/arm.c: Initialise tune_params per processor.
>>> 	(thumb2_reorg): Suppress conversion from t32 to t16 when
>>> 	optimizing for speed, based on new tune_params.
>
> This causes gcc.target/arm/negdi-1.c and gcc.target/arm/negdi-2.c to
> FAIL when GCC is configured as:
>
>   --with-mode=arm --with-cpu=cortex-a15 --with-fpu=neon-vfpv4
>
> Both tests used to PASS.  (See
> http://cbuild.validation.linaro.org/build/cross-validation/gcc/209561/report-build-info.html)

Hi Christophe,

I don't recall the failure when I did the work, but I see now that the
test is looking for "negs" when my patch is specifically trying to
avoid flag-setting operations.  So we are now getting an "rsb" instead
of a "negs", as intended, and the test needs fixing!

Open question: Should I look for either "rsb" or "negs" in a single
scan-assembler, or look for different ones dependent on the cpu in
question, or just not run the test for cortex-a15?

Cheers,
Ian
RE: [PATCH, ARM] Optimise NotDI AND/OR ZeroExtendSI for ARMv7A
-----Original Message-----
From: Richard Earnshaw
Sent: 21 March 2014 13:57
To: Ian Bolton
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH, ARM] Optimise NotDI AND/OR ZeroExtendSI for ARMv7A

On 19/03/14 16:53, Ian Bolton wrote:
> This is a follow-on patch to one already committed:
> http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01128.html
>
> It implements patterns to simplify our RTL as follows:
>
>   OR  (Not:DI (A:DI), ZeroExtend:DI (B:SI))
>     -- the top half can be done with a MVN
>
>   AND (Not:DI (A:DI), ZeroExtend:DI (B:SI))
>     -- the top half becomes zero.
>
> I've added test cases for both of these and also the existing
> anddi_notdi patterns.  The tests all pass.  Full regression runs
> passed.
>
> OK for stage 1?
>
> Cheers,
> Ian
>
> 2014-03-19  Ian Bolton  <ian.bol...@arm.com>
>
> gcc/
> 	* config/arm/arm.md (*anddi_notdi_zesidi): New pattern.
> 	* config/arm/thumb2.md (*iordi_notdi_zesidi): New pattern.
> testsuite/
> 	* gcc.target/arm/anddi_notdi-1.c: New test.
> 	* gcc.target/arm/iordi_notdi-1.c: New test case.
>
> arm-and-ior-notdi-zeroextend-patch-v1.txt
>
> diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
> index 2ddda02..d2d85ee 100644
> --- a/gcc/config/arm/arm.md
> +++ b/gcc/config/arm/arm.md
> @@ -2962,6 +2962,28 @@
>     (set_attr "type" "multiple")]
>  )
>
> +(define_insn_and_split "*anddi_notdi_zesidi"
> +  [(set (match_operand:DI 0 "s_register_operand" "=r,r")
> +	(and:DI (not:DI (match_operand:DI 2 "s_register_operand" "0,?r"))
> +		(zero_extend:DI
> +		 (match_operand:SI 1 "s_register_operand" "r,r"))))]

The early clobber and register tying here is unnecessary.  All of the
input operands are consumed in the first instruction, so you can
eliminate the ties and the restriction on the overlap.  Something like
(untested):

+(define_insn_and_split "*anddi_notdi_zesidi"
+  [(set (match_operand:DI 0 "s_register_operand" "=r")
+	(and:DI (not:DI (match_operand:DI 2 "s_register_operand" "r"))
+		(zero_extend:DI
+		 (match_operand:SI 1 "s_register_operand" "r"))))]

Ok for stage-1 with that change (though I'd recommend another test run
to validate the above).

R.

Thanks, Richard.

Regression runs came back OK with that change, so I will consider this
ready for stage 1.  The patch is attached for reference.

Cheers,
Ian

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 2ddda02..4176b7ff 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -2962,6 +2962,28 @@
    (set_attr "type" "multiple")]
 )
 
+(define_insn_and_split "*anddi_notdi_zesidi"
+  [(set (match_operand:DI 0 "s_register_operand" "=r")
+	(and:DI (not:DI (match_operand:DI 2 "s_register_operand" "r"))
+		(zero_extend:DI
+		 (match_operand:SI 1 "s_register_operand" "r"))))]
+  "TARGET_32BIT"
+  "#"
+  "TARGET_32BIT && reload_completed"
+  [(set (match_dup 0) (and:SI (not:SI (match_dup 2)) (match_dup 1)))
+   (set (match_dup 3) (const_int 0))]
+  "
+  {
+    operands[3] = gen_highpart (SImode, operands[0]);
+    operands[0] = gen_lowpart (SImode, operands[0]);
+    operands[2] = gen_lowpart (SImode, operands[2]);
+  }"
+  [(set_attr "length" "8")
+   (set_attr "predicable" "yes")
+   (set_attr "predicable_short_it" "no")
+   (set_attr "type" "multiple")]
+)
+
 (define_insn_and_split "*anddi_notsesidi_di"
   [(set (match_operand:DI 0 "s_register_operand" "=r,r")
 	(and:DI (not:DI (sign_extend:DI
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 467c619..10bc8b1 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -1418,6 +1418,30 @@
    (set_attr "type" "multiple")]
 )
 
+(define_insn_and_split "*iordi_notdi_zesidi"
+  [(set (match_operand:DI 0 "s_register_operand" "=r,r")
+	(ior:DI (not:DI (match_operand:DI 2 "s_register_operand" "0,?r"))
+		(zero_extend:DI
+		 (match_operand:SI 1 "s_register_operand" "r,r"))))]
+  "TARGET_THUMB2"
+  "#"
+  "TARGET_THUMB2 && reload_completed"
+  [(set (match_dup 0) (ior:SI (not:SI (match_dup 2)) (match_dup 1)))
+   (set (match_dup 3) (not:SI (match_dup 4)))]
+  "
+  {
+    operands[3] = gen_highpart (SImode, operands[0]);
+    operands[0] = gen_lowpart (SImode, operands[0]);
+    operands[1] = gen_lowpart (SImode, operands[1]);
+    operands[4] = gen_highpart (SImode, operands[2]);
+    operands[2] = gen_lowpart (SImode, operands[2]);
+  }"
+  [(set_attr "length" "8")
+   (set_attr "predicable" "yes")
+   (set_attr "predicable_short_it" "no")
+   (set_attr "type" "multiple")]
+)
+
 (define_insn_and_split "*iordi_notsesidi_di"
   [(set (match_operand:DI 0 "s_register_operand" "=r,r")
 	(ior:DI (not:DI (sign_extend:DI
diff --git a/gcc/testsuite/gcc.target/arm/anddi_notdi-1.c b/gcc/testsuite/gcc.target/arm/anddi_notdi-1.c
new file mode 100644
index 000..cfb33fc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/anddi_notdi-1.c
@@ -0,0 +1,65 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-inline --save-temps" } */
+
+extern void abort (void);
+
+typedef
[PATCH, ARM] Optimise NotDI AND/OR ZeroExtendSI for ARMv7A
This is a follow-on patch to one already committed:
http://gcc.gnu.org/ml/gcc-patches/2014-02/msg01128.html

It implements patterns to simplify our RTL as follows:

  OR  (Not:DI (A:DI), ZeroExtend:DI (B:SI))
    -- the top half can be done with a MVN

  AND (Not:DI (A:DI), ZeroExtend:DI (B:SI))
    -- the top half becomes zero.

I've added test cases for both of these and also the existing
anddi_notdi patterns.  The tests all pass.  Full regression runs
passed.

OK for stage 1?

Cheers,
Ian

2014-03-19  Ian Bolton  <ian.bol...@arm.com>

gcc/
	* config/arm/arm.md (*anddi_notdi_zesidi): New pattern.
	* config/arm/thumb2.md (*iordi_notdi_zesidi): New pattern.
testsuite/
	* gcc.target/arm/anddi_notdi-1.c: New test.
	* gcc.target/arm/iordi_notdi-1.c: New test case.

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 2ddda02..d2d85ee 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -2962,6 +2962,28 @@
    (set_attr "type" "multiple")]
 )
 
+(define_insn_and_split "*anddi_notdi_zesidi"
+  [(set (match_operand:DI 0 "s_register_operand" "=r,r")
+	(and:DI (not:DI (match_operand:DI 2 "s_register_operand" "0,?r"))
+		(zero_extend:DI
+		 (match_operand:SI 1 "s_register_operand" "r,r"))))]
+  "TARGET_32BIT"
+  "#"
+  "TARGET_32BIT && reload_completed"
+  [(set (match_dup 0) (and:SI (not:SI (match_dup 2)) (match_dup 1)))
+   (set (match_dup 3) (const_int 0))]
+  "
+  {
+    operands[3] = gen_highpart (SImode, operands[0]);
+    operands[0] = gen_lowpart (SImode, operands[0]);
+    operands[2] = gen_lowpart (SImode, operands[2]);
+  }"
+  [(set_attr "length" "8")
+   (set_attr "predicable" "yes")
+   (set_attr "predicable_short_it" "no")
+   (set_attr "type" "multiple")]
+)
+
 (define_insn_and_split "*anddi_notsesidi_di"
   [(set (match_operand:DI 0 "s_register_operand" "=r,r")
 	(and:DI (not:DI (sign_extend:DI
diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 467c619..10bc8b1 100644
--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -1418,6 +1418,30 @@
    (set_attr "type" "multiple")]
 )
 
+(define_insn_and_split "*iordi_notdi_zesidi"
+  [(set (match_operand:DI 0 "s_register_operand" "=r,r")
+	(ior:DI (not:DI (match_operand:DI 2 "s_register_operand" "0,?r"))
+		(zero_extend:DI
+		 (match_operand:SI 1 "s_register_operand" "r,r"))))]
+  "TARGET_THUMB2"
+  "#"
+  "TARGET_THUMB2 && reload_completed"
+  [(set (match_dup 0) (ior:SI (not:SI (match_dup 2)) (match_dup 1)))
+   (set (match_dup 3) (not:SI (match_dup 4)))]
+  "
+  {
+    operands[3] = gen_highpart (SImode, operands[0]);
+    operands[0] = gen_lowpart (SImode, operands[0]);
+    operands[1] = gen_lowpart (SImode, operands[1]);
+    operands[4] = gen_highpart (SImode, operands[2]);
+    operands[2] = gen_lowpart (SImode, operands[2]);
+  }"
+  [(set_attr "length" "8")
+   (set_attr "predicable" "yes")
+   (set_attr "predicable_short_it" "no")
+   (set_attr "type" "multiple")]
+)
+
 (define_insn_and_split "*iordi_notsesidi_di"
   [(set (match_operand:DI 0 "s_register_operand" "=r,r")
 	(ior:DI (not:DI (sign_extend:DI
diff --git a/gcc/testsuite/gcc.target/arm/anddi_notdi-1.c b/gcc/testsuite/gcc.target/arm/anddi_notdi-1.c
new file mode 100644
index 000..cfb33fc
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/anddi_notdi-1.c
@@ -0,0 +1,65 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-inline --save-temps" } */
+
+extern void abort (void);
+
+typedef long long s64int;
+typedef int s32int;
+typedef unsigned long long u64int;
+typedef unsigned int u32int;
+
+s64int
+anddi_di_notdi (s64int a, s64int b)
+{
+  return (a & ~b);
+}
+
+s64int
+anddi_di_notzesidi (s64int a, u32int b)
+{
+  return (a & ~(u64int) b);
+}
+
+s64int
+anddi_notdi_zesidi (s64int a, u32int b)
+{
+  return (~a & (u64int) b);
+}
+
+s64int
+anddi_di_notsesidi (s64int a, s32int b)
+{
+  return (a & ~(s64int) b);
+}
+
+int main ()
+{
+  s64int a64 = 0xdeadbeefll;
+  s64int b64 = 0x5f470112ll;
+  s64int c64 = 0xdeadbeef300fll;
+
+  u32int c32 = 0x01124f4f;
+  s32int d32 = 0xabbaface;
+
+  s64int z = anddi_di_notdi (c64, b64);
+  if (z != 0xdeadbeef2008ll)
+    abort ();
+
+  z = anddi_di_notzesidi (a64, c32);
+  if (z != 0xdeadbeefb0b0ll)
+    abort ();
+
+  z = anddi_notdi_zesidi (c64, c32);
+  if (z != 0x01104f4fll)
+    abort ();
+
+  z = anddi_di_notsesidi (a64, d32);
+  if (z != 0x0531ll)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-times "bic\t" 6 } } */
+
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/arm/iordi_notdi-1.c b/gcc/testsuite/gcc.target/arm/iordi_notdi-1.c
index cda9c0e..249f080 100644
--- a/gcc/testsuite/gcc.target/arm/iordi_notdi-1.c
+++ b/gcc/testsuite/gcc.target/arm/iordi_notdi-1.c
@@ -9,19 +9,25 @@ typedef unsigned long long u64int;
 typedef unsigned int u32int;
 
 s64int
-iordi_notdi (s64int a, s64int b)
+iordi_di_notdi (s64int a, s64int b)
 {
   return (a | ~b);
 }
 
 s64int
-iordi_notzesidi (s64int a, u32int b
[PATCH] Keep -ffp-contract=fast by default if we have -funsafe-math-optimizations
Hi,

In common.opt, -ffp-contract=fast is set as the default for GCC.  But
then it gets disabled in c-family/c-opts.c if you are using ISO C
(e.g. with -std=c99).

The reason for this patch is that if you have also specified
-funsafe-math-optimizations (or -Ofast or -ffast-math) then it is
likely your preference to have -ffp-contract=fast on, so you can
generate fused multiply-adds (the fma standard pattern).

This patch works by blocking the override if you have
-funsafe-math-optimizations (directly or indirectly), causing fused
multiply-add to be used in the places where we might hope to see it.

(I had considered forcing -ffp-contract=fast on in opts.c if you have
-funsafe-math-optimizations, but it is already on by default ... and it
didn't work either!  The problem is that it is forced off unless you
have explicitly asked for -ffp-contract=fast at the command line.)

Standard regressions passed.

OK for trunk or stage 1?

Cheers,
Ian

2014-03-10  Ian Bolton  <ian.bol...@arm.com>

	* gcc/c-family/c-opts.c (c_common_post_options): Don't
	override -ffp-contract=fast if unsafe-math-optimizations is on.

diff --git a/gcc/c-family/c-opts.c b/gcc/c-family/c-opts.c
index b7478f3..92ba481 100644
--- a/gcc/c-family/c-opts.c
+++ b/gcc/c-family/c-opts.c
@@ -834,7 +834,8 @@ c_common_post_options (const char **pfilename)
   if (flag_iso && !c_dialect_cxx ()
       && (global_options_set.x_flag_fp_contract_mode
-	  == (enum fp_contract_mode) 0))
+	  == (enum fp_contract_mode) 0)
+      && flag_unsafe_math_optimizations == 0)
     flag_fp_contract_mode = FP_CONTRACT_OFF;
 
   /* By default we use C99 inline semantics in GNU99 or C99 mode.  C99
RE: Shouldn't unsafe-math-optimizations (re-)enable fp-contract=fast?
-----Original Message-----
From: Joseph S. Myers

On Thu, 6 Mar 2014, Ian Bolton wrote:

>> I see in common.opt that fp-contract=fast is the default for GCC.
>> But then it gets disabled in c-family/c-opts.c if you are using ISO C
>> (e.g. with -std=c99).
>>
>> But surely if you have also specified -funsafe-math-optimizations
>> then it should flip it back onto fast?
>
> That seems reasonable.

Thanks for the feedback, Joseph.  I've been working on the patch, but I
see in gcc/opts.c that there is a definite distinction between
set_fast_math_flags and set_unsafe_math_optimizations_flags.

I'm thinking this is more of a fast-math thing than an
unsafe_math_optimizations thing, so I should actually be adding it in
set_fast_math_flags.  Is this correct?

Whilst I'm on, I think I found a bug in the Optimize-Options
documentation ...  It says that -ffast-math is enabled by -Ofast and
that -ffast-math enables -funsafe-math-optimizations.  But the
definition of -funsafe-math-optimizations says it is not turned on by
any -O option.  I think it should say that it is turned on by -Ofast
(via -ffast-math), just for clarity and consistency.  Does that make
sense?

Many thanks,
Ian

-- 
Joseph S. Myers
jos...@codesourcery.com
RE: Shouldn't unsafe-math-optimizations (re-)enable fp-contract=fast?
> Thanks for the feedback, Joseph.  I've been working on the patch, but
> I see in gcc/opts.c that there is a definite distinction between
> set_fast_math_flags and set_unsafe_math_optimizations_flags.
>
> I'm thinking this is more of a fast-math thing than an
> unsafe_math_optimizations thing, so I should actually be adding it in
> set_fast_math_flags.  Is this correct?

Scratch that.  I went ahead and tried to set it in opts.c, but the
c-opts.c override (to OFF) will currently happen in every case UNLESS
you explicitly asked for -ffp-contract=fast.

I therefore think the better solution is now to not override it in
c-opts.c if unsafe-math-optimizations (which itself is enabled by
-ffast-math and/or -Ofast) is enabled.  We already have
fp-contract=fast on by default, so we should just stop it being turned
off unnecessarily, instead.

I've successfully coded this one and it works as expected when compiled
with -Ofast -std=c99.  I'll prepare the patch now.

Cheers,
Ian
Shouldn't unsafe-math-optimizations (re-)enable fp-contract=fast?
Hi there,

I see in common.opt that fp-contract=fast is the default for GCC.  But
then it gets disabled in c-family/c-opts.c if you are using ISO C
(e.g. with -std=c99).

But surely if you have also specified -funsafe-math-optimizations then
it should flip back to fast?

I see in gcc/opts.c that a few other flags are triggered based on
-funsafe-math-optimizations, but not -ffp-contract=fast.

I think this is a bug, and it could lead to less optimal code than
people would like if they want C99 compliance for reason A but also
unsafe math for reason B.

What do you think?

Cheers,
Ian
[PATCH, AArch64] Define __ARM_NEON by default
Hi,

This is needed for when people are porting their aarch32 code to
aarch64.  They will have #ifdef __ARM_NEON (as specified in ACLE), and
their intrinsics currently won't get used on aarch64 because the macro
is not defined there by default.

This patch defines __ARM_NEON so long as we are not using general
registers only.

Tested on a simple testcase to ensure __ARM_NEON was defined.

OK for trunk?

Cheers,
Ian

2014-02-24  Ian Bolton  <ian.bol...@arm.com>

	* config/aarch64/aarch64.h: Define __ARM_NEON by default if we
	are not using general regs only.

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 13c424c..fc21981 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -32,6 +32,9 @@
       else					\
 	builtin_define ("__AARCH64EL__");	\
 						\
+      if (!TARGET_GENERAL_REGS_ONLY)		\
+	builtin_define ("__ARM_NEON");		\
+						\
       switch (aarch64_cmodel)			\
 	{					\
 	case AARCH64_CMODEL_TINY:		\
[PATCH, ARM] Support ORN for DImode
Hi, Patterns had previously been added to thumb2.md to support ORN, but only for SImode. This patch adds DImode support, to cover the full 64|64-64 operation and the various 32|64-64 operations (see AND:DI variants that use NOT). The patch comes with its own execution test and looks for correct number of ORN instructions in the assembly. Regressions passed. OK for stage 1? 2014-02-19 Ian Bolton ian.bol...@arm.com gcc/ * config/arm/thumb2.md (*iordi_notdi_di): New pattern. (*iordi_notzesidi): New pattern. (*iordi_notsesidi_di): New pattern. testsuite/ * gcc.target/arm/iordi_notdi-1.c: New test.diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md index 4f247f8..6a71fec 100644 --- a/gcc/config/arm/thumb2.md +++ b/gcc/config/arm/thumb2.md @@ -1366,6 +1366,79 @@ (set_attr type alu_reg)] ) +; Constants for op 2 will never be given to these patterns. +(define_insn_and_split *iordi_notdi_di + [(set (match_operand:DI 0 s_register_operand =r,r) + (ior:DI (not:DI (match_operand:DI 1 s_register_operand 0,r)) + (match_operand:DI 2 s_register_operand r,0)))] + TARGET_THUMB2 + # + TARGET_THUMB2 reload_completed + [(set (match_dup 0) (ior:SI (not:SI (match_dup 1)) (match_dup 2))) + (set (match_dup 3) (ior:SI (not:SI (match_dup 4)) (match_dup 5)))] + + { +operands[3] = gen_highpart (SImode, operands[0]); +operands[0] = gen_lowpart (SImode, operands[0]); +operands[4] = gen_highpart (SImode, operands[1]); +operands[1] = gen_lowpart (SImode, operands[1]); +operands[5] = gen_highpart (SImode, operands[2]); +operands[2] = gen_lowpart (SImode, operands[2]); + } + [(set_attr length 8) + (set_attr predicable yes) + (set_attr predicable_short_it no) + (set_attr type multiple)] +) + +(define_insn_and_split *iordi_notzesidi_di + [(set (match_operand:DI 0 s_register_operand =r,r) + (ior:DI (not:DI (zero_extend:DI +(match_operand:SI 2 s_register_operand r,r))) + (match_operand:DI 1 s_register_operand 0,?r)))] + TARGET_THUMB2 + # + ; (not (zero_extend...)) means operand0 will always 
be 0x + TARGET_THUMB2 reload_completed + [(set (match_dup 0) (ior:SI (not:SI (match_dup 2)) (match_dup 1))) + (set (match_dup 3) (const_int -1))] + + { +operands[3] = gen_highpart (SImode, operands[0]); +operands[0] = gen_lowpart (SImode, operands[0]); +operands[1] = gen_lowpart (SImode, operands[1]); + } + [(set_attr length 4,8) + (set_attr predicable yes) + (set_attr predicable_short_it no) + (set_attr type multiple)] +) + +(define_insn_and_split *iordi_notsesidi_di + [(set (match_operand:DI 0 s_register_operand =r,r) + (ior:DI (not:DI (sign_extend:DI +(match_operand:SI 2 s_register_operand r,r))) + (match_operand:DI 1 s_register_operand 0,r)))] + TARGET_THUMB2 + # + TARGET_THUMB2 reload_completed + [(set (match_dup 0) (ior:SI (not:SI (match_dup 2)) (match_dup 1))) + (set (match_dup 3) (ior:SI (not:SI + (ashiftrt:SI (match_dup 2) (const_int 31))) + (match_dup 4)))] + + { +operands[3] = gen_highpart (SImode, operands[0]); +operands[0] = gen_lowpart (SImode, operands[0]); +operands[4] = gen_highpart (SImode, operands[1]); +operands[1] = gen_lowpart (SImode, operands[1]); + } + [(set_attr length 8) + (set_attr predicable yes) + (set_attr predicable_short_it no) + (set_attr type multiple)] +) + (define_insn *orsi_notsi_si [(set (match_operand:SI 0 s_register_operand =r) (ior:SI (not:SI (match_operand:SI 2 s_register_operand r)) diff --git a/gcc/testsuite/gcc.target/arm/iordi_notdi-1.c b/gcc/testsuite/gcc.target/arm/iordi_notdi-1.c new file mode 100644 index 000..cda9c0e --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/iordi_notdi-1.c @@ -0,0 +1,54 @@ +/* { dg-do run } */ +/* { dg-options -O2 -fno-inline --save-temps } */ + +extern void abort (void); + +typedef long long s64int; +typedef int s32int; +typedef unsigned long long u64int; +typedef unsigned int u32int; + +s64int +iordi_notdi (s64int a, s64int b) +{ + return (a | ~b); +} + +s64int +iordi_notzesidi (s64int a, u32int b) +{ + return (a | ~(u64int) b); +} + +s64int +iordi_notsesidi (s64int a, s32int b) +{ + 
return (a | ~(s64int) b); +} + +int main () +{ + s64int a64 = 0xdeadbeefll; + s64int b64 = 0x4f4f0112ll; + + u32int c32 = 0x01124f4f; + s32int d32 = 0xabbaface; + + s64int z = iordi_notdi (a64, b64); + if (z != 0xb0b0feedll) +abort (); + + z = iordi_notzesidi (a64, c32); + if (z != 0xfeedb0b0ll) +abort (); + + z = iordi_notsesidi (a64, d32); + if (z != 0xdeadbeef54450531ll) +abort (); + + return 0; +} + +/* { dg-final { scan-assembler-times orn\t 5 { target arm_thumb2 } } } */ + +/* { dg-final { cleanup-saved-temps } } */
RE: [PATCH, ARM] Skip pr59858.c test for -mfloat-abi=hard
The pr59858.c testcase explicitly sets -msoft-float which is incompatible with our -mfloat-abi=hard variant. This patch therefore should not be run if you have -mfloat-abi=hard. Tested with both variations for arm-none-eabi build. OK for commit? Cheers, Ian 2014-02-13 Ian Bolton ian.bol...@arm.com testsuite/ * gcc.target/arm/pr59858.c: Skip test if -mfloat-abi=hard. pr59858-skip-if-hard-float-patch-v2.txt diff --git a/gcc/testsuite/gcc.target/arm/pr59858.c b/gcc/testsuite/gcc.target/arm/pr59858.c index 463bd38..1e03203 100644 --- a/gcc/testsuite/gcc.target/arm/pr59858.c +++ b/gcc/testsuite/gcc.target/arm/pr59858.c @@ -1,5 +1,6 @@ /* { dg-do compile } */ /* { dg-options -march=armv5te -marm -mthumb-interwork -Wall - Wstrict-prototypes -Wstrict-aliasing -funsigned-char -fno-builtin -fno- asm -msoft-float -std=gnu99 -mlittle-endian -mthumb -fno-stack- protector -Os -g -feliminate-unused-debug-types -funit-at-a-time - fmerge-all-constants -fstrict-aliasing -fno-tree-loop-optimize -fno- tree-dominator-opts -fno-strength-reduce -fPIC -w } */ +/* { dg-skip-if Test is not compatible with hard-float { *-*-* } { -mfloat-abi=hard } { } } */ typedef enum { REG_ENOSYS = -1, This won't work if hard-float is the default. Take a look at the way other tests check for this. Hi Richard, The test does actually pass if it is hard float by default. My comment on the skip line was misleading, because the precise issue is when someone specifies -mfloat-abi=hard on the command line. I've fixed up that comment in the attached patch now. I've also reduced the number of command-line options passed (without affecting the code generated) in the patch and changed -msoft-float into -mfloat-abi=soft, since the former is deprecated and maps to the latter anyway. OK for commit? 
Cheers,
Ian

diff --git a/gcc/testsuite/gcc.target/arm/pr59858.c b/gcc/testsuite/gcc.target/arm/pr59858.c
index 463bd38..a944b9a 100644
--- a/gcc/testsuite/gcc.target/arm/pr59858.c
+++ b/gcc/testsuite/gcc.target/arm/pr59858.c
@@ -1,5 +1,6 @@
 /* { dg-do compile } */
-/* { dg-options "-march=armv5te -marm -mthumb-interwork -Wall -Wstrict-prototypes -Wstrict-aliasing -funsigned-char -fno-builtin -fno-asm -msoft-float -std=gnu99 -mlittle-endian -mthumb -fno-stack-protector -Os -g -feliminate-unused-debug-types -funit-at-a-time -fmerge-all-constants -fstrict-aliasing -fno-tree-loop-optimize -fno-tree-dominator-opts -fno-strength-reduce -fPIC -w" } */
+/* { dg-options "-march=armv5te -fno-builtin -mfloat-abi=soft -mthumb -fno-stack-protector -Os -fno-tree-loop-optimize -fno-tree-dominator-opts -fPIC -w" } */
+/* { dg-skip-if "Incompatible command line options: -mfloat-abi=soft -mfloat-abi=hard" { *-*-* } { "-mfloat-abi=hard" } { "" } } */
 typedef enum {
   REG_ENOSYS = -1,
[PATCH, ARM] Skip pr59858.c test for -mfloat-abi=hard
Hi, The pr59858.c testcase explicitly sets -msoft-float which is incompatible with our -mfloat-abi=hard variant. This patch therefore should not be run if you have -mfloat-abi=hard. Tested with both variations for arm-none-eabi build. OK for commit? Cheers, Ian 2014-02-13 Ian Bolton ian.bol...@arm.com testsuite/ * gcc.target/arm/pr59858.c: Skip test if -mfloat-abi=hard.diff --git a/gcc/testsuite/gcc.target/arm/pr59858.c b/gcc/testsuite/gcc.target/arm/pr59858.c index 463bd38..1e03203 100644 --- a/gcc/testsuite/gcc.target/arm/pr59858.c +++ b/gcc/testsuite/gcc.target/arm/pr59858.c @@ -1,5 +1,6 @@ /* { dg-do compile } */ /* { dg-options -march=armv5te -marm -mthumb-interwork -Wall -Wstrict-prototypes -Wstrict-aliasing -funsigned-char -fno-builtin -fno-asm -msoft-float -std=gnu99 -mlittle-endian -mthumb -fno-stack-protector -Os -g -feliminate-unused-debug-types -funit-at-a-time -fmerge-all-constants -fstrict-aliasing -fno-tree-loop-optimize -fno-tree-dominator-opts -fno-strength-reduce -fPIC -w } */ +/* { dg-skip-if Test is not compatible with hard-float { *-*-* } { -mfloat-abi=hard } { } } */ typedef enum { REG_ENOSYS = -1,
[PATCH] Make pr59597 test PIC-friendly
PR59597 reinstated some code to cancel unnecessary jump threading, and brought with it a testcase to check that the cancelling happened.

http://gcc.gnu.org/ml/gcc-patches/2014-01/msg01448.html

With PIC enabled for arm and aarch64, the unnecessary jump threading already never took place, so there is nothing to cancel, leading the test case to fail. My suspicion is that similar issues will happen for other architectures too. This patch changes the called function to be static, so that jump threading and the resulting cancellation happen for PIC variants too.

OK for stage 4, or wait for stage 1?

Cheers,
Ian

2014-02-05  Ian Bolton  ian.bol...@arm.com

	testsuite/
	* gcc.dg/tree-ssa/pr59597.c: Make called function static so
	that expected outcome works for PIC variants too.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c b/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c
index 814d299..bc9d730 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr59597.c
@@ -8,7 +8,8 @@ typedef unsigned int u32;
 
 u32 f[NNN], t[NNN];
 
-u16 Calc_crc8(u8 data, u16 crc )
+static u16
+Calc_crc8 (u8 data, u16 crc)
 {
   u8 i=0,x16=0,carry=0;
   for (i = 0; i < 8; i++)
@@ -31,7 +32,9 @@ u16 Calc_crc8(u8 data, u16 crc )
   }
   return crc;
 }
-int main (int argc, char *argv[])
+
+int
+main (int argc, char *argv[])
 {
   int i, j;
   u16 crc;
   for (j = 0; j < 1000; j++)
[PATCH, ARM] Suppress Redundant Flag Setting for Cortex-A15
Hi there! An existing optimisation for Thumb-2 converts t32 encodings to t16 encodings to reduce codesize, at the expense of causing redundant flag setting for ADD, AND, etc. This redundant flag setting can have negative performance impact on cortex-a15. This patch introduces two new tuning options so that the conversion from t32 to t16, which takes place in thumb2_reorg, can be suppressed for cortex-a15. To maintain some of the original benefit (reduced codesize), the suppression is only done where the enclosing basic block is deemed worthy of optimising for speed. This tested with no regressions and performance has improved for the workloads tested on cortex-a15. (It might be beneficial to other processors too, but that has not been investigated yet.) OK for stage 1? Cheers, Ian 2014-01-24 Ian Bolton ian.bol...@arm.com gcc/ * config/arm/arm-protos.h (tune_params): New struct members. * config/arm/arm.c: Initialise tune_params per processor. (thumb2_reorg): Suppress conversion from t32 to t16 when optimizing for speed, based on new tune_params. diff --git a/gcc/config/arm/arm-protos.h b/gcc/config/arm/arm-protos.h index 13874ee..74645ee 100644 --- a/gcc/config/arm/arm-protos.h +++ b/gcc/config/arm/arm-protos.h @@ -272,6 +272,11 @@ struct tune_params const struct cpu_vec_costs* vec_costs; /* Prefer Neon for 64-bit bitops. */ bool prefer_neon_for_64bits; + /* Prefer 32-bit encoding instead of flag-setting 16-bit encoding. */ + bool disparage_flag_setting_t16_encodings; + /* Prefer 32-bit encoding instead of 16-bit encoding where subset of flags + would be set. */ + bool disparage_partial_flag_setting_t16_encodings; }; extern const struct tune_params *current_tune; diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index fc81bf6..1ebaf84 100644 --- a/gcc/config/arm/arm.c +++ b/gcc/config/arm/arm.c @@ -1481,7 +1481,8 @@ const struct tune_params arm_slowmul_tune = false, /* Prefer LDRD/STRD. */ {true, true},/* Prefer non short circuit. 
*/ arm_default_vec_cost,/* Vectorizer costs. */ - false /* Prefer Neon for 64-bits bitops. */ + false,/* Prefer Neon for 64-bits bitops. */ + false, false /* Prefer 32-bit encodings. */ }; const struct tune_params arm_fastmul_tune = @@ -1497,7 +1498,8 @@ const struct tune_params arm_fastmul_tune = false, /* Prefer LDRD/STRD. */ {true, true},/* Prefer non short circuit. */ arm_default_vec_cost,/* Vectorizer costs. */ - false /* Prefer Neon for 64-bits bitops. */ + false,/* Prefer Neon for 64-bits bitops. */ + false, false /* Prefer 32-bit encodings. */ }; /* StrongARM has early execution of branches, so a sequence that is worth @@ -1516,7 +1518,8 @@ const struct tune_params arm_strongarm_tune = false, /* Prefer LDRD/STRD. */ {true, true},/* Prefer non short circuit. */ arm_default_vec_cost,/* Vectorizer costs. */ - false /* Prefer Neon for 64-bits bitops. */ + false,/* Prefer Neon for 64-bits bitops. */ + false, false /* Prefer 32-bit encodings. */ }; const struct tune_params arm_xscale_tune = @@ -1532,7 +1535,8 @@ const struct tune_params arm_xscale_tune = false, /* Prefer LDRD/STRD. */ {true, true},/* Prefer non short circuit. */ arm_default_vec_cost,/* Vectorizer costs. */ - false /* Prefer Neon for 64-bits bitops. */ + false,/* Prefer Neon for 64-bits bitops. */ + false, false /* Prefer 32-bit encodings. */ }; const struct tune_params arm_9e_tune = @@ -1548,7 +1552,8 @@ const struct tune_params arm_9e_tune = false, /* Prefer LDRD/STRD. */ {true, true},/* Prefer non short circuit. */ arm_default_vec_cost,/* Vectorizer costs. */ - false /* Prefer Neon for 64-bits bitops. */ + false,/* Prefer Neon for 64-bits bitops. */ + false, false /* Prefer 32-bit encodings. */ }; const struct tune_params arm_v6t2_tune = @@ -1564,7 +1569,8
RE: [PATCH, ARM] Implement __builtin_trap
Hi, Currently, on ARM, you have to either call abort() or raise(SIGTRAP) to achieve a handy crash. This patch allows you to instead call __builtin_trap() which is much more efficient at falling over because it becomes just a single instruction that will trap for you. Two testcases have been added (for ARM and Thumb) and both pass. Note: This is a modified version of a patch originally submitted by Mark Mitchell back in 2010, which came in response to PR target/59091. http://gcc.gnu.org/ml/gcc-patches/2010-09/msg00639.html http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59091 The main update, other than cosmetic differences, is that we've chosen the same ARM encoding as LLVM for practical purposes. (The Thumb encoding in Mark's patch already matched LLVM.) OK for trunk? Cheers, Ian 2013-12-04 Ian Bolton ian.bol...@arm.com Mark Mitchell m...@codesourcery.com gcc/ * config/arm/arm.md (trap): New pattern. * config/arm/types.md: Added a type for trap. testsuite/ * gcc.target/arm/builtin-trap.c: New test. * gcc.target/arm/thumb-builtin-trap.c: Likewise. aarch32-builtin-trap-v2.txt This needs to set the conds attribute to unconditional. Otherwise the ARM backend might try to turn this into a conditional instruction. R. Thanks, Richard. 
I fixed it up, tested it, and committed it as a trivial difference compared to what was approved already.

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index dd73366..934b859 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -9927,6 +9927,23 @@
    (set_attr "type" "mov_reg")]
 )
 
+(define_insn "trap"
+  [(trap_if (const_int 1) (const_int 0))]
+  ""
+  "*
+  if (TARGET_ARM)
+    return \".inst\\t0xe7f000f0\";
+  else
+    return \".inst\\t0xdeff\";
+  "
+  [(set (attr "length")
+	(if_then_else (eq_attr "is_thumb" "yes")
+		      (const_int 2)
+		      (const_int 4)))
+   (set_attr "type" "trap")
+   (set_attr "conds" "unconditional")]
+)
+
 ;; Patterns to allow combination of arithmetic, cond code and shifts
 
diff --git a/gcc/config/arm/types.md b/gcc/config/arm/types.md
index 1c4b9e3..6351f08 100644
--- a/gcc/config/arm/types.md
+++ b/gcc/config/arm/types.md
@@ -152,6 +152,7 @@
 ; store2	store 2 words to memory from arm registers.
 ; store3	store 3 words to memory from arm registers.
 ; store4	store 4 (or more) words to memory from arm registers.
+; trap		cause a trap in the kernel.
 ; udiv		unsigned division.
 ; umaal		unsigned multiply accumulate accumulate long.
 ; umlal		unsigned multiply accumulate long.
@@ -645,6 +646,7 @@ store2,\ store3,\ store4,\ + trap,\ udiv,\ umaal,\ umlal,\ diff --git a/gcc/testsuite/gcc.target/arm/builtin-trap.c b/gcc/testsuite/gcc.target/arm/builtin-trap.c new file mode 100644 index 000..4ff8d25 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/builtin-trap.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target arm32 } */ + +void +trap () +{ + __builtin_trap (); +} + +/* { dg-final { scan-assembler 0xe7f000f0 { target { arm_nothumb } } } } */ diff --git a/gcc/testsuite/gcc.target/arm/thumb-builtin-trap.c b/gcc/testsuite/gcc.target/arm/thumb-builtin-trap.c new file mode 100644 index 000..22e90e7 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/thumb-builtin-trap.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options -mthumb } */ +/* { dg-require-effective-target arm_thumb1_ok } */ + +void +trap () +{ + __builtin_trap (); +} + +/* { dg-final { scan-assembler 0xdeff } } */
[PATCH, ARM] Implement __builtin_trap
Hi, Currently, on ARM, you have to either call abort() or raise(SIGTRAP) to achieve a handy crash. This patch allows you to instead call __builtin_trap() which is much more efficient at falling over because it becomes just a single instruction that will trap for you. Two testcases have been added (for ARM and Thumb) and both pass. Note: This is a modified version of a patch originally submitted by Mark Mitchell back in 2010, which came in response to PR target/59091. http://gcc.gnu.org/ml/gcc-patches/2010-09/msg00639.html http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59091 The main update, other than cosmetic differences, is that we've chosen the same ARM encoding as LLVM for practical purposes. (The Thumb encoding in Mark's patch already matched LLVM.) OK for trunk? Cheers, Ian 2013-12-04 Ian Bolton ian.bol...@arm.com Mark Mitchell m...@codesourcery.com gcc/ * config/arm/arm.md (trap): New pattern. * config/arm/types.md: Added a type for trap. testsuite/ * gcc.target/arm/builtin-trap.c: New test. * gcc.target/arm/thumb-builtin-trap.c: Likewise. diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index dd73366..3b7a827 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -9927,6 +9927,22 @@ (set_attr type mov_reg)] ) +(define_insn trap + [(trap_if (const_int 1) (const_int 0))] + + * + if (TARGET_ARM) +return \.inst\\t0xe7f000f0\; + else +return \.inst\\t0xdeff\; + + [(set (attr length) + (if_then_else (eq_attr is_thumb yes) + (const_int 2) + (const_int 4))) + (set_attr type trap)] +) + ;; Patterns to allow combination of arithmetic, cond code and shifts diff --git a/gcc/config/arm/types.md b/gcc/config/arm/types.md index 1c4b9e3..6351f08 100644 --- a/gcc/config/arm/types.md +++ b/gcc/config/arm/types.md @@ -152,6 +152,7 @@ ; store2 store 2 words to memory from arm registers. ; store3 store 3 words to memory from arm registers. ; store4 store 4 (or more) words to memory from arm registers. +; trap cause a trap in the kernel. 
; udiv unsigned division. ; umaal unsigned multiply accumulate accumulate long. ; umlal unsigned multiply accumulate long. @@ -645,6 +646,7 @@ store2,\ store3,\ store4,\ + trap,\ udiv,\ umaal,\ umlal,\ diff --git a/gcc/testsuite/gcc.target/arm/builtin-trap.c b/gcc/testsuite/gcc.target/arm/builtin-trap.c new file mode 100644 index 000..4ff8d25 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/builtin-trap.c @@ -0,0 +1,10 @@ +/* { dg-do compile } */ +/* { dg-require-effective-target arm32 } */ + +void +trap () +{ + __builtin_trap (); +} + +/* { dg-final { scan-assembler 0xe7f000f0 { target { arm_nothumb } } } } */ diff --git a/gcc/testsuite/gcc.target/arm/thumb-builtin-trap.c b/gcc/testsuite/gcc.target/arm/thumb-builtin-trap.c new file mode 100644 index 000..22e90e7 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/thumb-builtin-trap.c @@ -0,0 +1,11 @@ +/* { dg-do compile } */ +/* { dg-options -mthumb } */ +/* { dg-require-effective-target arm_thumb1_ok } */ + +void +trap () +{ + __builtin_trap (); +} + +/* { dg-final { scan-assembler 0xdeff } } */
RE: [PATCH, ARM] Implement __builtin_trap
On Wed, 4 Dec 2013, Ian Bolton wrote:

> The main update, other than cosmetic differences, is that we've chosen
> the same ARM encoding as LLVM for practical purposes.  (The Thumb
> encoding in Mark's patch already matched LLVM.)

Do the encodings match what plain udf does in recent-enough gas (too recent for us to assume it in GCC or glibc for now), or is it something else?

Hi Joseph,

Yes, these encodings match the UDF instruction that is defined in the most recent edition of the ARM Architecture Reference Manual:

  Thumb: 0xde00 | imm8 (we chose 0xff for the imm8)
  ARM:   0xe7f000f0 | (imm12 << 8) | imm4 (we chose to use 0 for both imms)

So as not to break old versions of gas that don't recognise UDF, the encoding is output directly. Apologies if I have over-explained there!

Cheers,
Ian
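The two encodings quoted above can be assembled mechanically from their immediate fields. These helpers are purely illustrative (GCC emits the fixed bytes directly via .inst, precisely to avoid depending on gas knowing UDF):

```c
#include <stdint.h>

/* Thumb UDF encoding: 0xde00 | imm8.  GCC/LLVM pick imm8 = 0xff.  */
uint16_t
thumb_udf (uint8_t imm8)
{
  return (uint16_t) (0xde00u | imm8);
}

/* ARM UDF encoding: 0xe7f000f0 | (imm12 << 8) | imm4, with imm12 in
   bits 19:8 and imm4 in bits 3:0.  GCC/LLVM pick zero for both
   immediates, i.e. the bare 0xe7f000f0.  */
uint32_t
arm_udf (uint16_t imm12, uint8_t imm4)
{
  return 0xe7f000f0u | ((uint32_t) (imm12 & 0xfffu) << 8) | (imm4 & 0xfu);
}
```

With the immediates the patch chooses, these reproduce exactly the bytes scanned for in the two testcases: 0xdeff and 0xe7f000f0.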
[PATCH, AArch64] Improve handling of constants destined for FP_REGS
(This patch supercedes this one: http://gcc.gnu.org/ml/gcc-patches/2013-07/msg01462.html) The movdi_aarch64 pattern allows moving a constant into an FP_REG, but has the constraint Dd, which is stricter than the constraint N for moving a constant into a CORE_REG. This is due to restricted values allowed for MOVI instruction. Due to the predicate allowing any constant that is valid for the CORE_REGs, we can run into situations where IRA/reload has decided to use FP_REGs but the value is not actually valid for MOVI. This patch makes use of TARGET_PREFERRED_RELOAD_CLASS to ensure that NO_REGS (which leads to literal pool) is returned, when the immediate can't be put directly into FP_REGS. A testcase is included. Linux regressions all came back good. OK for trunk? Cheers, Ian 2013-09-04 Ian Bolton ian.bol...@arm.com gcc/ * config/aarch64/aarch64.c (aarch64_preferred_reload_class): Return NO_REGS for immediate that can't be moved directly into FP_REGS. testsuite/ * gcc.target/aarch64/movdi_1.c: New test.diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index aed035a..2c07ccf 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -4236,10 +4236,18 @@ aarch64_class_max_nregs (reg_class_t regclass, enum machine_mode mode) } static reg_class_t -aarch64_preferred_reload_class (rtx x ATTRIBUTE_UNUSED, reg_class_t regclass) +aarch64_preferred_reload_class (rtx x, reg_class_t regclass) { - return ((regclass == POINTER_REGS || regclass == STACK_REG) - ? GENERAL_REGS : regclass); + if (regclass == POINTER_REGS || regclass == STACK_REG) +return GENERAL_REGS; + + /* If it's an integer immediate that MOVI can't handle, then + FP_REGS is not an option, so we return NO_REGS instead. 
     */
+  if (CONST_INT_P (x) && reg_class_subset_p (regclass, FP_REGS)
+      && !aarch64_simd_imm_scalar_p (x, GET_MODE (x)))
+    return NO_REGS;
+
+  return regclass;
 }
 
 void
diff --git a/gcc/testsuite/gcc.target/aarch64/movdi_1.c b/gcc/testsuite/gcc.target/aarch64/movdi_1.c
new file mode 100644
index 000..a22378d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movdi_1.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-inline" } */
+
+#include <arm_neon.h>
+
+void
+foo1 (uint64_t *a)
+{
+  uint64x1_t val18;
+  uint32x2_t val19;
+  uint64x1_t val20;
+  val19 = vcreate_u32 (0x80004cf3dffbUL);
+  val20 = vrsra_n_u64 (val18, vreinterpret_u64_u32 (val19), 34);
+  vst1_u64 (a, val20);
+}
+
+void
+foo2 (uint64_t *a)
+{
+  uint64x1_t val18;
+  uint32x2_t val19;
+  uint64x1_t val20;
+  val19 = vcreate_u32 (0xdffbUL);
+  val20 = vrsra_n_u64 (val18, vreinterpret_u64_u32 (val19), 34);
+  vst1_u64 (a, val20);
+}
[PATCH, AArch64] Add secondary reload for immediates into FP_REGS
Our movdi_aarch64 pattern allows moving a constant into an FP_REG, but has the constraint Dd, which is stricter than the one for moving a constant into a CORE_REG. This is due to restricted values allowed for MOVI instructions. Due to the predicate for the pattern allowing any constant that is valid for the CORE_REGs, we can run into situations where IRA/reload has decided to use FP_REGs but the value is not actually valid for MOVI. This patch introduces a secondary reload to handle this case. Supplied with testcase that highlighted original problem. Tested on Linux GNU regressions. OK for trunk? Cheers, Ian 2013-07-30 Ian Bolton ian.bol...@arm.com gcc/ * config/aarch64/aarch64.c (aarch64_secondary_reload)): Handle constant into FP_REGs that is not valid for MOVI. testsuite/ * gcc.target/aarch64/movdi_1.c: New test.diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 9941d7c..f16988e 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -4070,6 +4070,15 @@ aarch64_secondary_reload (bool in_p ATTRIBUTE_UNUSED, rtx x, if (rclass == FP_REGS (mode == TImode || mode == TFmode) CONSTANT_P(x)) return CORE_REGS; + /* Only a subset of the DImode immediate values valid for CORE_REGS are + valid for FP_REGS. Where we have an immediate value that isn't valid + for FP_REGS, and RCLASS is FP_REGS, we return CORE_REGS to cause the + value to be generated into there first and later copied to FP_REGS to be + used. 
     */
+  if (rclass == FP_REGS && mode == DImode && CONST_INT_P (x)
+      && !aarch64_simd_imm_scalar_p (x, GET_MODE (x)))
+    return CORE_REGS;
+
   return NO_REGS;
 }
 
diff --git a/gcc/testsuite/gcc.target/aarch64/movdi_1.c b/gcc/testsuite/gcc.target/aarch64/movdi_1.c
new file mode 100644
index 000..1decd99
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movdi_1.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-inline" } */
+
+#include <arm_neon.h>
+
+void
+foo (uint64_t *a)
+{
+  uint64x1_t val18;
+  uint32x2_t val19;
+  uint64x1_t val20;
+  val19 = vcreate_u32 (0x80004cf3dffbUL);
+  val20 = vrsra_n_u64 (val18, vreinterpret_u64_u32 (val19), 34);
+  vst1_u64 (a, val20);
+}
[PATCH, AArch64] Support NEG in vector registers for DI and SI mode
Support added for scalar NEG instruction in vector registers. Execution testcase included. Tested on usual GCC Linux regressions. OK for trunk? Cheers, Ian 2013-07-23 Ian Bolton ian.bol...@arm.com gcc/ * config/aarch64/aarch64-simd.md (negmode2): Offer alternative that uses vector registers. testsuite/ * gcc.target/aarch64/neg_1.c: New test.diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index e88e5be..d76056c 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -2004,12 +2004,17 @@ ) (define_insn negmode2 - [(set (match_operand:GPI 0 register_operand =r) - (neg:GPI (match_operand:GPI 1 register_operand r)))] + [(set (match_operand:GPI 0 register_operand =r,w) + (neg:GPI (match_operand:GPI 1 register_operand r,w)))] - neg\\t%w0, %w1 + @ + neg\\t%w0, %w1 + neg\\t%rtn0vas, %rtn1vas [(set_attr v8type alu) - (set_attr mode MODE)] + (set_attr simd_type *,simd_negabs) + (set_attr simd *,yes) + (set_attr mode MODE) + (set_attr simd_mode MODE)] ) ;; zero_extend version of above diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md index 8e40c5d..7acbcfd 100644 --- a/gcc/config/aarch64/iterators.md +++ b/gcc/config/aarch64/iterators.md @@ -277,6 +277,12 @@ (V2DI ) (V2SF ) (V4SF ) (V2DF )]) +;; Register Type Name and Vector Arrangement Specifier for when +;; we are doing scalar for DI and SIMD for SI (ignoring all but +;; lane 0). 
+(define_mode_attr rtn [(DI d) (SI )]) +(define_mode_attr vas [(DI ) (SI .2s)]) + ;; Map a floating point mode to the appropriate register name prefix (define_mode_attr s [(SF s) (DF d)]) diff --git a/gcc/testsuite/gcc.target/aarch64/neg_1.c b/gcc/testsuite/gcc.target/aarch64/neg_1.c new file mode 100644 index 000..04b0fdd --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/neg_1.c @@ -0,0 +1,67 @@ +/* { dg-do run } */ +/* { dg-options -O2 -fno-inline --save-temps } */ + +extern void abort (void); + +long long +neg64 (long long a) +{ + /* { dg-final { scan-assembler neg\tx\[0-9\]+ } } */ + return 0 - a; +} + +long long +neg64_in_dreg (long long a) +{ + /* { dg-final { scan-assembler neg\td\[0-9\]+, d\[0-9\]+ } } */ + register long long x asm (d8) = a; + register long long y asm (d9); + asm volatile ( : : w (x)); + y = 0 - x; + asm volatile ( : : w (y)); + return y; +} + +int +neg32 (int a) +{ + /* { dg-final { scan-assembler neg\tw\[0-9\]+ } } */ + return 0 - a; +} + +int +neg32_in_sreg (int a) +{ + /* { dg-final { scan-assembler neg\tv\[0-9\]+\.2s, v\[0-9\]+\.2s } } */ + register int x asm (s8) = a; + register int y asm (s9); + asm volatile ( : : w (x)); + y = 0 - x; + asm volatile ( : : w (y)); + return y; +} + +int +main (void) +{ + long long a; + int b; + a = 61; + b = 313; + + if (neg64 (a) != -61) +abort (); + + if (neg64_in_dreg (a) != -61) +abort (); + + if (neg32 (b) != -313) +abort (); + + if (neg32_in_sreg (b) != -313) +abort (); + + return 0; +} + +/* { dg-final { cleanup-saved-temps } } */
[PATCH, AArch64] Add vabs_s64 intrinsic
This patch implements the following intrinsic:

  int64x1_t vabs_s64 (int64x1_t a)

It uses __builtin_llabs (), which will lead to "abs Dn, Dm" being generated for this, now that my other patch has been committed. Test case added to scalar_intrinsics.c.

OK for trunk?

Cheers,
Ian

2013-07-12  Ian Bolton  ian.bol...@arm.com

	gcc/
	* config/aarch64/arm_neon.h (vabs_s64): New function.

	testsuite/
	* gcc.target/aarch64/scalar_intrinsics.c (test_vabs_s64): Added
	new test.

Index: gcc/config/aarch64/arm_neon.h
===================================================================
--- gcc/config/aarch64/arm_neon.h	(revision 200594)
+++ gcc/config/aarch64/arm_neon.h	(working copy)
@@ -17886,6 +17886,12 @@ vabsq_f64 (float64x2_t __a)
   return __builtin_aarch64_absv2df (__a);
 }
 
+__extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
+vabs_s64 (int64x1_t a)
+{
+  return __builtin_llabs (a);
+}
+
 /* vadd */
 
 __extension__ static __inline int64x1_t __attribute__ ((__always_inline__))
Index: gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c
===================================================================
--- gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c	(revision 200594)
+++ gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c	(working copy)
@@ -32,6 +32,18 @@ test_vaddd_s64_2 (int64x1_t a, int64x1_t
 		    vqaddd_s64 (a, d));
 }
 
+/* { dg-final { scan-assembler-times "\\tabs\\td\[0-9\]+, d\[0-9\]+" 1 } } */
+
+int64x1_t
+test_vabs_s64 (int64x1_t a)
+{
+  uint64x1_t res;
+  force_simd (a);
+  res = vabs_s64 (a);
+  force_simd (res);
+  return res;
+}
+
 /* { dg-final { scan-assembler-times "\\tcmeq\\td\[0-9\]+, d\[0-9\]+, d\[0-9\]+" 1 } } */
 
 uint64x1_t
[PATCH, AArch64] Support abs standard pattern for DI mode
Hi, I'm adding support for abs standard pattern name for DI mode, via the ABS instruction in FP registers and the EOR/SUB combo in GP registers. Regression tests for Linux and bare-metal all passed. OK for trunk? Cheers, Ian 2013-06-25 Ian Bolton ian.bol...@arm.com gcc/ * config/aarch64/aarch64-simd.md (absdi2): Support abs for DI mode. testsuite/ * gcc.target/aarch64/abs_1.c: New test.diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index e88e5be..3700977 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -2003,6 +2003,38 @@ (set_attr mode SI)] ) +(define_insn_and_split absdi2 + [(set (match_operand:DI 0 register_operand =r,w) + (abs:DI (match_operand:DI 1 register_operand r,w))) + (clobber (match_scratch:DI 2 =r,X))] + + @ + # + abs\\t%d0, %d1 + reload_completed +GP_REGNUM_P (REGNO (operands[0])) +GP_REGNUM_P (REGNO (operands[1])) + [(const_int 0)] + { +emit_insn (gen_rtx_SET (VOIDmode, operands[2], + gen_rtx_XOR (DImode, +gen_rtx_ASHIFTRT (DImode, + operands[1], + GEN_INT (63)), +operands[1]))); +emit_insn (gen_rtx_SET (VOIDmode, + operands[0], + gen_rtx_MINUS (DImode, + operands[2], + gen_rtx_ASHIFTRT (DImode, +operands[1], +GEN_INT (63); +DONE; + } + [(set_attr v8type alu) + (set_attr mode DI)] +) + (define_insn negmode2 [(set (match_operand:GPI 0 register_operand =r) (neg:GPI (match_operand:GPI 1 register_operand r)))] diff --git a/gcc/testsuite/gcc.target/aarch64/abs_1.c b/gcc/testsuite/gcc.target/aarch64/abs_1.c new file mode 100644 index 000..938bc84 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/abs_1.c @@ -0,0 +1,53 @@ +/* { dg-do run } */ +/* { dg-options -O2 -fno-inline --save-temps } */ + +extern long long llabs (long long); +extern void abort (void); + +long long +abs64 (long long a) +{ + /* { dg-final { scan-assembler eor\t } } */ + /* { dg-final { scan-assembler sub\t } } */ + return llabs (a); +} + +long long +abs64_in_dreg (long long a) +{ + /* { dg-final { scan-assembler 
abs\td\[0-9\]+, d\[0-9\]+" } } */
+  register long long x asm ("d8") = a;
+  register long long y asm ("d9");
+  asm volatile ("" : : "w" (x));
+  y = llabs (x);
+  asm volatile ("" : : "w" (y));
+  return y;
+}
+
+int
+main (void)
+{
+  volatile long long ll0 = 0LL, ll1 = 1LL, llm1 = -1LL;
+
+  if (abs64 (ll0) != 0LL)
+    abort ();
+
+  if (abs64 (ll1) != 1LL)
+    abort ();
+
+  if (abs64 (llm1) != 1LL)
+    abort ();
+
+  if (abs64_in_dreg (ll0) != 0LL)
+    abort ();
+
+  if (abs64_in_dreg (ll1) != 1LL)
+    abort ();
+
+  if (abs64_in_dreg (llm1) != 1LL)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { cleanup-saved-temps } } */
[PATCH, AArch64] Update insv_1.c test for Big Endian
Hi, The insv_1.c test case I added recently was not compatible with big endian. I attempted to fix with #ifdefs but dejagnu thinks all dg directives in a file, regardless of #ifdefs, are applicable, so I had to instead make a new test and add a new effective target to show when each test is supported. I've tested these two tests on little and big. All was OK. OK for trunk? Cheers, Ian 2013-06-24 Ian Bolton ian.bol...@arm.com * gcc.target/config/aarch64/insv_1.c: Update to show it doesn't work on big endian. * gcc.target/config/aarch64/insv_2.c: New test for big endian. * lib/target-supports.exp: Define aarch64_little_endian.diff --git a/gcc/testsuite/gcc.target/aarch64/insv_1.c b/gcc/testsuite/gcc.target/aarch64/insv_1.c index bc8928d..6e3c7f0 100644 --- a/gcc/testsuite/gcc.target/aarch64/insv_1.c +++ b/gcc/testsuite/gcc.target/aarch64/insv_1.c @@ -1,5 +1,6 @@ -/* { dg-do run } */ +/* { dg-do run { target aarch64*-*-* } } */ /* { dg-options -O2 --save-temps -fno-inline } */ +/* { dg-require-effective-target aarch64_little_endian } */ extern void abort (void); diff --git a/gcc/testsuite/gcc.target/aarch64/insv_2.c b/gcc/testsuite/gcc.target/aarch64/insv_2.c new file mode 100644 index 000..a7691a3 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/insv_2.c @@ -0,0 +1,85 @@ +/* { dg-do run { target aarch64*-*-* } } */ +/* { dg-options -O2 --save-temps -fno-inline } */ +/* { dg-require-effective-target aarch64_big_endian } */ + +extern void abort (void); + +typedef struct bitfield +{ + unsigned short eight: 8; + unsigned short four: 4; + unsigned short five: 5; + unsigned short seven: 7; + unsigned int sixteen: 16; +} bitfield; + +bitfield +bfi1 (bitfield a) +{ + /* { dg-final { scan-assembler bfi\tx\[0-9\]+, x\[0-9\]+, 56, 8 } } */ + a.eight = 3; + return a; +} + +bitfield +bfi2 (bitfield a) +{ + /* { dg-final { scan-assembler bfi\tx\[0-9\]+, x\[0-9\]+, 43, 5 } } */ + a.five = 7; + return a; +} + +bitfield +movk (bitfield a) +{ + /* { dg-final { scan-assembler 
movk\tx\[0-9\]+, 0x1d6b, lsl 16 } } */ + a.sixteen = 7531; + return a; +} + +bitfield +set1 (bitfield a) +{ + /* { dg-final { scan-assembler orr\tx\[0-9\]+, x\[0-9\]+, 272678883688448 } } */ + a.five = 0x1f; + return a; +} + +bitfield +set0 (bitfield a) +{ + /* { dg-final { scan-assembler and\tx\[0-9\]+, x\[0-9\]+, -272678883688449 } } */ + a.five = 0; + return a; +} + + +int +main (int argc, char** argv) +{ + static bitfield a; + bitfield b = bfi1 (a); + bitfield c = bfi2 (b); + bitfield d = movk (c); + + if (d.eight != 3) +abort (); + + if (d.five != 7) +abort (); + + if (d.sixteen != 7531) +abort (); + + d = set1 (d); + if (d.five != 0x1f) +abort (); + + d = set0 (d); + if (d.five != 0) +abort (); + + return 0; +} + +/* { dg-final { cleanup-saved-temps } } */ diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp index a80078a..aca4215 100644 --- a/gcc/testsuite/lib/target-supports.exp +++ b/gcc/testsuite/lib/target-supports.exp @@ -2105,6 +2105,15 @@ proc check_effective_target_aarch64_big_endian { } { }] } +# Return 1 if this is a AArch64 target supporting little endian +proc check_effective_target_aarch64_little_endian { } { +return [check_no_compiler_messages aarch64_little_endian assembly { +#if !defined(__aarch64__) || defined(__AARCH64EB__) +#error FOO +#endif +}] +} + # Return 1 is this is an arm target using 32-bit instructions proc check_effective_target_arm32 { } { return [check_no_compiler_messages arm32 assembly {
[AArch64, PATCH 1/5] Improve MOVI handling (Change interface of aarch64_simd_valid_immediate)
(This patch is the first of five, where the first 4 do some clean-up and the last fixes a bug with scalar MOVI. The bug fix without the clean-up was particularly ugly!) This one is pretty simple: it just alters an interface, so we can later remove an unnecessary wrapper function. OK for trunk? Cheers, Ian

2013-06-03  Ian Bolton  ian.bol...@arm.com

	* config/aarch64/aarch64.c (aarch64_simd_valid_immediate): Change
	return type to bool for prototype.
	(aarch64_legitimate_constant_p): Check for true instead of not -1.
	(aarch64_simd_valid_immediate): Fix up each return to return a bool.
	(aarch64_simd_immediate_valid_for_move): Update retval for bool.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 12a7055..05ff5fa 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -103,7 +103,7 @@ static bool aarch64_vfp_is_call_or_return_candidate (enum machine_mode,
 static void aarch64_elf_asm_constructor (rtx, int) ATTRIBUTE_UNUSED;
 static void aarch64_elf_asm_destructor (rtx, int) ATTRIBUTE_UNUSED;
 static void aarch64_override_options_after_change (void);
-static int aarch64_simd_valid_immediate (rtx, enum machine_mode, int, rtx *,
+static bool aarch64_simd_valid_immediate (rtx, enum machine_mode, int, rtx *,
					  int *, unsigned char *, int *, int *);
 static bool aarch64_vector_mode_supported_p (enum machine_mode);
 static unsigned bit_count (unsigned HOST_WIDE_INT);
@@ -5153,7 +5153,7 @@ aarch64_legitimate_constant_p (enum machine_mode mode, rtx x)
      we now decompose CONST_INTs according to expand_mov_immediate.
*/
   if ((GET_CODE (x) == CONST_VECTOR
        && aarch64_simd_valid_immediate (x, mode, false,
-					NULL, NULL, NULL, NULL, NULL) != -1)
+					NULL, NULL, NULL, NULL, NULL))
       || CONST_INT_P (x) || aarch64_valid_floating_const (mode, x))
     return !targetm.cannot_force_const_mem (mode, x);
@@ -6144,11 +6144,8 @@ aarch64_vect_float_const_representable_p (rtx x)
   return aarch64_float_const_representable_p (x0);
 }
 
-/* TODO: This function returns values similar to those
-   returned by neon_valid_immediate in gcc/config/arm/arm.c
-   but the API here is different enough that these magic numbers
-   are not used.  It should be sufficient to return true or false.  */
-static int
+/* Return true for valid and false for invalid.  */
+static bool
 aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, int inverse,
			      rtx *modconst, int *elementwidth,
			      unsigned char *elementchar,
@@ -6184,24 +6181,21 @@ aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, int inverse,
       if (!(simd_imm_zero
	     || aarch64_vect_float_const_representable_p (op)))
-	return -1;
+	return false;
 
-      if (modconst)
-	*modconst = CONST_VECTOR_ELT (op, 0);
+      if (modconst)
+	*modconst = CONST_VECTOR_ELT (op, 0);
 
-      if (elementwidth)
-	*elementwidth = elem_width;
+      if (elementwidth)
+	*elementwidth = elem_width;
 
-      if (elementchar)
-	*elementchar = sizetochar (elem_width);
+      if (elementchar)
+	*elementchar = sizetochar (elem_width);
 
-      if (shift)
-	*shift = 0;
+      if (shift)
+	*shift = 0;
 
-      if (simd_imm_zero)
-	return 19;
-      else
-	return 18;
+      return true;
 }
 
 /* Splat vector constant out into a byte vector.
*/
@@ -6299,7 +6293,7 @@ aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, int inverse,
   if (immtype == -1
       || (immtype >= 12 && immtype <= 15)
       || immtype == 18)
-    return -1;
+    return false;
 
   if (elementwidth)
@@ -6351,7 +6345,7 @@ aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, int inverse,
	}
     }
 
-  return immtype;
+  return (immtype >= 0);
 #undef CHECK
 }
@@ -6369,11 +6363,11 @@ aarch64_simd_immediate_valid_for_move (rtx op, enum machine_mode mode,
   int tmpwidth;
   unsigned char tmpwidthc;
   int tmpmvn = 0, tmpshift = 0;
-  int retval = aarch64_simd_valid_immediate (op, mode, 0, &tmpconst,
-					     &tmpwidth, &tmpwidthc,
-					     &tmpmvn, &tmpshift);
+  bool retval = aarch64_simd_valid_immediate (op, mode, 0, &tmpconst,
					      &tmpwidth, &tmpwidthc,
					      &tmpmvn, &tmpshift);
 
-  if (retval == -1)
+  if (!retval)
     return 0;
 
   if (modconst)
[AArch64, PATCH 3/5] Improve MOVI handling (Don't update RTX operand in-place)
(This patch is the third of five, where the first 4 do some clean-up and the last fixes a bug with scalar MOVI. The bug fix without the clean-up was particularly ugly!) This one is focused on cleaning up aarch64_simd_valid_immediate, with better use of arguments and no in-place modification of RTX operands. Specifically, I've changed the set of pointers that are passed in (it's now a struct) and the caller prints out the immediate value directly instead of letting operand[1] get fudged. OK for trunk? Cheers, Ian 2013-06-03 Ian Bolton ian.bol...@arm.com * config/aarch64/aarch64.c (simd_immediate_info): Struct to hold information completed by aarch64_simd_valid_immediate. (aarch64_legitimate_constant_p): Update arguments. (aarch64_simd_valid_immediate): Work with struct rather than many pointers. (aarch64_simd_scalar_immediate_valid_for_move): Update arguments. (aarch64_simd_make_constant): Update arguments. (aarch64_output_simd_mov_immediate): Work with struct rather than many pointers. Output immediate directly rather than as operand. * config/aarch64/aarch64-protos.h (aarch64_simd_valid_immediate): Update prototype. 
* config/aarch64/constraints.md (Dn): Update arguments.diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index d1de14e..083ce91 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -156,8 +156,8 @@ bool aarch64_simd_imm_scalar_p (rtx x, enum machine_mode mode); bool aarch64_simd_imm_zero_p (rtx, enum machine_mode); bool aarch64_simd_scalar_immediate_valid_for_move (rtx, enum machine_mode); bool aarch64_simd_shift_imm_p (rtx, enum machine_mode, bool); -bool aarch64_simd_valid_immediate (rtx, enum machine_mode, int, rtx *, - int *, unsigned char *, int *, int *); +bool aarch64_simd_valid_immediate (rtx, enum machine_mode, bool, + struct simd_immediate_info *); bool aarch64_symbolic_address_p (rtx); bool aarch64_symbolic_constant_p (rtx, enum aarch64_symbol_context, enum aarch64_symbol_type *); diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 5f97efe..d83e645 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -87,6 +87,14 @@ struct aarch64_address_info { enum aarch64_symbol_type symbol_type; }; +struct simd_immediate_info { + rtx value; + int shift; + int element_width; + unsigned char element_char; + bool mvn; +}; + /* The current code model. */ enum aarch64_code_model aarch64_cmodel; @@ -5150,8 +5158,7 @@ aarch64_legitimate_constant_p (enum machine_mode mode, rtx x) /* This could probably go away because we now decompose CONST_INTs according to expand_mov_immediate. */ if ((GET_CODE (x) == CONST_VECTOR -aarch64_simd_valid_immediate (x, mode, false, - NULL, NULL, NULL, NULL, NULL)) +aarch64_simd_valid_immediate (x, mode, false, NULL)) || CONST_INT_P (x) || aarch64_valid_floating_const (mode, x)) return !targetm.cannot_force_const_mem (mode, x); @@ -6144,10 +6151,8 @@ aarch64_vect_float_const_representable_p (rtx x) /* Return true for valid and false for invalid. 
*/ bool -aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, int inverse, - rtx *modconst, int *elementwidth, - unsigned char *elementchar, - int *mvn, int *shift) +aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, bool inverse, + struct simd_immediate_info *info) { #define CHECK(STRIDE, ELSIZE, CLASS, TEST, SHIFT, NEG) \ matches = 1; \ @@ -6181,17 +6186,14 @@ aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, int inverse, || aarch64_vect_float_const_representable_p (op))) return false; - if (modconst) - *modconst = CONST_VECTOR_ELT (op, 0); - - if (elementwidth) - *elementwidth = elem_width; - - if (elementchar) - *elementchar = sizetochar (elem_width); - - if (shift) - *shift = 0; + if (info) + { + info-value = CONST_VECTOR_ELT (op, 0); + info-element_width = elem_width; + info-element_char = sizetochar (elem_width); + info-mvn = false; + info-shift = 0; + } return true; } @@ -6293,21 +6295,13 @@ aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, int inverse, || immtype == 18) return false; - - if (elementwidth) -*elementwidth = elsize; - - if (elementchar) -*elementchar = elchar; - - if (mvn) -*mvn = emvn; - - if (shift) -*shift = eshift; - - if (modconst) + if (info) { + info-element_width = elsize
[AArch64, PATCH 2/5] Improve MOVI handling (Remove wrapper function)
(This patch is the second of five, where the first 4 do some clean-up and the last fixes a bug with scalar MOVI. The bug fix without the clean-up was particularly ugly!) This one is also very simple - removing a wrapper function that was an unnecessary level of indirection. OK for trunk? Cheers, Ian 13-06-03 Ian Bolton ian.bol...@arm.com * config/aarch64/aarch64.c (aarch64_simd_valid_immediate): No longer static. (aarch64_simd_immediate_valid_for_move): Remove. (aarch64_simd_scalar_immediate_valid_for_move): Update call. (aarch64_simd_make_constant): Update call. (aarch64_output_simd_mov_immediate): Update call. * config/aarch64/aarch64-protos.h (aarch64_simd_valid_immediate): Add prototype. * config/aarch64/constraints.md (Dn): Update call.diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 91fcde8..d1de14e 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -156,6 +156,8 @@ bool aarch64_simd_imm_scalar_p (rtx x, enum machine_mode mode); bool aarch64_simd_imm_zero_p (rtx, enum machine_mode); bool aarch64_simd_scalar_immediate_valid_for_move (rtx, enum machine_mode); bool aarch64_simd_shift_imm_p (rtx, enum machine_mode, bool); +bool aarch64_simd_valid_immediate (rtx, enum machine_mode, int, rtx *, + int *, unsigned char *, int *, int *); bool aarch64_symbolic_address_p (rtx); bool aarch64_symbolic_constant_p (rtx, enum aarch64_symbol_context, enum aarch64_symbol_type *); diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 05ff5fa..aec59b0 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -103,8 +103,6 @@ static bool aarch64_vfp_is_call_or_return_candidate (enum machine_mode, static void aarch64_elf_asm_constructor (rtx, int) ATTRIBUTE_UNUSED; static void aarch64_elf_asm_destructor (rtx, int) ATTRIBUTE_UNUSED; static void aarch64_override_options_after_change (void); -static bool aarch64_simd_valid_immediate (rtx, enum 
machine_mode, int, rtx *, -int *, unsigned char *, int *, int *); static bool aarch64_vector_mode_supported_p (enum machine_mode); static unsigned bit_count (unsigned HOST_WIDE_INT); static bool aarch64_const_vec_all_same_int_p (rtx, @@ -6145,7 +6143,7 @@ aarch64_vect_float_const_representable_p (rtx x) } /* Return true for valid and false for invalid. */ -static bool +bool aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, int inverse, rtx *modconst, int *elementwidth, unsigned char *elementchar, @@ -6349,45 +6347,6 @@ aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, int inverse, #undef CHECK } -/* Return TRUE if rtx X is legal for use as either a AdvSIMD MOVI instruction - (or, implicitly, MVNI) immediate. Write back width per element - to *ELEMENTWIDTH, and a modified constant (whatever should be output - for a MOVI instruction) in *MODCONST. */ -int -aarch64_simd_immediate_valid_for_move (rtx op, enum machine_mode mode, - rtx *modconst, int *elementwidth, - unsigned char *elementchar, - int *mvn, int *shift) -{ - rtx tmpconst; - int tmpwidth; - unsigned char tmpwidthc; - int tmpmvn = 0, tmpshift = 0; - bool retval = aarch64_simd_valid_immediate (op, mode, 0, tmpconst, -tmpwidth, tmpwidthc, -tmpmvn, tmpshift); - - if (!retval) -return 0; - - if (modconst) -*modconst = tmpconst; - - if (elementwidth) -*elementwidth = tmpwidth; - - if (elementchar) -*elementchar = tmpwidthc; - - if (mvn) -*mvn = tmpmvn; - - if (shift) -*shift = tmpshift; - - return 1; -} - static bool aarch64_const_vec_all_same_int_p (rtx x, HOST_WIDE_INT minval, @@ -6492,9 +6451,8 @@ aarch64_simd_scalar_immediate_valid_for_move (rtx op, enum machine_mode mode) gcc_assert (!VECTOR_MODE_P (mode)); vmode = aarch64_preferred_simd_mode (mode); rtx op_v = aarch64_simd_gen_const_vector_dup (vmode, INTVAL (op)); - int retval = aarch64_simd_immediate_valid_for_move (op_v, vmode, 0, - NULL, NULL, NULL, NULL); - return retval; + return aarch64_simd_valid_immediate (op_v, vmode, 0, 
NULL, + NULL, NULL, NULL, NULL); } /* Construct and return a PARALLEL RTX vector. */ @@ -6722,8 +6680,8 @@ aarch64_simd_make_constant (rtx vals) gcc_unreachable (); if (const_vec != NULL_RTX - aarch64_simd_immediate_valid_for_move (const_vec, mode, NULL, NULL, - NULL, NULL, NULL
[AArch64, PATCH 4/5] Improve MOVI handling (Other minor clean-up)
(This patch is the fourth of five, where the first 4 do some clean-up and the last fixes a bug with scalar MOVI. The bug fix without the clean-up was particularly ugly!) I think the changelog says it all here. Nothing major, just tidying up. OK for trunk? Cheers, Ian 2013-06-03 Ian Bolton ian.bol...@arm.com * config/aarch64/aarch64.c (simd_immediate_info): Remove element_char member. (sizetochar): Return signed char. (aarch64_simd_valid_immediate): Remove elchar and other unnecessary variables. (aarch64_output_simd_mov_immediate): Take rtx instead of rtx. Calculate element_char as required. * config/aarch64/aarch64-protos.h: Update and move prototype for aarch64_output_simd_mov_immediate. * config/aarch64/aarch64-simd.md (*aarch64_simd_movmode): Update arguments.diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index 083ce91..d21a2f5 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -148,6 +148,7 @@ bool aarch64_legitimate_pic_operand_p (rtx); bool aarch64_move_imm (HOST_WIDE_INT, enum machine_mode); bool aarch64_mov_operand_p (rtx, enum aarch64_symbol_context, enum machine_mode); +char *aarch64_output_simd_mov_immediate (rtx, enum machine_mode, unsigned); bool aarch64_pad_arg_upward (enum machine_mode, const_tree); bool aarch64_pad_reg_upward (enum machine_mode, const_tree, bool); bool aarch64_regno_ok_for_base_p (int, bool); @@ -258,6 +259,4 @@ extern void aarch64_split_combinev16qi (rtx operands[3]); extern void aarch64_expand_vec_perm (rtx target, rtx op0, rtx op1, rtx sel); extern bool aarch64_expand_vec_perm_const (rtx target, rtx op0, rtx op1, rtx sel); - -char* aarch64_output_simd_mov_immediate (rtx *, enum machine_mode, unsigned); #endif /* GCC_AARCH64_PROTOS_H */ diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md index 04fbdbd..e5990d4 100644 --- a/gcc/config/aarch64/aarch64-simd.md +++ b/gcc/config/aarch64/aarch64-simd.md @@ -409,7 +409,7 @@ 
case 4: return ins\t%0.d[0], %1; case 5: return mov\t%0, %1; case 6: - return aarch64_output_simd_mov_immediate (operands[1], + return aarch64_output_simd_mov_immediate (operands[1], MODEmode, 64); default: gcc_unreachable (); } @@ -440,7 +440,7 @@ case 5: return #; case 6: - return aarch64_output_simd_mov_immediate (operands[1], MODEmode, 128); + return aarch64_output_simd_mov_immediate (operands[1], MODEmode, 128); default: gcc_unreachable (); } diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index d83e645..001f9c5 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -91,7 +91,6 @@ struct simd_immediate_info { rtx value; int shift; int element_width; - unsigned char element_char; bool mvn; }; @@ -6102,7 +6101,7 @@ aarch64_mangle_type (const_tree type) } /* Return the equivalent letter for size. */ -static unsigned char +static char sizetochar (int size) { switch (size) @@ -6163,7 +6162,6 @@ aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, bool inverse, { \ immtype = (CLASS); \ elsize = (ELSIZE); \ - elchar = sizetochar (elsize);\ eshift = (SHIFT);\ emvn = (NEG);\ break; \ @@ -6172,25 +6170,20 @@ aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, bool inverse, unsigned int i, elsize = 0, idx = 0, n_elts = CONST_VECTOR_NUNITS (op); unsigned int innersize = GET_MODE_SIZE (GET_MODE_INNER (mode)); unsigned char bytes[16]; - unsigned char elchar = 0; int immtype = -1, matches; unsigned int invmask = inverse ? 0xff : 0; int eshift, emvn; if (GET_MODE_CLASS (mode) == MODE_VECTOR_FLOAT) { - bool simd_imm_zero = aarch64_simd_imm_zero_p (op, mode); - int elem_width = GET_MODE_BITSIZE (GET_MODE (CONST_VECTOR_ELT (op, 0))); - - if (!(simd_imm_zero - || aarch64_vect_float_const_representable_p (op))) + if (! 
(aarch64_simd_imm_zero_p (op, mode) +|| aarch64_vect_float_const_representable_p (op))) return false; if (info) { info-value = CONST_VECTOR_ELT (op, 0); - info-element_width = elem_width; - info-element_char = sizetochar (elem_width); + info-element_width = GET_MODE_BITSIZE (GET_MODE (info-value)); info-mvn = false; info-shift = 0; } @@ -6298,7 +6291,6 @@ aarch64_simd_valid_immediate (rtx op, enum machine_mode mode, bool inverse
[AArch64, PATCH 5/5] Improve MOVI handling (Fix invalid assembler bug)
(This patch is the last of five, where the first 4 did some clean-up and this one fixes a bug with scalar MOVI. The bug fix without the clean-up was particularly ugly!) GCC currently generates invalid assembler for MOVI if the value in question needs to be shifted. For example: prog.s:270: Error: immediate value out of range -128 to 255 at operand 2 -- `movi v16.4h,1024' The correct assembler for the example should be: movi v16.4h, 0x4, lsl 8 The fix involves calling into a function to output the instruction, rather than just leaving it to aarch64_print_operand, as is done for vector immediates. Regression runs have passed for Linux and bare-metal. OK for trunk? Cheers, Ian

2013-06-03  Ian Bolton  ian.bol...@arm.com

	gcc/
	* config/aarch64/aarch64.md (*mov<mode>_aarch64): Call into function
	to generate MOVI instruction.
	* config/aarch64/aarch64.c (aarch64_simd_container_mode): New function.
	(aarch64_preferred_simd_mode): Turn into wrapper.
	(aarch64_output_scalar_simd_mov_immediate): New function.
	* config/aarch64/aarch64-protos.h: Add prototype for above.

	testsuite/
	* gcc.target/aarch64/movi_1.c: New test.
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h index d21a2f5..0dface1 100644 --- a/gcc/config/aarch64/aarch64-protos.h +++ b/gcc/config/aarch64/aarch64-protos.h @@ -148,6 +148,7 @@ bool aarch64_legitimate_pic_operand_p (rtx); bool aarch64_move_imm (HOST_WIDE_INT, enum machine_mode); bool aarch64_mov_operand_p (rtx, enum aarch64_symbol_context, enum machine_mode); +char *aarch64_output_scalar_simd_mov_immediate (rtx, enum machine_mode); char *aarch64_output_simd_mov_immediate (rtx, enum machine_mode, unsigned); bool aarch64_pad_arg_upward (enum machine_mode, const_tree); bool aarch64_pad_reg_upward (enum machine_mode, const_tree, bool); diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 001f9c5..0ea05d8 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -5988,32 +5988,57 @@ aarch64_vector_mode_supported_p (enum machine_mode mode) return false; } -/* Return quad mode as the preferred SIMD mode. */ +/* Return appropriate SIMD container + for MODE within a vector of WIDTH bits. 
*/ static enum machine_mode -aarch64_preferred_simd_mode (enum machine_mode mode) +aarch64_simd_container_mode (enum machine_mode mode, unsigned width) { + gcc_assert (width == 64 || width == 128); if (TARGET_SIMD) -switch (mode) - { - case DFmode: -return V2DFmode; - case SFmode: -return V4SFmode; - case SImode: -return V4SImode; - case HImode: -return V8HImode; - case QImode: -return V16QImode; - case DImode: - return V2DImode; -break; - - default:; - } +{ + if (width == 128) + switch (mode) + { + case DFmode: + return V2DFmode; + case SFmode: + return V4SFmode; + case SImode: + return V4SImode; + case HImode: + return V8HImode; + case QImode: + return V16QImode; + case DImode: + return V2DImode; + default: + break; + } + else + switch (mode) + { + case SFmode: + return V2SFmode; + case SImode: + return V2SImode; + case HImode: + return V4HImode; + case QImode: + return V8QImode; + default: + break; + } +} return word_mode; } +/* Return 128-bit container as the preferred SIMD mode for MODE. */ +static enum machine_mode +aarch64_preferred_simd_mode (enum machine_mode mode) +{ + return aarch64_simd_container_mode (mode, 128); +} + /* Return the bitmask of possible vector sizes for the vectorizer to iterate over. */ static unsigned int @@ -7280,6 +7305,18 @@ aarch64_output_simd_mov_immediate (rtx const_vector, return templ; } +char* +aarch64_output_scalar_simd_mov_immediate (rtx immediate, + enum machine_mode mode) +{ + enum machine_mode vmode; + + gcc_assert (!VECTOR_MODE_P (mode)); + vmode = aarch64_simd_container_mode (mode, 64); + rtx v_op = aarch64_simd_gen_const_vector_dup (vmode, INTVAL (immediate)); + return aarch64_output_simd_mov_immediate (v_op, vmode, 64); +} + /* Split operands into moves from op[1] + op[2] into op[0]. 
*/ void diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index e1ec48f..458239e 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -774,17 +774,34 @@ (match_operand:SHORT 1 general_operand r,M,Dhq,m, m,rZ,*w,*w, r,*w))] (register_operand (operands[0], MODEmode) || aarch64_reg_or_zero (operands[1], MODEmode)) - @ - mov\\t%w0, %w1
RE: [PATCH, AArch64] Support BFI instruction and insv standard pattern
On 05/20/2013 11:55 AM, Ian Bolton wrote: I improved this patch during the work I did on the recent insv_imm patch (http://gcc.gnu.org/ml/gcc-patches/2013-05/msg01007.html). Thanks, you cleaned up almost everything on which I would have commented with the previous patch revision. The only thing left is:

+  else if (!register_operand (value, <MODE>mode))
+    operands[3] = force_reg (<MODE>mode, value);

Checking register_operand before force_reg is unnecessary; you're not saving a function call, and force_reg will itself perform the register check. Thanks for the review, Richard. Latest patch is attached, which fixes this. Linux and bare-metal regression runs successful. OK for trunk? Cheers, Ian

2013-05-30  Ian Bolton  ian.bol...@arm.com

	gcc/
	* config/aarch64/aarch64.md (insv): New define_expand.
	(*insv_reg<mode>): New define_insn.

	testsuite/
	* gcc.target/aarch64/insv_1.c: New test.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 2bdbfa9..89db092 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3163,6 +3163,50 @@
    (set_attr "mode" "<MODE>")]
 )
 
+;; Bitfield Insert (insv)
+(define_expand "insv<mode>"
+  [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand")
+			  (match_operand 1 "const_int_operand")
+			  (match_operand 2 "const_int_operand"))
+	(match_operand:GPI 3 "general_operand"))]
+  ""
+{
+  unsigned HOST_WIDE_INT width = UINTVAL (operands[1]);
+  unsigned HOST_WIDE_INT pos = UINTVAL (operands[2]);
+  rtx value = operands[3];
+
+  if (width == 0 || (pos + width) > GET_MODE_BITSIZE (<MODE>mode))
+    FAIL;
+
+  if (CONST_INT_P (value))
+    {
+      unsigned HOST_WIDE_INT mask = ((unsigned HOST_WIDE_INT)1 << width) - 1;
+
+      /* Prefer AND/OR for inserting all zeros or all ones.  */
+      if ((UINTVAL (value) & mask) == 0
+	  || (UINTVAL (value) & mask) == mask)
+	FAIL;
+
+      /* 16-bit aligned 16-bit wide insert is handled by insv_imm.  */
+      if (width == 16 && (pos % 16) == 0)
+	DONE;
+    }
+
+  operands[3] = force_reg (<MODE>mode, value);
+})
+
+(define_insn "*insv_reg<mode>"
+  [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r")
+			  (match_operand 1 "const_int_operand" "n")
+			  (match_operand 2 "const_int_operand" "n"))
+	(match_operand:GPI 3 "register_operand" "r"))]
+  "!(UINTVAL (operands[1]) == 0
+     || (UINTVAL (operands[2]) + UINTVAL (operands[1])
+	 > GET_MODE_BITSIZE (<MODE>mode)))"
+  "bfi\\t%<w>0, %<w>3, %2, %1"
+  [(set_attr "v8type" "bfm")
+   (set_attr "mode" "<MODE>")]
+)
+
 (define_insn "*<optab><ALLX:mode>_shft_<GPI:mode>"
   [(set (match_operand:GPI 0 "register_operand" "=r")
	(ashift:GPI (ANY_EXTEND:GPI
diff --git a/gcc/testsuite/gcc.target/aarch64/insv_1.c b/gcc/testsuite/gcc.target/aarch64/insv_1.c
new file mode 100644
index 000..bc8928d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/insv_1.c
@@ -0,0 +1,84 @@
+/* { dg-do run } */
+/* { dg-options "-O2 --save-temps -fno-inline" } */
+
+extern void abort (void);
+
+typedef struct bitfield
+{
+  unsigned short eight: 8;
+  unsigned short four: 4;
+  unsigned short five: 5;
+  unsigned short seven: 7;
+  unsigned int sixteen: 16;
+} bitfield;
+
+bitfield
+bfi1 (bitfield a)
+{
+  /* { dg-final { scan-assembler "bfi\tx\[0-9\]+, x\[0-9\]+, 0, 8" } } */
+  a.eight = 3;
+  return a;
+}
+
+bitfield
+bfi2 (bitfield a)
+{
+  /* { dg-final { scan-assembler "bfi\tx\[0-9\]+, x\[0-9\]+, 16, 5" } } */
+  a.five = 7;
+  return a;
+}
+
+bitfield
+movk (bitfield a)
+{
+  /* { dg-final { scan-assembler "movk\tx\[0-9\]+, 0x1d6b, lsl 32" } } */
+  a.sixteen = 7531;
+  return a;
+}
+
+bitfield
+set1 (bitfield a)
+{
+  /* { dg-final { scan-assembler "orr\tx\[0-9\]+, x\[0-9\]+, 2031616" } } */
+  a.five = 0x1f;
+  return a;
+}
+
+bitfield
+set0 (bitfield a)
+{
+  /* { dg-final { scan-assembler "and\tx\[0-9\]+, x\[0-9\]+, -2031617" } } */
+  a.five = 0;
+  return a;
+}
+
+
+int
+main (int argc, char** argv)
+{
+  static bitfield a;
+  bitfield b = bfi1 (a);
+  bitfield c = bfi2 (b);
+  bitfield d = movk (c);
+
+  if (d.eight != 3)
+    abort ();
+
if (d.five != 7) +abort (); + + if (d.sixteen != 7531) +abort (); + + d = set1 (d); + if (d.five != 0x1f) +abort (); + + d = set0 (d); + if (d.five != 0) +abort (); + + return 0; +} + +/* { dg-final { cleanup-saved-temps } } */
[PATCH, AArch64] Fix invalid assembler in scalar_intrinsics.c test
The test file scalar_intrinsics.c (in gcc.target/aarch64) is currently compile-only. If you attempt to make it run, rather than just generate assembler, it fails because the output won't assemble. There are two issues causing trouble here: 1) Use of the invalid instruction "mov d0, d1". It should be "mov d0, v1.d[0]". 2) The vdupd_lane_s64 and vdupd_lane_u64 calls are given a lane that is out of range, which causes invalid assembler output. This patch fixes both, so that we can build on this to make executable test cases for scalar intrinsics. OK for trunk? Cheers, Ian

2013-05-22  Ian Bolton  ian.bol...@arm.com

	testsuite/
	* gcc.target/aarch64/scalar_intrinsics.c (force_simd): Use a valid
	instruction.
	(test_vdupd_lane_s64): Pass a valid lane argument.
	(test_vdupd_lane_u64): Likewise.

diff --git a/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c b/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c
index 7427c62..16537ce 100644
--- a/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c
+++ b/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c
@@ -4,7 +4,7 @@
 #include <arm_neon.h>
 
 /* Used to force a variable to a SIMD register.  */
-#define force_simd(V1) asm volatile ("mov %d0, %d1"		\
+#define force_simd(V1) asm volatile ("mov %d0, %1.d[0]"	\
	   : "=w"(V1)						\
	   : "w"(V1)						\
	   : /* No clobbers */);
@@ -228,13 +228,13 @@ test_vdups_lane_u32 (uint32x4_t a)
 int64x1_t
 test_vdupd_lane_s64 (int64x2_t a)
 {
-  return vdupd_lane_s64 (a, 2);
+  return vdupd_lane_s64 (a, 1);
 }
 
 uint64x1_t
 test_vdupd_lane_u64 (uint64x2_t a)
 {
-  return vdupd_lane_u64 (a, 2);
+  return vdupd_lane_u64 (a, 1);
 }
 
 /* { dg-final { scan-assembler-times "\\tcmtst\\td\[0-9\]+, d\[0-9\]+, d\[0-9\]+" 2 } } */
RE: [PATCH, AArch64] Support BFI instruction and insv standard pattern
Hi,

This patch implements the BFI variant of BFM. In doing so, it also implements the insv standard pattern. I've regression tested on bare-metal and linux. It comes complete with its own compilation and execution testcase. OK for trunk? Cheers, Ian

2013-05-08  Ian Bolton  ian.bol...@arm.com

gcc/
	* config/aarch64/aarch64.md (insv<mode>): New define_expand.
	(*insv_reg<mode>): New define_insn.
testsuite/
	* gcc.target/aarch64/bfm_1.c: New test.

(This patch did not yet get commit approval.)  I improved this patch during the work I did on the recent insv_imm patch (http://gcc.gnu.org/ml/gcc-patches/2013-05/msg01007.html). I also renamed the testcase. Regression testing completed successfully. OK for trunk? Cheers, Ian

2013-05-20  Ian Bolton  ian.bol...@arm.com

gcc/
	* config/aarch64/aarch64.md (insv<mode>): New define_expand.
	(*insv_reg<mode>): New define_insn.
testsuite/
	* gcc.target/aarch64/insv_1.c: New test.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index b27bcda..e5d6950 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3164,6 +3164,52 @@
    (set_attr "mode" "<MODE>")]
 )
 
+;; Bitfield Insert (insv)
+(define_expand "insv<mode>"
+  [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand")
+			  (match_operand 1 "const_int_operand")
+			  (match_operand 2 "const_int_operand"))
+	(match_operand:GPI 3 "general_operand"))]
+  ""
+{
+  unsigned HOST_WIDE_INT width = UINTVAL (operands[1]);
+  unsigned HOST_WIDE_INT pos = UINTVAL (operands[2]);
+  rtx value = operands[3];
+
+  if (width == 0 || (pos + width) > GET_MODE_BITSIZE (<MODE>mode))
+    FAIL;
+
+  if (CONST_INT_P (value))
+    {
+      unsigned HOST_WIDE_INT mask = ((unsigned HOST_WIDE_INT)1 << width) - 1;
+
+      /* Prefer AND/OR for inserting all zeros or all ones.  */
+      if ((UINTVAL (value) & mask) == 0
+	  || (UINTVAL (value) & mask) == mask)
+	FAIL;
+
+      /* Force the constant into a register, unless this is a 16-bit aligned
+	 16-bit wide insert, which is handled by insv_imm.  */
+      if (width != 16 || (pos % 16) != 0)
+	operands[3] = force_reg (<MODE>mode, value);
+    }
+  else if (!register_operand (value, <MODE>mode))
+    operands[3] = force_reg (<MODE>mode, value);
+})
+
+(define_insn "*insv_reg<mode>"
+  [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r")
+			  (match_operand 1 "const_int_operand" "n")
+			  (match_operand 2 "const_int_operand" "n"))
+	(match_operand:GPI 3 "register_operand" "r"))]
+  "!(UINTVAL (operands[1]) == 0
+     || (UINTVAL (operands[2]) + UINTVAL (operands[1])
+	 > GET_MODE_BITSIZE (<MODE>mode)))"
+  "bfi\\t%<w>0, %<w>3, %2, %1"
+  [(set_attr "v8type" "bfm")
+   (set_attr "mode" "<MODE>")]
+)
+
 (define_insn "*<optab><ALLX:mode>_shft_<GPI:mode>"
   [(set (match_operand:GPI 0 "register_operand" "=r")
	(ashift:GPI (ANY_EXTEND:GPI
diff --git a/gcc/testsuite/gcc.target/aarch64/insv_1.c b/gcc/testsuite/gcc.target/aarch64/insv_1.c
new file mode 100644
index 000..0977e15
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/insv_1.c
@@ -0,0 +1,84 @@
+/* { dg-do run } */
+/* { dg-options "-O2 --save-temps -fno-inline" } */
+
+extern void abort (void);
+
+typedef struct bitfield
+{
+  unsigned short eight: 8;
+  unsigned short four: 4;
+  unsigned short five: 5;
+  unsigned short seven: 7;
+  unsigned int sixteen: 16;
+} bitfield;
+
+bitfield
+bfi1 (bitfield a)
+{
+  /* { dg-final { scan-assembler "bfi\tx\[0-9\]+, x\[0-9\]+, 0, 8" } } */
+  a.eight = 3;
+  return a;
+}
+
+bitfield
+bfi2 (bitfield a)
+{
+  /* { dg-final { scan-assembler "bfi\tx\[0-9\]+, x\[0-9\]+, 16, 5" } } */
+  a.five = 7;
+  return a;
+}
+
+bitfield
+movk (bitfield a)
+{
+  /* { dg-final { scan-assembler "movk\tx\[0-9\]+, 0x1d6b, lsl 32" } } */
+  a.sixteen = 7531;
+  return a;
+}
+
+bitfield
+set1 (bitfield a)
+{
+  /* { dg-final { scan-assembler "orr\tx\[0-9\]+, x\[0-9\]+, 2031616" } } */
+  a.five = 0x1f;
+  return a;
+}
+
+bitfield
+set0 (bitfield a)
+{
+  /* { dg-final { scan-assembler "and\tx\[0-9\]+, x\[0-9\]+, -2031617" } } */
+  a.five = 0;
+  return a;
+}
+
+
+int
+main (int argc, char** argv)
+{
+  static bitfield a;
+  bitfield b = bfi1 (a);
+  bitfield c = bfi2 (b);
+  bitfield d = movk (c);
+
+  if (d.eight != 3)
+    abort ();
+
+  if (d.five != 7)
+    abort ();
+
+  if (d.sixteen != 7531)
+    abort ();
+
+  d = set1 (d);
+  if (d.five != 0x1f)
+    abort ();
+
+  d = set0 (d);
+  if (d.five != 0)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { cleanup-saved-temps } } */
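For readers unfamiliar with BFM, the following is a minimal C model of the bit-field insert that the pattern above emits. The function name and types are illustrative, not part of the patch: BFI Xd, Xn, #pos, #width replaces bits [pos, pos+width) of the destination with the low `width` bits of the source.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model of BFI Xd, Xn, #pos, #width: insert the low
   'width' bits of SRC into DST at bit POS, leaving all other bits of
   DST unchanged.  Assumes 0 < width and pos + width <= 64, mirroring
   the expander's FAIL checks.  */
static uint64_t
bfi_model (uint64_t dst, uint64_t src, unsigned pos, unsigned width)
{
  uint64_t field = (width < 64 ? ((uint64_t) 1 << width) - 1
                               : ~(uint64_t) 0);
  uint64_t mask = field << pos;               /* bits being replaced */
  return (dst & ~mask) | ((src << pos) & mask);
}
```

This matches the semantics the insv_1.c testcase exercises: for example, `a.eight = 3` on an all-ones struct only touches bits 0..7.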
[PATCH, AArch64] Allow insv_imm to handle bigger immediates via masking to 16-bits
The MOVK instruction is currently not used when operand 2 is more than 16 bits, which leads to sub-optimal code. This patch improves those situations by removing the check and instead masking down to 16 bits within the new X format specifier I added recently. OK for trunk? Cheers, Ian

2013-05-17  Ian Bolton  ian.bol...@arm.com

	* config/aarch64/aarch64.c (aarch64_print_operand): Change the X
	format specifier to only display bottom 16 bits.
	* config/aarch64/aarch64.md (insv_imm<mode>): Allow any-sized
	immediate to match for operand 2, since it will be masked.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b57416c..1bdfd85 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3424,13 +3424,13 @@ aarch64_print_operand (FILE *f, rtx x, char code)
       break;
 
     case 'X':
-      /* Print integer constant in hex.  */
+      /* Print bottom 16 bits of integer constant in hex.  */
       if (GET_CODE (x) != CONST_INT)
	{
	  output_operand_lossage ("invalid operand for '%%%c'", code);
	  return;
	}
-      asm_fprintf (f, "0x%wx", UINTVAL (x));
+      asm_fprintf (f, "0x%wx", UINTVAL (x) & 0xffff);
       break;
 
     case 'w':
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index b27bcda..403d717 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -858,9 +858,8 @@
			  (const_int 16)
			  (match_operand:GPI 1 "const_int_operand" "n"))
	(match_operand:GPI 2 "const_int_operand" "n"))]
-  "INTVAL (operands[1]) < GET_MODE_BITSIZE (<MODE>mode)
-   && INTVAL (operands[1]) % 16 == 0
-   && UINTVAL (operands[2]) <= 0xffff"
+  "UINTVAL (operands[1]) < GET_MODE_BITSIZE (<MODE>mode)
+   && UINTVAL (operands[1]) % 16 == 0"
   "movk\\t%w0, %X2, lsl %1"
   [(set_attr "v8type" "movk")
    (set_attr "mode" "<MODE>")]
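The effect of the masking can be sketched in C. This is an illustrative model, not compiler code: the `%X` specifier now prints only the bottom 16 bits of the constant, and MOVK itself only ever replaces one 16-bit chunk of the register.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model of MOVK Xd, #imm16, LSL #lsl after the patch:
   however wide the original constant is, only its low 16 bits reach
   the instruction, which then overwrites one 16-bit chunk of REG.
   'lsl' is 0, 16, 32 or 48 on real hardware.  */
static uint64_t
movk_model (uint64_t reg, uint64_t imm, unsigned lsl)
{
  uint64_t imm16 = imm & 0xffff;             /* the new masking in %X */
  uint64_t mask = (uint64_t) 0xffff << lsl;
  return (reg & ~mask) | (imm16 << lsl);
}
```

Under this model, accepting an over-wide operand 2 is safe: the extra high bits simply never make it into the emitted immediate.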
[PATCH, AArch64] Support BFI instruction and insv standard pattern
Hi,

This patch implements the BFI variant of BFM. In doing so, it also implements the insv standard pattern. I've regression tested on bare-metal and linux. It comes complete with its own compilation and execution testcase. OK for trunk? Cheers, Ian

2013-05-08  Ian Bolton  ian.bol...@arm.com

gcc/
	* config/aarch64/aarch64.md (insv<mode>): New define_expand.
	(*insv_reg<mode>): New define_insn.
testsuite/
	* gcc.target/aarch64/bfm_1.c: New test.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 330f78c..b730ed0 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3118,6 +3118,53 @@
    (set_attr "mode" "<MODE>")]
 )
 
+;; Bitfield Insert (insv)
+(define_expand "insv<mode>"
+  [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand")
+			  (match_operand 1 "const_int_operand")
+			  (match_operand 2 "const_int_operand"))
+	(match_operand:GPI 3 "general_operand"))]
+  ""
+{
+  HOST_WIDE_INT mask = ((HOST_WIDE_INT)1 << INTVAL (operands[1])) - 1;
+
+  if (GET_MODE_BITSIZE (<MODE>mode) > BITS_PER_WORD
+      || INTVAL (operands[1]) < 1
+      || INTVAL (operands[1]) >= GET_MODE_BITSIZE (<MODE>mode)
+      || INTVAL (operands[2]) < 0
+      || (INTVAL (operands[2]) + INTVAL (operands[1]))
+	 > GET_MODE_BITSIZE (<MODE>mode))
+    FAIL;
+
+  /* Prefer AND/OR for inserting all zeros or all ones.  */
+  if (CONST_INT_P (operands[3])
+      && ((INTVAL (operands[3]) & mask) == 0
+	  || (INTVAL (operands[3]) & mask) == mask))
+    FAIL;
+
+  if (!register_operand (operands[3], <MODE>mode))
+    operands[3] = force_reg (<MODE>mode, operands[3]);
+
+  /* Intentional fall-through, which will lead to below pattern
+     being matched.  */
+})
+
+(define_insn "*insv_reg<mode>"
+  [(set (zero_extract:GPI (match_operand:GPI 0 "register_operand" "+r")
+			  (match_operand 1 "const_int_operand" "n")
+			  (match_operand 2 "const_int_operand" "n"))
+	(match_operand:GPI 3 "register_operand" "r"))]
+  "!(GET_MODE_BITSIZE (<MODE>mode) > BITS_PER_WORD
+     || INTVAL (operands[1]) < 1
+     || INTVAL (operands[1]) >= GET_MODE_BITSIZE (<MODE>mode)
+     || INTVAL (operands[2]) < 0
+     || (INTVAL (operands[2]) + INTVAL (operands[1]))
+	> GET_MODE_BITSIZE (<MODE>mode))"
+  "bfi\\t%<w>0, %<w>3, %2, %1"
+  [(set_attr "v8type" "bfm")
+   (set_attr "mode" "<MODE>")]
+)
+
 (define_insn "*<optab><ALLX:mode>_shft_<GPI:mode>"
   [(set (match_operand:GPI 0 "register_operand" "=r")
	(ashift:GPI (ANY_EXTEND:GPI
diff --git a/gcc/testsuite/gcc.target/aarch64/bfm_1.c b/gcc/testsuite/gcc.target/aarch64/bfm_1.c
new file mode 100644
index 000..d9a73a1
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/bfm_1.c
@@ -0,0 +1,46 @@
+/* { dg-do run } */
+/* { dg-options "-O2 --save-temps -fno-inline" } */
+
+extern void abort (void);
+
+typedef struct bitfield
+{
+  unsigned short eight: 8;
+  unsigned short four: 4;
+  unsigned short five: 5;
+  unsigned short seven: 7;
+} bitfield;
+
+bitfield
+bfi1 (bitfield a)
+{
+  /* { dg-final { scan-assembler "bfi\tw\[0-9\]+, w\[0-9\]+, 0, 8" } } */
+  a.eight = 3;
+  return a;
+}
+
+bitfield
+bfi2 (bitfield a)
+{
+  /* { dg-final { scan-assembler "bfi\tw\[0-9\]+, w\[0-9\]+, 16, 5" } } */
+  a.five = 7;
+  return a;
+}
+
+int
+main (int argc, char** argv)
+{
+  bitfield a;
+  bitfield b = bfi1 (a);
+  bitfield c = bfi2 (b);
+
+  if (c.eight != 3)
+    abort ();
+
+  if (c.five != 7)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { cleanup-saved-temps } } */
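The "prefer AND/OR" FAIL in the expander can be pictured with a small predicate. This is a sketch of the screening policy, not compiler code, and it assumes a field narrower than 64 bits:

```c
#include <assert.h>
#include <stdint.h>

/* Sketch of the expander's screening of constant inserts: a field of
   all zeros is better done with AND (clearing bits) and a field of
   all ones with ORR (setting bits), so the expander FAILs for those
   and keeps BFI for everything else.  Assumes width < 64.  */
static int
bfi_preferred (uint64_t value, unsigned width)
{
  uint64_t mask = ((uint64_t) 1 << width) - 1;
  uint64_t field = value & mask;
  return field != 0 && field != mask;
}
```

This is exactly the split the later insv_1.c testcase checks: `a.five = 0` must become AND, `a.five = 0x1f` must become ORR, and only mixed values like 7 go through BFI.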
[PATCH, AArch64] Testcases for TST instruction
I previously fixed a bug with the patterns that generate TST. I added these testcases to make our regression testing more solid. They've been running on our internal branch for about a month. OK to commit to trunk? Cheers, Ian 2013-05-02 Ian Bolton ian.bol...@arm.com * gcc.target/aarch64/tst_1.c: New test. * gcc.target/aarch64/tst_2.c: LikewiseIndex: gcc/testsuite/gcc.target/aarch64/tst_1.c === --- gcc/testsuite/gcc.target/aarch64/tst_1.c(revision 0) +++ gcc/testsuite/gcc.target/aarch64/tst_1.c(revision 0) @@ -0,0 +1,150 @@ +/* { dg-do run } */ +/* { dg-options -O2 --save-temps -fno-inline } */ + +extern void abort (void); + +int +tst_si_test1 (int a, int b, int c) +{ + int d = a b; + + /* { dg-final { scan-assembler-times tst\tw\[0-9\]+, w\[0-9\]+ 2 } } */ + if (d == 0) +return 12; + else +return 18; +} + +int +tst_si_test2 (int a, int b, int c) +{ + int d = a 0x; + + /* { dg-final { scan-assembler tst\tw\[0-9\]+, -1717986919 } } */ + if (d == 0) +return 12; + else +return 18; +} + +int +tst_si_test3 (int a, int b, int c) +{ + int d = a (b 3); + + /* { dg-final { scan-assembler tst\tw\[0-9\]+, w\[0-9\]+, lsl 3 } } */ + if (d == 0) +return 12; + else +return 18; +} + +typedef long long s64; + +s64 +tst_di_test1 (s64 a, s64 b, s64 c) +{ + s64 d = a b; + + /* { dg-final { scan-assembler-times tst\tx\[0-9\]+, x\[0-9\]+ 2 } } */ + if (d == 0) +return 12; + else +return 18; +} + +s64 +tst_di_test2 (s64 a, s64 b, s64 c) +{ + s64 d = a 0xll; + + /* { dg-final { scan-assembler tst\tx\[0-9\]+, -6148914691236517206 } } */ + if (d == 0) +return 12; + else +return 18; +} + +s64 +tst_di_test3 (s64 a, s64 b, s64 c) +{ + s64 d = a (b 3); + + /* { dg-final { scan-assembler tst\tx\[0-9\]+, x\[0-9\]+, lsl 3 } } */ + if (d == 0) +return 12; + else +return 18; +} + +int +main () +{ + int x; + s64 y; + + x = tst_si_test1 (29, 4, 5); + if (x != 18) +abort (); + + x = tst_si_test1 (5, 2, 20); + if (x != 12) +abort (); + + x = tst_si_test2 (29, 4, 5); + if (x != 18) +abort (); + + x = 
tst_si_test2 (1024, 2, 20); + if (x != 12) +abort (); + + x = tst_si_test3 (35, 4, 5); + if (x != 18) +abort (); + + x = tst_si_test3 (5, 2, 20); + if (x != 12) +abort (); + + y = tst_di_test1 (0x13029ll, + 0x32004ll, + 0x505050505ll); + + if (y != 18) +abort (); + + y = tst_di_test1 (0x5000500050005ll, + 0x2111211121112ll, + 0x02020ll); + if (y != 12) +abort (); + + y = tst_di_test2 (0x13029ll, + 0x32004ll, + 0x505050505ll); + if (y != 18) +abort (); + + y = tst_di_test2 (0x540004100ll, + 0x32004ll, + 0x805050205ll); + if (y != 12) +abort (); + + y = tst_di_test3 (0x13029ll, + 0x06408ll, + 0x505050505ll); + if (y != 18) +abort (); + + y = tst_di_test3 (0x130002900ll, + 0x08808ll, + 0x505050505ll); + if (y != 12) +abort (); + + return 0; +} + +/* { dg-final { cleanup-saved-temps } } */ Index: gcc/testsuite/gcc.target/aarch64/tst_2.c === --- gcc/testsuite/gcc.target/aarch64/tst_2.c(revision 0) +++ gcc/testsuite/gcc.target/aarch64/tst_2.c(revision 0) @@ -0,0 +1,156 @@ +/* { dg-do run } */ +/* { dg-options -O2 --save-temps -fno-inline } */ + +extern void abort (void); + +int +tst_si_test1 (int a, int b, int c) +{ + int d = a b; + + /* { dg-final { scan-assembler-not tst\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+ } } */ + /* { dg-final { scan-assembler-times and\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+ 2 } } */ + if (d = 0) +return 12; + else +return 18; +} + +int +tst_si_test2 (int a, int b, int c) +{ + int d = a 0x; + + /* { dg-final { scan-assembler-not tst\tw\[0-9\]+, w\[0-9\]+, -1717986919 } } */ + /* { dg-final { scan-assembler and\tw\[0-9\]+, w\[0-9\]+, -1717986919 } } */ + if (d = 0) +return 12; + else +return 18; +} + +int +tst_si_test3 (int a, int b, int c) +{ + int d = a (b 3); + + /* { dg-final { scan-assembler-not tst\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, lsl 3 } } */ + /* { dg-final { scan-assembler and\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, lsl 3 } } */ + if (d = 0) +return 12; + else +return 18; +} + +typedef long long s64; + +s64 +tst_di_test1 (s64 a, s64 b, s64 c) +{ + s64 d = 
a b; + + /* { dg-final { scan-assembler-not tst\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+ } } */ + /* { dg-final { scan-assembler-times and\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+ 2 } } */ + if (d = 0) +return 12; + else +return 18; +} + +s64 +tst_di_test2 (s64 a, s64 b, s64 c) +{ + s64 d = a 0xll
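As background for these tests: TST is an alias of ANDS with the zero register as destination, so the AND result is discarded and only the condition flags survive. A minimal C model of the flag the tests branch on (illustrative, not from the patch):

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model of TST Xn, Xm (alias of ANDS XZR, Xn, Xm): the
   AND result is thrown away and only the flags remain.  The testcases
   above branch on d == 0, i.e. on the Z flag this computes.  */
static int
tst_sets_z (uint64_t a, uint64_t b)
{
  return (a & b) == 0;
}
```

The `d <= 0` variants in tst_2.c are the cases where the Z flag alone is not enough, so a plain AND plus a separate compare must be kept instead.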
RE: [PATCH, AArch64] Testcases for ANDS instruction
From: Richard Earnshaw This rule + /* { dg-final { scan-assembler and\tw\[0-9\]+, w\[0-9\]+, w\[0- 9\]+ } } */ Will match anything that this rule + /* { dg-final { scan-assembler and\tw\[0-9\]+, w\[0-9\]+, w\[0- 9\]+, lsl 3 } } */ matches (though not vice versa). Similarly for the x register variants. Thanks for the review. I've fixed this up in the attached patch, by counting the number of matches for the first rule and expecting it to match additional times to cover the overlap with the lsl based rule. I've also renamed the testcases in line with the suggested GCC testcase naming convention. OK for commit? Cheers, Ian 2013-05-01 Ian Bolton ian.bol...@arm.com * gcc.target/aarch64/ands_1.c: New test. * gcc.target/aarch64/ands_2.c: LikewiseIndex: gcc/testsuite/gcc.target/aarch64/ands_1.c === --- gcc/testsuite/gcc.target/aarch64/ands_1.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/ands_1.c (revision 0) @@ -0,0 +1,151 @@ +/* { dg-do run } */ +/* { dg-options -O2 --save-temps -fno-inline } */ + +extern void abort (void); + +int +ands_si_test1 (int a, int b, int c) +{ + int d = a b; + + /* { dg-final { scan-assembler-times ands\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+ 2 } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +int +ands_si_test2 (int a, int b, int c) +{ + int d = a 0xff; + + /* { dg-final { scan-assembler ands\tw\[0-9\]+, w\[0-9\]+, 255 } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +int +ands_si_test3 (int a, int b, int c) +{ + int d = a (b 3); + + /* { dg-final { scan-assembler ands\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, lsl 3 } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +typedef long long s64; + +s64 +ands_di_test1 (s64 a, s64 b, s64 c) +{ + s64 d = a b; + + /* { dg-final { scan-assembler-times ands\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+ 2 } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +s64 +ands_di_test2 (s64 a, s64 b, s64 c) +{ + s64 d = a 0xff; + + /* { dg-final { scan-assembler 
ands\tx\[0-9\]+, x\[0-9\]+, 255 } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +s64 +ands_di_test3 (s64 a, s64 b, s64 c) +{ + s64 d = a (b 3); + + /* { dg-final { scan-assembler ands\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+, lsl 3 } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +int +main () +{ + int x; + s64 y; + + x = ands_si_test1 (29, 4, 5); + if (x != 13) +abort (); + + x = ands_si_test1 (5, 2, 20); + if (x != 25) +abort (); + + x = ands_si_test2 (29, 4, 5); + if (x != 38) +abort (); + + x = ands_si_test2 (1024, 2, 20); + if (x != 1044) +abort (); + + x = ands_si_test3 (35, 4, 5); + if (x != 41) +abort (); + + x = ands_si_test3 (5, 2, 20); + if (x != 25) +abort (); + + y = ands_di_test1 (0x13029ll, + 0x32004ll, + 0x505050505ll); + + if (y != ((0x13029ll 0x32004ll) + 0x32004ll + 0x505050505ll)) +abort (); + + y = ands_di_test1 (0x5000500050005ll, + 0x2111211121112ll, + 0x02020ll); + if (y != 0x5000500052025ll) +abort (); + + y = ands_di_test2 (0x13029ll, + 0x32004ll, + 0x505050505ll); + if (y != ((0x13029ll 0xff) + 0x32004ll + 0x505050505ll)) +abort (); + + y = ands_di_test2 (0x130002900ll, + 0x32004ll, + 0x505050505ll); + if (y != (0x130002900ll + 0x505050505ll)) +abort (); + + y = ands_di_test3 (0x13029ll, + 0x06408ll, + 0x505050505ll); + if (y != ((0x13029ll (0x06408ll 3)) + + 0x06408ll + 0x505050505ll)) +abort (); + + y = ands_di_test3 (0x130002900ll, + 0x08808ll, + 0x505050505ll); + if (y != (0x130002900ll + 0x505050505ll)) +abort (); + + return 0; +} + +/* { dg-final { cleanup-saved-temps } } */ Index: gcc/testsuite/gcc.target/aarch64/ands_2.c === --- gcc/testsuite/gcc.target/aarch64/ands_2.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/ands_2.c (revision 0) @@ -0,0 +1,157 @@ +/* { dg-do run } */ +/* { dg-options -O2 --save-temps -fno-inline } */ + +extern void abort (void); + +int +ands_si_test1 (int a, int b, int c) +{ + int d = a b; + + /* { dg-final { scan-assembler-not ands\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+ } } */ + 
/* { dg-final { scan-assembler-times and\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+ 2 } } */ + if (d = 0) +return a + c; + else +return b + d + c; +} + +int +ands_si_test2 (int a, int b, int c) +{ + int d = a 0x; + + /* { dg-final { scan-assembler-not ands\tw\[0-9\]+, w\[0-9\]+, -1717986919 } } */ + /* { dg-final { scan
RE: [PATCH, AArch64] Support BICS instruction in the backend
From: Marcus Shawcroft [mailto:marcus.shawcr...@gmail.com]

+ /* { dg-final { scan-assembler "bics\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+" } } */
+ /* { dg-final { scan-assembler "bics\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+, lsl 3" } } */

Ian, These two patterns have the same issue Richard just highlighted on your other patch, ie the first pattern will also match anything matched by the second pattern. /Marcus

I've fixed the rules in the testcases and renamed the files to match naming conventions in the latest patch (attached). OK to commit? Cheers, Ian

2013-05-01  Ian Bolton  ian.bol...@arm.com

gcc/
	* config/aarch64/aarch64.md (*and_one_cmpl<mode>3_compare0): New pattern.
	(*and_one_cmplsi3_compare0_uxtw): Likewise.
	(*and_one_cmpl_<SHIFT:optab><mode>3_compare0): Likewise.
	(*and_one_cmpl_<SHIFT:optab>si3_compare0_uxtw): Likewise.
testsuite/
	* gcc.target/aarch64/bics_1.c: New test.
	* gcc.target/aarch64/bics_2.c: Likewise.
[PATCH, AArch64] Fix for LDR/STR to/from S and D registers
This is a fix for this patch: http://gcc.gnu.org/ml/gcc-patches/2013-04/msg01621.html

If someone compiles with -mgeneral-regs-only then those instructions shouldn't be used. We can enforce that by adding the fp attribute to the relevant alternatives in the patterns. Regression tests all good. OK for trunk? Cheers, Ian

2013-05-01  Ian Bolton  ian.bol...@arm.com

	* config/aarch64/aarch64.md (movsi_aarch64): Only allow to/from
	S reg when fp attribute set.
	(movdi_aarch64): Only allow to/from D reg when fp attribute set.

Index: gcc/config/aarch64/aarch64.md
===
--- gcc/config/aarch64/aarch64.md	(revision 198456)
+++ gcc/config/aarch64/aarch64.md	(working copy)
@@ -825,7 +825,7 @@ (define_insn "*movsi_aarch64"
    fmov\\t%s0, %s1"
   [(set_attr "v8type" "move,alu,load1,load1,store1,store1,fmov,fmov,fmov")
    (set_attr "mode" "SI")
-   (set_attr "fp" "*,*,*,*,*,*,yes,yes,yes")]
+   (set_attr "fp" "*,*,*,yes,*,yes,yes,yes,yes")]
 )
 
 (define_insn "*movdi_aarch64"
@@ -850,7 +850,7 @@ (define_insn "*movdi_aarch64"
    movi\\t%d0, %1"
   [(set_attr "v8type" "move,move,move,alu,load1,load1,store1,store1,adr,adr,fmov,fmov,fmov,fmov")
    (set_attr "mode" "DI")
-   (set_attr "fp" "*,*,*,*,*,*,*,*,*,*,yes,yes,yes,yes")]
+   (set_attr "fp" "*,*,*,*,*,yes,*,yes,*,*,yes,yes,yes,yes")]
 )
 
 (define_insn "insv_imm<mode>"
RE: [PATCH, AArch64] Support BICS instruction in the backend
Can we have the patch attached ? OK Index: gcc/testsuite/gcc.target/aarch64/bics_1.c === --- gcc/testsuite/gcc.target/aarch64/bics_1.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/bics_1.c (revision 0) @@ -0,0 +1,107 @@ +/* { dg-do run } */ +/* { dg-options -O2 --save-temps -fno-inline } */ + +extern void abort (void); + +int +bics_si_test1 (int a, int b, int c) +{ + int d = a ~b; + + /* { dg-final { scan-assembler-times bics\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+ 2 } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +int +bics_si_test2 (int a, int b, int c) +{ + int d = a ~(b 3); + + /* { dg-final { scan-assembler bics\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, lsl 3 } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +typedef long long s64; + +s64 +bics_di_test1 (s64 a, s64 b, s64 c) +{ + s64 d = a ~b; + + /* { dg-final { scan-assembler-times bics\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+ 2 } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +s64 +bics_di_test2 (s64 a, s64 b, s64 c) +{ + s64 d = a ~(b 3); + + /* { dg-final { scan-assembler bics\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+, lsl 3 } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +int +main () +{ + int x; + s64 y; + + x = bics_si_test1 (29, ~4, 5); + if (x != ((29 4) + ~4 + 5)) +abort (); + + x = bics_si_test1 (5, ~2, 20); + if (x != 25) +abort (); + + x = bics_si_test2 (35, ~4, 5); + if (x != ((35 ~(~4 3)) + ~4 + 5)) +abort (); + + x = bics_si_test2 (96, ~2, 20); + if (x != 116) +abort (); + + y = bics_di_test1 (0x13029ll, + ~0x32004ll, + 0x505050505ll); + + if (y != ((0x13029ll 0x32004ll) + ~0x32004ll + 0x505050505ll)) +abort (); + + y = bics_di_test1 (0x5000500050005ll, + ~0x2111211121112ll, + 0x02020ll); + if (y != 0x5000500052025ll) +abort (); + + y = bics_di_test2 (0x13029ll, + ~0x06408ll, + 0x505050505ll); + if (y != ((0x13029ll ~(~0x06408ll 3)) + + ~0x06408ll + 0x505050505ll)) +abort (); + + y = bics_di_test2 (0x130002900ll, + ~0x08808ll, + 0x505050505ll); + 
if (y != (0x130002900ll + 0x505050505ll)) +abort (); + + return 0; +} + +/* { dg-final { cleanup-saved-temps } } */ Index: gcc/testsuite/gcc.target/aarch64/bics_2.c === --- gcc/testsuite/gcc.target/aarch64/bics_2.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/bics_2.c (revision 0) @@ -0,0 +1,111 @@ +/* { dg-do run } */ +/* { dg-options -O2 --save-temps -fno-inline } */ + +extern void abort (void); + +int +bics_si_test1 (int a, int b, int c) +{ + int d = a ~b; + + /* { dg-final { scan-assembler-not bics\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+ } } */ + /* { dg-final { scan-assembler-times bic\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+ 2 } } */ + if (d = 0) +return a + c; + else +return b + d + c; +} + +int +bics_si_test2 (int a, int b, int c) +{ + int d = a ~(b 3); + + /* { dg-final { scan-assembler-not bics\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, lsl 3 } } */ + /* { dg-final { scan-assembler bic\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, lsl 3 } } */ + if (d = 0) +return a + c; + else +return b + d + c; +} + +typedef long long s64; + +s64 +bics_di_test1 (s64 a, s64 b, s64 c) +{ + s64 d = a ~b; + + /* { dg-final { scan-assembler-not bics\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+ } } */ + /* { dg-final { scan-assembler-times bic\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+ 2 } } */ + if (d = 0) +return a + c; + else +return b + d + c; +} + +s64 +bics_di_test2 (s64 a, s64 b, s64 c) +{ + s64 d = a ~(b 3); + + /* { dg-final { scan-assembler-not bics\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+, lsl 3 } } */ + /* { dg-final { scan-assembler bic\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+, lsl 3 } } */ + if (d = 0) +return a + c; + else +return b + d + c; +} + +int +main () +{ + int x; + s64 y; + + x = bics_si_test1 (29, ~4, 5); + if (x != ((29 4) + ~4 + 5)) +abort (); + + x = bics_si_test1 (5, ~2, 20); + if (x != 25) +abort (); + + x = bics_si_test2 (35, ~4, 5); + if (x != ((35 ~(~4 3)) + ~4 + 5)) +abort (); + + x = bics_si_test2 (96, ~2, 20); + if (x != 116) +abort (); + + y = bics_di_test1 (0x13029ll, + ~0x32004ll, + 0x505050505ll); + + if 
(y != ((0x13029ll 0x32004ll) + ~0x32004ll + 0x505050505ll)) +abort (); + + y = bics_di_test1 (0x5000500050005ll, + ~0x2111211121112ll, + 0x02020ll); + if (y != 0x5000500052025ll) +abort (); + + y = bics_di_test2 (0x13029ll, + ~0x06408ll, + 0x505050505ll); +
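The operation these testcases exercise can be modelled in a few lines of C. This is an illustrative sketch of the instruction's data path, not part of the patch:

```c
#include <assert.h>

/* Illustrative model of BICS Xd, Xn, Xm: AND the first source with
   the complement of the second, setting the condition flags from the
   result.  This is the 'd = a & ~b' shape the new patterns match, so
   the following compare against zero can be folded into the BICS.  */
static long long
bics_model (long long a, long long b)
{
  return a & ~b;
}
```

For example, `bics_model (29, ~4LL)` is `29 & 4`, mirroring the `bics_si_test1 (29, ~4, 5)` call in the testcase's main.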
[PATCH, AArch64] Testcases for ANDS instruction
I made some testcases to go with my implementation of ANDS in the backend, but Naveen Hurugalawadi got the ANDS patterns in before me! I'm now just left with the testcases, but they are still worth adding, so here they are. Tests are working correctly as of current trunk. OK to commit? Cheers, Ian 2013-04-26 Ian Bolton ian.bol...@arm.com * gcc.target/aarch64/ands.c: New test. * gcc.target/aarch64/ands2.c: LikewiseIndex: gcc/testsuite/gcc.target/aarch64/ands2.c === --- gcc/testsuite/gcc.target/aarch64/ands2.c(revision 0) +++ gcc/testsuite/gcc.target/aarch64/ands2.c(revision 0) @@ -0,0 +1,157 @@ +/* { dg-do run } */ +/* { dg-options -O2 --save-temps -fno-inline } */ + +extern void abort (void); + +int +ands_si_test1 (int a, int b, int c) +{ + int d = a b; + + /* { dg-final { scan-assembler-not ands\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+ } } */ + /* { dg-final { scan-assembler and\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+ } } */ + if (d = 0) +return a + c; + else +return b + d + c; +} + +int +ands_si_test2 (int a, int b, int c) +{ + int d = a 0x; + + /* { dg-final { scan-assembler-not ands\tw\[0-9\]+, w\[0-9\]+, -1717986919 } } */ + /* { dg-final { scan-assembler and\tw\[0-9\]+, w\[0-9\]+, -1717986919 } } */ + if (d = 0) +return a + c; + else +return b + d + c; +} + +int +ands_si_test3 (int a, int b, int c) +{ + int d = a (b 3); + + /* { dg-final { scan-assembler-not ands\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, lsl 3 } } */ + /* { dg-final { scan-assembler and\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, lsl 3 } } */ + if (d = 0) +return a + c; + else +return b + d + c; +} + +typedef long long s64; + +s64 +ands_di_test1 (s64 a, s64 b, s64 c) +{ + s64 d = a b; + + /* { dg-final { scan-assembler-not ands\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+ } } */ + /* { dg-final { scan-assembler and\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+ } } */ + if (d = 0) +return a + c; + else +return b + d + c; +} + +s64 +ands_di_test2 (s64 a, s64 b, s64 c) +{ + s64 d = a 0xll; + + /* { dg-final { scan-assembler-not ands\tx\[0-9\]+, 
x\[0-9\]+, -6148914691236517206 } } */ + /* { dg-final { scan-assembler and\tx\[0-9\]+, x\[0-9\]+, -6148914691236517206 } } */ + if (d = 0) +return a + c; + else +return b + d + c; +} + +s64 +ands_di_test3 (s64 a, s64 b, s64 c) +{ + s64 d = a (b 3); + + /* { dg-final { scan-assembler-not ands\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+, lsl 3 } } */ + /* { dg-final { scan-assembler and\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+, lsl 3 } } */ + if (d = 0) +return a + c; + else +return b + d + c; +} + +int +main () +{ + int x; + s64 y; + + x = ands_si_test1 (29, 4, 5); + if (x != 13) +abort (); + + x = ands_si_test1 (5, 2, 20); + if (x != 25) +abort (); + + x = ands_si_test2 (29, 4, 5); + if (x != 34) +abort (); + + x = ands_si_test2 (1024, 2, 20); + if (x != 1044) +abort (); + + x = ands_si_test3 (35, 4, 5); + if (x != 41) +abort (); + + x = ands_si_test3 (5, 2, 20); + if (x != 25) +abort (); + + y = ands_di_test1 (0x13029ll, + 0x32004ll, + 0x505050505ll); + + if (y != ((0x13029ll 0x32004ll) + 0x32004ll + 0x505050505ll)) +abort (); + + y = ands_di_test1 (0x5000500050005ll, + 0x2111211121112ll, + 0x02020ll); + if (y != 0x5000500052025ll) +abort (); + + y = ands_di_test2 (0x13029ll, + 0x32004ll, + 0x505050505ll); + if (y != ((0x13029ll 0xll) + 0x32004ll + 0x505050505ll)) +abort (); + + y = ands_di_test2 (0x540004100ll, + 0x32004ll, + 0x805050205ll); + if (y != (0x540004100ll + 0x805050205ll)) +abort (); + + y = ands_di_test3 (0x13029ll, + 0x06408ll, + 0x505050505ll); + if (y != ((0x13029ll (0x06408ll 3)) + + 0x06408ll + 0x505050505ll)) +abort (); + + y = ands_di_test3 (0x130002900ll, + 0x08808ll, + 0x505050505ll); + if (y != (0x130002900ll + 0x505050505ll)) +abort (); + + return 0; +} + +/* { dg-final { cleanup-saved-temps } } */ Index: gcc/testsuite/gcc.target/aarch64/ands.c === --- gcc/testsuite/gcc.target/aarch64/ands.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/ands.c (revision 0) @@ -0,0 +1,151 @@ +/* { dg-do run } */ +/* { dg-options -O2 --save-temps -fno-inline } */ + 
+extern void abort (void); + +int +ands_si_test1 (int a, int b, int c) +{ + int d = a b; + + /* { dg-final { scan-assembler ands\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+ } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +int +ands_si_test2 (int a, int b, int c) +{ + int d = a 0xff
[PATCH, AArch64] Support BICS instruction in the backend
With these patterns, we can now generate BICS in the appropriate places. I've included test cases. This has been run on linux and bare-metal regression tests. OK to commit? Cheers, Ian 2013-04-26 Ian Bolton ian.bol...@arm.com gcc/ * config/aarch64/aarch64.md (*and_one_cmplmode3_compare0): New pattern. (*and_one_cmplsi3_compare0_uxtw): Likewise. (*and_one_cmpl_SHIFT:optabmode3_compare0): Likewise. (*and_one_cmpl_SHIFT:optabsi3_compare0_uxtw): Likewise. testsuite/ * gcc.target/aarch64/bics.c: New test. * gcc.target/aarch64/bics2.c: Likewise.Index: gcc/testsuite/gcc.target/aarch64/bics.c === --- gcc/testsuite/gcc.target/aarch64/bics.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/bics.c (revision 0) @@ -0,0 +1,107 @@ +/* { dg-do run } */ +/* { dg-options -O2 --save-temps -fno-inline } */ + +extern void abort (void); + +int +bics_si_test1 (int a, int b, int c) +{ + int d = a ~b; + + /* { dg-final { scan-assembler bics\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+ } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +int +bics_si_test2 (int a, int b, int c) +{ + int d = a ~(b 3); + + /* { dg-final { scan-assembler bics\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, lsl 3 } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +typedef long long s64; + +s64 +bics_di_test1 (s64 a, s64 b, s64 c) +{ + s64 d = a ~b; + + /* { dg-final { scan-assembler bics\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+ } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +s64 +bics_di_test2 (s64 a, s64 b, s64 c) +{ + s64 d = a ~(b 3); + + /* { dg-final { scan-assembler bics\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+, lsl 3 } } */ + if (d == 0) +return a + c; + else +return b + d + c; +} + +int +main () +{ + int x; + s64 y; + + x = bics_si_test1 (29, ~4, 5); + if (x != ((29 4) + ~4 + 5)) +abort (); + + x = bics_si_test1 (5, ~2, 20); + if (x != 25) +abort (); + + x = bics_si_test2 (35, ~4, 5); + if (x != ((35 ~(~4 3)) + ~4 + 5)) +abort (); + + x = bics_si_test2 (96, ~2, 20); + if (x != 116) +abort (); 
+ + y = bics_di_test1 (0x13029ll, + ~0x32004ll, + 0x505050505ll); + + if (y != ((0x13029ll 0x32004ll) + ~0x32004ll + 0x505050505ll)) +abort (); + + y = bics_di_test1 (0x5000500050005ll, + ~0x2111211121112ll, + 0x02020ll); + if (y != 0x5000500052025ll) +abort (); + + y = bics_di_test2 (0x13029ll, + ~0x06408ll, + 0x505050505ll); + if (y != ((0x13029ll ~(~0x06408ll 3)) + + ~0x06408ll + 0x505050505ll)) +abort (); + + y = bics_di_test2 (0x130002900ll, + ~0x08808ll, + 0x505050505ll); + if (y != (0x130002900ll + 0x505050505ll)) +abort (); + + return 0; +} + +/* { dg-final { cleanup-saved-temps } } */ Index: gcc/testsuite/gcc.target/aarch64/bics2.c === --- gcc/testsuite/gcc.target/aarch64/bics2.c(revision 0) +++ gcc/testsuite/gcc.target/aarch64/bics2.c(revision 0) @@ -0,0 +1,111 @@ +/* { dg-do run } */ +/* { dg-options -O2 --save-temps -fno-inline } */ + +extern void abort (void); + +int +bics_si_test1 (int a, int b, int c) +{ + int d = a ~b; + + /* { dg-final { scan-assembler-not bics\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+ } } */ + /* { dg-final { scan-assembler bic\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+ } } */ + if (d = 0) +return a + c; + else +return b + d + c; +} + +int +bics_si_test2 (int a, int b, int c) +{ + int d = a ~(b 3); + + /* { dg-final { scan-assembler-not bics\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, lsl 3 } } */ + /* { dg-final { scan-assembler bic\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, lsl 3 } } */ + if (d = 0) +return a + c; + else +return b + d + c; +} + +typedef long long s64; + +s64 +bics_di_test1 (s64 a, s64 b, s64 c) +{ + s64 d = a ~b; + + /* { dg-final { scan-assembler-not bics\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+ } } */ + /* { dg-final { scan-assembler bic\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+ } } */ + if (d = 0) +return a + c; + else +return b + d + c; +} + +s64 +bics_di_test2 (s64 a, s64 b, s64 c) +{ + s64 d = a ~(b 3); + + /* { dg-final { scan-assembler-not bics\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+, lsl 3 } } */ + /* { dg-final { scan-assembler bic\tx\[0-9\]+, x\[0-9\]+, 
x\[0-9\]+, lsl 3 } } */ + if (d = 0) +return a + c; + else +return b + d + c; +} + +int +main () +{ + int x; + s64 y; + + x = bics_si_test1 (29, ~4, 5); + if (x != ((29 4) + ~4 + 5)) +abort (); + + x = bics_si_test1 (5, ~2, 20); + if (x != 25) +abort (); + + x = bics_si_test2 (35, ~4, 5); + if (x != ((35 ~(~4 3)) + ~4 + 5)) +abort (); + + x = bics_si_test2 (96, ~2, 20
[PATCH, AArch64] Support LDR/STR to/from S and D registers
This patch allows us to load to and store from the S and D registers, which helps with doing scalar operations in those registers. This has been regression tested on bare-metal and linux. OK for trunk? Cheers, Ian

2013-04-26 Ian Bolton ian.bol...@arm.com

* config/aarch64/aarch64.md (movsi_aarch64): Support LDR/STR from/to S register. (movdi_aarch64): Support LDR/STR from/to D register.

Index: gcc/config/aarch64/aarch64.md
===
--- gcc/config/aarch64/aarch64.md (revision 198231)
+++ gcc/config/aarch64/aarch64.md (working copy)
@@ -808,26 +808,28 @@ (define_expand "mov<mode>"
 )
 
 (define_insn "*movsi_aarch64"
-  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,r,r,m, *w, r,*w")
-	(match_operand:SI 1 "aarch64_mov_operand"  " r,M,m,rZ,rZ,*w,*w"))]
+  [(set (match_operand:SI 0 "nonimmediate_operand" "=r,r,r,*w,m, m,*w, r,*w")
+	(match_operand:SI 1 "aarch64_mov_operand"  " r,M,m, m,rZ,*w,rZ,*w,*w"))]
   "(register_operand (operands[0], SImode)
     || aarch64_reg_or_zero (operands[1], SImode))"
   "@
    mov\\t%w0, %w1
    mov\\t%w0, %1
    ldr\\t%w0, %1
+   ldr\\t%s0, %1
    str\\t%w1, %0
+   str\\t%s1, %0
    fmov\\t%s0, %w1
    fmov\\t%w0, %s1
    fmov\\t%s0, %s1"
-  [(set_attr "v8type" "move,alu,load1,store1,fmov,fmov,fmov")
+  [(set_attr "v8type" "move,alu,load1,load1,store1,store1,fmov,fmov,fmov")
    (set_attr "mode" "SI")
-   (set_attr "fp" "*,*,*,*,yes,yes,yes")]
+   (set_attr "fp" "*,*,*,*,*,*,yes,yes,yes")]
 )
 
 (define_insn "*movdi_aarch64"
-  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,m, r, r, *w, r,*w,w")
-	(match_operand:DI 1 "aarch64_mov_operand"  " r,r,k,N,m,rZ,Usa,Ush,rZ,*w,*w,Dd"))]
+  [(set (match_operand:DI 0 "nonimmediate_operand" "=r,k,r,r,r,*w,m, m,r, r, *w, r,*w,w")
+	(match_operand:DI 1 "aarch64_mov_operand"  " r,r,k,N,m, m,rZ,*w,Usa,Ush,rZ,*w,*w,Dd"))]
   "(register_operand (operands[0], DImode)
     || aarch64_reg_or_zero (operands[1], DImode))"
   "@
@@ -836,16 +838,18 @@ (define_insn "*movdi_aarch64"
    mov\\t%x0, %1
    mov\\t%x0, %1
    ldr\\t%x0, %1
+   ldr\\t%d0, %1
    str\\t%x1, %0
+   str\\t%d1, %0
    adr\\t%x0, %a1
    adrp\\t%x0, %A1
    fmov\\t%d0, %x1
    fmov\\t%x0, %d1
    fmov\\t%d0, %d1
    movi\\t%d0, %1"
-  [(set_attr "v8type" "move,move,move,alu,load1,store1,adr,adr,fmov,fmov,fmov,fmov")
+  [(set_attr "v8type" "move,move,move,alu,load1,load1,store1,store1,adr,adr,fmov,fmov,fmov,fmov")
    (set_attr "mode" "DI")
-   (set_attr "fp" "*,*,*,*,*,*,*,*,yes,yes,yes,yes")]
+   (set_attr "fp" "*,*,*,*,*,*,*,*,*,*,yes,yes,yes,yes")]
 )
 
 (define_insn "insv_imm<mode>"
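As a hedged illustration (mine, not part of the patch), this is the kind of scalar code the new alternatives help: a 32-bit bit pattern loaded straight into an S register avoids a separate general-to-FP `fmov` transfer. The function name and scenario are made up for the example.

```c
#include <stdint.h>
#include <string.h>

/* Illustrative sketch: reinterpret a stored 32-bit pattern as a float.
   With an "ldr s0, [x0]" alternative available, the load can target the
   FP register file directly instead of "ldr w1, [x0]" + "fmov s0, w1". */
float load_bits_as_float (const uint32_t *p)
{
  float f;
  memcpy (&f, p, sizeof f);   /* well-defined type pun in C */
  return f;
}
```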
RE: [PATCH, AArch64] Make MOVK output operand 2 in hex
Since this is a bug fix, I'll need to backport to 4.8. Is that OK? Cheers, Ian

OK /Marcus

On 20 March 2013 17:21, Ian Bolton ian.bol...@arm.com wrote: MOVK should not be generated with a negative immediate, which the assembler rightfully rejects. This patch makes MOVK output its 2nd operand in hex instead. Tested on bare-metal and linux. OK for trunk? Cheers, Ian

2013-03-20 Ian Bolton ian.bol...@arm.com

gcc/ * config/aarch64/aarch64.c (aarch64_print_operand): New format specifier for printing a constant in hex. * config/aarch64/aarch64.md (insv_imm<mode>): Use the X format specifier for printing second operand. testsuite/ * gcc.target/aarch64/movk.c: New test.
[PATCH, AArch64] Enable Redundant Extension Elimination by default at -O2 or higher
This patch enables the Redundant Extension Elimination (REE) pass for AArch64. Testing shows no regressions on linux and bare-metal. In terms of performance impact, it reduces code-size for some benchmarks and makes no difference on others. OK to commit to trunk? Cheers, Ian

2013-04-24 Ian Bolton ian.bol...@arm.com

* common/config/aarch64/aarch64-common.c: Enable REE pass at -O2 or higher by default.

Index: gcc/common/config/aarch64/aarch64-common.c
===
--- gcc/common/config/aarch64/aarch64-common.c (revision 198231)
+++ gcc/common/config/aarch64/aarch64-common.c (working copy)
@@ -44,6 +44,8 @@ static const struct default_options aarc
   {
     /* Enable section anchors by default at -O1 or higher.  */
     { OPT_LEVELS_1_PLUS, OPT_fsection_anchors, NULL, 1 },
+    /* Enable redundant extension instructions removal at -O2 and higher.  */
+    { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 },
     { OPT_LEVELS_NONE, 0, NULL, 0 }
   };
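A sketch (mine, not from the patch) of the redundancy REE removes: on AArch64, every write to a W register zeroes bits 63:32 of the full X register, so an explicit widening after a 32-bit operation is often dead.

```c
#include <stdint.h>

/* The 32-bit add ("add w0, w0, w1") already zeroes the upper half of
   the X register, so the cast below should not cost a separate "uxtw"
   instruction once the redundant extension is eliminated. */
uint64_t widen_sum (uint32_t a, uint32_t b)
{
  uint32_t s = a + b;     /* wraps modulo 2^32 */
  return (uint64_t) s;    /* candidate extension for REE to delete */
}
```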
[PATCH AArch64] Make omit-frame-pointer work correctly
Currently, if you compile with -fomit-frame-pointer, the frame record and frame pointer are still maintained (i.e. there is no way to get the behaviour you are asking for!). This patch fixes that. It also makes sure that if you ask for no frame pointers in leaf functions then they are not generated there unless LR gets clobbered in the leaf for some reason. (I have testcases here to check for that.) OK to commit to trunk? Cheers, Ian 2013-03-28 Ian Bolton ian.bol...@arm.com gcc/ * config/aarch64/aarch64.c (aarch64_can_eliminate): Only keep frame record when required. testsuite/ * gcc.target/aarch64/inc/asm-adder-clobber-lr.c: New test. * gcc.target/aarch64/inc/asm-adder-no-clobber-lr.c: Likewise. * gcc.target/aarch64/test-framepointer-1.c: Likewise. * gcc.target/aarch64/test-framepointer-2.c: Likewise. * gcc.target/aarch64/test-framepointer-3.c: Likewise. * gcc.target/aarch64/test-framepointer-4.c: Likewise. * gcc.target/aarch64/test-framepointer-5.c: Likewise. * gcc.target/aarch64/test-framepointer-6.c: Likewise. * gcc.target/aarch64/test-framepointer-7.c: Likewise. * gcc.target/aarch64/test-framepointer-8.c: Likewise.
Index: gcc/testsuite/gcc.target/aarch64/test-framepointer-2.c === --- gcc/testsuite/gcc.target/aarch64/test-framepointer-2.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/test-framepointer-2.c (revision 0) @@ -0,0 +1,15 @@ +/* { dg-do run } */ +/* { dg-options -O2 -fomit-frame-pointer -mno-omit-leaf-frame-pointer -fno-inline --save-temps } */ + +#include asm-adder-no-clobber-lr.c + +/* omit-frame-pointer is TRUE. + omit-leaf-frame-pointer is false, but irrelevant due to omit-frame-pointer. + LR is not being clobbered in the leaf. + + Since we asked to have no frame pointers anywhere, we expect no frame + record in main or the leaf. */ + +/* { dg-final { scan-assembler-not stp\tx29, x30, \\\[sp, -\[0-9\]+\\\]! 
} } */ + +/* { dg-final { cleanup-saved-temps } } */ Index: gcc/testsuite/gcc.target/aarch64/test-framepointer-6.c === --- gcc/testsuite/gcc.target/aarch64/test-framepointer-6.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/test-framepointer-6.c (revision 0) @@ -0,0 +1,15 @@ +/* { dg-do run } */ +/* { dg-options -O2 -fomit-frame-pointer -mno-omit-leaf-frame-pointer -fno-inline --save-temps } */ + +#include asm-adder-clobber-lr.c + +/* omit-frame-pointer is TRUE. + omit-leaf-frame-pointer is false, but irrelevant due to omit-frame-pointer. + LR is being clobbered in the leaf. + + Since we asked to have no frame pointers anywhere, we expect no frame + record in main or the leaf. */ + +/* { dg-final { scan-assembler-not stp\tx29, x30, \\\[sp, -\[0-9\]+\\\]! } } */ + +/* { dg-final { cleanup-saved-temps } } */ Index: gcc/testsuite/gcc.target/aarch64/test-framepointer-3.c === --- gcc/testsuite/gcc.target/aarch64/test-framepointer-3.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/test-framepointer-3.c (revision 0) @@ -0,0 +1,15 @@ +/* { dg-do run } */ +/* { dg-options -O2 -fomit-frame-pointer -momit-leaf-frame-pointer -fno-inline --save-temps } */ + +#include asm-adder-no-clobber-lr.c + +/* omit-frame-pointer is TRUE. + omit-leaf-frame-pointer is true, but irrelevant due to omit-frame-pointer. + LR is not being clobbered in the leaf. + + Since we asked to have no frame pointers anywhere, we expect no frame + record in main or the leaf. */ + +/* { dg-final { scan-assembler-not stp\tx29, x30, \\\[sp, -\[0-9\]+\\\]! 
} } */ + +/* { dg-final { cleanup-saved-temps } } */ Index: gcc/testsuite/gcc.target/aarch64/test-framepointer-7.c === --- gcc/testsuite/gcc.target/aarch64/test-framepointer-7.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/test-framepointer-7.c (revision 0) @@ -0,0 +1,15 @@ +/* { dg-do run } */ +/* { dg-options -O2 -fomit-frame-pointer -momit-leaf-frame-pointer -fno-inline --save-temps } */ + +#include asm-adder-clobber-lr.c + +/* omit-frame-pointer is TRUE. + omit-leaf-frame-pointer is true, but irrelevant due to omit-frame-pointer. + LR is being clobbered in the leaf. + + Since we asked to have no frame pointers anywhere, we expect no frame + record in main or the leaf. */ + +/* { dg-final { scan-assembler-not stp\tx29, x30, \\\[sp, -\[0-9\]+\\\]! } } */ + +/* { dg-final { cleanup-saved-temps } } */ Index: gcc/testsuite/gcc.target/aarch64/asm-adder-clobber-lr.c === --- gcc/testsuite/gcc.target/aarch64/asm-adder-clobber-lr.c (revision 0) +++ gcc/testsuite/gcc.target/aarch64/asm-adder-clobber-lr.c (revision 0) @@ -0,0 +1,24 @@ +extern void abort (void
[PATCH, AArch64] Make MOVK output operand 2 in hex
MOVK should not be generated with a negative immediate, which the assembler rightfully rejects. This patch makes MOVK output its 2nd operand in hex instead. Tested on bare-metal and linux. OK for trunk? Cheers, Ian

2013-03-20 Ian Bolton ian.bol...@arm.com

gcc/ * config/aarch64/aarch64.c (aarch64_print_operand): New format specifier for printing a constant in hex. * config/aarch64/aarch64.md (insv_imm<mode>): Use the X format specifier for printing second operand. testsuite/ * gcc.target/aarch64/movk.c: New test.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 1404a70..5e51630 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3365,6 +3365,16 @@ aarch64_print_operand (FILE *f, rtx x, char code)
 		 REGNO (x) - V0_REGNUM + (code - 'S'));
       break;
 
+    case 'X':
+      /* Print integer constant in hex.  */
+      if (GET_CODE (x) != CONST_INT)
+	{
+	  output_operand_lossage ("invalid operand for '%%%c'", code);
+	  return;
+	}
+      asm_fprintf (f, "0x%x", UINTVAL (x));
+      break;
+
     case 'w':
     case 'x':
       /* Print a general register name or the zero register (32-bit or
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 40e66db..9c89413 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -850,8 +850,8 @@
 	  (match_operand:GPI 2 "const_int_operand" "n"))]
   "INTVAL (operands[1]) < GET_MODE_BITSIZE (<MODE>mode)
    && INTVAL (operands[1]) % 16 == 0
-   && INTVAL (operands[2]) <= 0xffff"
-  "movk\\t%w0, %2, lsl %1"
+   && UINTVAL (operands[2]) <= 0xffff"
+  "movk\\t%w0, %X2, lsl %1"
   [(set_attr "v8type" "movk")
    (set_attr "mode" "<MODE>")]
 )
diff --git a/gcc/testsuite/gcc.target/aarch64/movk.c b/gcc/testsuite/gcc.target/aarch64/movk.c
new file mode 100644
index 000..e4b2209
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/movk.c
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-options "-O2 --save-temps -fno-inline" } */
+
+extern void abort (void);
+
+long long int
+dummy_number_generator ()
+{
+  /* { dg-final { scan-assembler "movk\tx\[0-9\]+, 0xefff, lsl 16" } } */
+  /* { dg-final { scan-assembler "movk\tx\[0-9\]+, 0xc4cc, lsl 32" } } */
+  /* { dg-final { scan-assembler "movk\tx\[0-9\]+, 0xfffe, lsl 48" } } */
+  return -346565474575675;
+}
+
+int
+main (void)
+{
+
+  long long int num = dummy_number_generator ();
+  if (num > 0)
+    abort ();
+
+  /* { dg-final { scan-assembler "movk\tx\[0-9\]+, 0x4667, lsl 16" } } */
+  /* { dg-final { scan-assembler "movk\tx\[0-9\]+, 0x7a3d, lsl 32" } } */
+  if (num / 69313094915135 != -5)
+    abort ();
+
+  return 0;
+}
+
+/* { dg-final { cleanup-saved-temps } } */
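To see why hex printing matters, here is a small sketch (mine, not from the patch) of how a 64-bit constant decomposes into the 16-bit MOVK chunks the test scans for. A chunk such as 0xefff would print as a negative number in signed decimal, which the assembler rejects.

```c
#include <stdint.h>

/* Extract the 16-bit chunk that a MOVK at the given shift would insert.
   A MOV/MOVK sequence builds a 64-bit constant 16 bits at a time. */
unsigned movk_chunk (uint64_t value, int shift)
{
  return (unsigned) ((value >> shift) & 0xffff);
}
```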
RE: [PING^1] [AArch64] Implement Bitwise AND and Set Flags
Please consider this as a reminder to review the patch posted at the following link: http://gcc.gnu.org/ml/gcc-patches/2013-01/msg01374.html The patch is slightly modified to use CC_NZ mode instead of CC. Please review the patch and let me know if it's okay.

Hi Naveen, With the CC_NZ fix, the patch looks good apart from one thing: the second set in each pattern should have the =r,rk constraint rather than just =r,r. That said, I've attached a patch that provides more thorough test cases, including execute ones. When you get commit approval (which will be after GCC goes into stage 1 again) then I can add in the test cases. You might as well run them now though, for more confidence in your work. BTW, I have an implementation of BICS that's been waiting for GCC to hit stage 1. I'll send that out for review soon. NOTE: I do not have maintainer powers here, so you need someone else to give the OK to your patch. Cheers, Ian

diff --git a/gcc/testsuite/gcc.target/aarch64/ands1.c b/gcc/testsuite/gcc.target/aarch64/ands1.c
new file mode 100644
index 000..e2bf956
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ands1.c
@@ -0,0 +1,150 @@
+/* { dg-do run } */
+/* { dg-options "-O2 --save-temps" } */
+
+extern void abort (void);
+
+int
+ands_si_test1 (int a, int b, int c)
+{
+  int d = a & b;
+
+  /* { dg-final { scan-assembler "ands\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" } } */
+  if (d == 0)
+    return a + c;
+  else
+    return b + d + c;
+}
+
+int
+ands_si_test2 (int a, int b, int c)
+{
+  int d = a & 0xff;
+
+  /* { dg-final { scan-assembler "ands\tw\[0-9\]+, w\[0-9\]+, 255" } } */
+  if (d == 0)
+    return a + c;
+  else
+    return b + d + c;
+}
+
+int
+ands_si_test3 (int a, int b, int c)
+{
+  int d = a & (b << 3);
+
+  /* { dg-final { scan-assembler "ands\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, lsl 3" } } */
+  if (d == 0)
+    return a + c;
+  else
+    return b + d + c;
+}
+
+typedef long long s64;
+
+s64
+ands_di_test1 (s64 a, s64 b, s64 c)
+{
+  s64 d = a & b;
+
+  /* { dg-final { scan-assembler "ands\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+" } } */
+  if (d == 0)
+    return a + c;
+  else
+    return b + d + c;
+}
+
+s64
+ands_di_test2 (s64 a, s64 b, s64 c)
+{
+  s64 d = a & 0xff;
+
+  /* { dg-final { scan-assembler "ands\tx\[0-9\]+, x\[0-9\]+, 255" } } */
+  if (d == 0)
+    return a + c;
+  else
+    return b + d + c;
+}
+
+s64
+ands_di_test3 (s64 a, s64 b, s64 c)
+{
+  s64 d = a & (b << 3);
+
+  /* { dg-final { scan-assembler "ands\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+, lsl 3" } } */
+  if (d == 0)
+    return a + c;
+  else
+    return b + d + c;
+}
+
+int main ()
+{
+  int x;
+  s64 y;
+
+  x = ands_si_test1 (29, 4, 5);
+  if (x != 13)
+    abort();
+
+  x = ands_si_test1 (5, 2, 20);
+  if (x != 25)
+    abort();
+
+  x = ands_si_test2 (29, 4, 5);
+  if (x != 38)
+    abort();
+
+  x = ands_si_test2 (1024, 2, 20);
+  if (x != 1044)
+    abort();
+
+  x = ands_si_test3 (35, 4, 5);
+  if (x != 41)
+    abort();
+
+  x = ands_si_test3 (5, 2, 20);
+  if (x != 25)
+    abort();
+
+  y = ands_di_test1 (0x13029ll,
+		     0x32004ll,
+		     0x505050505ll);
+
+  if (y != ((0x13029ll & 0x32004ll) + 0x32004ll + 0x505050505ll))
+    abort();
+
+  y = ands_di_test1 (0x5000500050005ll,
+		     0x2111211121112ll,
+		     0x02020ll);
+  if (y != 0x5000500052025ll)
+    abort();
+
+  y = ands_di_test2 (0x13029ll,
+		     0x32004ll,
+		     0x505050505ll);
+  if (y != ((0x13029ll & 0xff) + 0x32004ll + 0x505050505ll))
+    abort();
+
+  y = ands_di_test2 (0x130002900ll,
+		     0x32004ll,
+		     0x505050505ll);
+  if (y != (0x130002900ll + 0x505050505ll))
+    abort();
+
+  y = ands_di_test3 (0x13029ll,
+		     0x06408ll,
+		     0x505050505ll);
+  if (y != ((0x13029ll & (0x06408ll << 3))
+	    + 0x06408ll + 0x505050505ll))
+    abort();
+
+  y = ands_di_test3 (0x130002900ll,
+		     0x08808ll,
+		     0x505050505ll);
+  if (y != (0x130002900ll + 0x505050505ll))
+    abort();
+
+  return 0;
+}
+
+/* { dg-final { cleanup-saved-temps } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/ands2.c b/gcc/testsuite/gcc.target/aarch64/ands2.c
new file mode 100644
index 000..c778a54
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ands2.c
@@ -0,0 +1,156 @@
+/* { dg-do run } */
+/* { dg-options "-O2 --save-temps" } */
+
+extern void abort (void);
+
+int
+ands_si_test1 (int a, int b, int c)
+{
+  int d = a & b;
+
+  /* { dg-final { scan-assembler-not "ands\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" } } */
+  /* { dg-final { scan-assembler "and\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+" } } */
+  if (d <= 0)
+    return a + c;
+  else
+    return b + d + c;
+}
+
+int
+ands_si_test2 (int a, int b, int c)
+{
+  int d = a & 0x99999999;
+
+  /* { dg-final { scan-assembler-not "ands\tw\[0-9\]+, w\[0-9\]+, -1717986919" } } */
+  /* {
[PATCH, AArch64] Support EXTR in backend
We couldn't generate EXTR for AArch64 ... until now! This patch includes the pattern and a test. Full regression testing for Linux and bare-metal passed. OK for trunk stage-1? Thanks, Ian

2013-03-14 Ian Bolton ian.bol...@arm.com

gcc/ * config/aarch64/aarch64.md (*extr<mode>5_insn): New pattern. (*extrsi5_insn_uxtw): Likewise. testsuite/ * gcc.target/aarch64/extr.c: New test.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 73d86a7..ef1c0f3 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2703,6 +2703,34 @@
    (set_attr "mode" "<MODE>")]
 )
 
+(define_insn "*extr<mode>5_insn"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(ior:GPI (ashift:GPI (match_operand:GPI 1 "register_operand" "r")
+			     (match_operand 3 "const_int_operand" "n"))
+		 (lshiftrt:GPI (match_operand:GPI 2 "register_operand" "r")
+			       (match_operand 4 "const_int_operand" "n"))))]
+  "UINTVAL (operands[3]) < GET_MODE_BITSIZE (<MODE>mode)
+   && (UINTVAL (operands[3]) + UINTVAL (operands[4]) == GET_MODE_BITSIZE (<MODE>mode))"
+  "extr\\t%w0, %w1, %w2, %4"
+  [(set_attr "v8type" "shift")
+   (set_attr "mode" "<MODE>")]
+)
+
+;; zero_extend version of the above
+(define_insn "*extrsi5_insn_uxtw"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(zero_extend:DI
+	 (ior:SI (ashift:SI (match_operand:SI 1 "register_operand" "r")
+			    (match_operand 3 "const_int_operand" "n"))
+		 (lshiftrt:SI (match_operand:SI 2 "register_operand" "r")
+			      (match_operand 4 "const_int_operand" "n")))))]
+  "UINTVAL (operands[3]) < 32
+   && (UINTVAL (operands[3]) + UINTVAL (operands[4]) == 32)"
+  "extr\\t%w0, %w1, %w2, %4"
+  [(set_attr "v8type" "shift")
+   (set_attr "mode" "SI")]
+)
+
 (define_insn "*<ANY_EXTEND:optab><GPI:mode>_ashl<SHORT:mode>"
   [(set (match_operand:GPI 0 "register_operand" "=r")
 	(ANY_EXTEND:GPI
diff --git a/gcc/testsuite/gcc.target/aarch64/extr.c b/gcc/testsuite/gcc.target/aarch64/extr.c
new file mode 100644
index 000..a78dd8d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/extr.c
@@ -0,0 +1,34 @@
+/* { dg-options "-O2 --save-temps" } */
+/* { dg-do run } */
+
+extern void abort (void);
+
+int
+test_si (int a, int b)
+{
+  /* { dg-final { scan-assembler "extr\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+, 27\n" } } */
+  return (a << 5) | ((unsigned int) b >> 27);
+}
+
+long long
+test_di (long long a, long long b)
+{
+  /* { dg-final { scan-assembler "extr\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+, 45\n" } } */
+  return (a << 19) | ((unsigned long long) b >> 45);
+}
+
+int
+main ()
+{
+  int v;
+  long long w;
+  v = test_si (0x00000004, 0x30000000);
+  if (v != 0x00000086)
+    abort();
+  w = test_di (0x0001040040040004ll, 0x0070050000000000ll);
+  if (w != 0x2002002000200380ll)
+    abort();
+  return 0;
+}
+
+/* { dg-final { cleanup-saved-temps } } */
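For reference, EXTR Wd, Wn, Wm, #lsb returns 32 bits of the concatenation Wn:Wm starting at bit #lsb, which is exactly the shift/OR idiom the new pattern matches. A C model (mine, for illustration):

```c
#include <stdint.h>

/* C model of 32-bit EXTR: (hi << (32 - lsb)) | (lo >> lsb), valid for
   0 < lsb < 32 -- the same condition the pattern's predicate enforces
   (the two shift amounts must sum to the register width). */
uint32_t extr32 (uint32_t hi, uint32_t lo, unsigned lsb)
{
  return (hi << (32 - lsb)) | (lo >> lsb);
}
```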
[PATCH, AArch64] Support ROR in backend
We couldn't generate ROR (preferred alias of EXTR when both source registers are the same) for AArch64, when rotating by an immediate, ... until now! This patch includes the pattern and a test. Full regression testing for Linux and bare-metal passed. OK for trunk stage-1? Thanks, Ian

2013-03-14 Ian Bolton ian.bol...@arm.com

gcc/ * config/aarch64/aarch64.md (*ror<mode>3_insn): New pattern. (*rorsi3_insn_uxtw): Likewise. testsuite/ * gcc.target/aarch64/ror.c: New test.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index ef1c0f3..367c0e3 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2731,6 +2731,34 @@
    (set_attr "mode" "SI")]
 )
 
+(define_insn "*ror<mode>3_insn"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(rotate:GPI (match_operand:GPI 1 "register_operand" "r")
+		    (match_operand 2 "const_int_operand" "n")))]
+  "UINTVAL (operands[2]) < GET_MODE_BITSIZE (<MODE>mode)"
+{
+  operands[3] = GEN_INT (<sizen> - UINTVAL (operands[2]));
+  return "ror\\t%w0, %w1, %3";
+}
+  [(set_attr "v8type" "shift")
+   (set_attr "mode" "<MODE>")]
+)
+
+;; zero_extend version of the above
+(define_insn "*rorsi3_insn_uxtw"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(zero_extend:DI
+	 (rotate:SI (match_operand:SI 1 "register_operand" "r")
+		    (match_operand 2 "const_int_operand" "n"))))]
+  "UINTVAL (operands[2]) < 32"
+{
+  operands[3] = GEN_INT (32 - UINTVAL (operands[2]));
+  return "ror\\t%w0, %w1, %3";
+}
+  [(set_attr "v8type" "shift")
+   (set_attr "mode" "SI")]
+)
+
 (define_insn "*<ANY_EXTEND:optab><GPI:mode>_ashl<SHORT:mode>"
   [(set (match_operand:GPI 0 "register_operand" "=r")
 	(ANY_EXTEND:GPI
diff --git a/gcc/testsuite/gcc.target/aarch64/ror.c b/gcc/testsuite/gcc.target/aarch64/ror.c
new file mode 100644
index 000..4d266f0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ror.c
@@ -0,0 +1,34 @@
+/* { dg-options "-O2 --save-temps" } */
+/* { dg-do run } */
+
+extern void abort (void);
+
+int
+test_si (int a)
+{
+  /* { dg-final { scan-assembler "ror\tw\[0-9\]+, w\[0-9\]+, 27\n" } } */
+  return (a << 5) | ((unsigned int) a >> 27);
+}
+
+long long
+test_di (long long a)
+{
+  /* { dg-final { scan-assembler "ror\tx\[0-9\]+, x\[0-9\]+, 45\n" } } */
+  return (a << 19) | ((unsigned long long) a >> 45);
+}
+
+int
+main ()
+{
+  int v;
+  long long w;
+  v = test_si (0x0203050);
+  if (v != 0x4060a00)
+    abort();
+  w = test_di (0x020506010304ll);
+  if (w != 0x102830081820ll)
+    abort();
+  return 0;
+}
+
+/* { dg-final { cleanup-saved-temps } } */
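The idiom the pattern matches is a left-rotate by a constant, which maps to ROR with the complementary amount (rotl(a, n) == ror(a, 32 - n)); that is why the output code computes GEN_INT (size - amount). A sketch of the source-level idiom:

```c
#include <stdint.h>

/* Rotate left by n, 0 < n < 32.  GCC canonicalizes this shift/OR pair
   into a rotate, which the new pattern then emits as a single "ror"
   with rotate amount 32 - n. */
uint32_t rotl32 (uint32_t a, unsigned n)
{
  return (a << n) | (a >> (32 - n));
}
```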
[PATCH, AArch64] Support SBC in the backend
We couldn't generate SBC for AArch64 ... until now! This patch includes the main pattern, a zero_extend form of it and a test. Full regression testing for Linux and bare-metal passed. OK for trunk stage-1? Thanks, Ian

2013-03-14 Ian Bolton ian.bol...@arm.com

gcc/ * config/aarch64/aarch64.md (*sub<mode>3_carryin): New pattern. (*subsi3_carryin_uxtw): Likewise. testsuite/ * gcc.target/aarch64/sbc.c: New test.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 4358b44..c99e188 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1790,6 +1790,34 @@
    (set_attr "mode" "SI")]
 )
 
+(define_insn "*sub<mode>3_carryin"
+  [(set
+    (match_operand:GPI 0 "register_operand" "=r")
+    (minus:GPI (minus:GPI
+		(match_operand:GPI 1 "register_operand" "r")
+		(ltu:GPI (reg:CC CC_REGNUM) (const_int 0)))
+	       (match_operand:GPI 2 "register_operand" "r")))]
+   ""
+   "sbc\\t%w0, %w1, %w2"
+  [(set_attr "v8type" "adc")
+   (set_attr "mode" "<MODE>")]
+)
+
+;; zero_extend version of the above
+(define_insn "*subsi3_carryin_uxtw"
+  [(set
+    (match_operand:DI 0 "register_operand" "=r")
+    (zero_extend:DI
+     (minus:SI (minus:SI
+		(match_operand:SI 1 "register_operand" "r")
+		(ltu:SI (reg:CC CC_REGNUM) (const_int 0)))
+	       (match_operand:SI 2 "register_operand" "r"))))]
+   ""
+   "sbc\\t%w0, %w1, %w2"
+  [(set_attr "v8type" "adc")
+   (set_attr "mode" "SI")]
+)
+
 (define_insn "*sub_uxt<mode>_multp2"
   [(set (match_operand:GPI 0 "register_operand" "=rk")
 	(minus:GPI (match_operand:GPI 4 "register_operand" "r")
diff --git a/gcc/testsuite/gcc.target/aarch64/sbc.c b/gcc/testsuite/gcc.target/aarch64/sbc.c
new file mode 100644
index 000..e479910
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sbc.c
@@ -0,0 +1,41 @@
+/* { dg-do run } */
+/* { dg-options "-O2 --save-temps" } */
+
+extern void abort (void);
+
+typedef unsigned int u32int;
+typedef unsigned long long u64int;
+
+u32int
+test_si (u32int w1, u32int w2, u32int w3, u32int w4)
+{
+  u32int w0;
+  /* { dg-final { scan-assembler "sbc\tw\[0-9\]+, w\[0-9\]+, w\[0-9\]+\n" } } */
+  w0 = w1 - w2 - (w3 < w4);
+  return w0;
+}
+
+u64int
+test_di (u64int x1, u64int x2, u64int x3, u64int x4)
+{
+  u64int x0;
+  /* { dg-final { scan-assembler "sbc\tx\[0-9\]+, x\[0-9\]+, x\[0-9\]+\n" } } */
+  x0 = x1 - x2 - (x3 < x4);
+  return x0;
+}
+
+int
+main ()
+{
+  u32int x;
+  u64int y;
+  x = test_si (7, 8, 12, 15);
+  if (x != -2)
+    abort();
+  y = test_di (0x987654321ll, 0x123456789ll, 0x345345345ll, 0x123123123ll);
+  if (y != 0x8641fdb98ll)
+    abort();
+  return 0;
+}
+
+/* { dg-final { cleanup-saved-temps } } */
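A hedged sketch of where the borrow idiom comes from: the high word of a double-word subtraction is hi1 - hi2 - borrow, with the borrow produced by comparing the low words, which is the (w3 < w4) shape in the test above.

```c
#include <stdint.h>

/* 128-bit subtraction in two 64-bit halves.  The low half maps to
   "subs" (which sets the carry flag); the high half is the
   subtract-with-borrow that maps onto "sbc". */
uint64_t sub128_lo (uint64_t lo1, uint64_t lo2)
{
  return lo1 - lo2;
}

uint64_t sub128_hi (uint64_t hi1, uint64_t lo1, uint64_t hi2, uint64_t lo2)
{
  return hi1 - hi2 - (lo1 < lo2);   /* borrow from the low-word compare */
}
```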
[PATCH, AArch64] AND operation should use CC_NZ mode
The mode for AND should really be CC_NZ, so I fixed that up, along with the TST patterns that (erroneously) expected it to be CC mode. It has been tested on linux and bare-metal. OK to commit to trunk (as bug fix)? Thanks, Ian

2013-02-01 Ian Bolton ian.bol...@arm.com

* config/aarch64/aarch64.c (aarch64_select_cc_mode): Return correct CC mode for AND. * config/aarch64/aarch64.md (*and<mode>3nr_compare0): Fixed to use CC_NZ. (*and_<SHIFT:optab><mode>3nr_compare0): Likewise.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 03b1361..2b09669 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3076,7 +3076,7 @@ aarch64_select_cc_mode (RTX_CODE code, rtx x, rtx y)
   if ((GET_MODE (x) == SImode || GET_MODE (x) == DImode)
       && y == const0_rtx
       && (code == EQ || code == NE || code == LT || code == GE)
-      && (GET_CODE (x) == PLUS || GET_CODE (x) == MINUS))
+      && (GET_CODE (x) == PLUS || GET_CODE (x) == MINUS || GET_CODE (x) == AND))
     return CC_NZmode;
 
   /* A compare with a shifted operand.  Because of canonicalization,
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 36267c9..c4c152f 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2470,8 +2470,8 @@
 )
 
 (define_insn "*and<mode>3nr_compare0"
-  [(set (reg:CC CC_REGNUM)
-	(compare:CC
+  [(set (reg:CC_NZ CC_REGNUM)
+	(compare:CC_NZ
 	 (and:GPI (match_operand:GPI 0 "register_operand" "%r,r")
 		  (match_operand:GPI 1 "aarch64_logical_operand" "r,<lconst>"))
 	 (const_int 0)))]
@@ -2481,8 +2481,8 @@
    (set_attr "mode" "<MODE>")])
 
 (define_insn "*and_<SHIFT:optab><mode>3nr_compare0"
-  [(set (reg:CC CC_REGNUM)
-	(compare:CC
+  [(set (reg:CC_NZ CC_REGNUM)
+	(compare:CC_NZ
 	 (and:GPI (SHIFT:GPI
 		   (match_operand:GPI 0 "register_operand" "r")
 		   (match_operand:QI 1 "aarch64_shift_imm_<mode>" "n"))
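An illustrative function (mine, not from the patch) showing the shape this fix enables: with AND compares selecting CC_NZ, the AND-then-test-against-zero collapses into a single flag-setting ANDS instead of an AND followed by a separate compare.

```c
/* "ands w3, w0, w1" computes the AND and sets the N and Z flags, so
   the equality test against zero can read the Z flag directly with no
   separate "tst" or "cmp" instruction. */
int select_nonzero (int a, int b, int fallback)
{
  int d = a & b;
  return d == 0 ? fallback : d;
}
```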
[PATCH, AArch64] Make zero_extends explicit for some SImode patterns
Greetings! I've made zero_extend versions of SI mode patterns that write to W registers in order to make the implicit zero_extend that they do explicit, so GCC can be smarter about when it actually needs to plant a zero_extend (uxtw). If that sounds familiar, it's because this patch continues the work of one already committed. :) This has been regression-tested for linux and bare-metal. OK for trunk and backport to ARM/aarch64-4.7-branch? Cheers, Ian

2013-01-15 Ian Bolton ian.bol...@arm.com

* gcc/config/aarch64/aarch64.md (*cstoresi_neg_uxtw): New pattern. (*cmovsi_insn_uxtw): New pattern. (*<optab>si3_uxtw): New pattern. (*<LOGICAL:optab>_<SHIFT:optab>si3_uxtw): New pattern. (*<optab>si3_insn_uxtw): New pattern. (*bswapsi2_uxtw): New pattern.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index ec65b3c..8dd6c22 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -1,5 +1,5 @@ ;; Machine description for AArch64 architecture. -;; Copyright (C) 2009, 2010, 2011, 2012 Free Software Foundation, Inc. +;; Copyright (C) 2009, 2010, 2011, 2012, 2013 Free Software Foundation, Inc. ;; Contributed by ARM Ltd. ;; ;; This file is part of GCC. 
@@ -2193,6 +2193,18 @@ (set_attr mode MODE)] ) +;; zero_extend version of the above +(define_insn *cstoresi_insn_uxtw + [(set (match_operand:DI 0 register_operand =r) + (zero_extend:DI +(match_operator:SI 1 aarch64_comparison_operator + [(match_operand 2 cc_register ) (const_int 0)])))] + + cset\\t%w0, %m1 + [(set_attr v8type csel) + (set_attr mode SI)] +) + (define_insn *cstoremode_neg [(set (match_operand:ALLI 0 register_operand =r) (neg:ALLI (match_operator:ALLI 1 aarch64_comparison_operator @@ -2203,6 +2215,18 @@ (set_attr mode MODE)] ) +;; zero_extend version of the above +(define_insn *cstoresi_neg_uxtw + [(set (match_operand:DI 0 register_operand =r) + (zero_extend:DI +(neg:SI (match_operator:SI 1 aarch64_comparison_operator + [(match_operand 2 cc_register ) (const_int 0)]] + + csetm\\t%w0, %m1 + [(set_attr v8type csel) + (set_attr mode SI)] +) + (define_expand cmovmode6 [(set (match_operand:GPI 0 register_operand ) (if_then_else:GPI @@ -2257,6 +2281,30 @@ (set_attr mode MODE)] ) +;; zero_extend version of above +(define_insn *cmovsi_insn_uxtw + [(set (match_operand:DI 0 register_operand =r,r,r,r,r,r,r) + (zero_extend:DI +(if_then_else:SI + (match_operator 1 aarch64_comparison_operator + [(match_operand 2 cc_register ) (const_int 0)]) + (match_operand:SI 3 aarch64_reg_zero_or_m1_or_1 rZ,rZ,UsM,rZ,Ui1,UsM,Ui1) + (match_operand:SI 4 aarch64_reg_zero_or_m1_or_1 rZ,UsM,rZ,Ui1,rZ,UsM,Ui1] + !((operands[3] == const1_rtx operands[4] == constm1_rtx) + || (operands[3] == constm1_rtx operands[4] == const1_rtx)) + ;; Final two alternatives should be unreachable, but included for completeness + @ + csel\\t%w0, %w3, %w4, %m1 + csinv\\t%w0, %w3, wzr, %m1 + csinv\\t%w0, %w4, wzr, %M1 + csinc\\t%w0, %w3, wzr, %m1 + csinc\\t%w0, %w4, wzr, %M1 + mov\\t%w0, -1 + mov\\t%w0, 1 + [(set_attr v8type csel) + (set_attr mode SI)] +) + (define_insn *cmovmode_insn [(set (match_operand:GPF 0 register_operand =w) (if_then_else:GPF @@ -2369,6 +2417,17 @@ [(set_attr v8type logic,logic_imm) 
(set_attr mode MODE)]) +;; zero_extend version of above +(define_insn *optabsi3_uxtw + [(set (match_operand:DI 0 register_operand =r,rk) + (zero_extend:DI + (LOGICAL:SI (match_operand:SI 1 register_operand %r,r) +(match_operand:SI 2 aarch64_logical_operand r,K] + + logical\\t%w0, %w1, %w2 + [(set_attr v8type logic,logic_imm) + (set_attr mode SI)]) + (define_insn *LOGICAL:optab_SHIFT:optabmode3 [(set (match_operand:GPI 0 register_operand =r) (LOGICAL:GPI (SHIFT:GPI @@ -2380,6 +2439,19 @@ [(set_attr v8type logic_shift) (set_attr mode MODE)]) +;; zero_extend version of above +(define_insn *LOGICAL:optab_SHIFT:optabsi3_uxtw + [(set (match_operand:DI 0 register_operand =r) + (zero_extend:DI +(LOGICAL:SI (SHIFT:SI + (match_operand:SI 1 register_operand r) + (match_operand:QI 2 aarch64_shift_imm_si n)) +(match_operand:SI 3 register_operand r] + + LOGICAL:logical\\t%w0, %w3, %w1, SHIFT:shift %2 + [(set_attr v8type logic_shift) + (set_attr mode SI)]) + (define_insn one_cmplmode2 [(set (match_operand:GPI 0 register_operand =r) (not:GPI (match_operand:GPI 1 register_operand r)))] @@ -2591,6 +2663,18 @@ (set_attr mode MODE)] ) +;; zero_extend version of above +(define_insn *optabsi3_insn_uxtw + [(set (match_operand:DI 0 register_operand =r) + (zero_extend:DI (SHIFT:SI +(match_operand:SI 1 register_operand r
RE: [PATCH, AArch64] Make zero_extends explicit for common SImode patterns
Hi Richard, + add\\t%w0, %w2, %w, suxtSHORT:size ^^^ %w1 Good spot. I guess that pattern hasn't fired yet then! I'll fix it. This patch significantly reduces the number of redundant uxtw instructions seen in a variety of programs. (There are further patterns that can be done, but I have them in a separate patch that's still in development.) What do you get if you enable flag_ree, as we do for x86_64? In theory this should avoid even more extensions... Cf. common/config/i386/i386-common.c: static const struct default_options ix86_option_optimization_table[] = { /* Enable redundant extension instructions removal at -O2 and higher. */ { OPT_LEVELS_2_PLUS, OPT_free, NULL, 1 }, I should have said that I am indeed running with REE enabled. It has some impact (about 70 further UXTW removed from the set of binaries I've been building) and seems to mostly be good across basic blocks within the same function. As far as I can tell, there is no downside to REE, so I think it should be enabled by default for -O2 or higher on AArch64 too. I'll prepare a new patch ...
RE: [PATCH, AArch64] Make zero_extends explicit for common SImode patterns
Hi Richard, + add\\t%w0, %w2, %w, suxtSHORT:size ^^^ %w1 Good spot. I guess that pattern hasn't fired yet then! I'll fix it. Now fixed in v3. I should have said that I am indeed running with REE enabled. It has some impact (about 70 further UXTW removed from the set of binaries I've been building) and seems to mostly be good across basic blocks within the same function. As far as I can tell, there is no downside to REE, so I think it should be enabled by default for -O2 or higher on AArch64 too. I'm going to enable REE in a separate patch. Is this one OK to commit here and backport to ARM/aarch64-4.7-branch? Thanks, Ian diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index a9a8b5f..d5c0206 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -1271,6 +1273,22 @@ (set_attr mode SI)] ) +;; zero_extend version of above +(define_insn *addsi3_aarch64_uxtw + [(set +(match_operand:DI 0 register_operand =rk,rk,rk) +(zero_extend:DI (plus:SI + (match_operand:SI 1 register_operand %rk,rk,rk) + (match_operand:SI 2 aarch64_plus_operand I,r,J] + + @ + add\\t%w0, %w1, %2 + add\\t%w0, %w1, %w2 + sub\\t%w0, %w1, #%n2 + [(set_attr v8type alu) + (set_attr mode SI)] +) + (define_insn *adddi3_aarch64 [(set (match_operand:DI 0 register_operand =rk,rk,rk,!w) @@ -1304,6 +1322,23 @@ (set_attr mode MODE)] ) +;; zero_extend version of above +(define_insn *addsi3_compare0_uxtw + [(set (reg:CC_NZ CC_REGNUM) + (compare:CC_NZ +(plus:SI (match_operand:SI 1 register_operand %r,r) + (match_operand:SI 2 aarch64_plus_operand rI,J)) +(const_int 0))) + (set (match_operand:DI 0 register_operand =r,r) + (zero_extend:DI (plus:SI (match_dup 1) (match_dup 2] + + @ + adds\\t%w0, %w1, %w2 + subs\\t%w0, %w1, #%n2 + [(set_attr v8type alus) + (set_attr mode SI)] +) + (define_insn *addmode3nr_compare0 [(set (reg:CC_NZ CC_REGNUM) (compare:CC_NZ @@ -1340,6 +1375,19 @@ (set_attr mode MODE)] ) +;; zero_extend version of above +(define_insn *add_shift_si_uxtw + 
[(set (match_operand:DI 0 register_operand =rk) + (zero_extend:DI (plus:SI + (ASHIFT:SI (match_operand:SI 1 register_operand r) + (match_operand:QI 2 aarch64_shift_imm_si n)) + (match_operand:SI 3 register_operand r] + + add\\t%w0, %w3, %w1, shift %2 + [(set_attr v8type alu_shift) + (set_attr mode SI)] +) + (define_insn *add_mul_imm_mode [(set (match_operand:GPI 0 register_operand =rk) (plus:GPI (mult:GPI (match_operand:GPI 1 register_operand r) @@ -1361,6 +1409,17 @@ (set_attr mode GPI:MODE)] ) +;; zero_extend version of above +(define_insn *add_optabSHORT:mode_si_uxtw + [(set (match_operand:DI 0 register_operand =rk) + (zero_extend:DI (plus:SI (ANY_EXTEND:SI (match_operand:SHORT 1 register_operand r)) + (match_operand:GPI 2 register_operand r] + + add\\t%w0, %w2, %w1, suxtSHORT:size + [(set_attr v8type alu_ext) + (set_attr mode SI)] +) + (define_insn *add_optabALLX:mode_shft_GPI:mode [(set (match_operand:GPI 0 register_operand =rk) (plus:GPI (ashift:GPI (ANY_EXTEND:GPI @@ -1373,6 +1432,19 @@ (set_attr mode GPI:MODE)] ) +;; zero_extend version of above +(define_insn *add_optabSHORT:mode_shft_si_uxtw + [(set (match_operand:DI 0 register_operand =rk) + (zero_extend:DI (plus:SI (ashift:SI (ANY_EXTEND:SI + (match_operand:SHORT 1 register_operand r)) + (match_operand 2 aarch64_imm3 Ui3)) + (match_operand:SI 3 register_operand r] + + add\\t%w0, %w3, %w1, suxtSHORT:size %2 + [(set_attr v8type alu_ext) + (set_attr mode SI)] +) + (define_insn *add_optabALLX:mode_mult_GPI:mode [(set (match_operand:GPI 0 register_operand =rk) (plus:GPI (mult:GPI (ANY_EXTEND:GPI @@ -1385,6 +1457,19 @@ (set_attr mode GPI:MODE)] ) +;; zero_extend version of above +(define_insn *add_optabSHORT:mode_mult_si_uxtw + [(set (match_operand:DI 0 register_operand =rk) + (zero_extend:DI (plus:SI (mult:SI (ANY_EXTEND:SI +(match_operand:SHORT 1 register_operand r)) + (match_operand 2 aarch64_pwr_imm3 Up3)) + (match_operand:SI 3 register_operand r] + + add\\t%w0, %w3, %w1, suxtSHORT:size %p2 + [(set_attr 
v8type alu_ext) + (set_attr mode SI)] +) + (define_insn *add_optabmode_multp2 [(set (match_operand:GPI 0 register_operand =rk) (plus:GPI (ANY_EXTRACT:GPI @@ -1399,6 +1484,21 @@ (set_attr mode MODE)] ) +;; zero_extend version of above +(define_insn *add_optabsi_multp2_uxtw + [(set (match_operand:DI 0 register_operand =rk) + (zero_extend:DI (plus:SI (ANY_EXTRACT:SI + (mult:SI (match_operand:SI 1 register_operand r) +
[PATCH, AArch64] Make zero_extends explicit for common SImode patterns
Season's greetings to you! :) I've made zero_extend versions of SI mode patterns that write to W registers in order to make the implicit zero_extend that they do explicit, so GCC can be smarter about when it actually needs to plant a zero_extend (uxtw). This patch significantly reduces the number of redundant uxtw instructions seen in a variety of programs. (There are further patterns that can be done, but I have them in a separate patch that's still in development.) OK for trunk and backport to ARM/aarch64-4.7-branch? Cheers, Ian 2012-12-13 Ian Bolton ian.bol...@arm.com * gcc/config/aarch64/aarch64.md (*addsi3_aarch64_uxtw): New pattern. (*addsi3_compare0_uxtw): New pattern. (*add_shift_si_uxtw): New pattern. (*add_optabSHORT:mode_si_uxtw): New pattern. (*add_optabSHORT:mode_shft_si_uxtw): New pattern. (*add_optabSHORT:mode_mult_si_uxtw): New pattern. (*add_optabsi_multp2_uxtw): New pattern. (*addsi3_carryin_uxtw): New pattern. (*addsi3_carryin_alt1_uxtw): New pattern. (*addsi3_carryin_alt2_uxtw): New pattern. (*addsi3_carryin_alt3_uxtw): New pattern. (*add_uxtsi_multp2_uxtw): New pattern. (*subsi3_uxtw): New pattern. (*subsi3_compare0_uxtw): New pattern. (*sub_shift_si_uxtw): New pattern. (*sub_mul_imm_si_uxtw): New pattern. (*sub_optabSHORT:mode_si_uxtw): New pattern. (*sub_optabSHORT:mode_shft_si_uxtw): New pattern. (*sub_optabsi_multp2_uxtw): New pattern. (*sub_uxtsi_multp2_uxtw): New pattern. (*negsi2_uxtw): New pattern. (*negsi2_compare0_uxtw): New pattern. (*neg_shift_si2_uxtw): New pattern. (*neg_mul_imm_si2_uxtw): New pattern. (*mulsi3_uxtw): New pattern. (*maddsi_uxtw): New pattern. (*msubsi_uxtw): New pattern. (*mulsi_neg_uxtw): New pattern. 
(*su_optabdivsi3_uxtw): New pattern.diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index a9a8b5f..d5c0206 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -1271,6 +1273,22 @@ (set_attr mode SI)] ) +;; zero_extend version of above +(define_insn *addsi3_aarch64_uxtw + [(set +(match_operand:DI 0 register_operand =rk,rk,rk) +(zero_extend:DI (plus:SI + (match_operand:SI 1 register_operand %rk,rk,rk) + (match_operand:SI 2 aarch64_plus_operand I,r,J] + + @ + add\\t%w0, %w1, %2 + add\\t%w0, %w1, %w2 + sub\\t%w0, %w1, #%n2 + [(set_attr v8type alu) + (set_attr mode SI)] +) + (define_insn *adddi3_aarch64 [(set (match_operand:DI 0 register_operand =rk,rk,rk,!w) @@ -1304,6 +1322,23 @@ (set_attr mode MODE)] ) +;; zero_extend version of above +(define_insn *addsi3_compare0_uxtw + [(set (reg:CC_NZ CC_REGNUM) + (compare:CC_NZ +(plus:SI (match_operand:SI 1 register_operand %r,r) + (match_operand:SI 2 aarch64_plus_operand rI,J)) +(const_int 0))) + (set (match_operand:DI 0 register_operand =r,r) + (zero_extend:DI (plus:SI (match_dup 1) (match_dup 2] + + @ + adds\\t%w0, %w1, %w2 + subs\\t%w0, %w1, #%n2 + [(set_attr v8type alus) + (set_attr mode SI)] +) + (define_insn *addmode3nr_compare0 [(set (reg:CC_NZ CC_REGNUM) (compare:CC_NZ @@ -1340,6 +1375,19 @@ (set_attr mode MODE)] ) +;; zero_extend version of above +(define_insn *add_shift_si_uxtw + [(set (match_operand:DI 0 register_operand =rk) + (zero_extend:DI (plus:SI + (ASHIFT:SI (match_operand:SI 1 register_operand r) + (match_operand:QI 2 aarch64_shift_imm_si n)) + (match_operand:SI 3 register_operand r] + + add\\t%w0, %w3, %w1, shift %2 + [(set_attr v8type alu_shift) + (set_attr mode SI)] +) + (define_insn *add_mul_imm_mode [(set (match_operand:GPI 0 register_operand =rk) (plus:GPI (mult:GPI (match_operand:GPI 1 register_operand r) @@ -1361,6 +1409,17 @@ (set_attr mode GPI:MODE)] ) +;; zero_extend version of above +(define_insn *add_optabSHORT:mode_si_uxtw + [(set 
(match_operand:DI 0 register_operand =rk) + (zero_extend:DI (plus:SI (ANY_EXTEND:SI (match_operand:SHORT 1 register_operand r)) + (match_operand:GPI 2 register_operand r] + + add\\t%w0, %w2, %w, suxtSHORT:size + [(set_attr v8type alu_ext) + (set_attr mode SI)] +) + (define_insn *add_optabALLX:mode_shft_GPI:mode [(set (match_operand:GPI 0 register_operand =rk) (plus:GPI (ashift:GPI (ANY_EXTEND:GPI @@ -1373,6 +1432,19 @@ (set_attr mode GPI:MODE)] ) +;; zero_extend version of above +(define_insn *add_optabSHORT:mode_shft_si_uxtw + [(set (match_operand:DI 0 register_operand =rk) + (zero_extend:DI (plus:SI (ashift:SI (ANY_EXTEND:SI + (match_operand:SHORT 1 register_operand r)) + (match_operand 2 aarch64_imm3 Ui3
RE: [PATCH, AArch64 4.7] Backport of __builtin_bswap16 optimisation
It turned out that this patch depended on another one from earlier, so I have backported that to ARM/aarch64-4.7-branch too. http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00452.html Cheers, Ian -Original Message- From: Ian Bolton [mailto:ian.bol...@arm.com] Sent: 23 November 2012 18:09 To: gcc-patches@gcc.gnu.org Subject: [PATCH, AArch64 4.7] Backport of __builtin_bswap16 optimisation I had already committed my testcase for this for aarch64, but it depends on this patch that doesn't yet exist in 4.7, so I backported to our ARM/aarch64-4.7-branch. Cheers, Ian From: http://gcc.gnu.org/git/?p=gcc.git;a=commitdiff;h=f811051bf87b1de7804c19 c8192d0d099d157145 diff --git a/gcc/ChangeLog b/gcc/ChangeLog index be34843..ce08fce 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,8 @@ +2012-09-26 Christophe Lyon christophe.l...@linaro.org + + * tree-ssa-math-opts.c (bswap_stats): Add found_16bit field. + (execute_optimize_bswap): Add support for builtin_bswap16. + 2012-09-26 Richard Guenther rguent...@suse.de * tree.h (DECL_IS_BUILTIN): Compare LOCATION_LOCUS. diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 3aad841..7c96949 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,7 @@ +2012-09-26 Christophe Lyon christophe.l...@linaro.org + + * gcc.target/arm/builtin-bswap16-1.c: New testcase. 
+ 2012-09-25 Segher Boessenkool seg...@kernel.crashing.org PR target/51274 diff --git a/gcc/testsuite/gcc.target/arm/builtin-bswap16-1.c b/gcc/testsuite/gcc.target/arm/builtin-bswap16-1.c new file mode 100644 index 000..6920f00 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/builtin-bswap16-1.c @@ -0,0 +1,15 @@ +/* { dg-do compile } */ +/* { dg-options -O2 } */ +/* { dg-require-effective-target arm_arch_v6_ok } */ +/* { dg-add-options arm_arch_v6 } */ +/* { dg-final { scan-assembler-not orr\[ \t\] } } */ + +unsigned short swapu16_1 (unsigned short x) +{ + return (x 8) | (x 8); +} + +unsigned short swapu16_2 (unsigned short x) +{ + return (x 8) | (x 8); +} diff --git a/gcc/tree-ssa-math-opts.c b/gcc/tree-ssa-math-opts.c index 16ff397..d9f4e9e 100644 --- a/gcc/tree-ssa-math-opts.c +++ b/gcc/tree-ssa-math-opts.c @@ -154,6 +154,9 @@ static struct static struct { + /* Number of hand-written 16-bit bswaps found. */ + int found_16bit; + /* Number of hand-written 32-bit bswaps found. */ int found_32bit; @@ -1803,9 +1806,9 @@ static unsigned int execute_optimize_bswap (void) { basic_block bb; - bool bswap32_p, bswap64_p; + bool bswap16_p, bswap32_p, bswap64_p; bool changed = false; - tree bswap32_type = NULL_TREE, bswap64_type = NULL_TREE; + tree bswap16_type = NULL_TREE, bswap32_type = NULL_TREE, bswap64_type = NULL_TREE; if (BITS_PER_UNIT != 8) return 0; @@ -1813,17 +1816,25 @@ execute_optimize_bswap (void) if (sizeof (HOST_WIDEST_INT) 8) return 0; + bswap16_p = (builtin_decl_explicit_p (BUILT_IN_BSWAP16) + optab_handler (bswap_optab, HImode) != CODE_FOR_nothing); bswap32_p = (builtin_decl_explicit_p (BUILT_IN_BSWAP32) optab_handler (bswap_optab, SImode) != CODE_FOR_nothing); bswap64_p = (builtin_decl_explicit_p (BUILT_IN_BSWAP64) (optab_handler (bswap_optab, DImode) != CODE_FOR_nothing || (bswap32_p word_mode == SImode))); - if (!bswap32_p !bswap64_p) + if (!bswap16_p !bswap32_p !bswap64_p) return 0; /* Determine the argument type of the builtins. 
The code later on assumes that the return and argument type are the same. */ + if (bswap16_p) +{ + tree fndecl = builtin_decl_explicit (BUILT_IN_BSWAP16); + bswap16_type = TREE_VALUE (TYPE_ARG_TYPES (TREE_TYPE (fndecl))); +} + if (bswap32_p) { tree fndecl = builtin_decl_explicit (BUILT_IN_BSWAP32); @@ -1863,6 +1874,13 @@ execute_optimize_bswap (void) switch (type_size) { + case 16: + if (bswap16_p) + { + fndecl = builtin_decl_explicit (BUILT_IN_BSWAP16); + bswap_type = bswap16_type; + } + break; case 32: if (bswap32_p) { @@ -1890,7 +1908,9 @@ execute_optimize_bswap (void) continue; changed = true; - if (type_size == 32) + if (type_size == 16) + bswap_stats.found_16bit++; + else if (type_size == 32) bswap_stats.found_32bit++; else bswap_stats.found_64bit++; @@ -1935,6 +1955,8 @@ execute_optimize_bswap (void) } } + statistics_counter_event (cfun, 16-bit bswap implementations found, + bswap_stats.found_16bit); statistics_counter_event (cfun, 32-bit bswap implementations found
[PATCH AArch64] Fix faulty commit of testsuite/gcc.target/aarch64/csinc-2.c
A commit I did earlier in the week got truncated somehow, leading to a broken testcase for the AArch64 target. I've just committed this fix as obvious on trunk and the arm/aarch64-4.7-branch. Cheers, Ian Index: gcc/testsuite/gcc.target/aarch64/csinc-2.c === --- gcc/testsuite/gcc.target/aarch64/csinc-2.c (revision 193571) +++ gcc/testsuite/gcc.target/aarch64/csinc-2.c (revision 193572) @@ -12,3 +12,7 @@ typedef long long s64; s64 foo2 (s64 a, s64 b) +{ + return (a == b) ? 7 : 1; + /* { dg-final { scan-assembler csinc\tx\[0-9\].*xzr } } */ +}
[PATCH AArch64] Implement bswaphi2 with rev16
This patch implements the standard pattern bswaphi2 for AArch64. Regression tests all pass. OK for trunk and backport to arm/aarch64-4.7-branch? Cheers, Ian 2012-11-16 Ian Bolton ian.bol...@arm.com * gcc/config/aarch64/aarch64.md (bswaphi2): New pattern. * gcc/testsuite/gcc.target/aarch64/builtin-bswap-1.c: New test. * gcc/testsuite/gcc.target/aarch64/builtin-bswap-2.c: New test. - diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index e6086a9..22c7103 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -2340,6 +2340,15 @@ (set_attr mode MODE)] ) +(define_insn bswaphi2 + [(set (match_operand:HI 0 register_operand =r) +(bswap:HI (match_operand:HI 1 register_operand r)))] + + rev16\\t%w0, %w1 + [(set_attr v8type rev) + (set_attr mode HI)] +) + ;; --- ;; Floating-point intrinsics ;; --- diff --git a/gcc/testsuite/gcc.target/aarch64/builtin-bswap-1.c b/gcc/testsuite/gcc.target/aarch64/builtin-bswap-1.c new file mode 100644 index 000..a6706e6 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/builtin-bswap-1.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options -O2 } */ + +/* { dg-final { scan-assembler-times rev16\\t 2 } } */ + +/* rev16 */ +short +swaps16 (short x) +{ + return __builtin_bswap16 (x); +} + +/* rev16 */ +unsigned short +swapu16 (unsigned short x) +{ + return __builtin_bswap16 (x); +} diff --git a/gcc/testsuite/gcc.target/aarch64/builtin-bswap-2.c b/gcc/testsuite/gcc.target/aarch64/builtin-bswap-2.c new file mode 100644 index 000..6018b48 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/builtin-bswap-2.c @@ -0,0 +1,18 @@ +/* { dg-do compile } */ +/* { dg-options -O2 } */ + +/* { dg-final { scan-assembler-times rev16\\t 2 } } */ + +/* rev16 */ +unsigned short +swapu16_1 (unsigned short x) +{ + return (x << 8) | (x >> 8); +} + +/* rev16 */ +unsigned short +swapu16_2 (unsigned short x) +{ + return (x >> 8) | (x << 8); +}
[PATCH arm/aarch64-4.7] Fix up Changelogs
Some changes had been added to gcc/ChangeLog and gcc/testsuite/ChangeLog when they should have been recorded in the gcc/ChangeLog.aarch64 and gcc/testsuite/ChangeLog.aarch64 files instead. Committed as obvious. Cheers, Ian
[PATCH,AArch64] Optimise comparison where intermediate result not used
Hi all, When we perform an addition but only use the result for a comparison, we can save an instruction. Consider this function: int foo (int a, int b) { return ((a + b) == 0) ? 1 : 7; } Here is the original output: foo: add w0, w0, w1 cmp w0, wzr mov w1, 7 mov w0, 1 csel w0, w1, w0, ne ret Now we get this: foo: cmn w0, w1 mov w1, 7 mov w0, 1 csel w0, w1, w0, ne ret :) I added other testcases for this and also some for adds and subs, which were investigated as part of this work. OK for trunk? Cheers, Ian 2012-11-06 Ian Bolton ian.bol...@arm.com * gcc/config/aarch64/aarch64.md (*compare_negmode): New pattern. * gcc/testsuite/gcc.target/aarch64/cmn.c: New test. * gcc/testsuite/gcc.target/aarch64/adds.c: New test. * gcc/testsuite/gcc.target/aarch64/subs.c: New test. diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index e6086a9..6935192 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -1310,6 +1310,17 @@ (set_attr mode MODE)] ) +(define_insn *compare_negmode + [(set (reg:CC CC_REGNUM) + (compare:CC +(match_operand:GPI 0 register_operand r) +(neg:GPI (match_operand:GPI 1 register_operand r] + + cmn\\t%w0, %w1 + [(set_attr v8type alus) + (set_attr mode MODE)] +) + (define_insn *add_shift_mode [(set (match_operand:GPI 0 register_operand =rk) (plus:GPI (ASHIFT:GPI (match_operand:GPI 1 register_operand r) diff --git a/gcc/testsuite/gcc.target/aarch64/adds.c b/gcc/testsuite/gcc.target/aarch64/adds.c new file mode 100644 index 000..aa42321 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/adds.c @@ -0,0 +1,30 @@ +/* { dg-do compile } */ +/* { dg-options -O2 } */ + +int z; +int +foo (int x, int y) +{ + int l = x + y; + if (l == 0) +return 5; + + /* { dg-final { scan-assembler adds\tw\[0-9\] } } */ + z = l; + return 25; +} + +typedef long long s64; + +s64 zz; +s64 +foo2 (s64 x, s64 y) +{ + s64 l = x + y; + if (l < 0) +return 5; + + /* { dg-final { scan-assembler adds\tx\[0-9\] } } */ + zz = l; + return 25; +} diff 
--git a/gcc/testsuite/gcc.target/aarch64/cmn.c b/gcc/testsuite/gcc.target/aarch64/cmn.c new file mode 100644 index 000..1f06f57 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/cmn.c @@ -0,0 +1,24 @@ +/* { dg-do compile } */ +/* { dg-options -O2 } */ + +int +foo (int a, int b) +{ + if (a + b) +return 5; + else +return 2; + /* { dg-final { scan-assembler cmn\tw\[0-9\] } } */ +} + +typedef long long s64; + +s64 +foo2 (s64 a, s64 b) +{ + if (a + b) +return 5; + else +return 2; + /* { dg-final { scan-assembler cmn\tx\[0-9\] } } */ +} diff --git a/gcc/testsuite/gcc.target/aarch64/subs.c b/gcc/testsuite/gcc.target/aarch64/subs.c new file mode 100644 index 000..2bf1975 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/subs.c @@ -0,0 +1,30 @@ +/* { dg-do compile } */ +/* { dg-options -O2 } */ + +int z; +int +foo (int x, int y) +{ + int l = x - y; + if (l == 0) +return 5; + + /* { dg-final { scan-assembler subs\tw\[0-9\] } } */ + z = l; + return 25; +} + +typedef long long s64; + +s64 zz; +s64 +foo2 (s64 x, s64 y) +{ + s64 l = x - y; + if (l < 0) +return 5; + + /* { dg-final { scan-assembler subs\tx\[0-9\] } } */ + zz = l; + return 25; +}
[PATCH,AArch64] Use CSINC instead of CSEL to return 1
Where a CSEL can return the value 1 as one of the alternatives, it is usually more efficient to use a CSINC than a CSEL (and never less efficient), since the value of 1 can be derived from wzr, rather than needing to set it up in a register first. This patch enables this capability. It has been regression tested on trunk. OK for commit? Cheers, Ian 2012-11-06 Ian Bolton ian.bol...@arm.com * gcc/config/aarch64/aarch64.md (cmovmode_insn): Emit CSINC when one of the alternatives is constant 1. * gcc/config/aarch64/constraints.md: New constraint. * gcc/config/aarch64/predicates.md: Rename predicate aarch64_reg_zero_or_m1 to aarch64_reg_zero_or_m1_or_1. * gcc/testsuite/gcc.target/aarch64/csinc-2.c: New test. - diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index 6935192..038465e 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -1877,19 +1877,23 @@ ) (define_insn *cmovmode_insn - [(set (match_operand:ALLI 0 register_operand =r,r,r,r) + [(set (match_operand:ALLI 0 register_operand =r,r,r,r,r,r,r) (if_then_else:ALLI (match_operator 1 aarch64_comparison_operator [(match_operand 2 cc_register ) (const_int 0)]) -(match_operand:ALLI 3 aarch64_reg_zero_or_m1 rZ,rZ,UsM,UsM) -(match_operand:ALLI 4 aarch64_reg_zero_or_m1 rZ,UsM,rZ,UsM)))] - - ;; Final alternative should be unreachable, but included for completeness +(match_operand:ALLI 3 aarch64_reg_zero_or_m1_or_1 rZ,rZ,UsM,rZ,Ui1,UsM,Ui1) +(match_operand:ALLI 4 aarch64_reg_zero_or_m1_or_1 rZ,UsM,rZ,Ui1,rZ,UsM,Ui1)))] + !((operands[3] == const1_rtx && operands[4] == constm1_rtx) + || (operands[3] == constm1_rtx && operands[4] == const1_rtx)) + ;; Final two alternatives should be unreachable, but included for completeness @ csel\\t%w0, %w3, %w4, %m1 csinv\\t%w0, %w3, wzr, %m1 csinv\\t%w0, %w4, wzr, %M1 - mov\\t%w0, -1 + csinc\\t%w0, %w3, wzr, %m1 + csinc\\t%w0, %w4, wzr, %M1 + mov\\t%w0, -1 + mov\\t%w0, 1 [(set_attr v8type csel) (set_attr mode MODE)] ) diff --git 
a/gcc/config/aarch64/constraints.md b/gcc/config/aarch64/constraints.md index da50a47..780faaa 100644 --- a/gcc/config/aarch64/constraints.md +++ b/gcc/config/aarch64/constraints.md @@ -102,6 +102,11 @@ A constraint that matches the immediate constant -1. (match_test op == constm1_rtx)) +(define_constraint Ui1 + @internal + A constraint that matches the immediate constant +1. + (match_test op == const1_rtx)) + (define_constraint Ui3 @internal A constraint that matches the integers 0...4. diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md index 328e5cf..aae71c1 100644 --- a/gcc/config/aarch64/predicates.md +++ b/gcc/config/aarch64/predicates.md @@ -31,11 +31,12 @@ (ior (match_operand 0 register_operand) (match_test op == const0_rtx -(define_predicate aarch64_reg_zero_or_m1 +(define_predicate aarch64_reg_zero_or_m1_or_1 (and (match_code reg,subreg,const_int) (ior (match_operand 0 register_operand) (ior (match_test op == const0_rtx) -(match_test op == constm1_rtx) +(ior (match_test op == constm1_rtx) + (match_test op == const1_rtx)) (define_predicate aarch64_fp_compare_operand (ior (match_operand 0 register_operand) diff --git a/gcc/testsuite/gcc.target/aarch64/csinc-2.c b/gcc/testsuite/gcc.target/aarch64/csinc-2.c new file mode 100644 index 000..6ed9080 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/csinc-2.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options -O2 } */ + +int +foo (int a, int b) +{ + return (a < b) ? 1 : 7; + /* { dg-final { scan-assembler csinc\tw\[0-9\].*wzr } } */ +} + +typedef long long s64; + +s64 +foo2 (s64 a, s64 b) +{ + return (a == b) ? 7 : 1; + /* { dg-final { scan-assembler csinc\tx\[0-9\].*xzr } } */ +}
RE: [PATCH][AArch64] Restrict usage of SBFIZ to valid range only
Subject: [PATCH][AArch64] Restrict usage of SBFIZ to valid range only This fixes an issue where we were generating an SBFIZ with operand 3 outside of the valid range (as determined by the size of the destination register and the amount of shift). My patch checks that the range is valid before allowing the pattern to be used. This has now had full regression testing and all is OK. OK for aarch64-trunk and aarch64-4_7-branch? Cheers, Ian 2012-10-15 Ian Bolton ian.bol...@arm.com * gcc/config/aarch64/aarch64.md (optabALLX:mode_shft_GPI:mode): Restrict based on op2. - diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index e6086a9..3bfe6e6 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -2311,7 +2311,7 @@ (ashift:GPI (ANY_EXTEND:GPI (match_operand:ALLX 1 register_operand r)) (match_operand 2 const_int_operand n)))] - + ALLX:sizen <= (GPI:sizen - UINTVAL (operands[2])) sbfiz\\t%GPI:w0, %GPI:w1, %2, #ALLX:sizen [(set_attr v8type bfm) (set_attr mode GPI:MODE)] New and improved version is at the end of this email. This has had full regression testing and all is OK. OK for aarch64-trunk and aarch64-4_7-branch? Cheers, Ian 2012-10-16 Ian Bolton ian.bol...@arm.com * gcc/config/aarch64/aarch64.md (optabALLX:mode_shft_GPI:mode): Restrict operands. diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index e6086a9..e77496f 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -2311,8 +2311,13 @@ (ashift:GPI (ANY_EXTEND:GPI (match_operand:ALLX 1 register_operand r)) (match_operand 2 const_int_operand n)))] - - sbfiz\\t%GPI:w0, %GPI:w1, %2, #ALLX:sizen + UINTVAL (operands[2]) < GPI:sizen +{ + operands[3] = (ALLX:sizen <= (GPI:sizen - UINTVAL (operands[2]))) + ? GEN_INT (ALLX:sizen) + : GEN_INT (GPI:sizen - UINTVAL (operands[2])); + return sbfiz\t%GPI:w0, %GPI:w1, %2, %3; +} [(set_attr v8type bfm) (set_attr mode GPI:MODE)] )
RE: [PATCH, AArch64] Allow symbol+offset even if not being used for memory access
Ok. Having dug a bit deeper I think the main problem is that you're working against yourself by not handling this pattern right from the beginning. You have split the address incorrectly to begin and are now trying to recover after the fact. The following patch seems to do the trick for me, producing (insn 6 5 7 (set (reg:DI 81) (high:DI (const:DI (plus:DI (symbol_ref:DI (arr) [flags 0x80] var_decl 0x7f9bae1105f0 arr) (const_int 12 [0xc]) z.c:8 -1 (nil)) (insn 7 6 8 (set (reg:DI 80) (lo_sum:DI (reg:DI 81) (const:DI (plus:DI (symbol_ref:DI (arr) [flags 0x80] var_decl 0x7f9bae1105f0 arr) (const_int 12 [0xc]) z.c:8 -1 (expr_list:REG_EQUAL (const:DI (plus:DI (symbol_ref:DI (arr) [flags 0x80] var_decl 0x7f9bae1105f0 arr) (const_int 12 [0xc]))) (nil))) right from the .150.expand dump. I'll leave it to you to fully regression test and commit the patch as appropriate. ;-) Thanks so much for this, Richard. I have prepared a new patch heavily based off yours, which really demands its own new email trail, so I shall make a fresh post. Cheers, Ian
[PATCH, AArch64] Handle symbol + offset more effectively
Hi all, This patch corrects what seemed to be a typo in expand_mov_immediate in aarch64.c, where we had || instead of an && in our original code. if (offset != const0_rtx && (targetm.cannot_force_const_mem (mode, imm) || (can_create_pseudo_p ()))) // <-- should have been && At any given time, this code would have treated all input the same and will have caused all non-zero offsets to have been forced to temporaries, and made us never run the code in the remainder of the function. In terms of measurable impact, this patch provides a better fix to the problem I was trying to solve with this patch: http://gcc.gnu.org/ml/gcc-patches/2012-08/msg02072.html Almost all credit should go to Richard Henderson for this patch. It is all his, but for a minor change I made to some predicates which now become relevant when we execute more of the expand_mov_immediate function. My testing showed no regressions for bare-metal or linux. OK for aarch64-branch and aarch64-4.7-branch? Cheers, Ian 2012-09-25 Richard Henderson r...@redhat.com Ian Bolton ian.bol...@arm.com * config/aarch64/aarch64.c (aarch64_expand_mov_immediate): Fix a functional typo and refactor code in switch statement. * config/aarch64/aarch64.md (add_losym): Handle symbol + offset. * config/aarch64/predicates.md (aarch64_tls_ie_symref): Match const. (aarch64_tls_le_symref): Likewise. diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 2d7eba7..edeee30 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -652,43 +652,57 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm) unsigned HOST_WIDE_INT val; bool subtargets; rtx subtarget; - rtx base, offset; int one_match, zero_match; gcc_assert (mode == SImode || mode == DImode); - /* If we have (const (plus symbol offset)), and that expression cannot - be forced into memory, load the symbol first and add in the offset. 
*/ - split_const (imm, base, offset); - if (offset != const0_rtx - (targetm.cannot_force_const_mem (mode, imm) - || (can_create_pseudo_p ( -{ - base = aarch64_force_temporary (dest, base); - aarch64_emit_move (dest, aarch64_add_offset (mode, NULL, base, INTVAL (offset))); - return; -} - /* Check on what type of symbol it is. */ - if (GET_CODE (base) == SYMBOL_REF || GET_CODE (base) == LABEL_REF) + if (GET_CODE (imm) == SYMBOL_REF + || GET_CODE (imm) == LABEL_REF + || GET_CODE (imm) == CONST) { - rtx mem; - switch (aarch64_classify_symbol (base, SYMBOL_CONTEXT_ADR)) + rtx mem, base, offset; + enum aarch64_symbol_type sty; + + /* If we have (const (plus symbol offset)), separate out the offset +before we start classifying the symbol. */ + split_const (imm, base, offset); + + sty = aarch64_classify_symbol (base, SYMBOL_CONTEXT_ADR); + switch (sty) { case SYMBOL_FORCE_TO_MEM: - mem = force_const_mem (mode, imm); + if (offset != const0_rtx + targetm.cannot_force_const_mem (mode, imm)) + { + gcc_assert(can_create_pseudo_p ()); + base = aarch64_force_temporary (dest, base); + base = aarch64_add_offset (mode, NULL, base, INTVAL (offset)); + aarch64_emit_move (dest, base); + return; + } + mem = force_const_mem (mode, imm); gcc_assert (mem); emit_insn (gen_rtx_SET (VOIDmode, dest, mem)); return; -case SYMBOL_SMALL_TLSGD: -case SYMBOL_SMALL_TLSDESC: -case SYMBOL_SMALL_GOTTPREL: -case SYMBOL_SMALL_TPREL: +case SYMBOL_SMALL_TLSGD: +case SYMBOL_SMALL_TLSDESC: +case SYMBOL_SMALL_GOTTPREL: case SYMBOL_SMALL_GOT: + if (offset != const0_rtx) + { + gcc_assert(can_create_pseudo_p ()); + base = aarch64_force_temporary (dest, base); + base = aarch64_add_offset (mode, NULL, base, INTVAL (offset)); + aarch64_emit_move (dest, base); + return; + } + /* FALLTHRU */ + +case SYMBOL_SMALL_TPREL: case SYMBOL_SMALL_ABSOLUTE: - aarch64_load_symref_appropriately - (dest, imm, aarch64_classify_symbol (base, SYMBOL_CONTEXT_ADR)); + aarch64_load_symref_appropriately (dest, imm, sty); return; default: 
@@ -696,7 +710,7 @@ aarch64_expand_mov_immediate (rtx dest, rtx imm) } } - if ((CONST_INT_P (imm) aarch64_move_imm (INTVAL (imm), mode))) + if (CONST_INT_P (imm) aarch64_move_imm (INTVAL (imm), mode)) { emit_insn (gen_rtx_SET (VOIDmode, dest, imm)); return; diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md index b399ab4..3834558 100644 --- a/gcc/config/aarch64/aarch64.md +++ b/gcc/config/aarch64/aarch64.md @@ -2840,7 +2840,7 @@ (lo_sum:DI
[PATCH, AArch64] Implement ctz and clrsb standard patterns
I've implemented the following standard patterns:

* clrsb
* ctz

Regression runs passed and I have added compilation tests for them, and clz as well. (Execution tests are covered by gcc/testsuite/gcc.c-torture/execute/builtin-bitops-1.c.)

OK for aarch64-branch and aarch64-4.7-branch?

Cheers, Ian

2012-09-18  Ian Bolton  ian.bol...@arm.com

gcc/
	* config/aarch64/aarch64.h: Define CTZ_DEFINED_VALUE_AT_ZERO.
	* config/aarch64/aarch64.md (clrsb<mode>2): New pattern.
	* config/aarch64/aarch64.md (rbit<mode>2): New pattern.
	* config/aarch64/aarch64.md (ctz<mode>2): New pattern.

gcc/testsuite/
	* gcc.target/aarch64/clrsb.c: New test.
	* gcc.target/aarch64/clz.c: New test.
	* gcc.target/aarch64/ctz.c: New test.

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 5d121fa..abf96c5 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -703,6 +703,8 @@ do { \
 #define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
   ((VALUE) = ((MODE) == SImode ? 32 : 64), 2)
+#define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
+  ((VALUE) = ((MODE) == SImode ? 32 : 64), 2)

 #define INCOMING_RETURN_ADDR_RTX gen_rtx_REG (Pmode, LR_REGNUM)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 33815ff..5278957 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -153,6 +153,8 @@ (UNSPEC_CMTST 83) ; Used in aarch64-simd.md.
 (UNSPEC_FMAX	83)	; Used in aarch64-simd.md.
 (UNSPEC_FMIN	84)	; Used in aarch64-simd.md.
+(UNSPEC_CLS	85)	; Used in aarch64-simd.md.
+(UNSPEC_RBIT	86)	; Used in aarch64-simd.md.
  ]
)
@@ -2128,6 +2130,33 @@
   [(set_attr "v8type" "clz")
    (set_attr "mode" "<MODE>")])

+(define_insn "clrsb<mode>2"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(unspec:GPI [(match_operand:GPI 1 "register_operand" "r")] UNSPEC_CLS))]
+  ""
+  "cls\\t%<w>0, %<w>1"
+  [(set_attr "v8type" "clz")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "rbit<mode>2"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(unspec:GPI [(match_operand:GPI 1 "register_operand" "r")] UNSPEC_RBIT))]
+  ""
+  "rbit\\t%<w>0, %<w>1"
+  [(set_attr "v8type" "rbit")
+   (set_attr "mode" "<MODE>")])
+
+(define_expand "ctz<mode>2"
+  [(match_operand:GPI 0 "register_operand")
+   (match_operand:GPI 1 "register_operand")]
+  ""
+  {
+    emit_insn (gen_rbit<mode>2 (operands[0], operands[1]));
+    emit_insn (gen_clz<mode>2 (operands[0], operands[0]));
+    DONE;
+  }
+)
+
 (define_insn "*and<mode>3nr_compare0"
   [(set (reg:CC CC_REGNUM)
 	(compare:CC

diff --git a/gcc/testsuite/gcc.target/aarch64/clrsb.c b/gcc/testsuite/gcc.target/aarch64/clrsb.c
new file mode 100644
index 000..a75dfa0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/clrsb.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned int functest(unsigned int x)
+{
+  return __builtin_clrsb(x);
+}
+
+/* { dg-final { scan-assembler "cls\tw" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/clz.c b/gcc/testsuite/gcc.target/aarch64/clz.c
new file mode 100644
index 000..66e2d29
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/clz.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned int functest(unsigned int x)
+{
+  return __builtin_clz(x);
+}
+
+/* { dg-final { scan-assembler "clz\tw" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/ctz.c b/gcc/testsuite/gcc.target/aarch64/ctz.c
new file mode 100644
index 000..15a2473
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ctz.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned int functest(unsigned int x)
+{
+  return __builtin_ctz(x);
+}
+
+/* { dg-final { scan-assembler "rbit\tw" } } */
+/* { dg-final { scan-assembler "clz\tw" } } */
RE: [PATCH, AArch64] Implement ctz and clrsb standard patterns
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 33815ff..5278957 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -153,6 +153,8 @@ (UNSPEC_CMTST 83) ; Used in aarch64-simd.md.
 (UNSPEC_FMAX	83)	; Used in aarch64-simd.md.
 (UNSPEC_FMIN	84)	; Used in aarch64-simd.md.
+(UNSPEC_CLS	85)	; Used in aarch64-simd.md.
+(UNSPEC_RBIT	86)	; Used in aarch64-simd.md.

The comment doesn't appear to be true.

Fair point! I will fix that.
RE: [PATCH, AArch64] Implement ctz and clrsb standard patterns
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 33815ff..5278957 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -153,6 +153,8 @@ (UNSPEC_CMTST 83) ; Used in aarch64-simd.md.
 (UNSPEC_FMAX	83)	; Used in aarch64-simd.md.
 (UNSPEC_FMIN	84)	; Used in aarch64-simd.md.
+(UNSPEC_CLS	85)	; Used in aarch64-simd.md.
+(UNSPEC_RBIT	86)	; Used in aarch64-simd.md.

The comment doesn't appear to be true.

Fair point! I will fix that.

New patch with comment fixed is attached. Now good to commit to aarch64-branch and aarch64-4.7-branch?

Cheers, Ian

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 5d121fa..abf96c5 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -703,6 +703,8 @@ do { \
 #define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
   ((VALUE) = ((MODE) == SImode ? 32 : 64), 2)
+#define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
+  ((VALUE) = ((MODE) == SImode ? 32 : 64), 2)

 #define INCOMING_RETURN_ADDR_RTX gen_rtx_REG (Pmode, LR_REGNUM)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 33815ff..5278957 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -153,6 +153,8 @@ (UNSPEC_CMTST 83) ; Used in aarch64-simd.md.
 (UNSPEC_FMAX	83)	; Used in aarch64-simd.md.
 (UNSPEC_FMIN	84)	; Used in aarch64-simd.md.
+(UNSPEC_CLS	85)	; Used in aarch64.md.
+(UNSPEC_RBIT	86)	; Used in aarch64.md.
  ]
)
@@ -2128,6 +2130,33 @@
   [(set_attr "v8type" "clz")
    (set_attr "mode" "<MODE>")])

+(define_insn "clrsb<mode>2"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(unspec:GPI [(match_operand:GPI 1 "register_operand" "r")] UNSPEC_CLS))]
+  ""
+  "cls\\t%<w>0, %<w>1"
+  [(set_attr "v8type" "clz")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "rbit<mode>2"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(unspec:GPI [(match_operand:GPI 1 "register_operand" "r")] UNSPEC_RBIT))]
+  ""
+  "rbit\\t%<w>0, %<w>1"
+  [(set_attr "v8type" "rbit")
+   (set_attr "mode" "<MODE>")])
+
+(define_expand "ctz<mode>2"
+  [(match_operand:GPI 0 "register_operand")
+   (match_operand:GPI 1 "register_operand")]
+  ""
+  {
+    emit_insn (gen_rbit<mode>2 (operands[0], operands[1]));
+    emit_insn (gen_clz<mode>2 (operands[0], operands[0]));
+    DONE;
+  }
+)
+
 (define_insn "*and<mode>3nr_compare0"
   [(set (reg:CC CC_REGNUM)
 	(compare:CC

diff --git a/gcc/testsuite/gcc.target/aarch64/clrsb.c b/gcc/testsuite/gcc.target/aarch64/clrsb.c
new file mode 100644
index 000..a75dfa0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/clrsb.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned int functest(unsigned int x)
+{
+  return __builtin_clrsb(x);
+}
+
+/* { dg-final { scan-assembler "cls\tw" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/clz.c b/gcc/testsuite/gcc.target/aarch64/clz.c
new file mode 100644
index 000..66e2d29
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/clz.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned int functest(unsigned int x)
+{
+  return __builtin_clz(x);
+}
+
+/* { dg-final { scan-assembler "clz\tw" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/ctz.c b/gcc/testsuite/gcc.target/aarch64/ctz.c
new file mode 100644
index 000..15a2473
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ctz.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned int functest(unsigned int x)
+{
+  return __builtin_ctz(x);
+}
+
+/* { dg-final { scan-assembler "rbit\tw" } } */
+/* { dg-final { scan-assembler "clz\tw" } } */
RE: [PATCH, AArch64] Implement ctz and clrsb standard patterns
New version attached with better formatted test cases.

OK for aarch64-branch and aarch64-4.7-branch?

Cheers, Ian

2012-09-18  Ian Bolton  ian.bol...@arm.com

gcc/
	* config/aarch64/aarch64.h: Define CTZ_DEFINED_VALUE_AT_ZERO.
	* config/aarch64/aarch64.md (clrsb<mode>2): New pattern.
	* config/aarch64/aarch64.md (rbit<mode>2): New pattern.
	* config/aarch64/aarch64.md (ctz<mode>2): New pattern.

gcc/testsuite/
	* gcc.target/aarch64/clrsb.c: New test.
	* gcc.target/aarch64/clz.c: New test.
	* gcc.target/aarch64/ctz.c: New test.

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 5d121fa..abf96c5 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -703,6 +703,8 @@ do { \
 #define CLZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
   ((VALUE) = ((MODE) == SImode ? 32 : 64), 2)
+#define CTZ_DEFINED_VALUE_AT_ZERO(MODE, VALUE) \
+  ((VALUE) = ((MODE) == SImode ? 32 : 64), 2)

 #define INCOMING_RETURN_ADDR_RTX gen_rtx_REG (Pmode, LR_REGNUM)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 33815ff..5278957 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -153,6 +153,8 @@ (UNSPEC_CMTST 83) ; Used in aarch64-simd.md.
 (UNSPEC_FMAX	83)	; Used in aarch64-simd.md.
 (UNSPEC_FMIN	84)	; Used in aarch64-simd.md.
+(UNSPEC_CLS	85)	; Used in aarch64.md.
+(UNSPEC_RBIT	86)	; Used in aarch64.md.
  ]
)
@@ -2128,6 +2130,33 @@
   [(set_attr "v8type" "clz")
    (set_attr "mode" "<MODE>")])

+(define_insn "clrsb<mode>2"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(unspec:GPI [(match_operand:GPI 1 "register_operand" "r")] UNSPEC_CLS))]
+  ""
+  "cls\\t%<w>0, %<w>1"
+  [(set_attr "v8type" "clz")
+   (set_attr "mode" "<MODE>")])
+
+(define_insn "rbit<mode>2"
+  [(set (match_operand:GPI 0 "register_operand" "=r")
+	(unspec:GPI [(match_operand:GPI 1 "register_operand" "r")] UNSPEC_RBIT))]
+  ""
+  "rbit\\t%<w>0, %<w>1"
+  [(set_attr "v8type" "rbit")
+   (set_attr "mode" "<MODE>")])
+
+(define_expand "ctz<mode>2"
+  [(match_operand:GPI 0 "register_operand")
+   (match_operand:GPI 1 "register_operand")]
+  ""
+  {
+    emit_insn (gen_rbit<mode>2 (operands[0], operands[1]));
+    emit_insn (gen_clz<mode>2 (operands[0], operands[0]));
+    DONE;
+  }
+)
+
 (define_insn "*and<mode>3nr_compare0"
   [(set (reg:CC CC_REGNUM)
 	(compare:CC

diff --git a/gcc/testsuite/gcc.target/aarch64/clrsb.c b/gcc/testsuite/gcc.target/aarch64/clrsb.c
new file mode 100644
index 000..a75dfa0
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/clrsb.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned int functest (unsigned int x)
+{
+  return __builtin_clrsb (x);
+}
+
+/* { dg-final { scan-assembler "cls\tw" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/clz.c b/gcc/testsuite/gcc.target/aarch64/clz.c
new file mode 100644
index 000..66e2d29
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/clz.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned int functest (unsigned int x)
+{
+  return __builtin_clz (x);
+}
+
+/* { dg-final { scan-assembler "clz\tw" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/ctz.c b/gcc/testsuite/gcc.target/aarch64/ctz.c
new file mode 100644
index 000..15a2473
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ctz.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned int functest (unsigned int x)
+{
+  return __builtin_ctz (x);
+}
+
+/* { dg-final { scan-assembler "rbit\tw" } } */
+/* { dg-final { scan-assembler "clz\tw" } } */
RE: [PATCH, AArch64] Implement fnma, fms and fnms standard patterns
OK for 4.7 as well? -Original Message- From: Richard Earnshaw Sent: 14 September 2012 18:18 To: Ian Bolton Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH, AArch64] Implement fnma, fms and fnms standard patterns On 14/09/12 18:05, Ian Bolton wrote: The following standard pattern names were implemented by simply renaming some existing patterns: * fnma * fms * fnms I have added an extra pattern for when we don't care about signed zero, so we can do -fma (a,b,c) more efficiently. Regression testing all passed. OK to commit? Cheers, Ian 2012-09-14 Ian Bolton ian.bol...@arm.com gcc/ * config/aarch64/aarch64.md (fmsubmode4): Renamed to fnmamode4. * config/aarch64/aarch64.md (fnmsubmode4): Renamed to fmsmode4. * config/aarch64/aarch64.md (fnmaddmode4): Renamed to fnmsmode4. * config/aarch64/aarch64.md (*fnmaddmode4): New pattern. testsuite/ * gcc.target/aarch64/fmadd.c: Added extra tests. * gcc.target/aarch64/fnmadd-fastmath.c: New test. OK. R.
RE: [PATCH, AArch64] Implement ffs standard pattern
OK for aarch64-4.7-branch as well? -Original Message- From: Richard Earnshaw Sent: 14 September 2012 18:31 To: Ian Bolton Cc: gcc-patches@gcc.gnu.org Subject: Re: [PATCH, AArch64] Implement ffs standard pattern On 14/09/12 16:26, Ian Bolton wrote: I've implemented the standard pattern ffs, which leads to __builtin_ffs being generated with 4 instructions instead of 5 instructions. Regression tests and my new test pass. OK to commit? Cheers, Ian 2012-09-14 Ian Bolton ian.bol...@arm.com gcc/ * config/aarch64/aarch64.md (csinc3mode): Make it into a named pattern. * config/aarch64/aarch64.md (ffsmode2): New pattern. testsuite/ * gcc.target/aarch64/ffs.c: New test. OK. R.
[PATCH, AArch64] Implement ffs standard pattern
I've implemented the standard pattern ffs, which leads to __builtin_ffs being generated with 4 instructions instead of 5.

Regression tests and my new test pass. OK to commit?

Cheers, Ian

2012-09-14  Ian Bolton  ian.bol...@arm.com

gcc/
	* config/aarch64/aarch64.md (csinc3<mode>): Make it into a
	named pattern.
	* config/aarch64/aarch64.md (ffs<mode>2): New pattern.

testsuite/
	* gcc.target/aarch64/ffs.c: New test.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 5278957..dfdba42 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2021,7 +2021,7 @@
   [(set_attr "v8type" "csel")
    (set_attr "mode" "<MODE>")])

-(define_insn "*csinc3<mode>_insn"
+(define_insn "csinc3<mode>_insn"
   [(set (match_operand:GPI 0 "register_operand" "=r")
 	(if_then_else:GPI
 	  (match_operator:GPI 1 "aarch64_comparison_operator"
@@ -2157,6 +2157,21 @@
   }
 )

+(define_expand "ffs<mode>2"
+  [(match_operand:GPI 0 "register_operand")
+   (match_operand:GPI 1 "register_operand")]
+  ""
+  {
+    rtx ccreg = aarch64_gen_compare_reg (EQ, operands[1], const0_rtx);
+    rtx x = gen_rtx_NE (VOIDmode, ccreg, const0_rtx);
+
+    emit_insn (gen_rbit<mode>2 (operands[0], operands[1]));
+    emit_insn (gen_clz<mode>2 (operands[0], operands[0]));
+    emit_insn (gen_csinc3<mode>_insn (operands[0], x, ccreg, operands[0], const0_rtx));
+    DONE;
+  }
+)
+
 (define_insn "*and<mode>3nr_compare0"
   [(set (reg:CC CC_REGNUM)
 	(compare:CC

diff --git a/gcc/testsuite/gcc.target/aarch64/ffs.c b/gcc/testsuite/gcc.target/aarch64/ffs.c
new file mode 100644
index 000..a344761
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/ffs.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+unsigned int functest(unsigned int x)
+{
+  return __builtin_ffs(x);
+}
+
+/* { dg-final { scan-assembler "cmp\tw" } } */
+/* { dg-final { scan-assembler "rbit\tw" } } */
+/* { dg-final { scan-assembler "clz\tw" } } */
+/* { dg-final { scan-assembler "csinc\tw" } } */
[PATCH, AArch64] Implement fnma, fms and fnms standard patterns
The following standard pattern names were implemented by simply renaming some existing patterns:

* fnma
* fms
* fnms

I have added an extra pattern for when we don't care about signed zero, so we can do -fma (a, b, c) more efficiently.

Regression testing all passed. OK to commit?

Cheers, Ian

2012-09-14  Ian Bolton  ian.bol...@arm.com

gcc/
	* config/aarch64/aarch64.md (fmsub<mode>4): Renamed to fnma<mode>4.
	* config/aarch64/aarch64.md (fnmsub<mode>4): Renamed to fms<mode>4.
	* config/aarch64/aarch64.md (fnmadd<mode>4): Renamed to fnms<mode>4.
	* config/aarch64/aarch64.md (*fnmadd<mode>4): New pattern.

testsuite/
	* gcc.target/aarch64/fmadd.c: Added extra tests.
	* gcc.target/aarch64/fnmadd-fastmath.c: New test.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 3fbebf7..33815ff 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2506,7 +2506,7 @@
    (set_attr "mode" "<MODE>")]
 )

-(define_insn "*fmsub<mode>4"
+(define_insn "fnma<mode>4"
   [(set (match_operand:GPF 0 "register_operand" "=w")
 	(fma:GPF (neg:GPF (match_operand:GPF 1 "register_operand" "w"))
 		 (match_operand:GPF 2 "register_operand" "w")
@@ -2517,7 +2517,7 @@
    (set_attr "mode" "<MODE>")]
 )

-(define_insn "*fnmsub<mode>4"
+(define_insn "fms<mode>4"
   [(set (match_operand:GPF 0 "register_operand" "=w")
 	(fma:GPF (match_operand:GPF 1 "register_operand" "w")
 		 (match_operand:GPF 2 "register_operand" "w")
@@ -2528,7 +2528,7 @@
    (set_attr "mode" "<MODE>")]
 )

-(define_insn "*fnmadd<mode>4"
+(define_insn "fnms<mode>4"
   [(set (match_operand:GPF 0 "register_operand" "=w")
 	(fma:GPF (neg:GPF (match_operand:GPF 1 "register_operand" "w"))
 		 (match_operand:GPF 2 "register_operand" "w")
@@ -2539,6 +2539,18 @@
    (set_attr "mode" "<MODE>")]
 )

+;; If signed zeros are ignored, -(a * b + c) = -a * b - c.
+(define_insn "*fnmadd<mode>4"
+  [(set (match_operand:GPF 0 "register_operand" "=w")
+	(neg:GPF (fma:GPF (match_operand:GPF 1 "register_operand" "w")
+			  (match_operand:GPF 2 "register_operand" "w")
+			  (match_operand:GPF 3 "register_operand" "w"))))]
+  "!HONOR_SIGNED_ZEROS (<MODE>mode) && TARGET_FLOAT"
+  "fnmadd\\t%<s>0, %<s>1, %<s>2, %<s>3"
+  [(set_attr "v8type" "fmadd")
+   (set_attr "mode" "<MODE>")]
+)
+
 ;; -------------------------------------------------------------------
 ;; Floating-point conversions
 ;; -------------------------------------------------------------------

diff --git a/gcc/testsuite/gcc.target/aarch64/fmadd.c b/gcc/testsuite/gcc.target/aarch64/fmadd.c
index 3b4..39975db 100644
--- a/gcc/testsuite/gcc.target/aarch64/fmadd.c
+++ b/gcc/testsuite/gcc.target/aarch64/fmadd.c
@@ -4,15 +4,52 @@
 extern double fma (double, double, double);
 extern float fmaf (float, float, float);

-double test1 (double x, double y, double z)
+double test_fma1 (double x, double y, double z)
 {
   return fma (x, y, z);
 }

-float test2 (float x, float y, float z)
+float test_fma2 (float x, float y, float z)
 {
   return fmaf (x, y, z);
 }

+double test_fnma1 (double x, double y, double z)
+{
+  return fma (-x, y, z);
+}
+
+float test_fnma2 (float x, float y, float z)
+{
+  return fmaf (-x, y, z);
+}
+
+double test_fms1 (double x, double y, double z)
+{
+  return fma (x, y, -z);
+}
+
+float test_fms2 (float x, float y, float z)
+{
+  return fmaf (x, y, -z);
+}
+
+double test_fnms1 (double x, double y, double z)
+{
+  return fma (-x, y, -z);
+}
+
+float test_fnms2 (float x, float y, float z)
+{
+  return fmaf (-x, y, -z);
+}
+
 /* { dg-final { scan-assembler-times "fmadd\td\[0-9\]" 1 } } */
 /* { dg-final { scan-assembler-times "fmadd\ts\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "fmsub\td\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "fmsub\ts\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "fnmsub\td\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "fnmsub\ts\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "fnmadd\td\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "fnmadd\ts\[0-9\]" 1 } } */

diff --git a/gcc/testsuite/gcc.target/aarch64/fnmadd-fastmath.c b/gcc/testsuite/gcc.target/aarch64/fnmadd-fastmath.c
new file mode 100644
index 000..9c115df
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/fnmadd-fastmath.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -ffast-math" } */
+
+extern double fma (double, double, double);
+extern float fmaf (float, float, float);
+
+double test_fma1 (double x, double y, double z)
+{
+  return - fma (x, y, z);
+}
+
+float test_fma2 (float x, float y, float z)
+{
+  return - fmaf (x, y, z);
+}
+
+/* { dg-final { scan-assembler-times "fnmadd\td\[0-9\]" 1 } } */
+/* { dg-final { scan-assembler-times "fnmadd\ts\[0-9\]" 1 } } */
RE: [PATCH, AArch64] Allow symbol+offset even if not being used for memory access
Can you send me the test case you were looking at for this?

See attached. (Most of it is superfluous, but the point is that we are not using the address to do a memory access.)

Cheers, Ian

[Attachment: constant-test1.c]
RE: [PATCH, AArch64] Allow symbol+offset even if not being used for memory access
On 2012-08-31 07:49, Ian Bolton wrote:
 +(define_split
 +  [(set (match_operand:DI 0 "register_operand" "=r")
 +	(const:DI (plus:DI (match_operand:DI 1 "aarch64_valid_symref" "S")
 +			   (match_operand:DI 2 "const_int_operand" "i"))))]
 +  ""
 +  [(set (match_dup 0) (high:DI (const:DI (plus:DI (match_dup 1)
 +						   (match_dup 2)))))
 +   (set (match_dup 0) (lo_sum:DI (match_dup 0)
 +				 (const:DI (plus:DI (match_dup 1)
 +						    (match_dup 2)))))]
 +  ""
 +)

You ought not need this as a separate split, since (CONST ...) should be handled exactly like (SYMBOL_REF).

I see in combine.c that it does get done for a MEM (which is how my earlier patch worked), but not when it's being used for other reasons (hence the title of this email). See below for current code from find_split_point:

    case MEM:
#ifdef HAVE_lo_sum
      /* If we have (mem (const ..)) or (mem (symbol_ref ...)), split it
	 using LO_SUM and HIGH.  */
      if (GET_CODE (XEXP (x, 0)) == CONST
	  || GET_CODE (XEXP (x, 0)) == SYMBOL_REF)
	{
	  enum machine_mode address_mode
	    = targetm.addr_space.address_mode (MEM_ADDR_SPACE (x));

	  SUBST (XEXP (x, 0),
		 gen_rtx_LO_SUM (address_mode,
				 gen_rtx_HIGH (address_mode, XEXP (x, 0)),
				 XEXP (x, 0)));
	  return &XEXP (XEXP (x, 0), 0);
	}
#endif

If I don't use my split pattern, I could alter combine to remove the requirement that the parent is a MEM. What do you think?

Also note that constraints ("=r" etc.) aren't used for splits. If I keep the pattern, I will remove the constraints. Thanks for the pointers in this regard.

Cheers, Ian
RE: [PATCH, AArch64] Allow symbol+offset even if not being used for memory access
From: Richard Henderson [mailto:r...@redhat.com]
On 09/06/2012 08:06 AM, Ian Bolton wrote:
 If I don't use my split pattern, I could alter combine to remove the requirement that the parent is a MEM. What do you think?

I merely question the calling out of CONST as special. Either you've got some pattern that handles SYMBOL_REF the same way, or you're missing something.

Oh, I understand now. Thanks for clarifying.

Some digging has shown me that the transformation keys off the equivalence, as highlighted below. It's always phrased in terms of a const and never a symbol_ref.

after ud_dce:

  6 r82:DI=high(`arr')
  7 r81:DI=r82:DI+low(`arr')
      REG_DEAD: r82:DI
      REG_EQUAL: `arr'
  8 r80:DI=r81:DI+0xc
      REG_DEAD: r81:DI
      REG_EQUAL: const(`arr'+0xc)   <- this equivalence

after combine:

  7 r80:DI=high(const(`arr'+0xc))
  8 r80:DI=r80:DI+low(const(`arr'+0xc))
      REG_EQUAL: const(`arr'+0xc)   <- this equivalence

Based on that, and assuming I remove the constraints on the pattern, would you say the patch is worthy of commit?

Thanks, Ian
[PATCH, AArch64] Allow symbol+offset even if not being used for memory access
Hi,

This patch builds on a previous one that allowed symbol+offset as symbol references for memory accesses. It allows us to have symbol+offset even when no memory access is apparent. It reduces codesize for cases such as this one:

int arr[100];

uint64_t foo (uint64_t a)
{
  uint64_t const z = 1234567ll << 32 + 7;
  uint64_t const y = (uint64_t) &arr[3];
  return y + a + z;
}

Before the patch, the code looked like this:

	adrp	x2, arr
	mov	x1, 74217034874880
	add	x2, x2, :lo12:arr
	add	x2, x2, 12
	movk	x1, 2411, lsl 48
	add	x1, x2, x1
	add	x0, x1, x0
	ret

Now, it looks like this:

	adrp	x1, arr+12
	mov	x2, 74217034874880
	movk	x2, 2411, lsl 48
	add	x1, x1, :lo12:arr+12
	add	x1, x1, x2
	add	x0, x1, x0
	ret

Testing shows no regressions. OK to commit?

2012-08-31  Ian Bolton  ian.bol...@arm.com

	* gcc/config/aarch64/aarch64.md: New pattern.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index a00d3f0..de9c927 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -2795,7 +2795,7 @@
 	 (lo_sum:DI (match_operand:DI 1 "register_operand" "r")
 		    (match_operand 2 "aarch64_valid_symref" "S")))]
   ""
-  "add\\t%0, %1, :lo12:%2"
+  "add\\t%0, %1, :lo12:%a2"
   [(set_attr "v8type" "alu")
    (set_attr "mode" "DI")]
 )
@@ -2890,6 +2890,20 @@
   [(set_attr "length" "0")]
 )

+(define_split
+  [(set (match_operand:DI 0 "register_operand" "=r")
+	(const:DI (plus:DI (match_operand:DI 1 "aarch64_valid_symref" "S")
+			   (match_operand:DI 2 "const_int_operand" "i"))))]
+  ""
+  [(set (match_dup 0) (high:DI (const:DI (plus:DI (match_dup 1)
+						  (match_dup 2)))))
+   (set (match_dup 0) (lo_sum:DI (match_dup 0)
+				 (const:DI (plus:DI (match_dup 1)
+						    (match_dup 2)))))]
+  ""
+)
+
+
 ;; AdvSIMD Stuff
 (include "aarch64-simd.md")
[PATCH, AArch64] Allow symbol+offset as symbolic constant expression
Hi,

This patch reduces codesize for cases such as this one:

int arr[100];

int foo ()
{
  return arr[10];
}

Before the patch, the code looked like this:

	adrp	x0, arr
	add	x0, x0, :lo12:arr
	ldr	w0, [x0,40]

Now, it looks like this:

	adrp	x0, arr+40
	ldr	w0, [x0,#:lo12:arr+40]

Some workloads have seen up to 1K reduction in code size. OK to commit?

Cheers, Ian

2012-07-06  Ian Bolton  ian.bol...@arm.com

	* gcc/config/aarch64/aarch64.c (aarch64_print_operand): Use
	aarch64_classify_symbolic_expression for classifying operands.
	* gcc/config/aarch64/aarch64.c
	(aarch64_classify_symbolic_expression): New function.
	* gcc/config/aarch64/aarch64.c (aarch64_symbolic_constant_p):
	New function.
	* gcc/config/aarch64/predicates.md (aarch64_valid_symref): Symbol
	with constant offset is a valid symbol reference.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 542c1e0..53c238a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -2820,6 +2820,17 @@ aarch64_symbolic_address_p (rtx x)
   return GET_CODE (x) == SYMBOL_REF || GET_CODE (x) == LABEL_REF;
 }

+/* Classify the base of symbolic expression X, given that X appears in
+   context CONTEXT.  */
+static enum aarch64_symbol_type
+aarch64_classify_symbolic_expression (rtx x,
+				      enum aarch64_symbol_context context)
+{
+  rtx offset;
+  split_const (x, &x, &offset);
+  return aarch64_classify_symbol (x, context);
+}
+
+
 /* Return TRUE if X is a legitimate address for accessing memory in
    mode MODE.  */
 static bool
@@ -3227,7 +3238,7 @@ aarch64_print_operand (FILE *f, rtx x, char code)
       if (GET_CODE (x) == HIGH)
 	x = XEXP (x, 0);

-      switch (aarch64_classify_symbol (x, SYMBOL_CONTEXT_ADR))
+      switch (aarch64_classify_symbolic_expression (x, SYMBOL_CONTEXT_ADR))
 	{
 	case SYMBOL_SMALL_GOT:
 	  asm_fprintf (asm_out_file, ":got:");
@@ -3256,7 +3267,7 @@ aarch64_print_operand (FILE *f, rtx x, char code)
       break;

     case 'L':
-      switch (aarch64_classify_symbol (x, SYMBOL_CONTEXT_ADR))
+      switch (aarch64_classify_symbolic_expression (x, SYMBOL_CONTEXT_ADR))
 	{
 	case SYMBOL_SMALL_GOT:
 	  asm_fprintf (asm_out_file, ":lo12:");
@@ -3285,7 +3296,8 @@ aarch64_print_operand (FILE *f, rtx x, char code)
       break;

     case 'G':
+
-      switch (aarch64_classify_symbol (x, SYMBOL_CONTEXT_ADR))
+      switch (aarch64_classify_symbolic_expression (x, SYMBOL_CONTEXT_ADR))
 	{
 	case SYMBOL_SMALL_TPREL:
 	  asm_fprintf (asm_out_file, ":tprel_hi12:");
@@ -4746,6 +4758,8 @@ aarch64_classify_tls_symbol (rtx x)
     }
 }

+/* Return the method that should be used to access SYMBOL_REF or
+   LABEL_REF X in context CONTEXT.  */
 enum aarch64_symbol_type
 aarch64_classify_symbol (rtx x,
 			 enum aarch64_symbol_context context ATTRIBUTE_UNUSED)
@@ -4817,7 +4831,23 @@ aarch64_classify_symbol (rtx x,
   return SYMBOL_FORCE_TO_MEM;
 }

+/* Return true if X is a symbolic constant that can be used in context
+   CONTEXT.  If it is, store the type of the symbol in *SYMBOL_TYPE.  */
+
+bool
+aarch64_symbolic_constant_p (rtx x, enum aarch64_symbol_context context,
+			     enum aarch64_symbol_type *symbol_type)
+{
+  rtx offset;
+
+  split_const (x, &x, &offset);
+  if (GET_CODE (x) == SYMBOL_REF || GET_CODE (x) == LABEL_REF)
+    *symbol_type = aarch64_classify_symbol (x, context);
+  else
+    return false;
+
+  /* No checking of offset at this point.  */
+  return true;
+}

 bool
 aarch64_constant_address_p (rtx x)

diff --git a/gcc/config/aarch64/predicates.md b/gcc/config/aarch64/predicates.md
index 7089e8b..328e5cf 100644
--- a/gcc/config/aarch64/predicates.md
+++ b/gcc/config/aarch64/predicates.md
@@ -114,8 +114,12 @@
   (match_test "mode == DImode && CONSTANT_ADDRESS_P (op)"))

 (define_predicate "aarch64_valid_symref"
-  (and (match_code "symbol_ref, label_ref")
-       (match_test "aarch64_classify_symbol (op, SYMBOL_CONTEXT_ADR)
-		    != SYMBOL_FORCE_TO_MEM")))
+  (match_code "const, symbol_ref, label_ref")
+{
+  enum aarch64_symbol_type symbol_type;
+  return (aarch64_symbolic_constant_p (op, SYMBOL_CONTEXT_ADR, &symbol_type)
+	  && symbol_type != SYMBOL_FORCE_TO_MEM);
+})

 (define_predicate "aarch64_tls_ie_symref"
   (match_code "symbol_ref, label_ref")
Using -save-temps and @file should also save the intermediate @file used by the driver?
Does anyone have some thoughts they'd like to share on this: When you compile anything using @file support, the driver assumes @file (at_file_supplied is true) is allowed and may pass options to the linker via @file using a *temporary* file. When -save-temps is also used, the temporary @file passed to the linker should also be saved. Saving the temporary @file passed to the linker allows a developer to re-run just the collect2/ld command. On trunk this means that gcc/gcc.c (create_at_file) should honour the save_temps_flag, saving the temporary @file for later analysis or use. From: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44273
RE: Move cgraph_node_set and varpool_node_set out of GGC and make them use pointer_map
Hi,

I always considered cgraph_node_set/varpool_node_set to be overengineered, but they also turned out to be quite ineffective, since we do quite a lot of queries into them during streaming out. This patch moves them to pointer_map, like I did for the streamer cache. While doing so I needed to get the structure out of GGC memory, since pointer_map is not GGC friendly. That is not a problem at all, because the sets can only point to live cgraph/varpool entries anyway; pointing to removed ones would lead to spectacular failures in any case.

Bootstrapped/regtested x86_64-linux, OK?

Honza

	* cgraph.h (cgraph_node_set_def, varpool_node_set_def): Move out of
	GTY; replace hash by pointer map.
	(cgraph_node_set_element_def, cgraph_node_set_element,
	const_cgraph_node_set_element, varpool_node_set_element_def,
	varpool_node_set_element, const_varpool_node_set_element): Remove.
	(free_cgraph_node_set, free_varpool_node_set): New function.
	(cgraph_node_set_size, varpool_node_set_size): Use vector size.
	* tree-emutls.c: Free varpool node set.
	* ipa-utils.c (cgraph_node_set_new, cgraph_node_set_add,
	cgraph_node_set_remove, cgraph_node_set_find, dump_cgraph_node_set,
	debug_cgraph_node_set, free_cgraph_node_set, varpool_node_set_new,
	varpool_node_set_add, varpool_node_set_remove, varpool_node_set_find,
	dump_varpool_node_set, free_varpool_node_set,
	debug_varpool_node_set): Move here from ipa.c; implement using
	pointer_map.
	* ipa.c (cgraph_node_set_new, cgraph_node_set_add,
	cgraph_node_set_remove, cgraph_node_set_find, dump_cgraph_node_set,
	debug_cgraph_node_set, varpool_node_set_new, varpool_node_set_add,
	varpool_node_set_remove, varpool_node_set_find,
	dump_varpool_node_set, debug_varpool_node_set): Move to ipa-utils.c.
	* lto/lto.c (ltrans_partition_def): Remove GTY annotations.
	(ltrans_partitions): Move to heap.
	(new_partition): Update.
	(free_ltrans_partitions): New function.
	(lto_wpa_write_files): Use it.
	* passes.c (ipa_write_summaries): Update.
This causes cross and native build of ARM Linux toolchain to fail: gcc -c -g -O2 -DIN_GCC -DCROSS_DIRECTORY_STRUCTURE -W -Wall -Wwrite- strings -Wcast-qual -Wstrict-prototypes -Wmissing-prototypes -Wmissing- format-attribute -Wold-style-definition -Wc++-compat -fno-common - DHAVE_CONFIG_H -I. -Ilto - I/work/source/gcc - I/work/source/gcc/lto - I/work/source/gcc/../include - I/work/source/gcc/../libcpp/include - I/work/source/gcc/../libdecnumber - I/work/source/gcc/../libdecnumber/dpd -I../libdecnumber /work/source/gcc/lto/lto.c -o lto/lto.o /work/source/gcc/lto/lto.c:1163: warning: function declaration isn't a prototype /work/source/gcc/lto/lto.c: In function 'free_ltrans_partitions': /work/source/gcc/lto/lto.c:1163: warning: old-style function definition /work/source/gcc/lto/lto.c:1168: error: 'struct ltrans_partition_def' has no member named 'cgraph' /work/source/gcc/lto/lto.c:1168: error: 'set' undeclared (first use in this function) /work/source/gcc/lto/lto.c:1168: error: (Each undeclared identifier is reported only once /work/source/gcc/lto/lto.c:1168: error: for each function it appears in.) /work/source/gcc/lto/lto.c:1171: warning: implicit declaration of function 'VEC_latrans_partition_heap_free' make[2]: *** [lto/lto.o] Error 1 make[2]: *** Waiting for unfinished jobs rm gcov.pod gfdl.pod cpp.pod fsf-funding.pod gcc.pod make[2]: Leaving directory `/work/cross-build/trunk-r173334- thumb/arm-none-linux-gnueabi/obj/gcc1/gcc' make[1]: *** [all-gcc] Error 2 make[1]: Leaving directory `/work/cross-build/trunk-r173334- thumb/arm-none-linux-gnueabi/obj/gcc1' make: *** [all] Error 2 + exit But I see you fixed it up soon after (r173336), so no action is required now, but I figured it was worth letting people know anyway. Cheers, Ian
RE: Link error
Phung Nguyen wrote:
 I am trying to build a cross compiler for xc16x. I built binutils 2.18, gcc 4.0 and newlib 1.18 successfully. Everything is fine when compiling a simple C file without any library call. It is also fine when making a simple call to printf like printf("Hello world"). However, I got an error message from the linker when calling printf("i=%i", i);

I don't know the answer, but I think you are more likely to get one if you post to gcc-h...@gcc.gnu.org. The gcc@gcc.gnu.org list is for people developing gcc, rather than those only building or using it.

I hope you find your answer soon.

Best regards, Ian
Question about tree-switch-conversion.c
I am in the process of fixing PR44328 (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44328).

The problem is that gen_inbound_check in tree-switch-conversion.c subtracts info.range_min from info.index_expr, which can cause the MIN and MAX values for info.index_expr to become invalid. For example:

typedef enum { FIRST = 0, SECOND, THIRD, FOURTH } ExampleEnum;

int dummy (const ExampleEnum e)
{
  int mode = 0;
  switch (e)
  {
    case SECOND: mode = 20; break;
    case THIRD: mode = 30; break;
    case FOURTH: mode = 40; break;
  }
  return mode;
}

tree-switch-conversion would like to create a lookup table for this, so that SECOND maps to entry 0, THIRD maps to entry 1 and FOURTH maps to entry 2. It achieves this by subtracting SECOND from index_expr. The problem is that, after the subtraction, the result can have a value outside the type's 0-3 range.

Later, when tree-vrp.c sees the inbound check as being <= 2, with a possible range for the type of 0-3, it converts the <= 2 into a != 3, which is totally wrong. If e == FIRST, we can end up looking for entry 255 in the lookup table!

I think the solution is to update the type of the result of the subtraction to show that it is no longer in the range 0-3, but I have had trouble implementing this. The attached patch (based off the 4.5 branch) shows my current approach, but I ran into LTO issues:

lto1: internal compiler error: in lto_get_pickled_tree, at lto-streamer-in.c

I am guessing this is because the debug info for the type does not match the new range I have set for it. Is there a *right* way to update the range such that LTO doesn't get unhappy? (Maybe a cast with fold_convert_loc would be right?)

[Attachment: pr44328.gcc4.5.fix.patch]
Making BB Reorder Smarter for -Os
Bugzilla 41004 calls for a more -Os-friendly algorithm for BB Reorder, and I'm hoping I can meet this challenge. But I need your help! If you have any existing ideas or thoughts that could help me get closer to a sensible heuristic sooner, then please post them to this list. In the meantime, here's my progress so far.

My first thought was to limit BB Reorder to hot functions, as identified by gcov, so we could get maximum execution-time benefit for minimised code size impact. Based on benchmarking I've done so far, this looks to be a winner, but it only works when profile data is available. Without profile data, we face two main problems:

1) We cannot easily estimate how many times we will execute our more efficient code layout, so we can't do an accurate trade-off versus the code size increase.

2) The traces that BB Reorder constructs will be based on static branch prediction, rather than true dynamic flows of execution, so the new layout may not be the best one in practice.

We can address #1 by tagging functions as hot (using attributes), but that may not always be possible and it does not guarantee that we will get minimal code size increases, which is the main aim of this work. I'm not sure how #2 can be addressed, so I'm planning to sidestep it completely, since the problem isn't really the performance pay-off but the code size increase that usually comes with each new layout of a function that BB Reorder makes.

My current plan is to characterise a function within find_traces() (looking at things like the number of traces, edge probabilities and frequencies, etc.) and only call connect_traces() to effect the reordering change if these characteristics suggest that minimal code disruption and/or maximum performance pay-off will result. Thanks for reading and I look forward to your input! Cheers, Ian
RE: BB reorder forced off for -Os
We're not able to enable BB reordering with -Os. The behaviour is hard-coded via this if statement in rest_of_handle_reorder_blocks():

  if ((flag_reorder_blocks || flag_reorder_blocks_and_partition)
      /* Don't reorder blocks when optimizing for size because extra
         jump insns may be created; also barrier may create extra padding.

         More correctly we should have a block reordering mode that tried
         to minimize the combined size of all the jumps.  This would more
         or less automatically remove extra jumps, but would also try to
         use more short jumps instead of long jumps.  */
      && optimize_function_for_speed_p (cfun))
    {
      reorder_basic_blocks ();

If you comment out the optimize_function_for_speed_p (cfun) condition then BB reordering takes place as desired (although this obviously isn't a solution). In a private message Ian indicated that this had a small code size impact for the ISA he's working with but a significant performance gain. I tried the same thing with the ISA I work on (Ubicom32) and this change typically increased code sizes by between 0.1% and 0.3% but improved performance by anything from 0.8% to 3%, so on balance this is definitely a win for most of our users (this was for a couple of benchmarks, the Linux kernel, busybox and smbd).

It should be noted that commenting out the conditional to do with optimising for speed will make BB reordering come on for all functions, even cold ones, so I think whatever gains have come from making this hacky change could increase further if BB reordering is set to only come on for hot functions when compiling with -Os. (Certainly the code size increases could be minimised, whilst hopefully retaining the performance gains.) Note that I am in no way suggesting this should be the default behaviour for -Os, but that it should be switchable via the flags just like other optimisations are.
But, once it is switchable, I expect choosing to turn it on for -Os should not cause universal enabling of BB reordering for every function (as opposed to the current universal disabling of BB reordering for every function), but a sensible half-way point, based on heat, so that you get the performance wins with minimal code size increases on selected functions. Cheers, Ian
BB reorder forced off for -Os
Is there any reason why BB reorder has been disabled in bb-reorder.c for -Os, such that you can't even turn it on with -freorder-blocks? From what I've heard on this list in recent days, BB reorder gives good performance wins such that most people would still want it on even if it did increase code size a little. Cheers, Ian
RE: Understanding Scheduling
Enabling BB-reorder only if profile info is available is not the right way to go. The compiler really doesn't place blocks in sane places without it -- and it shouldn't have to, either. For example, if you split an edge at some point, the last thing you want to worry about is where the new basic block is going to end up. There are actually a few bugs in Bugzilla about BB-reorder, FWIW.

I've done a few searches in Bugzilla and am not sure if I have found the BB reorder bugs you are referring to. The ones I have found are:

  16797: Opportunity to remove unnecessary load instructions
  41396: missed space optimization related to basic block reorder
  21002: RTL prologue and basic-block reordering pessimizes delay-slot filling

(If you can recall any others, I'd appreciate hearing of them.) Based on 41396, it looks like BB reorder is disabled for -Os. But you said in your post above that the compiler really doesn't place blocks in sane places without it, so does that mean that we could probably increase performance for -Os if BB reorder were improved and enabled for -Os? Cheers, Ian
Understanding Scheduling
Hi folks! I've moved on from register allocation (see the Understanding IRA thread) and onto scheduling. In particular, I am investigating the effectiveness of the sched1 pass on our architecture and the associated interblock-scheduling optimisation.

Let's start with sched1 ... For our architecture at least, it seems like Richard Earnshaw is right that sched1 is generally bad when you are using -Os, because it can increase register pressure and cause extra spill/fill code when you move independent instructions in between dependent instructions. For example:

  LOAD c2,c1[0]
  LOAD c3,c1[1]
  ADD  c2,c2,c3  # depends on LOAD above it (might stall)
  LOAD c3,c1[2]
  ADD  c2,c2,c3  # depends on LOAD above it (might stall)
  LOAD c3,c1[3]
  ADD  c2,c2,c3  # depends on LOAD above it (might stall)
  LOAD c3,c1[4]
  ADD  c2,c2,c3  # depends on LOAD above it (might stall)

might become:

  LOAD c2,c1[0]
  LOAD c3,c1[1]
  LOAD c4,c1[2]  # independent of first two LOADs
  LOAD c5,c1[3]  # independent of first two LOADs
  ADD  c2,c2,c3  # not dependent on preceding two insns (avoids stall)
  LOAD c3,c1[4]
  ADD  c2,c2,c4  # not dependent on preceding three insns (avoids stall)
  ...

This is a nice effect if your LOAD instructions have a latency of 3, so this should lead to performance increases, and indeed this is what I see for some low-reg-pressure Nullstone cases. Turning sched1 off therefore causes a regression on those cases. However, this pipeline-type effect may increase your register pressure such that caller-save regs are required and extra spill/fill code needs to be generated. This happens for some other Nullstone cases, and so it is good to have sched1 turned off for them! It's therefore looking like some kind of clever hybrid is required. I mention all this because I was wondering which other architectures have turned off sched1 for -Os?
More importantly, I was wondering if anyone else had considered creating some kind of clever hybrid that only uses sched1 when it will increase performance without increasing register pressure? Or perhaps I could make a heuristic based on the balanced-ness of the tree? (I see sched1 does a lot better if the tree is balanced, since it has more options to play with.)

Now onto interblock-scheduling ... As we all know, you can't have interblock-scheduling enabled unless you use the sched1 pass, so if sched1 is off then interblock is irrelevant. For now, let's assume we are going to make some clever hybrid that allows sched1 when we think it will increase performance for -Os, and we are going to keep sched1 on for -O2 and -O3.

As I understand it, interblock-scheduling enlarges the scope of sched1, such that you can insert independent insns from a completely different block in between dependent insns in this block. As well as potentially amortizing stalls on high-latency insns, we also get the chance to do meatier work in the destination block and leave less to do in the source block. I don't know if this is a deliberate effect of interblock-scheduling or if it is just a happy side-effect.

Anyway, the reason I mention interblock-scheduling is that I see it doing seemingly intelligent moves, but then the later BB-reorder pass is juggling blocks around such that we end up with extra code inside hot loops! I assume this is because the scheduler and BB-reorderer are largely ignorant of each other, and so good intentions on the part of the former can be scuppered by the latter. I was wondering if anyone else has witnessed this madness on their architecture? Maybe it is a bug with BB-reorder? Or maybe it should only be enabled when function profiling information (e.g. gcov) is available? Or maybe it is not a high-priority thing for anyone to think about because no one uses interblock-scheduling? If anyone can shed some light on the above, I'd greatly appreciate it.
For now, I will continue my experiments with selective enabling of sched1 for -Os. Best regards, Ian
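For concreteness, the LOAD/ADD chain discussed earlier in this thread corresponds to source along these lines (a sketch for illustration; sum5 and the pointer argument are invented names, not code from the thread):

```c
/* A reduction over five loads: a naive expansion produces the serial
   LOAD/ADD chain shown earlier, while sched1 can hoist the independent
   loads above the dependent adds to hide load latency -- at the cost
   of keeping more loaded values live at once (c4, c5 in the example),
   i.e. higher register pressure.  */
int sum5 (const int *c1)
{
  return c1[0] + c1[1] + c1[2] + c1[3] + c1[4];
}
```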
RE: Understanding Scheduling
Let's start with sched1 ... For our architecture at least, it seems like Richard Earnshaw is right that sched1 is generally bad when you are using -Os, because it can increase register pressure and cause extra spill/fill code when you move independent instructions in between dependent instructions. Please note that Vladimir Makarov implemented register pressure-aware sched1 for GCC 4.5, activated with -fsched-pressure. I thought I should mention this because your e-mail omits it completely, so it's hard to tell whether you tested it. Hi Alexander, We are on GCC 4.4 at the moment. I don't see us moving up in the near future, but I could apply the patch and see what it does for us. Cheers, Ian
RE: Understanding Scheduling
I mention all this because I was wondering which other architectures have turned off sched1 for -Os? More importantly, I was wondering if anyone else had considered creating some kind of clever hybrid that only uses sched1 when it will increase performance without increasing register pressure?

http://gcc.gnu.org/ml/gcc-patches/2009-09/msg3.html

Another problem is that sched1 for architectures with few registers can result in reload failure. I tried to fix this in the patch mentioned above, but I am not sure it is done for all targets and all possible programs. The right solution for this would be implementing hard register spills in the reload.

I don't think we have so few registers that reload failure will occur, so it might be worth me trying this.

The code mentioned above does not work for RA based on priority coloring, because register pressure calculation for intersected or nested classes makes little sense.

Hmm. Thanks for mentioning that. As you might recall, we are using priority coloring at the moment because it yielded better performance than Chaitin-Briggs. Well, the real reason CB was rejected was that we were already using Priority, and so moving to CB would cause sufficient disruption that performance increases and decreases would be inevitable. I did get some gains, but I also got regressions that we couldn't absorb at the time I did the work. I might be revisiting CB soon though, as it did tend to yield smaller code, which is becoming more important to us.

If scheduling for the target is very important (as for Itanium or in-order execution POWER6), I'd recommend looking at the selective scheduler.

I don't think scheduling is highly important for us, but I will take a look at the selective scheduler.

Or perhaps I could make a heuristic based on the balanced-ness of the tree? (I see sched1 does a lot better if the tree is balanced, since it has more options to play with.)

The register pressure is already mostly minimized when sched1 starts to work.
I guess this is a factor in the unbalancedness of the tree. The more you balance it, the more likely it will get wider and require more registers. But the wider and more balanced, the more options for sched1 and the more chance of a performance win (assuming the increase in reg pressure does not outweigh the scheduling performance win.) Now onto interblock-scheduling ... As we all know, you can't have interblock-scheduling enabled unless you use the sched1 pass, so if sched1 is off then interblock is irrelevant. For now, let's assume we are going to make some clever hybrid that allows sched1 when we think it will increase performance for Os and we are going to keep sched1 on for O2 and O3. As I understand it, interblock-scheduling enlarges the scope of sched1, such that you can insert independent insns from a completely different block in between dependent insns in this block. As well as potentially amortizing stalls on high latency insns, we also get the chance to do meatier work in the destination block and leave less to do in the source block. I don't know if this is a deliberate effect of interblock-scheduling or if it is just a happy side-effect. Anyway, the reason I mention interblock-scheduling is that I see it doing seemingly intelligent moves, but then the later BB-reorder pass is juggling blocks around such that we end up with extra code inside hot loops! I assume this is because the scheduler and BB-reorderer are largely ignorant of each other, and so good intentions on the part of the former can be scuppered by the latter. That is right. It would be nice if somebody solves the problem. Hmm. If we keep sched1 on, then maybe I will be the man to do it! Best regards, Ian
RE: Coloring problem - Pass 0 for finding allocno costs
The problem I see is that for registers 100,101 I get best register class D instead of R - actually they get the same cost and D is chosen (maybe because it is first). Hi Frank. Do D and R overlap? It would be useful to know which regs are in which class, before trying to understand what is going on. Can you paste an example of your define_insn from your MD file to show how operands from D or R are both valid? I ask this because it is possible to express that D is more expensive than R with operand constraints. For general IRA info, you might like to look over my long thread on here called Understanding IRA. Cheers, Ian
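To illustrate the constraint-based approach mentioned above, here is a made-up define_insn for a hypothetical target (not Frank's actual port): GCC's operand-constraint syntax lets you mark an alternative as disfavoured with the ? modifier, so the allocator prefers class R over class D even though both alternatives are valid:

```
(define_insn "addsi3"
  [(set (match_operand:SI 0 "register_operand" "=R,?D")
        (plus:SI (match_operand:SI 1 "register_operand" "R,D")
                 (match_operand:SI 2 "register_operand" "R,D")))]
  ""
  "add %0,%1,%2")
```

Each ? adds a small cost to its alternative during register-class preferencing, which is one way to make D effectively more expensive than R without touching the target's cost hooks.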