Re: [PATCH][AArch64][2/2] (Re)Implement vcopy_lane intrinsics
On 7 June 2016 at 17:56, Kyrill Tkachovwrote: > Ok for trunk? > > Thanks, > Kyrill > > 2016-06-07 Kyrylo Tkachov > James Greenhalgh > > * config/aarch64/arm_neon.h (vcopyq_lane_f32, vcopyq_lane_f64, > vcopyq_lane_p8, vcopyq_lane_p16, vcopyq_lane_s8, vcopyq_lane_s16, > vcopyq_lane_s32, vcopyq_lane_s64, vcopyq_lane_u8, vcopyq_lane_u16, > vcopyq_lane_u32, vcopyq_lane_u64): Reimplement in C. > (vcopy_lane_f32, vcopy_lane_f64, vcopy_lane_p8, vcopy_lane_p16, > vcopy_lane_s8, vcopy_lane_s16, vcopy_lane_s32, vcopy_lane_s64, > vcopy_lane_u8, vcopy_lane_u16, vcopy_lane_u32, vcopy_lane_u64, > vcopy_laneq_f32, vcopy_laneq_f64, vcopy_laneq_p8, vcopy_laneq_p16, > vcopy_laneq_s8, vcopy_laneq_s16, vcopy_laneq_s32, vcopy_laneq_s64, > vcopy_laneq_u8, vcopy_laneq_u16, vcopy_laneq_u32, vcopy_laneq_u64, > vcopyq_laneq_f32, vcopyq_laneq_f64, vcopyq_laneq_p8, vcopyq_laneq_p16, > vcopyq_laneq_s8, vcopyq_laneq_s16, vcopyq_laneq_s32, vcopyq_laneq_s64, > vcopyq_laneq_u8, vcopyq_laneq_u16, vcopyq_laneq_u32, vcopyq_laneq_u64): > New intrinsics. > > 2016-06-07 Kyrylo Tkachov > James Greenhalgh > > * gcc.target/aarch64/vect_copy_lane_1.c: New test. OK, thanks /Marcus
Re: [Patch AArch64] Simplify reduc_plus_scal_v2[sd]f sequence
On 17 May 2016 at 12:02, James Greenhalgh <james.greenha...@arm.com> wrote: > On Tue, May 17, 2016 at 11:32:36AM +0100, Marcus Shawcroft wrote: >> On 17 May 2016 at 10:06, James Greenhalgh <james.greenha...@arm.com> wrote: >> > >> > Hi, >> > >> > This is just a simplification, it probably makes life easier for register >> > allocation in some corner cases and seems the right thing to do. We don't >> > use the internal version elsewhere, so we're safe to delete it and change >> > the types. >> > >> > OK? >> > >> > Bootstrapped on AArch64 with no issues. >> >> Help me understand why this is ok for BE ? > > The reduc_plus_scal_ pattern wants to take a vector and return a scalar > value representing the sum of the lanes of that vector. We want to go > from V2DFmode to DFmode. > > The architectural instruction FADDP writes to a scalar value in the low > bits of the register, leaving zeroes in the upper bits. > > i.e. > > faddp d0, v1.2d > > 128 640 > |0x0| v1.d[0] + v1.d[1] | > > In the current implementation, we use the > aarch64_reduc_plus_internal pattern, which treats the result of > FADDP as a vector of two elements. We then need an extra step to extract > the correct scalar value from that vector. From GCC's point of view the lane > containing the result is either lane 0 (little-endian) or lane 1 > (big-endian), which is why the current code is endian dependent. The extract > operation will always be a NOP move from architectural bits 0-63 to > architectural bits 0-63 - but we never elide the move as future passes can't > be certain that the upper bits are zero (they come out of an UNSPEC so > could be anything). > > However, this is all unneccesary. FADDP does exactly what we want, > regardless of endianness, we just need to model the instruction as writing > the scalar value in the first place. Which is what this patch wires up. > > We probably just missed this optimization in the migration from the > reduc_splus optabs (which required a vector return value) to the > reduc_plus_scal optabs (which require a scalar return value). > > Does that help? Yep. Thanks. OK to commit. /Marcus > Thanks, > James >
Re: [Patch AArch64] Simplify reduc_plus_scal_v2[sd]f sequence
On 17 May 2016 at 10:06, James Greenhalghwrote: > > Hi, > > This is just a simplification, it probably makes life easier for register > allocation in some corner cases and seems the right thing to do. We don't > use the internal version elsewhere, so we're safe to delete it and change > the types. > > OK? > > Bootstrapped on AArch64 with no issues. Help me understand why this is ok for BE ? Cheers /Marcus
Re: [Patch AArch64] Delete ASM_OUTPUT_DEF and fallback to default .set directive
On 17 May 2016 at 10:13, James Greenhalghwrote: > > Hi, > > As in the ARM port [1] , the AArch64 port wants to put out "b = a" to set > an alias. This doesn't cause us any trouble yet, as the AArch64 port doesn't > warn for this construct - but at the same time there is no reason for us > not to put out a .set directive - this seems to have been copied from the > ARM port when section anchor support was added in 2012. Looking through > the chain, we'll get a default definition for ASM_OUTPUT_DEF if SET_ASM_OP > is defined, and we get SET_ASM_OP defined through config/elfos.h for > all the AArch64 targets I can see in config.gcc. So we're safe to drop > this. > > Bootstrapped on aarch64-none-linux-gnu. > > OK? OK /Marcus
Re: [AArch64] Remove AARCH64_EXTRA_TUNE_RECIP_SQRT from Cortex-A57 tuning
On 11 January 2016 at 12:04, James Greenhalghwrote: > 2015-12-11 James Greenhalgh > > * config/aarch64/aarch64.c (cortexa57_tunings): Remove > AARCH64_EXTRA_TUNE_RECIP_SQRT. > OK /Marcus
Re: [Patch AArch64] Restrict 16-bit sqrdml{sa}h instructions to FP_LO_REGS
On 26 January 2016 at 16:04, James Greenhalghwrote: > 2016-01-25 James Greenhalgh > > * config/aarch64/aarch64.md > (arch64_sqrdmlh_lane): Fix register > constraints for operand 3. > (aarch64_sqrdmlh_laneq): Likewise. > OK /Marcus
Re: [Patch AArch64] GCC 6 regression in vector performance. - Fix vector initialization to happen with lane load instructions.
On 20 January 2016 at 15:22, James Greenhalghwrote: > gcc/ > > 2016-01-20 James Greenhalgh > Ramana Radhakrishnan > > * config/aarch64/aarch64.c (aarch64_expand_vector_init): Refactor, > always use lane loads to construct non-constant vectors. > > gcc/testsuite/ > > 2016-01-20 James Greenhalgh > Ramana Radhakrishnan > > * gcc.target/aarch64/vector_initialization_nostack.c: New. > OK /Marcus
Re: [Patch AArch64] Use software sqrt expansion always for -mlow-precision-recip-sqrt
On 11 January 2016 at 11:53, James Greenhalghwrote: > > --- > 2015-12-10 James Greenhalgh > > * config/aarch64/aarch64.c (use_rsqrt_p): Always use software > reciprocal sqrt for -mlow-precision-recip-sqrt. > OK /Marcus
Re: Backport: [Patch AArch64] Reinstate CANNOT_CHANGE_MODE_CLASS to fix pr67609
On 18 December 2015 at 12:13, James Greenhalghwrote: > Looking back at the patch just before I hit commit, the 4.9 backport was > a little different (as we still have a CANNOT_CHANGE_MODE_CLASS there). > We can drop the aarch64-protos.h and aarch64.h changes, and we need to > change the sense of the new check, such that we can return true for the > case added by this patch, and false for the limited number of other safe > cases in 4.9. > > Bootstrapped on aarch64-none-linux-gnu. > > OK? > > Thanks, > James > > --- > gcc/ > > 2015-12-14 James Greenhalgh > > Backport from mainline. > 2015-12-09 James Greenhalgh > > PR rtl-optimization/67609 > * config/aarch64/aarch64.c > (aarch64_cannot_change_mode_class): Don't permit word_mode > subregs of full vector modes. > * config/aarch64/aarch64.md (aarch64_movdi_low): Use > zero_extract rather than truncate. > (aarch64_movdi_high): Likewise. > > gcc/testsuite/ > > 2015-12-14 James Greenhalgh > > Backport from mainline. > 2015-12-09 James Greenhalgh > > PR rtl-optimization/67609 > * gcc.dg/torture/pr67609.c: New. > OK /Marcus
Re: [AARCH64][ACLE] Implement __ARM_FP_FENV_ROUNDING in aarch64 backend.
On 11 January 2016 at 08:12, Bilyan Borisovwrote: > 2015-XX-XX Bilyan Borisov > > * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): New > macro > definition. > > gcc/testsuite/ > > 2015-XX-XX Bilyan Borisov > > * gcc.target/aarch64/fesetround-checking-baremetal.c: New. > * gcc.target/aarch64/fesetround-checking-linux.c: Likewise. for new test cases in gcc.target/aarch64 use the name convention here https://gcc.gnu.org/wiki/TestCaseWriting, in this case: fesetround_checking_baremetal_1.c fesetround_checking_linux_1.c If we are distinguishing between glibc/musl/bionic/uclibc then the use of the name "linux" in the last test case would seem to be inappropriate (all of glibc/musl/bionic and uclibc can be used on top of a linux kernel). Suggest s/linux/glibc/ in the test case name. Cheers /Marcus
Re: [AARCH64][ACLE] Implement __ARM_FP_FENV_ROUNDING in aarch64 backend.
On 11 January 2016 at 10:46, Alan Lawrencewrote: > However, the test doesn't really look at whether we're using glibc vs > musl/bionic/uclibc, only at whether we are targeting -linux-gnu or > -none-elf. Fair point, the test case is not aligned with the implementation. Rather than hang the test on a target triple the other approach would be to try and write a test that uses the functionality that is gated by the predefine. Cheers /Marcus
Re: [AArch64] Simplify TLS pattern by hardcoding relocation modifiers into pattern
On 10 September 2015 at 12:28, Jiong Wangwrote: > > TLS instruction sequences are always with fixed format, there is no need > to use operand modifier, we can hardcode the relocation modifiers into > instruction pattern, all those redundant checks in aarch64_print_operand > can be removed. > > OK for trunk? > > 2015-09-10 Jiong Wang > > gcc/ > * config/aarch64/aarch64.md (ldr_got_tiny): Hardcode relocation > modifers. > (tlsgd_small): Likewise. > (tlsgd_tiny): Likewise. > (tlsie_small_): Likewise. > (tlsie_small_sidi): Likewise. > (tlsie_tiny_): Likewise. > (tlsie_tiny_sidi): Likewise. > (tlsle12_): Likewise. > (tlsle24_): Likewise. > (tlsdesc_small_): Likewise. > (tlsdesc_small_pseudo_): Likewise. > (tlsdesc_tiny_): Likewise. > (tlsdesc_tiny_pseudo_): Likewise. > * config/aarch64/aarch64.c (aarch64_print_operand): Delete useless > check on 'A', 'L', 'G'. Hi, Is there a particular reason for wanting to change this? Cheers /Marcus
Re: [Patch AArch64] Reinstate CANNOT_CHANGE_MODE_CLASS to fix pr67609
On 14 December 2015 at 11:01, James Greenhalgh <james.greenha...@arm.com> wrote: > On Wed, Dec 09, 2015 at 01:13:20PM +0000, Marcus Shawcroft wrote: >> On 27 November 2015 at 13:01, James Greenhalgh <james.greenha...@arm.com> >> wrote: >> >> > 2015-11-27 James Greenhalgh <james.greenha...@arm.com> >> > >> > * config/aarch64/aarch64-protos.h >> > (aarch64_cannot_change_mode_class): Bring back. >> > * config/aarch64/aarch64.c >> > (aarch64_cannot_change_mode_class): Likewise. >> > * config/aarch64/aarch64.h (CANNOT_CHANGE_MODE_CLASS): Likewise. >> > * config/aarch64/aarch64.md (aarch64_movdi_low): Use >> > zero_extract rather than truncate. >> > (aarch64_movdi_high): Likewise. >> > >> > 2015-11-27 James Greenhalgh <james.greenha...@arm.com> >> > >> > * gcc.dg/torture/pr67609.c: New. >> > >> >> + detailed dicussion. In all other cases, we want to be premissive >> >> s/premissive/permissive/ >> >> OK /Marcus > > Thanks. > > This has had a week or so to soak on trunk now, is it OK to backport to GCC > 5 and 4.9? > > The patch applies as-good-as clean, with only a little bit to fix up in > aarch64-protos.h to keep alphabetical order, and I've bootstrapped and tested > the backports with no issue. OK /Marcus
Re: [Patch AArch64] Reinstate CANNOT_CHANGE_MODE_CLASS to fix pr67609
On 27 November 2015 at 13:01, James Greenhalghwrote: > 2015-11-27 James Greenhalgh > > * config/aarch64/aarch64-protos.h > (aarch64_cannot_change_mode_class): Bring back. > * config/aarch64/aarch64.c > (aarch64_cannot_change_mode_class): Likewise. > * config/aarch64/aarch64.h (CANNOT_CHANGE_MODE_CLASS): Likewise. > * config/aarch64/aarch64.md (aarch64_movdi_low): Use > zero_extract rather than truncate. > (aarch64_movdi_high): Likewise. > > 2015-11-27 James Greenhalgh > > * gcc.dg/torture/pr67609.c: New. > + detailed dicussion. In all other cases, we want to be premissive s/premissive/permissive/ OK /Marcus
Re: [AArch64] Emit square root using the Newton series
On 8 December 2015 at 21:35, Evandro Menezeswrote: >Emit square root using the Newton series > >2015-12-03 Evandro Menezes > >gcc/ > * config/aarch64/aarch64-protos.h (aarch64_emit_swsqrt): >Declare new > function. > * config/aarch64/aarch64-simd.md (sqrt2): New >expansion and > insn definitions. > * config/aarch64/aarch64-tuning-flags.def > (AARCH64_EXTRA_TUNE_FAST_SQRT): New tuning macro. > * config/aarch64/aarch64.c (aarch64_emit_swsqrt): Define >new function. > * config/aarch64/aarch64.md (sqrt2): New expansion >and insn > definitions. > * config/aarch64/aarch64.opt (mlow-precision-recip-sqrt): >Expand option > description. > * doc/invoke.texi (mlow-precision-recip-sqrt): Likewise. > > This patch extends the patch that added support for implementing x^-1/2 > using the Newton series by adding support for x^1/2 as well. Hi Evandro, What benchmarking have you done on this patch? /M
Re: [ARM] Fix PR middle-end/65958
On 3 December 2015 at 12:17, Eric Botcazouwrote: >> I can understand this restriction, but... >> >> > + /* See the same assertion on PROBE_INTERVAL above. */ >> > + gcc_assert ((first % 4096) == 0); >> >> ... why isn't this a test that FIRST is aligned to PROBE_INTERVAL? > > Because that isn't guaranteed, FIRST is related to the size of the protection > area while PROBE_INTERVAL is related to the page size. > >> blank line between declarations and code. Also, can we come up with a >> suitable define for 4096 here that expresses the context and then use >> that consistently through the remainder of this function? > > OK, let's use ARITH_BASE. > >> > +(define_insn "probe_stack_range" >> > + [(set (match_operand:DI 0 "register_operand" "=r") >> > + (unspec_volatile:DI [(match_operand:DI 1 "register_operand" "0") >> > +(match_operand:DI 2 "register_operand" "r")] >> > +UNSPEC_PROBE_STACK_RANGE))] >> >> I think this should really use PTRmode, so that it's ILP32 ready (I'm >> not going to ask you to make sure that works though, since I suspect >> there are still other issues to resolve with ILP32 at this time). > > Done. Manually tested for now, I'll fully test it if approved. Looks ok to me. OK /Marcus
Re: [PATCH AArch64]Use aarch64_sync_memory_operand in atomic_store pattern
On 4 December 2015 at 03:34, Bin Chengwrote: > 2015-12-01 Bin Cheng > > * config/aarch64/atomics.md (atomic_store): Use predicate > aarch64_sync_memory_operand. > OK /Marcus
Re: [PATCH][AArch64] Don't allow -mgeneral-regs-only to change the .arch assembler directives
On 04/12/15 14:40, Kyrill Tkachov wrote: Ping. This almost fell through the cracks. https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00055.html Thanks, Kyrill On 01/10/15 14:00, Kyrill Tkachov wrote: Hi all, As part of the SWITCHABLE_TARGET work I inadvertently changed the behaviour of -mgeneral-regs-only with respect to the .arch directives that we emit. The behaviour of -mgeneral-regs-only in GCC 5 and earlier is such that it disallows the usage of FP/SIMD registers but does *not* stop the compiler from emitting the +fp,+simd etc extensions in the .arch directive of the generated assembly. This is to accommodate users who may want to write inline assembly in a file compiled with -mgeneral-regs-only. This patch restores the trunk behaviour in that respect to that of GCC 5 and the documentation for the option is tweaked a bit to reflect that. Bootstrapped and tested on aarch64. Ok for trunk? Thanks, Kyrill 2015-10-01 Kyrylo Tkachov* config/aarch64/aarch64.c (aarch64_override_options_internal): Do not alter target_flags due to TARGET_GENERAL_REGS_ONLY_P. * doc/invoke.texi (AArch64 options): Mention that -mgeneral-regs-only does not affect the assembler directives. 2015-10-01 Kyrylo Tkachov * gcc.target/aarch64/mgeneral-regs_4.c: New test. OK /M
Re: [AArch64] Add register constraints to add3_pluslong
On 4 December 2015 at 19:42, James Greenhalghwrote: > > Hi, > > This patch fixes a bug I spotted in the add3_pluslong insn_and_split > pattern. We need to give register constraints, otherwise the register > allocator can do whatever it likes. This manifests as an ICE on AArch64 > with -mabi=ilp32: > > gcc foo.c -O2 -mabi=ilp32 > > error: could not split insn >} >^ > > (insn:TI 85 95 7 (set (mem/c:DI (plus:DI (reg/f:DI 29 x29) > (const_int 40 [0x28])) [1 %sfp+-65528 S8 A64]) > (plus:DI (plus:DI (reg/f:DI 29 x29) > (const_int 16 [0x10])) > (const_int 65552 [0x10010]))) foo.c:7 95 {*adddi3_pluslong} >(nil)) > > The patch simply constrains the pattern to use w/x registers. > > Bootstrapped on aarch64-none-linux-gnu and cross-tested on aarch64-none-elf > with no issues. > > OK? > > Thanks, > James > > --- > gcc/ > > 2015-12-04 James Greenhalgh > > * config/aarch64/aarch64.md (add3_pluslong): Add register > constraints. > > gcc/testsuite/ > > 2015-12-04 James Greenhalgh > > * gcc.c-torture/compile/20151204.c: New. > +main(int argc, char** argv) Space before (. OK /M
Re: [PATCH 1/5] [AARCH64]: Move #undef into .def files.
On 17 November 2015 at 22:10, Andrew Pinskiwrote: > > This moves the #undef from the header files to the .def files like was done > for builtins.def (https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00662.html). > > OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions. > > Thanks, > Andrew Pinski > > * config/aarch64/aarch64-arches.def (AARCH64_ARCH): #undef at the end. > * config/aarch64/aarch64-cores.def (AARCH64_CORE): Likewise. > * config/aarch64/aarch64-fusion-pairs.def (AARCH64_FUSION_PAIR): Likewise. > * config/aarch64/aarch64-tuning-flags.def (AARCH64_EXTRA_TUNING_OPTION): > Likewise. > * config/aarch64/aarch64-opts.h (AARCH64_CORE): Don't #undef here. > (AARCH64_ARCH): Likewise. > * config/aarch64/aarch64-protos.h (AARCH64_FUSION_PAIR): Likewise. > (AARCH64_EXTRA_TUNING_OPTION): Likewise. > * config/aarch64/aarch64.c (AARCH64_FUION_PAIR): Likewise. > (AARCH64_EXTRA_TUNING_OPTION): Likewise. > (AARCH64_ARCH): Likewise. > (AARCH64_CORE): Likewise. > (AARCH64_OPT_EXTENSION): Likewise. > * config/aarch64/aarch64.h (AARCH64_CORE): Likewise. > * config/aarch64/driver-aarch64.c (AARCH64_OPT_EXTENSION): Likewise. > (AARCH64_CORE): Likewise. > (AARCH64_ARCH): Likewise. > * common/config/aarch64/aarch64-common.c: Likewise. OK Thanks /Marcus
Re: [Patch AArch64] Add support for Cortex-A35
On 16 November 2015 at 14:36, James Greenhalghwrote: > 2015-11-16 James Greenhalgh > > * config/aarch64/aarch64-cores.def (cortex-a35): New. > * config/aarch64/aarch64.c (cortexa35_tunings): New. > * config/aarch64/aarch64-tune.md: Regenerate. > * doc/invoke.texi (-mcpu): Add Cortex-A35 > OK /M
Re: [PATCH][AArch64] PR target/68129: Define TARGET_SUPPORTS_WIDE_INT
On 9 November 2015 at 11:32, Kyrill Tkachovwrote: > 2015-11-09 Kyrylo Tkachov > > PR target/68129 > * config/aarch64/aarch64.h (TARGET_SUPPORTS_WIDE_INT): Define to 1. > * config/aarch64/aarch64.c (aarch64_print_operand, CONST_DOUBLE): > Delete VOIDmode case. Assert that mode is not VOIDmode. > * config/aarch64/predicates.md (const0_operand): Remove const_double > match. > > 2015-11-09 Kyrylo Tkachov > > PR target/68129 > * gcc.target/aarch64/pr68129_1.c: New test. Hi, This test isn't aarch64 specific, does it need to be in gcc.target/aarch64 ? Cheers /Marcus
Re: [PATCH][AArch64] PR target/68129: Define TARGET_SUPPORTS_WIDE_INT
On 9 November 2015 at 15:45, Kyrill Tkachov <kyrylo.tkac...@arm.com> wrote: > > On 09/11/15 15:34, Marcus Shawcroft wrote: >> >> On 9 November 2015 at 11:32, Kyrill Tkachov <kyrylo.tkac...@arm.com> >> wrote: >> >>> 2015-11-09 Kyrylo Tkachov <kyrylo.tkac...@arm.com> >>> >>> PR target/68129 >>> * config/aarch64/aarch64.h (TARGET_SUPPORTS_WIDE_INT): Define to 1. >>> * config/aarch64/aarch64.c (aarch64_print_operand, CONST_DOUBLE): >>> Delete VOIDmode case. Assert that mode is not VOIDmode. >>> * config/aarch64/predicates.md (const0_operand): Remove const_double >>> match. >>> >>> 2015-11-09 Kyrylo Tkachov <kyrylo.tkac...@arm.com> >>> >>> PR target/68129 >>> * gcc.target/aarch64/pr68129_1.c: New test. >> >> Hi, This test isn't aarch64 specific, does it need to be in >> gcc.target/aarch64 ? > > > Not really, here is the patch with the test in gcc.dg/ if that's preferred. OK /Marcus
Re: [PATCH][AArch64][cleanup] Remove uses of CONST_DOUBLE_HIGH, CONST_DOUBLE_LOW
On 9 November 2015 at 11:34, Kyrill Tkachovwrote: > * config/aarch64/aarch64.c (aarch64_simd_valid_immediate): > Remove integer CONST_DOUBLE handling. It should never occur. OK /Marcus
Re: [PATCH][AArch64] Fix insn types
On 20 October 2015 at 17:14, Evandro Menezeswrote: > Kyrill, > > Indeed, the correct log would be: > > The type assigned to some insn definitions was not correct. > > gcc/ > * config/aarch64/aarch64.md > (*movhf_aarch64): Change the type of "mov %0.h[0], %1.h[0] to > "neon_move". > (*movtf_aarch64): Change the type of "fmov %s0, wzr" to "f_mcr". > (*cmov_insn): Change the types of "mov %0, {-1,1}" to > "mov_imm". > (*cmovsi_insn_uxtw): Likewise. > > Thank you, > OK thanks, committed as r229572. /Marcus
Re: [PATCH][AArch64] Replace insn to zero up DF register
On 20 October 2015 at 00:40, Evandro Menezeswrote: > In the existing targets, it seems that it's always faster to zero up a DF > register with "movi %d0, #0" instead of "fmov %d0, xzr". > > This patch modifies the respective pattern. Hi Evandro, This patch changes the generic, u architecture independent instruction selection. The ARM ARM (C3.5.3) makes a specific recommendation about the choice of instruction in this situation and the current implementation in GCC follows that recommendation. Wilco has also picked up on this issue he has the same patch internal to ARM along with an ongoing discussion with ARM architecture folk regarding this recommendation. I'm reluctant to take this patch right now on the basis that it runs contrary to ARM ARM recommendation pending the conclusion of Wilco's discussion with ARM architecture folk. Cheers /Marcus
Re: [PATCH][ARM/AArch64] PR 68088: Fix RTL checking ICE due to subregs inside accumulator forwarding check
On 28 October 2015 at 10:07, Kyrill Tkachovwrote: > Hi all, > > This RTL checking error occurs on aarch64 in aarch_accumulator_forwarding > when processing an msubsi insn > with subregs: > (insn 15 14 16 3 (set (reg/v:SI 78 [ i ]) > (minus:SI (subreg:SI (reg/v:DI 76 [ aul ]) 0) > (mult:SI (subreg:SI (reg:DI 83) 0) > (subreg:SI (reg:DI 75 [ _20 ]) 0 schedice.c:10 357 > {*msubsi} > > The register_operand predicate for that pattern allows subregs (I think > correctly). > The code in aarch_accumulator_forwarding doesn't take that into account and > ends up > taking a REGNO of a SUBREG, causing a checking error. > > This patch fixes that by stripping the subregs off the accumulator rtx > before > checking that the inner expression is a REG and taking its REGNO. > > The testcase now works fine with an aarch64-none-elf toolchain configure for > RTL checking. > > The testcase is taken verbatim from the BZ entry for PR 68088. > Since this function is shared between arm and aarch64 I've bootstrapped and > tested it on both > and I'll need ok's for both ports. > > Ok for trunk? rtl.h exposes reg_or_subregno() already doesn't that do what we need here? The test case is not aarch64 specific therefore I think convention is that it should go into a generic directory. Cheers /Marcus
Re: [PATCH][ARM/AArch64] PR 68088: Fix RTL checking ICE due to subregs inside accumulator forwarding check
On 29 October 2015 at 13:50, Kyrill Tkachovwrote: >>> Ok for trunk? >> >> rtl.h exposes reg_or_subregno() already doesn't that do what we need here? > > > reg_or_subregno assumes that what it's passed is REG or a SUBREG. > It will ICE on any other rtx. Here I want to strip the subreg if it is > a subreg, but leave it as it is otherwise. OK, I follow. >> The test case is not aarch64 specific therefore I think convention is >> that it should go into a generic directory. > > > Ok, I'll put it in gcc.dg/ OK with the test case moved. Thanks /Marcus
Re: Fwd: PING^3: [PATCH]: New configure options that make the compiler use -fPIE and -pie as default option
On 23 October 2015 at 13:34, H.J. Lu <hjl.to...@gmail.com> wrote: > On Fri, Oct 23, 2015 at 4:54 AM, Marcus Shawcroft > <marcus.shawcr...@gmail.com> wrote: >> Hi, >> >> This patch breaks the distinction between build and host. For example >> consider a configure along these lines: >> >> ./configure --host=aarch64-none-linux-gnu >> --target=aarch64-none-linux-gnu --build=x86_64-pc-linux-gnu >> HJ, could you take a look into this issue? >> > > Try this. Hi, Can we get a review from one of the build machinery maintainers for this patch? Cheers /Marcus
Re: Fwd: PING^3: [PATCH]: New configure options that make the compiler use -fPIE and -pie as default option
Hi, This patch breaks the distinction between build and host. For example consider a configure along these lines: ./configure --host=aarch64-none-linux-gnu --target=aarch64-none-linux-gnu --build=x86_64-pc-linux-gnu Will result in: CXX_FOR_BUILD='g++' CXX='aarch64-none-linux-gnu-g++' the gcc/configure fragment: +# Check if -no-pie works. +AC_CACHE_CHECK([for -no-pie option], + [gcc_cv_no_pie], + [saved_LDFLAGS="$LDFLAGS" + LDFLAGS="$LDFLAGS -no-pie" + AC_LINK_IFELSE([int main(void) {return 0;}], + [gcc_cv_no_pie=yes], + [gcc_cv_no_pie=no]) + LDFLAGS="$saved_LDFLAGS"]) +if test "$gcc_cv_no_pie" = "yes"; then + NO_PIE_FLAG="-no-pie" +fi +AC_SUBST([NO_PIE_FLAG]) will check if CXX supports -no-pic and set NO_PIE_FLAG accordingly. The gcc/Makefile.in fragment: @@ -761,6 +769,7 @@ BUILD_LINKERFLAGS = $(BUILD_CXXFLAGS) # Native linker and preprocessor flags. For x-fragment overrides. BUILD_LDFLAGS=@BUILD_LDFLAGS@ +BUILD_LDFLAGS += @NO_PIE_FLAG@ constructs the flags which will ultimately be used to compile the build machines gen* programs. That compilation uses CXX_FOR_BUILD with the NO_PIE_FLAG that was originally probed for CXX. This is incorrect if CXX_FOR_BUILD != CXX HJ, could you take a look into this issue? Thanks /Marcus
Re: Fwd: PING^3: [PATCH]: New configure options that make the compiler use -fPIE and -pie as default option
On 23 October 2015 at 13:34, H.J. Lu <hjl.to...@gmail.com> wrote: > On Fri, Oct 23, 2015 at 4:54 AM, Marcus Shawcroft > <marcus.shawcr...@gmail.com> wrote: >> Hi, >> >> This patch breaks the distinction between build and host. For example >> consider a configure along these lines: >> >> ./configure --host=aarch64-none-linux-gnu >> --target=aarch64-none-linux-gnu --build=x86_64-pc-linux-gnu >> HJ, could you take a look into this issue? >> > > Try this. Thanks HJ. This solves the issue I was seeing. Cheers /Marcus
Re: [PATCH] [AArch64] support -mfentry feature for arm64
On 22 October 2015 at 14:21, Li Binwrote: > From: Jiangjiji > > * gcc/config/aarch64/aarch64.opt: Add a new option. > * gcc/config/aarch64/aarch64.c: Add some new functions and Macros. > * gcc/config/aarch64/aarch64.h: Modify PROFILE_HOOK and FUNCTION_PROFILER. > > Signed-off-by: Jiangjiji > Signed-off-by: Li Bin Hi, Please work with Maxim https://gcc.gnu.org/ml/gcc/2015-10/msg00090.html towards a generic solution to this issue. If, in the future it becomes apparent that his proposal is not viable then we can reconsider a backend centric solution. Cheers /Marcus
Re: [Patch AArch64 63304] Fix issue with global state.
On 16 October 2015 at 12:05, Ramana Radhakrishnanwrote: > 2015-10-15 Ramana Radhakrishnan > > PR target/63304 > * config/aarch64/aarch64.c (aarch64_nopcrelative_literal_loads): New. > (aarch64_expand_mov_immediate): Use > aarch64_nopcrelative_literal_loads. > (aarch64_classify_address): Likewise. > (aarch64_secondary_reload): Likewise. > (aarch64_override_options_after_change_1): Adjust. > * config/aarch64/aarch64.md > (aarch64_reload_movcp): > Use aarch64_nopcrelative_literal_loads. > (aarch64_reload_movcp): Likewise. > * config/aarch64/aarch64-protos.h > (aarch64_nopcrelative_literal_loads): Declare > > 2015-10-15 Jiong Wang > Ramana Radhakrishnan > > PR target/63304 > * gcc.target/aarch64/pr63304-1.c: New test. > +/* Global flag for PC relative loads. */ > +bool aarch64_nopcrelative_literal_loads; Yuk... but I don't have a better suggestion to hand. > +++ b/gcc/testsuite/gcc.target/aarch64/pr63304-1.c In that directory I've been pestering folks to follow the name convention on the wiki for new additions, so _1 please (or change the wiki ;-). Otherwise OK with me. Cheers /Marcus
Re: [PATCH][AArch64] Enable fusion of AES instructions
On 14 October 2015 at 13:30, Wilco Dijkstrawrote: > Enable instruction fusion of dependent AESE; AESMC and AESD; AESIMC pairs. > This can give up to 2x > speedup on many AArch64 implementations. Also model the crypto instructions > on Cortex-A57 according > to the Optimization Guide. > > Passes regression tests. > > ChangeLog: > 2015-10-14 Wilco Dijkstra > > * gcc/config/aarch64/aarch64.c (cortexa53_tunings): Add AES fusion. > (cortexa57_tunings): Likewise. > (cortexa72_tunings): Likewise. > (arch_macro_fusion_pair_p): Add support for AES fusion. > * gcc/config/aarch64/aarch64-fusion-pairs.def: Add AES_AESMC entry. These AArch64 changes are OK > * gcc/config/arm/aarch-common.c (aarch_crypto_can_dual_issue): > Allow virtual registers before reload so early scheduling works. > * gcc/config/arm/cortex-a57.md (cortex_a57_crypto_simple): Use > correct latency and pipeline. > (cortex_a57_crypto_complex): Likewise. > (cortex_a57_crypto_xor): Likewise. > (define_bypass): Add AES bypass. ... but wait for Ramana or Kyrill to respond on these changes before you commit. Cheers /Marcus
Re: [PATCH][AArch64][1/2] Add fmul-by-power-of-2+fcvt optimisation
On 19 October 2015 at 14:57, Kyrill Tkachovwrote: > 2015-10-19 Kyrylo Tkachov > > * config/aarch64/aarch64.md > (*aarch64_fcvt2_mult): New pattern. > * config/aarch64/aarch64-simd.md > (*aarch64_fcvt2_mult): Likewise. > * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle above patterns. > (aarch64_fpconst_pow_of_2): New function. > (aarch64_vec_fpconst_pow_of_2): Likewise. > * config/aarch64/aarch64-protos.h (aarch64_fpconst_pow_of_2): Declare > prototype. > (aarch64_vec_fpconst_pow_of_2): Likewise. > * config/aarch64/predicates.md (aarch64_fp_pow2): New predicate. > (aarch64_fp_vec_pow2): Likewise. > > 2015-10-19 Kyrylo Tkachov > > * gcc.target/aarch64/fmul_fcvt_1.c: New test. > * gcc.target/aarch64/fmul_fcvt_2.c: Likewise. +char buf[64]; +sprintf (buf, "fcvtz\\t%%0., %%1., #%d", fbits); Prefer snprintf please. + } + [(set_attr "type" "neon_fp_to_int_")] +) + + Superflous blank line here ? + *cost += rtx_cost (XEXP (x, 0), VOIDmode, + (enum rtx_code) code, 0, speed); My understanding is the unnecessary use of enum is now discouraged, (rtx_code) is sufficient in this case. + int count = CONST_VECTOR_NUNITS (x); + int i; + for (i = 1; i < count; i++) Push the int into the for initializer. Push the rhs of the count assignment into the for condition and drop the definition of count. +/* { dg-final { scan-assembler "fcvtzs\tw\[0-9\], s\[0-9\]*.*#2" } } */ I'd prefer scan-assembler-times or do you have a particular reason to avoid it in these tests? Cheers /Marcus
Re: [PATCH][AArch64] Add support for 64-bit vector-mode ldp/stp
On 20 October 2015 at 17:26, Kyrill Tkachov <kyrylo.tkac...@arm.com> wrote: > Hi Marcus, > > On 20/10/15 17:05, Marcus Shawcroft wrote: >> >> On 16 October 2015 at 13:58, Kyrill Tkachov <kyrylo.tkac...@arm.com> >> wrote: >>> >>> Hi all, >>> >>> We already support load/store-pair operations on the D-registers when >>> they >>> contain an FP value, but the peepholes/sched-fusion machinery that >>> do all the hard work currently ignore 64-bit vector modes. >>> >>> This patch adds support for fusing loads/stores of 64-bit vector operands >>> into ldp and stp instructions. >>> I've seen this trigger a few times in SPEC2006. Not too many times, but >>> the >>> times it did trigger the code seemed objectively better >>> i.e. long sequences of ldr and str instructions essentially halved in >>> size. >>> >>> Bootstrapped and tested on aarch64-none-linux-gnu. >>> >>> Ok for trunk? >>> >>> Thanks, >>> Kyrill >>> >>> 2015-10-16 Kyrylo Tkachov <kyrylo.tkac...@arm.com> >>> >>> * config/aarch64/aarch64.c (aarch64_mode_valid_for_sched_fusion_p): >> >> We have several different flavours of fusion in the backend, this one >> is specifically load/stores, perhaps making that clear in the name of >> this predicate will avoid confusion further down the line? > > Thanks for the review, > > This particular type of fusion is called sched_fusion in various > places in the compiler and its implementation in aarch64 is only for > load/store merging (indeed, the only usage of sched_fusion currently > is to merge loads/stores in arm and aarch64). > > So, I think that sched_fusion in the name already conveys the information > that it's the ldp/stp one rather than macro fusion. In fact, there is a > macro fusion of ADRP and an LDR instruction, > so having sched_fusion in the name is actually a better differentiator than > mentioning loads/stores as both types of fusion deal with loads in some way. > > Is it ok to keep the name as is? Thanks for the justification, patch is OK to commit /Marcus
Re: [PATCH][Testsuite] Turn on 64-bit-vector tests for AArch64
On 16 October 2015 at 12:26, Alan Lawrencewrote: > This enables tests bb-slp-11.c and bb-slp-26.c for AArch64. Both of these are > currently passing on little- and big-endian. > > (Tested on aarch64-none-linux-gnu and aarch64_be-none-elf). > > OK for trunk? > > gcc/testsuite/ChangeLog: > > * lib/target-supports.exp (check_effective_target_vect64): Add > AArch64. > --- > gcc/testsuite/lib/target-supports.exp | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/gcc/testsuite/lib/target-supports.exp > b/gcc/testsuite/lib/target-supports.exp > index 3088369..bd03108 100644 > --- a/gcc/testsuite/lib/target-supports.exp > +++ b/gcc/testsuite/lib/target-supports.exp > @@ -4762,6 +4762,7 @@ proc check_effective_target_vect64 { } { > if { ([istarget arm*-*-*] > && [check_effective_target_arm_neon_ok] > && [check_effective_target_arm_little_endian]) > +|| [istarget aarch64*-*-*] > || [istarget sparc*-*-*] } { > set et_vect64_saved 1 > } > -- > 1.9.1 > OK /Marcus
Re: [PATCH][AArch64] Add support for 64-bit vector-mode ldp/stp
On 16 October 2015 at 13:58, Kyrill Tkachovwrote: > Hi all, > > We already support load/store-pair operations on the D-registers when they > contain an FP value, but the peepholes/sched-fusion machinery that > do all the hard work currently ignore 64-bit vector modes. > > This patch adds support for fusing loads/stores of 64-bit vector operands > into ldp and stp instructions. > I've seen this trigger a few times in SPEC2006. Not too many times, but the > times it did trigger the code seemed objectively better > i.e. long sequences of ldr and str instructions essentially halved in size. > > Bootstrapped and tested on aarch64-none-linux-gnu. > > Ok for trunk? > > Thanks, > Kyrill > > 2015-10-16 Kyrylo Tkachov > > * config/aarch64/aarch64.c (aarch64_mode_valid_for_sched_fusion_p): We have several different flavours of fusion in the backend, this one is specifically load/stores, perhaps making that clear in the name of this predicate will avoid confusion further down the line? > New function. > (fusion_load_store): Use it. > * config/aarch64/aarch64-ldpstp.md: Add new peephole2s for > ldp and stp in VD modes. > * config/aarch64/aarch64-simd.md (load_pair, VD): New pattern. > (store_pair, VD): Likewise. > > 2015-10-16 Kyrylo Tkachov > > * gcc.target/aarch64/stp_vec_64_1.c: New test. > * gcc.target/aarch64/ldp_vec_64_1.c: New test. Otherwise OK /Marcus
Re: [PATCH][AArch64][1/2] Add fmul-by-power-of-2+fcvt optimisation
On 20 October 2015 at 17:31, Kyrill Tkachov <kyrylo.tkac...@arm.com> wrote: > > On 20/10/15 17:28, Ramana Radhakrishnan wrote: >> >> On Tue, Oct 20, 2015 at 4:26 PM, Marcus Shawcroft >> <marcus.shawcr...@gmail.com> wrote: >>> >>> On 19 October 2015 at 14:57, Kyrill Tkachov <kyrylo.tkac...@arm.com> >>> wrote: >>> >>>> 2015-10-19 Kyrylo Tkachov <kyrylo.tkac...@arm.com> >>>> >>>> * config/aarch64/aarch64.md >>>> (*aarch64_fcvt2_mult): New pattern. >>>> * config/aarch64/aarch64-simd.md >>>> (*aarch64_fcvt2_mult): Likewise. >>>> * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle above >>>> patterns. >>>> (aarch64_fpconst_pow_of_2): New function. >>>> (aarch64_vec_fpconst_pow_of_2): Likewise. >>>> * config/aarch64/aarch64-protos.h (aarch64_fpconst_pow_of_2): >>>> Declare >>>> prototype. >>>> (aarch64_vec_fpconst_pow_of_2): Likewise. >>>> * config/aarch64/predicates.md (aarch64_fp_pow2): New predicate. >>>> (aarch64_fp_vec_pow2): Likewise. >>>> >>>> 2015-10-19 Kyrylo Tkachov <kyrylo.tkac...@arm.com> >>>> >>>> * gcc.target/aarch64/fmul_fcvt_1.c: New test. >>>> * gcc.target/aarch64/fmul_fcvt_2.c: Likewise. >>> >>> +char buf[64]; >>> +sprintf (buf, "fcvtz\\t%%0., %%1., #%d", fbits); >>> >>> Prefer snprintf please >> >> Should we also update our coding standards for this ? i.e. use the non >> `n' versions of the string functions only if you have a very good >> reason. Even more so in the run time support libraries ! > > > My understanding is one should *really* use snprintf if any part of the > string comes from the user. So in this case it probably doesn't make a > difference. However, it's not hard to use the 'n' version and thinking > about the maximum length is probably a good practice to get into. Using snprintf here means that if/when someone fettles around with the pattern in the future they won't silently over run the buffer that they forgot to resize for their new longer pattern. Although in the case the likely hood of that happening seems rather small it is good practice to simply avoid the none n version. Cheers /Marcus
Re: [PATCH][AArch64][1/2] Add fmul-by-power-of-2+fcvt optimisation
On 20 October 2015 at 16:47, Kyrill Tkachovwrote: > Here's the patch updated as per your feedback. > > How's this? > > Thanks, > Kyrill > > 2015-10-20 Kyrylo Tkachov > > * config/aarch64/aarch64.md > (*aarch64_fcvt2_mult): New pattern. > * config/aarch64/aarch64-simd.md > (*aarch64_fcvt2_mult): Likewise. > * config/aarch64/aarch64.c (aarch64_rtx_costs): Handle above patterns. > (aarch64_fpconst_pow_of_2): New function. > (aarch64_vec_fpconst_pow_of_2): Likewise. > * config/aarch64/aarch64-protos.h (aarch64_fpconst_pow_of_2): Declare > prototype. > (aarch64_vec_fpconst_pow_of_2): Likewise. > * config/aarch64/predicates.md (aarch64_fp_pow2): New predicate. > (aarch64_fp_vec_pow2): Likewise. > > 2015-10-20 Kyrylo Tkachov > > > * gcc.target/aarch64/fmul_fcvt_1.c: New test. > * gcc.target/aarch64/fmul_fcvt_2.c: Likewise. > OK /Marcus
Re: [PATCH][AArch64 Testsuite][Trivial?] Remove divisions-to-produce-NaN from vdiv_f.c
On 20 October 2015 at 13:40, Alan Lawrencewrote: > The test vdiv_f.c #define's NAN to (0.0 / 0.0). This produces extra scalar > fdiv's, which complicate the scan-assembler testing. We can remove these by > using __builtin_nan instead. > > Tested on AArch64 Linux. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/vdiv_f.c: Use __builtin_nan. OK /Marcus
Re: [PATCH] 2015-10-19 Benedikt Huber <benedikt.hu...@theobroma-systems.com> Philipp Tomsich <philipp.toms...@theobroma-systems.com>
On 4 January 1970 at 00:02, Benedikt Huberwrote: > * config/aarch64/aarch64-builtins.c: Builtins for rsqrt and rsqrtf. > * config/aarch64/aarch64-protos.h: Declare. > * config/aarch64/aarch64-simd.md: Matching expressions for frsqrte and > frsqrts. > * config/aarch64/aarch64-tuning-flags.def: Added recip_sqrt. > * config/aarch64/aarch64.c: New functions. Emit rsqrt estimation code > when > applicable. > * config/aarch64/aarch64.md: Added enum entries. > * config/aarch64/aarch64.opt: Added option -mlow-precision-recip-sqrt. > * testsuite/gcc.target/aarch64/rsqrt_asm_check_common.h: Common > macros for > assembly checks. > * testsuite/gcc.target/aarch64/rsqrt_asm_check_negative_1.c: Make sure > frsqrts and frsqrte are not emitted. > * testsuite/gcc.target/aarch64/rsqrt_asm_check_1.c: Make sure frsqrts > and > frsqrte are emitted. > * testsuite/gcc.target/aarch64/rsqrt_1.c: Functional tests for rsqrt. OK /Marcus
Re: [PATCH] 2015-10-15 Benedikt Huber <benedikt.hu...@theobroma-systems.com> Philipp Tomsich <philipp.toms...@theobroma-systems.com>
Hi, A few more style nits: > + builtin_decls_data bdda[] = { New line before { > +{double_type_node, "__builtin_aarch64_rsqrt_df", > AARCH64_BUILTIN_RSQRT_DF}, Space after { Space before } > +void aarch64_emit_swrsqrt (rtx, rtx); > + > +tree aarch64_builtin_rsqrt (unsigned int fn, bool md_fn); > + Drop the formal argument names as you did in the first declaration. See my previous comment w.r.t the naming of new test cases in gcc.target/aarch64, at least the following still need s/-/_/ > diff --git a/gcc/testsuite/gcc.target/aarch64/rsqrt-asm-check-common.h > b/gcc/testsuite/gcc.target/aarch64/rsqrt-asm-check-common.h > diff --git a/gcc/testsuite/gcc.target/aarch64/rsqrt-asm-check-negative_1.c > b/gcc/testsuite/gcc.target/aarch64/rsqrt-asm-check-negative_1.c > diff --git a/gcc/testsuite/gcc.target/aarch64/rsqrt-asm-check_1.c > b/gcc/testsuite/gcc.target/aarch64/rsqrt-asm-check_1.c > +// With -ffast-math these return positive INF. > +// t_double (-0.0, -inf); > +// t_float (-0.0, -inff); > + > +// The reason here is that -ffast-math flushes to zero. > +// t_double (__DBL_MIN__/256, 0X1.00P+515); > +// t_float (__FLT_MIN__/256, 0X1.00P+67); Comment consistently with the rest of the backend ie /* */ Thanks /Marcus
Re: [PATCH] 2015-10-15 Benedikt Huber <benedikt.hu...@theobroma-systems.com> Philipp Tomsich <philipp.toms...@theobroma-systems.com>
On 16 October 2015 at 15:31, Benedikt Huberwrote: > I introduced this in revision 7 due to a request from James Greenhalgh. > https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00963.html > >> Given that this is all so mechanical, I'd have a preference towards >> refactoring this to loop over some structured data. > > Do you mean, that I should get rid of the typedef and leave the struct > without it? > Or should I completely drop the struct? The use of the struct is fine, we are being discouraged from using unnecessary typedefs. Just rewrite it as: struct builtin_decls_data { ... }; The references to the typedef'd name don't need to be modified. Cheers /Marcus
Re: [PATCH] 2015-10-15 Benedikt Huber <benedikt.hu...@theobroma-systems.com> Philipp Tomsich <philipp.toms...@theobroma-systems.com>
On 16 October 2015 at 14:59, Benedikt Huberwrote: > + typedef struct > + { > +tree type_node; > +const char *builtin_name; > +int function_code; > + } builtin_decls_data; Please address Oleg's comment. Cheers /Marcus
Re: [RFC VTV] Fix VTV for targets that have section anchors.
On 09/10/15 10:17, Ramana Radhakrishnan wrote: This started as a Friday afternoon project ... It turned out enabling VTV for AArch64 and ARM was a matter of fixing PR67868 which essentially comes from building libvtv with section anchors turned on. The problem was that the flow of control from output_object_block through to switch_section did not have the same special casing for the vtable section that exists in assemble_variable. Once this was done, I managed to build and test aarch64-none-linux-gnu with --enable-vtable-verify, a similar test was done for armhf. Testing showed no regressions in the gcc/ g++ testsuites for aarch64 and armhf Testing showed no failures in libvtv testsuite for aarch64 but a few more failures - see below. Testing showed 2 failures in libstdc++-v3 testsuite compared to without vtable verification. FAIL: libstdc++-abi/abi_check FAIL: experimental/filesystem/iterators/directory_iterator.cc execution test However both these failures also occur on x86_64 - so I'm content to declare victory on AArch64 as far as basic enablement goes. On ARM I see the following failures that I still need to debug - I can see that the failure is because the write to _ZN4_VTVI1BE12__vtable_mapE does not elicit a SEGV but I need to go further than that. FAIL: libvtv.cc/thunk_vtable_map_attack.cc -O0 -fvtable-verify=std execution test FAIL: libvtv.cc/thunk_vtable_map_attack.cc -O2 -fvtable-verify=std execution test FAIL: libvtv.cc/thunk_vtable_map_attack.cc -O0 -fvtable-verify=preinit execution test FAIL: libvtv.cc/thunk_vtable_map_attack.cc -O2 -fvtable-verify=preinit execution test Questions - 1. Are the generic changes to varasm.c ok ? 2. Can we take the AArch64 support in now, given this amount of testing ? Marcus / Caroline ? + aarch64*-*-linux*) + VTV_SUPPORTED=yes ;; Ramana, Go ahead an add the aarch64 enable once you have the pre-requisite varasm changes approved. Cheers /Marcus 3. Any suggestions / helpful debug hints for VTV debugging (other than turning VTV_DEBUG on and inspecting trace) ? There's an arm*-*-* hunk there but I'm easy about applying that right now and figuring out the issues over time. In case we don't fix it for 6.0 we can rip the support out before release. Thanks, Ramana P.S. (Yes, I'll provide a Changelog :) )
Re: [PATCH, aarch64]: Remove AARCH64_ROUND_UP and AARCH64_ROUND_DOWN defines
On 12 October 2015 at 12:28, Uros Bizjakwrote: > Remove private definitions and use equivalent global macros instead. > > 2015-10-12 Uros Bizjak > > * config/aarch/aarch64.h (AARCH64_ROUND_UP): Remove. > (AARCH64_ROUND_DOWN): Ditto. > * config/aarch64/aarch64.c: Use ROUND_UP instead of AARCH64_ROUND_UP. > > Tested by building a crosscompiler to aarch64-linux-gnu. > > OK for mainline? OK Thanks /Marcus
Re: C PATCH for c/65345 (file-scope _Atomic expansion with floats)
On 6 October 2015 at 12:29, Ramana Radhakrishnanwrote: > Thanks for the explanation Eric, by that explanation I do not see the need to > adjust for TARGET_EXPR or mark_addressable in the backends. > > Here are the patches that I'm testing - I will apply the ARM one after > testing finishes - my previous testing broke because of some other reasons. > > The AArch64 patch cleared testing - ok to apply ? > > PR c/65345 > > * config/aarch64/aarch64-builtins.c > (aarch64_atomic_assign_expand_fenv): Use create_tmp_var_raw instead of > create_tmp_var. OK /Marcus
Re: [AArch64] [TLSIE][2/2] Implement TLS IE for tiny model
On 5 October 2015 at 14:19, Jiong Wangwrote: > 2015-10-05 James Greenhalgh >Jiong Wang > > gcc/ > * config/aarch64/aarch64.md (tlsie_tiny_sidi): Replace "" with "w". > How about a test case? /Marcus
Re: [AArch64][TLSGD][2/2] Implement TLS GD traditional for tiny code model
On 27 August 2015 at 10:54, Jiong Wangwrote: > > As described this is the main implementaion patch. > > 2015-08-26 Jiong Wang > > gcc/ > * configure.ac: Add check for binutils global dynamic tiny code model > relocation support. > * configure: Regenerate. > * config.in: Regenerate. > * config/aarch64/aarch64.md (tlsgd_tiny): New define_insn. > * config/aarch64/aarch64-protos.h (aarch64_symbol_type): New > enumeration SYMBOL_TINY_TLSGD. > (aarch64_symbol_context): New comment on SYMBOL_TINY_TLSGD. > * config/aarch64/aarch64.c (aarch64_classify_tls_symbol): Support > SYMBOL_TINY_TLSGD. > (aarch64_print_operand): Likewise. > (aarch64_expand_mov_immediate): Likewise. > (aarch64_load_symref_appropriately): Likewise. > > gcc/testsuite/ > * lib/target-supports.exp (check_effective_target_aarch64_tlsgdtiny): > New effective check. > * gcc.target/aarch64/tlsgd_small_1.c: New testcase. > * gcc.target/aarch64/tlsgd_tiny_1.c: Likewise. > +#ifdef HAVE_AS_TINY_TLSGD_RELOCS + return SYMBOL_TINY_TLSGD; +#else + return SYMBOL_SMALL_TLSGD; +#endif Rather than introduce blocks of conditional compilation it is better to gate different behaviours with a test on a constant expression. In this case add something like this: #if define(HAVE_AS_TINY_TLSGD_RELOCS) #define USE_TINY_TLSGD 1 #else #define USE_TINY_TLSGD 0 #endif up near the definition of TARGET_HAVE_TLS then write the above fragment without using the preprocessor: return USE_TINY_TLSGD ? SYMBOL_TINY_TLSGD : SYMBOL_SMALL_TLSGD; or similar. @@ -1024,6 +1024,7 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm, return; } +case SYMBOL_TINY_TLSGD: case SYMBOL_SMALL_TLSGD: { rtx_insn *insns; @@ -1031,7 +1032,10 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm, rtx resolver = aarch64_tls_get_addr (); start_sequence (); - aarch64_emit_call_insn (gen_tlsgd_small (result, imm, resolver)); + if (type == SYMBOL_SMALL_TLSGD) + aarch64_emit_call_insn (gen_tlsgd_small (result, imm, resolver)); + else + aarch64_emit_call_insn (gen_tlsgd_tiny (result, imm, resolver)); insns = get_insns (); end_sequence (); Add a separate case statment for SYMBOL_TINY_TLSGD rather than reusing the case statement for SYMBOL_SMALL_TLSGD and then needing to add another test against symbol type within the body of the case statement. +(define_insn "tlsgd_tiny" + [(set (match_operand 0 "register_operand" "") + (call (mem:DI (match_operand:DI 2 "" "")) (const_int 1))) + (unspec:DI [(match_operand:DI 1 "aarch64_valid_symref" "S")] UNSPEC_GOTTINYTLS) + (clobber (reg:DI LR_REGNUM)) + ] + "" + "adr\tx0, %A1;bl\t%2;nop"; + [(set_attr "type" "multiple") + (set_attr "length" "12")]) I don't think the explicit clobber LR_REGNUM is required since your change last September: https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02654.html Thanks /Marcus
Re: [AArch64][TLSGD][1/2] Remove unncessary define_expand for TLS GD traditional
On 27 August 2015 at 10:52, Jiong Wangwrote: > 2015-08-27 Jiong Wang > > gcc/ > * config/aarch64/aarch64.md (tlsgd_small): Delete this define_expand. > (*tlsgd_small): Rename this define_insn to "tlsgd_small"; OK /Marcus
Re: [Patch AArch64] Use default_elf_asm_named_section instead of special cased hook
On 2 October 2015 at 11:08, Ramana Radhakrishnanwrote: > * config/aarch64/aarch64-elf.h (TARGET_ASM_NAMED_SECTION): Use > default_elf_asm_named_section. > * config/aarch64/aarch64.c (aarch64_elf_asm_named_section): Delete. > --- > gcc/config/aarch64/aarch64-elf.h | 2 +- > gcc/config/aarch64/aarch64.c | 74 > > 2 files changed, 1 insertion(+), 75 deletions(-) > > diff --git a/gcc/config/aarch64/aarch64-elf.h > b/gcc/config/aarch64/aarch64-elf.h > index 1ce6343..70aa845 100644 > --- a/gcc/config/aarch64/aarch64-elf.h > +++ b/gcc/config/aarch64/aarch64-elf.h > @@ -154,7 +154,7 @@ ASM_MABI_SPEC > #define TYPE_OPERAND_FMT "%%%s" > > #undef TARGET_ASM_NAMED_SECTION > -#define TARGET_ASM_NAMED_SECTION aarch64_elf_asm_named_section > +#define TARGET_ASM_NAMED_SECTION default_elf_asm_named_section Isn't it sufficient to simply remove the #define completely and rely on the default from elfos.h? /M
Re: [AArch64/testsuite] Add more TLS local executable testcases
On 22 September 2015 at 17:49, Jiong Wangwrote: > 2015-09-22 Jiong Wang > > gcc/testsuite/ >* gcc.target/aarch64/tlsle12_tiny_1.c: New testcase for tiny model. >* gcc.target/aarch64/tlsle24_tiny_1.c: Likewise. >* gcc.target/aarch64/tlsle_sizeadj_tiny_1.c: TLS size truncation test >for tiny model. >* gcc.target/aarch64/tlsle_sizeadj_small_1.c: TLS size truncation test >for small model. > OK /Marcus
Re: [Patch AArch64] Use default_elf_asm_named_section instead of special cased hook
On 2 October 2015 at 14:01, Ramana Radhakrishnanwrote: >>> #undef TARGET_ASM_NAMED_SECTION >>> -#define TARGET_ASM_NAMED_SECTION aarch64_elf_asm_named_section >>> +#define TARGET_ASM_NAMED_SECTION default_elf_asm_named_section >> >> Isn't it sufficient to simply remove the #define completely and rely >> on the default from elfos.h? > > Doh ! you're right - Yeah should be coming in from config/elfos.h given that > we don't support anything other than elf in the aarch64 port. > > Ok to commit without that hunk ? Yes. /Marcus
Re: [Patch AArch64] Improve SIMD concatenation with zeroes
On 02/10/15 09:12, James Greenhalgh wrote: 2015-10-01 James Greenhalgh* config/aarch64/aarch64-simd.md (*aarch64_combinez): Add alternatives for reads from memory and moves from general-purpose registers. (*aarch64_combinez_be): Likewise. 2015-10-01 James Greenhalgh * gcc.target/aarch64/vect_combine_zeroes_1.c: New. OK /Marcus
Re: [Patch 2/2 ARM/AArch64] Add a new Cortex-A53 scheduling model
On 25/09/15 08:59, James Greenhalgh wrote: Hi, This patch introduces a new scheduling model for Cortex-A53. Bootstrapped and tested on arm-none-linux-gnueabi and aarch64-none-linux-gnu and checked with a variety of popular benchmarking and microbenchmarking suites to show a benefit. OK? Thanks, James --- 2015-09-25 James Greenhalgh* config/arm/aarch-common-protos.h (aarch_accumulator_forwarding): New. (aarch_forward_to_shift_is_not_shifted_reg): Likewise. * config/arm/aarch-common.c (aarch_accumulator_forwarding): New. (aarch_forward_to_shift_is_not_shifted_reg): Liekwise. * config/arm/cortex-a53.md: Rewrite. OK aarch64 with Kyrill's comments fixed. /M
Re: [PATCH][AArch64] Add separate insn sched class for vector LDP & STP
On 29/09/15 00:52, Evandro Menezes wrote: In some micro-architectures the insns to load or store pairs of vector registers are implemented rather differently from those affecting lanes in vector registers. Then, it's important that such insns be described likewise differently in the scheduling model. This patch adds the insn types neon_ldp{,_q} and neon_stp{,_q} apart from the current neon_load2_2reg_q and neon_store2_2reg_q types, respectively. Hi, The AArch64 part of this is OK. Please wait for Kyrill or Ramana to comment on ARM side. Cheers /Marcus Thank you, -- Evandro Menezes 0001-AArch64-Add-separate-insn-sched-class-for-vector-LDP.patch From 340249dcd2af8dfce486cb4f62d4eaf285c6a799 Mon Sep 17 00:00:00 2001 From: Evandro MenezesDate: Mon, 28 Sep 2015 15:00:00 -0500 Subject: [PATCH] [AArch64] Add separate insn sched class for vector LDP & STP 2015-09-28 Evandro Menezes gcc/ * config/arm/types.md (neon_ldp, neon_ldp_q, neon_stp, neon_stp_q): add new insn types for vector load and store pairs. s/add/Add/ and likewise the rest of the changelog comments. * config/arm/cortex-a53.md (cortex_a53_f_load_2reg): add insn types "neon_ldp{,_q}". * config/arm/cortex-a57.md (neon_load_c): add insn types "neon_ldp{,_q}". (neon_store_complex): add insn types "neon_stp{,_q}". * config/aarch64/aarch64-simd.md (aarch64_be_movoi): add insn types "neon_{ldp,stp}_q".
Re: [AArch64] Fix Prefetch ICE
On 28 September 2015 at 06:27, Hurugalawadi, Naveenwrote: > Hi Marcus, > > Thanks for the review and comments. > >>> OK and can you back port to 5 ? > > Please find attached the backported patch on gcc-5-branch. > > Regression tested on AArch64 without any issues. > > 2015-09-28 Andrew Pinski > > ChangeLog > > * config/aarch64/aarch64.md (prefetch): > Change the predicate of operand 0 to register_operand. Thank you, please commit it if you have not already. /M
Re: [AArch64] Improve TLS Descriptor pattern to release RTL loop IV opt
On 28 July 2015 at 14:12, Jiong Wangwrote: > > The instruction sequences for preparing argument for TLS descriptor > runtime resolver and the later function call to resolver can actually be > hoisted out of the loop. > > Currently we can't because we have exposed the hard register X0 as > destination of "set". While GCC's RTL data flow infrastructure will > skip or do very conservative assumption when hard register involved in > and thus some loop IV opportunities are missed. This patch feels like we are botching the back end to over come a limitation in RTL IV opt. Isn't the real solution in RTL IV opt ? /Marcus > > This patch add another "tlsdesc_small_pseudo_" pattern, and avoid > expose x0 to gcc generic code. > > Generally, we define a new register class FIXED_R0 which only contains > register > 0, so the instruction sequences generated from the new add pattern is the same > as tlsdesc_small_, while the operand 0 is wrapped as pseudo register > that > RTL IV opt can handle it. > > Ideally, we should allow operand 0 to be any pseudo register, but then > we can't model the override of x0 caused by the function call which is > hidded by the UNSPEC. > > So here, we restricting operand 0 to be x0, the override of x0 can be > reflected to the gcc. > > OK for trunk? > > 2015-07-28 Ramana Radhakrishnan > Jiong Wang > > gcc/ > * config/aarch64/aarch64.d (tlsdesc_small_pseudo_): New pattern. > * config/aarch64/aarch64.h (reg_class): New enumeration FIXED_REG0. > (REG_CLASS_NAMES): Likewise. > (REG_CLASS_CONTENTS): Likewise. > * config/aarch64/aarch64.c (aarch64_class_max_nregs): Likewise. > (aarch64_register_move_cost): Likewise. > (aarch64_load_symref_appropriately): Invoke the new added pattern if > possible. > * config/aarch64/constraints.md (Uc0): New constraint. > > gcc/testsuite. > * gcc.target/aarch64/tlsdesc_hoist.c: New testcase. > > -- > Regards, > Jiong >
Re: [Patch 1/2 AArch64/ARM] Give AArch64 ROR (Immediate) a new type attribute
On 25 September 2015 at 14:19, James Greenhalghwrote: > 2015-09-25 James Greenhalgh > > * config/arm/types.md (type): Add rotate_imm. > * config/aarch64/aarch64.md (*ror3_insn): Split out the > ROR immediate case. > (*rorsi3_insn_uxtw): Likewise. > * config/aarch64/thunderx.md (thunderx_shift): Add rotate_imm. > * config/arm/cortex-a53.md (cortex_a53_alu_shift): Add rotate_imm. > * config/arm/cortex-a57.md (cortex_a53_alu): Add rotate_imm. OK with me. /Marcus
Re: [AArch64] Fix Prefetch ICE
On 24 September 2015 at 07:47, Hurugalawadi, Naveenwrote: > Hi, > > Please find attached the patch that fixes an ICE for prefetch. > > The predicate is too lose for the constraints. Hence, the patch tightens > up the predicate to be exactly as constraint allows, avoids a “reload” > and allows better code generation. > > Submitted on behalf of Andrew Pinski. > > Thanks, > Naveen > > 2015-09-24 Andrew Pinski > > ChangeLog > > * config/aarch64/aarch64.md (prefetch): > Change the predicate of operand 0 to register_operand. Hi, OK and can you back port to 5 ? Thanks /Marcus
Re: [AArch64] Fix vcvt_high_f64_f32 and vcvt_figh_f32_f64 intrinsics.
On 21 September 2015 at 15:38, James Greenhalghwrote: > --- > gcc/ > > 2015-09-21 James Greenhalgh > > * config/aarch64/aarch64-simd.md > > (aarch64_float_truncate_hi_v4sf): Rewrite as an expand. > (aarch64_float_truncate_hi_v4sf_le): New. > (aarch64_float_truncate_hi_v4sf_be): Likewise. > > gcc/testsuite/ > > 2015-09-21 James Greenhalgh > > * gcc.target/aarch64/advsimd-intrinsics/vcvt_high_1.c: New. > +#include + I don't think this include is required, otherwise OK /Marcus
Re: [AArch64/testsuite] Add more TLS local executable testcases
On 26 August 2015 at 14:58, Jiong Wangwrote: > > This patch cover tlsle tiny model tests, tls size truncation for tiny & > small model included also. > > All testcases pass native test. > > OK for trunk? > > 2015-08-26 Jiong Wang > > gcc/testsuite/ > * gcc.target/aarch64/tlsle12_tiny_1.c: New testcase for tiny model. > * gcc.target/aarch64/tlsle24_tiny_1.c: Likewise. > * gcc.target/aarch64/tlsle_sizeadj_tiny_1.c: TLS size truncation test > for tiny model. > * gcc.target/aarch64/tlsle_sizeadj_small_1.c: TLS size truncation test > for small model. Hi Jiong These tests are fragile if the test suite is run with an explicit -mcmodel=* option. I think they need appropriate dg-skip-if directives to ensure they only run with the expected model. /Marcus
Re: [PATCH 14/15][ARM/AArch64 Testsuite]Add test of vcvt{,_high}_i{f32_f16,f16_f32}
On 25 August 2015 at 14:57, Alan Lawrencewrote: > Sorry - wrong version posted. The hunk for add_options_for_arm_neon_fp16 has > moved to the previous patch! This version also fixes some whitespace issues. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c: New. > * lib/target-supports.exp > (check_effective_target_arm_neon_fp16_hw_ok): New. > --- > .../aarch64/advsimd-intrinsics/vcvt_f16.c | 98 > ++ > gcc/testsuite/lib/target-supports.exp | 15 > 2 files changed, 113 insertions(+) > create mode 100644 > gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c > > diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c > b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c > new file mode 100644 > index 000..a2cfd38 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vcvt_f16.c > +#include We should avoid dependencies on external header files. In this case I can't see that anything in math.h is needed any way. Otherwise OK with me. /Marcus
Re: [AArch64] Handle const address in aarch64_print_operand
On 8 September 2015 at 16:03, Jiong Wangwrote: > 2015-09-08 Jiong Wang > > gcc/ > * config/aarch64/aarch64.c (aarch64_print_operand): Add "CONST" > support. OK /Marcus
Re: [Aarch64][target/PR 67143][5.2] Backport correct constraints for atomic operations.
On 9 September 2015 at 12:43, Matthew Wahabwrote: > gcc/ > 2015-09-09 Matthew Wahab > > Backport from mainline > 2015-08-14 Matthew Wahab > > PR target/67143 > * config/aarch64/atomics.md (atomic_): Replace > 'lconst_atomic' with 'const_atomic'. > (atomic_fetch_): Likewise. > (atomic__fetch): Likewise. > * config/aarch64/iterators.md (lconst-atomic): Move below > 'const_atomic'. > (const_atomic): New. > > gcc/testsuite/ > 2015-09-09 Matthew Wahab > > Backport from mainline > 2015-08-14 Matthew Wahab > Matthias Klose > > PR target/67143 > * gcc.c-torture/compile/pr67143.c: New > * gcc.target/aarch64/atomic-op-imm.c > (atomic_fetch_add_negative_RELAXED): New. > (atomic_fetch_sub_negative_ACQUIRE): New. OK for gcc-5 branch /Marcus
Re: [AArch64] Delete aarch64_symbol_context which is not used
On 8 September 2015 at 16:00, Jiong Wangwrote: > > The concept of aarch64_symbol_context is not used in AArch64, this patch > remove it and all relevant code. > > ok for trunk? > > 2015-09-08 Jiong. Wang > > gcc/ > * config/aarch64/aarch64-protos.h (aarch64_symbol_context): Delete. > * config/aarch64/aarch64.c (aarch64_expand_mov_immediate): Likewise. > (aarch64_cannot_force_const_mem): Likewise. > (aarch64_classify_address): Likewise. > (aarch64_classify_symbolic_expression): Likewise. > (aarch64_print_operand): Likewise. > (aarch64_classify_symbol): Likewise. > (aarch64_mov_operand_p): Likewise. > * config/aarch64/predicates.md (aarch64_valid_symref): Likewise. > (aarch64_mov_operand): Likewise. OK /Marcus
Re: [PATCH] 2015-09-03 Benedikt Huber <benedikt.hu...@theobroma-systems.com> Philipp Tomsich <philipp.toms...@theobroma-systems.com>
Hi, Thanks for your work on this. There are a bunch of predominantly style nits in line below. My none nit comments on this patch are: This should be left turned off for all cores where we have not seen benchmark numbers to indicate that this optimization is a benefit, we can take patches for each core in the future once numbers are available. Given the absence of numbers for some of the cores and the significant impact on one core this should remain disabled by default on generic. We don't necessarily want a proliferation of user orientated tuning options without good reason, instead the per core default tuning behaviour should make the right choice per core, hence I think -mrecip should be dropped. We already have the -moverride= mechanism to provide a developer orientated mechanism to override the per core tuning flag. Further comments inline Thanks /Marcus On 7 September 2015 at 11:40, Benedikt Huberwrote: --- a/gcc/config/aarch64/aarch64-tuning-flags.def +++ b/gcc/config/aarch64/aarch64-tuning-flags.def @@ -29,4 +29,5 @@ AARCH64_TUNE_ to give an enum name. */ AARCH64_EXTRA_TUNING_OPTION ("rename_fma_regs", RENAME_FMA_REGS) +AARCH64_EXTRA_TUNING_OPTION ("mrecip_default_enabled", MRECIP_DEFAULT_ENABLED) This name is exposed via the -moverride=tune= option, perhaps a better name would be: +AARCH64_EXTRA_TUNING_OPTION ("recip_sqrt", RECIP_SQRT) > +/* Add builtins for reciprocal square root. */ > +void > +aarch64_add_builtin_rsqrt (void) > +{ > + tree fndecl = NULL; > + tree ftype = NULL; > + > + tree V2SF_type_node = build_vector_type (float_type_node, 2); > + tree V2DF_type_node = build_vector_type (double_type_node, 2); > + tree V4SF_type_node = build_vector_type (float_type_node, 4); > + > + ftype = build_function_type_list (double_type_node, double_type_node, > NULL_TREE); Line length exceeds 80 characters. > + fndecl = add_builtin_function ("__builtin_aarch64_rsqrt_df", > +ftype, AARCH64_BUILTIN_RSQRT_DF, BUILT_IN_MD, NULL, NULL_TREE); > + aarch64_builtin_decls[AARCH64_BUILTIN_RSQRT_DF] = fndecl; > + > + ftype = build_function_type_list (float_type_node, float_type_node, > NULL_TREE); Line length exceeds 80 characters. > +/* Function to expand reciprocal square root builtins. */ > +static rtx > +aarch64_expand_builtin_rsqrt (int fcode, tree exp, rtx target) > +{ > + rtx pat; > + tree arg0 = CALL_EXPR_ARG (exp, 0); > + rtx op0 = expand_normal (arg0); > + > + enum insn_code c; > + > + switch (fcode) > +{ > + case AARCH64_BUILTIN_RSQRT_DF: > +c = CODE_FOR_rsqrt_df2; break; Leading blocks of 8 spaces should be TAB's, likewise the rest of the patch. > +/* Return builtin for reciprocal square root. */ > +tree > +aarch64_builtin_rsqrt (unsigned int fn, bool md_fn) Blank line between function comments and function, likewise through the rest of the patch please. > diff --git a/gcc/config/aarch64/aarch64-opts.h > b/gcc/config/aarch64/aarch64-opts.h > index bf6bb7b..f8e79cb 100644 > --- a/gcc/config/aarch64/aarch64-opts.h > +++ b/gcc/config/aarch64/aarch64-opts.h > @@ -73,4 +73,11 @@ enum aarch64_code_model { >AARCH64_CMODEL_LARGE > }; > > +/* Each core can have -mrecip enabled or disabled by default. */ > +enum aarch64_mrecip { { on new line please. > + AARCH64_MRECIP_OFF = 0, > + AARCH64_MRECIP_ON, > + AARCH64_MRECIP_DEFAULT, Trailing , will give a "comma at end of enumerator list" warning every time this file is included, drop the comma please. > +/* Function to decide when to use > + * reciprocal square root builtins. */ Drop the leading * on follow on comment lines, here and elsewhere in the patch please. > +/* Select reciprocal square root initial estimate > + * insn depending on machine mode. */ > +rsqrte_type get_rsqrte_type (enum machine_mode mode) New line between type and function name please, likewise for other instance in this patch. > + for (int i = 0; i < iterations; ++i) > +{ > + rtx x1 = gen_reg_rtx (mode); > + rtx x2 = gen_reg_rtx (mode); > + rtx x3 = gen_reg_rtx (mode); > + emit_set_insn (x2, gen_rtx_MULT (mode, x0, x0)); > + > + emit_insn ((*get_rsqrts_type (mode)) (x3, xsrc, x2)); > + > + emit_set_insn (x1, gen_rtx_MULT (mode, x0, x3)); > + x0 = x1; > +} > + > + emit_move_insn (dst, x0); > + return; Superflous return, drop it please. > +#undef TARGET_BUILTIN_RECIPROCAL > +#define TARGET_BUILTIN_RECIPROCAL aarch64_builtin_reciprocal > + The rest of these TARGET_* defines are in alphabetical order, please insert the new one in order. > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md > + > +mlow-precision-recip-sqrt > +Common Var(flag_mrecip_low_precision_sqrt) Optimization > +Run fewer approximation steps to reduce latency and precision. > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > +@item -mlow-precision-recip-sqrt > +@item -mno-low-precision-recip-sqrt > +@opindex
Re: [AArch64] Implement copysign[ds]f3
On 16 September 2015 at 08:40, James Greenhalghwrote: > > Hi, > > This patch adds expanders for copysigndf3 and copysignsf3 to the AArch64 > backend. These use the BSL/BIT/BIF insn to save us from the default > expansion pattern. > > Bootstrapped on aarch64-none-linux-gnu with no issues, and checked > the performance to show a slight improvement to FP routines using > copysign or SIGN from fortran. > > OK? > > Thanks, > James > > --- > gcc/ > > 2015-09-16 James Greenhalgh > > * config/aarch64/aarch64.md (copysigndf3): New. > (copysignsf3): Likewise. > > gcc/testsuite/ > > 2015-09-16 James Greenhalgh > > * gcc.target/aarch64/copysign_1.c: New. > * gcc.target/aarch64/copysign_2.c: New. > OK /Marcus
Re: [RFC AArch64][PR 63304] Handle literal pools for functions 1 MiB in size.
On 27 July 2015 at 15:33, Ramana Radhakrishnan ramana.radhakrish...@foss.arm.com wrote: DATE Ramana Radhakrishnan ramana.radhakrish...@arm.com PR target/63304 * config/aarch64/aarch64.c (aarch64_expand_mov_immediate): Handle nopcrelative_literal_loads. (aarch64_classify_address): Likewise. (aarch64_constant_pool_reload_icode): Define. (aarch64_secondary_reload): Handle secondary reloads for literal pools. (aarch64_override_options): Handle nopcrelative_literal_loads. (aarch64_classify_symbol): Handle nopcrelative_literal_loads. * config/aarch64/aarch64.md (aarch64_reload_movcpALLTF:modeP:mode): Define. (aarch64_reload_movcpVALL:modeP:mode): Likewise. * config/aarch64/aarch64.opt: New option mnopc-relative-literal-loads * config/aarch64/predicates.md (aarch64_constant_pool_symref): New predicate. * doc/invoke.texi (mnopc-relative-literal-loads): Document. This looks OK to me. It needs rebasing, but OK if the rebase is trival. Default on is fine. Hold off on the back ports for a couple of weeks. Cheers /Marcus
Re: [PATCH][AARCH64]Fix for branch offsets over 1 MiB
On 25 August 2015 at 14:12, Andre Vieira andre.simoesdiasvie...@arm.com wrote: gcc/ChangeLog: 2015-08-07 Ramana Radhakrishnan ramana.radhakrish...@arm.com Andre Vieira andre.simoesdiasvie...@arm.com * config/aarch64/aarch64.md (*condjump): Handle functions 1 Mib. (*cboptabmode1): Likewise. (*tboptabmode1): Likewise. (*cboptabmode1): Likewise. * config/aarch64/iterators.md (inv_cb): New code attribute. (inv_tb): Likewise. * config/aarch64/aarch64.c (aarch64_gen_far_branch): New. * config/aarch64/aarch64-protos.h (aarch64_gen_far_branch): New. gcc/testsuite/ChangeLog: 2015-08-07 Andre Vieira andre.simoesdiasvie...@arm.com * gcc.target/aarch64/long_branch_1.c: New test. OK /Marcus
Re: [AArch64][TLSLE][1/3] Add the option -mtls-size for AArch64
On 25 August 2015 at 15:15, Jiong Wang jiong.w...@arm.com wrote: 2015-08-25 Jiong Wang jiong.w...@arm.com gcc/ * config/aarch64/aarch64.opt (mtls-size): New entry. * config/aarch64/aarch64.c (initialize_aarch64_tls_size): New function. (aarch64_override_options_internal): Call initialize_aarch64_tls_size. * doc/invoke.texi (AArch64 Options): Document -mtls-size. OK Thanks /Marcus
Re: [AArch64] [TLSIE][1/2] Rename test source file for reuse
On 19 June 2015 at 10:15, Jiong Wang jiong.w...@arm.com wrote: Rename test source from tlsle.c into tls.c for reuse purpose. tls.c will be used as test source file for all TLS test, we just need to specify different tls options in different testcases. 2015-06-19 Jiong Wang jiong.w...@arm.com gcc/testsuite/ * gcc.target/aarch64/tlsle.c: Rename to tls.c * gcc.target/aarch64/aarch64/tlsle12.c: Update source file name. * gcc.target/aarch64/aarch64/tlsle24.c: Ditto. * gcc.target/aarch64/aarch64/tlsle32.c: Ditto. OK Thanks /Marcus
Re: [AArch64] [TLSIE][2/2] Implement TLS IE for tiny model
On 19 June 2015 at 10:15, Jiong Wang jiong.w...@arm.com wrote: Currently, TLS IE is supported on small model only. This patch implement TLS Initial-exec model support for AArch64 tiny memory model. Under tiny model, we only allow 1M loadable segment size, one single ldr instruction is enough for addressing the got entry for TLS IE directly. The code sequence is: A: mrs tp, tpidr_el0 B0: ldr t0, :gottprel:x1 R_AARCH64_TLSIE_LD_GOTTPREL_PREL19 x1 B1: add t0, t0, tp B0 and B1 should not be scheduled, as the pattern will be recognized later for linker IE model to LE model optimization. 2015-06-19 Marcus Shawcroft marcus.shawcr...@arm.com Jiong Wang jiong.w...@arm.com gcc/ * config/aarch64/aarch64.md (UNSPEC_GOTTINYTLS): New UNSPEC. (tlsie_tiny_mode): New define_insn. (tlsie_tiny_sidi): Ditto. * config/aarch64/aarch64-protos.h (aarch64_symbol_type): Define SYMBOL_TINY_TLSIE. (aarch64_symbol_context): New comment for SYMBOL_TINY_TLSIE. * config/aarch64/aarch64.c (aarch64_load_symref_appropriately): Support SYMBOL_TINY_TLSIE. (aarch64_expand_mov_immediate): Ditto. (aarch64_print_operand): Ditto. (arch64_classify_tls_symbol): Ditto. gcc/testsuite/ * gcc.target/aarch64/tlsie_tiny.c: New testcase. -- Regards, Jiong OK /Marcus
Re: [AArch64][TLSLE][1/3] Add the option -mtls-size for AArch64
On 19 August 2015 at 15:26, Jiong Wang jiong.w...@arm.com wrote: 2015-08-19 Jiong Wang jiong.w...@arm.com gcc/ * config/aarch64/aarch64.opt (mtls-size): New entry. * config/aarch64/aarch64.c (initialize_aarch64_tls_size): New function. (aarch64_override_options_internal): Call initialize_aarch64_tls_size. * doc/invoke.texi (AArch64 Options): Document -mtls-size. -- Regards, Jiong +case AARCH64_CMODEL_TINY: + /* The maximum TLS size allowed under tiny is 1M. */ + if (aarch64_tls_size 20) + aarch64_tls_size = 20; The only valid values of aarch64_tls_size handled/expected by the remainder of the patch set is 12,24,32,48 so setting the value to 20 here doesn;t make sense. /Marcus
Re: [AArch64][TLSLE][2/3] Add the option -mtls-size for AArch64
2015-08-19 Jiong Wang jiong.w...@arm.com gcc/ * config/aarch64/aarch64-protos.h (aarch64_symbol_type): Rename SYMBOL_TLSLE to SYMBOL_TLSLE24. * config/aarch64/aarch64.c (aarch64_load_symref_appropriately): Likewise (aarch64_expand_mov_immediate): Likewise (aarch64_print_operand): Likewise (aarch64_classify_symbol): Likewise OK /Marcus
Re: [AArch64][TLSLE][3/3] Implement local executable mode for all memory model
2015-08-19 Marcus Shawcroft marcus.shawcr...@arm.com Jiong Wang jiong.w...@arm.com gcc/ * config/aarch64/aarch64.c (initialize_aarch64_tls_size): Set default tls size for tiny, small, large memory model. (aarch64_load_symref_appropriately): Support new symbol types. (aarch64_expand_mov_immediate): Likewise. (aarch64_print_operand): Likewise. (aarch64_classify_tls_symbol): Likewise. * config/aarch64/aarch64-protos.h (aarch64_symbol_context): Likewise. (aarch64_symbol_type): Likewise. * config/aarch64/aarch64.md (tlsle): Deleted. (tlsle12_mode): New define_insn. (tlsle24_mode): Likewise. (tlsle32_mode): Likewise. (tlsle48_mode): Likewise. * doc/sourcebuild.texi (AArch64-specific attributes): Document aarch64_tlsle32. gcc/testsuite/ * lib/target-supports.exp (check_effective_target_aarch64_tlsle32): New test directive. * gcc.target/aarch64/tlsle_1.x: New test source. * gcc.target/aarch64/tlsle12.c: New testcase. * gcc.target/aarch64/tlsle24.c: New testcase. * gcc.target/aarch64/tlsle32.c: New testcase. -- OK /Marcus
Re: [AArch64] Break -mcpu tie between the compiler and assembler
On 20 August 2015 at 09:15, James Greenhalgh james.greenha...@arm.com wrote: 2015-08-19 James Greenhalgh james.greenha...@arm.com * common/config/aarch64/aarch64-common.c (AARCH64_CPU_NAME_LENGTH): Delete. (aarch64_option_extension): New. (all_extensions): Likewise. (processor_name_to_arch): Likewise. (arch_to_arch_name): Likewise. (all_cores): New. (all_architectures): Likewise. (aarch64_get_extension_string_for_isa_flags): Likewise. (aarch64_rewrite_selected_cpu): Change to rewrite CPU names to architecture names. * config/aarch64/aarch64-protos.h (aarch64_get_extension_string_for_isa_flags): New. * config/aarch64/aarch64.c (aarch64_print_extension): Delete. (aarch64_option_print): Get the string to print from aarch64_get_extension_string_for_isa_flags. (aarch64_declare_function_name): Likewise. * config/aarch64/aarch64.h (BIG_LITTLE_SPEC): Rename to... (MCPU_TO_MARCH_SPEC): This. (ASM_CPU_SPEC): Use it. (BIG_LITTLE_SPEC_FUNCTIONS): Rename to... (MCPU_TO_MARCH_SPEC_FUNCTIONS): ...This. (EXTRA_SPEC_FUNCTIONS): Use it. OK /Marcus
Re: [Patch] Add to the libgfortran/newlib bodge to detect ftruncate support in ARM/AArch64/SH
On 20 August 2015 at 09:31, James Greenhalgh james.greenha...@arm.com wrote: Hi, Steve's patch in 2013 [1] to fix the MIPS newlib/libgfortran build causes subtle issues for an ARM/AArch64 newlib/libgfortran build. The problem is that ARM/AArch64 (and SH) define a stub function for ftruncate, which we would previously have auto-detected, but which is not part of the hardwiring Steve added. Continuing the tradition of building bodge on bodge on bodge, this patch hardwires HAVE_FTRUNCATE on for ARM/AArch64/SH, which does fix the issue I was seeing. This is the second breakage I'm aware of due to the introduction of this hardwire code, the first being related to strtold. My recollection is that it is only the mips target that requires the newlib API hardwiring. Ideally we should rely only on the AC_CHECK_FUNCS_ONCE probe code and avoid the hardwire entirely. Perhaps a better approach for trunk would be something along the lines of: case ${host}--x${with_newlib} in mips*--xyes) hardwire_newlib=1;; esac if test ${hardwire_newlib:-0} -eq 1; then ... existing AC_DEFINES hardwire code else ... existing AC_CHECK_FUNCS_ONCE probe code fi In effect limiting the hardwire to just the target which is unable to probe. For backport to 4.9 and 5 I think James' more conservative patch is probably more appropriate. What do folks think? Cheers /Marcus
Re: ira.c update_equiv_regs patch causes gcc/testsuite/gcc.target/arm/pr43920-2.c regression
On 18 August 2015 at 10:25, Alex Velenko alex.vele...@arm.com wrote: On 31/07/15 12:04, Alex Velenko wrote: On 29/07/15 23:14, Jeff Law wrote: On 07/28/2015 12:18 PM, Alex Velenko wrote: On 21/04/15 06:27, Jeff Law wrote: On 04/20/2015 01:09 AM, Shiva Chen wrote: Hi, Jeff Thanks for your advice. can_replace_by.patch is the new patch to handle both cases. pr43920-2.c.244r.jump2.ori is the original jump2 rtl dump pr43920-2.c.244r.jump2.patch_can_replace_by is the jump2 rtl dump after patch can_replace_by.patch Could you help me to review the patch? Thanks. This looks pretty good. I expanded the comment for the new function a bit and renamed the function in an effort to clarify its purpose. From reviewing can_replace_by, it seems it should have been handling this case, but clearly wasn't due to implementation details. I then bootstrapped and regression tested the patch on x86_64-linux-gnu where it passed. I also instrumented that compiler to see how often this code triggers. During a bootstrap it triggers a couple hundred times (which is obviously a proxy for cross jumping improvements). So it's triggering regularly on x86_64, which is good. I also verified that this fixes BZ64916 for an arm-non-eabi toolchain configured with --with-arch=armv7. Installed on the trunk. No new testcase as it's covered by existing tests. Thanks,, jeff Hi, I see this patch been committed in r56 on trunk. Is it okay to port this to fsf-5? It's not a regression, so backporting it would be generally frowned upon. If you feel strongly about it, you should ask Jakub, Joseph or Richi (the release managers) for an exception to the general policy. jeff Hi Jakub, Can this commit be ported to fsf-5? It fixed gcc.target/arm/pr43920-2.c at the time, so I think it is a good idea to port. Please, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64916 Kind regards, Alex Ping! Currently this test is passed on fsf-trunk, but not passed on fsf-5, so I think it is a regression on fsf-5: That does not make it a regression, it is only a regression if a version prior to 5 passes, how does this test behave on 4.9? Cheers /Marcus
Re: ira.c update_equiv_regs patch causes gcc/testsuite/gcc.target/arm/pr43920-2.c regression
On 18 August 2015 at 10:25, Alex Velenko alex.vele...@arm.com wrote: On 31/07/15 12:04, Alex Velenko wrote: On 29/07/15 23:14, Jeff Law wrote: On 07/28/2015 12:18 PM, Alex Velenko wrote: On 21/04/15 06:27, Jeff Law wrote: On 04/20/2015 01:09 AM, Shiva Chen wrote: Hi, Jeff Thanks for your advice. can_replace_by.patch is the new patch to handle both cases. pr43920-2.c.244r.jump2.ori is the original jump2 rtl dump pr43920-2.c.244r.jump2.patch_can_replace_by is the jump2 rtl dump after patch can_replace_by.patch Could you help me to review the patch? Thanks. This looks pretty good. I expanded the comment for the new function a bit and renamed the function in an effort to clarify its purpose. From reviewing can_replace_by, it seems it should have been handling this case, but clearly wasn't due to implementation details. I then bootstrapped and regression tested the patch on x86_64-linux-gnu where it passed. I also instrumented that compiler to see how often this code triggers. During a bootstrap it triggers a couple hundred times (which is obviously a proxy for cross jumping improvements). So it's triggering regularly on x86_64, which is good. I also verified that this fixes BZ64916 for an arm-non-eabi toolchain configured with --with-arch=armv7. Installed on the trunk. No new testcase as it's covered by existing tests. Thanks,, jeff Hi, I see this patch been committed in r56 on trunk. Is it okay to port this to fsf-5? It's not a regression, so backporting it would be generally frowned upon. If you feel strongly about it, you should ask Jakub, Joseph or Richi (the release managers) for an exception to the general policy. jeff Hi Jakub, Can this commit be ported to fsf-5? It fixed gcc.target/arm/pr43920-2.c at the time, so I think it is a good idea to port. Please, see https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64916 Kind regards, Alex Ping! Currently this test is passed on fsf-trunk, but not passed on fsf-5, so I think it is a regression on fsf-5: That does not make it a regression, it is only a regression if a version prior to 5 passes, how does this test behave on 4.9? Cheers /Marcus
Re: [Aarch64] Adjust tests to take LSE extension into account.
On 18 August 2015 at 09:51, Matthew Wahab matthew.wa...@foss.arm.com wrote: gcc/testsuite 2015-08-18 Matthew Wahab matthew.wa...@arm.com * gcc.target/aarch64/atomic-comp-swap-release-acquire.c: Adjust dg-options to disable LSE extensions. * gcc.target/aarch64/atomic-op-acq_rel.c: Likewise. * gcc.target/aarch64/atomic-op-acquire.c: Likewise. * gcc.target/aarch64/atomic-op-char.c: Likewise. * gcc.target/aarch64/atomic-op-consume.c: Likewise. * gcc.target/aarch64/atomic-op-imm.c: Likewise. * gcc.target/aarch64/atomic-op-int.c: Likewise. * gcc.target/aarch64/atomic-op-long.c: Likewise. * gcc.target/aarch64/atomic-op-relaxed.c: Likewise. * gcc.target/aarch64/atomic-op-seq_cst.c: Likewise. * gcc.target/aarch64/atomic-op-release.c: Likewise. * gcc.target/aarch64/atomic-op-short.c: Likewise. * gcc.target/aarch64/sync-comp-swap.c: Likewise. * gcc.target/aarch64/sync-op-acquire.c: Likewise. * gcc.target/aarch64/sync-op-full.c: Likewise. * gcc.target/aarch64/sync-op-release.c: Likewise. OK /Marcus
Re: [PATCH][AARCH64]Add backend combine_bfi pattern.
On 5 August 2015 at 17:46, Renlin Li renlin...@arm.com wrote: Hi Kyrill, On 30/07/15 17:08, Kyrill Tkachov wrote: Hi Renlin, On 30/07/15 16:50, Renlin Li wrote: Hi all, This insn should match the following similar rtx pattern and remove the redundant zero_extend operation if the width of zero_extract and inner-size of zero_extend totally match. (set (zero_extract:SI (reg/i:SI 0 x0) (const_int 8 [0x8]) (const_int 0 [0])) (zero_extend:SI (reg:QI 1 x1 [ y ]))) aarch64-none-elf regression tests Okay. Okay to commit? Regards, Renlin gcc/ChangeLog: 2015-07-30 Renlin Li renlin...@arm.com * config/aarch64/aarch64.md (combine_bfi): New pattern. gcc/testsuite/ChangeLog: 2015-07-30 Renlin Li renlin...@arm.com * gcc.target/aarch64/combine-bfi.c: New. +(define_insn *combine_bfiGPI:modeALLX:mode + [(set (zero_extract:GPI (match_operand:GPI 0 register_operand +r) + (match_operand 1 const_int_operand n) + (match_operand 2 const_int_operand n)) + (zero_extend:GPI (match_operand:ALLX 3 register_operand r)))] + UINTVAL (operands[1]) == ALLX:sizen + bfi\\t%w0, %w3, %2, %1 + [(set_attr type bfm)] +) I notice we don't have any other patterns in aarch64 that start with combine_*. Would it be better to name them something like *aarch64_bfiGPI:modeALLX:mode4 instead? Thanks for the suggestion. I have adjust the patch accordingly. Regards, Renlin gcc/ChangeLog: 2015-08-05 Renlin Li renlin...@arm.com * config/aarch64/aarch64.md (*aarch64_bfiGPI:modeALLX:mode4): New pattern. gcc/testsuite/ChangeLog: 2015-08-05 Renlin Li renlin...@arm.com * gcc.target/aarch64/combine-bfi.c: New. For new test cases in this directory, follow the guidance given on the wiki here https://gcc.gnu.org/wiki/TestCaseWriting, specifically add the suffix _N, hence combine_bfi_1.c OK with that change. /Marcus
Re: [PATCH][AARCH64] Make arm_align_max_stack_pwr.c and arm_align_max_pwr.c compile testcase, instead of execution.
On 28 July 2015 at 16:51, Renlin Li renlin...@arm.com wrote: 2015-07-28 Renlin Li renlin...@arm.com * gcc.target/aarch64/arm_align_max_pwr.c: Make it a compile test case, check the assembly. * gcc.target/aarch64/arm_align_max_stack_pwr.c: Likewise. Hi, #include stdio.h #include assert.h Test cases should not rely on external headers, see https://gcc.gnu.org/wiki/HowToPrepareATestcase and it looks like neither of these headers are actually required so OK with the includes removed (and tested). Cheers /Marcus
Re: [PATCH][AArch64][8/14] Implement TARGET_OPTION_VALID_ATTRIBUTE_P
On 21 July 2015 at 16:37, James Greenhalgh james.greenha...@arm.com wrote: On Thu, Jul 16, 2015 at 04:20:59PM +0100, Kyrill Tkachov wrote: +static bool +aarch64_process_one_target_attr (char *arg_str, const char* pragma_or_attr) +{ + bool ret; + bool invert = false; + + int len = strlen (arg_str); + + if (len == 0) +{ + error (malformed target %s, pragma_or_attr); + return false; +} + + char *str_to_check = (char *) alloca (len + 1); Seems to go against your approach earlier in the patch series of XSTRDUP, it would be nice to stay consistent. Agreed, we should be consistent. Each of the other instances of alloca/xstrdup that I've seen in this patch series are actually superflous, we simply copy the string, use it then bin it, better to remove the dup completely. In the choice between xstrdup() and alloca(), since alloca() is an option, better to choose alloca() and remove the need for each of the explicit free() calls in the exit paths. The alloca() route makes it less likely that we accidentally introduce a space leak in the future. Cheers /Marcus
Re: [PATCH][AArch64] Fix LINUX_TARGET_LINK_SPEC to be consistent with ARM
On 22 July 2015 at 18:13, Szabolcs Nagy szabolcs.n...@arm.com wrote: Same as https://gcc.gnu.org/ml/gcc-patches/2015-04/msg01387.html but for AArch64. -dynamic-linker is only passed to the linker if !static !shared. -rdynamic handling is changed too to be consistent with arm: only pass -export-dynamic if !static. 2015-07-22 Szabolcs Nagy szabolcs.n...@arm.com PR target/65711 * config/aarch64/aarch64-linux.h (LINUX_TARGET_LINK_SPEC): Move -dynamic-linker within %{!static %{!shared, and -rdynamic within %{!static. OK and I think we should backport this to 5 and 4.9 Cheers /Marcus
Re: [PATCH][AArch64] elf toolchain does not pass -shared linker option
On 22 July 2015 at 18:22, Szabolcs Nagy szabolcs.n...@arm.com wrote: 2015-07-22 Szabolcs Nagy szabolcs.n...@arm.com * config/aarch64/aarch64-elf-raw.h (LINK_SPEC): Handle -h, -static, -shared, -symbolic, -rdynamic. OK, this should be back ported to 5 and 4.9 aswell. Thanks /Marcus
Re: [AArch64][1/2] Mark GOT related MEM rtx as const to help RTL loop IV
On 7 July 2015 at 13:33, Jiong Wang jiong.w...@arm.com wrote: 2015-07-06 Jiong Wang jiong.w...@arm.com gcc/ * config/aarch64/aarch64.c (aarch64_load_symref_appropriately): Mark mem as READONLY and NOTRAP for PIC symbol. gcc/testsuite/ * gcc.target/aarch64/got_mem_hoist.c: New test. Looks, OK to me. Follow the guidance on the wiki here https://gcc.gnu.org/wiki/TestCaseWriting when naming new test cases and add _1 suffix. Otherwise OK. /Marcus
Re: [PATCH][AArch64] PR target/66731 Fix fnmul insn with -frounding-math
On 6 July 2015 at 09:20, Szabolcs Nagy szabolcs.n...@arm.com wrote: 2015-07-06 Szabolcs Nagy szabolcs.n...@arm.com * gcc.target/aarch64/fnmul-1.c: New. * gcc.target/aarch64/fnmul-2.c: New. * gcc.target/aarch64/fnmul-3.c: New. * gcc.target/aarch64/fnmul-4.c: New. +float +foo_s (float a, float b) +{ + /* { dg-final { scan-assembler fnmul\\ts\[0-9\]+, s\[0-9\]+, s\[0-9\]+ } } */ + return -(a * b); +} Indentation should set at two spaces. /Marcus
Re: [Patch ARM-AArch64/testsuite Neon intrinsics: vget_lane
On 2 July 2015 at 14:44, Christophe Lyon christophe.l...@linaro.org wrote: Hi, Here is the missing test for ARM/AArch64 AdvSIMD intrinsic: vget_lane. Tested on arm, armeb, aarch64 and aarch64_be targets (using QEMU). The tests all pass, expect on armeb where vgetq_lane_s64 and vgetq_lane_u64 fail. I haven't investigated in details yet. OK for trunk? 2015-07-02 Christophe Lyon christophe.l...@linaro.org * gcc.target/aarch64/advsimd-intrinsics/vget_lane.c: New testcase. OK /Marcus
Re: [AArch64][TLSLE][2/N] Rename tlsle_small to tlsle
On 20 May 2015 at 12:19, Jiong Wang jiong.w...@arm.com wrote: Similar to the rename from SYMBOL_SMALL_TPREL to SYMBOL_TLSLE, this patch rename the rtl pattern name. ok for trunk? 2015-05-19 Jiong Wang jiong.w...@arm.com gcc/ * config/aarch64/aarch64.md (tlsle_small): Rename to tlsle. (tlsle_small_mode): Rename to tlsle_mode. * config/aarc64/aarch64.c (aarch64_load_symref_appropriately): Use new pattern name. OK /Marcus
Re: [AArch64][TLSLE][3/N] Add UNSPEC_TLSLE
On 20 May 2015 at 12:21, Jiong Wang jiong.w...@arm.com wrote: Add new unspec name UNSPEC_TLSLE, use it for all tlsle pattern. ok for trunk? 2015-05-19 Jiong Wang jiong.w...@arm.com gcc/ * config/aarch64/aarch64.md (UNSPEC_TLSLE): New enumeration. (tlsle): Use new unspec name. (tlsle_mode): Ditto. OK /Marcus
Re: [Aarch64] Expand +rdma documentation, small changes to march and mcpu text.
On 22 June 2015 at 17:17, Matthew Wahab matthew.wa...@arm.com wrote: Hello, The documentation for the ARMv8.1 +rdma option doesn't mention that enabling it also implies enabling Adv.SIMD. This patch fixes that. The documentation for the -march and -mcpu options are also a little messy, this patch tries to make the text clearer and adds a (texinfo) link to the subsection documenting the feature modifiers. Tested by checking the html output. Ok for trunk? Matthew 2015-06-22 Matthew Wahab matthew.wa...@arm.com * doc/invoke.texi (Aarch64 Options, -march): Split out arch and feature description, split out the native option, add a link to the feature documentation, rearrange and slightly rewrite text. (Aarch64 options, -mcpu): Likewise. (Aarch64 options, Feature Modifiers): Add an anchor. Mention +rdma implies Adv. SIMD. OK /Marcus
Re: [AArch64][TLSLE][N/N] Implement local executable mode for all memory model
On 21 May 2015 at 17:49, Jiong Wang jiong.w...@arm.com wrote: 2015-05-14 Jiong Wang jiong.w...@arm.com gcc/ * config/aarch64/aarch64.c (aarch64_print_operand): Support tls_size. * config/aarch64/aarch64.md (tlsle): Choose proper instruction sequences. (tlsle_mode): New define_insn. (tlsle_movsym_mode): Ditto. * config/aarch64/constraints.md (Uta): New constraint. (Utb): Ditto. (Utc): Ditto. (Utd): Ditto. gcc/testsuite/ * gcc.target/aarch64/tlsle.c: New test source. * gcc.target/aarch64/tlsle12.c: New testcase. * gcc.target/aarch64/tlsle24.c: New testcase. * gcc.target/aarch64/tlsle32.c: New testcase. case SYMBOL_TLSLE: - asm_fprintf (asm_out_file, :tprel_lo12_nc:); + if (aarch64_tls_size = 12) +/* Make sure TLS offset fit into 12bit. */ +asm_fprintf (asm_out_file, :tprel_lo12:); + else +asm_fprintf (asm_out_file, :tprel_lo12_nc:); break; Use the existing classify_symbol mechanism we use throughout the aarch64 backend. Specifically rename SYMBOL_TLSLE as SYMBOL_TLSLE24 and introduce the 3 missing flavours then use the symbol classification to control behaviour such as this modifier selection. Cheers /Marcus
Re: [AArch64][TLSLE][1/N] Rename SYMBOL_SMALL_TPREL to SYMBOL_TLSLE
On 20 May 2015 at 11:56, Jiong Wang jiong.w...@arm.com wrote: 2015-05-19 Marcus Shawcroft marcus.shawcr...@arm.com Jiong Wang jiong.w...@arm.com gcc/ * config/aarch64/aarch64-protos.h (arch64_symbol_type): Rename SYMBOL_SMALL_TPREL to SYMBOL_TLSLE. (aarch64_symbol_context): Ditto. * config/aarch64/aarch64.c (aarch64_load_symref_appropriately): Ditto. (aarch64_expand_mov_immediate): Ditto. (aarch64_print_operand): Ditto. (aarch64_classify_tls_symbol): Ditto. -- Regards, Jiong OK /Marcus
Re: [AArch64][TLSLE][5/N] Recognize -mtls-size
On 21 May 2015 at 17:44, Jiong Wang jiong.w...@arm.com wrote: This patch add -mtls-size option for AArch64. This option let user to do finer control on code generation for various TLS model on AArch64. For example, for TLS LE, user can specify smaller tls-size, for example 4K which is quite usual, to let AArch64 backend generate more efficient instruction sequences. Currently, -mtls-size accept all integer, then will translate it into 12(4K), 24(16M), 32(4G), 48(256TB) based on the value. no functional change. ok for trunk? 2015-05-20 Jiong Wang jiong.w...@arm.com gcc/ * config/aarch64/aarch64.opt (mtls-size): New entry. * config/aarch64/aarch64.c (initialize_aarch64_tls_size): New function. * doc/invoke.texi (AArch64 Options): Document -mtls-size. +mtls-size= +Target RejectNegative Joined UInteger Var(aarch64_tls_size) Init(24) +Specifies size of the TLS data area, default size is 16M. Accept any integer, but the value +will be transformed into 12(4K), 24(16M), 32(4G), 48(256TB) + Can we follow the mechanism used by rs6000 and limit the accepted values here using an Enum to just the valid values: 12, 24, 32, 48? +@item -mtls-size=@var{size} +@opindex mtls-size +Specify the size of TLS area. You can specify smaller value to get better code +generation for TLS variable access. Currently, we accept any integer, but will +turn them into 12(4K), 24(16M), 32(4G), 48(256TB) according to the integer +value. + How about: Specify bit size of immediate TLS offsets. Valid values are 12, 24, 32, 48. Thanks /Marcus
Re: [Patch AArch64 3/4] De-const-ify struct tune_params
On 23 June 2015 at 09:49, James Greenhalgh james.greenha...@arm.com wrote: Hi, If we want to overwrite parts of this structure, we're going to need it to be more malleable than it is presently. Run through and remove const from each of the members, create a non-const tuning structure we can modify, and set aarch64_tune_params to always point to this new structure. Change the -mtune parsing code to take a copy of the tuning structure in use rather than just taking the reference from within the processor struct. Change all the current users of aarch64_tune_params which no longer need to dereference a pointer. Checked on aarch64-none-linux-gnueabi with no issues. OK? Thanks, James --- 2015-06-23 James Greenhalgh james.greenha...@arm.com * config/aarch64/aarch64-protos.h (tune_params): Remove const from members. (aarch64_tune_params): Remove const, change to no longer be a pointer. * config/aarch64/aarch64.c (aarch64_tune_params): Remove const, change to no longer be a pointer, initialize to generic_tunings. (aarch64_min_divisions_for_recip_mul): Change dereference of aarch64_tune_params to member access. (aarch64_reassociation_width): Likewise. (aarch64_rtx_mult_cost): Likewise. (aarch64_address_cost): Likewise. (aarch64_branch_cost): Likewise. (aarch64_rtx_costs): Likewise. (aarch64_register_move_cost): Likewise. (aarch64_memory_move_cost): Likewise. (aarch64_sched_issue_rate): Likewise. (aarch64_builtin_vectorization_cost): Likewise. (aarch64_override_options): Take a copy of the selected tuning struct in to aarch64_tune_params, rather than just setting a pointer, change dereferences of aarch64_tune_params to member accesses. (aarch64_override_options_after_change): Change dereferences of aarch64_tune_params to member access. (aarch64_macro_fusion_p): Likewise. (aarch_macro_fusion_pair_p): Likewise. * config/aarch64/cortex-a57-fma-steering.c (gate): Likewise. OK /Marcus
Re: [AArch64][1/2] Rename SYMBOL_SMALL_GOT to SYMBOL_SMALL_GOT_4G
On 26 June 2015 at 10:23, Jiong Wang jiong.w...@arm.com wrote: OK. Reworked this patch. Removed those redundant memory model check by adding new symbol classification. Patch splitted into two: * [1/2] Rename SYMBOL_SMALL_GOT to SYMBOL_SMALL_GOT_4G * [2/2] Implement -fpic for -mcmodel=small This is the first patch which only renmae SYMBOL_SMALL_GOT into SYMBOL_SMALL_GOT_4G. Please review, Thanks. 2015-06-26 Jiong Wang jiong.w...@arm.com gcc/ * config/aarch64/aarch64-protos.h (aarch64_symbol_type): Rename SYMBOL_SMALL_GOT to SYMBOL_SMALL_GOT_4G. * config/aarch64/aarch64.c (aarch64_load_symref_appropriately): Ditto. (aarch64_expand_mov_immediate): Ditto. (aarch64_print_operand): Ditto. (aarch64_classify_symbol): Ditto. OK, thanks Jiong. /Marcus
Re: [AArch64][2/2] Implement -fpic for -mcmodel=small
On 26 June 2015 at 10:32, Jiong Wang jiong.w...@arm.com wrote: This patch respin https://gcc.gnu.org/ml/gcc-patches/2015-05/msg01804.html. A new symbol classification SYMBOL_SMALL_GOT_28K added to represent symbol which needs go through GOT table and it's under -fpic/-mcmodel-small. the _28K suffix can reflect the symbol's attribute better, and by introducing this new symbol type, we could avoid checking aarch64_cmodel at some extent though still needs the check somewhere. All other code logic not changed. OK for trunk? Thanks. 2015-06-26 Jiong. Wang jiong.w...@arm.com gcc/ * config/aarch64/aarch64-protos.h (aarch64_symbol_type): New type SYMBOL_SMALL_GOT_28K. * config/aarch64/aarch64.md: (ldr_got_small_mode): Support new GOT relocation modifiers. (unspec): New enum UNSPEC_GOTMALLPIC28K. (ldr_got_small_28k_mode): New. (ldr_got_small_28k_sidi): New. * config/aarch64/iterators.md (got_modifier): New mode iterator. * config/aarch64/aarch64-otps.h (aarch64_code_model): New model. * config/aarch64/aarch64.c (aarch64_load_symref_appropriately): Support SYMBOL_SMALL_GOT_28K. (aarch64_rtx_costs): Add costs for new instruction sequences. (initialize_aarch64_code_model): Initialize new model. (aarch64_classify_symbol): Recognize new model and new symbol classification. (aarch64_asm_preferred_eh_data_format): Support new model. (aarch64_load_symref_appropriately): Generate new instruction sequences for -fpic. (TARGET_USE_PSEUDO_PIC_REG): New definition. (aarch64_use_pseudo_pic_reg): New function. gcc/testsuite/ * gcc.target/aarch64/pic-small.c: New testcase. OK, Thanks Jiong. Could you prepare a NEWS entry for this change? Cheers /Marcus
Re: [Patch AArch64 4/4] Add -moverride tuning command, and wire it up for control of fusion and fma-steering
On 23 June 2015 at 09:49, James Greenhalgh james.greenha...@arm.com wrote: Hi, This final patch adds support for the new command line option -moverride. The purpose of this command line is to allow expert-level users of the compiler, and those comfortable with experimenting with the compiler, *unsupported* full access to the tuning structures used in the AArch64 back-end. For now, we only enable command-line access to the fusion pairs to enable and whether or not to use the Cortex-A57 FMA register renaming pass. Though in future we can expand this further. With this patch, you might write something like: -moverride=fuse=adrp+add.cmp+branch:tune=rename_fma_regs To enable fusion of adrp+add and cmp+branch and to enable the fma-rename pass. I've bootstrapped and tested the patch set on aarch64-none-linux-gnu with BOOT_CFLAGS set to the example string above, and again in the standard configuration with no issues. OK? Thanks, James --- 2015-06-23 James Greenhalgh james.greenha...@arm.com * config/aarch64/aarch64.opt: (override): New. * doc/invoke.texi (override): Document. * config/aarch64/aarch64.c (aarch64_flag_desc): New (aarch64_fusible_pairs): Likewise. (aarch64_tuning_flags): Likewise. (aarch64_tuning_override_function): Likewise. (aarch64_tuning_override_functions): Likewise. (aarch64_parse_one_option_token): Likewise. (aarch64_parse_boolean_options): Likewise. (aarch64_parse_fuse_string): Likewise. (aarch64_parse_tune_string): Likewise. (aarch64_parse_one_override_token): Likewise. (aarch64_parse_override_string): Likewise. (aarch64_override_options): Parse the -override string if it is present. +static const struct aarch64_tuning_override_function + aarch64_tuning_override_functions[] = The indentation looks odd here, but otherwise OK /Marcus